Do People Understand Spatial Concepts: The Case of First-Order Primitives

Golledge, R. G. Theories and Methods of Spatio-Temporal Reasoning in Geographical Space, vol. 639 of Lecture Notes in Computer Science. Springer-Verlag, Heidelberg, Germany, 1992, ch. Do People Understand Spatial Concepts: The Case of First-Order Primitives, pp. 3–21. [pdf]

———

The author hypothesizes in this paper that most individuals develop only a “common sense” configurational understanding of spatial phenomena, which accounts for incomplete and fuzzy cognitive representations of environments, and partly accounts for many spatially irrational behaviors. The author explains how humans possess different degrees of spatial abilities, such as the ability to think geometrically, the ability to image complex spatial relations, and the ability to recognize spatial patterns, all of which are task dependent. Reading a map, for example, requires skills in symbol identification and orientation.

The paper reports an experiment where the author wished to find out if people become aware of functional distributions (e.g., shops, schools) and their spatial properties when asked to learn about an environment.

The results show that even simple first-order geographic primitives, such as the idea of pair proximity or nearest neighbor, are not necessarily well understood in the complex map situation tested. In more detail, the two distributions were not regarded as being similar, and even performing common tasks on each distribution produced significantly different results. There was no difference even between geographers and non-geographers.

One reason for the lack of good performance on the cue location reproduction task, for example, might be that people regionalized the initial map and that this interfered with their ability to comprehend the functional distribution as a single entity.
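To make the notion of a first-order primitive concrete, here is a minimal sketch of the nearest-neighbor relation on a point distribution. The coordinates and the function name `nearest_neighbors` are hypothetical illustrations, not the stimuli or the procedure used in the paper.

```python
import math

def nearest_neighbors(points):
    """Map each point index to the index of its nearest other point."""
    result = {}
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        result[i] = best_j
    return result

# Hypothetical locations of one functional distribution (e.g., shops).
shops = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.5)]
nn = nearest_neighbors(shops)
```

Two points form a proximal pair when each is the other's nearest neighbor, as points 0 and 1 do in this made-up layout.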

Golledge Map-Reading

the way to pilgrimage

…it is not surprising that the “Way to Santiago” has been sometimes considered as an example of the Church’s pilgrimage on its journey towards the heavenly city. It is a path of prayer and penance, of charity and solidarity; a stretch of the path of life where the faith, becoming history among mankind, also converts culture into something Christian. The churches and abbeys, the hospitals and shelters of the Way to Santiago still speak of the Christian adventure of making pilgrimage in which the faith becomes life, history, culture, charity and works of mercy.

John Paul II, Pastoral Journey to Santiago de Compostela and Asturias on the occasion of Fourth World Youth Day, 1989.

Grounding spatial language in perception: An empirical and computational investigation

Regier, T., and Carlson, L. A. Grounding spatial language in perception: An empirical and computational investigation. Journal of Experimental Psychology: General 130, 2 (2001), 273–298. [pdf]

——–

This paper presents an empirical validation of the attention vector sum model (AVS), which predicts zones of acceptability for spatial expressions applied to perceptual stimuli. The research question was the following: What perceptual or cognitive structures are reflected in linguistic judgments? Does spatial perception shape spatial language in this instance?

The authors present a number of computational models developed in previous research alongside the AVS: the Bounding Box model, the Proximal and Center-of-Mass model, and the Hybrid model. AVS is based on the observations that human apprehension of spatial relations involves attention, and that in several neural subsystems overall direction is represented as the vector sum of a set of constituent directions.

In the model, an attentional beam is focused on the landmark. In particular, the beam is focused on the point of the landmark top that is vertically aligned with the trajector, or closest to being so aligned. Parts of the landmark near the center of this beam are strongly attended, whereas more distant parts of the landmark receive less attention. This yields a distribution of attention across the landmark object. The attentional beam radiates out to illuminate different parts of the landmark at different strengths, depending on the distance from the focus.
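The attentional weighting described above can be sketched as follows. This is a simplified illustration, not the published model: the exponential decay, the `decay` parameter, the point-based landmark, and the function name `avs_deviation` are all assumptions, and the mapping from angle to acceptability rating is left out.

```python
import math

def avs_deviation(landmark_points, trajector, decay=1.0):
    """Angle (degrees) between upright vertical and the attention-weighted
    sum of vectors pointing from each landmark point to the trajector."""
    tx, ty = trajector
    # Focus: the point on the landmark top closest to vertical alignment
    # with the trajector.
    top_y = max(y for _, y in landmark_points)
    top = [(x, y) for x, y in landmark_points if y == top_y]
    focus = min(top, key=lambda p: abs(p[0] - tx))
    sum_x = sum_y = 0.0
    for x, y in landmark_points:
        # Assumed attention profile: decays with distance from the focus.
        attention = math.exp(-math.dist((x, y), focus) / decay)
        sum_x += attention * (tx - x)
        sum_y += attention * (ty - y)
    # atan2(x, y) measures the angle from the vertical axis.
    return abs(math.degrees(math.atan2(sum_x, sum_y)))

# Hypothetical landmark: a flat row of points; two trajector positions.
landmark = [(float(x), 0.0) for x in range(5)]
above = avs_deviation(landmark, (2.0, 3.0))  # directly above the center
side = avs_deviation(landmark, (6.0, 1.0))   # off to the right
```

Under these assumptions, a trajector directly above the landmark center yields a deviation near zero, while one off to the side yields a larger deviation, which would correspond to a lower rating for "above".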

The authors empirically tested the accuracy of the predictions of the different models by presenting each model with the experimental stimuli, recording the model’s output, and determining through regression how well the model output predicted the empirically obtained acceptability ratings.

The study confirmed the predictions of the AVS model: first, spatial term ratings are influenced by the proximal and center-of-mass orientations. Second, ratings are sensitive to the grazing line (the horizontal line grazing the very top of the landmark). Third, ratings are affected by distance. The model provides a preliminary grounding of linguistic spatial categories in nonlinguistic perception: linguistic spatial categories can be explained in terms of underlying structures that are not linguistic in character.
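As a rough illustration of this evaluation step, the sketch below fits a simple linear regression of ratings on model output and reports the proportion of variance explained. The numbers are invented for illustration, and the paper's actual regression setup may differ; `r_squared` is a hypothetical helper name.

```python
def r_squared(model_output, ratings):
    """R^2 of a least-squares line predicting ratings from model output."""
    n = len(ratings)
    mean_x = sum(model_output) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in model_output)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(model_output, ratings))
    syy = sum((y - mean_y) ** 2 for y in ratings)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(model_output, ratings))
    return 1.0 - ss_res / syy

# Made-up model outputs and mean acceptability ratings for four stimuli.
outputs = [0.1, 0.4, 0.6, 0.9]
ratings = [1.5, 4.0, 5.5, 8.8]
fit = r_squared(outputs, ratings)
```

A value of `fit` close to 1 would mean the model output tracks the ratings closely; comparing this value across models is one way to rank them.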

Regier Avs

Grounding language in perception: from “saying” to “saying and acting”

Coventry, K. R., and Garrod, S. C. Saying, Seeing and Acting: the Psychological Semantics of Spatial Prepositions. Psychology Press, East Sussex, Great Britain, 2004, ch. Grounding language in perception: from “saying” to “saying and acting”, pp. 37–70.

———

This chapter opens with the argument that it is extremely difficult to pin down what expressions like “higher than” mean, because natural languages encode only a limited number of spatial relations between objects, and these have to cover the whole range of possibilities. The authors propose a ‘functional geometric framework’ that makes it possible to better understand spatial prepositions because it involves both geometric and extra-geometric constraints.

The chapter first reviews the major contributions to modeling the geometry of spatial relations. Cohn et al. (1996) developed a qualitative geometry of space called the region connection calculus. Ullman (1996) argued that perceptual processing requires visual routines distinct from the basic processes of vision. Visual routines serve functional perception. They are optimal and subject to attentional control.

Logan and Sadler (1996) claim that spatial templates underlie the apprehension of spatial relations and spatial prepositions. A template is a representation that is centered on the reference object and aligned with the reference frame imposed on the reference object.

Coventry Template

Regier and Carlson (2001) developed the attention vector sum model, a computational model to compute the relations captured by the spatial template theory. AVS takes into account the role of attention in determining a spatial relation and has much the same character as one of Ullman’s visual routines. The model works by focusing an attentional beam on the reference object at the point that is vertically aligned with the closest part of the located object. Parts of the reference object nearest to the located object are maximally attended, and more distant parts are attended less.

Coventry Avs

Finally, the authors stress the importance of including extra-geometric relations in order to improve the AVS framework. Spatial prepositions refer to the position of objects in space, but extra-geometric content also contributes to the meaning of an expression (e.g., the contained object moves toward the container).

The functional geometric framework aims to capture the representation of spatial relations not just in terms of how viewers see such relations, but also in terms of how they act on the world they see, and in terms of how objects meaningfully interact in that world. In summary, the authors argue that what objects are fundamentally influences how one talks about where they are located.

Human performance on visually presented travelling salesperson problems with varying numbers of nodes

Dry, M., Lee, M. D., Vickers, D., and Hughes, P. Human performance on visually presented travelling salesperson problems with varying numbers of nodes. The Journal of Problem Solving 1, 1 (2006), 20–32. [pdf]

————

This article shows that, despite the apparent intractability of the TSP (Traveling Salesman Problem), research into human performance on visually presented TSPs has indicated that participants are capable of solving the problems to near-optimal accuracy with minimal cognitive effort.

First, participants appear to spend a roughly constant time per node, implying that the total time required to arrive at a solution is a linear function of the number of nodes. Second, although many algorithms yield approximate solutions to TSP instances, no known algorithm predicts a simple linear relationship between solution time and number of nodes.

The authors presented random problems to participants. Each stimulus was a white display with black dots. The goal was to join the dots in such a way that the path was closed and the path length was minimal.

Using an optimal solution as a benchmark, it was possible to measure the mean participant deviation from optimality for each problem in the TSP condition. Participants’ solution lengths closely approximated the estimated optimal solution lengths, with deviation asymptoting around 0.11.

The authors argue that perception of the convex hull (a boundary such that no line joining any two nodes in the array can fall outside it) is a form of figure-ground segregation. They also argue that we need to distinguish between the spontaneous parallel processes involved in the initial perception of the structure in TSP arrays and the serial processes of linking individual nodes or clusters of nodes.
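A deviation measure of this kind can be sketched as the proportional excess of a tour over the optimal tour length. This definition is an assumption on my part, and the brute-force search below is swapped in for whatever solver the authors used to estimate optimal lengths; it is only practical for a handful of nodes. All function names are hypothetical.

```python
import itertools
import math

def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
               for i in range(n))

def optimal_tour_length(points):
    """Shortest closed tour by exhaustive search (small inputs only)."""
    rest = range(1, len(points))  # fix node 0 as the starting node
    return min(tour_length(points, (0,) + perm)
               for perm in itertools.permutations(rest))

def deviation_from_optimal(points, order):
    """Proportional excess of a tour over the optimal tour length."""
    best = optimal_tour_length(points)
    return (tour_length(points, order) - best) / best

# Hypothetical four-node problem: the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
perimeter_dev = deviation_from_optimal(square, [0, 1, 2, 3])
crossing_dev = deviation_from_optimal(square, [0, 2, 1, 3])
```

Following the perimeter gives a deviation of zero, while the self-crossing tour is roughly 21% longer than optimal.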
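For reference, the convex hull of a node array can be computed with Andrew's monotone chain algorithm, sketched below. This is a standard computational construction, not a model of how participants perceive the hull, and the node coordinates are made up.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Hypothetical array: four boundary nodes and one interior node.
nodes = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(nodes)
```

The interior node (1, 1) is excluded from `hull`; only the four boundary nodes remain.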

Dry Tsp Experiment

building instructions and user mistakes

Yesterday, I had a nice discussion with Mark on how to build effective building instructions. I am a great fan of LEGO bricks, and I am used to reading their building instructions, which represent the model in a three-dimensional perspective and show at each step the parts that need to be added.

Mark was highlighting that instructions should prevent user mistakes before they arise. It has happened to me many times that a wrong interpretation of the drawing led me to a mistake that had consequences and was detected only a couple of steps later. The way to fix it is then to go back those steps and try again. This is possible with LEGO bricks. When the building parts are made of wood and iron, however, a mistake can have permanent consequences. In these cases, preventing a mistake is even more important.

IKEA does a great job of designing readable building instructions for their furniture elements. I remember often spotting details added to prevent mistakes. In the example below, the user is cautioned to align the two parts in such a way that the right hole is filled with the metallic screw.

Ikea Instruction-Mistakes

More examples here.

Where am I looking? The accuracy of video-mediated gaze awareness

Gale, C., and Monk, A. F. Where am I looking? The accuracy of video-mediated gaze awareness. Perception and Psychophysics 62, 3 (2000), 586–595. [pdf]

———-

The experiments reported in this paper demonstrate that full gaze awareness is possible with sufficient accuracy to be used as a resource in face-to-face and video-mediated communication. Knowledge of what someone is looking at is used habitually in everyday life. The estimators were still very accurate when they could not see the hand-and-eye movement to the gazed-at object.

The next step in this research would be to demonstrate the value of a video configuration that includes a single view of the face and the objects being gazed at, rather than these things on their own.

Participants worked in pairs, with one person gazing at a flat horizontal stimulus between them. The other participant estimated where the gazer was looking. Experiment 1 used linear scales as gaze targets. The mean root mean square error of estimation equates to 3.8 degrees of head-and-eye pan and 2.6 degrees of tilt. This small error of estimation was essentially the same in a video-mediated condition and in one in which a procedure that did not allow the estimator to see the head-and-eye movement to the target position was used. Experiment 2 obtained comparable gaze estimation performance in face-to-face and video-mediated conditions, using a combined pan-and-tilt grid. It is concluded that people are very good at estimating what someone else is looking at and that such estimations should be practical during video-mediated conversation.
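The accuracy measure mentioned above, root mean square error, can be sketched as follows. The target and estimate values are invented, and the conversion from scale positions to degrees of pan and tilt reported in the paper is not reproduced here; `rms_error` is a hypothetical helper name.

```python
import math

def rms_error(estimates, targets):
    """Root mean square error between estimated and actual positions."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, targets)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up positions on a linear scale: where the gazer actually looked
# and where the estimator judged the gaze to be.
targets = [10.0, 20.0, 30.0, 40.0]
estimates = [11.0, 19.0, 32.0, 38.0]
error = rms_error(estimates, targets)
```

Squaring the individual errors before averaging means that over- and underestimates do not cancel out, so the measure reflects the typical size of a misjudgment.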

Spatial Perspective in Descriptions

Tversky, B. Language and Space. MIT Press, Cambridge, MA, USA, 1996, ch. Spatial Perspective in Descriptions, pp. 463–491.

————

Taking other points of view is essential for a range of cognitive functions and social interactions, from recognizing an object from a novel point of view to navigating an environment to understanding someone else’s position. This chapter first tries to reconcile the perspectives of different disciplines on spatial perspective.

There are three bases for spatial reference: the viewer, other objects and external sources. These three bases seem to correspond to deictic, intrinsic and extrinsic uses of language. An interesting point is that deictic uses cannot be accounted for by the language alone. They require additional knowledge of the interactional situation in which they are produced.

Depending on the complexity of the task, the speaker can decide to take his own perspective, the perspective of the addressee, or a neutral perspective, using a landmark, a referent object, or the extrinsic system as a basis for the spatial reference.

The world is multidimensional but speech is linear. To describe the world linearly, it makes sense to choose an order. A natural way of conveying an environment is through a mental tour. These tours can be divided into gaze tours and walking tours. In gaze tours, the noun phrases are usually headed by objects and the verbs express states. In a walking tour, the noun phrases are headed by the addressee and the verbs express actions.

The choice of perspective and the particular strategy chosen to encode the spatial situation depend closely on the number of mental transformations required to produce or to understand an utterance. It stands to reason that speakers would avoid cognitively difficult tasks.

The chapter also reports an interesting study on map descriptions. After learning a map, subjects were asked to describe it from memory. Two kinds of description were found. A route description takes the reader on a mental tour. It uses a changing view and locates landmarks with respect to the addressee. A survey description, in contrast, takes a static view from above the environment and locates landmarks with respect to each other.

Going beyond perspective is critical to spatial cognition.

Perspective Taking and Ellipsis in Spatial Descriptions

Levelt, W. J. M. Language and Space. MIT Press, Cambridge, MA, USA, 1996, ch. Perspective Taking and Ellipsis in Spatial Descriptions, pp. 77–107. [pdf]

———-

This chapter recalls the distinction between macroplanning and microplanning. In macroplanning we elaborate our communicative intention, selecting information whose expression can be effective in revealing our intentions to a partner in speech. We decide on what to say, linearizing what goes first and next. In microplanning, or “thinking for speaking”, we translate the information to be expressed into some kind of “propositional” format, creating a semantic representation, or message, that can be formulated. Applied to spatial discourse, we can say that macroplanning involves selecting referents, relata, and their spatial relations for expression. Microplanning, in contrast, consists of applying a perspective system that maps spatial directions and relations onto lexical concepts.

This chapter also contains a discussion of the advantages and disadvantages of the deictic, intrinsic and absolute systems for spatial reasoning. For instance, the deictic perspective is safe from converseness and transitivity errors.

Finally, the chapter offers a good example of visual patterns used to test perspective taking. The author highlights critical moves that lead to potential encoding problems.

The first part of this paper reviews some major properties of three perspectives language users can take in mapping spatial relations onto linguistic expressions, the deictic, intrinsic and absolute systems. Although the intrinsic system is most widely used in the languages of the world, its mathematical properties are a hindrance to spatial reasoning (it lacks transitivity and converseness). The second part of the paper explores whether ellipsis in spatial expressions (for instance in “you go right to a church and then to a bridge”) precedes or follows perspective taking. An analysis of Levelt-type network description data reveals that ellipsis is pre-perspective taking, which is a non-Whorfian conclusion.

Levelt Perspective-1