Underwood, G., Jebbett, L., and Roberts, K. Inspecting pictures for information to verify a sentence: Eye movements in general encoding and in focused search. The Quarterly Journal of Experimental Psychology 57A, 1 (2004), 165–182.
This article sheds some light on the following question: when we see combinations of text and graphics, such as photographs and their captions in printed media, how do we compare the information in the two components? The authors employed a sentence verification task in which participants viewed a picture with a caption and decided whether the sentence correctly described the scene.
The authors interpreted longer fixations as an indication of more difficult processing. The characteristic inspection pattern, or scanpath, started with a fixation near the center of the picture. Typically within three fixations, the participants' eyes would saccade to the sentence; they then read the sentence completely before inspecting the picture again, and made their decision immediately following this second visit to the picture (p. 173).
The participants moved their eyes a number of times between the picture and the sentence, but the decision about the validity of the sentence was made while viewing the picture, not while reading the sentence. Sentences attracted more fixations than pictures.
The authors also briefly discussed the priming effect (Sanocki and Epstein, 1997): the perception of a scene can be facilitated by the prior presentation of a priming scene that makes the layout available early.
In the discussion, the authors question the assumption that performing the sentence verification task requires constructing comparable abstract propositional forms from the sentence and the picture. Larkin and Simon (1987) have argued that although a picture and a text may contain the same information, the processing operations required to extract that information will not necessarily be equivalent: pictures and diagrams have advantages over textual descriptions. This ease of recognizing relationships from a picture was not, however, reflected in fixation durations. The authors therefore conclude that the richer representations of information in pictures require extensive encoding durations, comparable to those needed to encode information from text.