R. E. Kraut, S. R. Fussell, and J. Siegel. Visual information as a conversational resource in collaborative physical tasks. Human-Computer Interaction, 18:13–49, 2003.
Visual information plays two interrelated roles in collaborative work: first, it helps people maintain up-to-date mental models or situational awareness (Endsley, 1995) of the state of the task and others’ activities; second, it helps people communicate about the task, by aiding conversational grounding (Clark, 1981). The authors assume that the usefulness of a video system for remote collaborative work depends on the extent to which the video configuration makes available to collaborators the same visual cues that they use when performing the task co-located.
Communication media limit the visual information that can be shared, with resulting effects on the collaboration process and on performance. To test the hypotheses raised by this intuition, the authors conducted two experiments using a bike-repair task in which an expert guided a novice repairing a bike under various communication settings. The authors varied the presence of a visual channel and used face-to-face interaction as a control condition.
Physical tasks can be performed most efficiently when a helper is physically co-present. Having a remote helper leads to better performance than working alone, but it is not as effective as having a helper working by one’s side. The visual information was valuable for staying aware of the changing state of the task.
Communication was more efficient in the side-by-side condition, where the helper spent more time telling the worker what to do. In the mediated condition, not only were the dialogues longer, but their focus also shifted: more speaking turns were devoted to acknowledging the partners’ messages.
One limitation of this study is that the authors used a head-mounted video camera, worn by the worker, to show the helper the worker’s focus of attention. This might not convey the right information, as the worker’s camera view might still contain too much information to be used effectively.
The authors conclude with four implications for design that I report below:
(1) Provide people with a wide field of view, including both task objects and the wider environment, so that they can more easily maintain task awareness and ground conversations;
(2) Clarify what is part of the shared visual space. All parties to the task should have a clear understanding of what one another can see; that is, the contents of the shared visual space should be part of participants’ mutual knowledge or common ground;
(3) Provide mechanisms to allow people to track one another’s focus of attention. When people can see where each person is looking, it is easier to establish common ground;
(4) Provide support for gesture within the shared visual space. Talking about things is most efficient when people can use a combination of deictic expressions and gestures to refer to task objects.