Speech-based annotation and retrieval of digital photographs

T. J. Hazen, B. Sherry, and M. Adler. Speech-based annotation and retrieval of digital photographs. In Proceedings of INTERSPEECH 2007, the 8th Annual Conference of the International Speech Communication Association, pages 2165–2168, Antwerp, Belgium, August 27-31, 2007.


This paper presents an application that supports picture retrieval on mobile phones using voice annotations. The authors’ basic assumption is that speech is more efficient than text for operating a mobile device and, in general, more efficient for conveying complex properties.

The core of the proposed application is a mixed-grammar recognition approach, which allows the speech recognition system to construct a single finite-state network combining context-free grammars.

The paper evaluates the application through a combination of a field deployment and a lab study, in which participants were asked to retrieve a set of pictures captured either by themselves or by other participants. Retrieval performance was measured as the number of successful retrievals on the first query and within 5 queries.
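
To make the two retrieval measures concrete, here is a minimal Python sketch of how success on the first query and success within 5 queries could be computed. The function name, the data layout (one entry per retrieval task, holding the number of the query that first returned the target photo) and the sample values are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Optional, Sequence


def success_rate_within(first_hit_ranks: Sequence[Optional[int]],
                        max_queries: int) -> float:
    """Fraction of retrieval tasks solved within `max_queries` spoken queries.

    Each entry is the 1-based number of the query that first retrieved the
    target photo, or None if the participant never retrieved it.
    (Hypothetical data layout, assumed for this sketch.)
    """
    if not first_hit_ranks:
        raise ValueError("need at least one retrieval task")
    solved = sum(
        1 for rank in first_hit_ranks
        if rank is not None and rank <= max_queries
    )
    return solved / len(first_hit_ranks)


# Made-up outcomes for five hypothetical tasks (NOT results from the paper).
outcomes = [1, 3, None, 1, 6]

first_query_rate = success_rate_within(outcomes, 1)   # solved by query 1
within_five_rate = success_rate_within(outcomes, 5)   # solved by query 5
print(first_query_rate, within_five_rate)
```

Reporting both numbers separates queries that work immediately from those that only succeed after the user rephrases, which is the distinction the evaluation draws.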

Results indicated that users’ knowledge of the subject matter of the photographs did not play a role in the retrieval process.

