T. Tan, J. Chen, P. Mulhem, and M. Kankanhalli. SmartAlbum: a multi-modal photo annotation system. In MULTIMEDIA ’02: Proceedings of the tenth ACM international conference on Multimedia, pages 87–88, New York, NY, USA, 2002. ACM.
Applications supporting annotation of pictures with voice include SmartAlbum (Tan et al., 2002), which unifies two indexing approaches, namely content-based and speech; the work of Chen et al. (2001), who proposed the use of a structural speech syntax to annotate photographs in four different fields, namely event, location, people, and date/time; Show&Tell (Srihari et al., 1999), which uses speech annotations to index and retrieve both personal and medical images; and FotoFile (Kuchinsky et al., 1999), which extends annotation to more general multimedia objects.
This demonstration presents a novel application, called SmartAlbum, for photo indexing and retrieval that unifies two different image indexing approaches. The system uses two modalities to extract information about a digital photograph: content-based indexing and speech annotation. The result is an image retrieval tool with capabilities beyond what current single-mode retrieval systems can offer. We demonstrate the value of our approach on a corpus of 1200 images.