This paper describes the Mobile Media Metadata system, which automatically creates a description of the content of pictures taken with smartphones from the context in which each picture was taken. Two features of the system are of particular interest: a) the annotation of the images is collaborative; b) the contextual metadata is built from the social interactions the user has with his/her peers.
In this context, one of the most interesting aspects of this study is that a meaning for an image can be derived from the way a picture is shared and reused within the group. For instance, it is possible to infer the social context if the peers happen to be in the same cell at the same time, and to use this information in combination with other cues to estimate whether one of the peers is likely to be in the picture.
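The co-presence inference described above could be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the record fields and the 5-minute window are assumed values chosen for the example.

```python
from datetime import datetime, timedelta

def co_present(photo_a, photo_b, window=timedelta(minutes=5)):
    """Guess whether two photos suggest co-present peers:
    same cell ID and timestamps within a short window.
    The window size is an assumption for illustration."""
    same_cell = photo_a["cell_id"] == photo_b["cell_id"]
    close_in_time = abs(photo_a["time"] - photo_b["time"]) <= window
    return same_cell and close_in_time

# Hypothetical photo records: user, GSM cell ID, capture time.
a = {"user": "alice", "cell_id": 4021,
     "time": datetime(2004, 5, 1, 14, 3)}
b = {"user": "bob", "cell_id": 4021,
     "time": datetime(2004, 5, 1, 14, 5)}
print(co_present(a, b))  # True: same cell, 2 minutes apart
```

A positive result here would then be combined with other cues (e.g. sharing history) before concluding that a peer appears in the picture.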
Similarly, it is possible to infer spatial context from the cell ID and timestamp of a picture. For a pedestrian, for example, two pictures are likely to represent the same area if they are taken within a couple of minutes of one another.
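The spatial heuristic for a pedestrian's photo stream could be sketched as grouping chronologically ordered shots whenever consecutive ones fall within a short gap. This is only an illustrative sketch; the 3-minute threshold is an assumed value, not one reported in the paper.

```python
from datetime import datetime, timedelta

def group_by_area(timestamps, max_gap=timedelta(minutes=3)):
    """Split a chronological list of capture times into groups
    assumed to cover the same area, starting a new group whenever
    the gap to the previous shot exceeds max_gap."""
    groups, current = [], [timestamps[0]]
    for t in timestamps[1:]:
        if t - current[-1] <= max_gap:
            current.append(t)
        else:
            groups.append(current)
            current = [t]
    groups.append(current)
    return groups

shots = [datetime(2004, 5, 1, 10, 0),
         datetime(2004, 5, 1, 10, 2),   # 2 min later: same area
         datetime(2004, 5, 1, 10, 30)]  # 28 min later: new area
print(len(group_by_area(shots)))  # prints 2
```

In practice the cell ID would be checked alongside the time gap, since a pedestrian can linger in one place while the clock advances.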
Unfortunately, the article does not go into the details of the implemented algorithm, so it is not possible to know which strategies were used or how much to rely on them. Certainly they did not implement any machine learning algorithm.
There is an interesting link to the work of Toyama, from which they took the idea of bootstrapping contextual information from the header of the JPEG image.