Nov. 9, 2006 — A computer able to label digital images in real time, as they are uploaded to the Internet, could greatly improve a photographer's ability to share, organize and retrieve images.
And while the initial market may be the digital camera consumer, others could benefit, too, including museum and gallery curators and scientists needing to organize everything from satellite to pathology images.
The software is called ALIPR (for Automatic Linguistic Indexing of Pictures—Real Time), and it was developed by James Wang and Jia Li, associate professors at Pennsylvania State University in University Park.
Wang and Li knew what anyone who has used an online search engine to scout for an image knows: The results are often ranked according to the file name, for example "sunset.jpg."
However, the name of a photo doesn't always provide information about the content. "Sunset.jpg" could be an ordinary dusk vista, or it could be a picture taken on New Year's Eve of the sunset over the snowcapped peak of Mount Washington, with your friend Mary standing in the foreground.
Photo management software does allow users to annotate images with text labels, known as tags, which describe the image in more detail. So for "sunset.jpg," a person might attach words such as "Mount Washington," "winter," "Mary," and "New Year's Eve."
Tags allow the user to later search for all images of "Mount Washington," or "Mary."
But labeling each image is a tedious, time-consuming process that only the most dedicated digital photographers bother with.
"Nobody wakes up saying, 'Today I'm going to annotate all of my images. It's gonna be great,'" joked Mor Naaman, team lead for the Media in Context group at Yahoo! Research Berkeley in Berkeley, Calif.
But as photo management software and file sharing Web sites such as Flickr.com, which allows people to upload and vote on images, become more popular, the need for annotation grows.
"We see on sites like Flickr that there is a big community that revolves around those tags," said Naaman. "Flickr brings out the social and artistic aspects of photography. The photos get viewed and commented on and float to the top of popularity."
His team is working on a prototype called ZoneTag, which uses GPS and cell phone tower data to annotate cell phone images based on location.
ALIPR annotates images based on content.
First, it has to learn what a tag means before it can suggest the correct labels.
As part of the learning process, the researchers feed the computer hundreds of images of the same topic, for example "sunset." The computer analyzes the pixels and extracts information related to color and texture. It then stores a mathematical model for "sunset" based on the cumulative data.
Later, when a user uploads a new picture of a sunset, the computer compares its pixel information against the pre-computed models in its knowledge base. In about a second, ALIPR suggests a list of 15 possible tags. The user has only to check off those that are appropriate.
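The pipeline described above — extract color and texture information from training images, store a per-tag model, then rank tags by how well a new image matches each model — can be sketched in miniature. The toy Python version below is not ALIPR's actual method: it stands in mean color and per-channel spread for ALIPR's richer color and texture features, and a simple nearest-centroid comparison for its statistical models. All function names and the sample data are illustrative.

```python
import math
from collections import defaultdict

def extract_features(pixels):
    """Summarize an image (a list of (r, g, b) tuples) as its mean color
    plus per-channel standard deviation, a crude stand-in for texture."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    stds = [math.sqrt(sum((p[c] - means[c]) ** 2 for p in pixels) / n)
            for c in range(3)]
    return means + stds

def train_models(labeled_images):
    """Build one model per tag: the average feature vector
    of all training images carrying that tag."""
    by_tag = defaultdict(list)
    for tag, pixels in labeled_images:
        by_tag[tag].append(extract_features(pixels))
    return {tag: [sum(col) / len(col) for col in zip(*vecs)]
            for tag, vecs in by_tag.items()}

def suggest_tags(models, pixels, k=15):
    """Rank tags by Euclidean distance between the new image's
    features and each stored model; return the k best."""
    feats = extract_features(pixels)
    def dist(tag):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(feats, models[tag])))
    return sorted(models, key=dist)[:k]

# Toy training set: orange-ish "sunset" images, green-ish "forest" images.
training = [
    ("sunset", [(250, 120, 40)] * 50 + [(240, 100, 50)] * 50),
    ("sunset", [(255, 130, 60)] * 50 + [(235, 90, 30)] * 50),
    ("forest", [(30, 140, 40)] * 50 + [(60, 110, 50)] * 50),
    ("forest", [(40, 130, 60)] * 100),
]
models = train_models(training)

# An unseen orange-ish image should rank "sunset" first.
query = [(248, 115, 42)] * 50 + [(242, 105, 48)] * 50
print(suggest_tags(models, query))  # → ['sunset', 'forest']
```

A real system would learn far subtler features and many more categories, but the shape is the same: one offline training pass per tag, then a fast comparison against every stored model at upload time.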
"About half the time, the computer's first tag out of the top 15 tags is correct, and a vast majority of images have at least one correct tag," said Wang.
If ALIPR does not suggest a helpful tag, the user can type a word in a space provided, improving the computer's ability to identify tags for future images.