New software that
responds to written questions by retrieving digital images has
potentially broad application, ranging from helping radiologists
compare mammograms to streamlining museum curators’ archiving of
artwork, say the Penn State researchers who developed the technology.
Dr. James Z. Wang, assistant professor in Penn State’s School of
Information Sciences and Technology and principal investigator, says
the Automatic Linguistic Indexing of Pictures (ALIP) system first
builds a pictorial dictionary, and then uses it for associating images
with keywords. The new technology functions like a human expert who
annotates or classifies images with descriptive terms.
"While the prototype is in its infancy, it has demonstrated great
potential for use in biomedicine by reading x-rays and CT scans as well
as in digital libraries, business, Web searches and the military," said
Wang, who holds the PNC Technologies Career Development Professorship
at IST and also is a member of the Department of Computer Science and
Engineering.
ALIP processes images the way people seem to. When we see a new
kind of vehicle with two wheels, a seat and a handlebar, for instance,
we recognize it as "a bicycle" from information about related images
stored in our brains. ALIP has a similar bank of statistical models
"learned" from analyzing image features.
The system is detailed in a paper, "Learning-based Linguistic
Indexing of Pictures with 2-D MHMMs," to be given today (Dec. 4) at the
Association for Computing Machinery’s (ACM) Multimedia Conference in
Juan Les Pins, France. Co-author is Dr. Jia Li, Penn State assistant
professor of statistics.
Unlike other content-based retrieval systems that compare features
of visually similar images, ALIP uses verbal cues that range from
simple concepts such as "flowers" and "mushrooms" to higher-level ones
such as "rural" and "European." ALIP also can classify images into a
larger number of categories than other systems, thereby broadening the
uses of image databases.
Other advantages include ALIP’s abilities to be trained with a
relatively large number of concepts simultaneously and with images that
are not necessarily visually similar.
In one experiment, Wang and Li "trained" ALIP with 24,000
photographs found on 600 CD-ROMs, with each CD-ROM collection assigned
keywords to describe its content. After "learning" these images, the
computer then automatically created a dictionary of concepts such as
"building," "landscape," and "European." Statistical modeling enabled
ALIP to automatically index new or unlearned images with the linguistic
terms of the dictionary.
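Reusing the toy per-concept models from the earlier sketch, the indexing step might look like the simplified Python below: a new image's features are scored against every concept model, and the best-scoring concepts supply the keywords. ALIP's real scoring is based on 2-D MHMM likelihoods, so this is only an assumed, illustrative stand-in.

```python
# Toy sketch of indexing a new image: score its features under every
# concept model and annotate with the top-ranked concepts' keywords.
import numpy as np

def log_likelihood(features, model):
    """Sum of per-dimension Gaussian log-densities (diagonal covariance)."""
    mean, var = model
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                + (features - mean) ** 2 / var)))

def annotate(features, models, top_k=5):
    """Rank concepts by how well they explain this image's features."""
    scores = {c: log_likelihood(features, m) for c, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In this sketch, an image of a beach scene would simply receive the five concept labels whose models assign its features the highest likelihood.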
Wang tested that dictionary with 5,000 randomly selected images to
see if the computer could provide meaningful keyword annotations for
the new images. His conclusion: the more specific the query for an
image, the more accurately the system retrieved an appropriate image.
Wang and Li are using ALIP as part of a three-year National Science
Foundation research project to develop digital imagery technologies for
the preservation and cataloguing of Asian art and cultural heritages.
This research aims to bypass or reduce the labor-intensive manual
creation and entry of descriptions of artwork.
Eventually, the system is expected to identify the discriminating
features of Chinese landscape paintings and the distinguishing
characteristics of paintings from different historical periods, Wang
notes.
The researchers’ progress in the first year of that project is
discussed in the paper, "Interdisciplinary Research to Advance Digital
Imagery Indexing and Retrieval Technologies for Asian Art and Cultural
Heritages." The research will be presented on Dec. 6 at in a special
session of ACM’s Multimedia Conference in France.
Further research will be aimed at improving ALIP’s accuracy and speed.
ALIP’s reading of a beach scene with sailboats yielded the keyword
annotations of "ocean," "paradise," "San Diego," "Thailand," "beach"
and "fish." Even though the computer was intelligent enough to
recognize the high-level concept of "paradise," additional research
will focus on making the technology more accurate, so that San Diego
and Thailand will not appear in the annotation of the same picture,
Wang says.
"This system has the potential to change how we handle images in
our daily life by giving us better and more access," Wang says. Wang
and Li’s latest research builds on their earlier efforts at Stanford
University. Sun Microsystems provided most of the equipment used in the
project.