Washington, Nov 2: US researchers have successfully taught computers to interpret images using a
vocabulary of up to 330 English words.
The development is a boon for millions of Internet users, as the system
can automatically annotate entire online collections of photographs as
they are uploaded. Internet users can save significant time, as they
would otherwise have to tag or identify their images manually.
The system will also help major search engines, which currently rely on
text tags uploaded with images to describe them. A huge number of
photographs on the Web are not annotated at all.
The computer can now describe a photograph of two polo players, for
instance, with the tags "sport," "people," "horse" and "polo."
The system, called ALIPR (Automatic Linguistic Indexing of Pictures -
Real Time), attaches text tags to images, making previously untagged
images visible to Web users.
ALIPR does this by analyzing the pixel content of images and comparing that against a stored
knowledge base
of the pixel content of tens of thousands of image examples. The
computer then suggests a list of 15 possible annotations or words for
the image.
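
The general idea of comparing an image's pixel content against stored
examples and returning up to 15 suggested words can be illustrated with
a toy sketch. This is not ALIPR's actual algorithm, which the article
does not detail; it swaps in a simple color-histogram comparison with
nearest-neighbor voting. The function names, the histogram feature, and
the tiny knowledge base are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

def color_histogram(pixels, bins=4):
    """Summarize a list of (r, g, b) pixels (0-255) as a normalized histogram."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]

def distance(a, b):
    """Euclidean distance between two histograms of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def suggest_tags(pixels, knowledge_base, k=3, max_tags=15):
    """Suggest up to max_tags words by voting among the k closest stored examples.

    knowledge_base is a list of (histogram, tags) pairs; this layout is a
    hypothetical stand-in for the stored knowledge base in the article.
    """
    query = color_histogram(pixels)
    nearest = sorted(knowledge_base, key=lambda entry: distance(query, entry[0]))[:k]
    votes = Counter()
    for rank, (_, tags) in enumerate(nearest):
        for tag in tags:
            votes[tag] += k - rank  # closer examples get a larger vote
    return [tag for tag, _ in votes.most_common(max_tags)]

# Toy knowledge base: two made-up "images" of a single color each.
grass = [(30, 160, 40)] * 100
sky = [(90, 150, 230)] * 100
knowledge_base = [
    (color_histogram(grass), ["grass", "sport", "horse"]),
    (color_histogram(sky), ["sky", "cloud"]),
]

# A new, mostly green image is matched to the grass example.
tags = suggest_tags([(35, 170, 45)] * 50, knowledge_base, k=1)
print(tags)
```

A real system would need far richer image features and tens of
thousands of training examples, as the researchers describe; the sketch
only shows the compare-then-suggest flow.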
"By inputting tens of thousands of images, we have
trained computers to recognize certain objects and concepts and
automatically annotate those new or unseen images. More than half the
time, the computer's first tag out of the top 15 tags is correct," said
James Wang, associate professor in the Penn State College of
Information Sciences and Technology, and one of the
technology's two inventors.
However, the system has limitations. Computers trained with the
researchers' algorithms have difficulty when photos are fuzzy or have
low contrast or resolution; when objects are shown only partially; and
when the angle used by the photographer presents an object differently
from how it appeared in the images the computer was trained on. More
training images, as well as improvements to the training process, may
reduce these limitations.
The system is described in a paper, "Real-Time Computerized Annotation
of Pictures," authored by Jia Li, associate professor in the Department
of Statistics, and Wang.