Story Picturing Engine
Have you ever had a story, lecture, or essay that you wanted to illustrate but could not find suitable images? The Story Picturing Engine lets a user input a story or other text to be illustrated. The system processes the text and extracts keywords, then retrieves images related to those keywords from an annotated image database. These candidate images go through pairwise image comparison and mutual reinforcement ranking so that the most relevant images are returned first.
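The keyword extraction and candidate-pool stages can be pictured with a small sketch. The stopword list, the extract_keywords helper, and the shape of the annotated image database below are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch of the candidate-selection stage: extract keywords from the
# story, then pull every image whose annotations mention one of them.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "that", "through", "we"}

def extract_keywords(story_text):
    """Pull candidate keywords from a story by dropping stopwords and punctuation."""
    words = [w.strip(".,;:!?\"'").lower() for w in story_text.split()]
    return {w for w in words if w and w not in STOPWORDS}

def initial_picture_pool(keywords, image_db):
    """Select images whose annotations share at least one keyword with the story.

    image_db is assumed to map an image id to its set of annotation keywords.
    """
    return [img for img, annotations in image_db.items() if keywords & annotations]

# Example with a toy annotated database and a one-sentence "story".
image_db = {
    "beach_01.jpg": {"beach", "ocean", "sand"},
    "forest_02.jpg": {"forest", "trees", "hiking"},
    "city_03.jpg": {"city", "skyline", "night"},
}
keywords = extract_keywords("We spent the morning hiking through the forest.")
print(initial_picture_pool(keywords, image_db))  # ['forest_02.jpg']
```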
The story picturing engine could be especially useful in education. Professors could input their lecture notes and receive related images that give their students a visual representation of the topic. Researchers might use the system to improve the readability of their reports, and students to demonstrate their understanding of a subject. Educators often do not have time to search for images, or their searches are unsuccessful. With the story picturing engine, relevant images can be found quickly and easily to illustrate a story.
The project has been conducted by Dhiraj Joshi (Ph.D. candidate), James Z. Wang (faculty member), and Jia Li (faculty member) of The Pennsylvania State University.
In this project, we present an approach to automated story picturing based on the mutual reinforcement principle. Story picturing refers to the process of illustrating a story with suitable pictures. In our approach, semantic keywords are extracted from the story text, and an annotated image database is searched to form an initial picture pool. A novel image ranking scheme then automatically determines the importance of each image. Both the lexical annotations and the visual content of an image play a role in determining its rank. Annotations are processed using WordNet to derive a lexical signature for each image. An integrated region-based similarity is also calculated between each pair of images. An overall similarity measure is formed from the lexical and visual features. Finally, a mutual reinforcement based rank is calculated for each image using the image similarity matrix. We acknowledge NSF for funding and equipment support, and Q.-T. Luong and AMICO for their help.
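The mutual reinforcement step can be sketched as a power iteration on the pairwise similarity matrix: an image is important if it is similar to other important images, and the ranks are the principal eigenvector of that matrix. The equal weighting of lexical and visual similarity and the convergence tolerance below are illustrative choices, not the paper's exact parameters.

```python
# Sketch of mutual-reinforcement ranking, assuming the combined lexical +
# visual similarities have already been collected into a symmetric matrix S,
# where S[i, j] is the similarity between images i and j.
import numpy as np

def mutual_reinforcement_rank(S, tol=1e-8, max_iter=1000):
    """Rank images by the principal eigenvector of the similarity matrix.

    Each image's importance is reinforced by its similarity to other
    important images; power iteration finds the fixed point.
    """
    n = S.shape[0]
    rank = np.ones(n) / n                  # start with uniform importance
    for _ in range(max_iter):
        new_rank = S @ rank                # each image gathers support from its neighbours
        new_rank /= np.linalg.norm(new_rank)
        if np.linalg.norm(new_rank - rank) < tol:
            break
        rank = new_rank
    return rank

# Example: combine hypothetical lexical and visual similarity matrices.
L = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])            # lexical (WordNet-based) similarities
V = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])            # visual (region-based) similarities
S = 0.5 * L + 0.5 * V
scores = mutual_reinforcement_rank(S)
print(np.argsort(-scores))                 # image indices, most important first
```

The highest-scoring images are the ones returned first to illustrate the story.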