If a name is ambiguous and given without context, humans struggle to understand the meaning, so you can imagine the struggle computers have.
When Germans read the last name "Merkel" without context, they will not know whether it refers to the Chancellor of Germany, Angela Merkel, or the soccer coach Max Merkel.
If that is a drawback for people, it is an even bigger drawback for web search engines. Programs capture character strings like "Angela Merkel" but may not pay attention to attributes like "German Chancellor" or "Germany's First Lady" at all. Common auto-fill settings mean that once the word "Merkel" is entered, search engines offer information about many people with the same last name, starting with the most popular.
Researchers at the Max Planck Institute for Informatics have now developed a program that enables accurate disambiguation of named entities by analyzing them with the help of Wikipedia, which obviously brings its own limitations. Their software, named AIDA, establishes connections between the mentions in a text and the persons or places they could refer to.
"The more references exist between a mention and a specific person in Wikipedia, the more words of the person's Wikipedia article can also be found in the input text, and the higher the score the mention-entity edge receives. AIDA checks this score and selects the mention-entity edge with the highest score as the accurate mapping," explains Johannes Hoffart, who co-developed AIDA at the Max Planck Institute for Informatics.
That reliance on Wikipedia could be a problem with an entry like Science 2.0, where the entry is not only not right, it isn't even right enough to be wrong. Still, AIDA is a proof of concept.
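The scoring Hoffart describes can be illustrated with a small sketch. It is not AIDA's actual implementation: it assumes the score of a mention-entity edge is simply the number of words the input text shares with an entity's description, and the two descriptions are hand-written, hypothetical stand-ins for Wikipedia articles. The function names are invented for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-zäöüß]+", text.lower()))


def score_edge(context, entity_description):
    """Score a mention-entity edge as the number of shared words (assumption)."""
    return len(tokenize(context) & tokenize(entity_description))


def disambiguate(context, candidates):
    """Return the candidate entity whose mention-entity edge scores highest."""
    return max(candidates, key=lambda name: score_edge(context, candidates[name]))


# Hypothetical stand-ins for the text of each person's Wikipedia article.
candidates = {
    "Angela Merkel": "German politician who served as Chancellor of Germany and led the CDU party",
    "Max Merkel": "Austrian soccer player and coach who managed clubs in Germany and Spain",
}

political_text = "Merkel spoke by phone with Ukrainian politicians, the Chancellor said in Berlin"
sports_text = "The soccer coach Merkel managed clubs in Spain"

print(disambiguate(political_text, candidates))
print(disambiguate(sports_text, candidates))
```

Even this crude word overlap is enough to separate the two Merkels in the sample sentences, although common words such as "the" and "in" count toward the score here, which a more careful version would filter out.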
To demonstrate their novel technique, the researchers have created a search engine based on their approach, which makes it possible not only to combine the search for strings with the search for specific objects like persons and locations, but also to search on categories.
In this way, the search for "Angela Merkel + phone call + Ukrainian politicians" results in texts dealing with the German Chancellor within the context of Ukrainian politicians like "Yulia Tymoshenko" and the string "phone call". Currently the researchers use AIDA to analyze the text corpus of the German National Library to combine the search for keywords with the search for specific objects. "The search results are more precise this way," Hoffart says.
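A combined query of this kind might look like the following sketch. It assumes each text has already been annotated with its disambiguated entities; the `Document` type, the `CATEGORIES` table, the `search` function, and the three sample texts are all hypothetical and are not taken from the researchers' search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    entities: set = field(default_factory=set)  # disambiguated entity names


# Hypothetical category table mapping a category to its member entities.
CATEGORIES = {"Ukrainian politicians": {"Yulia Tymoshenko"}}


def search(docs, entity, phrase, category):
    """Return documents that mention the entity, contain the phrase as a
    plain string, and mention at least one member of the category."""
    members = CATEGORIES.get(category, set())
    return [
        doc for doc in docs
        if entity in doc.entities
        and phrase.lower() in doc.text.lower()
        and doc.entities & members
    ]


docs = [
    Document("Angela Merkel held a phone call with Yulia Tymoshenko.",
             {"Angela Merkel", "Yulia Tymoshenko"}),
    Document("Max Merkel took a phone call about his new coaching job.",
             {"Max Merkel"}),
    Document("Angela Merkel met Yulia Tymoshenko in Berlin.",
             {"Angela Merkel", "Yulia Tymoshenko"}),
]

results = search(docs, "Angela Merkel", "phone call", "Ukrainian politicians")
for doc in results:
    print(doc.text)
```

Only the first sample text satisfies all three conditions: the second is about the soccer coach, and the third lacks the string "phone call".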
"With our new technique we can not only build better search engines, but also make computers understand texts almost as a human does, in an efficient way," explains Gerhard Weikum, Scientific Director at the Max Planck Institute for Informatics in Saarbrücken.
The approach also opens new possibilities for automatically generated recommendations and the analysis of datasets, says Weikum. "Whoever is a fan of the soccer coach Merkel will receive recommendations for his books. Those more interested in the Chancellor get referred to books dealing with her and her way of governing Germany."