Background

Fri, 07/03/2009 - 21:06 -- cjadmin

RDFa (“Resource Description Framework in attributes”) is having its five minutes of fame: Google is beginning to process RDFa and Microformats as it indexes websites, using the parsed data to enhance the display of search results with “rich snippets.” Yahoo!, meanwhile, has been processing RDFa for about a year. With these two giants of search on the same trajectory, a new kind of web is closer than ever before.

The web is designed to be consumed by humans, and much of the rich, useful information our websites contain is inaccessible to machines. People can cope with all sorts of variations in layout, spelling, capitalization, color, position, and so on, and still absorb the intended meaning from the page. Machines, on the other hand, need some help.

A new kind of web, a semantic web, would be made up of information marked up in such a way that software can also easily understand it. Before considering how we might achieve such a web, let’s look at what we might be able to do with it.

Improved search

Adding machine-friendly data to a web page improves our ability to search. Imagine a news story that says “today the prime minister flew to Australia,” in reference to Britain’s prime minister, Gordon Brown. The article might not call the prime minister by name, but it’s still pretty easy to ensure that this news story shows up when someone searches for “Gordon Brown”: the search engine only has to map the words “prime minister” to the name of the office holder.

If the news story in question dates from 1940, however, we wouldn’t want this document to appear when users search for “Gordon Brown,” but we would want it to appear when they search for “Winston Churchill.” To accomplish this using the same technique as the Gordon Brown example (that is, by mapping one set of words to another), our search engine must know the start and end dates of the premierships of all British prime ministers, and then cross-reference those with the publication date of the newspaper article.

This wouldn’t be completely impossible, but what if the article is a piece of fiction, or if it’s actually about the Australian prime minister? In these cases, a simple list of dates won’t help us. The indexing algorithms that try to deduce the necessary context from the text are sure to improve in the coming years, but extra markup that makes the information unambiguous can only make search more accurate.
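The date cross-referencing described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine works: the `PREMIERSHIPS` table and the function names `prime_minister_on` and `extra_search_terms` are hypothetical, and the table is deliberately incomplete.

```python
from datetime import date
from typing import List, Optional

# Hypothetical, deliberately incomplete table of premierships: (name, start, end).
# An end of None means "still in office when this table was compiled".
PREMIERSHIPS = [
    ("Neville Chamberlain", date(1937, 5, 28), date(1940, 5, 10)),
    ("Winston Churchill", date(1940, 5, 10), date(1945, 7, 26)),
    ("Gordon Brown", date(2007, 6, 27), None),
]


def prime_minister_on(published: date) -> Optional[str]:
    """Return the prime minister in office on the given date, or None if unknown."""
    for name, start, end in PREMIERSHIPS:
        # A handover day is attributed to the incoming prime minister.
        if start <= published and (end is None or published < end):
            return name
    return None


def extra_search_terms(text: str, published: date) -> List[str]:
    """Map the words 'prime minister' to a name, using the publication date."""
    if "prime minister" not in text.lower():
        return []
    name = prime_minister_on(published)
    return [name] if name else []


story = "Today the prime minister flew to Australia"
print(extra_search_terms(story, date(1940, 8, 1)))
print(extra_search_terms(story, date(2009, 7, 3)))
```

Even in this toy form the weakness is visible: the lookup blindly assumes that every mention of “prime minister” refers to the real, British office holder on the publication date, so a piece of fiction or a story about the Australian prime minister would be indexed under the wrong name.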