Context-Sensitive Search Via Blogs

By now most of us are familiar with Google’s PageRank algorithm, or at least the principle behind it, whereby a web page is ranked based on who else is linking to it. One key aspect of blogs is that, while a few cover just about everything under the sun, most blogs have specific areas of focus, be it art, news, politics or what have you. Such information is potentially valuable in the context of search because a blog can announce its areas of focus — keywords, in effect — that can be taken into account by search engines, which would then know what topics a specific site tends to cover.

Using existing meta tags within HTML, it would be pretty easy to create a de facto standard in which tags are used to place a blog, as well as individual posts within it, into categories or sets. For example, I used to publish a site, Telephony Design, that was specifically about telecom products and services, which would have been tagged with keywords like telecom, telecommunications, telephone systems, phone, etc.
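To make this concrete, here's a minimal sketch of how a crawler might read a site's self-described topics out of the existing "keywords" meta tag. The class name and sample page are illustrative assumptions, not part of any real crawler:

```python
from html.parser import HTMLParser

# Hypothetical sketch: pull a site's self-described topics from the
# standard "keywords" meta tag, as the article proposes.
class TopicTagParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.topics = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "keywords":
            # Split the comma-separated keyword list into normalized topics.
            self.topics = [t.strip().lower()
                           for t in attrs.get("content", "").split(",")
                           if t.strip()]

page = """<html><head>
<meta name="keywords" content="telecom, telecommunications, telephone systems, phone">
</head><body>...</body></html>"""

parser = TopicTagParser()
parser.feed(page)
print(parser.topics)
# ['telecom', 'telecommunications', 'telephone systems', 'phone']
```

Note that this reuses a tag every browser already ignores gracefully, which is what makes it a plausible de facto standard rather than a new spec.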

Imagine if most blog- and site-hosting services asked you to self-describe your site with up to a few dozen keywords. Of course, you can already do this with tags, but it’s unclear to what extent search engines use this information.

In this scenario, when searching via a search engine that recognizes your tags, you could issue a query like “gore (climate)” to get search results that are optimized based on link weights from sites and blogs that describe themselves as climate-related. This isn’t the same as saying “gore AND climate,” because someone who blogs at a climate site might write something about Al Gore that’s not, strictly speaking, climate-related. Essentially this is a way of searching for a topic, as ranked by people (primarily bloggers) who write on several topics or areas of focus.
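The query syntax itself is trivial to parse. Here's a rough sketch, assuming (as the article does) that a parenthesized term is a topic filter on the source sites rather than an additional search keyword; the function name and exact syntax rules are my own assumptions:

```python
import re

# Hypothetical parser for queries like "gore (climate)": bare words are
# search terms, parenthesized words are topic filters on the sites doing
# the linking, not extra keywords to match in page content.
def parse_query(query):
    topics = re.findall(r"\(([^)]+)\)", query)         # topic filters
    terms = re.sub(r"\([^)]*\)", "", query).split()    # remaining search terms
    return terms, [t.strip().lower() for t in topics]

print(parse_query("gore (climate)"))
# (['gore'], ['climate'])
```

The point is that "gore (climate)" and "gore AND climate" produce different result sets: the first constrains who is doing the linking, the second constrains what the page says.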

This isn’t a new idea, of course, since “meta keyword=” has been with us since the earliest days of the web. The trick is to create a subtle variation on search query syntax in which you’re asking, in effect, to “find X within sites that are usually about Y.” It’s a kind of poor man’s approach to the semantic web, but if enough sites and blogs used it, and popular search engines introduced a simple way to filter or weight search results based on it, the method should work.

An important point is that you’re not using the “meta” tag to emphasize a keyword, so your site isn’t more likely to show up if I do a search on “climate.” Instead, what the tag says is that you usually blog about “climate,” among other things. The actual keyword search is still based on content elsewhere in the page; the meta tag just describes the limited set of topics the content usually covers. Another important point is that if spiders only recognize a limited number of these tags, maybe 20 or so per domain, it will be difficult to spam search engines by stuffing hundreds of tags in a page header.
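Putting the two points together, the weighting might look something like this toy sketch. Everything here is a made-up illustration: the data structures, the domain names, and the scoring function are assumptions, and a real engine would fold this signal into its existing ranking rather than use it alone:

```python
# Toy sketch of topic-weighted ranking. site_topics holds each linking
# domain's self-described topics, capped at 20 per domain (as the article
# suggests) to blunt tag-stuffing; inbound_links maps a result page to
# the domains linking to it. All data here is illustrative.
MAX_TOPICS = 20

site_topics = {
    "climateblog.example": ["climate", "energy", "policy"],
    "gossip.example": ["celebrities", "tv"],
}

inbound_links = {
    "pageA": ["climateblog.example", "gossip.example"],
    "pageB": ["gossip.example"],
}

def topic_score(page, topic):
    # Count only inbound links from sites that say they usually
    # cover the requested topic; other links contribute nothing.
    return sum(
        1
        for domain in inbound_links.get(page, [])
        if topic in site_topics.get(domain, [])[:MAX_TOPICS]
    )

print(topic_score("pageA", "climate"))  # 1
print(topic_score("pageB", "climate"))  # 0
```

The cap matters: a site declaring two hundred topics gains nothing past the first twenty, so the self-description only pays off if it’s honest and focused.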

Is this a Google killer? Hardly. It seems like the kind of thing that could be added to existing search engines, Google included, pretty easily. This might seem like a trivial thing, but it should make search a lot smarter without burdening webmasters with the need to comply with an overly complex semantic web approach. This is also a simple and easily learned query style, so just as users have learned to combine keywords to improve search accuracy, they can use this approach to narrow search results by the type of source, in what amounts to a kind of fuzzy boolean search.

When it came to web services, REST won out over SOAP because of its simplicity. I think the same thing could happen here. After all, this is something even a novice webmaster could do in a minute — all that’s needed are a few lines of HTML.