The semantic web is the vision of a web of interconnected data and meaning. This global web of knowledge would be something computers could understand and therefore provide us with a new frontier of information retrieval and intelligent agents.
After two decades of failed attempts, the semantic web has become a dirty word with investors and consumers. So what exactly went wrong? Why are we still so far away from the web of data? Here’s my take on it.
The web of Obsoledge
Most attempts at creating a knowledge repository have involved converting “expert knowledge” into a web of data. The result is an inherently boring web of data. Google’s Knowledge Graph promotional video is a great example of how boring this web can be. “Let’s say you’re searching for Renaissance Painters…” Really? Who searches for that?
More accessible technology is causing an explosion of information. This has the effect of making the shelf-life of knowledge shorter and shorter. In his seminal book Revolutionary Wealth, Alvin Toffler coined the term Obsoledge to refer to this growing body of obsolete knowledge.
If we want to create a web of data we need to expand our definition of knowledge to go beyond obsolete knowledge and geeky factoids. I really don’t care what Leonardo da Vinci’s height was or which Nobel Prize winners were born before 1945. I care about how other people feel about last night’s Breaking Bad series finale. How did they find the ending? What other series or movies might I enjoy based on those experiences?
We are living in the Now. The Now is eating ever greater quantities of our attention. It’s drowning out the obsolete past. Human attention, sentiment and emotion are key elements to today’s information age. They cannot be ignored. They need to be at the very core of any web of data.
Documents are dead
Deriving structured information from Wikipedia documents – a common practice – is fundamentally flawed. Not only does this create a web of boring facts, it somehow assumes that documents are the source of knowledge. They’re not. They are only a small sliver of the stuff that matters. What really matters is the underlying conversation and activity.
There is a sea change happening in the web and how we use it. It’s an evolution to a second phase of the web – the real-time web, or what I call the “Stream.” In the Stream, the focus is on messages not web pages. These vast amounts of messages are generated by social interaction, by conversation, by attention, by ideas, by little chunks of thought unleashed into a gigantic stream of data.
This also changes the way machines communicate with each other. Machines are still programmed by humans, and humans – especially programmers – are going to be lazy. They will use the easiest, most pragmatic way to get machines to communicate. They aren’t going to spend days learning complicated RDF or OWL specs. They will use simple JSON-based communication. And all the cool kids have abandoned XML.
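To show how little ceremony that takes, here is a minimal sketch in Python of one machine handing a stream message to another as JSON. The message and its field names are hypothetical, made up purely for illustration. They are not taken from any real API.

```python
import json

# Hypothetical stream message; the field names are assumptions for illustration.
message = {
    "author": "alice",
    "text": "That finale was incredible",
    "topic": "Breaking Bad",
    "sentiment": "positive",
}

# Sender: serialize the message to a JSON string.
wire = json.dumps(message)

# Receiver: parse the string back into a plain dictionary.
received = json.loads(wire)
print(received["topic"])
```

No schema, no ontology, no spec to study first. Both sides only have to agree on a handful of field names.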
Information should be pushed, not pulled
One less obvious problem is one of information retrieval. For the past two decades we’ve gotten so used to keyword search that Google became an actual verb. Unfortunately, keyword search is now fundamentally broken. The more information is out there, the worse keyword search performs.
Advanced query systems like Facebook’s Graph Search or Wolfram Alpha are only marginally better than keyword search. Even conversation engines like Siri have a fundamental problem. No one knows what questions to ask.
We need a web in which information (both questions and answers) finds you based on how your attention, emotions and thinking interconnect with the rest of the world.
Meet the synaptic web
Keyword search is broken and we’re drowning in an unstoppable stream of information. The need for a next generation of information retrieval is now greater than ever. Is the semantic web going to be that next paradigm? I don’t think so. Not unless we radically revisit what a ‘web of data’ means.
It’s time to ditch the old paradigms of documents, knowledge and keyword search. We live in a world of big data, real-time streams and human emotions. It’s time for a revolution in information retrieval. We need a web that’s dynamic and centered around humans. A web in which data flows in a smarter way. A web that understands you and makes the proper data find you. This web doesn’t look like a database or a graph. It’s a web that’s intelligent, dynamic and sometimes chaotic. It’s the digital equivalent of the human brain. I call it the Synaptic Web.
Dominiek ter Heide is the CTO and co-founder of Bottlenose, which combines big data technologies with specialized data mining to make sense out of streams.