Creating order from digital chaos

In the modern era, information overload has become an even larger problem than information scarcity. Data is generated by the ton, and most of it is not remotely relevant or useful. The way we search has even created a gray market for a thin veneer of content, often referred to as "faux content." Estimates vary, but all suggest that a large share of today's web consists of sites manufactured specifically to scoop up search visitors and turn them into ad dollars. The effect is more than a nuisance: it makes sifting through the ever-growing tons of online data even more confounding.

For instance, a search for something as simple as "How to build a brick wall" might bring up many pages with that exact phrase. Great. Fantastic. I look at the first link and see that its headlines, topic paragraphs and images are all about building a brick wall (or so it seems). I'm in luck! One click later I start reading, and my heart sinks. The content is as thin as the piece of paper I would print it on (if it were worth the time).

What's worse, it's often written by someone from a low-cost offshore labor pool, in far-flung locales like Kenya or Haiti, hired by a content farm like Demand Media to write a page built around that specific search phrase. Because search engines reveal what people are searching for, enterprising companies can reverse-engineer the search process and manufacture "content" to fulfill common (and uncommon) searches.

The back-link game, in which websites purchase inbound links, targets Google's original secret sauce: ranking results by the "authority" of a web page. Playing that game well has become vital to ranking high in search results, and the multibillion-dollar search-engine optimization industry is built on reverse-engineering the actual search algorithms for commercial gain. Only if you actually set out to build that brick wall would you realize that the directions offered by the guys on, say, This Old House are the useful and practical ones. The others are just trying to get you to click through and see the ads they run, hoping those ad clicks will generate enough revenue to offset the pennies they paid to create the content. This three-card monte lets the site monetize your attention with the ads it serves up on that page.

Even if the vast majority of visitors leave for a useful site, the few who click on a relevant ad (made more relevant by Google's AdSense and similar services) make the investment in faux content worthwhile. Ironically, on most of these pages the ads are actually the most relevant content. The very tool Google created to help content publishers sell ads on their pages has thus given spammers the economic incentive to clog Google's own search results! Rich Skrenta, CEO of the spam-free search engine Blekko, frames this devolution in an interesting way: "Today, the Web has become a tragedy of the commons, a social system ruled by spam — over 90 percent of URLs today are pure junk!"

Fortunately, a growing band of innovators has taken up the challenge and is tackling these issues with startlingly similar approaches. Their shared mission is to apply relevant, expert-based pattern recognition to produce a useful outcome for the consumer. Much as a service like Pandora has changed the way we discover music, the emerging forms of predictive modeling and expert recommendation architecture in the next wave of discovery engines will have enormous implications, offering a systematic approach to extracting the useful knowledge and wisdom buried within the cluttered world of big data.

For these passionate discovery engineers, the goal is not to find a needle in a haystack, but to present a haystack of needles: an array of potentially valuable answers to a growing list of useful and impactful questions.

Note: Jim Hornthal is a Venture Partner with CMEA, and CMEA is an investor in Blekko. This partial excerpt is from his TED ebook, "Haystack Full of Needles: Cutting Through the Clutter of the Online World to find a Place, Partner, or President."

Photo courtesy of Flickr user Khouri