Why big data might be more about automation than insights

Despite all the talk about companies using big data to uncover insights, maybe automation is the real reason the world is so excited about big data. What makes the big data era so significant isn’t that people are using data to inform their decisions, but that there’s just too much data of too many different types. In many cases, keeping up isn’t so much a matter of changing mindsets as it is about getting better tools.

Last week, New York Times reporter Steve Lohr wrote about the possibility of a big data bubble forming because people rely too much on data at the expense of experience and intuition. It got me thinking about all the technologies and algorithms I’ve covered, about all the discussions I’ve had about why a data scientist is more than just a statistician who can write MapReduce jobs. Nearly everywhere, it seems to me (save for, as Lohr cites, unique uses such as algorithmic trading), big data really is less about replacing human intuition than it is about augmenting the human experience by making it easier, faster and more efficient.

Like the purpose-built robots that have revolutionized manufacturing, today’s methods for processing and analyzing data are fast, scalable and precise, but they don’t yet (in most cases) make our decisions. Big data can make life and business a lot more efficient, but for the time being, human judgment and willpower are still very much in control.

Offloading grunt work to the machines

We’ve recently covered some obvious examples of this. Take, for example, recent university research demonstrating how media researchers could use machine learning and natural-language processing to save themselves the work of manually reading and coding every piece of text they wish to analyze as part of a study. Algorithms — like robots in manufacturing — are doing the mindless, repetitive tasks of discerning subject matter, keywords and sentiment, but researchers are still the ones poring over those results and telling us what it all means.

A couple of months ago, I spoke with Recommind CEO Bob Tennant about how attorneys are using software to pore over terabytes’ worth of electronic documents during the discovery process. Predictive coding, as it’s called, frees them up to focus more on case strategy than on the tedium of analyzing every single PDF and email message to figure out if it’s relevant to a case. However, he noted, although the software typically does a better job than a person alone would do, most law firms still use a hybrid man-machine approach to leverage the strengths of both and ensure nothing gets missed. And the software certainly doesn’t assess a document’s relative legal relevance in light of a case’s facts and craft an argument around it.

A screenshot of the Analyst Overview

Even software products such as BeyondCore, which aim to minimize human involvement in the data analysis process as much as possible, are actually just about making business people more efficient. In this case, people are only integral to the first and final steps: selecting the metric with which they’re concerned and then interpreting the statistical correlations, respectively. The messy middle step of asking the right questions is (in theory) eliminated by software that analyzes all the possible correlations, then scores and presents them accordingly.
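To make that middle step concrete, here is a minimal sketch of what "analyze every correlation, then score and rank them" can look like. It is not BeyondCore's actual method, which isn't described here; it simply computes a plain Pearson correlation between a chosen metric and every other column. The table, the column names and the `rank_correlations` helper are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_correlations(table, metric):
    """Score every other column against `metric`, strongest first."""
    target = table[metric]
    scores = [
        (name, pearson(values, target))
        for name, values in table.items()
        if name != metric
    ]
    # Rank by strength of the relationship, regardless of its direction.
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical data: the person picks the metric (step one) ...
sales = {
    "revenue": [10, 12, 15, 19, 24],
    "ad_spend": [1, 2, 3, 4, 5],
    "support_tickets": [9, 7, 8, 4, 3],
    "office_temp": [21, 21, 21, 21, 21],
}

# ... the software does the exhaustive scoring ...
ranking = rank_correlations(sales, "revenue")

# ... and the person still has to interpret the list (final step).
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

The ranking only says which columns move together with the metric. Deciding whether a strong score reflects a cause, a coincidence or a data-entry quirk is the part that stays with the person reading it.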

In this sense, one of the better descriptions I’ve heard about actually using data in the corporate world came from ClickFox CEO Marco Pacelli, who compared it to figuring out which few of dozens of cockroaches to kill when the light comes on. Big data, like the flick of the light switch, can show people what’s really going on under the surface. But a smart executive still must figure out how to best solve the problem, capitalize on the opportunity or just put the situation into perspective.

Algorithms can only be so human

Of course, those examples are easy and largely ignore the world of really big data that exists on the web and presents its own challenges. Lohr, for example, citing Eli Pariser’s “The Filter Bubble: What the Internet Is Hiding From You,” noted a particular fear “that the algorithms that are shaping my digital world are too simple-minded, rather than too smart.” That’s an astute observation in a world of hyper-personalization, where one could easily find himself snowblind by the content, products, etc., he’s supposedly interested in, making it all the more difficult to gain visibility into the broader world.

But perhaps we’re just expecting the web to be smarter than it is and, really, smarter than any service built on the idea of scale probably should be. For example, web and mobile apps, ranging from Amazon Web Services to Instagram, are only able to automate processes for potentially billions of users because they offer fairly generic services. Broadly applicable features and non-negotiable terms of service (however problematic) mean companies can focus on building great products rather than wasting time negotiating features and terms with every user.

You want data security or site reliability? Figure it out yourself or wait for your service provider to do it on its own time.

A sample interest graph from Gravity.

Why should personalization algorithms be any different? They can do a heck of a job automating the discovery of stuff we’re interested in, but creating a model intelligent enough to know when any given individual wants to, or needs to, view content outside their typical interests could prove incredibly challenging for services that deliver personalization in part by identifying broad patterns in user behavior. It’s just not what they’re designed to do.

The web is an expansive place: If we as web users really don’t want to be slaves to algorithms and our usernames, maybe it’s up to us to log out, clear our caches and go do some anonymous digging.

Melding man and machine

That being said, the people tasked with creating the algorithms that power so many web services do seem to understand the need for human input in the model-building process, at least. Even machine learning, a term that conjures up images of artificial intelligence and self-aware computer networks, is often just a tool to make data scientists’ lives easier through automation.

Smart data scientists know they can’t trust the machines alone, which is why companies doing everything from predicting the content you’ll like to predicting your credit risk have figured out how to make machines work for humans instead of replacing them. Yes, machine learning algorithms and big data technologies analyze volumes of data points that humans never could, uncovering complex relationships the naked eye could never spot. But once the heavy lifting is done, humans come in and use their subject-matter expertise and logic to prune off bad connections, add context and maybe even inject a little serendipity into the final algorithms.

Whether it’s corporate business intelligence or the consumer web, though, all of this is about automation. Data-minded people have always used data to aid in decision-making without ignoring their instincts. Big data just lets them learn a lot more, a lot faster.

We’ll be talking a lot more about these issues at Structure: Data, March 20-21 in New York, so feel free to mark your calendars.

Feature image courtesy of Shutterstock user Nataliya Hora.