What happens when too few databases become too many databases?

So here’s some irony for you: For years, Andy Palmer and his frequent startup partner Michael Stonebraker have pointed out that database software is not a one-size-fits-all proposition. Companies, they said, would be better off with a specialized database for certain tasks rather than using a general-purpose database for every job under the sun.

And what happened? Lots of specialized databases popped up, such as Vertica (which Stonebraker and Palmer built for data warehouse query applications and which is now part of HP). There are read-oriented databases and write-oriented databases and relational databases and non-relational databases … blah, blah, blah.

The unintended consequence of that was the proliferation of new data silos, in addition to those already created by older databases and enterprise applications. And the existence of those new silos poses a next-generation data integration problem for people who want to create massive pools of data they can cull for those big data insights we keep hearing about.

Meet new-style ETL

In an interview, Palmer acknowledged that we’ve gone from one extreme (not enough database engines) to the other (too many options), with customers getting increasingly confused. And in that complexity, there is opportunity. Palmer’s latest startup with Stonebraker, called Tamr, and other young companies like ClearStory, Paxata and Trifacta are attacking the task of cleaning up data in a process traditionally called Extract, Transform, Load, or ETL.

Tamr combines machine learning smarts with human subject matter experts to create what Palmer calls a sort of self-teaching system. The startup, one of seven winners of our Structure Data Awards, will be on hand at the Structure Data event next month to discuss the new era of ETL along with other trends in data.

The data sharing economy

As more companies share select information with supply chain and other trusted partners, ensuring that key data is clean will become more important. According to a new Accenture survey of 2,000 IT professionals, 35 percent of those surveyed said they’re already using partner APIs to integrate data and work with those partners, while another 38 percent said they plan to do that.

Per the survey:

One example is Home Depot, which is working with manufacturers to ensure that all of the connected home products it sells are compatible with the Wink connected home system – thereby creating its own connected home ecosystem and developing potential new services and unique experiences for Wink customers.

And 74 percent of those respondents said they are using or experimenting with new technologies that integrate data with digital business partners. Also from the Accenture report:

“Rapid advances in cloud and mobility are not only eliminating the cost and technology barriers associated with such platforms, but opening up this new playing field to enterprises across industries and geographies.”

As the velocity and types of data flowing to and from applications increase, “old style careful ETL curation doesn’t work anymore but [the data] still needs to be cleansed and prepped,” said Gigaom Research Director Andrew Brust.

In other words, big data is big, no doubt. But in some cases, the old adage “Garbage In, Garbage Out” holds true even in the era of big data. If you really want the best insights out of the information you have, getting that data cleaned and spiffed up can be a very big deal.