What happens when too few databases become too many databases?

So here’s some irony for you: For years, Andy Palmer and his longtime startup partner Michael Stonebraker have pointed out that database software is not a one-size-fits-all proposition. Companies, they said, would be better off with a specialized database for certain tasks rather than using a general-purpose database for every job under the sun.

And what happened? Lots of specialized databases popped up, such as Vertica (which Stonebraker and Palmer built for data warehouse query applications and is now part of HP). There are read-oriented databases and write-oriented databases and relational databases and non-relational databases … blah, blah, blah.

The unintended consequence of that was the proliferation of new data silos, in addition to those already created by older databases and enterprise applications. And the existence of those new silos poses a next-generation data integration problem for people who want to create massive pools of data they can cull for those big data insights we keep hearing about.

Meet new-style ETL

In an interview, Palmer acknowledged that we’ve gone from one extreme — not enough database engines — to too many options, with customers getting increasingly confused. And in that complexity, there is opportunity. Palmer’s latest startup with Stonebraker — called Tamr — as well as other young companies like ClearStory, Paxata, and Trifacta are attacking the task of cleaning up data in a process traditionally called Extract, Transform, Load, or ETL.

Tamr combines machine learning smarts with human subject matter experts to create what Palmer calls a sort of self-teaching system. The startup, one of seven winners of our Structure Data Awards, will be on hand at the Structure Data event next month to discuss the new era of ETL along with other trends in data.

The data sharing economy

As more companies share select information with supply chain and other trusted partners, ensuring that key data is clean will become more important. According to a new Accenture survey of 2,000 IT professionals, 35 percent of those surveyed said they’re already using partner APIs to integrate data and work with those partners, while another 38 percent said they plan to do so.

Per the survey:

One example is Home Depot, which is working with manufacturers to ensure that all of the connected home products it sells are compatible with the Wink connected home system – thereby creating its own connected home ecosystem and developing potential new services and unique experiences for Wink customers.

And 74 percent of those respondents said they are using or experimenting with new technologies that integrate data with digital business partners. Also from the Accenture report:

“Rapid advances in cloud and mobility are not only eliminating the cost and technology barriers associated with such platforms, but opening up this new playing field to enterprises across industries and geographies.”

As the velocity and variety of data flowing to and from applications increase, “old style careful ETL curation doesn’t work anymore but [the data] still needs to be cleansed and prepped,” said Gigaom Research Director Andrew Brust.

In other words, big data is big, no doubt. But the old adage “Garbage In, Garbage Out” holds true even in the era of big data. If you really want the best insights out of the information you have, getting that data cleaned and spiffed up can be a very big deal.
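That “cleansed and prepped” step can be sketched in a few lines. The record fields, normalization rules, and the `clean_records` helper below are illustrative assumptions for this sketch, not any vendor’s actual pipeline:

```python
# Minimal sketch of the "cleanse and prep" (transform) step in an ETL flow.
# Field names and cleanup rules here are hypothetical, chosen for illustration.

def normalize(record):
    """Trim whitespace, lowercase emails, and standardize casing."""
    return {
        "name": record.get("name", "").strip(),
        "email": record.get("email", "").strip().lower(),
        "city": record.get("city", "").strip().title(),
    }

def clean_records(raw_records):
    """Normalize records, then drop duplicates keyed on email."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        norm = normalize(rec)
        if norm["email"] and norm["email"] not in seen:
            seen.add(norm["email"])
            cleaned.append(norm)
    return cleaned

raw = [
    {"name": " Ada Lovelace ", "email": "ADA@example.com", "city": "london"},
    {"name": "Ada Lovelace", "email": "ada@example.com ", "city": "London"},
]
print(clean_records(raw))  # the two messy rows collapse into one clean record
```

Real entity resolution at the scale Tamr targets goes far beyond exact-match rules like these, which is where the machine learning plus human-expert loop comes in.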


Boston is a database hub. Here are 5 startups to watch

The metro Boston area has good database DNA dating back to Digital’s Rdb. Those good genes are resurfacing in a fresh crop of database startups clustered in the area. Here are five hot database startups to watch in the Boston-Cambridge-Waltham nexus.

It’s not the big data, it’s the right data

Big data’s fine; the right data’s a game changer. Serial database entrepreneur Andy Palmer — who co-founded Vertica Systems and VoltDB — sees this massive amount of diverse big data as table stakes. The real, compelling value lies in “big analytics,” he says.

Why start up in Boston?

It may not be Silicon Valley, but the Boston-Cambridge metro area has a lot going for it — infrastructure expertise, a deep talent pool, and VC funding. Facebook famously went elsewhere, but here’s why other local companies started here (and will stay put).

Do BYO data centers make sense anymore?

In this era of cheap-and-reliable rent-a-data-centers run by Amazon, Rackspace, and others, does it make sense for a company to build a new data center on its own? Unsurprisingly, Amazon’s own James Hamilton doesn’t think so. More surprisingly, other IT pros agree.

Twitter to open source Hadoop-like tool

Attention webscale aficionados, Twitter plans to open source its Hadoop-like real-time data processing tool known as Storm. The social service nabbed the code through its acquisition last month of BackType, and says it’s a better tool for processing streams of data.