Why the big data startup boom will likely be short-lived

There has been a remarkable flowering of companies over the past year or two, all riding a wave of developer and investor enthusiasm for the loosely defined concept of “big data.” Ovum’s Tony Baer is one of several industry analysts offering impressive figures for adoption of the tools that big data companies offer. But having talked to ReadWriteWeb this week, IDC analyst Dan Vesset suggests that these new companies’ days are numbered.

Cloudera, Hortonworks, MapR. These and other companies have emerged in recent years to help customers manage a growing volume of structured and unstructured data. Wrapping support and professional services around open-source components such as Hadoop, these companies have done much of the work to bring big data to a paying audience.

But GigaOM’s Derrick Harris was moved to comment back in June that “evidence suggests a presently paltry revenue base for the software Hortonworks, Cloudera and EMC peddle.” Indeed, while money continues to be invested in these firms (Cloudera more than doubled the amount of money it has attracted, with a $40 million Series D round last month), Derrick’s core point stands: The big data startup market is probably overvalued and headed for a lot of consolidation.

Why? Although undeniably powerful, their tools are raw, somewhat unpolished and typically focus on specific use cases and types of data. As Scott Fulton puts it in the ReadWriteWeb piece, they are screwdrivers rather than platforms. And screwdrivers are fine in certain situations, but few tradesmen will stay in business for long tackling every job with one. In a similar vein, few enterprises will find many of their data processing requirements met by the current generation of raw big data tools.

Companies like Cloudera and Hortonworks certainly recognize that their tools need some polish, which is why they are investing in support, building simplified installers and nurturing a community of loyal developers. This work serves to take their products from what Everett Rogers might describe as innovators to early adopters. But polish alone is not enough to see a technology cross the chasm for adoption by nonenthusiasts. For that, it must be genuinely useful (most big data tools are) and either indispensable or applicable to a range of problems (few big data tools are).

Hence the dilemma. Big data is capturing attention and generating buzz. Investors recognize that there is money to be made here. Most companies could identify one or more “big data problems,” but they would typically find themselves having to deploy more than one tool to meet their needs. We have already seen some convergence among the startups in the space, but it remains difficult for these small companies to compete with the likes of Oracle, HP and IBM. Big enterprise players can offer complete solutions to meet a customer’s needs. They may not always be the best products in the space, and they are rarely the cheapest. But they typically sell well and are well-supported and designed to meet a broad range of needs. Why buy half a dozen products when one will do almost as well?

For many of these young startups, their best hope may well prove to be acquisition, some of which is already happening. HP swallowed Vertica in February, EMC grabbed Greenplum in 2010 and IBM gobbled up Netezza the same year. More recently, Oracle trumpeted its newfound enthusiasm for big data. In these and other examples, startups with an interesting idea disappeared inside far larger companies. Perhaps they had to, in order to be taken seriously by mainstream customers beyond the bleeding edge of adoption.

Over the next few years, we will see several more big data startups acquired. We will probably also see several disappear or pivot, as they fail to generate enough revenue (or investment) to stay afloat. Whether one or a handful will survive to stand against the might of the enterprise solution vendors remains to be seen, but the prospects cannot be good.

Question of the week

Will any of the current big data companies survive as independents?