Commercializing Big Data

With the web and cloud computing generating new data sources and consumption patterns, a fresh crop of software solutions and companies has emerged to tackle big data. And while large web properties have rapidly adopted big data tools, the business models around building, delivering and monetizing big data solutions are still taking shape. To better understand the trends, let’s take a look at some of the popular solutions.

Many of these solutions take a NoSQL approach, which relaxes some of the restrictions of earlier database models to achieve a significantly higher degree of scalability. While these solutions may not be ideal for bank transactions, they’re often well suited to handling the large volumes of data that web applications generate.

There are currently three broad models for commercializing big data. The first is to sell professional services for open-source software: companies rely on existing distribution mechanisms, such as the Apache Software Foundation, and provide commercial support on a per-node or per-site basis. The second is to sell software licenses for a product, or to sell support for a custom distribution. The third is to sell software as a service, accessible in the cloud.

In no particular order, here are several companies that have formed a commercial entity around a software product. Most have also successfully raised venture financing.

  •, which under parent company Relaxed focuses on Apache CouchDB. It raised $2 million from Redpoint Ventures in December 2009.
  • 10gen, which offers commercial support, training and services for the NoSQL document database MongoDB and has raised $3.4 million in a Series B round from Union Square Ventures and Flybridge Capital.
  • Basho Technologies, provider of Riak, a distributed data store available in both open-source and paid commercial versions. The company recently filed for a $2 million debt and options offering after raising $2 million from Harbor Island Equity Partners and the Wilmington Investor Network.
  • Cloudera, an early entrant offering professional services for Hadoop, now positions itself as an enterprise platform for the popular big data engine. The company has secured $11 million in two financing rounds.
  • Neo Technology, developer of the open-source graph database Neo4j, raised $2.5 million from Sunstone Capital and Conor Venture Partners. Graph databases are particularly useful for modeling network-connected information, from social networks to cellular tower networks.
  • Loggly, which provides log management as a service, recently raised $4.2 million, following a small seed investment late last year.
  • Hypertable, provider of commercial support for the C++ implementation of a scalable key-value store similar to Google’s Bigtable. The software is in use by several large properties overseas, including Baidu and, India’s largest web property.
  • CitrusLeaf, whose elastic, fast-transaction, distributed database targets web, mobile and social networking applications.

Several others are finalists in the LaunchPad competition at GigaOM’s upcoming Structure conference:

  • Riptano, which recently emerged to provide commercial support and services for Cassandra, a popular distributed database that originated at Facebook and is now used by Twitter, Digg, SimpleGeo and others.
  • Cloudant, a Y Combinator company that offers CouchDB in the cloud. CouchDB is a scalable document-oriented database written in Erlang, a programming language developed at Ericsson that is now enjoying renewed popularity.
  • Datameer, which brings the intuitiveness of a spreadsheet to solving big data problems with Hadoop. The company recently secured $2.5 million in seed financing from Redpoint Ventures.
  • NorthScale, provider of a distribution of memcached, the popular open-source caching framework. It also has its own proprietary Membase server in the works, which will offer tunable persistence, pluggable storage engines and configurable replication — all important for handling big data. NorthScale has raised $15 million in two rounds from Mayfield Fund, Accel Partners and North Bridge Venture Partners.

Of course, some popular big data projects are not listed here, such as HBase, which has yet to attract a commercial entity providing support. There are also projects with partial corporate sponsorship, like Project Voldemort at LinkedIn or Redis, now sponsored by VMware. There are undoubtedly others as well. If you know of a new company that has formed to provide support or software for big data problems, please leave it in the comments.

Few doubt the impact big data is now having on the design and implementation of web and cloud applications. But the opportunity to monetize solutions to those problems is still open, and the leaders are still emerging. Be sure to check out the two big data panels at Structure 2010, “Scaling the Database in the Cloud” and “Dealing with the Data Tsunami,” to get the latest scoop.

Gary Orenstein is host of The Cloud Computing Show.