In part one, we looked at the forces driving a proliferation of new database solutions, loosely ordered within an emerging Hadoop ecosystem. Examples of these specialized analytical engines include:
1) Databases optimized for cloud scaling. In the Gigaom Research report "What to know when choosing database as a service," analyst George Gilbert looks at the database solutions, such as VoltDB, Clustrix, and NuoDB, that bring a SQL interface to scalable database clusters.
2) Databases optimized for archiving. As Gilbert describes in the Gigaom Research report "How to manage big data without breaking the bank," databases such as RainStor can compress archive data by as much as 40x, making the long-term storage of data records markedly cheaper and more accessible.
3) Open source NoSQL databases, such as MongoDB and CouchDB. These databases, which Gilbert expects to migrate more fully into the Hadoop ecosystem, are optimized for the frequent updating that mobile and web applications require.
4) Graph databases, such as Neo Technology's Neo4j, which specialize in tracking and optimizing the multipoint networks found in shipping, transportation, telecommunications, computer networking, and similar environments.
5) The GNU-project statistical language and environment R. This preexisting language for statistical analysis will serve statistics-oriented database workloads within Hadoop.
6) Splunk, whose machine-log collection and analysis platform currently provides two-way integration with Hadoop and other data environments.
7) Microsoft’s massively parallel data warehouse and Hadapt’s implementation of SQL on Hadoop. These products provide alternative routes to Hadoop database access, combining a SQL interface with substantial cost and performance advantages over the traditional data warehouse.
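On the archiving point, generic compression illustrates why repetitive archival records shrink so dramatically, although RainStor's own techniques go well beyond a sketch like this. The following Python snippet, using only the standard-library zlib and entirely made-up log-style records, shows the effect:

```python
import zlib

# Synthetic, highly repetitive archive records (hypothetical data),
# the kind of structured rows an archival store typically holds.
records = "\n".join(
    f"2013-06-{day:02d},server-{day % 5:02d},status=OK,latency_ms=12"
    for day in range(1, 31)
).encode("utf-8")

# Compress at the highest standard zlib level.
compressed = zlib.compress(records, level=9)

ratio = len(records) / len(compressed)
print(f"raw: {len(records)} bytes, "
      f"compressed: {len(compressed)} bytes, "
      f"ratio: {ratio:.1f}x")
```

Because the rows share most of their bytes, even a general-purpose compressor achieves a large ratio; purpose-built archival engines add techniques such as deduplication across records to push ratios far higher.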
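The route-finding workload that graph databases optimize can be sketched in plain Python. A real Neo4j deployment would express this as a declarative graph query rather than a hand-written traversal, and the shipping network below is a hypothetical example; the sketch simply shows the fewest-hop search that such engines make fast at scale:

```python
from collections import deque

# Hypothetical shipping network: hub -> directly connected hubs.
network = {
    "Rotterdam": ["Hamburg", "Antwerp"],
    "Hamburg": ["Rotterdam", "Gdansk"],
    "Antwerp": ["Rotterdam", "Le Havre"],
    "Le Havre": ["Antwerp", "Lisbon"],
    "Gdansk": ["Hamburg"],
    "Lisbon": ["Le Havre"],
}

def shortest_route(start, goal):
    """Breadth-first search for the fewest-hop route between two hubs."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in network[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route exists

print(shortest_route("Rotterdam", "Lisbon"))
# -> ['Rotterdam', 'Antwerp', 'Le Havre', 'Lisbon']
```

A relational database would need repeated self-joins to follow each hop; a graph engine stores the connections directly, which is why multipoint-network problems are its natural fit.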
Not all of these products are yet operational or fully functional within Hadoop, but Gigaom Research analyst George Gilbert expects them to become options within a larger Hadoop ecosystem as the IT industry works through a period of growing database choice and complexity under an increasingly unified Hadoop umbrella.
In part three, we will look at how this market of largely startup and open source alternatives will mature and be made practical for the average enterprise organization.