LexisNexis open sources code for Hadoop alternative

HPCC Systems, the division of LexisNexis Risk Solutions dedicated to big data, has released the open source code of its data-processing-and-delivery software, which it’s positioning as a better version of Hadoop. The High Performance Computing Cluster code is available on GitHub, and its release marks the start of HPCC Systems’ quest to build a community of developers underneath Hadoop’s expansive shadow.

“We’re now really open source,” LexisNexis CTO Armando Escalante told me, responding to early criticism that the company was dragging its feet on releasing the code. He said he’s excited, but nervous because the code is now exposed to reviews and comments after years of operating privately within LexisNexis Risk Solutions.

The HPCC architecture includes the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster. As I explained when covering the HPCC Systems launch in June, “Thor — so named for its hammer-like approach to solving the problem — crunches, analyzes and indexes huge amounts of data a la Hadoop. Roxie, on the other hand, is more like a traditional relational database or database warehouse that even can serve transactions to a web front end.” Both tools use the company’s Enterprise Control Language, which Escalante describes as easier, faster and more efficient than Hadoop MapReduce.

Aside from the open source Community version, HPCC Systems also offers a paid Enterprise version of the HPCC product. The core code is the same, Escalante explained; the major differences are additional enterprise-grade capabilities such as management tools, support and services.

It will be a tall order to displace Hadoop — which has growing vendor, project and developer ecosystems — but Escalante is confident HPCC can do it. According to Escalante, Hadoop needs a large community because it’s a growing project, whereas HPCC is already mature because it has been serving large customers for a decade. It’s like trying to evolve a microbe into a human being instead of just starting with a human being off the bat. The challenge, he thinks, will be spreading that message to web startups already sold on and experienced with Hadoop.

However, Escalante doesn’t think most enterprises are locked into Hadoop at this point, if they’ve even used it at all. And with its track record and Enterprise Edition features, HPCC is arguably more geared toward enterprises anyhow. For companies spending big money on traditional hardware systems, Escalante says HPCC has to look even better.

“We haven’t killed Hadoop [yet] … but we have killed mainframes,” he explained. By mainframes, he means all the remnant legacy data center gear, such as large, expensive storage systems, data warehouses and OLAP systems. Because of Roxie’s capabilities running on commodity hardware, Escalante said LexisNexis was able to get rid of millions of dollars’ worth of legacy gear. As large enterprises’ data volumes keep growing, he said, they’ll have to pay through the nose to buy traditional systems big enough to handle the load.

With Hadoop, companies must maintain separate data warehouse environments, although startups Hadapt and, to some degree, Platfora aim to change that.

HPCC Systems, as well as Microsoft with its Dryad project, has an outside chance to steal some of Hadoop’s thunder with developers, but as Escalante acknowledged, HPCC’s best chance is probably with large customers that will be moved by its enterprise-readiness. HPCC Systems is touting Sandia National Laboratories and the Georgia Tech Research Institute as two big-data-savvy users already sold on HPCC, and Escalante promises some big-name customer wins in the next few months.

Feature image courtesy of Flickr user opensourceway.