This week in big data: Clouds, collaboration and Cassandra

O’Reilly Media’s Strata conference kicked off on Tuesday, which means a lot of big data companies announced new products this week. We already covered MapR’s new lineup of Hadoop features, Red Hat’s partnership with Hortonworks and Splice Machine’s funding, but here are some of the week’s other big data highlights.
While much of this week’s news is based at the infrastructure layer, all that infrastructure helps underpin new applications and new types of data analysis. We’ll be talking about those applications, and some of the infrastructure, in depth at our Structure Data conference next month in New York. We’ll have the CEOs of big data vendors Cloudera, Hortonworks and Pivotal, users ranging from Ford Motor Company to Turner Broadcasting, and experts in applying data to everything from human rights to artificial intelligence.

Cassandra’s looking better than ever

Although it didn’t seem so on the surface, the handful of partnerships that NoSQL startup DataStax announced on Tuesday around the Cassandra database is actually quite telling. It’s one thing when a cloud provider like Google gets on board (which it has), but something else when smaller, more conservative providers like GoGrid and large-enterprise consultants such as Accenture do. That DataStax is working with those latter companies suggests their customers are running or interested in running Cassandra (and other NoSQL data stores, in the case of GoGrid) and are asking for support.
Christophe Bisciglia, co-founder and CEO of predictive-modeling startup WibiData, confirmed this theory during an interview this week. WibiData, which was built around HBase as the foundation of its machine learning platform and created an open source project called Kiji to simplify its use, now partners with DataStax and also supports Cassandra as a back-end service. The company is gaining traction among retail customers (including some quite large ones) that want to do online recommendations and increasingly, Bisciglia said, it’s running into prospects that want to use WibiData “but had already … bought themselves some DataStax.”
So WibiData’s new strategy is no longer just about HBase, but rather “to obviate the decision about key-value stores.” It has re-engineered some of the Kiji APIs to work with Cassandra like they do with HBase, and will open source them over the next few weeks. Customer demand will dictate which other databases WibiData supports and when, but the goal is to provide a uniform process for building WibiData applications regardless of where the data is being stored.

Alpine Data Labs does data science collaboration

Predictive modeling startup Alpine Data Labs announced a new feature called Chorus that helps data science teams collaborate on projects and makes it easier for people to find existing data sources and models from which they can start their work. Most companies don’t have a team of data scientists as classically defined, but rather a group of people with one- or two-thirds of that skill set, so a platform that can improve the collaboration process is important.
If this sounds a lot like Pivotal (formerly EMC Greenplum) Chorus, it is. Alpine Chief Product Officer Steve Hillion was at Greenplum when Chorus launched in 2011. Greenplum open sourced the technology in 2012.

More Storm love

A stream-processing engine, Storm was created by a web startup called Backtype, which Twitter acquired in 2011 before open sourcing the Storm technology shortly thereafter. It’s still very popular with web companies but is gaining mainstream attention, as well — hence the attempt by Hadoop startup Hortonworks to develop an enterprise-class version, and Amazon Web Services’ embrace of both stream processing and Storm via its Kinesis service.
Now, Storm is finding supporters even in smaller vendors. Data-management vendor Pentaho announced native support for YARN and Apache Storm in its data-integration software, meaning companies can feed streaming data into their analytics environments without first running it through batch transformation processes. And SQLstream, which provides streaming SQL analytics (as its name suggests), has developed a processor for Storm that will let Storm users query data using SQL as it crosses the wire.

Zettaset gets a Hadoop patent

The USPTO granting Zettaset a patent for its high-availability Hadoop architecture probably wouldn’t be too interesting by itself (companies get patents all the time), but Zettaset’s pending trade-secret litigation with Intel adds some context. If Intel’s Hadoop distribution and management software is as similar to Zettaset’s as Zettaset claims, and if Zettaset is dead set on righting its alleged wrong, it’s conceivable a patent-infringement lawsuit could follow.

Cybereason launches to battle ‘malops’

It is a great time to be in the security space if you have a good story to tell, and Cybereason seems to have that. It was founded by a former Israeli intelligence officer and uses machine learning techniques to detect signs of malicious attacks that have already occurred. The company’s premise is that trying to prevent attacks — like the ones that hit large enterprises and even governments with increasing regularity — is essentially futile, so the best defense is to spot the infiltration and stop it before any real damage is done. Cybereason has raised $4.6 million in venture capital from Charles River Ventures.
Feature image courtesy of Shutterstock user zOw.