DataTorrent’s Hadoop stream-processing engine is now for sale

DataTorrent, a startup building a stream-processing engine for Hadoop that it claims can analyze more than 1 billion data events per second, announced on Tuesday that its flagship product is now generally available. Stream processing is becoming more important as we move into an era of connected devices, ubiquitous sensors and fast-paced web platforms such as Twitter. Data is flowing into systems faster than ever, and many companies would like to get some use out of it in real time; in some cases, even hours-old data could be considered stale. Other products and projects addressing stream processing on Hadoop include Apache Storm, Spark Streaming, Samza and Amazon Kinesis.
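
For a sense of what the streaming model looks like in practice, here is a minimal word-count sketch against the Spark Streaming Java API of that era — one of the alternative frameworks named above, not DataTorrent’s own API. The socket source, host and port are placeholders, and the exact flatMap signature changed in later Spark releases.

```java
import java.util.Arrays;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        // Process incoming data continuously in one-second micro-batches,
        // rather than waiting for a periodic batch job to run.
        SparkConf conf = new SparkConf()
                .setAppName("StreamingWordCount")
                .setMaster("local[2]");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1000));

        // Placeholder source: lines of text read from a plain TCP socket.
        JavaDStream<String> lines = ssc.socketTextStream("localhost", 9999);

        // Split each line into words and keep a count per micro-batch.
        JavaPairDStream<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")))
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

        counts.print();          // results appear every second as data arrives
        ssc.start();
        ssc.awaitTermination();
    }
}
```

The point of the sketch is the shift in design: instead of a scheduled job over data at rest, the same word-count logic runs continuously over small windows of data as it arrives.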

Hortonworks co-founder Baldeschwieler now advising DataTorrent

Eric Baldeschwieler, the founding CEO of Hortonworks and the former Yahoo VP who led that company’s Hadoop development efforts, is now a strategic adviser to Hadoop startup DataTorrent. The company, which won the Structure Data Readers’ Choice award for infrastructure startups, sells stream-processing software designed to run in Hadoop environments (on top of YARN). Baldeschwieler also advises the white-hot Apache Spark startup Databricks. He left Hortonworks, where he was most recently CTO, in August 2013.

Netflix open sources its data traffic cop, Suro

Netflix has open sourced a tool called Suro that collects event data from disparate application servers before sending it on to other data platforms such as Hadoop and Elasticsearch. It’s another piece of big data innovation that will hopefully find its way into the mainstream.

On the path to personalization

http://open.blogs.nytimes.com/2013/11/15/on-the-path-to-personalization/

This post from the New York Times’ Open blog explains the architecture and algorithms underpinning the paper’s content-personalization engine. Its experience speaks to some larger trends: companies moving from batch to stream processing, and toward cloud services overall. The Times’ recommendation engine used to rely on MapReduce jobs that ran every 15 minutes, but now relies on a homegrown real-time system. It used to run on Cassandra, but now runs on Amazon’s DynamoDB service.

Hortonworks has big plans to make Storm work for the enterprise

Hortonworks is working to integrate the Storm stream-processing engine with its Hadoop distro, and hopes to have it ready for enterprise apps within a year’s time. It’s the latest piece of non-batch functionality to come to Hadoop thanks to YARN, which lets Hadoop run all sorts of processing frameworks beyond MapReduce.
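
For readers who haven’t seen Storm, the sketch below shows the shape of a topology using Storm’s core Java API from that period (the backtype.storm packages, later renamed org.apache.storm): a spout emits sentences and a bolt splits them into words. The sentence data is made up for illustration, and the LocalCluster shown here runs everything in-process for testing — deploying the same topology onto a YARN-managed Hadoop cluster is what the Hortonworks integration work is about.

```java
import java.util.Map;
import java.util.Random;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class WordTopology {

    // Spout: the stream source. Here it just emits made-up sentences forever.
    public static class SentenceSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Random random = new Random();
        private final String[] sentences = {
                "streams beat batches", "yarn runs many frameworks" };

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);  // throttle the fake source
            collector.emit(new Values(sentences[random.nextInt(sentences.length)]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("sentence"));
        }
    }

    // Bolt: one processing step. Splits each sentence into individual words.
    public static class SplitBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            for (String word : tuple.getStringByField("sentence").split(" ")) {
                collector.emit(new Values(word));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static void main(String[] args) {
        // Wire the spout and bolt into a topology graph.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout(), 1);
        builder.setBolt("split", new SplitBolt(), 2).shuffleGrouping("sentences");

        // In-process cluster for local testing; a production deployment would
        // submit the same topology to a real (e.g. YARN-managed) cluster.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-topology", new Config(), builder.createTopology());
        Utils.sleep(10000);
        cluster.shutdown();
    }
}
```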