DataTorrent, a startup building a stream-processing engine for Hadoop that it claims can analyze more than 1 billion data events per second, announced on Tuesday that its flagship product is generally available. Stream processing is becoming more important as we move into an era of connected devices, ubiquitous sensors and fast-paced web platforms such as Twitter. Data is flowing into systems faster than ever, and many companies would like to get some use out of it in real time; in some cases, even hours-old data could be considered stale. Other products and projects addressing stream processing on Hadoop include Apache Storm, Spark Streaming and Samza, and Amazon Kinesis.
Eric Baldeschwieler, the founding CEO of Hortonworks and former Yahoo VP who led the company’s Hadoop development efforts, is now a strategic adviser to Hadoop startup DataTorrent. The company, which won the Structure Data Readers’ Choice award for infrastructure startups, sells stream-processing software designed to run in Hadoop environments (on top of YARN). Baldeschwieler also advises the white-hot Apache Spark startup Databricks. He left Hortonworks, where he was most recently CTO, in August 2013.
A Huntsville, Ala., company is moving from the machine-to-machine world into cloud platforms and big data. Here’s how it did it and how it thinks its work could actually end up saving lives.
There has been a lot of data industry news this week, from the Strata conference and elsewhere. Here are some of the highlights.
The service, announced in November as a tool for customers who want to process data in a timely fashion, gives Amazon a rival to Apache Storm.
Netflix has open sourced a tool called Suro that collects event data from disparate application servers before sending it to other data platforms such as Hadoop and Elasticsearch. It’s more big data innovation that hopefully finds its way into the mainstream.
This post from the New York Times’ Open blog talks about the architecture and algorithms underpinning its content-personalization engine. Its experience speaks to some larger trends around companies moving from batch to stream processing and to cloud services overall. The Times’ recommendation engine used to rely on MapReduce jobs that ran every 15 minutes, but now relies on a homegrown real-time system. It used to run on Cassandra, but now runs on Amazon’s DynamoDB service.
Dataminr, a startup dedicated to analyzing the Twitter firehose of real-time tweets, is using today’s BlackBerry news as proof of its value. The company claims it gave users a 3-minute advantage in which to start selling BlackBerry shares.
Hortonworks is working to integrate the Storm stream-processing engine with its Hadoop distro, and hopes to have it ready for enterprise apps within a year’s time. It’s the latest non-batch functionality for Hadoop thanks to YARN, which lets Hadoop run all sorts of processing frameworks.
Samza is LinkedIn’s take on Twitter’s Storm engine for stream processing, only built on top of LinkedIn’s own Kafka messaging system. It’s the latest in a growing line of open source efforts from LinkedIn, and another notch in the belt for Hadoop.