AT&T Labs, Continuuity will open source a Hadoop streaming engine called jetStream

Hadoop application-platform startup Continuuity and AT&T Labs are working on a stream-processing technology called jetStream that they intend to open source by the third quarter of this year. It’s an integration of Continuuity’s BigFlow real-time data-processing framework with AT&T’s streaming analytic database technology, all sitting atop Hadoop and managed by YARN.

According to Continuuity Co-founder and CEO Jonathan Gray, AT&T’s technology will deliver super-high throughput and a SQL interface for analytics, while BigFlow is a simpler, more durable version of Apache Storm for data processing. He compares the AT&T database to a complex-event processing system or other continuous querying system. A sample application of jetStream might be something like managing smart grid data, where events could be analyzed in real time before trickling into BigFlow to be processed and run against more-complex but higher-latency models.
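To make the smart grid scenario concrete, here is a minimal sketch of that two-stage shape in plain Python. It is not jetStream, BigFlow, or AT&T code: the function names, the meter readings, the window size, and the spike factor are all hypothetical, and both stages are simple stand-ins (a rolling-mean check for the continuous query, a per-meter average for the slower model).

```python
from collections import defaultdict, deque
from statistics import mean


def fast_stage(events, window=3, factor=1.5):
    """Continuous-query stand-in: tag each reading as a spike if it exceeds
    `factor` times the mean of the previous `window` readings, then pass
    every reading downstream (tagged) as soon as it is seen."""
    recent = deque(maxlen=window)
    for meter_id, kw in events:
        spike = len(recent) == window and kw > factor * mean(recent)
        recent.append(kw)
        yield meter_id, kw, spike


def slow_stage(tagged_events):
    """Higher-latency stand-in: needs the whole batch before it can answer.
    Here it only computes the average load per meter."""
    per_meter = defaultdict(list)
    for meter_id, kw, _spike in tagged_events:
        per_meter[meter_id].append(kw)
    return {meter: mean(values) for meter, values in per_meter.items()}


# Hypothetical readings: (meter id, kilowatts).
events = [("m1", 1.0), ("m2", 1.1), ("m1", 0.9),
          ("m2", 4.0), ("m1", 1.0), ("m2", 1.2)]

tagged = list(fast_stage(events))
alerts = [(meter, kw) for meter, kw, spike in tagged if spike]
averages = slow_stage(tagged)
```

The point of the split is that `alerts` can be acted on while events are still arriving, whereas `averages` is only available once the slower stage has consumed the batch.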

jetstream diagram

Another Continuuity co-founder, Nitin Motgi, laid out the benefits of jetStream in a blog post:

  • Direct integration of real-time data ingestion and processing applications with Hadoop and HBase and utilization of YARN for deployment and resource management
  • Framework-level correctness, fault tolerance guarantees, and application logic scalability that reduces friction, errors, and bugs during development
  • A transaction engine that provides delivery, isolation and consistency guarantees that enable exactly-once processing semantics
  • Scalability without increased operational cost of building and maintaining applications
  • The ability to develop pipelines that combine in-memory continuous query semantics with persistent, procedural event processing using simple Java APIs
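The exactly-once claim in that list can be illustrated with a common general technique: commit the processing result and the position in the stream together, so a redelivered event is recognized and skipped. The sketch below is an assumption-laden illustration in plain Python, not Continuuity's transaction engine. The class name, the sequence numbers, and the in-memory "commit" are hypothetical, and it assumes each event carries a sequence number that increases in stream order.

```python
class ExactlyOnceCounter:
    """Toy consumer that sums event values exactly once per event."""

    def __init__(self):
        # State and stream position live in one record so they can only
        # change together (standing in for a transactional commit).
        self.committed = {"total": 0, "last_seq": -1}

    def process(self, seq, value):
        """Apply an event unless it was already applied. Returns True if applied."""
        if seq <= self.committed["last_seq"]:
            return False  # redelivery of an event that was already committed
        # Build the new record, then swap it in with a single assignment.
        self.committed = {
            "total": self.committed["total"] + value,
            "last_seq": seq,
        }
        return True


counter = ExactlyOnceCounter()
# Hypothetical deliveries: events 1 and 0 arrive a second time after a retry.
deliveries = [(0, 5), (1, 7), (1, 7), (2, 3), (0, 5)]
applied = [counter.process(seq, value) for seq, value in deliveries]
```

Delivery here is at-least-once, but because the total and the last sequence number are updated as a unit, the effect on the state is exactly-once.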

Continuuity has been big on open sourcing its technology since launching in 2012, but the same can’t be said about AT&T historically. However, Gray said AT&T has been trying to contribute more back to the community, especially the Hadoop community, where it’s investing more resources, and it knows it needs some help engaging the community and getting its code ready for open source.

Speaking of jetStream in particular, Gray said, “We wanted to establish a new standard for streaming on Hadoop.” That might be easier said than done given numerous other open source efforts around Storm and Apache Spark, and the emergence of startups such as DataTorrent specializing in stream processing for Hadoop. However, jetStream is certainly another reason developers might consider building their Hadoop applications atop Continuuity’s Reactor platform rather than trying to cobble together various components on their own.

Sample Nanocubes visualizations.

jetStream was announced during an AT&T keynote at the Hadoop Summit on Tuesday, where the company also announced, or at least highlighted, a couple of other open source projects it’s working on. One is a technology called Nanocubes for visualizing massive spatiotemporal datasets in a web browser. Another is called RCloud, a web-based platform for analyzing and collaborating on data using the R statistical programming language.