If you’re into big data, you probably know about Spark, a sort of Swiss Army knife for big data analytics: it can handle many kinds of queries across many types of data.
On this week’s Structure Show, Matei Zaharia, one of the brains behind the Apache Spark project and CTO of Databricks, a company built to commercialize the technology, explains how this multi-faceted query tool could help democratize the use of big data — a key claim in a world where the demand for data scientists far outstrips the supply.
But first, Derrick Harris catches us up on the big data and cloud news out of Google I/O, including its new Dataflow tool, which claims to make it much easier to write data processing pipelines that can use both batch and stream processing. Dataflow is Google’s response to Amazon’s Elastic MapReduce and Kinesis. The big data analytics feature war is fully upon us.
Hosts: Barb Darrow and Derrick Harris