Big data startup Mortar expands beyond Hadoop using tech created at Spotify

New York-based big data startup Mortar is moving beyond Hadoop and will now support, via a cloud service, complex data-processing pipelines that might touch any number of tools or data stores. The new capabilities are made possible thanks to an open source technology called Luigi, which was developed by music-streaming company Spotify.

Mortar began its life offering a simple framework for writing and launching Hadoop jobs that run on Amazon Web Services’ Elastic MapReduce service, and in 2012 it began open sourcing much of its code and sharing examples of how to build certain job types. In 2013, it teamed with a handful of well-known data scientists to help users build recommendation engines and, hopefully, establish a repeatable process for building them on the Mortar platform.

Now, co-founder and CEO K Young said, it’s time to take things a step further by making it easier to use Mortar for applications that need to touch more than just Hadoop. Customers love Hadoop, “but there’s a lot of important data that doesn’t require Hadoop, or isn’t even a natural fit,” he said. With Luigi offered as a cloud service, Mortar users can now use Python to build and visualize pipelines that send data to any number of databases or other processing environments reachable via API.
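In Luigi, each task declares its upstream dependencies (`requires()`), the output it produces (`output()`) and the work itself (`run()`); a task whose output already exists is treated as complete and skipped. The sketch below is a toy, standard-library-only imitation of that pattern, not Luigi's actual API or Mortar's service. The `Task` base class, the `build` function and the `ExtractPlays`/`CountPlays` tasks are hypothetical, and local files stand in for the databases and APIs a real pipeline would touch.

```python
import os
import tempfile


class Task:
    """Toy task (hypothetical): declares dependencies, an output file and the work."""

    def requires(self):
        return []  # upstream tasks that must finish first

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output file exists
        return os.path.exists(self.output())


def build(task, log):
    """Run dependencies depth-first, skipping any task that is already complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class ExtractPlays(Task):
    # Hypothetical source step: stands in for pulling play events from an API
    def __init__(self, workdir):
        self.workdir = workdir

    def output(self):
        return os.path.join(self.workdir, "plays.csv")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("alice,song1\nbob,song1\nalice,song2\n")


class CountPlays(Task):
    # Hypothetical downstream step: aggregates plays per song
    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return [ExtractPlays(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "counts.csv")

    def run(self):
        counts = {}
        with open(self.requires()[0].output()) as f:
            for line in f:
                _, song = line.strip().split(",")
                counts[song] = counts.get(song, 0) + 1
        with open(self.output(), "w") as f:
            for song in sorted(counts):
                f.write(f"{song},{counts[song]}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        log = []
        build(CountPlays(workdir), log)
        print(log)  # ['ExtractPlays', 'CountPlays']
        build(CountPlays(workdir), log)  # outputs exist, so nothing reruns
        print(log)  # ['ExtractPlays', 'CountPlays']
```

Because completion is judged by whether outputs exist, rerunning a partly failed pipeline only redoes the missing steps, which is one reason this style of dependency graph suits jobs that span several systems.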

A visualization of a Luigi pipeline. Source: Spotify

Young hopes that a managed version of Luigi will help users move applications from prototype to production faster, both because the process will be easier and because the result will be more reliable. It’s not a sign that Mortar is moving away from Hadoop (which Young says “is still as hard as it ever was”) so much as a recognition that modern data applications will likely touch multiple environments, and that developers could use a simpler way to manage that process.

“It’s not a lack of data scientists or even technological complexity [that hampers many big data applications],” Young said. “… The biggest thing that just stops these projects is getting data from where it is to where it needs to be.”