Meet Myriad, a new project for running Hadoop on Mesos

Hadoop vendor MapR and data center automation startup Mesosphere have created an open source technology called Myriad, which is supposed to make it easier to run Hadoop workloads on top of the popular Mesos cluster-management software. More specifically, Myriad allows the YARN resource scheduler — the linchpin of Hadoop 2.0 that lets the platform run processing frameworks other than MapReduce — to run on top of Mesos, effectively creating an auto-scaling Hadoop cluster that’s relatively future-proof.

“Before, you had to make a choice, and now you can just run YARN on Mesos,” explained Mesosphere founder and CEO Florian Leibert. “… I think the goal here is to have more workloads in a shared environment.”

What he means is that companies will no longer have to run Hadoop on one set of resources while running web servers, Spark and any number of other workloads on separate resources managed by Mesos. Essentially, all of these things will now be available as data center services residing on the same set of machines. Mesos has always supported Hadoop as a workload type, and companies including Twitter and Airbnb have taken advantage of this, but YARN has appeal as the default resource manager for newer distributions of Hadoop because it’s designed specifically for that platform and, well, is one of the foundations of those newer distributions.

The old static partition.

With Myriad, YARN can still manage the resource allocation to Hadoop jobs, while Mesos handles other tasks as well as the task of scaling out the YARN cluster itself. So instead of the current state of affairs, where YARN clusters are statically defined and new nodes must be manually configured, Mesos can spin up new YARN nodes automatically based on the policies in place and the available resources of the cluster.

Mesosphere engineer Adam Bordelon said Myriad works now and that eBay and Twitter have been testing it out. eBay actually contributed quite a lot to the first version of the code. However, he noted, Myriad is still early in its development and needs quite a few more features, including around security.

“I imagine within a month or two,” he said, “it should be in production somewhere.”

Although two commercial companies are driving Myriad at this point, Bordelon said the goal is definitely to build a community around the project. It’s currently hosted in the Mesosphere GitHub repository, but the team is working on a proposal to make it an Apache Incubator project.

“It is definitely a community effort,” he said.

The new YARN-on-Mesos architecture.

Jim Scott, MapR’s director of enterprise strategy and architecture, said that Hadoop was pitched in part as a tool for eliminating data silos. However, he added, “As we start to see those data silo walls come down, we’re starting to see other walls come up.” One of those walls is the relegation of Hadoop to its own dedicated cluster far away, logically at least, from everything else.

“This is the enabling function, in my mind,” he said, “that makes it so people can tear that wall down.”

MapR CEO John Schroeder will be among many speakers talking about the evolution of Hadoop and big data architectures at our Structure Data conference in New York next month. Others include Cloudera CEO Tom Reilly, Hortonworks CEO Rob Bearden, Google VP of Infrastructure Eric Brewer, Databricks CEO Ion Stoica and Amazon Web Services GM of Data Science Matt Wood.

And for more on Mesos, Mesosphere and why they have some engineers so excited, check out our May 2014 Structure Show podcast interview with Mesosphere CEO Leibert.


With a $50M line of credit, DigitalOcean will build more data centers

DigitalOcean, the cloud provider that’s a hit with developers, said today that it’s landed a $50 million credit facility provided by the investment firm Fortress Investment Group. The new credit line follows the startup’s recent $37.2 million Series A funding round led by Andreessen Horowitz.

DigitalOcean’s co-founder and CEO Ben Uretsky told the Wall Street Journal that the startup plans to use the loan to build out new global data centers with one slated for Frankfurt, Germany. The startup said in a news release that the credit line will help it lease more equipment at better rates as it attempts to build more international facilities.

Data centers aren’t exactly the cheapest things to build out, so taking a credit line makes sense for DigitalOcean. For example, Google is aiming to spend $772 million on a giant data center in the Netherlands and Facebook’s data center in Altoona, Iowa, was supposed to be a $1.5 billion investment. While DigitalOcean will more than likely not build the type of data centers seen at Facebook and Google, the company will still be plunking down a good amount of cash.

The New York-based startup’s unique pricing model — which involves “droplets” of compute, storage and networking resources all bundled together — has helped it carve a niche among developers looking for an easier way to get into the cloud as opposed to studying the rosetta stone that is the Amazon Web Services pricing matrix.

For a more in-depth look at what DigitalOcean has been doing to distinguish itself in the highly competitive world of cloud providers, be sure to listen to Uretsky chat it up with Gigaom last July on The Structure Show.


Google wants to show the world how sexy cluster management really is

A partnership between Google and Mesosphere furthers Google’s strategy to sell the world on its way of automating applications and resources. Cluster management is important (even sexy when wrapped in the lore of Google or Facebook), and now Google claims it’s easier than ever.

Mesosphere raises $10.5M to push virtualization à la Google

Inspired by Google’s famous approach to resource management, Apache Mesos is the open source software that manages the large pools of servers and cloud instances at companies such as Twitter and Airbnb. Mesosphere, a company trying to commercialize it, has raised $12.75 million since launching.