Meet Myriad, a new project for running Hadoop on Mesos

Hadoop vendor MapR and data center automation startup Mesosphere have created an open source technology called Myriad, which is supposed to make it easier to run Hadoop workloads on top of the popular Mesos cluster-management software. More specifically, Myriad allows the YARN resource scheduler — the linchpin of Hadoop 2.0 that lets the platform run processing frameworks other than MapReduce — to run on top of Mesos, effectively creating an auto-scaling Hadoop cluster that’s relatively future-proof.

“Before, you had to make a choice, and now you can just run YARN on Mesos,” explained Mesosphere founder and CEO Florian Leibert. “… I think the goal here is to have more workloads in a shared environment.”

What he means is that companies will no longer have to run Hadoop on one set of resources while running web servers, Spark and any number of other workloads on separate resources managed by Mesos. Essentially, all of these things will now be available as data center services residing on the same set of machines. Mesos has always supported Hadoop as a workload type (and companies including Twitter and Airbnb have taken advantage of this), but YARN has appeal as the default resource manager for newer distributions of Hadoop because it’s designed specifically for that platform and, well, is one of the foundations of those newer distributions.

The old static partition.

With Myriad, YARN can still manage the resource allocation to Hadoop jobs, while Mesos handles other tasks as well as the task of scaling out the YARN cluster itself. So instead of the current state of affairs, where YARN clusters are statically defined and new nodes must be manually configured, Mesos can spin up new YARN nodes automatically based on the policies in place and the available resources of the cluster.
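To make that division of labor concrete, here is a minimal sketch of the kind of scale-out policy described above. It is purely illustrative and is not Myriad's code or the Mesos API: the `Offer` class, the `plan_scale_out` function and every threshold in it are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Spare capacity on one machine (hypothetical shape, not the Mesos API)."""
    host: str
    cpus: float
    mem_mb: int


def plan_scale_out(offers, pending_containers, node_cpus=4.0, node_mem_mb=8192,
                   containers_per_node=8, max_new_nodes=10):
    """Pick hosts on which to start new YARN worker nodes.

    Toy policy: start enough nodes to absorb the backlog of pending YARN
    containers, capped by max_new_nodes and limited to hosts whose spare
    capacity is large enough for one node.
    """
    # Ceiling division: number of nodes needed to cover the backlog.
    needed = -(-pending_containers // containers_per_node)
    needed = min(needed, max_new_nodes)

    chosen = []
    for offer in offers:
        if len(chosen) >= needed:
            break
        if offer.cpus >= node_cpus and offer.mem_mb >= node_mem_mb:
            chosen.append(offer.host)
    return chosen


offers = [
    Offer("host-a", cpus=8.0, mem_mb=16384),
    Offer("host-b", cpus=2.0, mem_mb=4096),  # too small to host a node
    Offer("host-c", cpus=4.0, mem_mb=8192),
]
print(plan_scale_out(offers, pending_containers=20))
```

In this toy version the cluster manager decides where new YARN nodes start, while YARN would still decide which Hadoop jobs run on them once they are up.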

Mesosphere engineer Adam Bordelon said Myriad works now and that eBay and Twitter have been testing it out. eBay actually contributed quite a lot to the first version of the code. However, he noted, Myriad is still early in its development and needs quite a few more features, including around security.

“I imagine within a month or two,” he said, “it should be in production somewhere.”

Although two commercial companies are driving Myriad at this point, Bordelon said the goal is definitely to build a community around the project. It’s currently hosted in the Mesosphere GitHub repository, but the team is working on a proposal to make it an Apache Incubator project.

“It is definitely a community effort,” he said.

The new YARN-on-Mesos architecture.

Jim Scott, MapR’s director of enterprise strategy and architecture, said that Hadoop was pitched in part as a tool for eliminating data silos. However, he added, “As we start [to] see those data silo walls come down, we’re starting to see other walls come up.” One of those walls is the relegation of Hadoop to its own dedicated cluster far away, logically at least, from everything else.

“This is the enabling function, in my mind,” he said, “that makes it so people can tear that wall down.”

MapR CEO John Schroeder will be among many speakers talking about the evolution of Hadoop and big data architectures at our Structure Data conference in New York next month. Others include Cloudera CEO Tom Reilly, Hortonworks CEO Rob Bearden, Google VP of Infrastructure Eric Brewer, Databricks CEO Ion Stoica and Amazon Web Services GM of Data Science Matt Wood.

And for more on Mesos, Mesosphere and why they have some engineers so excited, check out our May 2014 Structure Show podcast interview with Mesosphere CEO Leibert.

With $10M, HashiCorp launches its first commercial product

Building applications in today’s world involves a lot of work assembling, managing and monitoring all of the various components that need to come together across myriad environments. To help with this chore, HashiCorp is rolling out an application development hub called Atlas, its first commercial product based on its various open-source technologies. The startup is also announcing a $10 million series A funding round from Mayfield Fund, GGV Capital and True Ventures (see disclosure).

HashiCorp’s biggest claim to fame is its open-source Vagrant tool that helps developers quickly spin up virtual environments so they can build and test their software projects before they see the light of day.

Over time, the startup developed other open-source tech to help coders with all aspects of the software-development process, from Serf, which handles cluster management and makes sure those developer environments don’t fail, to Consul, which helps users discover and configure all the services running in their coupled-together applications.

Atlas diagram

With Atlas, the startup is bundling up all of its open-source software into one package and throwing in a dashboard that will supposedly let coders see how their application is performing in both public and private clouds or hybrid environments.

The Atlas software-as-a-service is now available in beta and will be available to the public in the first quarter of 2015; by then, the company will explain pricing and unveil an on-premises version.

Diagram provided by HashiCorp

Disclosure: HashiCorp is backed by True Ventures, a venture capital firm that is an investor in the parent company of Gigaom.

CoreOS CEO: We’re not out to replace Docker, just its containers

There was a major shakeup in the world of container-based computing this week when operating system provider CoreOS decided to get into the container space with a new open source project called Rocket. It’s a container runtime environment as well as a set of specifications for how App Containers — what CoreOS calls its container images — are built and function. But the bigger news industry-wide was the suggestion from CoreOS that it built Rocket because developer darling Docker isn’t living up to expectations.

CoreOS Co-founder and CEO Alex Polvi came on the Structure Show podcast this week to clarify that message and to explain the rationale behind Rocket and everything CoreOS does. If you’re interested in the future of containers, distributed systems and even cloud computing, both business-wise and technologically, it’s a must-listen interview. Here are some highlights, but there’s a lot more good stuff.

We’re fine with Docker, really!

If there’s one point that Polvi really wants to get across, it’s that CoreOS didn’t build Rocket because it doesn’t like Docker — either the technology or the company. He called that notion — expressed by the media, as well as, in numerous fora, Docker founder and CTO Solomon Hykes — “fundamentally flawed.”

The rationale behind Rocket is simple, Polvi explained. Docker is turning into more of a platform, adding in features around cluster management, networking and booting cloud servers, and CoreOS wanted to make sure that the original, simple container component didn’t get lost to the world as that happens. In fact, he says he’s fine with the idea of a Docker platform:

[blockquote person=”” attribution=””]”That’s a fine product, the private cloud is an open territory right now still. So the Docker platform is a product that needs to exist. We just want the simple composable building block to also exist for people that have their own platforms or they’re trying to build their own platform to use as a reusable component.”[/blockquote]

Although, below the surface, it might not be the mutual respect society the companies would like everyone to think it is. Later, while comparing Docker’s move away from containers to VMware’s move away from virtual machines, Polvi noted, “There is a debate as to whether the technology warrants another company like VMware to emerge.”

CoreOS CEO Alex Polvi

We build what we have to

When you consider the CoreOS business strategy, the reasons for Rocket begin to look a little more clear. Polvi calls the CoreOS lineup of technologies, which also includes a database, registry service, cluster management and other pieces, “a platform for platform builders.” It’s building the “primitives” that people need to build next-generation distributed systems and platforms, as opposed to actually building the platforms (think Heroku or CoreOS partner Deis) where people ultimately deploy applications.

“We are never trying to just take somebody else’s solution and build it,” Polvi said. “We’re trying to fill in the white space and build something that’s technically sound in an area we think is an open problem.”

He contrasts this with Docker, which he says is now becoming more akin to cluster (and container) management plays such as Mesosphere and the Kubernetes project, or VMware. Those technologies might use containers and let users move them around and manage them, but they’re far more about the management aspect than about the containers, or any other pieces of infrastructure, themselves.

Kubernetes works levels above the container, which isn't mentioned on this diagram from Microsoft.

In fact, despite the fact that CoreOS has its own cluster-management tool, called Fleet, Polvi said the company actually contributes quite a bit to the Google-led Kubernetes project because it really likes the technology and the trajectory the project is on.

“Docker was a similar thing early on,” he added. “We used it for a year, we collaborated heavily with that community, but then it became clear they were on a trajectory that was no longer what we needed — and what a lot of people needed, not just us.”

Still, Polvi noted, technically, there’s no reason why Docker containers and Rocket can’t coexist provided Docker is willing to work within CoreOS’s container specifications or collaborate with CoreOS to develop a standard container format.

Structure 2010: Sebastian Stadil – CEO, Scalr; William “Skip” Bacon – VP of Products and CTO, Virtual Instruments; Michael A.Jackson – Co-Founder, President, and COO, Adaptive Computing; Jagan Jagannathan – Founder and CTO, Xangati; Alex Polvi – CEO and Co-Founder, Cloudkick; Javier Soltero – CTO for Management Products, SpringSource

A younger Polvi (far left) talking cloud at Structure 2010.

A quick thought on the cloud

We also asked Polvi about the world of cloud computing, where he used to work after Rackspace acquired his last startup, Cloudkick, and where many CoreOS workloads will likely run. Maybe old allegiances just die hard, but Polvi thinks Rackspace is actually in a pretty good position as bigger cloud providers such as Amazon Web Services, Google and Microsoft continue to drive down prices.

“Now, because of the competitive pressure of the cloud providers, compute on infrastructure will go asymptotically to free over time, as well,” he said. “If you think about it, what’s left after the hard parts of software are free and the compute itself is relatively free, or free enough? … I think it’s service, that’s how you do it. You help people use all this stuff.”

Facebook redesigned the data center network: 3 reasons it matters

Earlier this month, Facebook announced a new data center networking architecture that it calls, fittingly, “data center fabric.” We had Facebook Director of Network Engineering Najam Ahmad on the Structure Show podcast this week to talk about the new fabric in more detail.

Mesosphere raises $10.5M to push virtualization à la Google

Inspired by Google’s famous approach to resource management, Apache Mesos is the open source software that manages the large pools of servers and cloud instances at companies such as Twitter and Airbnb. Mesosphere, a company trying to commercialize it, has raised $12.75 million since launching.

Pepperdata launches and raises $5M to manage Hadoop clusters

A startup called Pepperdata launched on Tuesday along with $5 million in series A venture capital from Signia Venture Partners and Webb Investment Network. The company, which was co-founded by two Yahoo and Microsoft veterans, says its technology sits above the Hadoop cluster and monitors the resource usage of each running task. Cluster management is very important, but Pepperdata’s challenge will be proving its approach is good enough to justify paying a third party rather than using existing vendor software (including from Cloudera) or open source projects such as Apache Ambari and Apache Mesos.

Continuuity open sources Loom for devops on big data clusters

Big data startup Continuuity has open sourced a tool called Loom that’s designed to make deploying and managing large clusters a push-button experience. These types of tools are important as data-driven applications become more common, but the infrastructure remains a challenge.

Airbnb is engineering itself into a data-driven company

Like most web companies, Airbnb is trying to provide a better user experience by analyzing lots and lots of data. Here’s how the company built its big data infrastructure atop Amazon’s cloud and how all that data manifests itself in products.