How Spotify is ahead of the pack in using containers

In late December, CoreOS CEO and container guru Alex Polvi proclaimed in a tweet that he believes 2015 will be the year of the production-ready container, which would be a testament to how fast companies are adopting the technology that promises more portability and less overhead than virtual machines.

For music streaming service Spotify, however, containers are already a way of life. The streaming-music provider has been using containers in production on a large scale, according to Mats Linander, Spotify’s infrastructure team lead.

This is a big deal given that it seems only a few companies beyond cloud providers like Google or Joyent have gone public with how they are using container technology in production. Indeed, when Ben Golub, CEO of the container-management startup Docker, came on the Structure Show podcast in December and described how financial institutions are experimenting with containers, he said that they are generally doing pilots and are using Docker containers “for the less sensitive areas of their operations.”

Ever since Docker rose to prominence, developers have been singing the praises of containers, which have made it easier to craft multicomponent applications that can spread out across clouds. Container technology is basically a form of virtualization that isolates applications and services from each other within virtual shells, all while letting them tap into the same Linux OS kernel for their resources.

For many companies as well as government agencies, it’s not just the benefits to the software development process that have them interested in containers; it’s how containers can assist their operations. If containers truly are less bulky than virtual machines (Golub told me over the summer that using containers in production can lead to 20-to-80 percent lighter workloads than only using VMs), then it’s clear organizations stand to benefit from using the tech.

But you can’t simply embed containers into the architecture of your application and expect a smooth ride, especially if that application is a hit with the public and can’t afford to go down. It takes a bit of engineering work to see the benefits of containers in operations, and some users have said that Docker has caused them more headaches than happiness.

Spotify, which has 60 million users, runs containers across its four data centers and over 5,000 production servers. While it runs containers in its live environment, Spotify had to do a little legwork to actually see some gains.

These containers will help beam Beyonce to your playlist

One of the ways the streaming-music company uses containers is to more efficiently deploy the back-end services that power the music-streaming application. With the addition of a home-grown Docker container orchestration service called Helios, the team has come up with a way to control and spin up multiple clusters of containers throughout its data centers.

Out of 57 “distinct backend services in production” that are containerized, Linander considers 20 significant. All of these containerized services share space with “more than 100 other services” churning each day, he explained.

These containers house stateless services, which basically means the services keep no persistent data of their own between requests, so they can be safely restarted without causing problems.

Linander said he didn’t “want to go into deep detail” on what all those services are doing, but he did explain that “view-aggregation services” are a good fit for containerization. These kinds of services are responsible for pulling together the data in Spotify’s data centers that pertains to an individual’s playlist, including the name of an artist, album images, and track listings.

Bundling these services inside containers helps Spotify because the client no longer needs to send a separate request to each service to obtain the necessary information from the databases. Instead, Spotify can deploy a cluster of containers that hold an aggregate of the services, so far fewer requests have to be sent. As a result, the application is less “heavy and bulky,” he said.
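To make the idea of a view-aggregation service concrete, here is a minimal Python sketch. It is not Spotify’s code: the three backend functions, their names and the data they return are invented for illustration, and a real service would call other services over the network rather than local functions.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for three separate backend services.
# The names and return values are assumptions made for this sketch.
def fetch_artist(track_id):
    return {"artist": "Example Artist"}


def fetch_album_art(track_id):
    return {"image_url": f"https://img.example.invalid/{track_id}.jpg"}


def fetch_track_listing(track_id):
    return {"title": f"Track {track_id}"}


BACKENDS = (fetch_artist, fetch_album_art, fetch_track_listing)


def aggregate_view(track_id):
    """Call every backend once and merge the answers into one response."""
    view = {"track_id": track_id}
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        # The backends are queried concurrently; results come back in order.
        for part in pool.map(lambda fetch: fetch(track_id), BACKENDS):
            view.update(part)
    return view


if __name__ == "__main__":
    # The client makes one call instead of three.
    print(aggregate_view("42"))
```

The client sees a single request and a single merged answer, while the fan-out to the individual services happens on the server side.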

It also helps that a restarted container starts fresh, in the same state as when it was first spun up. That means that if something crashes, users won’t have to wait too long to see Beyonce’s mug appear on their playlists along with all of her hits.

As Spotify infrastructure engineer Rohan Singh explained during a session at last year’s Dockercon, before the company was using Docker containers, Spotify’s hardware utilization was actually low because “every physical machine used one service” even though the company has a lot of machines.

Spotify slide from Dockercon explaining its older architecture before Docker containers

By running a fleet of containers on bare metal, Spotify was able to squeeze more juice out of the system because each machine in the cluster can now run more than one service.

Say hello to Helios

Spotify’s Helios container orchestration framework (which the company open sourced last summer) is crucial to making sure that the deployed containers are running exactly the way Spotify wants them to run.

Right around the time Spotify first started experimenting with lightweight containerization, Docker was starting to raise eyebrows, Linander said. The Spotify team then met with Docker (Spotify is also a member of the Docker Governance Advisory Board) to discuss the technology, which looked promising but at the time lacked orchestration capabilities that would let containers be linked together and deployed in groups. It should be noted that as of early December, Docker has made orchestration services available in its product.

Because container orchestration services weren’t really out there during the time Spotify was investigating the use of Docker, Linander said he decided “we could build something in house that could target our use case.”

For Linander, a lot of the benefits of containers come to fruition when you add an orchestration layer because that means teams can now “automate stuff at scale.”

“When you have several thousands of servers and hundreds of microservices, things become tricky,” Linander said, and so the Helios framework was created to help coordinate all those containers that carry with them the many microservices that make Spotify come alive to the user.

The framework consists of the Helios master, basically the front-end interface that resides on the server, and the Helios agents, which are pieces of software that work with the Helios master and run on the machines hosting the Docker containers.

Slide of Helios from a Spotify talk during Dockercon

Helios works in conjunction with the open-source Apache Zookeeper distributed configuration service: Spotify engineers set a policy in the Helios master for how they want the containers to be created, and “Zookeeper distributes the state to the helios agent” to make sure the containers are spun up correctly, said Linander.
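The master, agent and Zookeeper arrangement Linander describes follows a common desired-state pattern, which can be sketched in a few lines of Python. This is an illustration only, not Helios code: the class and method names are invented, and a plain in-memory dictionary stands in for Zookeeper.

```python
class StateStore:
    """In-memory stand-in for the coordination service (Zookeeper in the article)."""

    def __init__(self):
        self._desired = {}  # host name -> set of job names

    def set_desired(self, host, jobs):
        self._desired[host] = set(jobs)

    def get_desired(self, host):
        return set(self._desired.get(host, ()))


class Master:
    """Records what should run where; it never touches containers itself."""

    def __init__(self, store):
        self.store = store

    def deploy(self, job, hosts):
        for host in hosts:
            self.store.set_desired(host, self.store.get_desired(host) | {job})

    def undeploy(self, job, hosts):
        for host in hosts:
            self.store.set_desired(host, self.store.get_desired(host) - {job})


class Agent:
    """Runs on one host and moves it toward the desired state."""

    def __init__(self, host, store):
        self.host = host
        self.store = store
        self.running = set()

    def reconcile(self):
        desired = self.store.get_desired(self.host)
        to_start = sorted(desired - self.running)
        to_stop = sorted(self.running - desired)
        # A real agent would start and stop containers here.
        self.running = desired
        return to_start, to_stop


if __name__ == "__main__":
    store = StateStore()
    master = Master(store)
    agent = Agent("host-a", store)
    master.deploy("playlist-view", ["host-a"])
    print(agent.reconcile())
```

The point of the design is that the master only writes down intent. Each agent reads that intent from the shared store and works out for itself what to start or stop.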

During Dockercon, Singh explained that Helios is great at recognizing when a “container is dead.” If a person accidentally shuts down an important container, Helios can be configured to recognize such mission-critical containers and instantly load one back up.

“We just always have this guarantee that this service will be running,” Singh said last summer.
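The guarantee Singh describes amounts to a supervision loop: check which containers are dead and bring back the ones marked as must-run. A minimal Python sketch follows, under the assumption that a container can be modeled as a simple record. The names here are hypothetical and do not reflect how Helios is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Container:
    # Hypothetical model of a container, used only for this sketch.
    name: str
    alive: bool = True
    restarts: int = 0


def supervise(containers, must_run):
    """One supervision pass: restart every must-run container found dead."""
    restarted = []
    for container in containers:
        if container.name in must_run and not container.alive:
            # A real system would ask Docker to start a new container here.
            container.alive = True
            container.restarts += 1
            restarted.append(container.name)
    return restarted


if __name__ == "__main__":
    fleet = [Container("playlist-view"), Container("batch-report")]
    fleet[0].alive = False  # someone shut it down by accident
    print(supervise(fleet, must_run={"playlist-view"}))
```

Run repeatedly, a pass like this only acts on containers that are both dead and on the must-run list, so anything stopped on purpose and left off the list stays down.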

New orchestration options and new container technology

Of course, Helios is no longer the only orchestration system available. Several of these frameworks are now on the block, including Google’s Kubernetes, Amazon’s EC2 Container Service, the startup Giant Swarm’s microservice framework and Docker’s own similar services.

Now that there’s a host of other options, Spotify will be evaluating possible alternatives, but don’t be surprised if the company sticks with Helios. Linander said the main reason Spotify is currently using Helios is that “it is battle proven,” and while other companies may be running containers in production through the use of other orchestration services, no one really knows at what scale they are operating.

But what about other new container technology that may give Docker a run for its money, like CoreOS and its Rocket container technology? Linander said he doesn’t have a “strong opinion” on the subject and even if Spotify sees “a bunch of potential” with new container tech, the company isn’t going to drop everything it’s doing and implement the latest container toy.

As for ClusterHQ and its Flocker container-database technology that the startup claims will let users containerize datasets all inside the Docker Hub, Linander said, “It looks cool to me, personally,” but it’s still too early to tell if the startup’s technology lives up to what it says it can deliver. Besides, he’s finding that Cassandra clusters are getting the job done just fine when it comes to storing Spotify’s data.

“We are always considering options,” said Linander. “[We are] building the best music service that ever was and will ever be.”

Mats Linander, infrastructure team lead at Spotify

It’s clear from speaking with Linander that having a well-oiled orchestration service takes a load off engineers when it comes to tending to those container clusters. It seems like a lot of the ease, automation and stability of spinning up clusters of containers comes from the orchestration service that coordinates the endeavor.

However, not every company possesses the engineering skills needed to create something akin to Helios, and while the service is open source, it’s still a custom system designed for Spotify so users will have to do some tweaking to get it functional for themselves.

For 2015 to truly be the year of the production-ready container, organizations are going to have to get up to speed on some sort of orchestration service, and that service is going to have to scale well and run for a long time without going awry.

At this point, it’s just a question of whose orchestration technology will gain the most traction in the marketplace since most organizations will more than likely be trying out new tech rather than creating new tech, unless they are as ambitious as Spotify and other webscale companies. With the plethora of new options now available — from Kubernetes to Docker to CoreOS’s Fleet — the public’s now got a lot of choices.