With Flocker, ClusterHQ wants to bring data management to Docker

ClusterHQ, a big data startup that wants to make a name for itself in the ever-popular Docker ecosystem, released an open source data manager called Flocker on Wednesday. It's designed to ease the headache of moving and updating large datasets hosted in Docker containers across different cloud services or a company's own internal data centers.

Docker's container management system has helped developers package their application source code in containers so it can be deployed to bare-metal machines or to various cloud service providers. But Docker isn't quite ready for prime time when it comes to doing the same for big databases like MongoDB, according to Michael Ferranti, ClusterHQ's vice president of marketing. Containerizing datasets matters for developers who want to build more complex, data-intensive applications that rely on relational or NoSQL databases. If a containerized dataset could be linked to a containerized application, the two could talk to each other while being moved around and deployed together to various Linux environments.

For now, the problem with housing datasets in Docker containers is that operational upkeep on containerized databases is a huge chore. For example, if a new version of MongoDB comes out and you want to upgrade an older version that is now trapped inside multiple containers, doing so is a hassle because no tool has been available in the Docker ecosystem to ease the burden.

Flocker architecture


At the heart of Flocker is ZFS, the Zettabyte File System originally designed by Sun Microsystems, which makes it much easier for engineers to replicate data and create backups, said ClusterHQ co-founder and vice president of engineering Rob Haswell. With ZFS, Flocker can monitor a containerized database and keep track of when changes occur; once a change such as a database update happens, ZFS can automatically reproduce that change across the other containers that house the dataset.

Currently, ClusterHQ's data management system can't be tied into some of the new container orchestration tools such as Fig or Panamax, which help spin up multiple containers of application code. That limits its utility: while Flocker can keep containerized databases up to date, those databases can't automatically be hooked up to work in tandem with the spun-up application containers for heavy-duty workloads.

The UK-based startup hopes to eventually integrate with these services, and it plans to launch a commercial support model around the open source Flocker tool, something Docker is also working on for its own open source platform.

Post and thumbnail images courtesy of Shutterstock user voyager624.