The VMotion Myth

Among the many innovations that virtualization has brought to the data center is server mobility, or the ability to live-migrate virtual machines (VMs) across physical servers. With it comes a marketing story that dynamically moving VMs inside a single data center or between two data centers is a seamless process. While at some point that will undoubtedly be true, it’s far from an operational reality today. In the meantime, there are numerous opportunities for startups to offer solutions that will help make such seamlessness a reality.

Currently, moving a VM from one physical machine to another has two important constraints. First, both machines must share the same storage back end, typically a Fibre Channel/iSCSI SAN or network-attached storage. Second, the physical machines must reside in the same VLAN or subnet. This means that inside a single data center, one can only move a VM across a relatively small number of physical machines. Not exactly what the marketing guys would have you believe.
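To make those two constraints concrete, here is a minimal sketch in Python of how one might work out which hosts a VM can actually move to. It is an illustration only, not any vendor's API: the `Host` class, the function names, and the host, datastore and VLAN labels are all made up for the example, and real migration checks involve more than these two rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical model of a physical server (not a real vendor API)."""
    name: str
    datastores: frozenset  # storage volumes this host can reach
    vlan: int              # VLAN (or subnet) the host's VM network sits in


def can_live_migrate(vm_datastore: str, src: Host, dst: Host) -> bool:
    """Apply only the two constraints described above."""
    if src.name == dst.name:
        return False  # moving to the same host is not a migration
    # Constraint 1: both hosts must see the storage the VM lives on.
    shared_storage = vm_datastore in src.datastores and vm_datastore in dst.datastores
    # Constraint 2: both hosts must sit in the same VLAN or subnet.
    same_segment = src.vlan == dst.vlan
    return shared_storage and same_segment


def migration_targets(vm_datastore: str, src: Host, hosts: list) -> list:
    """Names of the hosts the VM could move to from src."""
    return [h.name for h in hosts if can_live_migrate(vm_datastore, src, h)]


# Assumed example inventory: four hosts, two SANs, two VLANs.
hosts = [
    Host("esx-01", frozenset({"san-a"}), vlan=10),
    Host("esx-02", frozenset({"san-a"}), vlan=10),
    Host("esx-03", frozenset({"san-a"}), vlan=20),  # right storage, wrong VLAN
    Host("esx-04", frozenset({"san-b"}), vlan=10),  # right VLAN, wrong storage
]

print(migration_targets("san-a", hosts[0], hosts))  # prints ['esx-02']
```

In this toy inventory, a VM on `esx-01` has exactly one place to go out of three other hosts, which is the "relatively small number of physical machines" problem in miniature.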

While this might be a small inconvenience to an enterprise data center, think about it from the perspective of a cloud provider. Their entire reason for existence is to maximize revenue earned from every dollar spent on servers. If the network limits their server utilization by constraining VM mobility, then it’s costing them revenue. Maybe they should try asking for that money back from their current networking vendor?

Many networking vendors talk about the flattening of the data center network as a cure-all, and would advise simply building a very large VLAN. But such an approach has a host of problems that have been around for years and are why routers are deployed in the first place: the need to limit multicast traffic, Spanning Tree issues with multipathing, and of course the fact that network operations staff use VLANs for very real purposes like segmenting traffic for security and for compliance regimes such as PCI. These and a number of other limitations of current data center networks are on the radar of vendors and standards bodies; meanwhile, the research community has responded with lots of papers with funny names like VL2 (PDF) and PortLand (PDF), among others. What we need now are startups to come up with some creative, and practical, solutions to these problems. For if networking continues to play second fiddle to compute and storage, the cloud vision will never be fully realized.

And while server mobility inside a single data center is tough, inter-data center and data center-to-cloud server mobility is even tougher. Cisco and VMware have published papers (PDF) about it a few times, but what they’re proposing seems more like hero experiments than practical solutions at this point.

For long-distance applications, storage becomes the big problem, which EMC highlighted just last month with the release of VPLEX (GigaOM Pro, sub req’d). It turns out that replicating VM data across the WAN is doable, but really expensive and even more bandwidth-intensive. Also, if you’re using Fibre Channel there are distance limitations due to the FC protocol. Oh, and with the various flavors of storage over IP, you’d better not have any packet loss. On the other hand, you could choose to move just the VM state while keeping storage in the original data center, but that will impact application performance. In other words, when all is said and done, getting this to work is extremely complicated, and probably only feasible if money is no object.

The typical use cases cited for VMware’s long-distance (LD) VMotion are disaster recovery and avoidance, workload balancing, etc. The concept of moving from disaster recovery to disaster avoidance is a compelling shift and can add layers of reliability on top of existing features from VMware, including High Availability and Fault Tolerance. There seems to be a clear need for inter-data center and private-to-public cloud migration. And those use cases, which I believe are set to take off, should be the real call to arms to make such a capability operationally efficient and much less expensive.

Server mobility is a powerful tool in the modern data center, but it currently has numerous limitations. In these limitations I see opportunity for startups, especially when it comes to fixing the networking issues and helping to decrease the storage costs. Let me know if you do, too.

To hear more about server mobility and similar topics, attend Structure on June 23 & 24 in San Francisco.

Alex Benik is a principal at Battery Ventures.