The 7 commandments for making software-defined storage a success

The history of enterprise computing is replete with hype cycles – from video servers and object databases, to AI and machine learning, fads have charmed the media and industry analysts alike for decades. One such froth is building around software-defined storage (SDS), an emerging entrant into the larger software-defined datacenter discussion. To compete against startups in the data storage domain, large incumbents are announcing SDS slideware 12 to 18 months ahead of actual solutions in order to stave off customer defections and protect their once reliable revenue streams. In turn, startups are challenging the status quo, and are hitting public companies where it hurts the most: Wall Street predictability.

In a nutshell, storage companies are attempting to exploit the fundamental principle – and associated hype – of the adjacent software-defined networking (SDN) market. That is the idea of decoupling the control plane (e.g., QoS, diagnostics, discovery) from the data plane. Unfortunately, no one has yet explained exactly what data storage problem this decoupling will solve.

All of this does not mean, however, that software-defined storage should be outright dismissed as mere marketing jargon. Unlike in the networking industry, where systems from different vendors have demonstrated interoperability to create nearly enterprise-wide fabrics, storage arrays reside in dispersed silos supporting disparate workloads. There is a genuine need to bring data management functionality into a coherent and holistic control framework. But without truly understanding what a control plane commonly means across storage array vendors, it is almost inane to apply SDN concepts to SDS vernacular.

Right now we are only scratching the surface of what SDS is capable of delivering. While time and customer experience will undoubtedly improve the industry’s grasp, there are seven crucial emerging fundamentals that will solidify the definition of software-defined storage:

Software-defined controller

A storage controller must be provisionable via software orchestration. A new instance of a storage controller can be instantiated on a hypervisor just like a virtual machine: on demand, using APIs, or within a few mouse clicks. This is only possible if the controller runs on top of the hypervisor – which is the de facto OS of the next-generation datacenter.

Zero hardware crutch

A software-defined controller must be free of any proprietary hardware. That means no dependence on special-purpose technology, such as FPGAs, ASICs, non-volatile RAM, battery backup, UPSes, modems, and so on. Instead, dynamic, HTTP-based tunnels should be employed rather than modems, and inexpensive flash should be used instead of an ASIC or NVRAM.

x86-based convergence

Storage as a datacenter service must run on the same hardware as other datacenter services. Only then can it share CPU, memory, and network resources with adjacent services, including next-gen firewalls, WAN optimization controllers, application delivery controllers, and other business applications.

Virtual hardware

With the aforementioned x86-based hardware sharing, storage controllers can be provisioned along with virtual hardware resources (e.g., vCPU, vRAM, virtual ports, vSwitch, QoS). A storage vendor need not go back to “taping out” a new array with larger memory, faster CPUs, or faster networks. If performance problems do arise, one can simply use the hypervisor “knobs” to reserve more resources for faster storage. Additionally, software-based flexibility means that if offline compression needs to kick in at night when load on other services is low, compute resources can be dynamically assigned to the storage tier.
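To make the nightly compression scenario concrete, here is a minimal sketch of how a scheduler might decide how many vCPUs to hand to the storage tier. Everything in it is an assumption made for illustration: the function name `vcpus_for_storage`, the `baseline` and `headroom` parameters, and the idea of expressing other services' load as a fraction of host CPU. It is not any hypervisor's actual API.

```python
def vcpus_for_storage(total_vcpus, other_services_load, baseline=2, headroom=0.25):
    """Suggest a vCPU count for the storage controller (illustrative only).

    total_vcpus          -- vCPUs available on the host
    other_services_load  -- fraction (0.0 to 1.0) of CPU used by other services
    baseline             -- hypothetical minimum the storage controller keeps
    headroom             -- hypothetical fraction held back for load spikes
    """
    if not 0.0 <= other_services_load <= 1.0:
        raise ValueError("other_services_load must be between 0.0 and 1.0")

    # Whatever the other services and the safety margin do not need is spare.
    spare = total_vcpus * (1.0 - other_services_load - headroom)

    # Never drop below the baseline, never ask for more than the host has.
    return min(total_vcpus, max(baseline, int(spare)))


# Daytime: other services are busy, so storage stays at its baseline.
daytime = vcpus_for_storage(total_vcpus=32, other_services_load=0.9)

# Night: load is low, so offline compression can borrow idle compute.
night = vcpus_for_storage(total_vcpus=32, other_services_load=0.125)
```

In a real deployment the returned number would be applied through whatever reservation controls the hypervisor exposes; the point here is only that the decision is a software policy, not a property of the array.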

Factory-defined nothing

No data management feature should be factory-stitched. It should be SDS heresy to ship dual-controller arrays such that every workload gets high availability (HA). Non-persistent virtual desktops or test and development VMs, for example, do not require HA. Similarly, it’s heresy to ship an array with RAID-6, such that every workload gets erasure coding. While read-intensive workloads embrace RAID-6, more write-intensive workloads abhor it.

Mechanism, policy, and late binding

A corollary of factory-defined nothingness is VM-aware everything. Every data management service is defined at the VM level. The factory ships mechanisms, while deployment is concerned with policies. Policies should never be hardcoded in the factory. That captures the beauty of undifferentiated hardware married to software-based differentiation.

It is also the true essence of cloud computing, where policies are late-bound to software (virtual) constructs like VMs, not early-bound to hardware in the factory (or to coarse-grained storage entities such as LUNs or volumes). It is preposterous to talk of software-defined anything without decoupling mechanism from policy, and without applying policies to virtual (software-based) constructs. Of course, invoking mechanisms and configuring policies requires next-generation RESTful APIs.
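The split between mechanism and policy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any shipping product: `StoragePolicy`, `PolicyRegistry`, `plan_write`, the VM names, and the step labels are all hypothetical. The mechanism (`plan_write`) knows how to compress, write, and replicate; which of those actually happen is looked up per VM at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical per-VM policy; field names are illustrative."""
    high_availability: bool = False
    redundancy: str = "mirror"   # e.g. "mirror" or "parity"
    compression: bool = False


class PolicyRegistry:
    """Late binding: policies are attached to VM names at deployment time."""

    def __init__(self, default):
        self._default = default
        self._by_vm = {}

    def bind(self, vm_name, policy):
        self._by_vm[vm_name] = policy

    def lookup(self, vm_name):
        # VMs without an explicit policy fall back to the default.
        return self._by_vm.get(vm_name, self._default)


def plan_write(registry, vm_name):
    """Mechanism: list the generic steps to run, driven only by policy."""
    policy = registry.lookup(vm_name)
    steps = []
    if policy.compression:
        steps.append("compress")
    steps.append("write:" + policy.redundancy)
    if policy.high_availability:
        steps.append("replicate-to-peer")
    return steps


registry = PolicyRegistry(default=StoragePolicy())
# A production database gets HA; a throwaway desktop gets parity and no HA.
registry.bind("sql-prod", StoragePolicy(high_availability=True))
registry.bind("vdi-temp", StoragePolicy(redundancy="parity", compression=True))
```

Nothing in `plan_write` changes when a policy changes, which is the whole point: the same undifferentiated mechanism serves the database and the disposable desktop, and the difference between them lives entirely in data that can be rebound at any time.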

Active systems (liveness)

Storage up until now has been a passive bit keeper of data, a glorified byte shuttler between the network and the disk. In the past decade, vendors that did anything intelligent in the background (e.g., auto-tiering) were handsomely rewarded by customers. In contrast, hardware-based storage is passive with little value add.

Software-defined storage is live and active. It is constantly reflecting on I/O and data access patterns to create system tasks that move hot data closer to compute, and cold data away from compute. This is the intelligence that keeps sequential workloads away from flash, and random workloads journaled on flash. SDS wakes up a sleepy storage tier, brings software to a world of hardware, and brings life to data.
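As a rough illustration of what "reflecting on access patterns" could mean in practice, the sketch below classifies a stream of request offsets as sequential or random and picks out the most frequently accessed blocks. The function names, the 4 KB block size, and the 80 percent threshold are all assumptions chosen for the example; real systems use far richer heuristics.

```python
from collections import Counter


def is_sequential(offsets, block_size, threshold=0.8):
    """Return True if most consecutive requests touch adjacent blocks."""
    if len(offsets) < 2:
        return False
    adjacent = sum(
        1 for prev, curr in zip(offsets, offsets[1:])
        if curr - prev == block_size
    )
    return adjacent / (len(offsets) - 1) >= threshold


def choose_tier(offsets, block_size=4096):
    """Send sequential streams to disk and random I/O to flash."""
    return "disk" if is_sequential(offsets, block_size) else "flash"


def hot_blocks(access_log, top_n):
    """Return the top_n most frequently accessed block IDs."""
    counts = Counter(access_log)
    return [block for block, _ in counts.most_common(top_n)]


# A backup-style scan reads adjacent blocks in order.
scan = [0, 4096, 8192, 12288]
# A database-style workload jumps around the volume.
lookups = [40960, 0, 81920, 4096]
```

With these inputs, `choose_tier(scan)` selects disk and `choose_tier(lookups)` selects flash, while `hot_blocks` yields the candidates a background task would move closer to compute.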

SDS is a promising shift in datacenter architectures that results from separating mechanism from policy, with policies that are late-bound in software. These are the same concepts that encapsulate the true meaning of software-defined anything, including the datacenter itself.

Dheeraj Pandey is founder and CEO of virtualization infrastructure company Nutanix. Follow him on Twitter @trailsfootmarks.
