Last week I was at NetApp Insight and it was confirmed (not that it was necessary) that the S3 protocol is the key to connecting and integrating different storage tiers. You know, I’ve been talking about the need for a two-tier architecture in modern storage infrastructures for a long time now (here’s a recent paper on the topic), and I also have strong opinions about object storage and its advantages.
The missing link
The main reasons for having two storage tiers are cost and efficiency. $/GB and $/IOPS (or, better, $/latency today) are the most important metrics, while efficiency in terms of local and distributed performance (explained here in detail) is another fundamental factor. All the rest is taken for granted.
The challenges come when you have to move data around. In fact, data mobility is a key factor in achieving high levels of efficiency and storage utilization, contributing, again, to lower $/GB. But primary and secondary storage use different protocols for data access, and this makes it quite difficult to have a common and consistent mechanism for moving data seamlessly on the front end.
Some solutions, like Cohesity for example, are quite good at managing data consolidation and reuse by leveraging data protection mechanisms… but that means adding hardware and software to your infrastructure, which is not always possible, either because of cost or complexity.
S3 to the rescue
It seems several vendors are finally discovering the power of object storage and the simplicity of RESTful-based APIs. In fact, the list of primary storage systems adopting S3 to move (cold) data to on-premises object stores or to the public cloud is quickly growing.
Tintri and Tegile have recently joined SolidFire and DDN in coming up with some interesting solutions in this space, and NetApp previewed its Fabric Pools at Insight. I’m sure I’ve left someone out, but this should give you an idea of what is happening.
The protocol of choice is always the same (S3) for multiple and obvious reasons, while the object store in the back-end can be on-premises or on the public cloud.
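To make the point concrete: because S3 is plain HTTP, the very same client code can target an on-premises object store or the public cloud simply by changing the endpoint. A minimal sketch (the endpoint names and bucket here are hypothetical, not tied to any specific product):

```python
# Because S3 is plain HTTP, the back-end is just an endpoint URL:
# the client logic stays identical whether the object store sits in
# your data center or in a public-cloud region.
def object_url(endpoint, bucket, key):
    # Path-style addressing, which most on-premises S3-compatible
    # back-ends accept.
    return f"{endpoint}/{bucket}/{key}"

# Same bucket and key, two different back-ends (hypothetical names).
print(object_url("https://s3.amazonaws.com", "cold-tier", "vol1-snap-001"))
print(object_url("https://objectstore.local:9000", "cold-tier", "vol1-snap-001"))
```

In practice an S3 client library exposes exactly this knob as a configurable endpoint, which is why so many primary storage vendors can support both targets with one implementation.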
Thanks to this approach the infrastructure remains simple, with your primary storage serving front-end applications while internal schedulers and specific APIs are devoted to supporting automated “to-the-cloud” tiering mechanisms. It’s as easy as it sounds!
Depending on the specific implementation, S3 makes it possible to off-load old snapshots and clones from primary storage, copy data volumes to the cloud for backup or DR, automate tiering, and so on. We are just at the beginning, and the number of possible applications is very high.
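To give an idea of what an internal tiering scheduler boils down to, here is a hypothetical sketch in Python (not any vendor’s actual implementation; the snapshot records, threshold, and bucket name are made up): the cold-data decision can be as simple as an age threshold, followed by an S3 PUT for each candidate.

```python
from datetime import datetime, timedelta

# Hypothetical snapshot catalog: name and creation time.
SNAPSHOTS = [
    {"name": "vol1-snap-001", "created": datetime(2016, 1, 10)},
    {"name": "vol1-snap-042", "created": datetime(2016, 9, 20)},
]

def select_cold_snapshots(snapshots, now, threshold_days=90):
    """Return the snapshots older than the threshold, i.e. the
    candidates for off-loading to the S3 back-end tier."""
    cutoff = now - timedelta(days=threshold_days)
    return [s for s in snapshots if s["created"] < cutoff]

cold = select_cold_snapshots(SNAPSHOTS, now=datetime(2016, 10, 1))
for snap in cold:
    # In a real system this would be an S3 PUT, e.g. with boto3:
    #   s3.upload_file(local_path, "cold-tier-bucket", snap["name"])
    print(f"off-loading {snap['name']} to object tier")
# prints: off-loading vol1-snap-001 to object tier
```

The interesting part is not the policy itself (real implementations can key off access frequency, snapshot retention rules, and more) but the fact that the destination is always the same simple PUT/GET interface.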
Closing the circle
It is pretty clear that we are going to see more and more object-based back-ends for all sorts of storage infrastructures, with the object store serving all secondary needs, no matter where the data comes from. And in this case we are not talking about hyperscale customers!
For SMEs it will primarily be cloud storage, even though many object storage vendors are refocusing their efforts to offer options to this kind of customer: the list is long, but I can mention Scality with its S3 Server, OpenIO with its incredible ease of use, NooBaa with its clever “shadow storage” approach, Ceph, and many others. All of them have the ability to start small, with decent performance, and grow quickly when needed. Freemium license tiers (or open source versions of the products) are available, easing installation on cheap (maybe old and used) x86 servers and minimizing adoption risks.
In large enterprises object storage is now part of most private cloud initiatives, but it is also seen as a convenient stand-alone storage system for many different applications (backup, sync & share, big data archives, remote NAS consolidation, etc.).
Originally posted on Juku.it