Who Will Cache in on Cloud Storage?

As data moves into the cloud, storage companies are taking advantage of virtualization and adding more memory to the data center. Techniques such as storage virtualization can improve the usage of existing storage hardware and make provisioning easier, while adding memory to the data center can make accessing information faster.

Many companies are evaluating their use of memory in the data center as they try to strike a balance between easily accessible cache memory powered by flash and slower-to-access disk memory powered by hard drives. At the same time, they’re trying to make their storage easier to provision and more reliable by looking at some form of virtualization. Both trends will change the dynamic for large storage vendors in the years to come.

As you move along the storage technology continuum, you’re trading price for speed. Getting information stored on tape, which is cheap, can take hours or days, while accessing something on flash, which costs a pretty penny, takes microseconds. Plus, solid-state drives using flash can’t possibly store all of the data people are creating. There’s also the question of how reliable flash is.

Given this, most companies requiring huge storage arrays rely on expensive machines from the likes of EMC or HP. Or they make their own “storage cloud” using commodity disk drives and a proprietary layer of software. Because that software layer lets companies allocate and provision the storage, it effectively virtualizes the storage array. It’s essentially the same model that underpins the storage services offered by Amazon S3 and Nirvanix.

Meanwhile, tier-one storage equipment vendors such as EMC, IBM and HP have recognized that cloud storage is the future of computing, and are attempting to ride that wave without cannibalizing their high-margin box business. For example, EMC is offering services for SMBs through its Mozy acquisition. IBM last year purchased XIV, which makes software that can be used to virtualize storage. Large companies such as NetApp and 3Par are attempting to virtualize storage as well.

But once the cloud is in place, there’s still the issue of calling up data and delivering it relatively quickly. For certain applications, such as those requiring instantaneous access to large quantities of data like seismic graphing or historical financial analysis, cloud storage may never replace a spinning drive connected to a server via Fibre Channel.

But for many applications, including media delivery and most application delivery, tweaking storage for the cloud means adding faster cache memory or optimizing the storage infrastructure by geographic location. Nirvanix, the startup providing hosted storage in competition with Amazon’s S3, touts its multiple storage clusters as a way to deliver faster access to stored content. It’s also looking to provide nodes on the customer premises, called “NAS heads,” that will basically allow frequently called-up “hot data” to be stored there.
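
For readers who want to see the “hot data” idea in miniature, here is a rough sketch in Python of a small cache sitting in front of a slower store. It is not how Nirvanix or any other vendor builds its product; the class name HotDataCache, the fetch_from_cloud callback, the make_demo helper and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class HotDataCache:
    """Small least-recently-used cache in front of a slower backing store.

    Hypothetical illustration only: `fetch_from_cloud` stands in for any
    slow remote read, and LRU eviction is an assumed policy.
    """

    def __init__(self, fetch_from_cloud, capacity=2):
        self._fetch = fetch_from_cloud
        self._capacity = capacity
        self._items = OrderedDict()  # ordered from coldest to hottest
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Hot data: serve locally and mark it as recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Cold data: make the slow trip to the backing store.
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            # Evict the least recently used entry to stay within capacity.
            self._items.popitem(last=False)
        return value


def make_demo():
    """Build a cache over an in-memory dict that plays the remote store."""
    remote = {"a.mp4": b"A", "b.mp4": b"B", "c.mp4": b"C"}
    return HotDataCache(remote.__getitem__, capacity=2)
```

Calling get() twice for the same file hits the remote store only once; the second request is served locally. The trade-off is the one described above: the cache is fast but small, so something has to decide which data stays hot.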

Alternatively, or possibly in conjunction with such a setup, a customer interested in amping up the speed of cloud storage might buy equipment from startups providing different levels of cache to aid in hasty data retrieval. We’ve covered some before, such as Atrato, which actually offers a box of disks attached to a controller that runs software designed to access and configure the hundreds of spinning disks. The result is the reliability of spinning disks with a faster information retrieval speed. Others that rely strictly on intelligently routing needed data to cache include Gear6 and Xiotech Corp.

Storage being served via the cloud is a foregone conclusion. It only remains to be seen if a startup like Nirvanix can grow to compete with the big players in storage or hosted computing, and how the larger storage vendors will walk the line of creating cloud products without jeopardizing their hardware business.

A far more interesting trend to watch will be how the growing amount of stored data is kept and delivered in the least amount of time. For proof that fast access to stored data matters, check out Facebook’s hardware. A little more than 8% of its servers are devoted to the distributed caching system, memcached. The entire purpose of those servers is to speed delivery of information for the social network. In this age of instant gratification, we may find that cache is king.