Last year, while reading Tom Trainer’s GigaOM Pro report “The Future of Data Center Storage,” it became clear to me that data deduplication (dedupe) technology was emerging as a green storage must-have. Today, efforts to cement its place in enterprise data centers are in full swing.
As its name implies, data deduplication reduces storage overhead — typically via software or hardware appliances — by eliminating identical copies of the same data while maintaining “pointers” to that data so that applications can still access it. As a result, fewer storage systems are required, which reduces energy and cooling costs. The technology helped spark a bidding war for Data Domain last year that was eventually won by data storage giant EMC, which beat out NetApp in one of the biggest IT deals of the year, valued at $2.1 billion.
Despite the high drama (at least what passes for high drama in IT), the entire episode was just a precursor of things to come. First, let’s take a look at last year’s storage market. Back then, I wrote the following over at Earth2Tech:
…companies today are missing out on the technology’s full potential because vendors don’t implement de-duplication technology across all of their storage resources. However, EMC’s recent acquisition of Data Domain will help proliferate deduplication tech because their combined technologies will cover more parts of the storage value chain, according to Trainer.
Fast-forward a year, and now IT firms are rushing to strengthen every single link in the storage value chain. These firms provide data center administrators with the tools to optimize more parts of their growing storage infrastructures while keeping their power, cooling and hardware acquisition costs in check. Among them is Permabit, which Derrick Harris spotlighted in his Infrastructure Overview for Q2 2010 (page 43).
The Cambridge, Mass.-based company’s new Albireo product tackles one of dedupe’s lingering weaknesses: primary data. As Derrick points out, “Deduplication is popular for backup storage, but is widely considered too CPU-intensive and non-scalable for primary data.” If Albireo lives up to its promise of high-speed optimization of primary data, which typically resides on pricey, high-powered storage systems, it stands a good chance of becoming a green storage staple.
Dell this week did its part to keep the dedupe buzz going by snapping up Ocarina Networks for an undisclosed sum. Key to the deal is Ocarina’s content-aware technology that can analyze files and “choose the best way to compress and dedupe the data from over 100 possible algorithms.” This sort of automated, self-optimizing product is exactly what storage vendors need to win over IT execs who find themselves presiding over expansive and complex storage infrastructures.
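Ocarina hasn’t published its internals, but the content-aware idea can be sketched in a few lines: inspect the data before deciding how to reduce it. The policy table and two-way choice below are purely illustrative stand-ins for Ocarina’s 100-plus algorithms.

```python
import zlib

# Hypothetical policy: formats that are already compressed gain nothing
# (and can even grow) from another compression pass, so store them as-is.
ALREADY_COMPRESSED = {".jpg", ".mp3", ".zip", ".gz"}

def optimize(filename, data):
    """Return (method, stored_bytes), choosing the reduction strategy
    based on what the content appears to be."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    if ext in ALREADY_COMPRESSED:
        return "store", data
    compressed = zlib.compress(data, 9)
    # Keep whichever representation is actually smaller
    if len(compressed) < len(data):
        return "zlib", compressed
    return "store", data
```

Run it on a text file and it compresses; run it on a JPEG and it passes the bytes through untouched. The “automated, self-optimizing” appeal is that administrators never have to make that per-file call themselves.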
Finally, Nimbus Data, a maker of high-performance storage arrays built from solid-state drives, is touting integrated dedupe in its new S1000 model right alongside its new longer-lived, low-power enterprise SSDs. It’s a sign that data dedupe is moving from niche storage technology into spec-sheet bullet-point territory.
Where does this leave the storage industry? Established players like EMC already got the message: Dedupe or risk losing those big enterprise storage contracts to rivals. For storage startups and smaller outfits, recent history indicates that having data deduplication in your IP portfolio makes you an attractive acquisition target. In eWeek’s recent gallery — or shopping list, for our purposes — of “under the radar” storage firms, Exar and the aforementioned Nimbus Data stand out as dedupe specialists now that Ocarina is off the table. (Nimble Storage, founded by former Data Domain engineers, sells arrays that have dedupe-like attributes, but the company stresses that it’s a form of compression and not dedupe.)
Bottom line: It’s a good time to be in the dedupe biz.