Come on baby compress my data! [with apologies to The Doors].

UPDATE 29 July 2010 – IBM enter into a definitive agreement to acquire Storwize.

If there is one thing in the world that absolutely makes my teeth itch and I would pay just about anything not to have to do, it is packing and unpacking for extended trips.  It would seem that I am not the only one as, prior to the recession, there were companies popping up which would come to your house, pack for you, and then ship your bags to your holiday destination …where one of their representatives would unpack for you!  Probably not a sustainable business model as they’ve since disappeared, but it was an intriguing idea …and, whilst pricey, still cheaper than the live-in butler I’ve always secretly wished for.

Extravagant you say?  Perhaps, but it could help avoid the inevitable rows in Casa PL as, whenever Mrs. PL and I go on hols with PL Junior, we end up having a very full and frank discussion regarding how much we need to take.

I would be more than happy to go on holiday with nothing more than a carry on.  Now that I have my geek lair at home set up such that I can access my personal data from anywhere with a WiFi connection, all I really need for a fortnight’s holiday is my wash bag, MacBook Air, iPod touch, Sony eBook Reader …and possibly a couple of pairs of knickers, T-shirts, and shorts.  I’d of course wash them well prior to them standing up and walking on their own.  Mrs. PL rolls her eyes, notes my objections to taking anything more than this …and proceeds to tell me not to be ‘ridiculous’ and get on with packing what seems to be every stitch of clothing I’ve ever owned.  And don’t get me started on what Mrs. PL ends up packing for herself and PL Junior.  Do you really need to pack clothing which you ‘might feel like wearing’?  Nor do I think it the remotest possibility that Her Highness will have selected the same resort in Malta and will invite us round for high tea, thus necessitating us to pack our finest …’just in case’.  But, as with all disagreements in Casa PL, Mrs. PL humours me just long enough for me to realise that she is right, state ‘yes dear’ …and just get on with it.

In fairness, there has been a bit of a truce called on this front and a reasonable compromise struck.  We now use vacuum bags to compress our packing and thus fit 25%-40% more than we could have otherwise.  Et voilà, Mrs. PL gets to take virtually our whole wardrobe …just in case …although the toothpaste made rather a mess when it got compressed this year.

What has this got to do with Data Storage and Protection?

Data deduplication has been a very prevalent buzzword in the storage industry for the past few years, with the major vendors scrambling to introduce deduplication into their solutions through either invention or acquisition.  The IBM acquisition of Diligent in April 2008 for $200 million and the very public tussle in July 2009 between EMC and NetApp over the acquisition of Data Domain …with EMC eventually winning but at a costly $2.4 billion …are among the more interesting.

Why the rush and what would cause a $2.4 billion struggle?  Well, just as I’m not over the moon about taking everything we own on holiday and would prefer to leave the unneeded bits and bobs at home, our customers have a similar challenge as data storage requirements have continued to grow and, by extension, so too has the need to back up that data.  Problem is, not only are we storing lots of duplicate and dormant data …when we try to back it up we can see both the time to back up and, perhaps more importantly, the cost to back up …rise exponentially.  Data deduplication allows us to quickly investigate the data to be backed up at the block level …the zeroes and ones of data, essentially, as opposed to the file level, i.e. a ‘PPT’ or ‘Word’ document …and when we see a series of zeroes and ones that we have already stored, we can ‘drop’ it but leave a reference to where a future user can find the single unique copy.  With industry standard deduplication ratios of 40% …and many customers achieving much higher ratios of 60% or even 80% …data dedupe can have a hugely positive impact on a customer’s backup infrastructure by significantly reducing the amount of data storage and time required to back up data.  As a technology, data dedupe has one of the quickest ROIs and demonstrable cost benefits …great for us as we use our equation of ROI + CBA + DPB = CSS to show customers how we can save them dosh not just now, but for years to come.
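For the curious, here is a minimal sketch in Python of what block-level dedupe looks like.  It is an illustration only and not how any particular vendor does it: the fixed 4 KB block size, the use of SHA-256 fingerprints, and the names `dedupe` and `space_saved` are all my own assumptions, and the saving it reports ignores the small overhead of the references themselves.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep each unique block once.

    Returns (store, refs): store maps a block fingerprint to the block
    bytes, refs is the ordered list of fingerprints describing the data.
    """
    store = {}
    refs = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = block   # first time we have seen this block
        refs.append(fingerprint)         # duplicates become a reference only
    return store, refs


def space_saved(data: bytes, store: dict) -> float:
    """Fraction of the original size no longer stored (reference overhead ignored)."""
    stored_bytes = sum(len(block) for block in store.values())
    return 1 - stored_bytes / len(data)


# Hypothetical backup stream: ten blocks, only two of them unique.
backup = b"A" * BLOCK_SIZE * 6 + b"B" * BLOCK_SIZE * 4
store, refs = dedupe(backup)
print(len(refs), "blocks referenced,", len(store), "blocks stored")
print(f"space saved: {space_saved(backup, store):.0%}")
```

The toy data is deliberately repetitive, so the saving here is far rosier than anything you should promise a customer.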

But.  There’s always a but, isn’t there?  Some have openly questioned what the performance impacts would be if we then had to restore the data we have deduped.  Restoring deduped data is sometimes known as ‘rehydration’, and I do think that it is indeed possible …nay, probable …that it will take a bit longer to restore deduped data as opposed to bog standard backups.  To my mind the cost benefits far outweigh any potential performance impact on restoration, so I believe that this risk can be mitigated by ensuring that our customers reset their service level agreements internally such that any added restoration time is expected and catered for.
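To make ‘rehydration’ a little more concrete, the sketch below rebuilds the original data from a deduped store by following each reference in order.  Again this is only an illustration under my own assumptions (fixed 4 KB blocks, SHA-256 fingerprints, hypothetical function names `dedupe` and `rehydrate`); the point is simply that every block read becomes a lookup, which is where any extra restore time would come from.

```python
import hashlib


def dedupe(data: bytes, block_size: int = 4096):
    """Keep each unique fixed-size block once, plus an ordered list of references."""
    store, refs = {}, []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        refs.append(fingerprint)
    return store, refs


def rehydrate(store: dict, refs: list) -> bytes:
    """Rebuild the original data by looking up every reference in order."""
    return b"".join(store[fingerprint] for fingerprint in refs)


# Hypothetical data to back up and then restore.
original = b"quarterly report " * 1000
store, refs = dedupe(original)
restored = rehydrate(store, refs)
print("restore matches original:", restored == original)
```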

But.  There’s that word again!  But if data deduplication is so great for backup, why wouldn’t we just go ahead and introduce dedupe into primary storage?  In other words, why stop there …why not have dedupe in our SANs and NAS?

Perhaps, although I’m not convinced this is the most appropriate way forward.  If we anticipate performance degradation when we rehydrate deduped data during data restores from backups, should we not also expect some performance impact if we introduce data dedupe into primary storage?  Yes, I think we should.  Indeed, data dedupe is effectively changing the data in that non-unique zeroes and ones are dropped and replaced by a much smaller ‘reference’ to the unique zeroes and ones so it would stand to reason that there would be some performance impact during future host access to data.

But let’s not throw the baby out with the bathwater …we could still get the ROI and CBA benefits of deduplication without changing the data.  Enter data compression for primary storage.

Just as Mrs. PL gets more packing space when we go on hols by using vacuum bags …and you get more space by using ZIP files and compression on your PC hard drive …so too can we conserve data space in primary data through compression.  Put simply, whilst data deduplication uses an algorithm to ‘drop’ non-unique zeroes and ones, data compression also uses an algorithm, but to compress non-unique data blocks rather than drop them.  I think it less likely for there to be a performance degradation in using compression as we’re not ‘changing’ the data, but merely compressing it.
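If you would like to see the vacuum bag in action, here is a small Python sketch using the standard zlib module.  It is not what Storwize or any other vendor actually ships; the function names `compress_block` and `read_block`, the compression level, and the sample data are assumptions of mine, chosen only to show that the data comes back out exactly as it went in.

```python
import zlib


def compress_block(block: bytes, level: int = 6) -> bytes:
    """Compress a block before it is written (lossless)."""
    return zlib.compress(block, level)


def read_block(stored: bytes) -> bytes:
    """Decompress a stored block when a host reads it back."""
    return zlib.decompress(stored)


# Hypothetical, highly repetitive primary data.
original = b"customer_id,order_total\n" * 500
stored = compress_block(original)

print("original bytes:", len(original))
print("stored bytes:  ", len(stored))
print("round trip ok: ", read_block(stored) == original)
```

How much space you get back depends entirely on how repetitive the data is, so treat any ratio from a toy example like this with suspicion.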

One of the companies I’m watching in this space is Storwize.  Storwize have data compression products which can compress data with NAS devices, and often see ratios which aren’t dissimilar to data dedupe …40% or more of duplicate data compressed, in other words.  I am expecting them to be bringing out products in the near future which will allow for compression with SAN products …imagine reducing a corporate datacentre by ⅓ or more in a non-disruptive manner and you can see why I’m so excited by the prospect of saving our customers money through data compression within primary storage and data deduplication in backup.

Have a great weekend.



