Boil the kettle, data rationalisation and reduction could take a while.

UPDATE Mon 26 July: My interview on this topic for the BBC World Service programme ‘The World Today’ aired this morning. Click here for the full thirty-minute podcast, or here for just my interview excerpt.

I thought perhaps I would begin this Weekly View with a quick experiment …now, you’ll need a kettle for this exercise …and in this context it needs to be a kettle of the electric variety …so if you don’t have one, or you’re reading this blog somewhere ’lectric kettles may be a foreign concept, here’s a picture of one which will suffice for now.

Okay, ready?  Great.  Now, I’d like you to go and boil your kettle seventeen and a half times.  It’s okay, I know it takes a bit for each boil.  I’ll wait.  See you in a few minutes …or if you’re feeling generous, mine’s a PG Tips with milk and two Splendas.

Right …all done? Great! You’ve just produced as much greenhouse gas as you would by sending an email with a 4.7 megabyte attachment.

That’s right, campers … boiling your kettle 17.4 times consumes as many resources (electricity, water, and the like) and produces as much greenhouse gas as sending an email with a 4.7MB attachment in a traditional IT environment.

Source: Life-cycle analysis carried out by Mark Mills of Digital Power Group, translated into kettle boilings with help from the Energy Saving Trust [UK].

Now, I know what you’re thinking, because I thought the same thing when I first read that statistic …what? How can this be?!

Without getting overly geeky on the topic, the short answer is that traditional IT environments tend not to be very efficient at scale. We’ve known for quite some time that traditional IT infrastructure …server plus storage plus network plus operating system plus application …tends to be siloed, with each component physically connected to the others, and with wastage and efficiency lost both in those connections and within the physical devices themselves.

And, to be fair, traditional datacentres don’t fare much better …indeed, the datacentre industry as a whole has reached parity with the airline industry in CO2 production, with 2% of all man-made CO2 coming from computers and communications technology.

Source: Green IT: A New Industry Shock Wave by Simon Mingay [Gartner]

What’s this got to do with Data Storage and Protection?

I suppose there is the obvious lesson: ‘think before you send emails with 4.7 meg attachments’. I’m somewhat bemused …well, saddened really …that, with the green revolution of the past ten years or so, I now get emails from just about everyone carrying the tagline ‘Think before you print!’ and a pretty green tree. But what about a tagline which gently asks the user …‘Do you really need to send this? If so, please consider an alternative method rather than sending the attachment.’ Or, ‘Think before you send!’ for short.

Email has been the bane of many a storage administrator’s life as it has morphed from a purely text-based system …remember SNDMSG, CICS, and OS/400? …to the rich text and inefficient file distribution model we find today. Why do people attach large files to email and send them to a large distribution list? I suppose the short answer is …it’s easy, and they would argue they’ve more important things to be getting on with. Fair enough.

But this isn’t a blog post having a whack at email and email vendors …and we should accept that the ubiquity of smartphones, high-definition cameras, et al means we’ll continue to create ever larger files. Indeed, we’re now uploading 24 hours’ worth of video to YouTube every minute, up from 12 hours a minute just two years ago. So how do we reduce the resources we consume …electricity, datacentre space, and people to run the show all cost dosh, you know! …and the CO2 we create when we need to share files with others?

I think there are five answers which are worth considering.

1. Introduce proactive limits for users.

Let’s face facts: reactive limits on users tend not to work, or are quickly circumvented to keep the business moving. Telling users ‘your mailbox cannot grow beyond 20MB or we’ll cut you off so you can’t send or receive email’ rarely works in my experience. Rather, we need to evolve this approach to be proactive.

For example, I use a great cloud-based application called Evernote. I could write a whole post on just how great this app is …it allows me to take notes wherever I am, on my MacBook Air, iPod, or BlackBerry, and keeps the notes and related notebooks in sync so that, wherever I am, all of my notes are up to date without me having to do anything. Brilliant.

But here’s where it gets even better …it’s free. Provided I don’t exceed a 20MB monthly limit, of course …and therein lies the true genius in my mind. Evernote resets my 20MB limit at the beginning of each month so, provided I don’t exceed 20MB in a month …sorted! This is the type of proactive limit I’m thinking of for users …we give you a limit and then count down monthly to zero. Stay within your limit and you’re good to go …exceed it and we charge you more on a graduated basis.
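The proactive, monthly-resetting limit described above can be sketched in a few lines. This is a minimal illustration, not how Evernote (or any mail system) actually implements it: the class name `MonthlyQuota`, the 20MB allowance default, and the overage bands and prices in `OVERAGE_BANDS` are all hypothetical values chosen to show the shape of the idea.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical numbers for illustration: a 20MB monthly allowance (as in the
# Evernote example) and invented graduated charges for usage above it.
MONTHLY_ALLOWANCE_MB = 20.0
# Overage bands as (upper bound in MB over the allowance, charge per MB);
# None means "no upper bound".
OVERAGE_BANDS = [(10.0, 0.05), (50.0, 0.10), (None, 0.20)]


@dataclass
class MonthlyQuota:
    """One user's usage against an allowance that resets every calendar month."""
    allowance_mb: float = MONTHLY_ALLOWANCE_MB
    period: tuple = (0, 0)   # (year, month) the current counter belongs to
    used_mb: float = 0.0

    def record(self, size_mb: float, on: date) -> None:
        """Add new data, first resetting the counter if a new month has begun."""
        if (on.year, on.month) != self.period:
            self.period = (on.year, on.month)
            self.used_mb = 0.0
        self.used_mb += size_mb

    def remaining_mb(self) -> float:
        return max(self.allowance_mb - self.used_mb, 0.0)

    def charge(self) -> float:
        """Graduated charge: each band's rate applies only to the MB inside it."""
        overage = max(self.used_mb - self.allowance_mb, 0.0)
        total, lower = 0.0, 0.0
        for upper, rate in OVERAGE_BANDS:
            top = overage if upper is None else min(overage, upper)
            if top > lower:
                total += (top - lower) * rate
            if upper is None or overage <= upper:
                break
            lower = upper
        return total
```

For example, recording 12MB and then 23MB in July leaves the user 15MB over the allowance, charged as 10MB at the first rate plus 5MB at the second; the first upload in August starts the count from zero again. Staying within the allowance costs nothing, which is the point of the proactive model.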

2. Rethink service levels for workloads.

So what are Evernote doing with the 20MB that I created last month? It doesn’t get deleted, as my synchronised notes remain available to me, so what gives? To be honest, I’m not quite sure …my guess would be that they move previously created data to a lower tier of storage, such as dense 2TB SATA drives, or even archive.

To be fair, I don’t much care.  I don’t notice any performance degradation and I get to carry on consuming the service for free.

Perhaps this is the key to the answer with our users: we’ll keep your data in a highly performant architecture for one month, then demote it to a lower, less performant tier and reset your limit. And we won’t ‘charge’ you unless you exceed your limit.
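The one-month demotion idea above amounts to a simple age-based tiering policy. The sketch below is illustrative only: the tier names, the `StoredItem` type, and `apply_tiering` are hypothetical, and "one month" is approximated as 30 days. Real storage arrays implement tiering in their own ways, so treat this as the shape of the rule rather than any product's behaviour.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical tier names. "One month" from the text is approximated as 30 days.
PERFORMANCE_TIER = "performance"
CAPACITY_TIER = "capacity"      # e.g. dense SATA drives or archive
DEMOTE_AFTER = timedelta(days=30)


@dataclass
class StoredItem:
    name: str
    created: date
    tier: str = PERFORMANCE_TIER


def apply_tiering(items: list[StoredItem], today: date) -> list[str]:
    """Demote items older than DEMOTE_AFTER to the capacity tier.

    Nothing is deleted: the item stays available, only its tier changes.
    Returns the names of items moved on this run.
    """
    moved = []
    for item in items:
        if item.tier == PERFORMANCE_TIER and today - item.created > DEMOTE_AFTER:
            item.tier = CAPACITY_TIER
            moved.append(item.name)
    return moved
```

Run periodically (daily, say), this keeps each item on the fast tier for roughly its first month. Resetting the user's allowance is a separate concern, handled by whatever tracks their monthly usage.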

3. Introduce datacentre technology optimisation in the form of virtualised datacentres (VDCs).

I’ve talked a lot about VDCs in previous posts, starting with this one and many more since, so there’s no need to labour the point here, other than to say that VDCs deliver optimisation by removing wastage, as well as increased business agility.

How much optimisation? Chad Sakac, Vice President of VMware Strategic Alliance and the general in charge of ‘Chad’s Army’ of VCE vBlock gurus, blogged in 2009 about the potential benefits of deploying a vBlock compared with deploying technology silos. An excerpt follows:

  • 30% increase in server utilization (through pushing vSphere 4 further, and denser memory configurations)
  • 80% faster dynamic provisioning of storage and server infrastructure (through EMC Ionix UIM, coupled with template-oriented provisioning models with Cisco, VMware, and EMC)
  • 40% cost reduction in cabling (fibre / patch cords etc.) and associated labor (through extensive use of 10GbE)
  • 50% increase in server density (through everything we do together – so much it’s too long to list)
  • 200% increase in VM density (through end-to-end design elements)
  • Day to day task automation (through vCenter, UCS Manager and EMC Ionix UIM)
  • 30% less power consumption (through everything we do together)
  • Minimum of 72 VMs per kW (note that this is a very high VM/power density)

Now, I say potential benefits because, at present, these numbers have been derived from product datasheets and lab work by EMC …however, we at Computacenter are looking to put more substantive quantitative analysis behind these benefits (and those of other VDC variants such as NTAP SMT, HDS UCP, and ‘open VDC’) as we deploy VDCs with our customers locally in the UK. Watch this space.
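To get a feel for what the quoted "minimum of 72 VMs per kW" means, the small calculation below converts it into watts per VM and into a power budget for an estate of a given size. It takes the figure at face value and, as an assumption, ignores cooling and other datacentre overhead; the 500-VM estate is a hypothetical example, not a measured deployment.

```python
# Taking the quoted "minimum of 72 VMs per kW" at face value, with no
# allowance for cooling or other datacentre overhead (an assumption).
VMS_PER_KW = 72


def watts_per_vm(vms_per_kw: float = VMS_PER_KW) -> float:
    """Average power budget per VM implied by the density figure."""
    return 1000.0 / vms_per_kw


def kw_for_estate(vm_count: int, vms_per_kw: float = VMS_PER_KW) -> float:
    """Power needed for a (hypothetical) estate of vm_count VMs."""
    return vm_count / vms_per_kw


if __name__ == "__main__":
    print(f"{watts_per_vm():.1f} W per VM")             # 13.9 W
    print(f"{kw_for_estate(500):.1f} kW for 500 VMs")   # 6.9 kW
```

At roughly 14 W per VM, the figure is a useful yardstick when the substantive field measurements mentioned above become available.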

4. Use alternative tools for large-file distribution and file sharing.

I really try not to use email as a file distribution system these days, preferring instead to use cloud applications such as Dropbox to share large files with others, such as internal colleagues, customers, and our vendor partners. Now, this isn’t perfect, as a) in the absence of encryption I wouldn’t wish to use it for sensitive corporate data, and b) it has a ‘hard stop’ limit: I can only store 2.5GB for free, with no monthly reset like Evernote’s.

But using tools such as Dropbox, uploading personal photos to Facebook instead of emailing them, and, if I must send an attachment, trying to shrink it by converting to PDF or similar …every little helps!

That said, I accept that I’m a geek by trade, and we need to find ‘easy’ ways for everyday users to replace email as a distribution system without increasing complexity.

After I’ve done that I’m planning to sort world peace, famine, and poverty.
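One way to make ‘Think before you send!’ easy for everyday users is a pre-send check that suggests sharing a link instead of attaching a file. The sketch below is hypothetical: `think_before_you_send`, `SendDecision`, and both thresholds are invented for illustration and are not features of any particular mail client. It looks at both the size of a single attachment and the total "fan-out" (size multiplied by the number of recipients), since every recipient ends up with their own copy.

```python
from dataclasses import dataclass

# Hypothetical policy values for illustration; not taken from any mail product.
MAX_ATTACHMENT_MB = 2.0   # single attachments above this are better shared as a link
MAX_FANOUT_MB = 10.0      # total copies (size x recipients) above this also get a link


@dataclass
class SendDecision:
    use_link: bool
    reason: str


def think_before_you_send(attachment_mb: float, recipients: int) -> SendDecision:
    """Suggest sharing a link instead of attaching when copies would pile up."""
    fanout_mb = attachment_mb * recipients
    if attachment_mb > MAX_ATTACHMENT_MB:
        return SendDecision(True, f"{attachment_mb:.1f}MB file: upload once and send a link")
    if fanout_mb > MAX_FANOUT_MB:
        return SendDecision(
            True, f"{fanout_mb:.1f}MB of copies across {recipients} recipients: send a link"
        )
    return SendDecision(False, "small enough to attach")
```

The design choice here is to nudge rather than block: the user still decides, which keeps the check in the spirit of a proactive limit rather than a reactive one.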

5. Rethink how we create data.

Only about 20% of the data we create today is ‘text’, with rich media [high def videos, pictures, etc.] representing well over 50% of the new data being created.

Equally, the text data we are creating is rarely just text …by the time we save it in MS Word or similar, we have increased the file size with formatting and related metadata. And many users choose PPT to communicate ideas such as business plans …a large file type if ever there was one …and that’s without even adding any pictures, charts, or videos to the PPT.

Again, I’m not having a go at the use of PPT or MS Word …but I do believe we will have to start thinking about how we create data so that its ‘footprint’ is smaller, in addition to the optimisation and alternative distribution models we’ve discussed above.

Which has me thinking …it’s time for a nice cuppa before Mrs. PL needs my help setting the table for dinner with her and PL Junior …the highlight of my week!

Have a great weekend and remember your kettle the next time you send an attachment.



5 Responses to “Boil the kettle, data rationalisation and reduction could take a while.”

  1. Martin G Says:

    You should look at some of the things which happen in the world of video codecs, where at times we save the same file twice: the actual video payload is identical, but the vendors have managed to create completely incompatible metadata headers. Completely bonkers!

  2. Andy Johnson Says:

    Hi Matthew,

    Interesting post, but I’m rather dubious about the 17.5 kettles figure from Wired.

    I wonder what “boils of a kettle” actually means in the Wired article? A full kettle of water (and how big)? A kettle containing a cup of water? Without knowing that, it’s impossible to calculate the power consumption.

    And without knowing how the electricity was generated, it’s impossible to convert power consumption to CO2 emissions. Different countries have different mixes of generating methods, each of which produces different amounts of CO2.


    • Matthew Yeager Says:

      Hi Andy

      Thanks for your comment and taking the time to post your feedback.

      I believe that the first time the ‘kettle boil’ statistic entered the public domain was in Wired UK in 2009 [Credit: Wired UK, July 2009, page 41], where Wired cited as its source life-cycle analysis carried out by Mark Mills of Digital Power Group, translated into kettle boilings with help from the Energy Saving Trust [UK].

      I must admit that I have been unable to locate an online link to a PDF copy of the Mills report, but it does appear to have an ASIN number so presumably should be available from a library.

      It would seem, however, that the majority of the Mills paper is covered in an article in Forbes as described here.

      The Forbes article is here.

      Not having written the Wired UK article which referenced the kettle boil statistic, I noted the first sentence of the Forbes piece with some interest; “The current fuel-economy rating: about 1 pound of coal to create, package, store and move 2 megabytes of data.”

      Now, accepting that the Forbes article was written in 1999, I can only surmise that Wired UK had the Energy Saving Trust [UK] translate pounds of coal [above] into kettle boils. Again, I would stress that this is conjecture and educated assumption on my part.

      In any case, data storage can be a very complicated and, some would say, strange area of technology and I was attempting to use a stat which could succinctly bring data storage to ‘life’ for the lay reader and help them more readily understand the very real issues we are facing today as well as tomorrow with regards to data creation and the storage of that data. Indeed, these are issues that I face with my customers today and we have more than one customer in the UK seeking to reduce their datacentre power and cooling costs by implementing more efficient data storage.

      Put simply, my job and intention is only ever to help and not hinder, hence my attempt to translate the complicated technical world of data creation and storage into a stat which many, if not all, readers could relate to.

      Hope that helps and again, thanks for the feedback.


  3. bernard bof Says:

    The 17.4 boils figure seems way over the top:

    17.5 x 3min boils at 2kW gives 1.75 kWh, at £0.10 /kWh that’s £0.175 energy cost to send 4.7MB of data. My monthly ISP data transfer is 30GB which on that basis should cost £1117.02 in energy usage alone. I pay my ISP £25/month, who pays for the other £1092.02 of energy?
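Bernard’s back-of-the-envelope sum can be reproduced with his own assumptions: a 2 kW kettle, three minutes per boil, 17.5 boils per 4.7MB email, £0.10 per kWh, 30GB of transfer a month, and a £25 monthly ISP fee. One assumption is made explicit here: 30GB is counted as 30,000MB, which is what yields his £1,117.02 (counting 1GB as 1,024MB gives roughly £1,143.83 instead).

```python
# Reproduces Bernard's sum using his stated assumptions.
KETTLE_KW = 2.0                   # kettle rating
BOIL_MINUTES = 3.0                # time per boil
BOILS_PER_EMAIL = 17.5            # kettle boils per 4.7MB email
PRICE_PER_KWH = 0.10              # GBP per kWh
ATTACHMENT_MB = 4.7
MONTHLY_TRANSFER_MB = 30 * 1000   # 30GB, counting 1GB as 1000MB (assumption)
ISP_FEE_GBP = 25.0

kwh_per_email = BOILS_PER_EMAIL * (BOIL_MINUTES / 60) * KETTLE_KW   # 1.75 kWh
cost_per_email = kwh_per_email * PRICE_PER_KWH                      # £0.175
emails_per_month = MONTHLY_TRANSFER_MB / ATTACHMENT_MB              # about 6,383
implied_monthly_cost = emails_per_month * cost_per_email            # about £1,117.02
shortfall = implied_monthly_cost - ISP_FEE_GBP                      # about £1,092.02

print(f"{kwh_per_email:.2f} kWh and £{cost_per_email:.3f} per email")
print(f"implied energy cost for 30GB: £{implied_monthly_cost:.2f}")
print(f"not covered by a £{ISP_FEE_GBP:.0f} ISP fee: £{shortfall:.2f}")
```

Note that the calculation assumes energy scales linearly with bytes transferred, regardless of whether the infrastructure is shared or stand-alone, which is exactly the assumption the reply below takes issue with.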

    • Matthew Yeager Says:

      Hi Bernard and thanks for commenting.

      You raise a very interesting and valid point, that of ‘shared’ infrastructures versus stand-alone ‘monolithic’ infrastructures. Generally speaking, ISPs tend to run shared infrastructures, and shared infrastructures are designed to be efficient, with hundreds, thousands, or tens of thousands of users all sharing the same pooled resources of storage, compute, networking, et al. In this instance [e.g. Google Mail], the 17.4 kettle boils metric might indeed be a bit high …although I would still submit that uploading a single copy of a 4.7MB high def video or picture and then emailing the link would remain far more efficient than emailing the 4.7MB attachment to 25 of your mates, in which case you may still approach the 17.4 metric even on a shared infrastructure.

      That said, most [if not all] corporate IT infrastructures tend to be closed ‘monolithic’ infrastructures where we see a fair amount of wastage, given stand-alone servers, stand-alone storage arrays, stand-alone switches/routers, often incorrectly placed hot/cold aisles in the datacentre, and so on. Add this wastage and inefficiency together and the 17.4 metric is probably not far off, although your mileage may vary depending upon your corporate setup. A good analogy here would be the Victorian water delivery system we have here in London: you get your water just fine, with most people not realising that, beneath the surface, we lose something like 50% or more of the water through leaks and the like, with wastage varying by borough and postcode.

      In any case, thanks again for reading and for your comment.
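The “25 of your mates” comparison in the reply above can be made concrete with a little arithmetic. This is a simplified sketch under stated assumptions: each recipient’s mailbox holds its own copy of an emailed attachment, a link-only message is assumed to be about 200 bytes, and the sender’s sent-items copy, MIME/base64 encoding overhead (roughly a third extra on the wire), and any single-instance storage on the mail server are all ignored.

```python
# Hypothetical illustration of the "25 mates" example: how much attachment
# data ends up stored when a file is emailed to every recipient, versus being
# uploaded once and shared as a link.

ATTACHMENT_MB = 4.7
RECIPIENTS = 25
LINK_MESSAGE_BYTES = 200  # assumed size of a short message carrying just a URL


def emailed_copies_mb(size_mb: float, recipients: int) -> float:
    """Every recipient's mailbox holds its own copy of the attachment."""
    return size_mb * recipients


def upload_once_mb(size_mb: float, recipients: int,
                   link_bytes: int = LINK_MESSAGE_BYTES) -> float:
    """One stored copy of the file, plus one small link message per recipient."""
    return size_mb + recipients * link_bytes / 1_000_000


if __name__ == "__main__":
    emailed = emailed_copies_mb(ATTACHMENT_MB, RECIPIENTS)
    linked = upload_once_mb(ATTACHMENT_MB, RECIPIENTS)
    print(f"emailed: {emailed:.1f}MB, linked: {linked:.3f}MB, "
          f"ratio: {emailed / linked:.0f}x")
```

Under these assumptions, emailing the attachment stores about 117.5MB against roughly 4.7MB for the upload-once approach, close to a 25-fold difference, which is why the per-email cost can still add up even on an efficient shared infrastructure.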
