Whoops, my cloud's just gone titsup. Now what?

“We apologise for the disruption. We have identified the cause and are working to restore the service as quickly as possible.” Attempting to log onto your cloud service and being faced with a message like that is guaranteed to strike fear into the heart of anybody that has trusted all or just part of their company's CRM, email …

  1. Zog_but_not_the_first
    Thumb Up

    Not all bad news then

    "The outage even prevented the interactive edition of the Daily Mail - Daily Mail Plus- from appearing."

    You make it sound like a bad thing.

    1. Rich 11

      Re: Not all bad news then

      Every cloud has a silver lining.

      1. big_D Silver badge
        Coat

        Re: Not all bad news then

        Shouldn't that be "every cloud outage has a silver lining"?

  2. Dr Who

    Now a year is not exactly 365 days, but if it were then that would be 525600 minutes. At four nines that allows for an outage of 5256 minutes or 87.6 hours. SLAs calculated on an annual basis are worthless. The same service level would allow for an outage of 7.44 hours before being triggered if worked on a monthly basis, which is more reasonable.

    All of the above is of course meaningless if there's no (or trivial) compensation in the event that the service level is breached, which is the case with most SaaS offerings.

    One must not, however, confuse SaaS with cloud. It's quite possible to get a robust infrastructure in the cloud by using two or more infrastructure providers and installing your own business software. That's why SugarCRM is infinitely preferable to SalesForce. You are in control, be it in the cloud or on your own infrastructure.

    1. Mephistro

      "then that would be 525600 minutes"

      I think the author meant 99.99 percent, or 9,999 per 10,000. Agreed with the rest of your post, though.

      I'd also like to call fellow commentards' attention to the fact that the cloud provider is not the only possible source of downtime for the customer. Screw-ups by the telcos will probably cause localized outages more often than the cloud provider does.

    2. Phil O'Sophical Silver badge

      four nines that allows for an outage of 5256 minutes

      No, 53 minutes, it's 99.99%. 5-nines is generally taken as no more than 5 minutes downtime per year, or more realistically 1 hour per 10 years, since few people install services for only a year.

      A bigger problem is that such a simple calculation only works for a total outage. What if your network is struggling due to, say, a DDoS on the cloud provider, but some traffic is getting through? Or some of your apps are running but some aren't? What number of nines does that give you, and how do you write an SLA for it?
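The arithmetic disputed in this sub-thread can be checked with a minimal sketch. It assumes the 365-day year used above and a 31-day month; the name `allowed_downtime_minutes` is illustrative and not taken from any SLA tooling. It only models a total outage, which is exactly the limitation raised above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, the simplification used in the thread
MINUTES_PER_MONTH = 31 * 24 * 60   # 44,640, assuming a 31-day month


def allowed_downtime_minutes(availability_percent: float, period_minutes: int) -> float:
    """Minutes of total outage permitted before the availability target is missed."""
    return period_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    targets = [("two nines", 99.0), ("four nines", 99.99), ("five nines", 99.999)]
    for label, pct in targets:
        yearly = allowed_downtime_minutes(pct, MINUTES_PER_YEAR)
        monthly = allowed_downtime_minutes(pct, MINUTES_PER_MONTH)
        print(f"{label} ({pct}%): {yearly:.2f} min/year, {monthly:.2f} min/month")
```

On these assumptions, the 5256-minute and 7.44-hour figures correspond to 99%, while 99.99% allows roughly 53 minutes a year and 99.999% a little over 5 minutes.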

  3. Ole Juul

    Cloud

    It's all in the name.

    1. Steve Davies 3 Silver badge
      Coat

      Re: Cloud

      It rained last night. I guess that was the Daily Flail storage washing away then..

      Mine's the one with a copy of the 'I' in the pocket.

  4. thomas k.
    Thumb Down

    you know you've hit the big time when

    like sharks circling the scent of blood in the water, lawyers start circling a possible source of lucre.

    1. Destroy All Monsters Silver badge

      Re: you know you've hit the big time when

      Logical fallacy.

      If lawyers start circling, you *may* have hit it big. It may also be that your destiny was to serve as a warning to others.

  5. Mr_Pitiful
    Unhappy

    SaaS

    A bit like Microsoft Azure crashing out this morning in Northern Europe then!

    I just gave up in the end, I think it's back now though!

  6. Anonymous Coward
    Anonymous Coward

    Well if it wasn't for these scrounging immigrants and gays adopting children it would never have gone down in the first place and wouldn't be causing people to spontaneously get cancer.

    For the daily mail reader this - this comment is meant to be humours and is by no means factual

    1. Ian Watkinson
      Facepalm

      "is meant to be humours"

      or even meant to be humorous?

      Guardian reader by any chance?

      1. Anonymous Coward
        Anonymous Coward

        can you even still buy the Guardian?

      2. Anonymous Coward
        Anonymous Coward

        Damn can't spell. Thanks.

        And I am more of a Telegraph man ta. The mail gives people who sit at the right of centre a bad name however.

        1. Destroy All Monsters Silver badge

          Putin is getting it in both. American-style bi-partisanship?

          Then Cameroon comes out like an idiot from hell and is telling tall tales about WWI being about "The Freedoms" but I digress.

    2. MyffyW Silver badge

      If it wasn't for...

      You forgot to mention single mothers. And any story probably needs a Princess Di (or Kate) angle.

      I once bought the Mail on Sunday for a free CD. I felt dirty afterwards though.

  7. Pascal Monett Silver badge

    Comparing a cloud outage to a power cut ? Really ?

    That's like comparing theft to copyright infringement.

    If your problem is power, then what you need is a backup diesel generator (or however many are required to cover your needs). Insert it into the grid, fill it up, put it on standby and you're done, apart from the regular maintenance and trial runs. Frankly, apart from the cost, this is a no-brainer operation (and yet, some still manage to fudge it up anyway).

    That is peanuts in price and hassle compared to a cloud outage. Even if you do go for a backup cloud operator (and we're talking big budget operations right there), there will be a boatload of problems to deal with on the spot when (not if) it happens.

    There are internal procedures to devise, which will need to be amended after the first live-fire event (because there's always some difficulty that was not taken into account).

    There is (company) user training, because said procedures need to be understood and implemented in an urgent situation. There is proper warning and communications, because the switch cannot be made before it can be, and (company) users switching manually on their own willy-nilly is going to create its own special brand of havoc.

    There is monitoring that the switch has taken place and that operations are once again in a working state. What are the metrics ? How to measure them in a time of crisis ? How to ensure that all required functions have been taken into account ?

    Finally, there is recovering from the outage, and the decisions that need to be taken - mainly do we switch back again, or do we only switch when this cloud fails ? After the first live-fire event, maybe previous policy decisions will be reviewed in light of performance before and after the switch.

    Then there will be the accounting fallout, because all of this hoopla will be quantified and cost-assigned, and the next board meeting will be a live-fire event of its own.

    No, comparing with a power cut doesn't even begin to do this kind of thing justice. It is a very poor comparison.

    1. Swiss Anton

      Re: Comparing a cloud outage to a power cut ? Really ?

      And unlike a cloud outage, the power will eventually be restored. In the event of the power not being restored, the lack of any computer service will be the least of our worries.

  8. Hargrove

    Count on it . . .

    Despite service providers pushing the reliability of their services, outages are a very likely reality for those using cloud services.

    First, there is something called the law of large numbers. Massively parallel systems at state of the art computing centres run to hundreds of thousands to millions of microprocessor cores. Even more astronomical numbers are being discussed for data centers where the goal is capacity to do lots of jobs as opposed to raw throughput.

    The presumption of solid state reliability can be seriously questioned.

    The state of the art has changed dramatically since the term “solid state reliability” became common. Transistor feature sizes and component densities have all changed radically. New materials have introduced new failure mechanisms. These have been well understood for years:

    ITRS http://www.itrs.net/Links/2005itrs/Linked%20Files/2005Files/PIDS/4377atr.pdf

    Critical Reliability Challenges for The International Technology Roadmap for Semiconductors (ITRS)

    Since then, restrictions on hazardous substances have added a new failure mechanism. Among the unintended consequences of this initiative is the spontaneous formation of tin “whiskers”, crystals that eventually short to some other part of the circuit, causing failures.

    Bottom line: state-of-the-art microprocessors run 24 x 7 are going to have a limited life. Credible speculation is that this could be as short as a few years. And nobody appears to be seriously thinking about the cost of end-of-life replacement.

    The issue is not the probability that there will be a catastrophic meltdown of data centers. The problem is manageable with existing technology if cost to the customer is no object.

    The critical issue is that a small handful of large companies are effectively moving to limit the average customer's options to reliance on large IT services companies for all their information management needs.

    And then, there's bandwidth . . . a subject for another post.
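The large-numbers point at the start of this comment can be illustrated with a rough sketch. It assumes component failures are independent and uses a made-up per-core failure probability; neither assumption comes from the post, and the function name is hypothetical.

```python
import math


def prob_at_least_one_failure(n_components: int, p_fail: float) -> float:
    """P(at least one of n independent components fails in the period).

    p_fail is the (assumed) probability that a single component fails.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if p_fail == 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed with log1p/expm1 so tiny p and huge n stay accurate
    return -math.expm1(n_components * math.log1p(-p_fail))


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-core failure probability for the period
    for n in (1_000, 100_000, 1_000_000, 10_000_000):
        print(f"{n:>10} cores: P(at least one failure) = {prob_at_least_one_failure(n, p):.4f}")
```

Even with a tiny per-component probability, the chance of seeing at least one failure approaches certainty as the component count grows, which is why large installations plan for failure rather than try to avoid it.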

  9. Hargrove

    Money is an object

    Large data centers cost hundreds of millions to billions to construct. At the moment the Cloud has to compete with local alternatives. . . which include my ability to buy, for a few hundred dollars, a hard drive holding more terabytes of data than I can envision using.

    This is going to make redundancy as a solution to reliability issues a tough challenge. I'm not at all sanguine that at a half billion a pop, industry is going to build excess unused capacity.

    Unless, of course, they can contrive to create a virtual monopoly and dependence where they can demand what the traffic will bear.

    And then, there's the bandwidth . . .

  10. Hargrove

    About that bandwidth

    There is no such thing as a free lunch. The notion of achieving reliability in a flexible cloud is all well and good. There are two problems . . . first, the use of a flexible cloud presumes the existence of redundant unused capacity. Second, it presumes the ability to transfer petabytes of data.

    As a fellow commentard wisely noted, the telecom companies have a dog in this fight. Like the data centers, they are in business to make money. They cannot be expected to build large amounts of excess capacity. Unless, of course, they can charge for it.

    Bottom Line: There are four major stakeholders in this issue: The folks building the large data centers; the telecom companies; the government (for whom the infrastructure is strategically vital); and the customer. All but the customer have a strong vested interest in forcing the customer to use and pay for Cloud Computing services.

    Finally, about that bandwidth: Shannon's "law" is still alive and well. Many of us have had the experience of getting on the Wifi connection at a hotel, only to watch the number of bars shrink as more guests arrive and log on until eventually, only the guests closest to the Wifi transmitter have the signal-to-noise ratio to get any quality of service.

    Now imagine that on a global scale.

  11. Stretch

    wow. so you need DR. thanks for that.

    i'll let the 1960s know you have caught up with their ideas.

    1. P. Lee

      No need to be sarky. :)

      Some people think all the tech in the cloud is redundant, therefore you don't need a DR site.

      They don't always know that it's rather like using RAID5 instead of a backup.

      The problem is the cloud doesn't scale cheaply. When you push the limits of tech, things get expensive. When you add a third party, things get expensive. When you need serious uptime, things get expensive. When you put all your eggs in one basket, outages become expensive.

      A third party has no interest in the value of your application uptime. Therefore, the (cost of) tech used is only really going to be vaguely appropriate.

  12. JimC

    Compensation...

    Compensation is wonderful. Not only does it do exactly nothing to get your users working again and off your back, but also it means that your vendor is concentrating on minimising the payments rather than getting you operational again ASAP.

  13. channel extended
    Happy

    That's not a Cloud it's just Smoke.

    So if a Cloud provider goes down their systems were really just smoke and mirrors. You just got smoked. The mirrors are for the fake extra storage/business.

    Put two mirrors face to face see space expand!!!

  14. W. Anderson

    unrealistic expectations for tech reliability from commercial media firms

    Many world citizens assume that large and recognizable corporations like Adobe will surely employ the best cyber security and reliability technologies available. This is certainly not so. Adobe has no history or experience whatsoever in Internet networking, computer security, high availability or reliability, and therefore probably gives less priority to such matters, which is then automatically reflected in whatever reliability and security solutions it engages.

    Don't forget, Adobe is a retail graphics technology firm, nothing more, irrespective of their wealth. Examine their tens of dozens of Adobe Flash fixes just in the past two to three years.

  15. HKmk23

    Head in the clouds....

    Anyone who puts critical data/performance needs into someone else's basket will get what they deserve eventually - nothing!

    Bean-counters' latest version of Citrix et al.....

  16. NJobs

    Cloudy Future? I don't think so, at least not in the next 2 years...

    The only people predicting a rapid take-up of the Cloud over the next 2 years are the vendors who want to give you the impression that Cloud is taking over the world - the truth is quite the opposite. The only winners with, for instance, Microsoft's Cloud products remain the vendor and the partners / resellers receiving greater incentives. Most businesses (small, medium and large) face increased costs with Cloud over the contractual period (compared to perpetual volume), and I would be amazed if Microsoft achieved even 20% Cloud revenue by 2016 given how slow Enterprise customers have been in taking up Office 365 and Azure to date. And you cannot blame businesses for being sceptical - outages and increased costs are just a couple of issues to grapple with - would you want the NSA spying on your company's data?

  17. Rberns

    So I need an on-premise standby to cater for when the cloud, which saves me from an on-premise data centre, fails?

  18. Tim Bates

    Cloudy places have their uses.

    "Cloud" services are good when they're things you don't need live 100% of the time. Like overnight backups. As long as it works 99% of the time, it's not a big deal.

    But for anything you need instant access to at random times, the cloud is not it. Think about how many things can break between your keyboard and the cloud provider's hard disks. Add in the number of people who can cock up a config or damage equipment between you and the cloud provider, and the whole deal looks really stupid.
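The "how many things can break" point can be put in numbers with a rough sketch. It assumes every link between keyboard and disk must be up for the service to work and that links fail independently; the link names and availability figures below are invented for illustration.

```python
from math import prod


def chain_availability(link_availabilities):
    """Availability of a chain in which every link must be up at the same time.

    Each value is a fraction between 0 and 1; independence is assumed.
    """
    return prod(link_availabilities)


if __name__ == "__main__":
    # Hypothetical links between the user's keyboard and the provider's disks
    links = {
        "office LAN": 0.999,
        "office router": 0.999,
        "ISP last mile": 0.995,
        "transit network": 0.999,
        "cloud provider": 0.9999,
    }
    overall = chain_availability(links.values())
    print(f"end-to-end availability: {overall:.4%}")
```

Because the figures multiply, the end-to-end number is always lower than the weakest link, however good the provider's own headline availability is.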
