Microsoft Azure was most FAIL-FILLED cloud of 2014

Microsoft’s cloud had the worst service reliability of the three main players in 2014, according to annual metrics from uptime experts CloudHarmony. Microsoft Azure Virtual Machines suffered 103 outages for the year, resulting in downtime of 42.94 hours. The Azure Object Storage service was offline 138 times, leaving users in …

  1. AMBxx Silver badge

    99.9%?

    I'm sorry, but that's just not good enough - there should be no single point of failure. The major outages have been due to some really dumb system design.

    Moving my website to Azure was a really bad mistake.

    1. Anonymous Coward
      Anonymous Coward

      Re: 99.9%?

      Downvoted for not having the sense it would be a bad idea before you did it.

  2. Anonymous Coward
    Anonymous Coward

    "Microsoft Azure was most FAIL-FILLED cloud"

    Full of Apple stuff isn't it? ......

  3. Anonymous Coward
    Anonymous Coward

    A downtime of 42.94 hours

    That's Microsoft's downtime and doesn't include the overhead that customers would suffer getting their systems recovered and the associated business outages. On average, Microsoft's cloud infrastructure was down some time in roughly 1 day of every 3 last year. That's far worse.

    1. Anonymous Coward
      Anonymous Coward

      Re: A downtime of 42.94 hours

      "A downtime of 42.94 hours "

      Isn't that an uptime of 99.5% anyway, or am I missing something here?

  4. breakfast Silver badge
    Facepalm

    Another point of view...

    Alternatively, if you went by the "Current Status" message for the Azure service it ran perfectly the whole time.

    Really annoying during a massive outage, with services unavailable and customers on the phone, to go to the status page and see "Everything is running just fine" staring back at you like a big, massive lie.

  5. Anonymous Coward
    Facepalm

    Where's the news?

    In my 25 years' experience of using Microsoft products, I would never refer to them as being reliable.

    My response to those who think otherwise: "Hi, you're the new guy? Mine's black with no sugar, please."

  6. John Sager

    five nines

    99.999% - That's a generally accepted uptime target. Some industries need/want six nines, i.e. 31 secs per year, and when I was back working on design of UK infrastructure I reckoned we needed more like 7 nines with any failures being in the 1 in many years category. That costs, of course...
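As a rough check on the "nines" figures above, here is a minimal sketch that converts a number of nines into a yearly downtime budget. It assumes a 365-day year, and the function name `allowed_downtime_seconds` is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000, assuming a non-leap year


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability target of N nines.

    Five nines means 99.999% availability, i.e. an unavailability of 10**-5.
    """
    unavailability = 10 ** -nines
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    # Print the budget for three, five, six and seven nines.
    for n in (3, 5, 6, 7):
        print(f"{n} nines: {allowed_downtime_seconds(n):,.1f} seconds per year")
```

Under that assumption six nines works out to about 31.5 seconds per year, which agrees with the 31 secs quoted, and five nines to a little over five minutes.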

    1. dan1980

      Re: five nines

      @John Sager

      RE: 5, 6 and 7 'nines' . . .

      Sure, but to do that you would rarely rely on a single service. To achieve that kind of uptime, you must assume that everything WILL fail and put contingencies in place.

      So, you would not be storing data just with Azure or running the application only on EC2 - you would have multiple copies in multiple locations with multiple providers, managed by a redundant, geographically-distributed system.

      Putting everything on the cloud of one provider is always going to entail downtime. Choosing geographically-distributed hosting within that provider's system adds some resilience but it's still possible (as we saw a few times this year) for a problem to affect multiple - or even all - locations.

      I would argue that any hyper-resilient system that did not make use of some kind of cloud-based service as part of the design would be a rather rare thing and only really justified in areas where privacy is very important. (And that's totally valid, of course.)
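The multi-provider argument above can be sketched with the textbook formula for redundant components: the system is down only when every copy is down at once. This is a simplified illustration that assumes failures are fully independent, which (as noted above) is not always true in practice. The function name and the 99.5% input figures are illustrative, not measured values.

```python
from math import prod


def combined_availability(availabilities):
    """Availability of a system that is up while at least one replica is up.

    Assumes replica failures are statistically independent, so the chance
    that all are down together is the product of their unavailabilities.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    single = 0.995  # illustrative: one provider at 99.5%
    two = combined_availability([single, single])
    three = combined_availability([single, single, single])
    print(f"one provider:    {single:.6%}")
    print(f"two providers:   {two:.6%}")
    print(f"three providers: {three:.6%}")
```

Any shared dependency between the replicas breaks the independence assumption, so real-world figures would be lower than this sketch suggests.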

      1. Anonymous Coward
        Pint

        Re: five nines

        It always comes down to the constraints applied to a system in engineering (phrased as scarcities in economics). Relying on one provider for anything is frequently lunacy, yet I see that happen again and again due to the supposed reliability of the provider. Definitely supposed, as there isn't much of a track record developed and the technologies used change faster than the reliability data can be developed. Bridges are far easier and we still run into odd failure modes. The only real observation that I have to add to this is that far too often we are seeing hidden single points of failure [everyone uses a common leased fibre] but I've yet to see anyone pointing out how dangerous any particular dependencies are [Heartbleed, language, distro/hypervisor...]. That'll happen; the sheer complexity constrains the ability to identify all the failure modes.

        It's Friday in your land...

  7. Daz555

    Were ALL VMs down for 42 hours or just SOME down for 42 hours? I really need to know an equivalent of "weighted business impact" because pure failure stats mean nothing in terms of business impact.

    1. Anonymous Coward
      Anonymous Coward

      OK let's look at it another way: Azure's infrastructure is the smallest, yet it had the most outages.

  8. Kanhef
    FAIL

    Math fail

    There are 8,760 hours in a typical year, give or take a few. By some fairly basic calculations, the Azure uptimes should be 99.5098% and 99.8757%. I don't know how CloudHarmony came up with their numbers, but I wonder if Microsoft 'encouraged' them to use some alternative voodoo calculations so they can claim "99.9% uptime", when any service that is down for more than 8.76 hours clearly fails to meet that standard.
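The arithmetic above is easy to reproduce. Here is a minimal sketch, assuming a 365-day year of 8,760 hours and using the 42.94-hour Virtual Machines figure from the article. The function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760, assuming a non-leap year


def uptime_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Uptime as a percentage of the period, given total downtime in hours."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


def max_downtime_hours(target_percent, period_hours=HOURS_PER_YEAR):
    """Largest total downtime (hours) that still meets an uptime target."""
    return period_hours * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    vm_uptime = uptime_percent(42.94)
    print(f"Azure VMs: {vm_uptime:.4f}% uptime")  # 99.5098%
    print(f"99.9% allows {max_downtime_hours(99.9):.2f} hours per year")  # 8.76
```

On these assumptions, 42.94 hours of downtime is roughly five times the 8.76-hour budget that a 99.9% target allows.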

    1. Anonymous Coward
      Anonymous Coward

      Re: Math fail

      rounding to 1 decimal point

    2. Chemist

      Re: Math fail

      "Math fail "

      I too thought that, but an AC above has actually been downvoted for spotting the same thing. Downvoted for an arithmetically accurate statement - what next?

  9. Nate Amsden

    meanwhile in the real world

    storage uptime

    System is up and running from 2011-12-19 16:15:58 EST

    (that was when that particular system was installed, before that my company was hosted in one of the public cloud providers), through software upgrades, hardware failures etc.. of course probably no big surprise to anyone in the industry to have 3 years of 100% uptime on storage.

    What pissed me off most about being in a cloud and reliability was not the full big outages but the small fails that occurred seemingly constantly. Since moving out we haven't had to "rebuild" a VM even once in over 3 years. That was a semi-regular occurrence when we were in "the cloud".

    We've had one sudden physical host failure in 3 years. I was in bed watching TV, my phone went nuts, I got up to walk 10 feet to my desk, and by that time VMware had moved the VMs to another host and was starting them back up, and the underlying HP server had detected the failure and automatically restarted (the first time I had personally experienced that, having been an ESX customer since 2006; I didn't even know HP servers had a feature called Automatic Server Recovery). There were some additional manual recovery steps for some apps but it wasn't too bad. HP diagnosed the problem as a system board issue, I forget what exactly, and they replaced the board the next day (4-hour support on everything, but I wasn't in a big rush).

  10. Anonymous Coward
    Anonymous Coward

    Does this even matter?

    According to Microsoft's own TV advertising Azure is for their xbox games? Why would an enterprise want to use it?
