Microsoft's Azure cloud down and out for 8 hours

Microsoft's cloudy platform, Windows Azure, is experiencing a major outage: at the time of writing, its service management system had been down for about seven hours worldwide. A customer described the problem to The Register as an "admin nightmare" and said they couldn't understand how such an important system could go down …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward
    Anonymous Coward

    Another failure, along with Office 363

    1. Spiracle
      FAIL

      Leap year

      Hmmm... cert error on Feb 29th? Should have been Office 366.
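One way a leap day can break certificate or expiry code is by computing "one year from today" through bumping the year field, which asks for 29 February in a non-leap year. Here is a minimal Python sketch of that failure class. It is illustrative only and is not Microsoft's code. Clamping to 28 February is just one assumed policy among several.

```python
from datetime import date

def naive_one_year_later(d: date) -> date:
    """Add one to the year and keep month/day; raises ValueError for 29 February."""
    return d.replace(year=d.year + 1)

def safe_one_year_later(d: date) -> date:
    """Add one year, clamping 29 February to 28 February when the next year isn't leap."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only 29 Feb can fail here; every other month/day exists in every year.
        return d.replace(year=d.year + 1, day=28)

leap_day = date(2012, 2, 29)
try:
    naive_one_year_later(leap_day)
except ValueError as exc:
    print("naive version failed:", exc)
print("safe version:", safe_one_year_later(leap_day))
```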

    2. Bob Vistakin
      FAIL

      Go Google!

      Right, we've played it the usual way where, for unknown reasons, Microsoft mysteriously win yet more major governmental work. Now that we've all suffered the inevitable car wreck the Nokia rapers have inflicted on everyone, it's time to do it properly.

  2. Phil O'Sophical Silver badge
    FAIL

    Leap years

    Don't you just love them.

    I've also heard of at least one Freeview TV recorder that wouldn't let people put items into the recording planner for today...

    1. John Riddoch
      FAIL

      Re: Leap years

      Seriously? You'd think people might realise leap years exist, they've only been around for a few centuries after all... The confusion around 2000 being a leap year was vaguely understandable as the rules about centuries are marginally more obscure, but even then...
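The century rule mentioned here fits in one line. Below is a minimal Python sketch of the Gregorian rule: a year divisible by 4 is a leap year, except for centuries, which are leap years only when divisible by 400. So 1900 and 2100 are not leap years, but 2000 is. The block cross-checks itself against the standard library's `calendar.isleap`.

```python
import calendar

def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Cross-check against the standard library for every Gregorian year up to 3000.
assert all(is_leap_year(y) == calendar.isleap(y) for y in range(1583, 3001))

for y in (1900, 2000, 2011, 2012, 2100):
    print(y, is_leap_year(y))
```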

      1. Anonymous Coward
        Anonymous Coward

        Re: Re: Leap years

        Don't forget that Microsoft have their own calculations for leap years, which probably just adds to the confusion.

        1. BristolBachelor Gold badge

          MS Calculations

          I remember that MS have/had their own way of working out which is the last Sunday in October. It's the 4th Sunday. It always is (except for those Octobers with 5 Sundays...). Cue the time being different between desktop Windows machines and back-end servers, causing all sorts of headaches.
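The UK/EU rule for the end of summer time is the last Sunday in October. Here is a minimal Python sketch comparing that rule with the "4th Sunday" shortcut described above. The two agree except when October has five Sundays, as it did in 2011.

```python
import calendar
from datetime import date

def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # weekday(): Monday == 0 ... Sunday == 6; step back to the most recent Sunday.
    return d.replace(day=last_day - (d.weekday() + 1) % 7)

def fourth_sunday(year: int, month: int) -> date:
    """Return the 4th Sunday of the given month (the shortcut being criticised)."""
    first = date(year, month, 1)
    first_sunday = 1 + (6 - first.weekday()) % 7
    return first.replace(day=first_sunday + 21)

# List the Octobers in which the shortcut picks the wrong weekend.
for year in range(2005, 2016):
    if last_sunday(year, 10) != fourth_sunday(year, 10):
        print(year, "rule:", last_sunday(year, 10), "shortcut:", fourth_sunday(year, 10))
```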

          1. SYNTAX__ERROR
            Facepalm

            Re: MS Calculations @ BristolBachelor

            Hi, welcome to the discussion. Did you read the comment you replied to?

            We're talking about leap years, not Daylight Savings Time.

            1. BristolBachelor Gold badge
              Facepalm

              MS Calculations @ Syntax error

              Yeah I read the comment. It said something about MS calculating Leap Years using a non-standard method that is wrong, hence their problem.

              Then I made a comment that MS also decided that they wanted to make another calculation a different way to the standard, so that too is wrong sometimes (causing problems with British Summer Time).

              I'm not an Apple fanboi, so I don't see what's wrong with poking fun at MS's inability to calculate things correctly (and we also shouldn't forget their spreadsheet program, which has also had problems calculating in the past!)

        2. Anonymous Coward
          Anonymous Coward

          Re: Re: Re: Leap years

          Probably the same calculations they've been using for sales of the Lumia.

    2. Test Man
      FAIL

      Re: Leap years

      Yeah, my Digital Stream has decided to "lose" scheduled recordings for today and tomorrow.

      I'm afraid of switching on my PS3 after the last debacle 2 years ago.

  3. TeeCee Gold badge
    Facepalm

    Cloud Computing.

    = One big basket for all your eggs........

    1. Joe Drunk
      Mushroom

      Re: Cloud Computing.

      Wholeheartedly agree. I am absolutely NOT sold on this marketing gimmick called 'Cloud Computing', based on outages reported by major providers besides MS.

      As with all IT marketing campaigns, this looks really good in the PowerPoint presentations and balance sheets, but in practice it is a business continuity disaster.

      1. Darryl

        Re: Re: Cloud Computing.

        Yep, another ringing endorsement for cloud computing.

        Funny, my servers in the room next door haven't had any problems with Feb 29 (or whatever this will end up being blamed on.)

  4. Anonymous Coward
    Anonymous Coward

    Not so much a Cloud...

    ... as a long rasping, nasty smelling fart.

  5. Anonymous Coward
    Anonymous Coward

    Abort, Retry, Ignore...

    Fail!

  6. hplasm
    Happy

    Azure Screen

    Of Death.

  7. Ben Rose

    Leap year bug?

    Smells a lot like a leap year bug. I doubt many maidens would wanna propose to MS engineers today.

    1. Anonymous Coward
      Anonymous Coward

      Actually...

      ... the idea of being married to an MS engineer might not be so bad.

      Granted, there's the appalling dress sense, the immature world view, the lack of stimulating conversation, the endless teeth-curling jargon and the permanent I'm-a-middle-manager-and-I'm-your-friend fixed shit-eating grin and patronising tone of voice which might obviously put you off.

      But then again, there's lots of housekeeping money, and best of all, they work all day, all night and all weekend, so you get the house to yourself!

      1. Anonymous Coward
        Anonymous Coward

        Re: Actually...

        (Apologies to the two working women with no sense of humour who've read this so far)

        1. Anonymous Coward
          Anonymous Coward

          Re: Re: Actually...

          I'm not one for up-voting or down-voting, but you obviously missed the point. It's not that they didn't have a sense of humor. It's that you mixed acceptable "jokes" in there with unacceptable ones. If I say a nerd has "an appalling dress sense", "teeth-curling jargon", or even "a lack of stimulating conversation", they're likely to laugh, say "yep, that about sums it up" and shrug it off. But you went too far with "immature world view" (which is just vague and meaningless enough of an insult to apply to anyone) and calling them middle-managers. No one likes to be called immature or a middle-manager (especially when you already called them engineers, WTF?), thus the down-votes from two nerds. Your joke lacked the consistency and finesse for you to expect everyone to take it lightly and move on. And perhaps you should do just that with your two downvotes :).

          1. Anonymous Coward
            Anonymous Coward

            Re: Re: Re: Actually...

            I work with the boys from Redmond (seriously, I have yet to work with a woman from MS) on BPOS on a very regular basis and immediately recognized the poster's point. There is something different about most of the MS folks (always exceptions to the rule of course!) vs. the rest of the industry.

            It's a bit like, thinking back to the University days, if Nerds had their own Fraternity - Microsoft would be it.

            1. Michael Wojcik Silver badge

              Re: Actually...

              I've worked with a number of Microsoft developers and other technical types, and I've always found them to be pretty much like various non-Microsoft people I know in the industry. Generalizations about large groups of people are always suspect, and yours (even vague as it is) certainly doesn't match my experience.

          2. Anonymous Coward
            Anonymous Coward

            Re: Re: Re: Actually...

            Sorry, I forgot to insult tedious, windy, pedantic bores with egg in their beards - can I have a third downvote please?

      2. Anonymous Coward
        Anonymous Coward

        Re: Actually...

        "Granted, there's the appalling dress sense, ..."

        Until I hit the bit about 'manager' and 'friend' I thought you'd got them confused with Linux gimps ...

  8. Anonymous Coward
    Trollface

    They're finally becoming a /real/ cloud provider!

    Yes, this is somewhat of a troll / joke but still...

    Every major cloud provider has had some extreme outages. Amazon has been down and out for a few days, that UK cloud provider (can't remember the name) has had more than 2 weeks of downtime, and Google has had issues and was down for a day.

    And now Microsoft's cloud suffers from the same. So my conclusion is: they're finally becoming a real provider to contend with ;-)

  9. Version 1.0 Silver badge
    Pint

    Sooner or later

    All clouds die ... all software crashes ... and hardware fails. It's just the way things are so quit bitching about it and go down the pub and have a beer, play scrabble, chat up the girl in the cubicle next to you, go out to lunch.

    Chances are it will be back up again soon and then you can get back to work - enjoy the moment.

  10. Anonymous Coward
    Childcatcher

    may i be the first

    To say [Microsoft | Apple | Google | Amazon (choose your preference) ] is a rubbish platform and if users had only gone with [ insert your other provider of choice ] their world would be rosy.

    Troll / bitch, platform fanbois rant.

    There saved all you folk who were going to have a rant at MS.

    (P.S. I dislike MS as much as the next guy, but the 'my dad can beat up your dad' argument gets tiresome.)

    1. Mike Pellatt
      Facepalm

      Re: may i be the first

      If you care to read most of the comments, you'll not see bitching at a particular cloud/outsourced/managed services/whatever this year's marketing buzzword is/etc provider, but at the idea that this cloudy thingy is some sort of panacea to all your IT availability issues, and you can throw away any other business continuity solution.

      It isn't. Factor in proper BC, Due Diligence over the providers' offerings, and service failure insurance (ha!!!) and suddenly all those putative cost savings go Poof!!!

      Unless, of course, your name is Matt Asay......

  11. Tim 11
    Happy

    probably someone plugged in a USB scanner

    ... and caused a BSOD :-)

    1. Dan 55 Silver badge

      Re: probably someone plugged in a USB scanner

      I'm going for a USB memory card reader. That brought down my work machine yesterday.

      But then again it was Vista.

      1. Anonymous Coward
        Anonymous Coward

        Re: Re: probably someone plugged in a USB scanner

        Vista in 2012? Are your IT Support guys snails?!

        1. Vic

          Re: Re: Re: probably someone plugged in a USB scanner

          > Vista in 2012? Are your IT Support guys snails?!

          I've recently been contracting for a company that is rolling out a managed image to its desktops. It's been a slow start, but they're now getting there.

          The image is Vista. I met precisely zero users who were happy with it - some saw XP as being more performant, others saw Win as being a more useable choice. But no, Vista was the thing on the cards.

          I found an old box & put Fedora on it. I had quite a few users by the time I left :-)

          Vic.

          1. Michael Wojcik Silver badge

            Re: probably someone plugged in a USB scanner

            I retired my last Vista machine last summer, but only because it was a laptop, the keyboard wasn't working properly (hardware failure), and it was no longer under warranty. If it hadn't suffered the keyboard failure I'd likely still be running Vista on it. I've read hundreds of these complaints about Vista, but frankly I never found it that much worse than XP, Server 2008, or Win7. I find them all frequently annoying, when they're not working well enough to be invisible.

            Of course, my Vista installations were highly customized (security policy, group policies, etc), and I do most of my work in Cygwin bash sessions with vim and command-line tools, so perhaps I'm not the typical user.

            But certainly if I were still running Vista I wouldn't want to waste even a day upgrading to Win7. I didn't when I was running them side-by-side.

        2. Dan 55 Silver badge

          @AC - Yes

          I count myself as one of the lucky ones, I've got a Vista machine just about powerful enough to run Vista. Most people have an XP machine just about powerful enough to run XP. If the snails would let me I'd downgrade to XP, but I suppose that that'd be too fast for them.

  12. Anonymous Coward
    Anonymous Coward

    I'll save a few posts here

    This post is provided as a community service, trying to save the Reg storage and bandwidth for worthwhile discussions instead of the usual server-hugging, cloud-hating commentard reactions.

    "Never trust your data to a cloud, I need to have absolute control over my data"... yeah, as if you were using 100% in-sourced data center, tape storage management, disaster recovery, or help desk services. Remember, the vast majority of leaks and security threats come from people inside your organization.

    "Their SLAs are a joke"... yeah, as if your uptime were 99.999%. Oh well, this may be true of a few banks and financial institutions, and perhaps a secret agency or two. The rest of us know that achieving even the same uptime as the Amazons, Googles and Microsofts needs a disproportionate amount of investment.

    "They are not responsive when things go wrong"... really? Have you ever asked your users how they feel about your responsiveness in the event of an incident? Do you have a dashboard providing real-time status updates each time one of your self-managed, self-hosted apps or services is down? How have your incident management and recovery times measured up lately?

    It's all about the feeling of losing control, I know. But face it, sooner or later it will happen. In a few years' time, the idea of someone providing basic services from their own hosted facilities will sound simply crazy and wasteful.

    1. Chris Miller

      Uptime

      If the service has been down for 7 hours already, they'll struggle to achieve 99.9% availability for the year. Actually, before I escaped from corporate IT, I was achieving 99.997% availability across the UK - and that was in 1996 with Win3.1/Netware4.1/mainframe (IIRC the .003% was BT losing a Kilostream link to Preston for an afternoon).

      Guaranteeing 99.999% can get expensive (and adds complexity) and is probably more than most businesses require. But it's not rocket science.
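The arithmetic behind these figures can be sketched in a few lines of Python. The sketch computes the downtime each availability level allows, assuming a 365-day, round-the-clock measurement period. Real SLAs often define the period differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h; assumes a non-leap, 24x7 measurement period

def downtime_budget_minutes(availability_pct: float,
                            period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime (minutes) allowed over the period at the given availability."""
    return period_hours * 60 * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.997, 99.999):
    print(f"{pct}%: {downtime_budget_minutes(pct):.1f} minutes per year")
```

At 99.9% the budget is about 8.8 hours a year, so a 7-hour outage uses up most of it in one go. At 99.997% the budget is roughly 15.8 minutes.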

      1. NomNomNom

        Re: Uptime

        99.997% availability is like 15 minutes down in an entire year?

        1. Vic

          Re: Re: Uptime

          > 99.997% availability is like 15 minutes down in an entire year?

          Yes. Well calculated.

          Customers frequently demand "five nines" availability (99.999%) until they see the cost. But 15 minutes a year will usually give you one reboot even if you haven't got spare hardware (which you should have).

          I have a policy to reboot at least every 1000 days. Just because...

          Vic.

          1. Chris Miller

            Re: Re: Re: Uptime

            To be pedantic (and why not?), our service was based on 10 hrs a day and 6 days a week, so 99.997% is just 5 minutes. Preston was down for 4-5 hrs, but it only represented ~1% of the work force, hence the 99.998% (can't remember where the other 0.001 went). We didn't count (though we measured) individual workstation failures, since we effectively had roaming profiles and no data on the local drives, so if your PC died, you just used a spare.

            True story: PCs would occasionally behave erratically, which a reboot would cure. It turned out that there was a bug in the Netware client for Windows which was failing to release a user handle after each logoff - after 25 or so logon cycles there were no handles left. Most of our users logged on once a day, so Windows 3.1 had to run for a month without a reboot to show the problem. Try telling that to t'youth of today ... and they won't believe you.
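The failure described here is a classic resource leak: each logon takes a handle from a fixed pool, and logoff never gives it back, so the pool eventually runs dry. Below is a toy Python model of that pattern. The pool size of 25 comes from the "25 or so" above, and the class and function names are hypothetical. This is not how the NetWare client was actually built.

```python
class HandleTable:
    """Toy fixed-size handle pool (capacity is an assumption taken from the thread)."""

    def __init__(self, capacity: int = 25):
        self.free = capacity

    def logon(self) -> None:
        if self.free == 0:
            raise RuntimeError("no handles left")
        self.free -= 1

    def logoff(self, leaky: bool) -> None:
        # The buggy client never returned the handle at logoff.
        if not leaky:
            self.free += 1

def cycles_until_failure(leaky: bool, capacity: int = 25, max_cycles: int = 1000):
    """Return the logon cycle that fails, or None if every cycle succeeds."""
    table = HandleTable(capacity)  # a reboot corresponds to a fresh table
    for cycle in range(1, max_cycles + 1):
        try:
            table.logon()
        except RuntimeError:
            return cycle
        table.logoff(leaky)
    return None

print(cycles_until_failure(leaky=True))   # 26
print(cycles_until_failure(leaky=False))  # None
```

With one logon a day, the leaky pool lasts about a month before the 26th logon fails. A reboot starts a fresh table, which is why a reboot cured the problem.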

            1. Goat Jam
              Joke

              Re: Re: Re: Re: Uptime

              Ah yes, rebooting once a month.

              Microsoft fixed that problem in Win95 by ensuring the PC crashed on day 27*, thereby enforcing a monthly reboot!

              * Actually it was 49.7 days, see icon
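The 49.7-day figure is exactly how long an unsigned 32-bit counter of milliseconds takes to wrap around to zero, which is the usual explanation for that Windows 95/98 hang. Here is a minimal Python check of the arithmetic, plus the standard modular trick for measuring elapsed time across one wrap. The helper names are illustrative.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
WRAP = 2**32  # an unsigned 32-bit counter holds 0 .. 2**32 - 1

wrap_days = WRAP / MS_PER_DAY
print(f"{wrap_days:.2f} days")

def ticks_after(ms_since_boot: int) -> int:
    """What a 32-bit millisecond tick counter reports (it silently wraps to 0)."""
    return ms_since_boot % WRAP

def elapsed_ms(start_ticks: int, now_ticks: int) -> int:
    """Elapsed ms between two 32-bit tick readings; correct across a single wrap."""
    return (now_ticks - start_ticks) % WRAP
```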

            2. Anonymous Coward
              Anonymous Coward

              Re: Re: Re: Re: Uptime

              Thanks to your pedantic comment, we now know that it all comes down to your definition of "uptime". If my numbers are right, by your definition (10 hrs a day, 6 days a week, 52 weeks = 3120 hours), meeting 99.999% uptime over those hours would mean your entire user community could be without service for no more than about 2 minutes during the whole year!!!!

              And you had a location with 99.84% availability, but that was diluted in the overall user count. My feeling is that the (few) people at Preston laughed at your claims of 99.999%.

              Well played on your part, and your butt was probably saved because all those Windows reboots and dead PCs did not count towards your "uptime" measurement. But they surely counted against the business value of that service.

              The point is still valid: while it's not rocket science, true five nines, as the business would define it, is extremely expensive. To be clear, making those five nines actually valuable from a business perspective would mean, in your case, that no Windows 3.1 PC could die or need a reboot more than a couple of times per year.

              And let's be honest, most business types can get beyond wishful thinking when facing the true costs, and accept that they can make do with four or even three nines without too much disruption. It's all about the cost and the benefit.
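Measuring over business hours instead of the whole year changes the numbers considerably, and so does weighting an outage by the share of users it hits. Here is a minimal Python sketch of both views. It assumes the 3120-hour business year used in this exchange, 4.5 hours (the midpoint of "4-5 hrs") for the Preston outage, and a 1% share of users at Preston.

```python
BUSINESS_HOURS = 10 * 6 * 52  # 10 h/day x 6 days/week x 52 weeks = 3120 h

def budget_minutes(availability_pct: float, period_hours: float = BUSINESS_HOURS) -> float:
    """Downtime (minutes) allowed over the measured period at a given availability."""
    return period_hours * 60 * (1 - availability_pct / 100)

def availability_pct(outage_hours: float, affected_share: float = 1.0,
                     period_hours: float = BUSINESS_HOURS) -> float:
    """Availability when an outage hits only a share of the users (user-weighted)."""
    return 100 * (1 - outage_hours * affected_share / period_hours)

PRESTON_OUTAGE_H = 4.5  # assumption: midpoint of the quoted 4-5 hours

print(f"99.997% allows {budget_minutes(99.997):.1f} min")
print(f"99.999% allows {budget_minutes(99.999):.1f} min")
print(f"Preston alone: {availability_pct(PRESTON_OUTAGE_H):.2f}%")
print(f"Whole estate:  {availability_pct(PRESTON_OUTAGE_H, 0.01):.4f}%")
```

A full 5-hour outage gives the 99.84% per-location figure. Diluting that outage across the whole estate still leaves availability above 99.998%.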

              1. Chris Miller
                Happy

                Re: Re: Re: Re: Re: Uptime

                Yes, SLAs are agreed with the business and inevitably include some degree of averaging over space and time (our agreed level was 99.99% measured annually, which we comfortably exceeded). If the business had required 99.99% availability at each individual location, we'd have arranged for multiply routed WAN links (ideally from separate suppliers, but that was difficult to achieve nationwide back then). When we'd shown the bosses the costing figures, I'm sure they would have settled for a lower guarantee.

                PCs dying was rare, but still more common than reboots. I'm not sure it's possible to guarantee 99.999% availability to an individual workstation: you'd not only need a UPS but also lots of dual components. Does anyone make a desktop with dual power supplies? (If they do, I bet it's pretty expensive.) In reality, all you had to do was walk across the office and use the PC of a colleague who was away. A replacement would arrive within minutes at a big office (where spare systems were held), but at Preston you might have to wait a day for a repair or for a replacement to be shipped.

  13. pip25
    Coat

    I cannot even reach the linked service dashboard page right now. I guess the hotfixing did not go well...?

  14. Select * From Handle

    I use Windows Azure.

    I must say it's flipping fast! Our web app that runs on Azure is still working, so it isn't affected by this outage; it must just be the management screen.

  15. Anonymous Coward
    Anonymous Coward

    Typically we apply patches and upgrades to a test server before company wide deployment. I cannot see why they cannot do the same thing with datacenters. If they bugger up, pull it off the grid and let the rest tick along.
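The test-first approach described here maps onto a canary rollout: patch a small subset of sites, check them, and only then continue. Below is a minimal Python sketch. The `apply_patch` and `is_healthy` callbacks are hypothetical stand-ins for whatever deployment and monitoring tooling is actually in use.

```python
from typing import Callable, List, Tuple

def staged_rollout(sites: List[str],
                   apply_patch: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   canaries: int = 1) -> Tuple[List[str], List[str]]:
    """Patch `canaries` sites first and check them before touching the rest.

    Returns (patched sites, unhealthy sites). A non-empty second list means
    the rollout halted, and any unpatched sites are still on the old build.
    """
    patched: List[str] = []
    for wave in (sites[:canaries], sites[canaries:]):
        for site in wave:
            apply_patch(site)
            patched.append(site)
        unhealthy = [site for site in wave if not is_healthy(site)]
        if unhealthy:
            # Stop here; the caller can pull the unhealthy sites out of rotation.
            return patched, unhealthy
    return patched, []

log: List[str] = []
print(staged_rollout(["dc1", "dc2", "dc3"], log.append, lambda s: s != "dc1"))
```

In the usage line the canary `dc1` fails its check, so `dc2` and `dc3` are never touched. One caveat: a canary only catches faults the patch itself introduces. A fault triggered by the calendar date would hit every site at the same moment.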

    > Vista in 2012? Are your IT Support guys snails?!

    We still run WinXP on most of our workstations, even the shiny new ones. Many of our mission-critical software vendors do not support the later versions of the Windows client yet. The virtual WinXP mode on Win7 does not work with one of our major database packages, so it's no go on upgrades for now. Not everything is within the IT folks' control. Well, not counting the BOFH, of course.

    1. Anonymous Coward
      Anonymous Coward

      XP - you are doing well, I have just spent the day working on Windows 2000 servers.

      Some of our customers still have 13 year old Windows NT4 servers running happily on their original SCSI hard disks, HP and Quantum built them well.

  16. Janko Hrasko
    Coat

    you know...

    Clouds have weather too.

    Complex systems, etc...

  17. toadwarrior
    Trollface

    enterprise failure

    It's the cloud for enterprises. Everything needs to be big including their outages.

  18. Rob Kirton

    The Fix

    Yep, this one is eating into my day too. Evidently somebody at Microsoft is going to switch it off and back on again. Solves most problems...

    1. Anonymous Coward
      Anonymous Coward

      Maybe...

      ... the reason it's taking so long is that they're stuck on a phone to a call centre operative who has to say 'Mr. Microsoft' five times before he's allowed to end the call.

      1. Anonymous Coward
        Anonymous Coward

        Re: Maybe...

        Or, they're on the line with a call centre staffer in India and they can't understand his accented English?

    2. NomNomNom

      Re: The Fix

      there's a lot of power buttons to turn on and off in the cloud

      rough calculation:

      -5 seconds to move to the next computer and press the reset button

      -100,000 computers in the cloud (guess)

      = about 139 hours!

      So I imagine they have a team of 17 or so people, which explains why it took 8 hours to fix
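Taking the guesses above at face value (100,000 machines and 5 seconds each, both pure assumptions), here is a minimal Python sketch of the time for one person versus a team working in parallel.

```python
import math

def reboot_hours(machines: int, seconds_each: float, staff: int = 1) -> float:
    """Wall-clock hours to power-cycle every machine, split evenly across staff."""
    per_person = math.ceil(machines / staff)  # the busiest person sets the pace
    return per_person * seconds_each / 3600

print(round(reboot_hours(100_000, 5), 1))
print(round(reboot_hours(100_000, 5, staff=17), 1))
```

One person needs about 139 hours. Seventeen people get it down to a little over 8.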

  19. John P
    FAIL

    Yay I finally beat the odds!

    Our main internal job management app runs on Azure and is currently completely unavailable, guess we're among the lucky 3.8%.

    Still better availability than we had with our previous hosting company...at least as long as it doesn't go on too long.

    1. Ball boy Silver badge

      Re: Yay I finally beat the odds!

      Not so fast: it's 'only' the management interface that's down so your app. should still work fine.

      BTW: management is up now - 14:40 as I write this - provided you don't want to manage any Database, Datasync, Reporting or Service Bus, Access Control and Caching settings.

      So, just the Hosted Service, Storage Account & CDN or Virtual Network configs available then.

      What's the betting this resolves itself as the data centres move out of the Leap year 'danger zone'?

  20. Bob Vistakin
    FAIL

    Who on earth is surprised by this?

    Merely days after it was announced that the G-Cloud will be based on Microsoft garbage, it suffers the same Kiss of Death Ford experienced when they used the beast's crapware in their cars: http://goo.gl/t7FPB

    1. Ebeneser

      Re: Who on earth is surprised by this?

      bob - my thoughts entirely.

  21. IGnatius T Foobar
    Facepalm

    Of course it broke.

    Of course it broke. They're running it on Windows. DUH.

  22. This post has been deleted by its author

  23. proto-robbie
    Pirate

    That's so "yesterday"...

    ... it's got to be Raspberry Pis in the sky.

  24. b166er

    AC 29/02 13:41

    How does a database app not run in a WinXP VM?

    To all intents and purposes, it's just another WinXP box.

    Glad to see Version 1.0 got more upvotes than down for that!

  25. Ted Treen
    Facepalm

    You're all wrong...

    It's not a fault - it's a new security feature.

    A totally inaccessible service is a secure service.

    Simples.

  26. N2

    Perhaps

    Microsoft have re-written the standard to define a leap year?

    & got that wrong as well.

  27. PeterM42
    FAIL

    I hate to say "I told you so"

    No, actually, I LOVE to say "I told you so" - clouds blow on the wind. How can anybody run a business on that basis? NOT ME for one.

    It's the same with outsourcing to some country where they cannot even speak to your users without communication problems. Been there, experienced that, thank you very much.

    FAIL!!!

    FAIL!!!

    FAIL!!!

    FAIL!!!

    FAIL!!!

    FAIL!!!

    1. Michael Wojcik Silver badge

      Re: I hate to say "I told you so"

      If only we had any historical data to refer to - like, say, all the businesses that used mainframe service bureaus for decades.

      Kids, lawn, etc.

  28. John P

    @ball boy - It wasn't just the management interface, the compute side was playing silly buggers all day too. While our app was technically running, it was unavailable to any external traffic from midday (when I checked) to about 9pm last night.

  29. Drummer Boy
    Facepalm

    Does anyone else get the Private Cloud from MS ad in this comment section as well??!!!

  30. savvy
    Thumb Up

    Cloud Commerce System in 10 data centers

    You know, I get criticised for overbuilding my network and hosting topology through the software layers, and this is the reason I spend so much: so we can deliver the highest SLAs on the market. At Savtira we have 10 data centers distributed throughout the USA/EMEA. They are load balanced at the DNS level, and the checks go as deep as ensuring SSL is responding correctly before sending consumers to a replicated server. It's self-healing and easy to maintain, since you can just take an entry out of DNS to do maintenance on an entire data center, and in the event of a problem it automatically fails over to the next closest data center... We just sent out our SLAs (I created the first ever network/hosting SLA when I founded Savvis).

    http://www.savtira.com/press_release.php?n=84
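The failover scheme described above can be sketched in a few lines of Python: health-check each site, including a TLS handshake, and send users to the nearest healthy one. This is a minimal sketch under stated assumptions. The site names are hypothetical, and it is not Savtira's implementation. A real DNS-level system would typically publish the result as DNS records with short TTLs rather than choose a site per request.

```python
import socket
import ssl
from typing import Callable, List, Optional

def tls_responds(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """True if a verified TLS handshake with host:port completes within the timeout."""
    context = ssl.create_default_context()  # verifies the certificate and host name
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers DNS failures, refused connections, timeouts, ssl.SSLError
        return False

def pick_datacenter(sites_by_proximity: List[str],
                    healthy: Callable[[str], bool] = tls_responds) -> Optional[str]:
    """Return the nearest site that passes the health check, or None if all fail."""
    for site in sites_by_proximity:
        if healthy(site):
            return site
    return None

# Usage with a stand-in checker (no network): pretend the nearest site is down.
print(pick_datacenter(["dc-a", "dc-b", "dc-c"], healthy=lambda s: s != "dc-a"))
```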
