DaaS
Downtime as a Service.
Continuing Innovation from Microsoft.
Admins in Microsoft's Azure cloud data centre for West Europe in Amsterdam, Netherlands, have spent the morning battling severe problems in the gear that supports Redmond's main cloud service. Problems with the core Compute and Storage components were first reported at 9:39am UTC on Thursday, according to the Windows Azure …
"I thought these wonderful cloud systems were supposed to be highly reliable?"
Nope - circa 99.9% uptime is the quoted norm. Azure has historically been a bit more reliable than, say, Amazon S3, though.
"Like system redundancy, fail safe, multiple copies of data etc etc so if something fails it just keeps going with the remaining working resources?"
That's why Azure has multiple regions - so that you can create applications that are resilient to a local issue.
I thought these wonderful cloud systems were supposed to be highly reliable?
No, they are supposed to be cheaper in capex.
Everyone's comments here are proof that it is possible to build a reliable service on top of an unreliable one: TCP is a reliable service implemented over IP, an unreliable one. The idea of the cloud is that lower capex costs allow you to scale your loads dynamically, letting you provide a reliable service to your users that is built on commodity cloud servers which may individually be unreliable.
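The TCP-over-IP point boils down to a simple client-side pattern: retry the request against multiple replicas until one answers. A minimal sketch in Python (the endpoint names and the `fetch` callback here are purely illustrative, not any real Azure API):

```python
def fetch_with_failover(endpoints, fetch, max_attempts=3):
    """Try each replica in turn; treat any ConnectionError as a
    transient failure of that endpoint and move on to the next."""
    last_error = None
    for _ in range(max_attempts):
        for endpoint in endpoints:
            try:
                return fetch(endpoint)
            except ConnectionError as exc:
                last_error = exc  # this replica is unreliable; try another
    raise RuntimeError("all replicas failed") from last_error

# Hypothetical flaky replica set: only "eu-north" ever answers.
def flaky_fetch(endpoint):
    if endpoint != "eu-north":
        raise ConnectionError(f"{endpoint} is down")
    return f"200 OK from {endpoint}"

print(fetch_with_failover(["eu-west", "eu-north"], flaky_fetch))
# prints "200 OK from eu-north"
```

In the example only "eu-north" is up, so the first attempt against "eu-west" fails and the helper falls through to the healthy replica - the reliable-over-unreliable idea in miniature.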
I've not seen one done right so far, though, and if you are in business long enough, the benefit of lower capex is quickly extinguished by the massive increase in opex.
MS are addicted to complexity. If you use their site, it needs JScript. If you use hotmail it uses an utter ton of JS. If you use their products they tie in together to make a Gordian knot that can't be cut. It's deliberate; put one foot into their garden and they try to tie you down forever. So, they are addicted to complexity. The downside is bugs and failure. I wonder if that's the ultimate root of their cloud problems, as well as their desktop flakiness[*].
[*] Am learning SSAS 2008, just working through tutorials. Just basic stuff, and I've managed to crash it outright once and have had over half a dozen internal errors (for which Google has been much more helpful than MS), which have cost me hours. Utter crap.
(sigh) It's not about JS per se. My point is that it is unnecessary. gmail works fine without it. But they use it by the shovel load for no good reason (UI prettiness isn't a good reason IMO). They're addicted to More, not Simpler, and if I'm right that attitude may have worked its way into their datacentres and is causing them problems. Clear enough?
This post has been deleted by its author
> What are you talking about, gmail.com is very, very heavily loaded up with js.
disable your JS and try it. It works. That's how I use it. Disabling JS on hotmail just redirects you to a page telling you to enable it.
>>>> BUT the point is not JS but complexity. I mentioned JS overuse as a proxy, not the main point. <<<<
We were affected by this outage in West Europe... The RCA report we received is as follows:
Incident Title: Storage and Compute in West Europe - Partial Service Interruption
Service(s) Impacted: Azure Compute (Service Management), IaaS, Azure Service Management, Storage, Azure Web Sites
Incident Start Date and Time: 5/1/2014 2:39:00 AM (Pacific Time)
Date and Time Service was Restored: 5/1/2014 3:40:00 PM (Pacific Time)
Summary
On May 1st, customers may have experienced timeouts or errors with their Compute or Storage services in the West Europe sub-region. The root cause of this interruption was an unexpected power outage during scheduled maintenance in the datacenter.
A set of racks lost power, affecting the compute and storage services running there. Most racks recovered automatically once power was back; however, some needed a chassis reboot to recover. Once mitigation and verification steps had been executed on all clusters, full functionality of all Azure services was restored.
Customer Impact
Customers may have experienced timeouts or errors with their Compute or Storage services in the West Europe sub-region. Storage account creation may have failed during the impacted window.
Affected sub-regions
Region: Europe
Sub-Region: West Europe
Timeline
5/1/2014 02:39 AM PST - The Microsoft Azure team received the first alert of a power outage. The investigation was initiated promptly.
5/1/2014 02:40 AM PST - Power was restored to the impacted racks.
5/1/2014 03:08 AM PST - The majority of services were restored automatically once power was back. The automated repair process (service healing) started repairing offline instances.
5/1/2014 03:40 AM PST - The Microsoft Azure team identified that some racks needed a chassis reboot to recover. Mitigation steps were validated and executed over the following hours.
5/1/2014 11:25 AM PST - All services were fully restored, but the Azure team kept monitoring and verifying that the restoration proceeded as expected.
5/1/2014 03:40 PM PST - The Microsoft Azure team confirmed full recovery of all Microsoft Azure services.
Root Cause
A power outage caused by human error during scheduled maintenance in the datacenter.
Next Steps
We are continuously taking steps to improve the Microsoft Azure Platform and our processes to ensure such incidents do not occur in the future; in this case these include (but are not limited to):
• Improving the validation process during maintenance to prevent human errors.
• Investigating and repairing server hardware that encountered additional reboot failures, working closely with our partners.
• Improving tooling and automation to minimize time to recovery.
We apologize for any inconvenience.
---------------------------------------------------------------------------------------------------------------------------------------------
The work experience boy was allowed into the data centre!
Microsoft were slow to respond and we were without compute and storage services for over 6 hours.
You may be interested to know that geo-failover did not occur. Why not, you say? Isn't that one of the main attractions of the cloud?
Apparently, for a Microsoft Azure data centre, a "major disaster" means a complete data centre going offline. Microsoft felt this incident did not qualify, since it was not a complete data centre outage and the majority of their other worldwide customers were not affected. Because the entirety of the services in the data centre were not affected, the geo-failover process was not invoked.
Our future involvement with Azure will now be very limited. There is no service-level redundancy. Data is copied from one site to another, but you aren't in control of it and you can't access it in the event of a disaster. If you want service-level redundancy in Azure, you need to provision additional services yourself, effectively duplicating all your systems in case an apprentice unplugs a row of server racks. This makes the entire Microsoft Azure offering uneconomic, and we'd be better placed expanding our current data centre, where we are in full control.
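For what it's worth, the do-it-yourself redundancy described above usually reduces to an active/passive pair behind a health check: you run duplicates in a second region and your own logic decides which one to use. A toy sketch of that selection logic (region names and the health probe are hypothetical; real deployments would hang this off DNS failover or a traffic-manager service rather than application code):

```python
def choose_region(regions, is_healthy):
    """Return the first healthy region in priority order.
    is_healthy is a probe callback, e.g. an HTTP health-check."""
    for region in regions:
        if is_healthy(region):
            return region  # active region, or the passive one taking over
    raise RuntimeError("no healthy region available")

# Simulate the May 2014 scenario: West Europe down, North Europe up.
down = {"West Europe"}
primary = choose_region(["West Europe", "North Europe"],
                        lambda r: r not in down)
print(primary)
# prints "North Europe"
```

The catch, as the post points out, is that you pay for the passive copy the whole time, which is exactly the duplicated cost that makes the offering look uneconomic.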