Amazon cloud knocked out by violent storms in Virginia

A wave of "hurricane-like" thunderstorms ripped across Indiana, Ohio, West Virginia, and Virginia on Friday night, leaving more than 3.5 million people without power and knocking out the US-East-1 data center operated by Amazon Web Services. Netflix, Pinterest, Instagram, and Heroku, which run their services atop Amazon's …

COMMENTS

This topic is closed for new posts.

Silver badge

So, basically a land hurricane?

> 2012

> Still not using quark computronium safely embedded in earth's core, accessible only via high-energy neutrinaser.

It's the capitalists, I say. They are holding EVERYTHING back.

0
0

Re: So, basically a land hurricane?

Basically, yeah. I hear it was a hell of a storm -- I wish I hadn't slept through it.

0
0
Anonymous Coward

Re: So, basically a land hurricane?

Was driving I-64 just west of Richmond VA and pulled in to a hotel for the night just before the storm hit. Watched the storm from our room. The wind absolutely whipped the trees. We were lucky. Lights flickered, but we didn't lose power or communications. No damage around the hotel.

Continued west on I-64 today and saw large trees down and debris along the interstate. Crews had cleaned up anything blocking the interstate by the time we got on the road. No traffic backups due to downed trees. Found rest areas and towns without power the whole way. On top of that there are 100F+ (38C) temperatures today. I saw 108F on my car's thermometer at one point. Without air conditioning it's like an oven out there.

0
0
Silver badge

So you want to leave all your data on the cloud?

Rethink time methinks.

3
2
Silver badge
Pint

It's fine

When you really understand cloud then if the customer can fire up his generator, train his repurposed 10M satellite dish on a distant wifi point and get Internet, then he can access your service in whatever degraded way his bandwidth permits - even if the rest of the continent and your local resources are offline.

0
0
Bronze badge

Re: So, basically a land hurricane?

I'm impressed that you could. Of course, maybe I'd have done so, had I been asleep.

0
0
Silver badge
Alert

Re: So, basically a land hurricane?

More likely, a squall line http://en.wikipedia.org/wiki/Squall_line

Hurricanes cannot form over land. They are driven by hot moist air rising from an ocean surface. They also take several days to get going, so you get at least twelve hours notice that a hurricane is headed your way. Usually longer.

Squall lines give very little advance notice. I've heard tell of a transition from a hot summer day to roofs being blown off an hour later.

0
0
FAIL

The cloud taken out by clouds? Now I've heard it all...

14
0

Quite - kind of worrying that the entire cloud was taken out by a storm. So much for redundancy....

3
0
Coat

Have you heard the sound of a man eating his own head?

0
0
FAIL

9 minutes?

What about backup power? Even my home office can survive 30 minutes.

Guess I'll be keeping my own servers for a while longer.

0
0
WTF?

Standby Generators?

Don't Amazon bother with standby generators at their datacentres, to cover loss of utility power?

0
0
142

Re: Standby Generators?

You need to ensure that a safe voltage arrives inside the datacentre or none at all. Turning on generators automatically when there's an orderly powercut is straightforward. But when you're dealing with shorts, or indeed lightning strikes, on the sort of high voltage power-lines likely feeding that site, things become very unpredictable and very dangerous. Switchovers will be almost impossible to load balance and achieve cleanly.

I'd say the engineers were quite happy with only a 9 minute power down, given the situation - as the power lines feeding the site likely looked a bit like: http://www.youtube.com/watch?feature=player_embedded&v=NYCHBI66izs

As for why Instagram & co put everything in a single availability zone, well, that's just sheer muppetry.....

1
2
Silver badge

Re: Standby Generators?

That's what a UPS is for. As soon as the main supply becomes unstable the UPS kicks in. Main supply is then disconnected from UPS input and replaced with generator power, the UPS covering the outage. It remains like that until the public supply comes back solid & stable.
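
A minimal sketch of that switchover sequence, assuming a simplified controller that only sees two signals: whether mains is stable and whether the generator is ready. The names `Source` and `next_source` are hypothetical, and a real transfer switch involves far more than this.

```python
from enum import Enum


class Source(Enum):
    MAINS = "mains"
    UPS_BATTERY = "ups_battery"
    GENERATOR = "generator"


def next_source(mains_stable, generator_ready):
    """Pick which source feeds the load, given two simplified signals."""
    if mains_stable:
        # Public supply is solid and stable: run from it.
        return Source.MAINS
    if generator_ready:
        # Mains is disconnected and the generator feeds the UPS input.
        return Source.GENERATOR
    # Mains is unstable and the generator is still starting:
    # the UPS battery covers the gap.
    return Source.UPS_BATTERY


# Hypothetical sequence of (mains_stable, generator_ready) readings.
readings = [(True, False), (False, False), (False, True), (True, True)]
timeline = [next_source(mains, generator) for mains, generator in readings]
```

With those made-up readings the load goes mains, battery, generator, then back to mains once the supply is stable again.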

0
0

Re: Standby Generators?

If they do not have UPS (that is UPS that come with surge protection as standard) then frankly anyone doing biz with such a Data Centre needs their head examined.

Switch over should not be problematic...if it is then you do not have a resilient environment and your BCS/DR guy is a cowboy.

There is no excuse these days for a power outage in a DC that does not trigger a UPS to take over supplying power - from your main ones to the rack mounted....there is no excuse for a DC to go 'lights out' if the external power dies.

More worrying is that it is clear that Amazon do not spread redundancy across their Data Centres. The first sign of trouble the management and staff need to think about what the break point is in terms of failovers to other sites.

Cloud. Yeah right.

Pint coz its the finals of Euro 2012. Go England! Oh wait...they already have...errr...Forza Italia! And Leeds United *cough*

1
0
142

Re: Standby Generators?

"More worrying is that it is clear that Amazon do not spread redundancy across their Data Centres."

They do. Read up on "Availability Zones" - in this case only a single one went down... That Netflix, Instagram, etc put all their eggs in one basket, is their own problem...

1
1
Pint

Re: Standby Generators?

@ 142

Well evidently they don't have this kind of redundancy within their 'Availability Zones' right? It is quite clear that the entire 'Availability Zone' thing is just marketing wankery if the only Cloud Provider that went down was Amazon.

Pint coz...tomorrow is Monday.

0
0
Anonymous Coward

Re: Standby Generators?

So that's like: we have redundancy, but you have to provide it yourself?

2
0
142
Facepalm

Re: Standby Generators?

"Well evidently they don't have this kind of redundancy within their 'Availability Zones' right? "

That's.... the whole.... point..............................

0
1
Silver badge

Re: Standby Generators?

>They do. Read up on "Availability Zones" -

According to some reports it was the availability zone failing to fail over that took out Amazon's own service. It is claimed that the data centre went down too quickly and the availability service relied on the down data centre to inform the others in advance to spin up their copies.

In the middle of the outage some users were complaining that Amazon's service dashboard for their instances claimed that the DC was fully up and running when in fact it was dark.

0
0
Joke

Ouch!

That's gotta hurt their guaranteed 99,9999999999999999999999999999999999999999999999999999% uptime!

2
0
Silver badge

So what was it? A lightning strike on the building? I assume they've got UPS to handle the power issues.

0
0
Bronze badge
FAIL

RE: So what was it? A lightning strike on the building?

More likely some muppet decided to see what `the BIG RED button does`!!!

WHAM!!!! TOTAL DARKNESS!

0
0

The leader of a bolt of lightning can travel at speeds of 220,000 km/h (140,000 mph) and can reach temperatures of about 30,000 °C (54,000 °F). Oh, and there's the fact that it's over 1,000 times the energy that goes into the UPS.

1
1
Anonymous Coward

Wrong kind of cloud?

Don't mess with the Cumulonimbus kids. They are mean MoFo's

0
0
Anonymous Coward

Have Amazon never heard of diesel backup generators? I work for a mid sized ISP - miniscule compared to Amazon, and we have backup generators for just this kind of situation. What gives, Amazon?

2
0

You answered your own question - "miniscule" being the operative word. I've worked at a large computer centre (still a fraction of the size of Amazon of course) and there they had a big room full of batteries to last for the few seconds it took to run up the secondary generators. These only lasted for a minute or so before they were getting really overloaded, but that gave the primary generators time to run up and stabilise. Now scale that up for an Amazon-sized server farm and see what kind of monstrous UPS plant that would need.

Basically, cloud computing is supposed to be sufficiently resilient that you don't need a UPS. Well, that's clearly more the theory than the practice.

1
1
Silver badge

Monstrous UPS?

Flywheel UPS works well for that. The local synchrotron facility has over 1MW of standby generators, with flywheel UPS between them and the grid. When grid power fails the flywheels power the generators for the 10 seconds or so it takes the diesels to fire up and take up the load.

Alternatively go the telco route, with a small battery/UPS in each rack.

0
0
Silver badge

So you're saying that Amazon can't do what a small scale hosting solution can do, because it is tricky? Isn't that what they are selling us - "Trust us, we know DCs".

Isn't the whole point of cloud computing that someone much more experienced than you at providing DC facilities provides your DC facilities?

0
0
Stop

There is nothing wrong here. From what I can tell, a single availability zone went down.

AWS is designed in such a way that if availability isn't important, you can base your load in one location (in this case, N. Virginia). If you want more availability, use best practices and spread your loads around.

The real story is that these services still aren't properly able to cope with the conditions of the underlying infrastructure.
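
To make "spread your loads around" concrete, here is a rough sketch of round-robin placement across zones. It is an illustration only, not an AWS API: the zone labels, instance names, and both helper functions are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def spread_across_zones(instances, zones):
    """Assign instances to zones round-robin so no single zone holds them all."""
    if not zones:
        raise ValueError("at least one zone is required")
    placement = defaultdict(list)
    for instance, zone in zip(instances, cycle(zones)):
        placement[zone].append(instance)
    return dict(placement)


def surviving_instances(placement, failed_zone):
    """Return the instances still running after one zone goes dark."""
    return [
        instance
        for zone, members in placement.items()
        if zone != failed_zone
        for instance in members
    ]


# Hypothetical zone labels and instance names, for illustration only.
zones = ["zone-a", "zone-b", "zone-c"]
placement = spread_across_zones(["web-1", "web-2", "web-3", "web-4"], zones)
survivors = surviving_instances(placement, "zone-a")
```

With everything in one zone, losing that zone leaves no survivors; spread over three, losing `zone-a` here still leaves `web-2` and `web-3` running.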

1
0
eLD

Yeah they've got the region "us-east" (Virginia) availability zones a -> d. If only one zone in the region went down I don't see this as a real issue. There's a phrase involving eggs and baskets that springs to mind.

I can't see how this would drive people to move to other public clouds for reliability either. At least from the EC2 perspective, Amazon's cloud is failing in the way it's advertised to fail. Again with Azure, unless T&Cs have changed since I last looked, you get no SLA unless you've got your stuff deployed in different Azure reliability zones anyway.

0
0

Poor planning

Sounds like someone either didn't do an adequate disaster plan, or more likely some accountant wonk in management decided to save some money and not implement the entire plan. Personally, I hope they kept track of who made those decisions - but somehow it's more likely that the wonk got a promotion for "saving money" and it's someone else who is going to get blamed for it. Probably someone in I.T.

0
0
Joke

I think the storm was an excuse; it was the leap second that did it!

0
0
FAIL

Second outage in June

AWS had another power issue in their N. Virginia region on June 15. I don't know if it was the same datacenter offhand, but this does make me wonder. Where are the UPS and generator backups?

0
0
Facepalm

cloud? enterprise quality data center? hmmm :/

number one: I thought cloud based services are supposed to cope with this kind of occurrence?

number two: one single data center does not a cloud make... ...services resident in one data center ARE NOT cloud services! (e.g. Instagram et al.)

number three: don't they have a UPS and backup generator?

summary; if it was my business running in 'the cloud' on amazon services, I would be LIVID to put it mildly.

roy.

1
0
Stop

Re: cloud? enterprise quality data center? hmmm :/

AWS let you buy services in regional availability centres, be they throughout the continental US, Europe, Asia, all over really. It's up to the customers to decide on their failover scoping, preparation and scripting - I think this highlights that many didn't and assumed that it would 'just happen', but no one should assume that it is bulletproof; everyone should make the necessary backup plans. Google reckon they can do all this automagically, but even they occasionally manage to stuff things up.

0
0
FAIL

Re: cloud? enterprise quality data center? hmmm :/

EXACTLY.

Not looking very "elastic" is it.

Why on earth didn't Amazon fail over the workload to another DC within minutes of a problem occurring? Isn't that the whole bleeding idea of the all magic, highly resilient, always on cloud?

PMSL. Epic Cloud Fail. Just another example of a cloud hype vs reality disconnect.

0
0
Linux

Re: cloud? enterprise quality data center? hmmm :/

Deja vu! Always have a plan B - no matter how sophisticated a system, the unexpected will always happen. This is our strategy: rely on others only to the extent that you are able to manage any failure. http://www.workbooks.com/community/blog/buck-stops-here

0
0
Bronze badge
Alert

Cause and effect?

"Luckily for the Prickett Morgan household, we had just finished up watching several episodes of The IT Crowd over Netflix just before the storm hit."

Have you considered the possibility that this extremely rash and almost incomprehensible act may well have CAUSED this devastating storm?

2
0
Bronze badge

And have you tried turning it off and on again?

Ah go on! Go on! Go on, go on, go on, go on, go on, go on, go on. GO ON!

...oh yeah, wrong show.

1
0
tpm
(Written by Reg staff)

Re: And have you tried turning it off and on again?

I believe Mother Nature turned it off and on.... HA!

0
0
Coat

Re: And have you tried turning it off and on again?

ShowS shurely...?

Still, the outage may not have been small, but it was FAR AWAY (from Reg Towers)

Yeah, the one with the bookshop business card (scribbled on the back of a torn beermat) in the top pocket.

0
0
Silver badge
Boffin

Amazon Did Well

The storm here (Western Capitol Region/D.C.) was terrible. The worst I've seen in the six years since I moved to the mid-Atlantic region in terms of property and infrastructure damage. 'Land Hurricane' per above is accurate except the lightning was intense! At times it looked like daylight outside because of the sustained lightning.

That being said, all the "disaster planning experts" above have to understand that in situations like that (which the systems are designed to detect through mains variances) you can't just instantly go to backup without knowing what caused the power outage. If the site was hit directly there could be internal shorts causing the problem and if you keep forcing juice down its throat then the whole place might burn; Halon be damned... The system detected those variances and worked as designed.

Amazon did a fine job with only nine minutes of downtime. Most people can't even get in a good wank in that time.
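
For scale, nine minutes can be turned into an availability figure. A rough sketch, assuming the outage is measured against a 30-day month (that window is an assumption, not something stated in the article); the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes, period_days=30):
    """Percentage of the period during which the service was up."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


def downtime_budget_minutes(target_percent, period_days=30):
    """Minutes of downtime allowed in the period at a given availability target."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1.0 - target_percent / 100.0)


# Nine minutes out of an assumed 30-day month: roughly 99.98 percent.
nine_minute_outage = availability_percent(9)

# A "three nines" (99.9 percent) target over the same window: roughly 43 minutes.
three_nines_budget = downtime_budget_minutes(99.9)
```

On that assumed window, a nine-minute power-down on its own would sit inside a 99.9 percent monthly budget; how long the hosted services took to recover afterwards is a separate question.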

0
0

Re: Amazon Did Well

Because of course Amazon do not have lightning conductors on their buildings and the only other ingress for power is via a UPS (that deals with power spikes) which does not exist, amirite?

0
1
Anonymous Coward

Re: Amazon Did Well

Lightning strikes don't always raise the potential of the building. They can also raise the potential of the surrounding ground which is where most Earth taps are on your electrical system.

Most equipment (especially UPS) have a hell of a time with GND having a higher voltage than the incoming phases. Follow your +pV GND with a +pV phase and the resulting surge is like a tidal wave where electricity flows back and then hits forward twice as hard.

0
0
Silver badge
Meh

Re: Amazon Did Well

Shouldn't be an issue. An operation of their size would (or should) have its own substation, and as they can reliably load balance the phases they can (should) have no primary neutral and float the secondary one, or at least anchor it to the building's steel frame. Outside ground potential can then do whatever it likes and even if the primary terminals are lit up like a Christmas tree the secondary should be fine.

1
0
Silver badge
Boffin

Re: Amazon Did Well

No. You are not right.

Grounded rooftop conductors can help with small indirect strikes but nothing can manage multiple direct strikes in a short period. Meters go wonky, internal breakers flip and shit's just weird. Keeping everything up and running is not as simple as you think. You have to know what's going on before you just put the juice back to it.

0
0
Silver badge
Boffin

Re: Amazon Did Well

The nine minutes to restore service would have been no shorter even if they had their own substation. The detected fault was internal and no amount of outside infrastructure would have mattered. The system worked as designed and it worked rather well.

0
0

Netflix outage site takes hours to notice cloud is missing.

I tried to watch Netflix on my Wii the night after this occurred and it locked up at a loading tiles screen. For hours their Twitter feed showed that everything was ok. I went to sleep a while after that so I don't know how much longer it took them to notice they lost an entire datacenter somewhere out there in the cloud.

0
0

Re: Netflix outage site takes hours to notice cloud is missing.

I hate replying to myself, but..

Is it a bad idea to have your system that warns everyone the cloud is down, on the cloud?

0
0
