
Lightning strikes cloud: Amazon, MS downed

Microsoft has been left reeling again after another BPOS crash, but at least on this occasion it was not alone, as Amazon's EC2 web services were also downed by the same act of God in Europe. A bolt of lightning struck a transformer at a power utility provider in Dublin, causing an explosion that took down the back-systems last …

COMMENTS


This topic is closed for new posts.

Monday 8th August 2011 11:56 GMT Anonymous Coward

Slightly biased comments

...we have both BPOS email services and Amazon live EC2 services. Microsoft was back last night, whereas EC2 is big-time down: we can't even get to the images to rebuild in the USA, which was their advice. (Recovering from CrashPlan to a new US instance.)

Huge difference here, which the article seems to skew the other way!

2 0
Monday 8th August 2011 11:56 GMT Graham Bartlett

Cloud schmoud

Hopefully a wakeup call to all those compsci fanbois who've been saying "put your services in the cloud" for the last couple of years. Newsflash, kiddies: there's some real engineering underneath all this, with real volts and amps and hardware and stuff. It doesn't matter how pretty your little class diagrams are - if it doesn't bloody work then it doesn't bloody work.

10 2
1. Monday 8th August 2011 13:35 GMT Ru
  
  You seem to be confusing computer scientists with marketers
  
  Newsflash, sunshine: the people who understand the issues involved in running fault-tolerant distributed systems are not the ones selling those systems to you.
  
  3 4
  1. Monday 8th August 2011 14:30 GMT Ian Michael Gumby
    
    Cute
    
    But the issue is that when people sell 'the Cloud' to the bean counters, they fail to price in the cost of duplicating everything so that the data can also reside in a second cloud, in a different data center, to provide the uptime.
    
    The bottom line is that this could have happened to a private 'cloud' as well.
    
    0 2
2. Monday 8th August 2011 18:39 GMT Alastair 7
  
  Don't be so sure
  
  This isn't exactly a crisis for cloud computing. A lightning strike could hit a data centre you own yourself, too. In fact, with cloud hosting you could recover very quickly by spinning up a new instance (say, in the US) while the Dublin centre recovers. Can't do that with your own data centre.
  
  1 0
  1. Tuesday 9th August 2011 19:14 GMT Ian Michael Gumby
    
    @Alastair 7
    
    That's true and that's part of the point.
    
    You control your own data centers and you control your own DR.
    
    You build in the duplication such that you don't have the down time.
    
    But when you go to the 'cloud' like Amazon, you don't necessarily have people doing the due diligence and setting up replication to a different cloud. They don't account for the costs associated with having the second cloud.
    
    0 0
3. Monday 8th August 2011 23:33 GMT Kev99
  
  Ditto
  
  Since I first heard of "The Cloud" I've said only fools and geeks would succumb to the marketing hype. Why anyone wants to risk everything on someone else's unknown kit is beyond me.
  
  3 0
4. Tuesday 9th August 2011 10:34 GMT Jay42
  
  lovely, some more security theater
  
  ... and we couldn't 'float' across to another server farm as you could* on the Cloud
  
  Honestly the cloud availability is far, far better than in-house up time. I can only think that El Reg is haunted by IT bods that feel their cushy number is threatened. Why have one BOFH for 20 servers when you only need one per 2500 servers?
  
  (*data replication permitting)
  
  0 0
5. Tuesday 9th August 2011 13:34 GMT Anonymous Coward
  
  Ewe-e-e!
  
  Volts and amps and stuff? I thought it was all just 1's and 0's.
  
  0 0
Monday 8th August 2011 11:56 GMT Anonymous Coward

Maybe change the name to ...

Office ???

0 0
Monday 8th August 2011 11:56 GMT Anonymous Coward

Phase synchronized

"Power sources needed to be "phase synchronised" before being brought online to load, which needed to be done manually"

I remember doing that at Uni. 3 lightbulbs across the phase pairs, wait till they all go dark at the same time & hit the switch. Delay of a few seconds, at most. Maybe this new cloudy stuff is too high tech?

2 0
1. This post has been deleted by its author
  1. Monday 8th August 2011 13:35 GMT Steve X
    
    dark bulbs
    
    3 phases (red, yellow, blue) from the grid, three phases from your local source. Lightbulbs between grid "red" and local "red", ditto for the other two. When all the bulbs go dark together it means there's no significant voltage difference between grid and local on any phase, i.e. you know that your local source is in sync with the grid, and can be connected.
    
    2 0
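The lamp trick Steve X and Jacqui describe can be sketched numerically. A minimal sketch, assuming both sources have equal RMS voltage (roughly 240 V phase-to-neutral on a 415 V three-phase supply) and share a neutral: the lamp between matching phases sees the difference of two equal sinusoids offset by a phase angle phi, whose RMS value is 2V|sin(phi/2)|, so it only goes fully dark when phi reaches zero. If the genset frequency is slightly off, phi drifts continuously and the lamps brighten and dim at the slip frequency |f_grid - f_gen|, which is why they "flicker slower and slower" as the frequencies converge. The function names are illustrative only.

```python
import math


def lamp_voltage_rms(v_rms: float, phase_diff_deg: float) -> float:
    """RMS voltage across a lamp wired between a grid phase and the same genset phase.

    Assumes both sources have the same RMS voltage and frequency at this instant.
    The difference of two equal sinusoids offset by phi has RMS 2 * V * |sin(phi / 2)|.
    """
    phi = math.radians(phase_diff_deg)
    return 2.0 * v_rms * abs(math.sin(phi / 2.0))


def slip_period_s(f_grid_hz: float, f_gen_hz: float) -> float:
    """Seconds between successive 'all lamps dark' moments (the beat period).

    The phase difference drifts through a full cycle every 1 / |f_grid - f_gen| seconds.
    Returns infinity when the frequencies match exactly (no drift at all).
    """
    slip_hz = abs(f_grid_hz - f_gen_hz)
    return math.inf if slip_hz == 0 else 1.0 / slip_hz


if __name__ == "__main__":
    # Assumed 240 V phase-to-neutral (about 415 V line-to-line).
    for deg in (180, 90, 30, 0):
        print(f"{deg:>3} deg out of phase: {lamp_voltage_rms(240.0, deg):6.1f} V across lamp")
    # Genset at 49.5 Hz against a 50 Hz grid: the lamps go dark once every 2 seconds.
    print(f"slip period: {slip_period_s(50.0, 49.5):.1f} s")
```

The breaker should only be closed during the dark window, which is easier to hit the longer the slip period gets.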
  2. Monday 8th August 2011 13:35 GMT Jacqui
    
    Re: Phase synchronising
    
    You have three phases in a standard 415 V supply.
    
    To sync your gensets you need to align the frequency and voltage of each of these so that the voltages match for a short period then you can switch over.
    
    The three-bulb trick is an old one: the bulbs flicker slower and slower as the genset matches the supply, and when they match and are in phase the bulbs all go dark.
    
    The problem seems to be that the mains went completely out and wrecked the sync/switchover gear.
    
    In this case the gensets need only be synced with the UPS or, if they are on the feed side of the UPS, the sync/switchgear needs to be disabled and the load started up on the gensets.
    
    It sounds as if, rather than risk having two outages, they intentionally "stayed down" so they could repair the defective sync/switchgear. My other guess is that the gensets failed: quite often datacentres do not run their gensets often enough, or they fail to bleed off the water condensation, and when the genset is finally run in anger the water gets into the filters and kills them dead just when they are needed. If this happens, the "duff switchgear excuse" gets trotted out to cover incompetent maintenance.
    
    I would estimate that roughly half the gensets in datacentres do not get the regular testing and servicing they require. I know some mechanics involved in the genset servicing game, and because the work goes to the lowest bidder the money is a joke, so they cut corners, reuse filters, etc.
    
    1 0
2. Monday 8th August 2011 13:59 GMT Roger Greenwood
  
  Synchronisation . . .
  
  . . . is only necessary when going back onto the mains. In the meantime the UPS and generators should have been sufficient running isolated. So this is not the whole story. Lightning protection units? We've heard of them.
  
  3 0
Monday 8th August 2011 11:58 GMT Bilgepipe

The Cloud

So, this whole "cloud computing" thing can be felled by a bolt of lightning? Is there no redundancy built into this rubbish at all? Wow.

Let's see an end to this cloud nonsense, ffs.

6 0
Monday 8th August 2011 11:58 GMT Stu J

Surely...

<devil's advocate>

...there should have been a lightning conductor to protect the equipment that was struck? If not, and it caused such an almighty spike, then the utility provider should be held responsible.

Also, by the same logic, won't anyone else with a datacentre near Dublin on the same part of the power grid be equally screwed (if not more so)? So cloud hosting doesn't get rid of these kinds of risks, but that doesn't mean it's inherently worse than self-hosted datacentres.

</devil's advocate>

0 2
Monday 8th August 2011 12:03 GMT Anonymous Dutch Coward

It works, but it doesn't...

"the incident will rightly lead to questions about the "viability of the cloud as a delivery platform" but added outages were not a sign that the cloud does not work."

and

"There is also a need to address business contingency on behalf of customers...blah"

In other words, it could have worked, but it doesn't. But it will. Really. Or does it "work" already as long as you keep tweeting that you have a disruption when your disaster recovery doesn't work?

I'm sure a lot of customers really appreciate that there is a need to address business contingency. Problem is, the providers should have done so in advance....

1 0
1. Monday 8th August 2011 12:19 GMT Jon Double Nice
  
  You could mirror your sites in different AWS regions
  
  It's unlikely that Dublin and Virginia (I think) will both go down at the same time. Then the load balancer stuff should kick in to save your bacon. Just a thought. I only use the free micro tier, but my stuff is all hosted in the USA.
  
  0 0
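Jon Double Nice's suggestion boils down to health-checked failover between a primary and a standby region. A minimal sketch of the decision logic, assuming a hypothetical `FailoverRouter` fed by some external health probe; it is not AWS's actual load-balancer or DNS failover API, and the endpoint names are placeholders. Requiring several consecutive failed checks before failing over avoids flapping on a single dropped probe.

```python
from typing import Dict, Iterable, Optional


class FailoverRouter:
    """Route traffic to the first endpoint, in priority order, that is considered healthy.

    Hypothetical helper for illustration: an endpoint is treated as down only after
    `failure_threshold` consecutive failed health checks, and one success resets it.
    """

    def __init__(self, endpoints: Iterable[str], failure_threshold: int = 3) -> None:
        self.endpoints = list(endpoints)  # highest priority first
        self.failure_threshold = failure_threshold
        self.failures: Dict[str, int] = {ep: 0 for ep in self.endpoints}

    def record_check(self, endpoint: str, ok: bool) -> None:
        """Record the result of one health probe against `endpoint`."""
        self.failures[endpoint] = 0 if ok else self.failures[endpoint] + 1

    def active(self) -> Optional[str]:
        """Return the endpoint that should receive traffic, or None if all are down."""
        for ep in self.endpoints:
            if self.failures[ep] < self.failure_threshold:
                return ep
        return None


if __name__ == "__main__":
    router = FailoverRouter(["eu-dublin.example.com", "us-virginia.example.com"])
    for _ in range(3):
        router.record_check("eu-dublin.example.com", ok=False)  # Dublin stops answering
    print(router.active())  # traffic now goes to the Virginia mirror
```

This only covers routing: the standby region still needs its own replicated copy of the data, which is the duplicated cost Ian Michael Gumby points to earlier in the thread.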
Monday 8th August 2011 12:03 GMT Anonymous Coward

BPOS?

Anyone else notice BPOS could stand for something else not unrelated?

Big Pile Of Sugar? ;)

Although, given our recent experiences, you should make it BSPOS: add a "steaming" in there :)

1 0
Monday 8th August 2011 12:03 GMT Alan Bourke

Odd ...

... because I'm near Dublin, there was one almighty crack of thunder at about 3PM but I didn't hear any more after that.

0 0
Monday 8th August 2011 12:03 GMT Steven 1

Pass the umbrella..

...rain's falling from the clouds....

0 0
Monday 8th August 2011 12:05 GMT amanfromMars 1

Another fine mess, Stanley?

Wow, just love that timeless classic postmodern gobbledygook and unwinese, used as a mass excuse and explanation of failure to supply uninterruptible and incorruptible power services.

1 0
Monday 8th August 2011 12:05 GMT Natalie Gritpants

Is this ironic?

All these cloud service providers felled by something that came out of a cloud?

8 0
Monday 8th August 2011 12:05 GMT Anonymous Coward

Does this count as cloud at all?

How is it different to shared hosting if it's all in one place? I thought cloud was supposed to replicate your stuff a bit better.

0 0
Monday 8th August 2011 12:18 GMT philbo

What, no redundancy?

..I smell redundancies :)

But seriously: isn't a lightning strike knocking out the cloud possibly the most ironically wonderful event possible?

1 0
Monday 8th August 2011 12:18 GMT Anonymous Coward

Eh?

there should have been a lightning conductor to protect the equipment that was struck...

Yes, they are called pylons: f**king great metal spikes in the ground, usually covering several hundred miles. However, if the lightning decides it's found a better route, it will take it.

However, we look at this as a positive, as we have an (unrealistic) 1-hour SLA to get all systems back online. Now we can say, "See, even MS and Amazon take several hours, so can we make it 4, please?"

0 0
Monday 8th August 2011 12:18 GMT Matt Bryant

I iz confuzzled? Who designed these solutions?

Disaster Resilient Datacenter Designs 101 - if it really matters, get your power from two completely separate mains supplies. This is very expensive, which is probably why cheapo cloud services wouldn't have it. And you need to have that double sourcing for everything, not just the back-end, as it's pointless having your datacenter humming along nicely if your front-end network routers are all offline.

Disaster Resilient Datacenter Designs 102 - diesel gennies are the belt to your mains braces. Yes, they cost a lot, the diesel needs to be churned and replaced every couple of months, but - when the mains is up the creek - those gennies are completely under your control.

Disaster Resilient Datacenter Designs 103 - if you skip lessons 101 and 102, then you actually do need metro-level redundancy as well as intercontinental redundancy. That means another datacenter to fail over to in a close location (but not close enough to be taken out by the same disaster, ideally on a completely separate mains power source on the other side of the city or in another town) before you have to fail over to a whole different continent. Amazon seem to have forgotten that one.

But, seeing as I'm quite happy to see cheapo cloud crash and burn, please don't tell Amazon or M$!

4 2
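The arithmetic behind Matt Bryant's lessons 101 to 103 can be sketched with the textbook availability formulas: components in series (everything must work) multiply their availabilities, while redundant components in parallel are down only when all of them are down at once. A minimal sketch; the 99.9% figures are made-up examples, and the parallel formula assumes failures are independent, which is exactly the assumption a single lightning strike on shared upstream infrastructure can break.

```python
import math
from typing import Sequence

HOURS_PER_YEAR = 24 * 365


def series(availabilities: Sequence[float]) -> float:
    """Availability of a chain where every component must be up (power, routers, servers)."""
    return math.prod(availabilities)


def parallel(availabilities: Sequence[float]) -> float:
    """Availability of redundant components where any one of them suffices.

    Assumes failures are independent: the system is down only if all are down together.
    """
    return 1.0 - math.prod(1.0 - a for a in availabilities)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per (non-leap) year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    one_feed = 0.999                      # illustrative: one mains feed, 99.9% available
    two_feeds = parallel([0.999, 0.999])  # two genuinely separate feeds
    print(f"one feed:  {downtime_hours_per_year(one_feed):.2f} h/year down")
    print(f"two feeds: {downtime_hours_per_year(two_feeds):.4f} h/year down")
    # Redundant power is wasted if the front-end routers remain a single point of failure.
    print(f"two feeds + one router at 99.9%: {series([two_feeds, 0.999]):.6f}")
```

The last line shows his point about front-end routers: the series term drags the total back down to roughly the availability of the weakest unduplicated link.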
Monday 8th August 2011 12:19 GMT Shane McCarrick

Weird weather here.......

We don't normally get storms as bad as this in Dublin; the lightning and torrential rain of the last 2 days have been startling at times... Entire areas of the city were knocked off the grid, some areas several times, which in this day and age is very unusual. I'm not altogether surprised that some high-prominence systems fell over: you can build as much redundancy as you like into systems, but you will always encounter scenarios that you just don't plan for...

1 2
Monday 8th August 2011 12:26 GMT Anonymous Coward

Stormcloud computing…

… thundering into a datacentre near you.

Methinks they might want to consider running the servers on DC power instead … DC does not require synchronisation.

1 1
Monday 8th August 2011 12:50 GMT Anonymous Coward

amazon still down

I am a customer of a company that uses Amazon for its hosting. Today I got an email telling me "It's not us, it's Amazon"... It doesn't matter to me who they offloaded their services to; I didn't choose them.

While I am no fan of cloud computing, I am left wondering if there is any benefit at all. Pre-cloud, the services I am using would likely have been co-located over just two boxes in different locations; currently I seem to have had all my eggs put in one Amazon-shaped basket, which is an improvement how?

1 0
1. Monday 8th August 2011 23:32 GMT Anonymous Coward
  
  Re: amazon still down
  
  Jeremy,
  
  I would blame your hosting company in this case. Why didn't they host your service on multiple geographically redundant instances, if Amazon allows them to do so?
  
  Don't get me wrong, I'm a cloud skeptic as well, and Amazon is by no means fail-safe, but beware of clueless resellers.
  
  Daniel
  
  0 0
Monday 8th August 2011 12:53 GMT Anonymous Coward

I wonder ...

I wonder if there were some major routers knocked out at the same time? There were some decidedly strange connectivity problems yesterday ... (and no, it wasn't my ISP).

0 0
1. Monday 8th August 2011 13:36 GMT Anonymous Coward
  
  Have you tried...
  
  ...switching it off and on again...?
  
  2 0
Monday 8th August 2011 13:35 GMT jake

::heh::

This is why you shouldn't let MBAs design networks ...

1 0
Monday 8th August 2011 13:44 GMT Anonymous Coward

Divine Intervention

I was going to buy some expensive gifts last night from Amazon. The website conked out, so I took it as Divine Intervention. Seems like I was right :)

1 0
Monday 8th August 2011 19:01 GMT Will Godfrey

-1

By George! The competition is getting savage these days. Two clouds knocked out by a third one.

0 0
Monday 8th August 2011 19:01 GMT Mike VandeVelde

clouds & social media

"using social media where its own service has failed"

But what if Twitter was down at the same time? I guess then you could say it would truly be time to panic? Because as long as you can tweet to your customers "we are aware that your service is currently unavailable" then you can say you've done everything you possibly could have with a straight face.

0 0
Monday 8th August 2011 20:47 GMT Jim O'Reilly

All eggs in one basket?

So, why are we surprised?

Clearly, the technology deployed so far in "the cloud" isn't resilient enough for catastrophic failure. The likely cause here was the use of transfer switches to change the power sourcing from the failed source to a new mains power feed. These are semiconductor devices, and are damageable by lightning strikes.

The solution for this has been around a few years. --- Remove the transfer switches and have two power sources on each server, each with its own PSU. Lightning might take out a few of the supplies on the A-feed, but the B-feed should keep on trucking.

Of course, this all stems from the concept of megadatacenters. Perhaps that's the crucial fallacy, as seen in the comms failure that downed Amazon a few months ago. Maybe smaller regional "bigadatacenters" might be better. The economics are surprisingly close.

0 0
Monday 8th August 2011 20:52 GMT Mr Young

Aaah - the Cloud

Best-named new technology ever!!! I sometimes wonder if some sarcastic bastard at DARPA thought that one up? I do struggle to find reasonable explanations for this insanity :)

0 0
Monday 8th August 2011 20:53 GMT Ben 5

I've said it once, I'll say it again...

Amazon EC2 provides the *infrastructure* on which you can build a redundant service.

They are virtual instances running on physical hardware, not much difference to any other machine running virtual machines.

The difference is that Amazon have availability zones in the same location, and other data centres around the planet that have exactly the same setup, and they provide a single supplier to deal with. So it's much easier to build something that has redundancy built-in. However *you* have to do the work for that.

The services that they offer themselves that do have redundancy (eg. S3) were not affected. To me this is a minor incident as it only affected one availability zone - the other was running fine. So the sites that are well engineered were unaffected. It's the people running just a single EC2 instance against all advice that were affected.

2 0
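Ben 5's point about availability zones can be illustrated with a toy placement model. A minimal sketch, assuming round-robin placement of identical instances across hypothetical zones named `zone-a` and `zone-b` (not real AWS zone identifiers); it counts how many instances survive when one whole zone is lost. A single instance leaves nothing running, while two or more spread across zones keep serving, which is the "well engineered" case he describes.

```python
from typing import List, Sequence


def spread_across_zones(n_instances: int, zones: Sequence[str]) -> List[str]:
    """Round-robin placement: instance i is placed in zones[i % len(zones)]."""
    return [zones[i % len(zones)] for i in range(n_instances)]


def surviving(placement: Sequence[str], failed_zone: str) -> int:
    """Number of instances still running after every instance in `failed_zone` is lost."""
    return sum(1 for zone in placement if zone != failed_zone)


if __name__ == "__main__":
    zones = ["zone-a", "zone-b"]  # hypothetical zone names
    for n in (1, 2, 4):
        placement = spread_across_zones(n, zones)
        print(n, placement, "survivors if zone-a fails:", surviving(placement, "zone-a"))
```

This models only the single-zone failure that Amazon's advice assumes; a failure that spans zones, as one reply in this thread alleges happened here, defeats it.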
1. Tuesday 9th August 2011 08:41 GMT Ken Hagan
  
  Re: I've said it once...
  
  "It's the people running just a single EC2 instance against all advice that were affected."
  
  That would include Amazon's own shopping site, then.
  
  0 0
2. Tuesday 9th August 2011 22:00 GMT Anonymous Coward
  
  EC2 is too unreliable for business use
  
  I don't know if my company has just been unlucky, but aside from this major outage (which is still classed by Amazon as a "performance issue!") we often have problems with servers locking up for no reason. When this happens, all you can do is reboot it and hope that fixes it. Disks sometimes just don't attach properly. It simply isn't a reliable technology.
  
  You say it's not much different to running on another virtual machine, but if I had hosting on another company's server and the power went out, I think they would get the power back and restart the server. Amazon, on the other hand, still haven't fixed many of the downed servers two days after the event. Their advice for recovering data just isn't working in many cases, as you can see from their own forum.
  
  You say we have to do the work for the redundancy, but if it's just people not setting things up correctly, why did all of Amazon's own sites in Europe go down for about an hour when this all happened?
  
  The truth is, Amazon say mirror stuff in different zones and you will be fine, and use their own DBMS. But although they don't admit it, it wasn't just one zone that was affected and their DBMS had serious problems too.
  
  I can't speak for other cloud systems, but I think EC2 is not suitable for critical business applications.
  
  0 0
Monday 8th August 2011 23:35 GMT rototype

I just wonder...

... if someone up there is trying to send us a message not to try to be so 'clever' at creating 'disaster-proof systems'. I wonder if the planet has had enough of being f****d around with. (And NO, I'm not an environmentalist.)

0 0
Tuesday 9th August 2011 08:24 GMT wsm

Not again

These cloud systems have fallen down more often than any data center I have been involved with. They must be too complex for the people running them--or have management decisions, influenced by marketing types, led to over-promising things that haven't been designed into the systems?

You wouldn't see these situations if the stuff they promised was actually functional.

1 0
Tuesday 9th August 2011 10:41 GMT MrCheese

Fsckin Cloud

"There is also a need to address business contingency on behalf of customers through the use of backup and mirrored facilities. That costs but it is a necessary cost and underlines the need for a web of alliances between application providers and cloud service and infrastructure providers to allow switching in the event of a failure"

So what you mean, luv, is that as well as throwing money at you and your marketing cronies for Cloud services, we should also build local redundant systems? What, you mean like the ones you're trying to convince us to move to the Cloud!!?!?!

Said it before, I'll say it again: it's just another fad designed to justify the ever-increasing size and scale of modern datacentres, and the claim that it is ever going to save a business money is utterly trite (especially if Ms Cloud is now saying you'd best have your own backup as we're fskin useless).

1 1
Monday 15th August 2011 09:06 GMT 5media

amazon goes under blackout

Amazon, a leading cloud provider, was disrupted in its service lately and caused discomfort to many websites!

http://cloudtechsite.com/news/amazon-cloud-undergoes-another-blackout.html

1 0



The Register Biting the hand that feeds IT


Copyright. All rights reserved © 1998–2024