Amazon has finally resolved the lingering issues from the outage last weekend, but the root cause of the power failure has yet to be determined by energy provider ESB Networks. The web giant said at the time that lightning had struck a transformer in Dublin on Sunday, which knocked out all power provision including the back-up …
Lulzsec is now hiring out the IRA to perform dirty work on transformers and generators...
So ... ESB scrambled EBS?
The name seems apt given what the power outage did to one of my EBS volumes. Quite how Amazon's power systems screwed up this badly is a tougher question though: why on earth did all the backup power systems fail completely? And why were the EBS volumes corrupted by the power failure? Presumably it was something to do with the block replication logic mangling write ordering; if the volume had just stopped abruptly at any point, respecting write barriers, it would have been no different from a regular power failure: a quick journal replay, an fsck/chkdsk for lesser filesystems and we'd all have been back online in minutes.
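To illustrate the ordering point, here is a toy sketch of journal replay. It is only a model under stated assumptions: the record layout and the names `replay` and `crash_after` are made up for illustration, and nothing here reflects how EBS or any real filesystem is implemented. A crash that cuts the journal at any point still replays to a consistent state, but if a commit record lands ahead of its data (a barrier not respected), replay yields a half-applied transaction.

```python
# Toy model: the journal is a list of records that reach the disk in order.
# ("data", txid, key, value) stages a change; ("commit", txid) makes it durable.

def replay(journal):
    """Apply only transactions whose commit record made it to disk."""
    state = {}
    pending = {}
    for record in journal:
        kind, txid = record[0], record[1]
        if kind == "data":
            _, _, key, value = record
            pending.setdefault(txid, []).append((key, value))
        elif kind == "commit":
            # Everything staged for this transaction is applied together.
            for key, value in pending.pop(txid, []):
                state[key] = value
    return state


def crash_after(journal, n):
    """Simulate power loss: only the first n records reached the disk."""
    return journal[:n]


journal = [
    ("data", 1, "balance_a", 50),
    ("data", 1, "balance_b", 150),
    ("commit", 1),
]

# With ordering respected, every crash point gives all-or-nothing.
for n in range(len(journal) + 1):
    print(n, replay(crash_after(journal, n)))

# If the commit record is reordered ahead of some data, replay is torn.
reordered = [journal[0], journal[2], journal[1]]
print("reordered:", replay(crash_after(reordered, 2)))
```

In the ordered case the result is either empty or the complete transaction; in the reordered case only `balance_a` is applied, which is the kind of inconsistency a plain journal replay cannot repair.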
Am I missing something?
Is Amazon running their datacenter off of an extension cord plugged into a neighboring building? I can't see any reason a single exploding pole-pig could take down a data center.
Presumably both redundant rings inside the DC took their power feed from the same sub-station, assuming the usual DC design of 2x PSU per node.
Not sure about the UPS side - the backup generators normally run on diesel, with capacity for a week or two.
Seems a bit iffy.
As with all hosting suppliers or telco feeds, east/west is the usual game, with diverse suppliers and fibre paths in different trays.
I always ask for the blueprints first.
But like they say - either they don't know the root cause yet, or they are stalling until they can come up with something creative. All the best lies are founded in truth.
Yes, I agree with James 100. I think there is more to this than meets the eye, some ass covering involved. I have a client who I have spent considerable time and effort setting up EBS images for. Now they are all screwed up. This is not good enough.
This continuous need to design in your own resilience.
Even with the advances in technology which are touted in the cloud, we still need to think long and hard about making our sites resilient. The most resilient option would be a site hosted across multiple cloud providers, which is possible but also expensive. There's also the issue of the differing services, and the differing means of accessing them, between providers. There are no standards here yet, so you must write separate provisioning scripts for each provider or buy someone's middleware software (I'm a fan of do-it-yourself and free software). Perhaps we'll see some standards emerge over the next few years, at least among similar types of cloud platform, such as infrastructure clouds like Amazon's AWS and Rackspace. Don't hold your breath though! I have been exploring the use of Amazon AWS for hosting a website and have posted advice and free scripts at http://www.practicalclouds.com
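As a rough sketch of the do-it-yourself route, one way to keep per-provider scripts manageable is to hide each provider behind a small common interface. Everything below is hypothetical: `CloudProvider`, `FakeProviderA`, `FakeProviderB` and `provision_everywhere` are illustrative names, and the fake adapters only return made-up IDs where a real one would call that provider's own API or command-line tools.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Minimal common interface that each provider adapter must implement."""

    name = "unnamed"

    @abstractmethod
    def launch_server(self, image, size):
        """Start a server and return a provider-specific ID."""


class FakeProviderA(CloudProvider):
    # Stand-in for a real adapter; no real API is called here.
    name = "provider-a"

    def __init__(self):
        self.count = 0

    def launch_server(self, image, size):
        self.count += 1
        return f"a-{image}-{size}-{self.count}"


class FakeProviderB(CloudProvider):
    # A second stand-in with its own (made-up) ID format.
    name = "provider-b"

    def __init__(self):
        self.count = 0

    def launch_server(self, image, size):
        self.count += 1
        return f"b{self.count}:{image}:{size}"


def provision_everywhere(providers, image, size):
    """Launch the same logical server on every provider."""
    return {p.name: p.launch_server(image, size) for p in providers}


result = provision_everywhere(
    [FakeProviderA(), FakeProviderB()], "web-image", "small"
)
print(result)
```

The provisioning logic is written once against `CloudProvider`, and only the adapters change per provider. It doesn't remove the cost of running in two places, and generic names like "small" would still need mapping to each provider's own sizes.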