The Register® — Biting the hand that feeds IT

Web startups crumble under Amazon S3 outage

William Vambenepe

Cloud computing or fog computing? 

The lack of visibility is a problem indeed. In the long term, what really matters is not just to post updates on a status blog. It is to provide machine-readable management data that can integrate with the customer's IT management tools (once these tools are aware of the utility computing aspects of the infrastructure). More at http://stage.vambenepe.com/archives/165

Ben

Details from Amazon 

Go

Details of what caused the outage were also posted to the blog:

http://developer.amazonwebservices.com/connect/message.jspa?messageID=79982#79982

Mats Koraeus

SLA 

Flame

Weeell... their SLA promises 99.9% uptime/month, and this little outage accounts for what? 0.3%? That's a 10% service credit payout coming everyone's way. Woohoo!

Poor Amazon ;D
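For anyone who wants to check that back-of-the-envelope sum, here is a rough sketch in Python. It only uses the figures quoted above (a 99.9% monthly target, roughly 0.3% downtime, a 10% credit). The function names, the 30-day month and the single credit tier are assumptions for illustration; the real SLA may measure uptime differently and may define other tiers.

```python
# Rough sketch only: helper names, the 30-day month and the single
# credit tier are assumptions, not the actual terms of Amazon's SLA.

def monthly_uptime_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Return uptime as a percentage of the total minutes in the month."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


def service_credit_percent(uptime_percent: float,
                           target_percent: float = 99.9,
                           credit_percent: int = 10) -> int:
    """Return the credit owed: nothing if the target is met, else the flat credit."""
    if uptime_percent >= target_percent:
        return 0
    return credit_percent


# 0.3% of a 30-day month (43,200 minutes) is about 129.6 minutes of downtime.
downtime = 0.003 * 30 * 24 * 60
uptime = monthly_uptime_percent(downtime)
print(f"uptime: {uptime:.2f}%, credit: {service_credit_percent(uptime)}%")
```

On those assumed numbers, a bit over two hours of downtime in a month is enough to drop below the 99.9% target.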

theotherone

yup.. 

Paris Hilton

yup, it took out second "get a" life files too....so many SL'ers weren't able to "get it on" yesterday..

P.H cause...well it's obvious really...she always gets it on...

David Wiernicki

The things we lack 

It is my opinion that the lack of personal plugs on the Reg is indeed a problem as well, and I applaud Mr. Vambenepe for doing his part to remedy the situation...

Anonymous Coward

Get real 

Stop

No ISP/ISV will *ever* provide real-time data from their monitoring systems to their customers. Such data will always be vetted by their systems/network engineers prior to "disclosure". And that takes time, so get used to it. (Real-life ain't CS skool.)

lord_farquaad

Get real real 

Unhappy

Anonymous Coward wrote :

"No ISP/ISV will *ever* provide real-time data from their monitoring systems to their customers. Such data will always be vetted by their systems/network engineers prior to "disclosure". And that takes time, so get used to it. (Real-life ain't CS skool.)"

Shall I conclude that the real-time data my hosting company is sending me is fake?

Steve Roper

We apologise for the inconvenience 

Flame

When things in IT go pear-shaped, the last thing you want to do is waste time prepping press releases and updates, time better spent ACTUALLY FIXING the problem. You post a message along the lines of "We're currently experiencing technical difficulties. Our technicians are working to resolve the issue, and we will restore service as soon as possible. We apologise for the inconvenience." Then you knuckle down to fixing it.

Distracting the techs with "How long is it going to be" questions simply pisses the techs off and makes it take even longer. People need to realise that when things like this happen, there IS a team of technicians working their bloody arses off to get the damn thing going again. The last thing we need is to be constantly harassed over how long it's going to take, or having some benighted suit threatening to relieve us of our jobs if the problem isn't fixed within the hour. Great - he's going to sack the people who are fixing the problem and take weeks more to hire and train our replacements. Believe me, we want it fixed every bit as badly as you do, so we can put our feet back up and go back to reading and posting on El Reg!

It takes as long as it takes to isolate the problem and develop and implement a solution. Screaming and jumping up and down and threatening to go elsewhere or sack people isn't going to make it go any faster; we're only human and we can only think and act so fast.

So the next time you see an "apologise for the inconvenience" message, picture in your head a bunch of stressed-out geeks running around pulling apart circuitry and frantically coding patches with a stick-wielding boss standing over them threatening to destroy their livelihoods. And CHILL. Don't ask us how long it's going to take. It'll be fixed as quickly as we damn well can fix it!

michael Weissman

Amazon suffers Paradox of Excellence 

The book The Paradox of Excellence explains well what Amazon is going through right now.

The paradox of excellence is that the better you perform, the more invisible your performance becomes, except when there is bad news. Clearly, Amazon Web Services has just delivered bad news: a major service outage. Right now, people are focused on the bad news instead of the otherwise outstanding performance.

Amazon's mistake has been to assume that customers will always value outstanding performance. They don't. They need to be reminded. Amazon's value needs to be constantly reinforced in developers' minds.

Of course, Amazon needs to fix the issues that allowed this outage to happen. But of equal importance, Amazon needs to communicate its value more effectively, too.

Jon

"and the S3 rep didn't post again until they resolved the issue." 

maybe because they were focusing on fixing it?

(I see Steve Roper beat me to it, agree totally)

Magnus

To the two techies above 

When you have to decide as a company how to respond to an outage, having a rough idea of how long it is going to last is essential. That's why customers demand ETAs and some feedback; it isn't just to make your day even more miserable.

Now, I'd also say that there should be a good CS/CR team fielding the flak and keeping customers updated, so that the techies can focus on fixing things. An information blackout is never acceptable though.

Simon B

well done to Amazon for resolving the outage quickly 

Thumb Up

Yes, feedback could have been better, BUT I say well done to Amazon for getting it back up so quickly. All too often I read about outages that go on, and on, and on, and when it says it's been restored, lots of people shout "oh no it hasn't!"

So well done Amazon on getting it back up so quickly.

Patrick O'Reilly

Cloud Computing eh... 

Perfect timing for Sun to swoop in with their new offering.