Amazon Web Services was struck by a temporary outage today, dragging the thousands of web sites which rely on its hosted storage down with it. Reports of content stored on Amazon's Simple Storage Service (S3) being unavailable or performing poorly began appearing around 5AM PST. Hardest hit by the outage was the crowd of "web 2 …
Cloud computing or fog computing?
The lack of visibility is indeed a problem. In the long term, what really matters is not just posting updates on a status blog but providing machine-readable management data that can integrate with the customer's IT management tools (once those tools are aware of the utility computing aspects of the infrastructure). More at http://stage.vambenepe.com/archives/165
Details from Amazon
Details of what caused the outage were also posted to the blog:
Weeell... their SLA promises 99.9% uptime per month, and this little outage accounts for what? 0.3%? That's a 10% service credit payout coming everyone's way. Woohoo!
Poor Amazon ;D
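The arithmetic in the comment above can be checked with a quick sketch. It assumes a 30-day month and treats uptime as plain wall-clock time; the 99.9% threshold and the 10% credit are taken from the comment, and the function names are hypothetical. Amazon's actual SLA may define and measure uptime differently, and may have more than one credit tier.

```python
# Hypothetical helpers for checking an uptime SLA. The 99.9% threshold and
# 10% credit come from the comment above; they are assumptions, not a
# transcription of Amazon's actual SLA terms.

MINUTES_PER_DAY = 24 * 60


def monthly_uptime_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Uptime as a simple wall-clock fraction of the month, in percent."""
    total = days_in_month * MINUTES_PER_DAY
    if not 0 <= downtime_minutes <= total:
        raise ValueError("downtime must be between 0 and the length of the month")
    return 100.0 * (total - downtime_minutes) / total


def allowed_downtime_minutes(threshold: float = 99.9, days_in_month: int = 30) -> float:
    """Downtime budget (in minutes) before uptime drops below the threshold."""
    total = days_in_month * MINUTES_PER_DAY
    return total * (100.0 - threshold) / 100.0


def service_credit_percent(uptime_percent: float, threshold: float = 99.9,
                           credit: float = 10.0) -> float:
    """Single-tier credit: pay the full credit if uptime falls below the threshold."""
    return credit if uptime_percent < threshold else 0.0


if __name__ == "__main__":
    # An outage of 0.3% of a 30-day month is 0.003 * 43,200 = 129.6 minutes.
    uptime = monthly_uptime_percent(129.6)
    print(round(uptime, 2))                        # prints 99.7
    print(service_credit_percent(uptime))          # prints 10.0
    print(round(allowed_downtime_minutes(), 1))    # prints 43.2
```

On these assumptions, the whole month's downtime budget at 99.9% is about 43 minutes, so an outage of the size the commenter estimates (roughly 130 minutes) uses up that budget three times over.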
Yup, it took out Second "get a" Life's files too... so many SL'ers weren't able to "get it on" yesterday.
P.H. 'cause... well, it's obvious really... she always gets it on...
The things we lack
It is my opinion that the lack of personal plugs on the Reg is indeed a problem as well, and I applaud Mr. Vambenepe for doing his part to remedy the situation...
No ISP/ISV will *ever* provide real-time data from their monitoring systems to their customers. Such data will always be vetted by their systems/network engineers prior to "disclosure". And that takes time, so get used to it. (Real-life ain't CS skool.)
Get real real
Anonymous Coward wrote :
"No ISP/ISV will *ever* provide real-time data from their monitoring systems to their customers. Such data will always be vetted by their systems/network engineers prior to "disclosure". And that takes time, so get used to it. (Real-life ain't CS skool.)"
Shall I conclude that the real-time data my hosting company is sending me is fake?
We apologise for the inconvenience
When things in IT go pear-shaped, the last thing you want to do is waste time prepping press releases and updates, time better spent ACTUALLY FIXING the problem. You post a message along the lines of "We're currently experiencing technical difficulties. Our technicians are working to resolve the issue, and we will restore service as soon as possible. We apologise for the inconvenience." Then you knuckle down to fixing it.
Distracting the techs with "How long is it going to be" questions simply pisses the techs off and makes it take even longer. People need to realise that when things like this happen, there IS a team of technicians working their bloody arses off to get the damn thing going again. The last thing we need is to be constantly harassed over how long it's going to take, or having some benighted suit threatening to relieve us of our jobs if the problem isn't fixed within the hour. Great - he's going to sack the people who are fixing the problem and take weeks more to hire and train our replacements. Believe me, we want it fixed every bit as badly as you do, so we can put our feet back up and go back to reading and posting on El Reg!
It takes as long as it takes to isolate the problem and develop and implement a solution. Screaming and jumping up and down and threatening to go elsewhere or sack people isn't going to make it go any faster; we're only human and we can only think and act so fast.
So the next time you see an "apologise for the inconvenience" message, picture in your head a bunch of stressed-out geeks running around pulling apart circuitry and frantically coding patches with a stick-wielding boss standing over them threatening to destroy their livelihoods. And CHILL. Don't ask us how long it's going to take. It'll be fixed as quickly as we damn well can fix it!
Amazon suffers Paradox of Excellence
The book The Paradox of Excellence explains well what Amazon is going through right now.
The paradox of excellence is that the better you perform, the more invisible your performance becomes; only bad news gets noticed. Amazon Web Services has just delivered bad news: a major service outage. Right now, people are focused on the bad news instead of the otherwise outstanding performance.
Amazon's mistake has been to assume that customers will always value outstanding performance. They don't. They need to be reminded. Amazon's value needs to be constantly reinforced in developers' minds.
Of course, Amazon needs to fix the issues that allowed this outage to happen. But of equal importance, Amazon needs to communicate its value more effectively, too.
"and the S3 rep didn't post again until they resolved the issue."
maybe because they were focusing on fixing it?
(I see Steve Roper beat me to it; agree totally.)
To the two techies above
When you have to decide, as a company, how to respond to an outage, having a rough idea of how long it is going to last is essential. That's why customers demand ETAs and some feedback; it isn't just to make your day even more miserable.
Now, I'd also say that there should be a good CS/CR team fielding the flak and keeping customers updated, so that the techies can focus on fixing things. An information blackout is never acceptable though.
well done to Amazon for resolving the outage quickly
Yes, feedback could have been better, BUT I say well done to Amazon for getting it back up so quickly. All too often I read about outages that go on and on and on, and when they say it's been restored, lots of people shout "oh no it hasn't!"
So well done Amazon on getting it back up so quickly.
Cloud Computing eh...
Perfect timing for Sun to swoop in with their new offering.