Mega-Cloud Vendors Don't Get It
The pattern of outages at the mega-cloud vendors (Amazon, Google, Microsoft, etc.) is disturbing. First, there are the outages themselves. The June thunderstorm outage is an excellent case in point: no thunderstorm should take down a hardened data center. Why didn't they switch to diesel 30-45 minutes before the storms were obviously going to hit? A planned switch to diesel is far safer and more reliable than an automated emergency switch, and if the planned switch fails, they still have 30-45 minutes to get those diesels running or at least minimize the impact.
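The argument for a planned transfer is really a scheduling argument. Here is a minimal sketch of the idea, assuming a hypothetical weather feed and transfer-switch interface (none of these function names correspond to any vendor's real systems); the point is only that a failed planned transfer still leaves the full lead time to react:

```python
import time
from typing import Optional

# Illustrative thresholds and stubs only -- not any vendor's actual tooling.
TRANSFER_LEAD_MINUTES = 40        # switch to diesel 30-45 minutes ahead of the storm
RECHECK_INTERVAL_SECONDS = 300    # poll the weather feed every 5 minutes


def minutes_until_severe_weather() -> Optional[float]:
    """Stub: ETA in minutes of the nearest severe-weather cell, or None if clear."""
    return None  # replace with a real feed (weather-service alerts, lightning detection, etc.)


def initiate_planned_transfer_to_diesel() -> bool:
    """Stub: start the generators, verify they carry load, then open the utility feed."""
    print("Starting generators and performing a controlled transfer...")
    return True  # return False if the transfer fails


def storm_watch() -> None:
    while True:
        eta = minutes_until_severe_weather()
        if eta is not None and eta <= TRANSFER_LEAD_MINUTES:
            if not initiate_planned_transfer_to_diesel():
                # The whole point of the lead time: a failed planned transfer
                # still leaves ~30-45 minutes to troubleshoot or shed load
                # while utility power is still up.
                print("Planned transfer failed; escalating while utility power holds.")
            return
        time.sleep(RECHECK_INTERVAL_SECONDS)
```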
Second, a week on, they still have not published a cause for last week's outage, which I find particularly curious. There is a definite pattern of poor communication from these vendors during and after outages. In the June incident, I noticed hours between updates and skimping on details. Enterprise-class customers will not suffer this treatment for long.
Third, in many of these outages the sheer size of the environment seems to exacerbate the problem, triggering things like replication storms that back up and cause even more damage. They claim to have availability zones, yet some of these outages have affected multiple zones at once. Maybe they should further subdivide their environments and do a better job of isolating the pieces. I believe they also share management functions across their environments; maybe they should develop management zones aligned with their (ideally smaller) availability zones, and at least offer that as an option, even if they charge a little more for it. The sketch below illustrates the kind of isolation such management zones would buy.
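To make the isolation point concrete, here is a toy sketch assuming hypothetical names (ZoneControlPlane, Region, deploy); it only demonstrates the property that per-zone management survives a management failure elsewhere, and is not a description of any vendor's actual design:

```python
from dataclasses import dataclass, field


@dataclass
class ZoneControlPlane:
    """Management functions scoped to a single, smaller availability zone."""
    zone: str
    healthy: bool = True

    def deploy(self, workload: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"control plane for {self.zone} is down")
        return f"{workload} deployed in {self.zone}"


@dataclass
class Region:
    """A region is just a collection of zones, each with its own control plane."""
    zones: dict = field(default_factory=dict)

    def deploy(self, workload: str, zone: str) -> str:
        # A control-plane failure in one zone cannot block work in another zone,
        # which is exactly the isolation that shared management functions give up.
        return self.zones[zone].deploy(workload)


region = Region(zones={z: ZoneControlPlane(z) for z in ("zone-a", "zone-b")})
region.zones["zone-a"].healthy = False                 # simulate a management failure
print(region.deploy("billing-service", "zone-b"))      # the other zone still works
```

The same reasoning is why a shared, region-wide management plane turns a local fault into a regional one: every zone is downstream of the same failure.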
To me, this all points to the immaturity of the current cloud landscape, and it will drive enterprises toward internal private clouds for the near and mid term, until the vendors get their acts together. It might also drive enterprises to the clouds offered by the traditional outsourcing vendors, who understand the enterprise market better.
Time for the cloud industry to do some soul-searching.