AWS celebrates Labor Day weekend by roasting customer data in US-East-1 BBQ

A power outage fried hardware within one of Amazon Web Services' data centers during America's Labor Day weekend, causing some customer data to be lost. When the power went out, and backup generators subsequently failed, some virtual server instances evaporated – and some cloud-hosted volumes were destroyed and had to be …

  1. jake Silver badge

    But everything's OK.

    They had it all backed up to the cloud.

    Right? RIGHT???

    1. Nolveys

      Re: But everything's OK.

      The "lost" data are still in the cloud. It's just that the cloud is black, made of smoke and is floating around a data centre somewhere.

    2. lglethal Silver badge
      Trollface

      Re: But everything's OK.

      But but...CLOUD!! *mumble mumble* Something something redundancy... something something backup... something something disaster recovery.... CLOUD!!

    3. wcpreston

      Re: But everything's OK.

      No. Customers who lost data on their EBS volumes did NOT have it backed up to anywhere – including the cloud. EBS is nothing more than very resilient block storage. Nowhere in its service description or SLA does it imply that you do not need to back it up. In fact, in the service description it blatantly states you can expect to lose 1 or 2 volumes a year if you have 1000 volumes. ... so you should use EBS snapshots and back it up.

      Customers that lost data had no backup of their EBS data.

      1. Mark 65

        Re: But everything's OK.

        Err, no. Customers who lost data most likely have backups, just not real-time ones. If you backed up at midnight and had a failure at midday, for example, there will be a portion of your data that is not backed up, hence you suffer data loss. It wasn't because you don't have backups though, just the timeliness of them.
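The midnight-backup, midday-failure case above can be put into numbers. This is only a rough sketch: the function name and the dates are hypothetical, chosen for illustration, and it assumes the last backup completed successfully and is restorable.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much recent work is not covered by the last backup.

    Hypothetical helper: assumes the last backup is complete and restorable.
    """
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


# Illustrative dates only: backup at midnight, failure at midday the same day.
last_backup = datetime(2019, 9, 1, 0, 0)
failure = datetime(2019, 9, 1, 12, 0)
print(data_loss_window(last_backup, failure))
```

With a fixed backup schedule, the worst case is one full backup interval, so shortening the interval is the only way to shrink the window.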

        1. Anonymous Coward

          Re: But everything's OK.

          That is not accurate, there was real data loss: https://www.bleepingcomputer.com/news/technology/amazon-aws-outage-shows-data-in-the-cloud-is-not-always-safe/

  2. Ian Michael Gumby
    Boffin

    Outch!

    Hate to be the guy who has to tell his non-technical boss that going to the clouds just cost them a bunch of data that they can't recover.

    1. jake Silver badge

      Re: Outch!

      Or perhaps he could say "I told you so!". Has worked for me several times, albeit as a consultant, not an employee. It's sweet. Very sweet.

  3. Anonymous Coward

    What's Corey going to do?

    Can't wait for @quinnypig to awkwardly hyperventilate "Well, at least 99.5% survived".

  4. Anonymous Coward

    Idiots!

    We all know that this kind of thing can only happen if you use the cloud.

    1. Anonymous Coward

      Re: Idiots!

      Yes, because no one's ever had a disk subsystem fried by a power issue in the history of the world before cloud.

      1. EH

        Re: Idiots!

        r/woosh ?

  5. Paul

    Using the cloud doesn't absolve you of the need to design your platform

    I've heard so many times that you can migrate your on-premise servers to the cloud in a more or less 1:1 mapping and let your cloud provider do all the work of maintaining uptime and data integrity.

    And yet again we have proof that you still have to put in the effort to ensure you have geographically diversified replication and backups.

    1. wcpreston

      Re: Using the cloud doesn't absolve you of the need to design your platform

      Yes, you can migrate your on-premises systems to EC2 in a 1:1 mapping -- as long as one of those servers is the backup server. ;) (Or, of course, use the cloud equivalent of such.) These customers migrated but left backup out of the equation.

      1. Olivier2553

        Re: Using the cloud doesn't absolve you of the need to design your platform

        Or maybe they migrated because they had no backup to start with and were made to believe that now they would not need to even consider having any.

        1. wcpreston

          Re: Using the cloud doesn't absolve you of the need to design your platform

          "Made to believe." Not by the product description or the SLA, that's for sure. The product description comes right out and says you will lose 1 or 2 volumes a year if you have 1000 volumes -- so back up. The SLA does not imply anything about backup.

          So if they were "made to believe" that, they didn't get it from Amazon.

  6. JacobZ
    Joke

    Convenience of the cloud

    In the old days we all had to be constantly alert for the possibility of power loss, network outages, server failures, and other physical disasters.

    Nowadays you can pay a Cloud vendor to provide them for you.

  7. Tom 38

    Gonna get some downvotes for this, but if you use EBS, that sort of failure is to be expected. Amazon say if you have 1000 EBS volumes running for a year, you should expect one or two to fail and have to be restored from backup. Those numbers are obviously averages across all AWS DCs, whilst problems tend to be concentrated at particular DCs.

    If you put data in there that you absolutely must have restored, you should either use a different kind of storage or take snapshots as regularly as you need them to be. EBS is the equivalent of local disk storage; it's not cross-AZ. If you require proper resilience in the cloud, you should be using something like S3, and design your systems appropriately to be able to use that sort of storage.
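The "one or two in 1000 per year" figure quoted above works out as follows. This is a back-of-the-envelope sketch only: it treats the figure as an annual failure rate of 0.1% to 0.2% per volume and assumes failures are independent, which is optimistic given that problems tend to be concentrated at particular DCs. The function names are made up for illustration.

```python
def expected_failures(volumes: int, afr: float) -> float:
    """Expected number of volume failures per year at a given annual failure rate."""
    return volumes * afr


def prob_at_least_one_failure(volumes: int, afr: float) -> float:
    """Chance that at least one volume fails in a year.

    Assumes independent failures, which a data-centre-wide power event violates.
    """
    return 1.0 - (1.0 - afr) ** volumes


# 0.1% and 0.2% correspond to "one or two failures per 1000 volumes per year".
for afr in (0.001, 0.002):
    expected = expected_failures(1000, afr)
    chance = prob_at_least_one_failure(1000, afr)
    print(f"AFR {afr:.1%}: expect {expected:.1f} failures, "
          f"{chance:.0%} chance of at least one")
```

Even at the lower rate, a fleet of 1000 volumes is more likely than not to lose at least one in a year, which is why snapshots are the customer's job.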

    1. sabroni Silver badge

      re: if you use EBS, that sort of failure is to be expected.

      No Tom, just stop it!

      This thread is for ill-informed rants about "the cloud is just someone else's computer". Someone who uses the cloud, understands exactly what this storage is supposed to provide and then points it out is gonna get short shrift.

      1. Doctor Syntax Silver badge

        Re: re: if you use EBS, that sort of failure is to be expected.

        Given that Cloud is sold to manglements on the basis that it takes away all those complications of dealing with their in-house expert staff and hands it over to people who'll just do the work without arguing, those rants seem fully justified.

        It is somebody else's computer. When using your own computers you expect someone on your staff to look after them. If you've been persuaded to use somebody else's because it's cheaper you might reasonably expect that somebody else to do the looking after. Anything else smacks of keeping a dog and barking yourself.

        1. wcpreston

          Re: re: if you use EBS, that sort of failure is to be expected.

          "Manglements" should at least know how to read an SLA. That's what they are good at. Someone somewhere should mention that this doesn't include backup, and if they're doing their job then they would double-check that. And if they double-checked it, they would find out they are responsible for backup.

          1. Anonymous Coward

            Re: re: if you use EBS, that sort of failure is to be expected.

            Manglements *should* know how to do a lot of things.

            Reality differs.

          2. DJV Silver badge

            Re: re: if you use EBS, that sort of failure is to be expected.

            ""Manglements" should at least know how to read"

            Stopping right there explains many of the problems.

    2. Blane Bramble

      This sounds like EBS failed and wasn't even locally redundant though. That is a much bigger problem if true.

      1. Mr.Nobody

        EBS is a very secret system that no one is allowed to understand. The only details AWS will provide about it are its general service and uptime/redundancy, but you aren't allowed to know how it works or how it's redundant.

        I can't understand why anyone in technology wouldn't want to understand how it works, but all these developers and PHBs seem to be fine with not knowing.

        I had a days-long discussion with a developer at one point, explaining to them that I have never had a storage failure on a RAID system in the more than 20 years I have been doing this at many different orgs, some of them Fortune 500. When it finally dawned on him that it's not normal for businesses to incur data loss due to disk failure, he was shocked (because he was a developer and just thought about things related to what he knew, like his desktop computer).

        No offence to any devs out there...

  8. Doctor Syntax Silver badge

    It gives "Availability Zone" a whole new layer of meaning.

  9. Anonymous Coward

    EC2 backup and replication are easy and cheap. It has been around for more than 3 years. Nakivo makes a great EC2 B&R tool.

    A customer, not a Nakivo employee.

  10. IGnatius T Foobar !

    Get out

    Just one more example of why one company shouldn't have so much of the IT world in its own data centers. Get out of Amazon and find a smaller cloud provider. Diversity is what keeps the Internet reliable. Too much in one place, and you have this kind of problem.

  11. steviebuk Silver badge

    Well then...

    ...that's 'cause you get managers who say "I want to be infrastructure free. It will save us loads of money" and, despite being told it won't and there is no such thing as infrastructure free, they want stuff done on the cheap. So they think moving to the Cloud means the service will do everything for you backup-wise, not understanding you have to set that up yourself and pay for it.

    1. Anonymous Coward

      Re: Well then...

      Are you sure you didn't mean to type "[they] want to be serverless, it will save [...] lots of money"
