Amazon's AWS S3 cloud storage evaporates: Top websites, Docker stung

Amazon Web Services is scrambling to recover from a cockup at its facility in Virginia, US, that is causing its S3 cloud storage to fail. The internet giant has yet to reveal the cause of the breakdown, which is plaguing storage buckets hosted in the US-East-1 region. The malady kicked off around 0944 Pacific Time (1744 UTC) …

  1. Andy the ex-Brit
    Mushroom

    Strava

    Strava is down due to this! How can I check how many miles I've ridden so far this month?

    1. Hedley Phillips

      Re: Strava

      If it's not on Strava it didn't happen

    2. BongoJoe

      Re: Strava

      Has the Ordnance Survey site gone down?

  2. Steve Davies 3 Silver badge
    Paris Hilton

    But....

    Isn't the selling point of all this cloudy stuff that it does not go down???????

    I guess the AWS cloud must have pissed down on someone until all the clouds disappeared.

    Paris because she is good at shedding tears.

    1. Just a geek
      Mushroom

      Re: But....

      Too many people (non IT folk) seem to think that the cloud is this magical place that never has an issue. No matter how many outages Amazon, Azure, etc have, people still seem to think that it's made of magic.

      Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure.

      1. bombastic bob Silver badge
        Devil

        Re: But....

        "Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure."

        always good advice.

        /me uses github. that's cloudy enough.

      2. Doctor Syntax Silver badge

        Re: But....

        "Deploy in the cloud by all means but still backup, replicate"

        We used to call that keeping a dog and barking yourself.

      3. Anonymous Coward
        Anonymous Coward

        @Geek

        "Too many people (non IT folk) seem to think that the cloud is this magical place that never has an issue."

        True, but whose fault is that? Isn't this exactly their whole selling point to begin with?

        I also don't think you should dismiss the whole argument that easily, because when properly set up you can get a redundant environment if you want to. The fact that it now doesn't work this way at AWS tells me more about their infrastructure than the (in)abilities of virtualized hosting.

      4. Dan 10

        Re: But....

        "Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure."

        Unfortunately, that is what they've done. This fault affects a specific region, each of which contains multiple availability zones. Each zone constitutes a logical datacentre, comprising multiple physical datacentres (between 3 and 6 in each AZ, I believe). Deployment across two or more AZs in a given region *is* removing the single points of failure. Supposedly. Didn't work this time.

        AWS don't particularly recommend deploying across more than one region, because each region is effectively a completely different cloud, common in branding, usage etc, but connected only via the public internet. Replication between zones within a region is fast and free, but replication between regions is slower and costs.

        Ultimately though, a well-designed AWS deployment, consisting of all the fault-tolerant bells and whistles, still has no upfront cost and is thus way more achievable than doing it on-prem. Said bells/whistles will make nuclear outages like this the cause of the rare downtime you do get.

    2. FIA Silver badge

      Re: But....

      Isn't the selling point of all this cloudy stuff that it does not go down???????

      No.

      It's that 'IT stuff' has become a utility, as in you only pay for what you use.

      This means you can build highly resilient and/or scaleable systems without huge upfront costs.

      Doesn't mean people do though. ;)

      1. Anonymous Coward
        Anonymous Coward

        Re: But....

        Fact is any business running anything critical to the business on other people's servers better have a contract guaranteeing they get back more than the down time costs (goodwill for example ain't cheap) or the people responsible are simply shirking their fiduciary duty to the company.

        1. Adam 52 Silver badge

          Re: But....

          "better have a contract guaranteeing they get back more than the down time costs"

          Why? If the downtime is less than you'd get elsewhere, or if the savings are more than the cost or if the faster time to market means you make massively more than the cost then you're still up.

        2. FIA Silver badge

          Re: But....

          Fact is any business running anything critical to the business on other people's servers better have a contract guaranteeing they get back more than the down time costs (goodwill for example ain't cheap) or the people responsible are simply shirking their fiduciary duty to the company.

          No, that's exactly the opposite of what you should be doing, you're looking to apportion blame after the fact. This is little use if your business has gone bust due to the downtime. Better to design systems that minimise the risk of this happening in the first place.

          Using the cloud allows you to build complex systems with little upfront cost.

          That's it.

          This does mean that smaller companies can build an infrastructure that's distributed and resilient in a way that wasn't financially feasible 10-15 years ago; and larger companies can potentially significantly reduce their DR expenditure.

          It doesn't mean it'll never fail or require administration or backup or all the other things you should be doing with an IT infrastructure. It just means you don't spend a boatload upfront on kit.

          1. Anonymous Coward
            Anonymous Coward

            Re: But....

            >It just means you don't spend a boatload upfront on kit.

            And generally have less say on how things are set up and run. Which is fine I guess for some but I personally wouldn't work for a company where I was responsible for production mission critical software running on systems not owned by my company, with a contract or not. The edge to building a lifetime of skills is getting a say directly and indirectly on such matters.

            1. Anonymous Coward
              Anonymous Coward

              Re: But....

              That said the cloud has its purposes. Definitely a cost saver for non mission critical non proprietary stuff. Still when internal manufacturing is your core mission the cloud is more a distraction for the bean counters than something to look forward to.

            2. Anonymous Coward
              Anonymous Coward

              Re: But....

              "It just means you don't spend a boatload upfront on kit."

              That is understating it.

              One of the huge advantages of public cloud is that you pay for actual utilization vs scaling to peak. That is huge. It would be worth using public cloud just for that benefit. As anyone who has ever sized on prem infrastructure knows, you scale to peak (meaning that you are paying for infrastructure every day as though it is the busiest day in the history of the company, even though most days are not the busiest day in the history of the company) and then you add 20% to the sizing because no one can be certain that the peak will not increase at some point and you cannot just elastically add scale. That equals many, many billions of dollars every year in infrastructure which is purchased and never or very rarely used.
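              To put rough numbers on the "scale to peak, then add 20%" point, here is a minimal sketch. The demand figures and function names are made up for illustration; only the 20% headroom comes from the comment above.

```python
def onprem_capacity(daily_demand, headroom=0.20):
    """Capacity bought up front: the peak day plus a safety margin."""
    return max(daily_demand) * (1 + headroom)


def utilisation(daily_demand, headroom=0.20):
    """Fraction of purchased on-prem capacity actually used over the period."""
    capacity = onprem_capacity(daily_demand, headroom)
    provisioned = capacity * len(daily_demand)  # paid for every day, busy or not
    used = sum(daily_demand)                    # what pay-per-use would bill for
    return used / provisioned


# Hypothetical week: six quiet days and one peak day (arbitrary units).
week = [100, 100, 100, 100, 100, 100, 400]
print(f"capacity bought: {onprem_capacity(week):.0f} units")
print(f"utilisation:     {utilisation(week):.1%}")
```

              The spikier the demand, the lower the utilisation figure, which is the gap the commenter is describing. The sketch ignores price differences per unit between on-prem and cloud.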

          2. Doctor Syntax Silver badge

            Re: But....

            "It just means you don't spend a boatload upfront on kit."

            It also means your interests aren't necessarily at the front of the queue when it comes to recovering from this sort of (not) outage.

      2. Doctor Syntax Silver badge

        Re: But....

        "Doesn't mean people do though."

        Maybe because it's been sold as cheaper than running your own data centre.

        When IT try to persuade the business to make provision for this sort of thing it's probably dismissed as IT being profligate again or even IT trying to bump up costs so their own service is still competitive.

    3. macjules

      Re: But....

      Guys, EVERYTHING goes down on you at sometime or another.

      1. Anonymous Coward
        Anonymous Coward

        Re: But....

        >Guys, EVERYTHING goes down on you at sometime or another.

        Of course, but when you have a good working personal relationship with gentlemen equally professional to yourself, with badges that contain only a slightly different number to yours, it causes a lot less panic. It is much easier to contact exactly the right people at exactly the right time and get the answers you can count on and the service you need without, as others say, having to worry about whether someone is putting your company's interests first. If this is not the case with your company then you should start thinking about finding a new company.

        1. Anonymous Coward
          Anonymous Coward

          Re: But....

          >Guys, EVERYTHING goes down on you at sometime or another.

          Network goes down and occasionally hardware goes down but fun fact even after years of supporting it I have never seen an HP-UX OS crash due to software ever. Of course thanks to Red Hat and cheap commodity hardware rising (and not giving 2 shits about POSIX) and HP squeezing its last few customers I do probably sadly see more Linux kernel panics in my future sigh.

        2. macjules

          Re: But....

          Two words: Tier Caching. All our US sites use S3 in W.VA but not one was affected.

          1. Dan 10

            Re: But....

            @macjules

            Caching in the Cloudfront sense, or within S3 itself?

    4. TheVogon

      Re: But....

      No - that's never been the claim of cloud. They specifically tell you it's not 100% guaranteed. That's why anything that matters should be designed not to rely on a single cloud region....

    5. Anonymous Coward
      Anonymous Coward

      Re: But....

      Yes, exactly. All our deployment and storage services are dependent on S3 or S3 backed apps and were all critically impacted but you wouldn't have noticed because our cloud based infrastructure was spread over many zones with enough resources (and cache) to weather the storm. A fortune 500 company managing many hundreds of web services.

      1. Anonymous Coward
        Anonymous Coward

        Re: But....

        "our cloud based infrastructure was spread over many zones with enough resources (and cache) to weather the storm."

        Righto.

        Cache doesn't have everything in it though, so what happens when something uncached is required from somewhere else ?

        Works, but slowly ?

        Total failure of that request and anything related thereto?

        "High error rate"?

        Interested readers want to know.
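        One possible answer to the question above, as a sketch only: a read-through cache that serves an expired copy when the origin is down, and fails outright only for keys it has never seen. The class and exception names are hypothetical and nothing here is claimed to be what the Fortune 500 poster actually runs.

```python
import time


class OriginDown(Exception):
    """Raised by the (hypothetical) origin fetcher when the backend is unreachable."""


class StaleOnErrorCache:
    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable: key -> value, may raise OriginDown
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}         # key -> (value, stored_at)

    def get(self, key):
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0], "hit"
        try:
            value = self._fetch(key)
        except OriginDown:
            if entry is not None:
                return entry[0], "stale"  # expired, but better than an error
            raise                         # never cached: this request fails
        self._store[key] = (value, now)
        return value, "miss"
```

        Under this design cached objects keep working (possibly out of date) while uncached ones produce errors, so from the outside it would look like a "high error rate" rather than a total failure.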

    6. Jason Hindle

      Re: But....

      "Isn't the selling point of all this cloudy stuff that it does not go down???????"

      Not without multiple levels of geographic redundancy. It's hugely expensive for an event that might only happen once every few years. Those dumb pipes known as the carriers have it in spades*. The likes of Amazon and Google, not so much. I like carriers (from a technical perspective).

      * Even for voice mail, and no one uses that.

    7. Anonymous Coward
      Anonymous Coward

      Re: But....

      This is why a proper public cloud should be 100% automated. Not mostly automated like AWS.

  3. Anonymous Coward
    Anonymous Coward

    I'll punt these up in advance:

    "You can't trust the cloud"

    "It's the NSA installing a tap"

    "My data centre has been up for 30 years" (btw, so is Amazon's).

    Just to be smug, it took us 3 minutes from the first alert to switch from serving from US East and Ireland to Ireland and Frankfurt.
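    A minimal sketch of the kind of switch described above: serve from the first two healthy regions in a preference list. The function name and the preference order are assumptions for illustration; the region codes are the standard AWS ones for US East (N. Virginia), Ireland and Frankfurt. How health is detected and how traffic is actually moved (DNS, load balancer, etc.) is outside this sketch.

```python
def pick_serving_regions(preferred, unhealthy, count=2):
    """Return the first `count` healthy regions, in order of preference."""
    healthy = [region for region in preferred if region not in unhealthy]
    if len(healthy) < count:
        raise RuntimeError(f"only {len(healthy)} healthy region(s), need {count}")
    return healthy[:count]


# Preference order is an assumption for illustration.
PREFERRED = ["us-east-1", "eu-west-1", "eu-central-1"]

print(pick_serving_regions(PREFERRED, unhealthy=set()))
print(pick_serving_regions(PREFERRED, unhealthy={"us-east-1"}))
```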

    1. Valarian

      "Just to be smug, it took us 3 minutes from the first alert to switch from serving from US East and Ireland to Ireland and Frankfurt."

      This, times a thousand. Any website or service pinning itself to a single node of a by-design distributed storage facility deserves whatever arse-kicking their customers choose to administer. The cloud, as is so often the case, is not the problem here - it's how it's being (mis)used that is the cause of any woes.

      1. Anonymous Coward
        Anonymous Coward

        To be fair, s3 is supposed to be multi-AZ and resilient within a region but, as we saw with the last us-east outage and the recent London PoP outage, tropical storms and power failures are no respecters of architectural diagrams.

      2. Mage Silver badge
        Mushroom

        Cloud selling and Pricing

        Yes, the "Cloud" is the problem. The way it's hyped, priced and marketed encourages beancounters to outsource to it.

        Almost Zero regulation.

        No 3rd party audit or oversight

        No transparency on backup, resilience, security or privacy. Just vendor hype.

        There are things that are appropriate for the "Cloud". However increasingly due to marketing of the Cloud vendors, the applications are inappropriate.

        1. Lusty

          Re: Cloud selling and Pricing

          No third party audit? Have you ever tried reading? AWS and Azure are probably the most audited data centres on the planet!

          1. Anonymous Coward
            Anonymous Coward

            Re: Cloud selling and Pricing

            Just shows you how useless audits are. All the audit is "do you do dumb things"? Nope. Okay you pass. I'm sure those accounting folk who do the audits like getting paid the big chunk of money my company pays them to say, yep, they say they do this.

            1. jMcPhee

              Re: Cloud selling and Pricing

              You left out some key steps the auditors follow:

              1) Pay us

              2) Show us you don't do dumb things

              3) Here are some pissant concerns/findings so we can say we did something. Oh, and here are some meaningless pain-in-the-ass findings to address because they are one auditor's special area of expertise - you should make his book mandatory reading.

              4) Your own in-house staff know about the real problems. But, "A prophet is not without honor except in his own country, among his own relatives, and in his own house.."

              5) Set up the next audit. Don't forget about (1)

              1. Anonymous Coward
                Anonymous Coward

                Re: Cloud selling and Pricing (@jMcPhee)

                I've worked at a place where the internal risk reviews, done by an employee of a different department in the same company, were exactly like that.

                Real serious issues were not allowed to be raised. By order of the management, the only issues that were allowed to be mentioned were the ones that could be acceptably mitigated at no cost.

                So something like only having one developer who knew anything serious about the company's internally developed customer-specific architecture-specific version of gcc, one not used (let alone maintained) anywhere else in the world, wasn't considered a recordable risk by the auditor.

                Then one year the developer in question went on holiday and didn't come back. Never seen again.

                Still, it mustn't have been a problem, because it wasn't recorded as a risk.

        2. Anonymous Coward
          Anonymous Coward

          Re: Cloud selling and Pricing

          "Almost Zero regulation"

          Almost? Care to list any?

          I'd like to see the actual energy bill. Not a percentage estimate of what you save, but a percentage estimate of what Amazon does NOT save. Where's that at, in a NSA vault perhaps?

          "...most audited data centres on the planet!"

          Audited for what? Do you actually know, honestly know? Do you believe everything you read? Read this: the USA doesn't spy on its citizens.

          1. Lusty

            Re: Cloud selling and Pricing

            "Audited for what? Do you actually know, honestly know?"

            Yes. I and everyone else who bothered to look do know. It's quite well covered actually, and has to be to allow architects to do our work properly.

            Azure details are in the trust centre.

            https://azure.microsoft.com/en-gb/support/trust-center/

            AWS is in their compliance and assurance pages

            https://aws.amazon.com/compliance/

            1. TheVogon

              Re: Cloud selling and Pricing

              "Audited for what? Do you actually know, honestly know"

              There are 2 main types of data centre audit - security and environmental.

              Usually a security audit would be a once off and would certify the facility to a specific standard - or just generally that it was secure by design and process with no significant security risks.

              An environmental audit should be conducted yearly on any critical datacentres, MERs, SERs, etc. Usually after your annual deep clean... This will give you an extensive report on everything from aircon, UPS and fire alarms to the type and size of the particles in the air! If you have any of the above facilities and aren't doing this, you should be. Two companies that can help are Bureau Veritas and Aquacair...

  4. Anonymous Coward
    Anonymous Coward

    I guess this guy finally broke AWS...

    https://www.reddit.com/r/DataHoarder/comments/5s7q04/i_hit_a_bit_of_a_milestone_today/

  5. Anonymous Coward
    Anonymous Coward

    It's not "high error rates", it's total failure to accept connections!

    $ telnet s3.amazonaws.com 443

    Trying 54.231.82.140...

    ^C

    $ telnet s3-external-1.amazonaws.com 443

    Trying 54.231.33.168...

    ^C

    These are the endpoints listed at http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

    1. Lusty

      An advanced cloud storage service fails to accept telnet connections. Shocker. Telnet and ping are not reliable test tools. I'd expect these services to drop such fake connections as security risks.

      1. Sandtitz Silver badge
        Facepalm

        @Lusty

        "An advanced cloud storage service fails to accept telnet connections. Shocker. Telnet and ping are not reliable test tools. I'd expect these services to drop such fake connections as security risks."

        How is telnet to port 443 a 'fake connection and a security risk'?

        How can you drop telnet connections to port 443 but allow legitimate SSL traffic to the same port?

        1. Lusty

          Re: @Lusty

          "How is telnet to port 443 a 'fake connection and a security risk'?"

          The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit since Telnet doesn't set up the TLS before the connection.

          If you lot think ping is a good way to test a network then you need to get out more. For ping to work, it needs the service accessible and running on the endpoint you're testing and requires that nothing drops the traffic in between. It's quite a common thing and might confirm a connection is up, but lack of a ping response tells you nothing about whether that connection is down, certainly not a non-ping service on that same endpoint.

          1. Alister

            Re: @Lusty

            @Lusty,

            You put:

            The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit since Telnet doesn't set up the TLS before the connection.

            And just how do you imagine a TLS session starts? If you are using telnet to prove or disprove connectivity exists to a host, then the initial connection attempt is all you need, and that is the same for any tcp connection, whether it be a TLS negotiation or any other protocol.

            I agree with you about ping, most secured environments block ICMP traffic nowadays, however, it and traceroute are still useful for investigating latency and routing so long as you temporarily enable it on the endpoint.

            1. Lusty

              Re: @Lusty

              TLS works at the transport layer, clue is in the name. The security device sitting between the AWS/Azure host and the network would likely terminate any connections which are not actually setting up a secure transport as part of that connection. In case you missed it, both services have installed custom silicon on the network side of the NIC for exactly this purpose.

              Telnet doesn't expose the transport layer, and so if this were terminated it would indeed show as no connectivity when the service is up for legitimate traffic.

              I've not tested whether these services work with a Telnet test - my point was that just like ICMP, it proves nothing about the service itself.

      2. Anonymous Coward
        Anonymous Coward

        Umm, do you know the basics of networking? Even if Amazon had the most amazing WAF that specifically looked for telnet vs. curl or code, they'd have to let them connect first on the standard port to start talking. Until a program starts talking specific protocols and going, the WAF is going to have to let them start.

        Having telnet (or nc, or anything else in the world that can make a network TCP connection) all operates the same at the most basic levels of connecting out to a remote server on a specific port.
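        The distinction being argued over can be sketched in a few lines: a plain TCP connect (what telnet, nc and curl all do first) versus a TLS handshake layered on top of it. This is an illustration using Python's standard `socket` and `ssl` modules; the function names are made up here, and it says nothing about how AWS's front end actually treats such connections.

```python
import socket
import ssl


def tcp_connects(host, port, timeout=5.0):
    """True if a TCP connection to host:port is established within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def tls_handshakes(host, port=443, timeout=5.0):
    """True if a TLS handshake completes on top of the TCP connection."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
```

        Calling `tcp_connects("s3.amazonaws.com", 443)` would be the rough equivalent of the telnet test above. If the first check hangs or fails, the second cannot succeed either, since the TLS handshake only starts once the TCP connection exists.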

      3. disgruntled yank

        So let's not use telnet:

        $ curl https://s3-external-1.amazonaws.com

        ^C

        [me@mine ~]$ curl https://s3.amazonaws.com

        ^C

      4. Alister

        @Lusty

        I think you just blew any credibility you had to comment on networking subjects.

        1. Lusty

          @Alister, see other response regarding TLS and Telnet. Right back at you.

    2. Doctor Syntax Silver badge

      It's not "high error rates", it's total failure to accept connections!

      I suppose 100% counts as high.

  6. paul-m-w72

    The internet is borked...........

    Even down detector is down so you can't tell anyone your site is down.......

  7. Alan Sharkey

    I wondered why DPReview (a camera site) was down till I read this and realised that Amazon own DPReview.

    Makes the "cloud" look more like cirrus (light and wispy) rather than Cumulus.

  8. Fan of Mr. Obvious

    Will they re-write history?

    Since the Current Status on the service health dashboard shows green, it will be interesting to see if they update the Status History to show it was FUBAR on 02/28/2017.

    1. Steven Raith

      Re: Will they re-write history?

      The status page may be running on AWS gear.

      Oh the hypocrisirony.

      Clicky for tweety that says "The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates."

      Steven "tempting fate" R

  9. nsld

    FFS

    If you have single points of failure you deserve everything you get.

    That's not the fault of 'the cloud' it's down to incompetent shitgibbons who shouldn't be within a thousand miles of any critical technology.

    1. Anonymous Coward
      Anonymous Coward

      Re: FFS

      Good sir, I have voted you upwards for using the concatenation of "shitgibbons." Excelsior!

      "my increased error rate is 100%"

      That was a winner too! I love it!

      'We had 10% errors, at FIRST, which is pretty bad, then it increased all the way to 100% errors which should be total failure, but until the dashboard that it clobbered recovers to tell us otherwise, we are calling this "increased error rate." It sounds nice. Like saying; We have no services for you at this time, but you're important to us so, have a great day!'

    2. albaleo

      Re: FFS

      "If you have single points of failure you deserve everything you get."

      Are you suggesting that multiple points of failure are better? Maybe I'm being pedantic, but I've never quite understood the expression. I've had to deal with people who wanted to put part of our system on AWS and another part on Azure to avoid "a single point of failure". That the two parts are required for the system to operate and thus the chances of the system being down would increase didn't seem to cross people's minds.

      1. Anonymous Coward
        Anonymous Coward

        Re: FFS

        The concept does not imply that there be multiple points, each of which is required for proper operation of the system, but multiple redundant paths, processes, structures etc, such that failure of any one does not compromise the system. Think of a physical mass being held up by a chain of links, wherein a failure of any single link would cause the load to fall, versus a multi-stranded cable, which would continue to hold the load after the failure of any single strand.
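        The chain versus cable picture can be put into numbers with a short sketch. It assumes failures are independent, which real outages often are not, and the 99.9% figure is purely illustrative rather than any provider's SLA.

```python
from math import prod


def series_availability(parts):
    """All parts are required: the system is up only when every part is up."""
    return prod(parts)


def redundant_availability(replicas):
    """Any one replica suffices: the system is down only when all are down."""
    return 1 - prod(1 - a for a in replicas)


# Illustrative figure only, not a published SLA.
a = 0.999
print(f"one provider:            {a:.6f}")
print(f"two parts, both needed:  {series_availability([a, a]):.6f}")
print(f"two redundant replicas:  {redundant_availability([a, a]):.6f}")
```

        Splitting a system so that both AWS and Azure are required is the chain: availability drops below either provider alone. Running the whole system on each is the cable.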

  10. Anonymous Coward
    Anonymous Coward

    1) us-east-1 was for a looong time the cheapest AWS region (and it's still joint cheapest), so plenty of people will have their eggs in that particular basket

    2) Plenty of people are pretty dumb and trust the "multi-az" aspect of a single region. You're on the cloud. Use more than one region (or, frightening thought, more than one provider?). It's exactly the same effort and saves you from nightmares like this. Same as using more than one datacentre. The AZ should be thought of as a (very large) rack, not as a DC.

  11. inmypjs Silver badge

    Amazon Music borked?

    Been telling me everything is not available and try later for a couple of hours. Amazon prime video seems ok.

    1. bsdnazz

      Re: Amazon Music borked?

      Yes. Our Echo Dot cannot play any music and while I can logon to our Amazon Music Library web site it cannot play any of our tracks - "We're Sorry We are unable to complete your action. Please try again later."

      Good job I also uploaded everything to Google Play and still have local copies for our Sonos system.

      1. bsdnazz

        Re: Amazon Music borked?

        Wow! One post on El Reg and the Amazon Music service is working for me again.

      2. Mage Silver badge

        Re: Amazon Music borked?

        I just have multiple copies of all my music on our own multiple systems. Why would I store my media on the cloud to play it when:

        1) I have only one internet connection.

        2) I have a cap

        3) The "cloud" isn't available walking, cycling or in the car.

        1. inmypjs Silver badge

          Re: Amazon Music borked?

          "I just have multiple copies of all my music"

          Are we supposed to care?

          Amazon music comes free with Prime which I bought mostly for free delivery and a bit for Amazon produced video content. As I already paid for it occasionally I feel obliged to browse and listen to some of the music included with Prime.

          Today was one of those days and it went tits up. No great loss, the biggest annoyance being from thinking it may be a problem with the tablet I was using or Amazon account.

    2. W4YBO

      Re: Amazon Music borked?

      Prime Video is okay, but oddly, won't display captions. Tried several videos that have shown captions in the past. ???

      Edit: Damn if I didn't click "Submit", and the captions appeared.

  12. TReko
    FAIL

    Weasel words

    >AWS, for some reason, insists this isn't an "outage" but rather a case of "increased error rates" for its

    >most popular cloud service.

    "Outage" means that they will have to cough up money due to service level agreements.

    We have the same issue with Google Cloud, it never has an "outage" when it goes titsup, it just has "issues"

  13. Disk0
    FAIL

    Fake news

    now on an Amazon dashboard near you...

    1. Adam 52 Silver badge

      Re: Fake news

      Well the AWS support console shows an outage under "service notifications".

      Starting to come back up now, btw.

    2. Anonymous Coward
      Anonymous Coward

      Re: Fake news

      .. yeah I blame Trump. Amazon is on the wrong side of the wall.

  14. ruet

    Service Health Dashboard

    Hah, they couldn't even update their own status page correctly:

    "Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue."

    1. Anonymous Coward
      FAIL

      The whole cloud rests on a single turtle

      The dashboard was not the only central AWS system affected. The console and API also seem to rely heavily on the US-EAST-1 storage, with increased error rates when trying to deploy elsewhere or change entries in Route 53.

    2. Doctor Syntax Silver badge

      Re: Service Health Dashboard

      "what we believe will remediate the issue."

      Sigh. remedy or maybe ameliorate but make up your mind.

  15. Anonymous Coward
    Anonymous Coward

    CloudFog / Someone-Else's-Computer -vs.- Industry PR

    ~ S3 down, ouch! But it won't impact cloud business much. Why? Corporations are addicted to cost cutting. It's the 'New Innovation, Stupid'!

    ~ But industry still doesn't want anyone thinking about 'Someone Else's Computer'. Instead we should use buzzwords like hyperscale:

    http://www.zdnet.com/article/stop-saying-the-cloud-is-just-someone-elses-computer-because-its-not/

    http://www.techrepublic.com/article/is-the-cloud-really-just-someone-elses-computer/

    ~ It's a hyperscale failure today. And next time there's an even bigger cloud config / data center / net outage, it's still not 'someone else's computer'...

    1. Anonymous Coward
      Anonymous Coward

      Re: CloudFog / Someone-Else's-Computer -vs.- Industry PR

      Requires obligatory xkcd (908).

  16. Anonymous Coward
    Anonymous Coward

    Accounts cloudy app Xero is down as well, always useful on the last invoicing day of the month, and let's not talk about people who get paid on the 1st of the month ... needing Xero to process those payments may make it awkward in the office tomorrow morning...

  17. AGFLawless
    Headmaster

    Drudge knocks Kellyanne off headline for AMAZON

    Wow it took an internet outage to Knock Sweet Kellyanne off Drudge's Headline Pic?

    Please put her back!

  18. Anonymous Coward
    Anonymous Coward

    Still up for me

    S3 console is down, but the app using S3 is working fine. So it does look to be a partial outage.

  19. Anonymous Coward
    Anonymous Coward

    Fire Stick

    So Amazon Fire Sticks become completely unusable during an S3 outage, can't run any local apps or do anything. Lump of plastic.

    And my Motorola security camera also stopped working due to it relying on .... S3

    A classic example of how bad an idea it is to rely on cloud services from unreliable vendors.

  20. Anonymous Coward
    Anonymous Coward

    Maybe it's time to consider Cloudian

    Private S3 Cloud Storage

  21. John Geek

    Amazon's own webpile couldn't deliver my order history an hour ago....

    I love the CIOs that mandate all internal critical systems are running on high availability high grade hardware, with redundant fiberswitches, multipath network connections, san storage, etc, then decide it's all too expensive and so outsource things to the likes of Amazon and Google, who are using the *cheapest* of commodity hardware they can get away with.... The irony of this escapes the suits.

    1. Bandikoto

      "Services" is a completely different accounting bucket from "capital hardware" and why do you hold a grudge against your CIO for enclouderating everything? He got a big fat bonus for that!

  22. Sirius Lee

    Calling bullshit

    Our storage is on S3 in the US East and we've not experienced problems or losses. Maybe there is some other problem some users have which manifests itself as an S3 problem.

    On the back of this story we've wasted time checking our store on the S3 service and have not found any issues.

    1. James 47

      Re: Calling bullshit

      If you'd checked it a few hours ago you might have seen something. All our buckets disappeared. All back to normal now.

  23. a_yank_lurker

    Wondering What Was Happening

    I noticed several sites having issues today while others were fine. This might explain what was happening.

  24. Haku

    Over exaggeration.

    People who flippantly use phrases such as:

    "That's like half the internet."

    "It's all over Facebook."

    "It broke the internet."

    Should be made to stand in a cherry picker in front of a blackboard the size of a skyscraper and write out a billion times "I will not exaggerate ever again."

  25. WibbleMe

    Chaps, don't put all your eggs in one basket. AWS/Google/DigitalOcean/Linode: spread them out.

    My advice, switch it off and go to bed.

  26. WibbleMe

    Quotes...

    Who brought coffee into the server room?

    1 hour before I go on holiday and I'm the only guy who can fix it!

  27. This post has been deleted by its author

  28. Anonymous Coward
    Anonymous Coward

    Bigger Problem

    This dramatically illustrates the U.S. national-security vulnerability of the whole interweb to hostile take-down. Like so many other infrastructure constructs (e.g., the electric grid, gas and oil pipelines), it has been developed with only the least expensive, most "efficient" criteria in mind, with security and reliability under duress as an afterthought. To add such security after the fact is extremely expensive, more so than would have been the case if it had been designed into the original system. Such was the situation pointed out in the weeks following 9/11 by ex-CIA director Woolsey regarding existing infrastructure, and little or nothing has been done to remedy the problem in existing or new infrastructure since.

  29. bombastic bob Silver badge
    Devil

    cloud overrated for most things

    "The Cloud" has its uses, like shared docs stored on google docs, or source on github. But if you don't have some means of "failure override" (like using a private repository, or e-mail documents to people) you're totally b0rked when the cloud has another 'technicolor belch'.

    I can imagine people using Office 365, google's javascript document editors, or even a cloudy-based mail service, running about like chickens with heads cut off, if their entire business model has them as 'single point of failure'.

    I have to wonder who didn't hear about "distributed load" "replication" and "automatic failover" over at AWS...

  30. Anonymous Coward
    Anonymous Coward

    The Irony...

    Downdetector.com is Down... guess it's an AWS site.

  31. Lib Serum

    No doubt. Another insidious Russian plot!

    www.youtube.com/watch?v=2TszIJX-F4U&feature=youtu.be&t=7

  32. Anonymous Coward
    Mushroom

    Definition of "The Cloud" ?

    So I guess "The Cloud" now potentially means "your system has gone up in smoke, and has been vapourised"

    I guess The Cloud could be heaven for computers when they go and die.

    "ah my machine is in the cloud ..."

    I am now thinking the "Bynars" from Star Trek: The Next Generation were actually a prophetic warning to us all as a society, and that was about 30 years ago.

    ( sorry for the obvious icon choice ;) )

  33. DocNo

    No surprise

    Look, everyone piles everything on AWS East because it's the cheapest (or among the cheapest) of their datacenters.

    It's the cheapest because it's the oldest.

    It's not hard to do the math. Or it shouldn't be. It just proves that people really do stink at assessing risk.

    Also, as others have pointed out, it's not Amazon's fault that applications fail when there is an eventual outage. That's why Amazon (and other cloud providers) have multiple data centers that are geographically dispersed. It's up to application owners/users to design redundancy into their applications. Indeed, AWS makes it easier and far more accessible for everyone to build proper geo-diverse disaster recovery into their applications than has ever been possible before. Technology and functionality previously available only to the biggest organizations is now accessible to just about everyone.

    People just don't want to pay for it, deluding themselves that it will never happen to them. Surprise!

    1. Anonymous Coward
      Anonymous Coward

      Re: No surprise

      It's not just about where you put your snazzy app stuff. It's also that a lot of support infrastructure (Console, Status Page, blah blah) is hosted in US-East-1 and not replicated out to other Regions. So a failure of an important service like S3 (which seems to be the pillar of the supporting services) leaves you in the dark when reacting to the incident.

      If you're in a co-lo DC, at least you can ring the DC support, ask a tech to check what's going on behind the scenes and make a local switch to another piece of kit. On AWS... you need to automate that failover yourself, and even that might break if the API breaks.

  34. Anonymous Coward
    Anonymous Coward

    Perhaps Storm Doris

    Blew the clouds away ...

  35. Anonymous Coward
    Anonymous Coward

    my survivable disaster

    The Android "walk my dog" app failed to sync last night's 4 mile walk with Mitzy (my German shepherd) to magic cloud land, which I'm going to attribute to this S3 outage debacle.

    This is clearly an unacceptable disaster of biblical proportions. Not.

    I'll be going out for an hour with the dog in the fresh air again tonight. During that I won't be worrying if virtual clouds are present but I do expect to be keeping an eye out for real clouds above

  36. Jason Hindle

    As the T-Shirt Says

    It's not a cloud! It's someone else's computer*.

    * And it can break.

  37. Potemkine Silver badge

    Bloody solar flares!

    For sysadmins, choose your explanation here

  38. Frank Jennings - The Cloud Lawyer

    Quick, dig out the contract to see what protections you've got.

    Clause 10: “The service offerings are provided ‘As Is.’ We…make no representations or warranties of any kind…that the service offerings or third party content will be uninterrupted.” https://aws.amazon.com/agreement/

    If you didn't like that one, you definitely won't like clause 11.

  39. Anonymous Coward
    Anonymous Coward

    But the "Cloud" is infallible, as we're always sold.

  40. Anonymous Coward
    Anonymous Coward

    Re: Whats this GUI thingy?

    It broke our system in two places:

    1. We take a data feed from TfL. That died for five hours so no traffic updates. Nothing we can do as it's not our kit, we just consume the data when it's there.

    2. We then discovered that cdn.leafletjs.com was also down. We use their CDN. That was our fault as we relied on a CDN server being up. Lesson learnt and 15 mins later we were back up.

    That was the worst outage we've had and it wasn't our fault. Highly annoying, but since we paid exactly 0p for the lot we cannot complain.

    I have no doubt that far bigger businesses are talking to Amazon re outages and service penalties. Amazon can use weasel words like "100% error rate" but I'd be gobsmacked if money doesn't start flowing from Amazon to big clients (even if it's service credits).
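    For anyone wanting to avoid the CDN lesson above, here is a rough sketch of the idea: try a list of sources in order and take the first one that answers, so a self-hosted copy can stand in when the CDN is down. The function names and the example URLs are made up for illustration; this is not what the poster actually deployed.

```python
import urllib.request
from typing import Callable, Iterable, Tuple


def http_fetch(url: str, timeout: float = 3.0) -> bytes:
    """Download a URL with a short timeout so a dead host fails fast."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_first_available(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> Tuple[str, bytes]:
    """Return (url, body) from the first source that responds.

    `fetch` is injectable so the fallback logic can be exercised
    without touching the network.
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSErrors
            errors.append((url, exc))
    raise RuntimeError(f"all sources failed: {errors}")


# Hypothetical usage: CDN first, then a copy served from your own box.
# source, body = fetch_first_available([
#     "https://cdn.example.invalid/leaflet.js",
#     "https://static.example.invalid/vendor/leaflet.js",
# ])
```

    The same ordering trick works in the browser too, but doing it server-side at deploy time means visitors never see the broken CDN at all.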

  41. stevebp

    You built your cloud service on what?

    Don't put all your eggs in one basket. If AWS is your 'cloud strategy', make sure another cloud provider (or your own private cloud) is in an 'active-active' configuration strategy as well. You won't regret it.
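    A toy sketch of what the 'two baskets' advice above might look like for object storage: write every object to both providers and read from whichever one is up. The classes here are in-memory stand-ins invented for illustration, not any provider's real SDK, and a real setup would also need to re-sync writes that one side missed while it was down.

```python
class BackendDown(Exception):
    """Raised when a storage backend cannot serve a request."""


class InMemoryBackend:
    """Hypothetical stand-in for one provider's object store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        if not self.up:
            raise BackendDown(self.name)
        self._data[key] = value

    def get(self, key: str) -> bytes:
        if not self.up:
            raise BackendDown(self.name)
        return self._data[key]


class MirroredStore:
    """Writes to every backend, reads from the first one that answers."""

    def __init__(self, backends) -> None:
        self.backends = list(backends)

    def put(self, key: str, value: bytes) -> int:
        written = 0
        for backend in self.backends:
            try:
                backend.put(key, value)
                written += 1
            except BackendDown:
                continue  # keep going; the other basket may still be fine
        if written == 0:
            raise BackendDown("no backend accepted the write")
        return written

    def get(self, key: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(key)
            except (BackendDown, KeyError):
                continue  # down, or missed the write while it was down
        raise BackendDown(f"no backend could serve {key!r}")
```

    With two backends, taking the first one offline after a write still leaves the object readable from the second, which is the whole point of paying for both.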

  42. Howard Hanek
    Holmes

    A Clue?

    A poison cloudlet dipped in curare administered by a foul wind?

  43. PeterM42
    Facepalm

    Rather than "remediating" the problem.....

    .....wouldn't it be better to FIX it?

    The problem with "clouds", as I have always said: they blow away in the wind!

  44. Anonymous Coward
    Anonymous Coward

    ... yeah --- but AWS is cheaper than ..

    hosting a DC on site -- you can blame someone else when it fails !

  45. anonymous boring coward Silver badge

    This should happen more often, in order to learn how vulnerable we are.
