Strava
Strava is down due to this! How can I check how many miles I've ridden so far this month?
Amazon Web Services is scrambling to recover from a cockup at its facility in Virginia, US, that is causing its S3 cloud storage to fail. The internet giant has yet to reveal the cause of the breakdown, which is plaguing storage buckets hosted in the US-East-1 region. The malady kicked off around 0944 Pacific Time (1744 UTC) …
Too many people (non-IT folk) seem to think that the cloud is this magical place that never has an issue. No matter how many outages Amazon, Azure, etc. have, people still seem to think that it's made of magic.
Deploy in the cloud by all means, but still back up, replicate, and ensure that you don't have a single point of failure.
"Too many people (non IT folk) seem to think that the cloud is this magical place that never has an issue."
True, but whose fault is that? Isn't this exactly their whole selling point to begin with?
I also don't think you should dismiss the whole argument that easily, because when properly set up you can get a redundant environment if you want one. The fact that it didn't work that way at AWS this time tells me more about their infrastructure than about the (in)abilities of virtualised hosting.
"Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure."
Unfortunately, that is what they've done. This fault affects a specific region, each of which contains multiple availability zones. Each zone constitutes a logical datacentre, comprising multiple physical datacentres (between 3 and 6 in each AZ, I believe). Deployment across two or more AZs in a given region *is* removing the single points of failure. Supposedly. Didn't work this time.
AWS don't particularly recommend deploying across more than one region, because each region is effectively a completely different cloud, common in branding, usage etc., but connected to the others only via the public internet. Replication between zones within a region is fast and free, but replication between regions is slower and costs.
Ultimately though, a well-designed AWS deployment, with all the fault-tolerant bells and whistles, still has no upfront cost and is thus far more achievable than doing it on-prem. Said bells and whistles will make nuclear outages like this the cause of the rare downtime you do get.
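For the curious, here's roughly what that region/AZ split looks like from the API side. This is a minimal boto3 (Python) sketch, not AWS's recommended tooling - the AMI ID and instance type are placeholders - that lists the AZs in a region and drops an instance into each one so no single AZ is a point of failure:

# Minimal sketch: enumerate the AZs in one region and spread instances
# across them. Region/AMI/instance-type values are placeholders; a real
# deployment would use CloudFormation/Terraform or similar.
import boto3

region = "us-east-1"  # the region that fell over in this story
ec2 = boto3.client("ec2", region_name=region)

# A region is a collection of availability zones (logical datacentres).
zones = [z["ZoneName"]
         for z in ec2.describe_availability_zones()["AvailabilityZones"]
         if z["State"] == "available"]
print(f"{region} has {len(zones)} AZs: {zones}")

# Round-robin one instance into each AZ.
for az in zones:
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI
        InstanceType="t3.micro",
        MinCount=1, MaxCount=1,
        Placement={"AvailabilityZone": az},
    )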
Isn't the selling point of all this cloudy stuff that it does not go down???????
No.
It's that 'IT stuff' has become a utility, as in you only pay for what you use.
This means you can build highly resilient and/or scalable systems without huge upfront costs.
Doesn't mean people do though. ;)
Fact is, any business running anything critical to the business on other people's servers had better have a contract guaranteeing they get back more than the downtime costs (goodwill, for example, ain't cheap), or the people responsible are simply shirking their fiduciary duty to the company.
"Fact is, any business running anything critical to the business on other people's servers had better have a contract guaranteeing they get back more than the downtime costs (goodwill, for example, ain't cheap), or the people responsible are simply shirking their fiduciary duty to the company."
No, that's exactly the opposite of what you should be doing: you're looking to apportion blame after the fact. That's of little use if your business has gone bust due to the downtime. Better to design systems that minimise the risk of this happening in the first place.
Using the cloud allows you to build complex systems with little upfront cost.
That's it.
This does mean that smaller companies can build an infrastructure that's distributed and resilient in a way that wasn't financially feasible 10-15 years ago; and larger companies can potentially significantly reduce their DR expenditure.
It doesn't mean it'll never fail or require administration or backup or all the other things you should be doing with an IT infrastructure. It just means you don't spend a boatload upfront on kit.
>It just means you don't spend a boatload upfront on kit.
And you generally have less say in how things are set up and run. Which is fine, I guess, for some, but I personally wouldn't work for a company where I was responsible for production mission-critical software running on systems not owned by my company, contract or not. The edge to building a lifetime of skills is getting a say, directly and indirectly, on such matters.
"It just means you don't spend a boatload upfront on kit."
That is understating it.
One of the huge advantages of public cloud is that you pay for actual utilization rather than scaling to peak. That is huge. It would be worth using public cloud for that benefit alone. As anyone who has ever sized on-prem infrastructure knows, you scale to peak (meaning you pay for infrastructure every day as though it were the busiest day in the history of the company, even though most days are not), and then you add 20% to the sizing because no one can be certain that the peak will not increase at some point and you cannot just elastically add scale. That equals many, many billions of dollars every year in infrastructure which is purchased and never, or very rarely, used.
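A back-of-envelope sketch of that argument, with entirely made-up numbers - plug in your own:

# "Size to peak + 20%" vs "pay for what you use". All figures invented
# for illustration; the unit price is assumed equal on-prem and in cloud.
peak_servers = 100           # servers needed on the busiest day ever
headroom = 0.20              # the "add 20% just in case" factor
avg_utilisation = 0.35       # typical fraction of peak actually used
cost_per_server_hour = 0.10
hours_per_year = 24 * 365

on_prem = peak_servers * (1 + headroom) * cost_per_server_hour * hours_per_year
cloud   = peak_servers * avg_utilisation * cost_per_server_hour * hours_per_year

print(f"on-prem (peak + 20%): ${on_prem:,.0f}/yr")
print(f"cloud (pay per use):  ${cloud:,.0f}/yr")
print(f"capacity paid for but idle: {1 - cloud/on_prem:.0%}")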
"Doesn't mean people do though."
Maybe because it's been sold as cheaper than running your own data centre.
When IT tries to persuade the business to make provision for this sort of thing, it's probably dismissed as IT being profligate again, or even IT trying to bump up costs so that its own in-house service still looks competitive.
>Guys, EVERYTHING goes down on you at sometime or another.
Of course, but when you have a good working personal relationship with gentlemen every bit as professional as yourself, whose badges contain only a slightly different number to your own, it causes a lot less panic. It's much easier to contact exactly the right people at exactly the right time and get answers you can count on and the service you need, without, as others say, having to worry about whether someone is putting your company's interests first. If this is not the case with your company, then you should start thinking about finding a new company.
>Guys, EVERYTHING goes down on you at sometime or another.
The network goes down and occasionally hardware goes down, but fun fact: even after years of supporting it, I have never seen an HP-UX OS crash due to software. Ever. Of course, thanks to the rise of Red Hat and cheap commodity hardware (not giving 2 shits about POSIX) and HP squeezing its last few customers, I probably, sadly, have more Linux kernel panics in my future. Sigh.
Yes, exactly. All our deployment and storage services are dependent on S3 or S3-backed apps and were all critically impacted, but you wouldn't have noticed, because our cloud-based infrastructure was spread over many zones with enough resources (and cache) to weather the storm. A Fortune 500 company managing many hundreds of web services.
"our cloud based infrastructure was spread over many zones with enough resources (and cache) to weather the storm."
Righto.
Cache doesn't have everything in it though, so what happens when something uncached is required from somewhere else?
Works, but slowly?
Total failure of that request and anything related thereto?
"High error rate"?
Interested readers want to know.
"Isn't the selling point of all this cloudy stuff that it does not go down???????"
Not without multiple levels of geographic redundancy. It's hugely expensive for an event that might only happen once every few years. Those dumb pipes known as the carriers have it in spades*. The likes of Amazon and Google, not so much. I like carriers (from a technical perspective).
* Even for voice mail, and no one uses that.
"Just to be smug, it took us 3 minutes from the first alert to switch from serving from US East and Ireland to Ireland and Frankfurt."
This, times a thousand. Any website or service pinning itself to a single node of a by-design distributed storage facility deserves whatever arse-kicking their customers choose to administer. The cloud, as is so often the case, is not the problem here - it's how it's being (mis)used that is the cause of any woes.
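As a rough illustration of not pinning yourself to one node or region: a hedged boto3 (Python) sketch that tries a primary bucket and falls back to a replica in another region. The bucket names are placeholders, and the replication that keeps them in sync is assumed to exist separately:

# Minimal sketch of a client-side fallback across regions. Bucket names
# are placeholders; keeping the replica in sync is a separate concern.
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

ENDPOINTS = [
    ("us-east-1", "my-assets-us-east-1"),   # primary (the region that broke)
    ("eu-west-1", "my-assets-eu-west-1"),   # replica in another region
]

def fetch(key: str) -> bytes:
    last_err = None
    for region, bucket in ENDPOINTS:
        s3 = boto3.client("s3", region_name=region)
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except (ClientError, EndpointConnectionError) as err:
            last_err = err          # primary unhappy: try the replica
    raise last_err

# fetch("css/site.css")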
Yes, the "Cloud" is the problem. The way it's hyped, priced and marketed encourages beancounters to outsource to it.
Almost Zero regulation.
No 3rd party audit or oversight
No transparency on backup, resilience, security or privacy. Just vendor hype.
There are things that are appropriate for the "Cloud". However, increasingly, thanks to the cloud vendors' marketing, the applications being put there are inappropriate.
You left out some key steps the auditors follow:
1) Pay us
2) Show us you don't do dumb things
3) Here are some pissant concerns/findings so we can say we did something. Oh, and here are some meaningless pain-in-the-ass findings to address because they are one auditor's special area of expertise - you should make his book mandatory reading.
4) Your own in-house staff know about the real problems. But, "A prophet is not without honor except in his own country, among his own relatives, and in his own house.."
5) Set up the next audit. Don't forget about (1)
I've worked at a place where the internal risk reviews, done by an employee of a different department in the same company, were exactly like that.
Real serious issues were not allowed to be raised. By order of the management, the only issues that were allowed to be mentioned were the ones that could be acceptably mitigated at no cost.
So something like only having one developer who knew anything serious about the company's internally developed customer-specific architecture-specific version of gcc, one not used (let alone maintained) anywhere else in the world, wasn't considered a recordable risk by the auditor.
Then one year the developer in question went on holiday and didn't come back. Never seen again.
Still, it mustn't have been a problem, because it wasn't recorded as a risk.
"Almost Zero regulation"
Almost? Care to list any?
I'd like to see the actual energy bill. Not a percentage estimate of what you save, but a percentage estimate of what Amazon does NOT save. Where's that at, in an NSA vault perhaps?
"...most audited data centres on the planet!"
Audited for what? Do you actually know, honestly know? Do you believe everything you read? Read this: the USA doesn't spy on its citizens.
"Audited for what? Do you actually know, honestly know?"
Yes. I and everyone else who bothered to look do know. It's quite well covered actually, and has to be to allow architects to do our work properly.
Azure details are in the trust centre.
https://azure.microsoft.com/en-gb/support/trust-center/
AWS is in their compliance and assurance pages
https://aws.amazon.com/compliance/
"Audited for what? Do you actually know, honestly know"
There are 2 main types of data centre audit - security and environmental.
Usually a security audit would be a one-off and would certify the facility to a specific standard, or just generally that it was secure by design and process with no significant security risks.
An environmental audit should be conducted yearly on any critical datacentres, MERs, SERs, etc. Usually after your annual deep clean... This will give you an extensive report on everything from aircon, UPS and fire alarms to the type and size of the particles in the air! If you have any of the above facilities and aren't doing this, you should be. Two companies that can help are Bureau Veritas and Aquacair...
It's not "high error rates", it's total failure to accept connections!
$ telnet s3.amazonaws.com 443
Trying 54.231.82.140...
^C
$ telnet s3-external-1.amazonaws.com 443
Trying 54.231.33.168...
^C
These are the endpoints listed at http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
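For anyone wanting to repeat the test without leaving telnet hanging until Ctrl-C, a small Python sketch that attempts the same TCP connection with a timeout - it only checks the handshake, not the service behind it:

# Same check as the telnet session above, but with a timeout.
import socket

for host in ("s3.amazonaws.com", "s3-external-1.amazonaws.com"):
    try:
        with socket.create_connection((host, 443), timeout=5):
            print(f"{host}:443 accepted the connection")
    except OSError as err:
        print(f"{host}:443 failed: {err}")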
"An advanced cloud storage service fails to accept telnet connections. Shocker. Telnet and ping are not reliable test tools. I'd expect these services to drop such fake connections as security risks."
How is telnet to port 443 a 'fake connection and a security risk'?
How can you drop telnet connections to port 443 but allow legitimate SSL traffic to the same port?
"How is telnet to port 443 a 'fake connection and a security risk'?"
The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit, since Telnet doesn't set up the TLS before the connection.
If you lot think ping is a good way to test a network then you need to get out more. For ping to work, it needs the service accessible and running on the endpoint you're testing, and requires that nothing drops the traffic in between. It's used all the time and a response might confirm a connection is up, but the lack of a ping response tells you nothing about whether that connection is down, and certainly nothing about a non-ping service on that same endpoint.
@Lusty,
You put:
The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit, since Telnet doesn't set up the TLS before the connection.
And just how do you imagine a TLS session starts? If you are using telnet to prove or disprove that connectivity exists to a host, then the initial connection attempt is all you need, and that is the same for any TCP connection, whether it be a TLS negotiation or any other protocol.
I agree with you about ping: most secured environments block ICMP traffic nowadays. However, it and traceroute are still useful for investigating latency and routing, so long as you temporarily enable ICMP on the endpoint.
TLS works at the transport layer; the clue is in the name. The security device sitting between the AWS/Azure host and the network would likely terminate any connections which are not actually setting up a secure transport as part of that connection. In case you missed it, both services have installed custom silicon on the network side of the NIC for exactly this purpose.
Telnet doesn't expose the transport layer, and so if this were terminated it would indeed show as no connectivity when the service is up for legitimate traffic.
I've not tested whether these services work with a Telnet test - my point was that just like ICMP, it proves nothing about the service itself.
Umm, do you know the basics of networking? Even if Amazon had the most amazing WAF that specifically looked for telnet vs. curl or code, they'd have to let them connect first on the standard port to start talking. Until a program starts talking a specific protocol, the WAF is going to have to let the connection start.
Telnet (or nc, or anything else in the world that can make a TCP connection) all operate the same way at the most basic level: connecting out to a remote server on a specific port.
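To illustrate the point being argued: a TLS session begins with an ordinary TCP connect (exactly what telnet does) and only then layers the handshake on top. A minimal Python sketch, nothing AWS-specific:

# Step 1 is a plain TCP connect, identical to what telnet attempts;
# step 2 is the TLS handshake layered on top of that connection.
import socket, ssl

host = "s3.amazonaws.com"
ctx = ssl.create_default_context()

raw = socket.create_connection((host, 443), timeout=5)   # plain TCP, same as telnet
tls = ctx.wrap_socket(raw, server_hostname=host)         # TLS handshake on top
print("negotiated", tls.version(), "with cipher", tls.cipher()[0])
tls.close()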
The status page may be running on AWS gear.
Oh the hypocrisirony.
Steven "tempting fate" R
Good sir, I have voted you upwards for using the concatenation of "shitgibbons." Excelsior!
"my increased error rate is 100%"
That was a winner too! I love it!
'We had 10% errors, at FIRST, which is pretty bad, then it increased all the way to 100% errors, which should be total failure, but until the dashboard that it clobbered recovers to tell us otherwise, we are calling this "increased error rate." It sounds nice. Like saying: We have no services for you at this time, but you're important to us, so have a great day!'
"If you have single points of failure you deserve everything you get."
Are you suggesting that multiple points of failure are better? Maybe I'm being pedantic, but I've never quite understood the expression. I've had to deal with people who wanted to put part of our system on AWS and another part on Azure to avoid "a single point of failure". That the two parts are required for the system to operate and thus the chances of the system being down would increase didn't seem to cross people's minds.
The concept does not imply that there be multiple points, each of which is required for proper operation of the system, but multiple redundant paths, processes, structures etc., such that failure of any one does not compromise the system. Think of a physical mass held up by a chain of links, wherein the failure of any single link would cause the load to fall, versus a multi-stranded cable, wherein after the failure of any single strand the rest would continue to hold the load.
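The arithmetic behind the chain-versus-cable picture, with made-up availability numbers:

# Components in series (all required) multiply their availability;
# redundant components in parallel multiply their *un*availability.
p = 0.99                               # availability of one component

chain_of_two = p * p                   # both links must hold
cable_of_two = 1 - (1 - p) ** 2        # either strand is enough

print(f"series (e.g. AWS + Azure both required): {chain_of_two:.4f}")   # ~0.9801, worse
print(f"parallel (either one will do):           {cable_of_two:.4f}")   # ~0.9999, better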
1) us-east-1 was for a looong time the cheapest AWS region (and it's still joint cheapest), so plenty of people will have their eggs in that particular basket
2) Plenty of people are pretty dumb and trust the "multi-az" aspect of a single region. You're on the cloud. Use more than one region (or, frightening thought, more than one provider?). It's exactly the same effort and saves you from nightmares like this. Same as using more than one datacentre. The AZ should be thought of as a (very large) rack, not as a DC.
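For anyone wondering what "use more than one region" can look like in practice for S3, here's a rough boto3 (Python) sketch of cross-region replication. The bucket names and IAM role ARN are placeholders, both buckets need versioning enabled first, and the exact rule schema may vary by API version:

# Rough sketch: replicate a primary bucket to a second region.
import boto3

buckets = {
    "us-east-1": "my-assets-us-east-1",   # primary (placeholder name)
    "eu-west-1": "my-assets-eu-west-1",   # replica (placeholder name)
}

# Versioning must be enabled at both ends before replication will work.
for region, name in buckets.items():
    boto3.client("s3", region_name=region).put_bucket_versioning(
        Bucket=name, VersioningConfiguration={"Status": "Enabled"}
    )

boto3.client("s3", region_name="us-east-1").put_bucket_replication(
    Bucket=buckets["us-east-1"],
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication",  # placeholder role
        "Rules": [{
            "ID": "replicate-everything",
            "Prefix": "",                 # whole bucket (older-style rule syntax)
            "Status": "Enabled",
            "Destination": {"Bucket": "arn:aws:s3:::" + buckets["eu-west-1"]},
        }],
    },
)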
Yes. Our Echo Dot cannot play any music and, while I can log on to our Amazon Music Library web site, it cannot play any of our tracks - "We're Sorry We are unable to complete your action. Please try again later."
Good job I also uploaded everything to Google Play and still have local copies for our Sonos system.
"I just have multiple copies of all my music"
Are we supposed to care?
Amazon Music comes free with Prime, which I bought mostly for free delivery and a bit for Amazon-produced video content. As I've already paid for it, I occasionally feel obliged to browse and listen to some of the music included with Prime.
Today was one of those days and it went tits up. No great loss, the biggest annoyance being thinking it might be a problem with the tablet I was using or my Amazon account.
>AWS, for some reason, insists this isn't an "outage" but rather a case of "increased error rates" for its
>most popular cloud service.
"Outage" means that they will have to cough up money due to service level agreements.
We have the same issue with Google Cloud: it never has an "outage" when it goes titsup, it just has "issues".
Hah, they couldn't even update their own status page correctly:
"Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue."
~ S3 down, ouch! But it won't impact cloud business much. Why? Corporations are addicted to cost cutting. It's the 'New Innovation, Stupid'!
~ But industry still doesn't want anyone thinking about 'Someone Else's Computer'. Instead we should use buzzwords like hyperscale:
http://www.zdnet.com/article/stop-saying-the-cloud-is-just-someone-elses-computer-because-its-not/
http://www.techrepublic.com/article/is-the-cloud-really-just-someone-elses-computer/
~ It's a hyperscale failure today. And next time there's an even bigger cloud config / data center / net outage, it's still not 'someone else's computer'...
Requires obligatory xkcd (908).
So Amazon Fire Sticks become completely unusable during an S3 outage, can't run any local apps or do anything. Lump of plastic.
And my Motorola security camera also stopped working due to it relying on .... S3
A classic example of how bad an idea it is to rely on cloud services from unreliable vendors.
Amazon's own webpile couldn't deliver my order history an hour ago....
I love the CIOs that mandate all internal critical systems run on high-availability, high-grade hardware, with redundant fibre switches, multipath network connections, SAN storage, etc., then decide it's all too expensive and outsource things to the likes of Amazon and Google, who are using the *cheapest* commodity hardware they can get away with.... The irony of this escapes the suits.
Our storage is on S3 in the US East and we've not experienced problems or losses. Maybe there is some other problem some users have which manifests itself as an S3 problem.
On the back of this story we've wasted time checking our store on the S3 service and have not found any issues.
People who flippantly use phrases such as:
"That's like half the internet."
"It's all over Facebook."
"It broke the internet."
Should be made to stand in a cherry picker in front of a blackboard the size of a skyscraper and write out a billion times "I will not exaggerate ever again."
This dramatically illustrates the U.S. national-security vulnerability of the whole interweb to hostile take-down. Like so many other infrastructure constructs - e.g., the electric grid, gas & oil pipelines, etc. - they have been developed with only the least expensive, most "efficient" criteria in mind, with security and reliability under duress as an afterthought. To add such security after the fact is extremely expensive, more so than would have been the case had it been designed into the original system. Such was the situation pointed out in the weeks following 9/11 by ex-CIA director Woolsey regarding existing infrastructure, and little or nothing has been done to remedy the problem in existing or new infrastructure since.
"The Cloud" has its uses, like shared docs stored on google docs, or source on github. But if you don't have some means of "failure override" (like using a private repository, or e-mail documents to people) you're totally b0rked when the cloud has another 'technicolor belch'.
I can imagine people using Office 365, Google's JavaScript document editors, or even a cloudy-based mail service, running about like chickens with their heads cut off, if their entire business model has them as a 'single point of failure'.
I have to wonder who didn't hear about "distributed load" "replication" and "automatic failover" over at AWS...
So I guess "The Cloud" now potentially means "you're system has gone up in smoke, and has been vapourised"
I guess The Cloud could be heaven for computers when they go and die.
"ah my machine is in the cloud ..."
As a society, I am now thinking Star Trek: The Next Generation's "Bynars" were actually a prophetic warning to us all, and that was about 30 years ago.
( sorry for the obvious icon choice ;) )
Look, everyone piles everything on AWS East because it's the cheapest (or among the cheapest) of their datacenters.
It's the cheapest because it's the oldest.
It's not hard to do the math. Or it shouldn't be. It just proves that people really do stink at assessing risk.
Also, as others have pointed out, it's not Amazon's fault that applications fail when it eventually has an outage - it's why Amazon (and other cloud providers) have multiple data centers that are geographically dispersed. It's up to application owners/users to design redundancy into their applications. Indeed, AWS makes it easier and far more accessible than ever before for everyone to build proper geo-diverse disaster recovery into their applications. Technology and functionality previously available only to the biggest organizations is now accessible to just about everyone.
People just don't want to pay for it, deluding themselves that it will never happen to them. Surprise!
It's not just about where you put your snazzy app stuff. It's also that a lot of supporting infrastructure (Console, Status Page, blah blah) is hosted in US-East-1 and not replicated out to other regions. So a failure of an important service like S3 (which seems to be the pillar of the supporting services) leaves you in the dark when reacting to the incident.
If you're in a co-lo DC, at least you can ring the DC support, ask a tech to check what's going on behind the scenes, and make a local switch to another piece of kit. On AWS... you need to automate that failover, and even that might break if the API breaks.
The android "walk my dog" app failed to sync last night's 4 mile walk with mitzy (my german shepherd) to magic cloud land which I'm going to attribute to this s3 outage debacle.
This is clearly an unacceptable disaster of biblical proportions. Not.
I'll be going out for an hour with the dog in the fresh air again tonight. During that I won't be worrying whether virtual clouds are present, but I do expect to be keeping an eye out for real clouds above.
Quick, dig out the contract to see what protections you've got.
Clause 10: “The service offerings are provided ‘As Is.’ We…make no representations or warranties of any kind…that the service offerings or third party content will be uninterrupted.” https://aws.amazon.com/agreement/
If you didn't like that one, you definitely won't like clause 11.
It broke our system in two places:
1. We take a data feed from TfL. That died for five hours, so no traffic updates. Nothing we could do, as it's not our kit; we just consume the data when it's there.
2. We then discovered that cdn.leafletjs.com was also down. We use their CDN. That was our fault, as we relied on a CDN server being up. Lesson learnt, and 15 mins later we were back up.
That was the worst outage we've had and it wasn't our fault. Highly annoying, but since we paid exactly 0p for the lot we cannot complain.
I have no doubt that far bigger businesses are talking to Amazon re outages and service penalties. Amazon can use weasel words like "100% error rate" but I'd be gobsmacked if money doesn't start flowing from Amazon to big clients (even if it's only service credits).