Amazon's AWS S3 cloud storage evaporates: Top websites, Docker stung

Amazon Web Services is scrambling to recover from a cockup at its facility in Virginia, US, that is causing its S3 cloud storage to fail. The internet giant has yet to reveal the cause of the breakdown, which is plaguing storage buckets hosted in the US-East-1 region. The malady kicked off around 0944 Pacific Time (1744 UTC) …

Mushroom

Strava

Strava is down due to this! How can I check how many miles I've ridden so far this month?

8
0

Re: Strava

If it's not on Strava it didn't happen

11
0
Silver badge

Re: Strava

Has the Ordnance Survey site gone down?

0
0
Silver badge
Paris Hilton

But....

Isn't the selling point of all this cloudy stuff that it does not go down???????

I guess the AWS cloud must have pissed down on someone until all the clouds disappeared.

Paris because she is good at shedding tears.

26
3
Mushroom

Re: But....

Too many people (non IT folk) seem to think that the cloud is this magical place that never has an issue. No matter how many outages Amazon, Azure, etc have, people still seem to think that it's made of magic.

Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure.

18
1
FIA

Re: But....

Isn't the selling point of all this cloudy stuff that it does not go down???????

No.

It's that 'IT stuff' has become a utility, as in you only pay for what you use.

This means you can build highly resilient and/or scalable systems without huge upfront costs.

Doesn't mean people do though. ;)

12
2
Anonymous Coward

Re: But....

Fact is any business running anything critical to the business on other people's servers better have a contract guaranteeing they get back more than the down time costs (goodwill for example ain't cheap) or the people responsible are simply shirking their fiduciary duty to the company.

8
1
Silver badge

Re: But....

"better have a contract guaranteeing they get back more than the down time costs"

Why? If the downtime is less than you'd get elsewhere, or if the savings are more than the cost or if the faster time to market means you make massively more than the cost then you're still up.

5
2
FIA

Re: But....

Fact is any business running anything critical to the business on other people's servers better have a contract guaranteeing they get back more than the down time costs (goodwill for example ain't cheap) or the people responsible are simply shirking their fiduciary duty to the company.

No, that's exactly the opposite of what you should be doing; you're looking to apportion blame after the fact. This is little use if your business has gone bust due to the downtime. Better to design systems that minimise the risk of this happening in the first place.

Using the cloud allows you to build complex systems with little upfront cost.

That's it.

This does mean that smaller companies can build an infrastructure that's distributed and resilient in a way that wasn't financially feasible 10-15 years ago; and larger companies can potentially significantly reduce their DR expenditure.

It doesn't mean it'll never fail or require administration or backup or all the other things you should be doing with an IT infrastructure. It just means you don't spend a boatload upfront on kit.

10
1
Anonymous Coward

Re: But....

>It just means you don't spend a boatload upfront on kit.

And generally have less say in how things are set up and run. Which is fine I guess for some, but I personally wouldn't work for a company where I was responsible for production mission-critical software running on systems not owned by my company, with a contract or not. The edge to building a lifetime of skills is getting a say directly and indirectly on such matters.

9
1
Silver badge
Devil

Re: But....

"Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure."

always good advice.

/me uses github. that's cloudy enough.

1
1
Anonymous Coward

Re: But....

That said, the cloud has its purposes. Definitely a cost saver for non-mission-critical, non-proprietary stuff. Still, when internal manufacturing is your core mission, the cloud is more a distraction for the bean counters than something to look forward to.

4
0
Silver badge

Re: But....

"Deploy in the cloud by all means but still backup, replicate"

We used to call that keeping a dog and barking yourself.

0
6
Silver badge

Re: But....

"Doesn't mean people do though."

Maybe because it's been sold as cheaper than running your own data centre.

When IT try to persuade the business to make provision for this sort of thing it's probably dismissed as IT being profligate again or even IT trying to bump up costs so their own service is still competitive.

3
0
Silver badge

Re: But....

"It just means you don't spend a boatload upfront on kit."

It also means your interests aren't necessarily at the front of the queue when it comes to recovering from this sort of (not) outage.

5
0
Silver badge

Re: But....

Guys, EVERYTHING goes down on you at some time or another.

6
0
Anonymous Coward

Re: But....

>Guys, EVERYTHING goes down on you at sometime or another.

Of course, but when you have a good working personal relationship with gentlemen as professional as yourself, whose badges differ from yours only by a slightly different number, it causes a lot less panic. It is much easier to contact exactly the right people at exactly the right time and get answers you can count on and the service you need without, as others say, having to worry about whether someone is putting your company's interests first. If this is not the case at your company, you should start thinking about finding a new company.

2
0
Anonymous Coward

Re: But....

>Guys, EVERYTHING goes down on you at sometime or another.

The network goes down and occasionally hardware goes down but, fun fact, even after years of supporting it I have never seen an HP-UX OS crash due to software. Ever. Of course, thanks to Red Hat and cheap commodity hardware rising (and not giving 2 shits about POSIX) and HP squeezing its last few customers, I probably, sadly, see more Linux kernel panics in my future. Sigh.

2
4
Silver badge

Re: But....

No - that's never been the claim of cloud. They specifically tell you it's not 100% guaranteed. That's why anything that matters should be designed not to rely on a single cloud region....

2
0
Silver badge

@Geek

"Too many people (non IT folk) seem to think that the cloud is this magical place that never has an issue."

True, but whose fault is that? Isn't this exactly their whole selling point to begin with?

I also don't think you should dismiss the whole argument that easily, because when properly set up you can get a redundant environment if you want to. The fact that it now doesn't work this way at AWS tells me more about their infrastructure than the (in)abilities of virtualized hosting.

3
0
Anonymous Coward

Re: But....

Yes, exactly. All our deployment and storage services are dependent on S3 or S3-backed apps and were all critically impacted, but you wouldn't have noticed because our cloud based infrastructure was spread over many zones with enough resources (and cache) to weather the storm. We're a Fortune 500 company managing many hundreds of web services.

3
0
Silver badge

Re: But....

Two words: Tier Caching. All our US sites use S3 in N. Virginia but not one was affected.

1
0

Re: But....

"Isn't the selling point of all this cloudy stuff that it does not go down???????"

Not without multiple levels of geographic redundancy. It's hugely expensive for an event that might only happen once every few years. Those dumb pipes known as the carriers have it in spades*. The likes of Amazon and Google, not so much. I like carriers (from a technical perspective).

* Even for voice mail, and no one uses that.

2
0

Re: But....

"Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure."

Unfortunately, that is what they've done. This fault affects a specific region, each of which contains multiple availability zones. Each zone constitutes a logical datacentre, comprising multiple physical datacentres (between 3 and 6 in each AZ, I believe). Deployment across two or more AZs in a given region *is* removing the single points of failure. Supposedly. Didn't work this time.

AWS don't particularly recommend deploying across more than one region, because each region is effectively a completely different cloud, common in branding, usage etc, but connected only via the public internet. Replication between zones within a region is fast and free, but replication between regions is slower and costs.

Ultimately though, a well-designed AWS deployment, consisting of all the fault-tolerant bells and whistles, still has no upfront cost and is thus way more achievable than doing it on-prem. Said bells/whistles will make nuclear outages like this the cause of the rare downtime you do get.

4
0
Anonymous Coward

Re: But....

"our cloud based infrastructure was spread over many zones with enough resources (and cache) to weather the storm."

Righto.

Cache doesn't have everything in it though, so what happens when something uncached is required from somewhere else?

Works, but slowly?

Total failure of that request and anything related thereto?

"High error rate"?

Interested readers want to know.
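For anyone wondering what those three outcomes look like in practice, here is a minimal sketch of a read-through cache that can serve a stale copy when the backing store is unreachable. It is not a description of the earlier poster's setup, which isn't given; `ReadThroughCache`, `OriginUnavailable`, the `fetch` callback and the TTL value are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class OriginUnavailable(Exception):
    """Raised when the origin fails and there is no cached copy at all."""


class ReadThroughCache:
    """Hypothetical cache in front of a slow or flaky origin (e.g. object storage)."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float = 60.0) -> None:
        self._fetch = fetch                      # assumed callback that reads from the origin
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str, now: Optional[float] = None) -> Tuple[bytes, str]:
        """Return (value, outcome) where outcome is 'hit', 'miss' or 'stale'."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], "hit"               # fresh copy, origin not contacted
        try:
            value = self._fetch(key)             # works, but slower than a hit
        except Exception as exc:
            if entry is not None:
                return entry[1], "stale"         # origin down: serve the expired copy
            raise OriginUnavailable(key) from exc  # never cached: the request fails
        self._entries[key] = (now, value)
        return value, "miss"
```

So under these assumptions the answer is "all three, depending on the key": cached objects keep working (possibly stale), while anything never cached fails outright until the origin comes back.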

3
0

Re: But....

@macjules

Caching in the Cloudfront sense, or within S3 itself?

0
0
Anonymous Coward

Re: But....

This is why a proper public cloud should be 100% automated. Not mostly automated like AWS.

0
0
Anonymous Coward

Re: But....

"It just means you don't spend a boatload upfront on kit."

That is understating it.

One of the huge advantages of public cloud is that you pay for actual utilization vs scaling to peak. That is huge. It would be worth using public cloud just for that benefit. As anyone who has ever sized on prem infrastructure knows, you scale to peak (meaning that you are paying for infrastructure every day as though it is the busiest day in the history of the company, even though most days are not the busiest day in the history of the company) and then you add 20% to the sizing because no one can be certain that the peak will not increase at some point and you cannot just elastically add scale. That equals many, many billions of dollars every year in infrastructure which is purchased and never or very rarely used.
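To put rough numbers on the scale-to-peak argument, here is a small worked sketch. The 20% headroom comes from the paragraph above; the load figures and the function names are made up purely for illustration.

```python
from typing import Sequence


def provisioned_capacity(peak_load: float, headroom: float = 0.20) -> float:
    """Capacity bought up front: the peak plus a safety margin."""
    return peak_load * (1.0 + headroom)


def idle_fraction(daily_loads: Sequence[float], headroom: float = 0.20) -> float:
    """Share of the provisioned capacity that sits unused over the period."""
    capacity = provisioned_capacity(max(daily_loads), headroom)
    used = sum(daily_loads)
    return 1.0 - used / (capacity * len(daily_loads))


# Hypothetical week: six ordinary days at 30 units and one peak day at 100.
week = [30, 30, 30, 30, 30, 30, 100]
print(f"provisioned capacity: {provisioned_capacity(max(week)):.0f} units")
print(f"idle share of capacity: {idle_fraction(week):.1%}")
```

With these invented figures, about two thirds of the on-prem capacity is paid for and never used; the real ratio obviously depends on how spiky your own load is.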

0
0
Anonymous Coward

I'll punt these up in advance:

"You can't trust the cloud"

"It's the NSA installing a tap"

"My data centre has been up for 30 years" (btw, so is Amazon's).

Just to be smug, it took us 3 minutes from the first alert to switch from serving from US East and Ireland to Ireland and Frankfurt.
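The switch described above boils down to "serve from the first two healthy regions in order of preference". A minimal sketch of that selection logic follows; the poster doesn't say how health is detected or how traffic is actually rerouted, so the `healthy` map and the function name are assumptions, and the region codes are the standard AWS identifiers for US East (N. Virginia), Ireland and Frankfurt.

```python
from typing import Dict, List


def pick_serving_regions(preferred: List[str], healthy: Dict[str, bool], count: int = 2) -> List[str]:
    """Return the first `count` healthy regions, keeping the preference order."""
    chosen = [region for region in preferred if healthy.get(region, False)]
    if len(chosen) < count:
        raise RuntimeError(f"only {len(chosen)} healthy region(s), need {count}")
    return chosen[:count]


# us-east-1 = US East (N. Virginia), eu-west-1 = Ireland, eu-central-1 = Frankfurt
PREFERENCE = ["us-east-1", "eu-west-1", "eu-central-1"]

normal = pick_serving_regions(PREFERENCE, {"us-east-1": True, "eu-west-1": True, "eu-central-1": True})
outage = pick_serving_regions(PREFERENCE, {"us-east-1": False, "eu-west-1": True, "eu-central-1": True})
print(normal)  # US East and Ireland
print(outage)  # Ireland and Frankfurt
```

The hard part in real life is everything around this function (health checks you trust, data already replicated to the standby region, DNS or load balancer changes), which is presumably where those 3 minutes went.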

18
2

"Just to be smug, it took us 3 minutes from the first alert to switch from serving from US East and Ireland to Ireland and Frankfurt."

This, times a thousand. Any website or service pinning itself to a single node of a by-design distributed storage facility deserves whatever arse-kicking their customers choose to administer. The cloud, as is so often the case, is not the problem here - it's how it's being (mis)used that is the cause of any woes.

17
3
Anonymous Coward

To be fair, S3 is supposed to be multi-AZ and resilient within a region but, as we saw with the last us-east outage and the recent London PoP outage, tropical storms and power failures are no respecters of architectural diagrams.

15
0
Silver badge
Mushroom

Cloud selling and Pricing

Yes, the "Cloud" is the problem. The way it's hyped, priced and marketed encourages beancounters to outsource to it.

Almost Zero regulation.

No 3rd party audit or oversight

No transparency on backup, resilience, security or privacy. Just vendor hype.

There are things that are appropriate for the "Cloud". However, thanks to the Cloud vendors' marketing, it is increasingly being used for applications where it is inappropriate.

10
18
Silver badge

Re: Cloud selling and Pricing

No third party audit? Have you ever tried reading? AWS and Azure are probably the most audited data centres on the planet!

11
2
Silver badge

Re: Cloud selling and Pricing

"Almost Zero regulation"

Almost? Care to list any?

I'd like to see the actual energy bill. Not a percentage estimate of what you save, but a percentage estimate of what Amazon does NOT save. Where's that at, in an NSA vault perhaps?

"...most audited data centres on the planet!"

Audited for what? Do you actually know, honestly know? Do you believe everything you read? Read this: the USA doesn't spy on its citizens.

6
7
Anonymous Coward

Re: Cloud selling and Pricing

Just shows you how useless audits are. All the audit is "do you do dumb things"? Nope. Okay you pass. I'm sure those accounting folk who do the audits like getting paid the big chunk of money my company pays them to say, yep, they say they do this.

7
0

Re: Cloud selling and Pricing

You left out some key steps the auditors follow:

1) Pay us

2) Show us you don't do dumb things

3) Here are some pissant concerns/findings so we can say we did something. Oh, and here are some meaningless pain-in-the-ass findings to address because they are one auditor's special area of expertise - you should make his book mandatory reading.

4) Your own in-house staff know about the real problems. But, "A prophet is not without honor except in his own country, among his own relatives, and in his own house.."

5) Set up the next audit. Don't forget about (1)

9
0
Anonymous Coward

Re: Cloud selling and Pricing (@jMcPhee)

I've worked at a place where the internal risk reviews, done by an employee of a different department in the same company, were exactly like that.

Real serious issues were not allowed to be raised. By order of the management, the only issues that were allowed to be mentioned were the ones that could be acceptably mitigated at no cost.

So something like only having one developer who knew anything serious about the company's internally developed customer-specific architecture-specific version of gcc, one not used (let alone maintained) anywhere else in the world, wasn't considered a recordable risk by the auditor.

Then one year the developer in question went on holiday and didn't come back. Never seen again.

Still, it mustn't have been a problem, because it wasn't recorded as a risk.

2
0
Silver badge

Re: Cloud selling and Pricing

"Audited for what? Do you actually know, honestly know?"

Yes. I and everyone else who bothered to look do know. It's quite well covered actually, and has to be to allow architects to do our work properly.

Azure details are in the trust centre.

https://azure.microsoft.com/en-gb/support/trust-center/

AWS is in their compliance and assurance pages

https://aws.amazon.com/compliance/

1
0
Silver badge

Re: Cloud selling and Pricing

"Audited for what? Do you actually know, honestly know"

There are 2 main types of data centre audit - security and environmental.

Usually a security audit would be a one-off and would certify the facility to a specific standard - or just generally that it was secure by design and process with no significant security risks.

An environmental audit should be conducted yearly on any critical datacentres, MERs, SERs, etc. Usually after your annual deep clean... This will give you an extensive report on everything from aircon, UPS and fire alarms to the type and size of the particles in the air! If you have any of the above facilities and aren't doing this, you should be. Two companies that can help are Bureau Veritas and Aquacair...

0
0
Anonymous Coward

I guess this guy finally broke AWS...

https://www.reddit.com/r/DataHoarder/comments/5s7q04/i_hit_a_bit_of_a_milestone_today/

2
0
Anonymous Coward

It's not "high error rates", it's total failure to accept connections!

$ telnet s3.amazonaws.com 443

Trying 54.231.82.140...

^C

$ telnet s3-external-1.amazonaws.com 443

Trying 54.231.33.168...

^C

These are the endpoints listed at http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

3
0
Silver badge

An advanced cloud storage service fails to accept telnet connections. Shocker. Telnet and ping are not reliable test tools. I'd expect these services to drop such fake connections as security risks.

0
22
Silver badge
Facepalm

@Lusty

"An advanced cloud storage service fails to accept telnet connections. Shocker. Telnet and ping are not reliable test tools. I'd expect these services to drop such fake connections as security risks."

How is telnet to port 443 a 'fake connection and a security risk'?

How can you drop telnet connections to port 443 but allow legitimate SSL traffic to the same port?

14
0
Anonymous Coward

Umm, do you know the basics of networking? Even if Amazon had the most amazing WAF that specifically looked for telnet vs. curl or code, it would have to let them connect on the standard port first to start talking. Until a program actually starts speaking a specific protocol, the WAF has to let it connect.

Telnet (or nc, or anything else in the world that can make a TCP connection) operates the same way at the most basic level of connecting out to a remote server on a specific port.
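For the avoidance of doubt, this is all a "telnet host 443" test amounts to: open a TCP connection and see whether the handshake completes. A minimal sketch using only the standard library; `tcp_port_open` and the timeout value are my own choices, not anything AWS-specific.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and completes the TCP handshake,
        # exactly as telnet, nc or curl must do before sending any payload.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Calling `tcp_port_open("s3.amazonaws.com", 443)` during the outage would presumably have hung until the timeout and returned False, same as the telnet sessions quoted earlier in the thread.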

9
0
Silver badge

So let's not use telnet:

$ curl https://s3-external-1.amazonaws.com

^C

[me@mine ~]$ curl https://s3.amazonaws.com

^C

5
0
Silver badge

@Lusty

I think you just blew any credibility you had to comment on networking subjects.

6
0
Silver badge

It's not "high error rates", it's total failure to accept connections!

I suppose 100% counts as high.

4
0
Silver badge

Re: @Lusty

"How is telnet to port 443 a 'fake connection and a security risk'?"

The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit since Telnet doesn't set up the TLS before the connection.

If you lot think ping is a good way to test a network then you need to get out more. For ping to work, it needs the service accessible and running on the endpoint you're testing and requires that nothing drops the traffic in between. It's quite a common thing and might confirm a connection is up, but lack of a ping response tells you nothing about whether that connection is down, certainly not a non-ping service on that same endpoint.

1
1
Silver badge

@Alister, see other response regarding TLS and Telnet. Right back at you.

0
1
Silver badge

Re: @Lusty

@Lusty,

You put:

The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit since Telnet doesn't set up the TLS before the connection.

And just how do you imagine a TLS session starts? If you are using telnet to prove or disprove connectivity exists to a host, then the initial connection attempt is all you need, and that is the same for any TCP connection, whether it be a TLS negotiation or any other protocol.

I agree with you about ping: most secured environments block ICMP traffic nowadays. However, it and traceroute are still useful for investigating latency and routing, so long as you temporarily enable it on the endpoint.
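To make the ordering explicit, here is a rough sketch that reports which stage a probe reaches: the TCP connect comes first, and only then does the TLS negotiation start on that same connection. The function name and the three result strings are made up for illustration.

```python
import socket
import ssl


def probe_tls_endpoint(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Return 'tcp_failed', 'tls_failed' or 'tls_ok' for host:port."""
    try:
        # Stage 1: plain TCP connect, identical for telnet, curl or a browser.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"
    try:
        # Stage 2: the TLS handshake runs over the already-open TCP connection.
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host):
            return "tls_ok"
    except OSError:
        # ssl.SSLError and socket timeouts are both OSError subclasses.
        return "tls_failed"
    finally:
        sock.close()
```

If stage 1 never completes, as in the telnet and curl sessions posted above, nothing on the far end has had a chance to judge whether the traffic looks "legitimate" or not.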

4
0


Biting the hand that feeds IT © 1998–2017