Did you bet the farm on Amazon's cloud? Time to wean yourself off

Oracle is making hay over last weekend's mega six-hour Amazon Web Services (AWS) cloud outage. "You get what you pay for," tweeted Oracle's Phil Dunn, with the caveat that all views are his and don't necessarily reflect those of Oracle. But you get the point. Yes, Amazon's been left with egg on its face and rivals will be …

  1. Zog_but_not_the_first Silver badge

    With clouds come rain

    Who'd have thought it?

    Obligatory snarky response.

  2. Will Godfrey Silver badge

    Dual providers is all very well, but (in light of recent events) how can you be certain that your two 'independent' providers don't share any resources?

    1. Ian Michael Gumby Silver badge

      @Will re Dual Providers...

      If it's not AWS then your second provider isn't sharing. Although, since AWS is multi-homed and your other provider is multi-homed, you may end up using the same network provider.

      The issue many forget is the costs involved in shipping data from AWS to another source. Very expensive.

  3. Pallas Athena

    Dual-provider for failover?

    If you have, like the author suggests in this article, your compute nodes on AWS and your storage on Google, there is a BIG chance - in fact, almost a certainty - that such a setup will not guard you against any failure. In fact, it's even worse: you will experience an outage whenever _one of the two_ has problems.
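The arithmetic behind this point can be sketched in a few lines. The availability figures are invented for illustration, not measured numbers for AWS or Google, and both functions assume the two providers fail independently (which, as others in this thread note, may not hold).

```python
def serial_availability(*parts):
    """Service needs every part up (e.g. compute on one cloud, storage on another)."""
    result = 1.0
    for a in parts:
        result *= a
    return result


def parallel_availability(*replicas):
    """Service is up if at least one complete replica is up (true duplication)."""
    all_down = 1.0
    for a in replicas:
        all_down *= (1.0 - a)
    return 1.0 - all_down


# Hypothetical availability of each provider: 99.9%
a, b = 0.999, 0.999

split = serial_availability(a, b)          # compute on A, storage on B
duplicated = parallel_availability(a, b)   # whole stack on A and again on B

print(f"single provider:   {a:.6f}")
print(f"split across two:  {split:.6f}")
print(f"duplicated on two: {duplicated:.6f}")
```

Under these assumptions the split setup comes out less available than either provider alone, while full duplication comes out more available.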

    1. Arctic fox

      @Pallas Athena Re:"Dual-provider for failover? " As our compadres on the western side.........

      ..................of the pond have a tendency to say when they really, really agree with someone; This!

      These days it is not enough to check whether your "independent" provider is or is not financially in bed with someone you wish to limit your exposure to. You also have to check what systems they share/are dependent upon - even if, as far as the law is concerned, they are completely different companies with no shared ownership.

  4. Rick Giles

    The cloud

    Is for people that can't manage a data center or hire (and retain) a knowledgeable staff.

  5. Anonymous Coward

    My buddy runs an online financial service...

    He can take the primary system off-line (requires yanking cables), and the off-site backup system will transparently complete the in-progress transactions. Nobody even notices.

    It's been that way for *decades*.

  6. Hans 1 Silver badge


    Ok, let's say you wanna go with two providers ... can you mirror shit on Azure and AWS or Google or whatever easily? I doubt it.

    OpenStack has many vendors, same tech, same stack - much easier to go with them, and you can be certain you can just "switch over" in case of an emergency with one of the vendors.

    However, my hands bleed (how many times have I typed this this week?), don't put sensitive data on a cloud, and never EVER put critical data on a cloud.

  7. Anonymous Coward


    AWS has egg on its face for violating its availability zones. It will learn and improve and make more money.

    Heads will not roll at Netflix because they are making money and getting profitable stuff done.

    Dual providers will not improve the cost/reliability ratio. Done right (big if), with considerable cost, one might increase availability by a few hours every few years, numbers from the author. It ain't worth it for movies, music, games, or shopping. If you're running a bank or brokerage, well good for you.

  8. smartypants

    Oracle has a short memory!

    Just a year ago, their 'CRM on the Cloud' product had an outage:

    But hey! If I roll my own Oracle then I at least can guarantee uptime...


    From a couple of years back... Primary database problems lead to data loss for Salesforce customers.... (Salesforce use Oracle for their primary RDBMS)

    What can we learn from this?

    1) Cloud versus non-cloud is not a discussion of failure versus reliability

    2) People jumping with glee on the misfortune of a competitor rarely draw attention to their own mistakes.

    3) It is formally impossible to guarantee 100% uptime, and *any* architecture is subject to changing conditions which might increase the chances of failure... without anyone noticing till it's too late.

  9. Seanie Ryan

    netflix is grand as it is

    Let's face it, Netflix and such are luxury, non-essential services. If they are offline, someone can't watch a movie!! Let's get real: that's not life-threatening. Some gobshite has to get off the couch or change a channel. A few hours offline? Not a biggie really. How many people are going to cancel a subscription for that? A very, very small percentage really.

    So the decision is based on cost of dual setup vs number of cancellations due to a few hours outage.

    My bet is that it's cheaper to stay as they are.
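That trade-off can be written down as a rough break-even check. This is only a sketch: every number below is invented, and the function names and parameters are hypothetical, not anything Netflix has published.

```python
def expected_cancellation_loss(subscribers, cancel_fraction_per_outage,
                               outages_per_year, revenue_per_subscriber_per_year):
    """Expected yearly revenue lost to cancellations caused by outages."""
    cancellations = subscribers * cancel_fraction_per_outage * outages_per_year
    return cancellations * revenue_per_subscriber_per_year


def cheaper_to_stay(dual_setup_cost_per_year, loss_per_year):
    """True if accepting the outages costs less than paying for a second cloud."""
    return loss_per_year < dual_setup_cost_per_year


# All figures below are invented for illustration only.
loss = expected_cancellation_loss(
    subscribers=1_000_000,
    cancel_fraction_per_outage=0.001,   # 0.1% cancel after an outage
    outages_per_year=0.5,               # one outage every two years
    revenue_per_subscriber_per_year=100,
)
stay = cheaper_to_stay(dual_setup_cost_per_year=5_000_000, loss_per_year=loss)

print(f"expected loss per year: {loss:,.0f}")
print("cheaper to stay single-cloud:", stay)
```

The answer flips as soon as the cancellation fraction or the revenue per subscriber is high enough, which is the distinction drawn with payment and business services.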

    Big difference between Netflix being offline and PayPal/Worldpay/Sage/Salesforce, which businesses rely on.

    Let's not make mountains out of molehills over not being able to watch a TV show for a few hours.

    Reality is, clients want a cheap service, so the company providing it have to keep costs low to match. If everyone voted that they would pay Netflix $/£/€ 150 a month, then yeah, you would expect them to have a better setup.

    Will someone not think of the children !!

  10. sysconfig

    Dual cloud is not that simple

    If you are doing stuff at scales similar to Netflix, you can't just easily run with two clouds. Each cloud has its own APIs (loads of them, to be precise), certain things you can or cannot do, or which you have to engineer just a little bit differently due to unsupported features on one side, which exist on the other, and vice versa. The same would be true if you had two independent providers both running on OpenStack, for example, because OpenStack is still a rather messy affair and which features a provider offers is at their discretion.

    So in theory running dual cloud is all dandy. In practice it will also increase costs, because your deployment process, sometimes even the application design, will differ significantly. Not to mention the additional logic of keeping them in sync, making sure that fail-over (and recovery!) between entire clouds is handled correctly. It's really not quite as straightforward in practice.

    (And as others pointed out, splitting parts of the application infrastructure between two clouds actually at least doubles your chance for error. What you do want to do is *duplicate* it.)

    It's a business decision whether increasing your costs by a fair margin upfront is the better option (to be protected against such failures), or if you just accept that even in clouds like AWS shit can happen -- albeit not very often and so far never on a global scale (they do have several DC locations world-wide, three alone in the US if I'm not mistaken; so there's your same-API disaster plan, if you really want one, without increasing your costs).

    I guess we all know what most businesses (or their bean counters) would opt for: take the risk and reduce your costs from the onset.

  11. Terry 6 Silver badge


    In another time, people would have cautioned strongly against relying on a single supplier for your critical IT needs. On the web, that's thrown out the window.

    Again, huh.

  12. Jon Massey

    Ovine diversity

    Nice to see some blackfaces as opposed to the usual Herdwick pic.

    1. Tom Paris

      Re: Ovine diversity

      I wonder just how many readers had to look at the photo again to understand that one. I did!

      1. Jon Massey

        Re: Ovine diversity

        Most common breed of sheep in the UK - what else could I have been on about?

  13. brenthawkinsmd

    Better Than Having Traditional Roadblocking Network/Infrastructure Teams

    This is still much better than running your own data centers and getting bogged down with Infrastructure and Network teams that block progress and innovation. If Netflix had such legacy Network Teams, they would still be trying to talk the Network team into opening up firewalls for port 443.

    1. Anonymous Coward

      Re: Better Than Having Traditional Roadblocking Network/Infrastructure Teams

      Bollocks. "running your own" doesn't mandate "hiring only idiots".

  14. Anonymous Coward

    What's with all the Netflix bashing?!?!

    "Despite the embarrassment of Netflix this time around..."

    "Heads should roll at Netflix for its over-dependence on AWS. Increasingly, its status as an all-in-one-AWS pioneer is hurting it."

    I'm not sure from where you're getting this info on "embarrassment" and such... According to Netflix's own blog post from 9/25 (which you conveniently didn't link to) they experienced a "brief availability blip in the affected Region, but [they] sidestepped any significant impact..."

    Doesn't sound like anyone over there is regretting their decision to move to AWS.

  15. rkeiii

    This article seems contradictory to me (or at least confusing). On the one hand you take the time to say...

    "The best way to avoid going dark is to architect your service to fail over to different nodes within a region. Even better, different regions. The AWS outage was centered on the giant's US-East region – it has eight others across the planet."

    Yes, if you had proper redundant services in other regions you should have been "pretty much okay". If you did not take the time to build in proper fault tolerance you get to suffer (when is that not the case?).

    The contradictory or confusing part is at the very end... "Heads should roll at Netflix for its over-dependence on AWS. Increasingly, its status as an all-in-one-AWS pioneer is hurting it.". This could be more accurately stated as "Heads should roll at Netflix if they in fact suffered a meaningful outage as a result of not spinning up redundant services in a different AWS region; they are increasingly seen as Amazon's reference customer and should definitely be making use of best practices Amazon has described for years to increase reliability.". But I guess that's not such a fun way to sign off, is it?
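As a rough illustration of what "fail over to a different region" means from the caller's side, here is a minimal sketch. The helper `call_with_failover` and the stand-in `fake_request` are hypothetical names made up for this example, the region identifiers are only labels, and nothing here is Amazon's or Netflix's actual mechanism.

```python
def call_with_failover(regions, request):
    """Try each region in order; return (region, result) from the first that succeeds."""
    errors = {}
    for region in regions:
        try:
            return region, request(region)
        except Exception as exc:  # a real client would catch narrower error types
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {errors}")


def fake_request(region):
    # Stand-in for a real service call; pretends the first region is down.
    if region == "us-east-1":
        raise ConnectionError("region unavailable")
    return f"served from {region}"


served_by, response = call_with_failover(["us-east-1", "us-west-2"], fake_request)
print(served_by, response)
```

This only covers request routing. Keeping data in sync between regions, and failing back afterwards, is the harder part and is not shown.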

  16. Seajay#

    You won't do better

    This isn't a good reason to give up on outsourced IT. Yeah sure it's annoying when (very very rarely) AWS goes down. However, if you think you can do better with a homebrew solution you're deluding yourself.

    Trust in the stats and if you can't beat 99.999% uptime, stay with someone who can. It's true that 20 minutes downtime when all you can do is refresh the status page may seem like forever to you while a couple of hours executing your own emergency restore might fly by. However, no-one gives a crap how long it feels to you. They care about how much business is lost and rightly so.

    Obvious exceptions, reactor control software etc or basically any other situation where your company's area of expertise is reliability and hardware (unlike Netflix whose area of expertise is negotiating with studios).
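To put the uptime figure in perspective, a short sketch converts an availability percentage into a yearly downtime budget, and an outage length into the availability it implies. It assumes a 365-day year and counts only the one outage; the six-hour figure is the outage length given in the article.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR * 60


def availability_after_outage(outage_hours):
    """Yearly availability (percent) if the only downtime is one outage."""
    return 100 * (1 - outage_hours / HOURS_PER_YEAR)


five_nines_budget = allowed_downtime_minutes(99.999)
six_hour_year = availability_after_outage(6)

print(f"99.999% allows about {five_nines_budget:.2f} minutes per year")
print(f"one six-hour outage gives about {six_hour_year:.3f}% for the year")
```

By this arithmetic a single 20-minute outage already exceeds a five-nines budget for the year, so the bar a homebrew setup has to beat may be lower than the figure quoted.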

  17. Phil_Evans

    Caveat Emptor

    "...having convinced themselves they are the companies who know best"

    Well, good luck with outsourcing the business itself and all the mitigation of operational risk to someone who knows nothing about your business (in the case of Netflix and others). This in itself underpins the 'good luck' service that customers can (and do) expect of technology services. Yes it's cool and yes it works and yes it's cheap, but much like life insurance you don't 'get' any value from it.

    All the cloud providers effectively own the risk of a huge and growing number of services (some commercial, some public). So you can expect issues like the ones outlined here to begin affecting otherwise rock-solid government services like 'UK Passports'.

    Like most clouds, there one minute, gone the next.


Biting the hand that feeds IT © 1998–2019