Expired router cache sends Google Cloud Engine TITSUP

Google's Cloud Engine (GCE) has experienced Total Inability to Support Usual Performance (TITSUP) for about two-and-a-half hours. Incident 15045, as Google describes the outage, kicked off at about 22:59 on the evening of 18 February (West Coast US time) and then rolled on until 01:31 the next day. Virtual machines in the …

  1. Mark 85

    Refreshing admission...

    "We consider GCE’s availability over the last 24 hours to be unacceptable," the post says, adding that Google's cloudy teams are "completely focused on addressing the incident and its root causes, so that this problem or other hypothetical similar problems cannot recur in the future."

    Not your normal canned spiel. So will someone be found who was responsible and be banned from the company cafeteria for a few days?

    1. Borg.King

      Re: Refreshing admission...

      Said individual will be redeployed behind the counter in 'sudo make me a sandwich'.

  2. Neil Barnes Silver badge

    Remind me again

    about 100% uptime and availability, and why I should move to the cloud?

    1. Captain Scarlet

      Re: Remind me again

      It has a flashy marketed name

      That's it

      1. Anonymous Coward

        Re: Remind me again

        If it was marketed that much it would have 110% uptime.

        1. Captain Scarlet

          Re: Remind me again

          "If it was marketed that much it would have 110% uptime"

          The big players wouldn't be able to get away with that, so they would put 99.9% instead

    2. Blane Bramble

      Re: Remind me again

      Because when it goes wrong it's someone else's fault.

      Nobody got fired for using IBM / Microsoft / Amazon / Google / erm...

      1. chivo243 Silver badge

        Re: Remind me again

        buying cisco gear? You forgot one...

    3. Anonymous Coward

      Re: Remind me again

      I'd say it's more subtle than that. Why would anyone want to use a Chromebook, which is hugely dependent on its online services to function?

      1. fruitoftheloon

        @Ac: Re: Remind me again

        Ac,

        They may choose a Chromebook over a Wintel machine because of the lower 'bother quotient'.

        'Tis horses for courses, different folk have different priorities and requirements...

        Cheers,

        J

      2. Robert Grant

        Re: Remind me again

        Why would anyone want to use the internet, which is hugely dependent on its online services to function?

        Fixed!

    4. Lee D Silver badge

      Re: Remind me again

      Because local services NEVER go down...

      Cloud isn't evil, as such; you just have to know what its limitations are, like everything else. Your whole work network still depends on simple things: the clocks being in sync, connectivity being constant and not overloaded, power being up, and so on. There are few businesses or even home users that can approach the uptime of something like Google Cloud, given the number of services it runs for them.

      Hell, I have to reset my home router about once a month and if it's out for five minutes at a time, that means in a year, I have an hour's downtime just doing that.

      Cloud isn't evil, but neither is it the answer to everything.

      Personally, I don't enjoy faffing trying to keep our Exchange server up and exposed to the world compared to my last workplace where we just had Google Mail for Domains. Sure, it can go down, but more likely WE went down as a workplace and that means we can at least check email on phones even with the connectivity and servers off.

      It all depends what you want, how you want it, and what you're willing to pay for it. Personally, there's a lot of in-house stuff here and I'd like to keep most of it that way. But a few things I'd gladly push to the cloud and let someone else worry about en masse.

      And anyone who believes that ANYTHING is the complete answer on its own is an idiot. Sorry.
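
The home-router figure in the comment above is easy to check. A minimal sketch, assuming exactly one five-minute reset per month and a 365-day year (both simplifications; the function and variable names are illustrative only):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored (assumption)

def availability_pct(downtime_minutes: float,
                     period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Percentage of the period during which the service was up."""
    return 100 * (1 - downtime_minutes / period_minutes)

resets_per_year = 12      # "about once a month"
minutes_per_reset = 5     # "out for five minutes at a time"
router_downtime = resets_per_year * minutes_per_reset  # 60 minutes, i.e. one hour
router_availability = availability_pct(router_downtime)

print(f"{router_downtime} min/year down -> {router_availability:.3f}% available")
```

An hour a year works out to roughly 99.99% availability for that one component, before counting power cuts, ISP faults or anything else on the path.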

      1. Vince

        Re: Remind me again

        "Hell, I have to reset my home router about once a month and if it's out for five minutes at a time, that means in a year, I have an hour's downtime just doing that."

        Clearly your router sucks. Get a better one.

      2. Peter Gathercole Silver badge

        Re: Remind me again @Lee D

        The difference is that if an in-house service goes down, you can investigate the problem after the fact and try to do something to prevent that same problem from happening again, and this includes disciplining anybody responsible for the design or operation of the service.

        You have nothing like that level of control with cloud services. You might hope that the service provider will learn from the experience and do the same type of review that you would, but there is probably nothing in your contract that forces them to do so, or even to give you an accurate and complete report of the issue. Unless there are specific uptime targets in your contract, it may ultimately be that the only lever you have over the provider is to threaten to leave them, with whatever fallout that will cause.

        I don't doubt that there are some services where this is a perfectly acceptable risk, but there are many, many others where this is just not the case. Couple that with the uncertainty regarding control of access to your data, and these things together make it quite unlikely that I would recommend putting any business critical service in the cloud.

        1. theblackhand

          Re: Remind me again @Lee D

          It's horses for courses....

          If you already have an environment that provides suitably high availability for your company's requirements, then cloud may not look so great.

          If, on the other hand, you are the average Silicon Roundabout company* where you interact with your customers via Internet services (email/web services/file transfer) and your Internet connection is whatever DSL/cable line you can get for under £50/month, cloud services provide a significant increase in availability.

          * Note: all comments based on recent media coverage. Journalists only tell the truth, don't they?

        2. bigtimehustler

          Re: Remind me again @Lee D

          Most companies are not going to pay the salary required to attract people who can design and keep running a system at anywhere close to the uptime of the best cloud providers. You cannot discipline someone because you chose to hire someone with less skill than the job requires, or only as much as you can afford.

    5. Anonymous Coward

      Re: Remind me again

      "about 100% uptime and availability, and why I should move to the cloud?"

      It's 100% uptime in the time it's actually up. They don't count the downtime cos that wouldn't look good in the marketing spiel.

      1. SleepyXuras91

        Re: Remind me again

        Who ever said 100%? In the SLA it's 99.95%:

        https://cloud.google.com/compute/sla

        Heck, anyone would be stupid to quote 100% even in marketing!

        1. Lee D Silver badge

          Re: Remind me again

          http://uptime.is/99.95

          They can be down for nearly 4.4 hours a year, on average, and still claim that uptime.

          If they play silly beggars about uptime not being 24/7 but usual business hours, or whatever, then it's even worse.

          And, remember, that's only a target. They can't "guarantee" that worse won't happen, only that they (might) compensate you some pittance in proportion if it does.
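
The yearly downtime allowance quoted in the comment above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes a 365-day year measured around the clock, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; leap years ignored (assumption)

def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in the period at the given availability target."""
    return period_hours * (1 - availability_pct / 100)

# Compare a few common targets; 99.95 is the figure cited from the SLA above.
for target in (99.9, 99.95, 99.99):
    hours = downtime_budget_hours(target)
    print(f"{target}% -> {hours:.2f} h/year ({hours * 60:.0f} minutes)")
```

At 99.95% the budget comes to about 4.38 hours a year. Passing a smaller `period_hours` (business hours only, say) shrinks the measured period and so changes what the same percentage permits.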

        2. Joseba4242

          Re: Remind me again

          "100% global availability SLA" http://www.akamai.com/html/resources/cloud-architecture.html

          "Rackspace guarantees that its data center network will be available 100% of the time" http://www.rackspace.com/pt/information/legal/cloud/sla

          Many people would call 100% stupid yet accept 99.95% as perfectly valid. However, if measured monthly, any long outage would breach a 100% SLA as much as it would breach a 99.95% one, and hence in reality 99.95% can be guaranteed as much or as little as 100% can be.

          Stupid? Reality of SLAs is much more difficult than looking at a single number.
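
The point about monthly measurement can be made concrete. The sketch below assumes a 30-day measurement window (an assumption, not a figure from any particular SLA) and uses the outage duration reported in the article, 22:59 to 01:31, which is 2 hours 32 minutes. The function names are illustrative.

```python
from datetime import timedelta

def allowed_downtime(target_pct: float, window: timedelta) -> timedelta:
    """Downtime permitted inside the window at the given availability target."""
    return window * (1 - target_pct / 100)

def breaches(outage: timedelta, target_pct: float, window: timedelta) -> bool:
    """True if a single outage alone exceeds the window's downtime allowance."""
    return outage > allowed_downtime(target_pct, window)

month = timedelta(days=30)                    # assumed measurement window
reported = timedelta(hours=2, minutes=32)     # 22:59 to 01:31, per the article
short_blip = timedelta(minutes=10)            # hypothetical brief outage

for label, outage in (("reported outage", reported), ("10-minute blip", short_blip)):
    for target in (100, 99.95):
        verdict = "breach" if breaches(outage, target, month) else "within target"
        print(f"{label} vs {target}%: {verdict}")
```

Under these assumptions the 99.95% allowance is about 21.6 minutes a month, so the reported outage breaches both targets alike. Only a brief blip separates the two numbers.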

    6. Roj Blake Silver badge

      Re: Remind me again

      Newsflash: non-cloudy servers go wrong too.

  3. Crazy Operations Guy

    And this is why you have Hybrid clouds / multiple cloud providers

    No data center or host can have 100% uptime, so it's stupid to trust any single one with your data / services. I've had a lot of luck placing about 2/3 capacity on Amazon's cloud and another 2/3 of capacity on Rackspace's. Both deployments are split between two regional DCs each, so that even if, say, Amazon's US-West region keeled over, we still have full capacity. And if their whole cloud goes down then we still have services running on Rackspace. Even one entire cloud and half of the other could go down and we'd still be able to limp along without seeing any services down (assuming the load doesn't kill them).
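
The capacity arithmetic in the comment above can be sketched as follows. It assumes each provider's 2/3 share is split evenly across its two regions (the comment does not say the split is even), and the region labels are placeholders rather than real region names.

```python
from fractions import Fraction

# Share of required capacity hosted at each (provider, region) site.
# Even split per provider is an assumption; region labels are hypothetical.
deployment = {
    ("amazon", "region-1"): Fraction(1, 3),
    ("amazon", "region-2"): Fraction(1, 3),
    ("rackspace", "region-1"): Fraction(1, 3),
    ("rackspace", "region-2"): Fraction(1, 3),
}

def sites_of(provider: str) -> set:
    """All sites belonging to one provider."""
    return {site for site in deployment if site[0] == provider}

def remaining_capacity(failed_sites: set) -> Fraction:
    """Capacity left, as a fraction of what is required, after the given sites fail."""
    return sum((share for site, share in deployment.items()
                if site not in failed_sites), Fraction(0))

scenarios = {
    "nothing down": set(),
    "one Amazon region down": {("amazon", "region-1")},
    "all of Amazon down": sites_of("amazon"),
    "Amazon plus one Rackspace region down": sites_of("amazon") | {("rackspace", "region-1")},
}

for name, failed in scenarios.items():
    print(f"{name}: {remaining_capacity(failed)} of required capacity")
```

With these numbers, losing one region still leaves full capacity, losing a whole provider leaves two thirds, and the worst case described leaves one third, which is where the "assuming the load doesn't kill them" caveat applies.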

  4. bozoid

    "Total Inability to Support Usual Performance" is now my favorite acronym.
