OVH goes TITSUP again while trying to fix its last TITSUP

European web hosting outfit OVH has reported its second major outage and Total Inability To Support Usual Performance* in a month, and admitted the new outage was caused by its attempts to fix the cause of the last one. OVH attributed its November outages to power problems and cable cuts. But this incident notice filed by …

  1. Griffo

    Ouch

    When SDN goes bad, hey? You gotta almost feel sorry for them.

  2. Anonymous Coward

    That tone

    Simon, you were going OK until almost the last paragraph, then you felt the urge to go with this gratuitous snipe:

    "But there'll also be plenty who were impacted, and irritated, and wondering why they give their business to a company that's also experienced flood damage and can't configure routers well enough to avoid this sort of thing."

    Maybe you would like to stroll over and show them how it's done?

    I am not a customer of theirs, but when one of my suppliers drops the ball I do not shout at them: I offer to help them¹ get back on their feet so that we both can carry on with our respective businesses. Everyone makes mistakes and we need to factor that into the equation, else we're not running our show properly.

    ¹ Usually a great way to do this is to leave them alone so they can spend their time actually fixing the problem, something to which answering questions from customers does not usually contribute.

  3. John Smith 19 Gold badge

    "something to which answering questions from customers does not usually contribute."

    Except in the sense that ignoring customers when your company has f**ked up is a good way to ensure they stop wanting to be your customers.

    And if enough of them do so with your business then pretty soon you're not going to be needed anymore.

    Keeping customers (internal or external) informed should be part of any generic "S**t-hits-the-fan" DR plan. IT should not have to do it, but there should be some kind of SITREP process that can be used to inform customers, and someone on both ends of it who deals with it.

    Let's be real. S**t will hit the fan. Anyone who's thinking "That never happens to us" is deluding themselves. It has simply never happened to you yet. So fail to plan or plan to fail.

  4. Danny 14 Silver badge

    Re: "something to which answering questions from customers does not usually contribute."

    OVH are really cheap compared to similar offerings. At the end of the day you get what you pay for (well, most of the time; I'm talking about informed decisions, not PC World).

  5. Anonymous Tribble

    Re: "something to which answering questions from customers does not usually contribute."

    Cheap they certainly are, very good value. That's why we use them and also why we use another supplier in case of failure. Last night was fun...

  6. Anonymous Coward

    Re: "something to which answering questions from customers does not usually contribute."

    > Except in the sense that ignoring customers

    Where does the above say anything about ignoring customers? Of course those affected (customers or otherwise) by an incident need to be kept informed, and that is a standard part of any contingency plan.

    However, customers (or anyone else) just jumping up and down and calling every five minutes thinking that is going to get things fixed any quicker is counterproductive.

  7. Anonymous South African Coward Silver badge

    The joys and pains of being in IT...

  8. Lysenko Silver badge

    > Maybe you would like to stroll over and show them how it's done?

    Evidently, quite a few people here could. You do not go around rolling out patches and upgrades like this on primary production systems. You have a staging environment, which is also your tertiary failover system. Once you're happy that staging is updated and apparently idling happily, you temporarily promote it to secondary and then do a failover test (which should be a routine, monthly event) by taking the primary offline. If things go TITSUP then the regular secondary system cuts in and you immediately bring the primary back up again and investigate at leisure.

    The point is, you always have three levels of redundancy and you always have two systems in known good (as in previously production-tested) configuration. This isn't rocket science. It's a simple, sequential procedure. It costs money, of course, and it may not represent appropriate ROI for every business, but in that case say so and don't pretend to act all surprised when things crash and burn. It just looks like incompetence rather than the commercial risk/benefit/cost calculation that it (hopefully) actually is.
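The staged rollout described in the comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OVH's or anyone else's actual tooling: the `System` class, the `staged_rollout` and `known_good_count` functions and the `idle_ok`/`load_ok` flags (stand-ins for real health checks) are all hypothetical, and the sketch assumes the primary is brought back online as soon as the failover test ends.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    config: str        # configuration version currently deployed
    known_good: bool   # has this config already survived production traffic?
    online: bool = True


def staged_rollout(primary: System, secondary: System, staging: System,
                   new_config: str, idle_ok: bool, load_ok: bool) -> str:
    """Hypothetical sketch: staging -> temporary secondary -> failover test.

    idle_ok and load_ok stand in for real health checks: does the new config
    look fine while idling, and does it survive taking traffic?
    Returns the name of the system that carried traffic during the test.
    """
    # 1. Roll the change out to the staging (tertiary) system only.
    staging.config = new_config
    staging.known_good = False
    if not idle_ok:
        return primary.name  # caught before any production system was touched

    # 2. Implicit in this sketch: staging acts as the temporary secondary,
    #    while the regular secondary keeps its known-good config in reserve.
    # 3. Failover test: take the primary offline so staging takes the load.
    primary.online = False
    if load_ok:
        staging.known_good = True  # the new config is now production-tested
        carrier = staging.name
    else:
        carrier = secondary.name   # things went TITSUP: regular secondary cuts in

    # 4. Assumption: the primary comes back up as soon as the test ends,
    #    pass or fail; failures are investigated afterwards.
    primary.online = True
    return carrier


def known_good_count(*systems: System) -> int:
    """Number of systems whose current config has been production-tested."""
    return sum(s.known_good for s in systems)


if __name__ == "__main__":
    p, s, t = (System(n, "v1", True) for n in ("primary", "secondary", "staging"))
    print(staged_rollout(p, s, t, "v2", idle_ok=True, load_ok=False))  # secondary
    print(known_good_count(p, s, t))  # 2
```

In this sketch, a change that fails under load still leaves the primary and the regular secondary in their production-tested configuration, which is the "two systems in known good configuration" property the comment relies on.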

  9. patrickstar

    Are you seriously suggesting they build three backbone networks instead of one?

    Your approach works very well with servers. It doesn't work for networks.

  10. Lysenko Silver badge

    > Are you seriously suggesting they build three backbone networks instead of one?

    > Your approach works very well with servers. It doesn't work for networks.

    There's no difference. Tier IV is defined in terms of overall system resilience and ability to mitigate TITSUP conditions. It's irrelevant whether it is the network or the servers or the HVAC that goes down. All critical components must be fault tolerant and/or redundant. If that means you need to string a whole new fibre pipe across the Atlantic then that's what you have to do.

    However, repositories of cat photos don't exactly justify Tier IV, so I would never expect most businesses to invest in that. What I do expect them to do (as I noted at the outset) is to say what service level they're aiming for and not act all dazed and confused when their Tier II (or I) infrastructure crumbles beneath them. You expect that with Tier II. That's what defines it as Tier II. That's why it's (comparatively) cheap.

  11. Anonymous Coward

    > There's no difference.

    While not disagreeing with the general idea behind your post, not all systems are equal and in some cases (not necessarily OVH's, I do not know) working on live systems is unavoidable.

    I knew of a heart-lung machine repairman, whose job was to fix the thing when it broke in-theatre. Apparently the guy was an ace with a soldering iron.

  12. Lysenko Silver badge

    > While not disagreeing with the general idea behind your post, not all systems are equal and in some cases (not necessarily OVH's, I do not know) working on live systems is unavoidable.

    I completely agree. I "beta test in production" regularly and, as expected, I regularly take down said systems because of bugs and mistakes. The difference is, those are non-critical systems and I send out messages several days beforehand saying that the system is scheduled for maintenance and should be expected to be offline both during the maintenance period and immediately afterwards because work might overrun (translation: we might cock it up or encounter an unforeseen problem).

    > I knew of a heart-lung machine repairman, whose job was to fix the thing when it broke in-theatre. Apparently the guy was an ace with a soldering iron.

    High-pressure job. I bet he wasn't handing out 99.98% patient survival guarantees though. There's nothing inherently wrong in working with no safety net, it's just unprofessional to act all surprised when you eventually come crashing down and break something. If they advertised: "OVH is a Tier I service provider with DR provisions as limited as our fees." then I would have no problem with that.

  13. patrickstar

    Well, there is no backbone network built to your principles.

    Even when you build everything important fully redundant, the way the routing protocols work means that a single configuration error or software bug can bring down the entire thing. See the Level3 disaster a number of years ago.

    There is also no backbone network built with enough vendor diversity that a single bug (such as, say, configs magically disappearing) won't have widespread effects. When it comes to fancier features, interoperability is still so crappy that you need to stick with a single vendor to use them.

    The only alternative would be having two functionally identical but separate networks, each built with a different vendor's kit, in a passive/active configuration. And for the obvious cost reasons, no one even considers doing something remotely like this on a backbone-wide level.

  14. John Smith 19 Gold badge

    "do not go around rolling out patches and upgrades like this on primary production systems"

    I've worked on development of several large bespoke systems. Some had complete development and test environments; some did development testing on the live system.

    The latter were substantially more stressful to work on.

    So some people do.

    But if you're at the design stage it's much better to set up a way to switch the whole system to a "test" company and make all (well IRL as many as possible) of your mistakes in that system.

  15. trevorde

    The cloud

    it's just someone else's computer

  16. IneptAdept

    Re: The cloud

    Nothing to do with the cloud. You are aware that OVH are one of the biggest backbones in Europe?

    https://www.ovh.com/us/about-us/network.xml

    They run multiple data centres and offer pretty good bang-for-buck services...

    F*cking idiot

  17. Defiant

    Spammers

    One of the biggest spam networks going

  18. Anonymous Coward

    Re: Spammers

    Defiant - can you back that with facts?

    OVH proactively monitor outbound mail from services, and if you're a spammer you won't be online for long; repeat and you'll be out.

    I know it's great fun to be able to throw some "cool" comment on a topic you don't understand, but try not being that twat... I think you'll find life is much more fun.

  19. IneptAdept

    Re: Spammers

    Hmmm,

    As has already been said, spammers get nuked very, very quickly nowadays. The problem I have had a few times, though, is getting an IP that was previously used for spamming...

    But guess what: 30 seconds of typing a support ticket and, hey presto, a new IP block :O

    I really don't get a few of the commenters in this thread: you either have no idea who OVH are, or you do know but don't realise what they have actually done and become...

    I repeat my sign off from the last post

    F*cking idiots

  20. Anonymous Coward

    Re: Spammers

    OVH won't accept the return of any IP which is blacklisted or has other reputational problems attached to it.

    They force you to keep it until it's clean, and you pay a monthly fee for the IPs even though you are no longer using them on an active machine.

    So your theory is b*ll*cks. F*cking idiot.

    In addition, when you ask for a new block, they will see you've previously abused them and you won't get one.

    Once you get caught at OVH you don't get second chances. They don't want your business; you're out.

    Also... it may take you 30 seconds to write the support ticket, but one thing OVH are not so good at is responding to email tickets... so it could be a couple of weeks before you get that new IP block via ticket.

    So I believe it is you who have no idea who OVH are, what they have done and become.

    Put down the keyboard and walk away...

  21. Anonymous Coward

    Hurray! More, please!

    IMHO, OVH doesn't have to bother with uptime monitoring.

    Just take stats from a couple of test websites on the Net: when the level of hack attacks drops, it means OVH is having a problem. Long may it last.

  22. Anonymous Coward

    Re: Hurray! More, please!

    Wow, big claims. Can you prove this?

    OVH have some of the most comprehensive and open monitoring tools I've seen of any major provider.

    Status site: http://status.ovh.net/ (detailed breakdown for all services)

    Network Mon: http://smokeping.ovh.net/ (monitoring across hundreds/thousands of links)

  23. 101

    Sometimes you deserve the Grinch....

    I know OVH based on my firewall logs which seem to snag OVH addresses by the ton.

    No tears from me.

  24. Anonymous Coward

    Re: Sometimes you deserve the Grinch....

    I believe they are the third largest cloud provider in the world currently, and the largest European based. So based on how stats work, I'd kinda expect to see them regularly in firewall logs more than the competition.

    Sure, being good value, they will attract a certain type of customer (as well as major enterprise/genuine business users), but I think you'll find they are not very tolerant of users who do not work in their interests.

  25. IneptAdept

    Re: Sometimes you deserve the Grinch....

    A lot of this comes from the Kimsufi side of the business, not the actual OVH side...

    Although it's the same network, so I suppose if you don't want to know, you won't get to know...

  26. This post has been deleted by its author

  27. Anonymous Coward

    Worth noting that, as much as people are picking up on the script-kiddie side of OVH, OVH actually has some seriously large clients, custom accounts and solutions, and provisions services which are well beyond the realm of kids and aimed squarely at enterprise.

    But posting the facts isn't cool or sexy, so yeah - we'll ignore that.

Biting the hand that feeds IT © 1998–2018