OVH goes TITSUP again while trying to fix its last TITSUP

European web hosting outfit OVH has reported its second major outage and Total Inability To Support Usual Performance* in a month and admitted the new outage was caused by its attempts to fix the cause of the last one. OVH attributed its November outages to power problems and cable cuts. But this incident notice filed by …

Ouch

When SDN goes bad, hey? You almost gotta feel sorry for them.

6
1
Anonymous Coward

That tone

Simon, you were going OK until almost the last paragraph, then you felt the urge to go with this gratuitous snipe:

"But there'll also be plenty who were impacted, and irritated, and wondering why they give their business to a company that's also experienced flood damage and can't configure routers well enough to avoid this sort of thing."

Maybe you would like to stroll over and show them how it's done?

I am not a customer of theirs, but when one of my suppliers drops the ball I do not shout at them: I offer to help them¹ get back on their feet so that we can both carry on with our respective businesses. Everyone makes mistakes and we need to factor that into the equation, else we're not running our show properly.

¹ Usually a great way to do this is to leave them alone so they can spend their time actually fixing the problem, something to which answering questions from customers does not usually contribute.

8
14
Gold badge
Unhappy

"something to which answering questions from customers does not usually contribute."

Except in the sense that ignoring customers when your company has f**ked up is a good way to ensure they stop wanting to be your customers.

And if enough of them do so with your business then pretty soon you're not going to be needed anymore.

Keeping customers (internal or external) informed should be part of any generic "S**t-hits-the-fan" DR plan. IT should not have to do it but there should be some kind of SITREP process that can be used to inform customers and someone on both ends of it who deals with it.

Let's be real. S**t will hit the fan. Anyone who's thinking "That never happens to us" is deluding themselves. It has simply never happened to you yet. So fail to plan or plan to fail.

10
2
Silver badge

Re: "something to which answering questions from customers does not usually contribute."

OVH are really cheap compared to similar offerings. At the end of the day you get what you pay for (well, most of the time; I'm talking informed decisions, not PC World).

7
2

Re: "something to which answering questions from customers does not usually contribute."

Cheap they certainly are, very good value. That's why we use them and also why we use another supplier in case of failure. Last night was fun...

7
0
Anonymous Coward

Re: "something to which answering questions from customers does not usually contribute."

> Except in the sense that ignoring customers

Where does the above say anything about ignoring customers? Of course those affected (customers or otherwise) by an incident need to be kept informed, and that is a standard part of any contingency plan.

However, customers (or anyone else) just jumping up and down and calling every five minutes thinking that is going to get things fixed any quicker is counterproductive.

4
0
Silver badge

The joys and pains of being in IT...

8
0
Silver badge

> Maybe you would like to stroll over and show them how it's done?

Evidently, quite a few people here could. You do not go around rolling out patches and upgrades like this on primary production systems. You have a staging environment, which is also your tertiary failover system. Once you're happy that staging is updated and apparently idling happily, you temporarily promote it to secondary and then do a failover test (which should be a routine, monthly event) by taking the primary offline. If things go TITSUP then the regular secondary system cuts in and you immediately bring the primary back up again and investigate at leisure.

The point is, you always have three levels of redundancy and you always have two systems in known good (as in previously production tested) configuration. This isn't rocket science. It's a simple, sequential procedure. It costs money of course and it may not represent appropriate ROI for every business but in that case, say so and don't pretend to act all surprised when things crash and burn - it just looks like incompetence rather than the commercial risk/benefit/cost calculation that it (hopefully) actually is.
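
For illustration only, the sequence described above can be sketched as a toy simulation. Everything here is an assumption made for the sketch: the `System` class, the `failover_test` function and the boolean `healthy` flag are hypothetical stand-ins, not anything OVH or any real tooling provides, and a real health check would be far more involved.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    version: str
    online: bool = True
    healthy: bool = True  # assumption: one flag stands in for real health checks


def serving(chain):
    """Return the first online, healthy system in failover order, or None."""
    for system in chain:
        if system.online and system.healthy:
            return system
    return None


def failover_test(primary, secondary, staging, new_version, upgrade_ok=True):
    """Patch staging, promote it ahead of the regular secondary, pull the primary."""
    log = []

    # 1. Upgrade staging only; primary and secondary stay on the known-good version.
    staging.version = new_version
    staging.healthy = upgrade_ok
    log.append(f"{staging.name} upgraded to {new_version}")

    # 2. Temporarily promote staging: it now sits ahead of the regular secondary.
    chain = [primary, staging, secondary]

    # 3. Failover test: take the primary offline and see who picks up the traffic.
    primary.online = False
    active = serving(chain)
    log.append(f"{primary.name} offline, traffic on {active.name if active else 'nothing'}")

    if active is staging:
        # The upgraded system carried production load; the primary is left
        # offline so the caller can decide what to do with it next.
        log.append("upgrade survived the failover test")
        return True, log

    # 4. The regular secondary cut in: restore the primary, investigate at leisure.
    primary.online = True
    log.append(f"upgrade failed, {primary.name} restored")
    return False, log


if __name__ == "__main__":
    ok, log = failover_test(
        System("primary", "v1"), System("secondary", "v1"), System("staging", "v1"),
        new_version="v2", upgrade_ok=False,
    )
    print(ok)
    print("\n".join(log))
```

At every step of the sketch at least two systems remain on the previously tested version, which is the property the procedure relies on.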

12
1
Bronze badge

Are you seriously suggesting they build three backbone networks instead of one?

Your approach works very well with servers. It doesn't work for networks.

6
2
Silver badge

> Are you seriously suggesting they build three backbone networks instead of one?

> Your approach works very well with servers. It doesn't work for networks.

There's no difference. Tier IV is defined in terms of overall system resilience and ability to mitigate TITSUP conditions. It's irrelevant whether it is the network or the servers or the HVAC that goes down. All critical components must be fault tolerant and/or redundant. If that means you need to string a whole new fibre pipe across the Atlantic then that's what you have to do.

However, repositories of cat photos don't exactly justify Tier IV, so I would never expect most businesses to invest in that. What I do expect them to do (as I noted at the outset) is to say what service level they're aiming for and not act all dazed and confused when their Tier II (or I) infrastructure crumbles beneath them. You expect that with Tier II. That's what defines it as Tier II. That's why it's (comparatively) cheap.

0
0
Anonymous Coward

> There's no difference.

While not disagreeing with the general idea behind your post, not all systems are equal and in some cases (not necessarily OVH's, I do not know) working on live systems is unavoidable.

I knew of a heart-lung machine repairman, whose job was to fix the thing when it broke in-theatre. Apparently the guy was an ace with a soldering iron.

1
0
Silver badge

> While not disagreeing with the general idea behind your post, not all systems are equal and in some cases (not necessarily OVH's, I do not know) working on live systems is unavoidable.

I completely agree. I "beta test in production" regularly and, as expected, I regularly take down said systems because of bugs and mistakes. The difference is, those are non-critical systems and I send out messages several days beforehand saying that the system is scheduled for maintenance and should be expected to be offline both during the maintenance period and immediately afterwards because work might overrun (translation: we might cock it up or encounter an unforeseen problem).

> I knew of a heart-lung machine repairman, whose job was to fix the thing when it broke in-theatre. Apparently the guy was an ace with a soldering iron.

High-pressure job. I bet he wasn't handing out 99.98% patient survival guarantees though. There's nothing inherently wrong in working with no safety net, it's just unprofessional to act all surprised when you eventually come crashing down and break something. If they advertised: "OVH is a Tier I service provider with DR provisions as limited as our fees." then I would have no problem with that.

1
0
Bronze badge

Well, there is no backbone network built to your principles.

Even when you build everything important fully redundant, the way the routing protocols work means that a single configuration error or software bug can bring down the entire thing. See the Level3 disaster a number of years ago.

There is also no backbone network built with enough vendor diversity that a single bug (such as, say, configs magically disappearing) won't have widespread effects. When it comes to fancier features, interoperability is still so crappy that you need to stick with a single vendor to use them.

The only alternative would be having two separate but functionally identical networks, built on different vendors, in an active/passive configuration. And for the obvious cost reasons, no one even considers doing something remotely like this on a backbone-wide level.

0
0
Gold badge
Unhappy

"do not go around rolling out patches and upgrades like this on primary production systems"

I've worked development on several large bespoke systems. Some had complete development and test environments, some did development testing on the live system.

The latter were substantially more stressful to work on.

So some people do.

But if you're at the design stage it's much better to set up a way to switch the whole system to a "test" company and make all (well IRL as many as possible) of your mistakes in that system.

4
0

The cloud

it's just someone else's computer

0
4

Re: The cloud

Nothing to do with the cloud. You are aware that OVH are one of the biggest backbones in Europe?

https://www.ovh.com/us/about-us/network.xml

They run multiple data centres and offer pretty good bang-for-buck services.....

F*cking idiot

6
1

Spammers

One of the biggest spam networks going

1
7
Anonymous Coward

Re: Spammers

Defiant - can you back that with facts?

OVH proactively monitor outbound mail from services and if you're a spammer you won't be online for long - repeat and you'll be out.

I know it's great fun to be able to throw some "cool" comment on a topic you don't understand - but try not being that twat... I think you'll find life is much more fun

3
0

Re: Spammers

Hmmm,

As has already been said, spammers get nuked very, very quickly nowadays. The problem I have had a few times, though, is getting an IP that was previously used for spamming.....

But guess what: 30 seconds of typing a support ticket and, hey presto, new IP block :O

I really don't get a few of the commenters in this thread. You either have no idea who OVH are, or you do know but don't realise what they have actually done and become....

I repeat my sign off from the last post

F*cking idiots

0
0
Anonymous Coward

Hurray! More, please!

IMHO, OVH doesn't have to bother with uptime monitoring.

Just take stats from a couple of test websites on the Net: when the level of hack attacks drops, it means OVH is having a problem. Long may it last.

0
4
Anonymous Coward

Re: Hurray! More, please!

Wow, big claims. Can you prove this?

OVH have some of the most comprehensive and open monitoring tools I've seen of any major provider.

Status site: http://status.ovh.net/ (detailed breakdown for all services)

Network Mon: http://smokeping.ovh.net/ (monitoring across hundreds/thousands of links)

2
0
101

Sometimes you deserve the Grinch....

I know OVH based on my firewall logs which seem to snag OVH addresses by the ton.

No tears from me.

0
2
Anonymous Coward

Re: Sometimes you deserve the Grinch....

I believe they are the third largest cloud provider in the world currently, and the largest European based. So based on how stats work, I'd kinda expect to see them regularly in firewall logs more than the competition.

Sure, being good value they will attract a certain type of customer (as well as major enterprise/genuine business users), but I think you'll find they are not very tolerant of users who do not work in their interests.
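
To make the base-rate point concrete, here is a small sketch that normalises firewall hits by the size of each provider's address space. All figures and provider names below are invented for illustration; they are not measurements of OVH or anyone else, and `hits_per_million_addresses` is a hypothetical helper.

```python
def hits_per_million_addresses(hits, address_count):
    """Firewall hits normalised by how many addresses the provider announces."""
    return hits * 1_000_000 / address_count


# Hypothetical numbers, chosen only to show the effect of normalising.
providers = {
    "big_provider": {"hits": 900, "addresses": 3_000_000},
    "small_provider": {"hits": 100, "addresses": 100_000},
}

rates = {
    name: hits_per_million_addresses(data["hits"], data["addresses"])
    for name, data in providers.items()
}

if __name__ == "__main__":
    for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {rate:.0f} hits per million addresses")
```

With these made-up inputs the larger provider shows up nine times as often in the raw log yet has the lower rate per address, which is why raw counts alone say little about how a provider polices its network.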

1
0

Re: Sometimes you deserve the Grinch....

A lot of this comes from the Kimsufi side of the business, not the actual OVH side....

Although it's the same network, so I suppose if you don't want to know you won't get to know....

1
0

This post has been deleted by its author


Biting the hand that feeds IT © 1998–2018