Yup, DR is still a requirement across data centres... 'the cloud' isn't magic and neither is bog-standard hosting
HostingUK and its big brother, iomart, are still struggling to restore online services more than 12 hours after a bit barn blackout. A fibre cable break took down the St Asaph data centre in Wales last night and, judging by the howling on social media, pretty much all of HostingUK. Users began reporting problems from 2055 UK …
A cloud is great.
As ONE item of redundancy.
You could spread across several different clouds, across in-house and cloud, across in-house, externally-hosted and cloud. But a particular cloud is ONE point of potential failure.
Anyone who thinks otherwise ends up with problems like this.
Anyone with a brain, in such an instance, would go "Okay, time to failover to our secondary site which has NOTHING in common with the primary... not a company name, not a cable, not a service, not a switch".
"You could spread across several different clouds, across in-house and cloud, across in-house, externally-hosted and cloud. But a particular cloud is ONE point of potential failure."
Depends which cloud. Go to one of the big boys - AWS or Azure say - and you'll get a data centre network with the capability* for you to host and run your applications with proper and complete redundancy and fault tolerance built in. Go to one of the relative minnows who have been bought and sold by a bunch of VCs with costs stripped to the bone each time and you'll get what you pay for. For better or worse this whole sector is becoming like the big supermarket chains vs the Mom'n'Pop corner stores - the bigger guys do it mostly better and cheaper, but diversity dies.
* although they give you the capability to do it right, they're kind enough to let you do it wrong if you are so inclined, so you'll be just as vulnerable to this kind of failure unless you tick the right boxes on the web console too
"Anyone from advertising would probably sell a one drived desktop computer as a cloud computing service, to me its just marketing and I take no notice until actually seeing what it behind the marketing rubbish."
They do. Just look at the marketing for consumer NAS boxes (or even just external USB drives with a built-in web interface FFS!)
I've seen this time and time again over the past 10 years..
A small data center, initially set up to serve local businesses (small hosting outfits or online companies), markets itself as having multiple connections but fails to mention they all use the same pipes.
This is not always their fault, as periodically diverse geographical routing to multiple POP's is not feasible or too expensive. But once acquired, a company the size of IOMart has no excuse for not addressing this.
in spite of what the telco salesman said
Or in our case, after the supplier of one fibre bought the supplier of the other and decided to reduce costs by 'simplifying' the joint network. It takes ongoing monitoring to track that sort of behind-the-scenes shenanigans.
It also takes money. I remember St Asaph, but can't remember who originally came up with a cunning plan to locate a datacentre there. I do remember a couple of pitches to buy it or PoP it though, and quickly found a.. snag.
Network wise, it's pretty much in the middle of nowhere, so the civils costs to get either our own fibre, or persuade other carriers to run fibre to it were horrendous.. And then there would be minor logistical challenges, like persuading the Welsh HA to grant permits to dig up kinda key roads in and around N.Wales. Since then, there's been work for supporting wind farms off N.Wales, and sorting out backhaul for some new cables that land around there, so there may be some more options. Wales isn't (or wasn't) very well served by competing fibre providers, especially if you were after dark fibre.
But such is telecoms. When looking for services, it's critical to specify exactly what you're after. If you want true diversity, or to specify minimum route separation, you have to order that as a service. Otherwise providers can (and often will) re-groom capacity to suit their needs, and unless you have a contract specifying route separation, you're generally SOL. Or just offline. Just ordering 2 circuits and hoping for the best may work out cheaper in the short term, but it's best to try and speak to the planners so you get what you need.
And in our case the secondary provider, who was using KComms fibre network when we signed the contract while our main route used BT, then subcontracted his service to Openreach. This ended up with a site with dual entrance of separate fibre networks with disparate paths back to separate cabinets being routed back to the same single fibre 100 metres up the road. All completely transparent to our network monitoring kit.
all completely transparent to our network monitoring kit.
Ah, well.. You can monitor changes with the right kit. So either an OTDR and occasional tests to see if route length has changed at all, or that's often a function built into decent DWDM kit. Which can then alert on changes, cuts, and sometimes degradation. But those only work on dark fibre routes.
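A rough sketch of that length-change check in Python, for anyone wanting to script it. It assumes you already have periodic OTDR end-of-fibre readings as plain numbers in metres; the function name, the 50 m tolerance and the sample readings are all invented for illustration and don't come from any vendor's kit.

```python
def route_length_alerts(baseline_m, readings_m, tolerance_m=50.0):
    """Return (index, reading, delta) for readings that drift from the baseline.

    A much longer reading hints at a re-groomed route; a much shorter one
    hints at a break part-way along the fibre.
    """
    alerts = []
    for i, reading in enumerate(readings_m):
        delta = reading - baseline_m
        if abs(delta) > tolerance_m:
            alerts.append((i, reading, delta))
    return alerts


# Hypothetical readings: two normal, one re-route, one cut.
readings = [12400.0, 12410.0, 15950.0, 3200.0]
alerts = route_length_alerts(12400.0, readings)
for index, reading, delta in alerts:
    print(f"reading {index}: {reading} m ({delta:+.0f} m vs baseline)")
```

The tolerance is a judgment call: too tight and normal measurement jitter pages you at 3am, too loose and a short re-route slips through.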
"periodically diverse geographical routing to multiple POP's is not feasible or too expensive"
In a lot of cases the telco sells a customer "geographically diverse connections" and then as soon as they're out of sight of the premises both connections end up shunted down the same duct.
BT has been particularly bad for this wee rort. Make sure you get a map of your entire fibre route from start to finish and get the salestwat to sign off a statutory declaration that it's been done to spec.
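Once you do have the route map, checking it for overlap is trivial to script. A minimal sketch in Python, assuming each route has been written down as an ordered list of segment labels (ducts, chambers, exchanges); the labels and the function name are made up for the example.

```python
def shared_segments(route_a, route_b):
    """Return the segment labels that appear in both routes, in route_a order."""
    in_b = set(route_b)
    return [seg for seg in route_a if seg in in_b]


# Hypothetical routes: separate building entries, same duct up the road.
primary = ["bldg-east-entry", "duct-17", "duct-22", "exchange-A"]
secondary = ["bldg-west-entry", "duct-31", "duct-22", "exchange-B"]

common = shared_segments(primary, secondary)
if common:
    print("Not diverse, shared segments:", ", ".join(common))
```

This only catches what is on the map, so it is worth re-running whenever the supplier sends an updated route.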
"they market themselves as having multiple connections but fail to mention they all use the same pipes."
Yes, that struck me too on reading the article. Surely those "converging fibres" ought to converge inside the DC, having taken different routes to get there.
In North Wales one would have thought that the farming would mostly be livestock rather than arable.
Moreover, putting in an occasional crop of winter barley or similar would only involve ploughing, harrowing and seeding to 12 inches at most.
Trench your fibre down a couple of feet and no normal farming practices will ever touch it - as evidenced here where it was cut by a trencher cutting a micro-trench to a depth of two metres, not a plough, disc harrow or other common farming implement.
In this specific case though, the fibre to St Asaph runs from Chester all the way out to Holyhead and across to Dublin as well as looping down around West Wales.
Seems short-sighted that all the transit for their DC would be going East/in-bound to England and that they can't fail over to a Westerly route, even if traffic has to go via Ireland in the process. This is presumably what they refer to in their status updates when they mention transit re-route and new fibre provisioning as two of their three resolution paths (in addition to getting the existing links repaired!).
Hedge cutters can be a bigger risk than the dreaded back-hoe fade. Depending on how cables are buried, soil creep can move them. Then hedge cutters being a big spinning spikey disk can quite happily dig into soil, yoink the cable up and break it. And running hazard tape above the cable is easily ignored. Billing the farmer for damage might be less easily ignored, depending on how the wayleave was granted.
I've lost home connectivity (I work from home in a field down a lane) twice now with hedgecutters borking it and have had to fallback to 4G connectivity.. fallback to 4G connectivity... fallback... Hey, I have a great idea for a money earner! We'll get all the kids together, and put on a show and call it Ethel the Aardvark Goes Business Continuity Planning.
"Who exactly thought it was "a good idea" to plant a fibre line in a farmers field "
When I was last involved in a project of that nature, the "planting" involved a pair of Caterpillar D8 bulldozers and a mole plough putting the armoured fibre _at least_ 2 metres down with plenty of warning tape on top before the trench was reclosed.
It's not so much "who thought" of the path as "who cut corners" when implementing it.
A Datacenter I dealt with found out the hard way that despite spending a fortune on having 3 separate power lines heading out in different directions, they all ended up at the same substation which went poof when an idiot with a hacksaw removed himself from the mortal plane trying to steal a copper bus bar!
Rules of natural selection, as my Spanish wife calls it (she's a Dr). Her favourite mentions:
- you'll rarely find a yellow line or "mind the gap" signs in train stations in Spain, if people are too stupid or drunk to see the gap, well that's their fault
- They don't close a whole motorway after an accident, just the lane, people should see the results of stupidity
Have an upvote and a beer :)
I've honestly lost count of how many DC's have single suppliers and/or single points of failure.
I remember one company I worked for, now gone, that boasted how one of theirs had redundant-everything, including power and fibre.
However, both power and data were from a single provider. Both came into the DC together down the same routes.
I know of at least one large gov department that could be shut down just by digging a trench outside their office. All of their redundant network lines go down the same physical route, side-by-side, until they branch off in opposite directions.
Back in the early seventies, when I was working for a very large electrical engineering company in the Midlands, we were all beavering away making Arnold's Millions for him, when the whole site went dark. A construction company was improving the main road out from town out towards the motorway and a very big digger had cut right through the main incomer to our site. In the resulting explosion, the bucket and outer half of the arm had melted, and the molten metal had cooled and fused into a solid lump in the resultant hole. It took most of the week to reinstate the high voltage supply to the factory, but the offices were given an emergency supply by means of a couple of diesel generators in the car park. ( Icon because that's what it looked like from our office).
We did have two different fibre paths (2 fibre trunks with 12 strands each) between our two DCs. We even had (under NDA) the full schematics for these paths through the city, including the layout of the electric, gas and waste lines. Inlets into the buildings were on different sides and the cables never crossed. Only on one street bend they were on opposite sides for about 10m. And the cables were inside the old gas pipes that were used by the local gas/electric company for fibre deployment.
One day HP OpenView not only screamed red for one side of the network interconnect, but also for the SAN links on that side. Our monitoring guy had that cross-referenced, and OpenView alerted a full cable break.
We went into our DC, used the OTDR equipment and found the break distance was exactly at that point.
Cue 3 networkers and 2 SAN people running flat out to that position (faster on foot, because we could use a park) with our team lead alerting facilities about an incoming power outage (the substation main line was about 1m below our fibre).
We were able to stop the backhoe driver from destroying our backup link, and municipal utilities did have some serious words with them (and much swearing). On the paperwork the construction company used, the fibre conduits were still classified as active gas lines. How they could even think to continue after hitting one, the mind boggles.
- Not to request a Google Earth map with routes of the fibre prior to signing any paperwork.
- Not to put it contractually that any changes to fibre topology are to be made clear in advance and potentially to allow for contract break (pun intended) in case of changes or altogether to prohibit shenanigans like that in advance?
We did have redundant paths with our AS and telephone exchange to two different providers. They were contractually obliged to have them at least 5m apart. About 50km south the lines had to cross the river Elbe at Hamburg. The lines were in different pipes and the distance was kept. But they used the same bridge for the crossing, only on different sides. One day a lorry carrying diesel fuel caught fire and damaged the bridge. The fire destroyed all cables inside the pipes on both sides of the bridge. We lost all connectivity for a day.
Legal had to fess up that they had not required different crossings or risk mitigation beyond 25km.
My company supplies services to toll gates on a specific route in South Africa.
In our case the project managers (quite a few) discovered something exciting.
The existing fiber-optic cable, which was trenched back in 1980 was starting to surface at a point where the road goes through a hilly area.
Further investigation revealed that the contractor responsible for trenching for the fiber did a half-assed job - as the section was very rocky, he did not want to waste money and manpower doing a proper job, and only dug the trench half as deep as it should be, then skedaddled off.
And further investigation also revealed that at manhole checkpoints the fiber was trenched at the correct depth levels, but between the manhole checkpoints, the trench was halfway deep.
A lesson was learnt from this - daily physical progress checking was to be done by a designated person on trenching, so the next trenching contractor will not be able to do the job the "easy, quick and cheap" way...
Fortunately another company trenched a new fiber cable along the same route, and this was properly done.
Anon, because I may be identified.
This goes to show that engineers will say "cable A goes along route A and cable B goes along route B" but they will not be able to control the quality of the physical trenching and installation unless a designated person does regular checks with the power to deliver a sixpack whoopass should that be required - also should the two trenching companies be in cahoots and decide to do one trench run (share equipment, share the money left over) instead of two separate trenches.
You may think you have bulletproof separate feeds, but we have completely separate links into two different, completely separate countries, taken out by a major outage in a 3rd country.
You can separate all you want, but if some numpty in a backhaul carrier f**ks up, then you just have to give up, grab a coffee and wait for the shit storm to clear.
It's a way of ordering cables to make sure they can't both be damaged by the same event. Usually you have fibres going through different places. If that is impossible, you can still put a slab of concrete in between the fibres, making the "yellow fibre finding apparatus" break before it can find the other fibre.
I was a customer of Melbourne in the early days, back when it was Northern Colo, and still am an iomart customer. Perhaps not for much longer.
There is no doubt that the service has degraded since the Melbourne days. Back then they had multiple transit / peering points around their Manchester MAN for resilience, but the recent outage revealed that all of that had gone to be replaced by a national iomart loop network. (something which hadn't been shared with their customers)
Subsequently, they've demonstrated a continuing lack of understanding of the word 'diverse' (hint: if you purchase all your connectivity from one supplier it is certainly lacking diversity). More so, in asking questions about their true resilience and network design, I've been met with continuing stalling responses and my ticket is now being ignored despite regular chases. In many ways, this is the bigger concern: s**t happens, but it is accountability and transparency of comms which sets apart the true professionals. Melbourne got this right; iomart don't seem to have a clue.
"Melbourne got this right;"
But it cost more, which meant it couldn't compete...
"iomart don't seem to have a clue."
...with the outfits who don't do it right
The lesson seems to be about paying attention to who your providers really are and whether things have changed due to mergers/takeovers.
Oh, and LARGE penalty clauses for fucking things up.
RAIN: Redundant Array of Infrastructure Networks.
If you don't use RAIN with CLOUDS, then you've MIST the point.
(MIST: Multiple, Independent Service Technologies)
RAIN helps you avoid DROUGHT when running services on CLOUD.
DROUGHT: Distributed Randomly Over Unreliable Global Hosting Technology
They may be part of IOMart and their website (HostingUK) certainly makes much of the 10 data centres , but they actually operate out of just two data centres.
The noise about ten datacentres is just interesting since they actually only use two of them.
This time they managed to have in place an emergency answering service who had the same level of information as was available on Twitter and various blog sites, so lesson learned there.
The last thing I've heard from them is that they are ordering another line, but the Twitter feed was interesting in that it mentions staff working on issues from Wednesday, so that was probably Fibre 1 being cut, with little action until the Farmer cut Fibre 2 on Friday. Probably not the 2km dash by the Farmer in the middle of the night.
I started asking about moving Data Centres but then re-read the two data centre weasel words so now I'm having to look round for big boy geo-redundant hosting.
Biting the hand that feeds IT © 1998–2019