Crashed and alone in a remote location: When paid help is no help

I took the plunge and became a freelance IT consultant in 2001. Through an unlikely series of coincidences (former colleague from London goes to a travel show in France and bumps into two guys from Yorkshire who are looking for a software and database architect) I ended up in North Yorkshire …

  1. MJI Silver badge

    Interesting read

    Real-life stories like this are always worth a nose

    1. NotBob

      Re: Interesting read

      Indeed, but the title seems to have nothing to do with the content.

  2. John Hawkins

    Wilds of Yorkshire

    "The Third World" sketch in the Python film "Meaning of Life" springs to mind here for some reason...

    1. John G Imrie

      Re: Wilds of Yorkshire

      Every sperm is sacred.

      1. Antron Argaiv Silver badge
        Windows

        Re: Wilds of Yorkshire

        Lost me job down 't mill.

        It's medical experiments for the lot o' you...

    2. gypsythief

      Re: Wilds of Yorkshire

      That was actually filmed only a couple of miles down the road from where this story was based.

      I've got to say though that Twistleton Scar is distinctly more wild than the rolling drumlins around the location in question.

      Still, nice to have a story from my backyard!

  3. itzman
    FAIL

    24 hour service response...

    Yup, Cisco tried to sell this to my customer: 'Anywhere in the UK, sir'.

    The customer was sceptical: 'And how does that work in a gale, when Guernsey Airport is closed and the ferry can't dock for three days due to high winds and seas?'

    Instead, they carried a complete set of spares themselves.

    1. Uncle Slacky Silver badge
      Holmes

      Re: 24 hour service response...

      Easy - Guernsey's not in the UK...

  4. Steve Davies 3 Silver badge

    The Server CPU Swap game

    There we were, working away in Central Asia, when one of the two servers died. We found that it was a CPU board failure. The nearest one was in Moscow (3 time zones away). After several long phone calls we dispatched one of the client team to the airport. He caught a flight to Moscow, where he was met by the Field Service Manager. A CPU board swap took place in Sheremetyevo Airport and the return flight was duly caught. The local went because he didn't need a visa to enter Russia; we westerners would have needed one. The airfare at the time for locals was also a quarter of that for us rich westerners.

    A little under 10 hours after the crash, the system was up and running again.

    The IT director took us out to dinner for fixing the system in the way we did.

    This was in the mid-1990s. Those were the days.

    1. Rich 11

      Re: The Server CPU Swap game

      Those were the days.

      Those were also the days when one or another Babyflot which had skimped on its maintenance schedule would have a bird drop out of the sky every month. Did your local ask for danger pay?

  5. PickledAardvark

    Quality service from DEC

    An employer in the 1980s had scheduled an upgrade to a VAX. The engineer was supposed to be on site for a few hours during which the system would be unavailable -- timed to cause minimum impact to customers in Europe and Japan.

    Everything was going well until the engineer stood on the last board to be fitted. Ouch. DEC found a replacement in Manchester and it arrived three hours after the accident. That's a pretty good time for handling a distress call after normal working hours, finding the board in a warehouse and driving it to the Midlands. The VAX was back in service later than scheduled but no customer complained.

    It was an early learning experience:

    * Competent plans go wrong in unexpected ways;

    * Great suppliers are the ones who do a good job when things go wrong, not organisations that (unrealistically) never make mistakes;

    * Set honest expectations -- with staff and customers -- and be frank about errors and problems.

    1. Dabooka

      Re: Quality service from DEC

      * Don't leave kit lying around on the floor

      I still haven't come to terms with that last one, despite the 'learning opportunities'.

      1. PickledAardvark

        Re: Quality service from DEC

        Not a lot of room between the back wall of the server room and the server rack. You are right though -- don't put kit on the floor unless there is nowhere else to put it. It's human to trip over.

        My argument was about how the organisation providing a service responded to a foul up.

    2. Anonymous Coward
      Anonymous Coward

      Re: Quality service from DEC

      "DEC found a replacement in Manchester and it arrived three hours after the accident. That's a pretty good time for handling a distress call after normal working hours, finding the board in a warehouse and driving it to the Midlands."

      DEC were good at that; sadly, it's probably the cost of that sort of service that killed them.

      Our office in Belfast was bombed late one Friday afternoon (we had the misfortune to share the building with a tax office). No-one was hurt, everyone evacuated in time, and the servers all came back up OK on Saturday once we had the all-clear, but the offices were uninhabitable (smashed windows, ceilings down) and most of the terminals on people's desks were wrecked.

      Our boss put the DR plan into effect on Friday evening, and phoned DEC. Saturday lunchtime, while our own guys were cabling up spare space in the building next door, DEC arrived with a vanload of new terminals and other kit, driven up from Dublin. Local DEC guys helped get them set up, and by 9am Monday morning everyone had a desk and working terminal.

      I'm not sure it would happen that smoothly these days...

  6. HmmmYes

    Well, it does reiterate what a redundant system is:

    One box here, the other box over there. Hopefully on a different network + power supply.

    In another building would be good.

    It's not a good idea to have a 'redundant' server that can be wiped out by a single cup of coffee.
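
    A minimal sketch of that idea (hypothetical Python with invented host data, not anything from the article): check whether a "redundant" pair actually sits in separate failure domains before trusting it.

        # Hypothetical inventory: where each box lives and what it depends on.
        servers = {
            "app-primary": {"building": "north", "power_feed": "A", "network": "core-1"},
            "app-standby": {"building": "north", "power_feed": "A", "network": "core-1"},
        }

        def shared_failure_domains(a, b):
            """Return the attributes both 'redundant' boxes have in common."""
            return [key for key, value in a.items() if b.get(key) == value]

        overlap = shared_failure_domains(servers["app-primary"], servers["app-standby"])
        if overlap:
            print("Not really redundant - shared single points of failure:", ", ".join(overlap))
        else:
            print("Pair spans separate buildings, power feeds and networks.")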

  7. captain_solo

    First off, it's almost always faster to listen to your user base and their ability to detect service failures than to rely on autonomic monitoring systems. Even the remote telemetry solutions that Sun/Oracle etc. have are generally slower than a user picking up the phone to cut a ticket. Monitoring still helps with failures that don't cause an outage, the kind you might not notice until you scrub a log, but when the server completely craps out you will likely know before your monitoring tools do.

    For remote locations it's often worth considering an onsite parts agreement, so that critical components are already in the DC for the engineer, or the customer themselves, to use to restore service without waiting for delivery of parts that could be delayed by weather, traffic, or because the one part you need is out of stock at your local stocking location.

    Most of the time it's probably cheaper (long-term TCO) to have N+1 redundancy than to rely solely on a premium support SLA to keep you in business. Depending on the cost of an outage, you might be able to get by with business-hours support on gear whose availability you can afford to lose for a few hours. Clustering, load-balancing, and now "serverless" application designs or VM/container mobility strategies can buy you time to diagnose and restore individual nodes without having to make the panic call to the vendor at 0-dark-thirty. Of course, back in the day of this story there were fewer options on that front, and the redundant gear tended to be a little pricey to be left idle.
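
    Back of an envelope, with entirely made-up numbers (a hypothetical Python sketch, nothing from the original story), the trade-off looks something like this:

        # Invented figures: compare a premium SLA against keeping an N+1 spare.
        outage_cost_per_hour = 5_000      # hypothetical cost of downtime per hour
        outages_per_year = 2              # hypothetical hardware failure rate
        sla_restore_hours = 4             # premium contract: hours until service is back
        failover_hours = 0.1              # N+1 cluster: time to fail over automatically

        premium_sla_per_year = 12_000     # hypothetical contract price
        spare_node_per_year = 8_000       # amortised cost of the idle extra node

        cost_sla = premium_sla_per_year + outages_per_year * sla_restore_hours * outage_cost_per_hour
        cost_n_plus_1 = spare_node_per_year + outages_per_year * failover_hours * outage_cost_per_hour

        print(f"Premium SLA route: ~{cost_sla:,.0f} per year")
        print(f"N+1 route:         ~{cost_n_plus_1:,.0f} per year")

    Swap in your own outage cost and failure rate; the point is only that the idle spare often pays for itself.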

    Cool Story Bro

  8. Tom 7

    Twistleton Scars - not very remote

    I could be in the Hill Inn in twenty minutes from there.

  9. Destroy All Monsters Silver badge
    Thumb Up

    That photo looks like something from "Dear Esther", with a better engine

    Effing rooohmanthic!!

  10. Matt Bryant Silver badge
    Happy

    Failed CPU crashing server, not uncommon.

    IIRC, this was an issue for the different UNIX flavours of the period: they could swap a failed CPU out as long as it wasn't the monarch CPU running some of the kernel threads. TBH, it was a great way to scare manglement and get budget for a second system and clustering software: point out that in a 4-way server a CPU failure was 25% likely to take out the monarch, meaning a crash and a total loss of service. "25%" sounded scary; I just used to leave the small likelihood of a CPU failure out of the maths.
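
    A quick back-of-envelope version of that pitch (hypothetical Python, with an invented per-CPU failure rate rather than real figures):

        # Invented per-CPU annual failure probability, just to show the arithmetic.
        n_cpus = 4
        p_cpu_fails_per_year = 0.02

        # Probability that at least one CPU fails during the year.
        p_any_failure = 1 - (1 - p_cpu_fails_per_year) ** n_cpus
        # Given a failure, any CPU is assumed equally likely to be the monarch.
        p_monarch_given_failure = 1 / n_cpus
        p_total_outage = p_any_failure * p_monarch_given_failure

        print(f"Given a CPU failure, chance it hits the monarch: {p_monarch_given_failure:.0%}")
        print(f"Chance of a monarch-failure crash in a year:     {p_total_outage:.2%}")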

    As for no "SSDs" - ahem - yes, there were solid state devices available. In 2001 I was using Texas Memory Systems' Ramsan solid state boxes to boost Oracle databases.

    1. Marshalltown

      Re: Failed CPU crashing server, not uncommon.

      Mmmm, unless the system somehow rolled random numbers during boot-up, the odds are that the very same CPU was boss after every boot, simply because of the physical layout of the system. That would mean one CPU would likely see greater wear and tear, so to speak, than all the others. So those 25% odds were probably weighted toward the house more than you might expect.

  11. Anonymous Coward
    Anonymous Coward

    Hmmm

    "Then one evening at about 6:45 I was having dinner"

    I do think sysops should really get into the habit of saying things like "18:45 ZULU". It lightens the load on ticket wrestlers and possibly lawyers...

    1. Will Godfrey Silver badge
      WTF?

      Re: Hmmm

      Who on earth wants to 'lighten the load' on lawyers!

      1. Crazy Operations Guy

        Re: Hmmm

        "Who on earth wants to 'lighten the load' on lawyers!"

        I worked for a law firm: a smaller load on the lawyers means they're in the office less; being in the office less means they don't have quite as long to break their systems in new and exciting ways.

  12. OzBob

    Interesting from the point of view of support

    Always found support in the Midlands to be 3-4 hours for one vendor; fascinating to hear that Yorkshire got a much better response (but I guess they were paying for it).

  13. Stoneshop

    SSD wasn't even heard of back then

    Well, NAND flash SSD, maybe.

    Basically, core memory is SSD too. And in the 1990s several manufacturers had a couple of solid-state drives in their programme. DEC had one, physically the size of an HSC50 (can't recall the model number; ESE50?), which was essentially a backplane filled with 150MB worth of DRAM boards and an SDI interface, plus an MVAX board with an RD54 hooked up and a UPS. If the power went out, the UPS was to keep the lot running while the memory contents were transferred to disk. Later they had a drive with a 3.5" form factor, SCSI interface, static RAM and a rechargeable battery. A couple of hundred MB, IIRC. No idea of the list price of either, but definitely well over that of their size in spinning rust.

  14. Marshalltown

    Service? What is this - service?

    I have only worked for one business where the owner was willing to pay for a service contract. For the rest, we made up a song, "The Electron' Swap" to cover how "service" was done:

    "I entered the office late one night,

    The hardware systems were a ghastly sight,

    Our two 'hardware specialists' had their screwdrivers out,

    there were pieces of gear all strewn about.

    They did the swap, the electron' swap..."

    The nearest Fry's was over an hour away.
