Lateral thought saves sizzling server

I learned a long time ago that generating random numbers (really, truly random numbers) is a non-trivial exercise. However, I completely failed to apply that computer science lesson to the real world of computing and continued to believe that events in the Newtonian world could happen without a cause. Such a belief system is …

COMMENTS

  1. Dale

    Heat

    I used to be responsible for an old NetWare server that had run continuously and relatively painlessly for five years in conditions that were definitely not ideal. It started its life on the dusty floor of a garage that had been converted into a makeshift studio for the new community radio station it was serving. It endured several moves and took a few good knocks in its time. Summers in that town were hot and humid and there was never any air conditioning. The six full-size SCSI hard drives would literally burn my finger if I accidentally touched them.

    After five years of faithful and mostly trouble-free service in these appalling conditions, it eventually started crashing (ABENDing, in NetWare-speak) at random, very much like the server in the article, which obviously wasn't desirable considering it was serving up live audio for the radio station. After many late-night callouts I eventually deduced by experimentation that the SCSI controller was on its way out, and took the opportunity to convince the bosses to replace all the SCSI drives with a pair of mirrored IDE drives. By that stage IDE had caught up with SCSI in terms of performance and capacity, and was significantly cheaper.

    We eventually also installed air conditioning in the server room but, having no budget for pretty much anything, we couldn't even install a pump to carry away the condensation. So for a couple of years the aircon drained into a large barrel, and a staff member had to remember to empty it every evening. Occasionally they forgot... which is why the server was finally raised off the floor and got a space all of its own.

  2. Andrew Carpenter
    Flame

    The lesson here is:

    That NetWare servers don't crash at random.

    Fire icon for obvious reasons.

  3. Anonymous Coward
    Anonymous Coward

    That's nothing

    I had a 'faulty' ceiling mounted projector in a Primary school classroom. Often it would run for the whole morning, then switch itself off at lunch. Sometimes it would switch itself off after a couple of hours. Sometimes (as far as I knew) it would work fine all day. It always started up immediately after switching itself off and had no unusual settings. I tried changing the bulb, but that didn't help.

    Unfortunately I didn't have a spare at the time, but the holidays were coming up. The following week, I swapped the projector over with another of the same model. The problem stayed with the room, not the projector. So it had to be the mains, but everything else in the room was fine and not cutting out. The ceiling socket looked fine, and Estates knew of no fault with the electrics in the building.

    OK, so more investigation. I connected an anglepoise lamp to the projector and left the room for half an hour. When I came back I peered in through the classroom window, and the light had gone out. I walked back into the room, and, as if by magic, the lamp was back on! Then a moment of inspiration. In the corner of the room was what looked very much like some kind of motion sensor. I left the room for half an hour, waited for the light to go out, then walked to the doorway and stuck my arm in. The light came on.

    After Estates took a look, it turned out that the electricians who had put the extra ceiling socket in for the projector had tapped it off a mains power cable running along the top of the room. This cable was from an old lighting circuit connected to a motion sensor that everyone had forgotten about. The teacher of the classroom had her pupils working so quietly that sometimes the motion sensor thought the room was empty and so switched off the projector.

  4. Matthew Ellen
    Alert

    Coincidence

    I was thinking of visiting thedailywtf.com but then came here instead, only to read your story.

    It's as if the IT web press is reading my mind...

  5. amanfromMars Silver badge
    Paris Hilton

    Hot stuff always causes lateral [horizontal] thoughts. :-)

    Some would posit that nothing is random, Mark. And waste no time or effort in arguing the point. Life is just too short to argue about IT.

    Spookily enough, that would then render random number generation for security a bit of an impossibility.

    Thanks for the heat, Paris.

  6. Anonymous Coward
    Boffin

    eh

    So the server was randomly crashing and it took you how long to consider it might just be overheating?

    Weeks!

    Shocking :P

  7. Tanuki
    Thumb Up

    Cause and Effect

    Reminds me of a network glitch I investigated 20-odd years back.

    Freakishness on a cross-site link between 08:30 and 08:40 on Wednesdays/Fridays. UARTs locked up in utter confusion; X.25 protocol errors in 10- to 15-second bursts.

    Much time was spent hunched over a Hewlett-Packard 4951A "electric handbag" protocol analyser.

    Turned out that the site had one employee who only worked Wednesdays-Fridays.

    Her partner was a taxi-driver.

    He'd drop her at work - stopping right outside the data-centre - then use his radio to call the taxi-despatch for the first job of his day.

    1980s-era synchronous datacomms often had poor RF-immunity!

  8. Anonymous Coward
    Stop

    Eh?

    Is it just me or is this not one of the first things you would look at with a randomly crashing server?

    Once operating environment anomalies are ruled out by checking the installed base, the environment is the primary area of investigation in any such escalation, followed quickly by human factors.

    Any server worth its salt would warn that it was getting too hot. Even PCs do that nowadays!

  9. Chris Thomas
    Unhappy

    STFU

    Dude, I use this "it's random" argument sometimes when I want to get the boss off my back. Don't give him ideas about why "It's random" is bullshit! I need that excuse to buy me more time to fix something before he calls me on it and demands that I explain it in terms other than "it's random", at which point I have to say,

    "I don't know, errr, hey, look at that hot chick on the second floor with the nice boobies!"

    and hope he just forgets about it!

    Ruining my day man! STFU!

  10. Anonymous Coward
    Anonymous Coward

    Phone problem

    When I was a student, the house phone started behaving oddly. This was about six months after changing from BT to NTL (against my wishes!). Sometimes the phone would chirrup and there would be someone at the other end; sometimes people would complain that they had been ringing us all night with no answer; other times we would pick up the phone and someone would be there, even though it hadn't been ringing. All this while the phone was, occasionally, appearing to work normally. NTL visited three or four times over a three-month period until I got fed up and demanded that BT be put back in.

    The BT guy came round and asked where we wanted their box. I told him, and he asked if we minded him using the old NTL wire across the house, which we would no longer be needing; I said that was fine. About twenty minutes later he said that something was wrong with the wire and he needed to go out to the van to get some test kit. Upon connecting the tester he pinpointed the problem as being behind a radiator. The NTL guys had put a metal staple through the cable.

    The phone had been fitted in the summer, when there was no need to use the radiator. When we turned the heating on, the expansion caused a short; when the radiator was off, everything worked fine.

  11. Steve Evans

    Ah, sounds familiar...

    I had a machine that suffered from sunburn too..

    My favourite weird one was the machine that would never crash when left on its own.

    I could access it remotely and run its little CPU at 100% 24/7, no problem at all.

    But if I sat down at the desk with it, within an hour it would crash.

    Until the one day it didn't... The one day I used it for hours whilst sitting at the desk, and it didn't put a foot wrong. The one day when my mobile phone was off being repaired!

    Since then I've never owned a PC with a plastic case.

  12. ian

    Dataplex

    Many years ago, I gleaned from a far-away operator that the kit only went wrong when the sun came out and she had the covers off.

    The kit was controlled by photo-sensors....

  13. Robert Moore
    Coat

    Insightful

    Sorry, meant waste of time to read

  14. Seán

    WOW

    It's amazing how simple problems seem difficult if you don't think.

  15. Anonymous Coward
    Happy

    Maybe ...

    ... you should have persuaded her to play badminton instead?

  16. Anonymous Coward
    Alert

    Or maybe not...?

    When you find a correlation, it's often just the discovery that your measurements reflect the assumptions inherent in your model, thereby seeming to confirm it when in fact it directed them. Or in other words:

    >"So the server was crashing when the weather was good. [ ... ] it gets hotter. Hmm.

    Servers don't like heat. Where is the server? Sitting on a bench. In front of a south facing window [ ... ]

    So, Rosanne plays tennis when the weather is good, the sun shines and it's cooking the server. The Newtonian world is back in balance, yin has a yang and effect does have a cause."

    ... thereby completely blinding you to the reality that in fact, every time Rosanne goes out to play tennis, the juniors (she being supervisor and all) decide to slack off and fire up a game of network Quake on the server!

  17. Martin Proffitt

    A new lease of life for an aging processor

    I've had similar experiences with a desktop machine which I converted to a web and file server for development purposes at home.

    This particular machine (an AMD Athlon XP 3000+) was forever failing on me. First it told me that the hard drives had gone. Not having the cash to buy new drives, I was in a bit of a panic about this until, for some unknown reason, I decided to try replacing the IDE cable instead. Funnily enough, this brought the seemingly dead drives back to life. I never did work out why I decided to try changing the IDE cable, but I'm glad I did. A cable is a damn sight cheaper than 2 new hard drives and the frustration of restoring 200GB of data from backup.

    Around the same time as this happened the system started to shut itself down, freeze, reboot and do all kinds of peculiar things. At first it would only do these things when it was hot. I was forever cleaning dust and grime out of it (I was living in a pretty grotty hole when this started). Sometimes it would run for weeks or months on end without incident, and at other times it would fail 8 or 9 times a day.

    My initial thought was that it was likely to be an overheating problem. Most of the time it played up during hot or humid weather, although the fans never seemed to be working unduly hard. With this in mind I began to suspect that perhaps the temperature sensor on the motherboard had packed up or was in the process of failing.

    This problem has been going on for about 3 years (yes, I'm still so skint I can't afford a new system yet), although since March it had been getting much, much worse, to the point where I couldn't keep the system up for more than an hour at a time, or until I tried to open any applications.

    About 2 months ago I decided to look more closely at the matter.

    After careful thought I concluded that the system only crashed on me during hot weather or when I was placing undue load on it. This narrowed it down to one of two things: memory or the CPU. Seeing as the memory had been upgraded, I wondered about the CPU.

    Rather than replace the CPU itself (again through lack of money), I thought I'd try throttling it back. The base clock defaults to 166MHz. I brought this down to 100MHz and it was stable over the weekend. Wanting to find out how much the system could handle, I took it up to 150MHz; it was stable overnight but died in the morning with a BIOS error relating to clock frequency. I've now dropped it down to 140MHz and it's been stable for just under 2 months, and it has taken everything I've thrown at it so far, including indexing over 110GB of audio tracks, a feat that has not been completed in a single session in nigh on 3 years.
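    To put those base-clock settings in proportion, here is a quick calculation of how much of the 166MHz default each one keeps. It is a minimal sketch that assumes the CPU multiplier was left unchanged (the comment doesn't say), in which case the core clock drops by the same fraction as the base clock. The function name `relative_speed` is purely illustrative.

```python
# Sketch: fraction of the default base clock kept at each setting mentioned
# in the comment. Assumes the CPU multiplier was left unchanged, so the core
# clock (base clock x multiplier) scales by the same ratio.
DEFAULT_BASE_MHZ = 166


def relative_speed(base_mhz: float, default_mhz: float = DEFAULT_BASE_MHZ) -> float:
    """Return base_mhz as a fraction of the default base clock."""
    return base_mhz / default_mhz


settings = [
    (100, "stable over the weekend"),
    (150, "died the next morning"),
    (140, "stable for just under 2 months"),
]

for mhz, outcome in settings:
    pct = relative_speed(mhz) * 100
    print(f"{mhz} MHz: {pct:.1f}% of default ({outcome})")
```

    On that assumption, the stable 140MHz setting gives up roughly a sixth of the processor's rated speed, while 150MHz (about 90%) was already too much.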

    So now it's time for a new processor, although this one is currently performing quite nicely even though it's on its last legs.

    I probably could have solved this a long time ago but to be honest, until March I wasn't all that bothered. I rarely use it as a desktop itself, preferring to do most of my work from my laptop and just use it as a server, and whilst it was frustrating at times, I could live with the occasional 5 minutes downtime whilst it reset and cooled enough to boot up again.

  18. Pete Silver badge

    Nightmare on Elm St.

    Many years ago when I was "last level" support for a hardware supplier, we got a call from a disti who had installed a network for a high-profile client. It suffered from occasional but disastrous networking problems. Back then, networking was a black art (remember those old, thick, yellow networking cables?). After piling in their own people, analysers, reflectometers, tracing software and everything else they could throw at it, they finally called us in. I sat in a small cubby room for days (on charge, natch), squeezed in with seized/evidence equipment (yes, it was No. 10 Elm St.) and all the disti's kit, and nothing happened. After a while, the account manager thought the problem must have somehow fixed itself and was considering declaring it "solved". You guessed it: massive packet loss, collisions, machines crashing. The TDR showed which cabling segment was at fault and off we went to find out what was going on.

    It turned out that the cable ran under a staircase, with one rocky step, just by a window. When a courier had made a delivery or a pickup, he would call in to get the next job. Mobile reception in the building was lousy and the only place his phone would work was near that particular window. There he'd stand for a couple of minutes, squashing the 10BASE2 cable and causing networking ructions. While he was not the only person to use that staircase, no-one else loitered on that particular step for any length of time, so the problems from people passing by were too small to notice.

  19. Pooper Scooper
    Paris Hilton

    It took you that long?

    How long have you been in the industry?

    Paris, because you seem to have little more IT clue than she does.

  20. Anonymous Coward
    Anonymous Coward

    Heard a better one

    Can't give any specifics, but where I used to work there was a story going around about a network connection failing every time the toilet was flushed.

    Dodgy wiring to the water pump.

  21. Don Bannister

    Solving another intermittent problem

    When I was at Uni studying Engineering some years ago, we had a Professor who did a bit of external consultancy. He would occasionally tell us about some of the jobs, and this one was my favourite.

    He was called to a paper mill to try to find out why, every now and then, the paper was coming out with uneven thickness and, at considerable cost, had to be thrown away.

    He looked around at the operation and asked them a few questions. He then said that he could fix it and named (by his own admission!) a hefty price. They were so keen to get it resolved they went for it.

    He said "See that window up there? Put some blinds on it." He'd worked out that when the sun was in a certain position it was falling on just part of the mill rollers and they were expanding with the heat!

    I do hope there are still guys like him teaching ....

  22. QuietLeni
    Alien

    Which reminds me of this one...

    GM had a similar problem a number of years ago. Have a look here:

    http://sabbah.biz/mt/archives/2006/01/23/car-allergic-to-vanilla-ice-cream/

    Regards,

    QuietLeni

  23. Anonymous Coward
    Boffin

    laser links failing in the early morning

    A long time ago, a laser link between two sites kept failing every morning near sunrise. The lasers weren't pointing towards the sun, though. Network engineers had to stake out the roofs of the buildings and watch the sunrise to see what was going on.

    Turns out flocks of birds would take off en masse at sunrise, obscuring both sets of lasers!

  24. David Perry
    Flame

    Another corker

    A friend of mine's uni had a building that was joined to the main hub building via a line-of-sight microwave link. At the same time every day, for the same length of time (a couple of hours or so), the link went down. After several months of trying all sorts of things they did a path analysis on the beam. A tree that the council had let overgrow was being pushed into the beam's path first thing in the morning by the dew on its leaves weighing its branches down; then, as the day warmed up and dried the dew, the branches and leaves would rise back up again and stop blocking the beam! After a stern talking-to for the council from a uni bigwig, the tree was pruned heavily.

    Flames cos of council stupidity interrupting students dossing *cough* learning, sorry, online.

  25. The Mighty Spang

    never trust control panels

    One place I worked had a room full of VAXes (ooo, 18 years ago). One day I'm round the back of the big one and it feels a bit warm. I tell the sysadmin; he wanders over to the aircon control panel: no error lights, everything's OK. "Yes, xxxx tends to get a bit warm, it's a big machine."

    First really hot day of summer, mid-morning, I can't log into any of the VAXes. Helpdesk reports come in saying the same. I go over to the other building to see what's going on, only to find the loading doors wide open and the sysadmin powering down the (19"/340MB) hard drives.

    The five machines' log printers were going mental, because in a cluster everything got reported across the whole suite. One goes "I've lost connection to A!", everything else goes "B has lost connection to A!", on top of reporting their own problems.

    What had happened was that 2 of the 3 aircon units had failed, but due to a fault the fail lights weren't working on the panel. On the first hot day the last working one gave up the ghost, the machines overheated, and it was goodbye productivity for 1000 people.

  26. Tim J

    @David Perry

    I can see why pruning trees so they don't collide with double-decker buses (for example) is the council's job, but I can hardly see how it is the council's job to ensure that line-of-sight RF links don't get obstructed by trees. The council isn't the stupid entity in your story; the university is. That is, if your story isn't apocryphal...

  27. Dave

    A Bit Harsh

    Come on, guys, we all have to learn some of them painfully. Netware servers are old enough that it was probably many years ago when overheating wasn't so much of a problem, especially in Scotland. I've had occasion to impress the management a few times when a problem appeared during a customer demo (actually a spurious signal on a bit of radar test kit) which I solved by turning off the nearby monitor. All of a sudden the 18kHz spurious disappeared. What they didn't know is that a few months before that, I'd spent most of a day trying to find out why one of my test circuits, that was working happily the day before, seemed to be oscillating at about 21kHz. Then near the end of the day someone who'd been using the department computer (in the days when EGA was the bees knees and an IBM AT cost seven grand) shut it down, the monitor went off and my problem disappeared.

    Some problems are only obvious now because of hard, painful experience or, if you're lucky, a good tale of woe from a colleague down the pub who had the experience.

  28. James Anderson

    re: eh

    "So the server was randomly crashing and it took you how long to consider it might just be overheating?"

    It was in Scotland! Trust me overheating is not the first thing you would think of!

  29. Anonymous Coward
    Anonymous Coward

    @Martin Proffitt

    >I was forever cleaning it out from dust and grime (I was living in a pretty grotty hole when this started).

    And

    >After careful thought I concluded that the system only crashed on me during hot weather or when I was placing undue load on the system. This narrowed it down to one of two things. Either memory or the CPU. Seeing as the memory had been upgraded, I wondered about the CPU.

    PSU, either dodgy or needs cleaning, in a dusty environment probably the latter.

    Cash only please

  30. Anonymous Coward
    Coat

    @AC

    "Often it would run for the whole morning, then switch itself off at lunch. Sometimes it would switch itself off after a couple of hours. Sometimes (as far as I knew) it would work fine all day."

    Was that the projector or the teacher?

    Mine's the gown and mortar board.

  31. Anonymous Coward
    Alert

    More please ... it's amusing

    Good article and comments. I have nothing to add to these; the bloody things should just damn well work. Like a car ... innit? If my car threw an "I can't let you do that, Dave" moment, I'd just get the hammer out.

  32. Anonymous Coward
    Anonymous Coward

    <no title>

    A stern talking to a council actually had an effect besides being passed to someone else time and again? Wonders will never cease. Must have finally spoken directly to the council's tree pruning chap/chapess.

  33. Anonymous Coward
    Anonymous Coward

    Networking

    I remember a network hub run by our engineers (before standardisation, and before we told them IT kit was our job only) that would shut down at "random".

    We eventually found it connected up to a power switch that they had taken off the same line as the nearest hand dryer.

    When the senior engineers used that one rather than the usual toilet ones it would spike the power and kill the hub.

  34. Anonymous Coward
    Unhappy

    @The lesson here is:

    Nope, the lesson here is that someone needs to get better sys admins. Nothing is random and when it appears so, heat related issues (usually bloody CPU fans if present) are often the culprit, closely followed by dodgy PSUs. All pretty basic stuff I'm afraid :(

  35. Colin Millar
    Stop

    @David Perry

    Er - shouldn't you blame the dumbass who positioned a line-of-sight link without proper clearance?

  36. Anonymous Coward
    Coat

    @Martin Proffitt

    I am **so** glad I wasted 5 minutes reading about how an old processor needed to be underclocked to stop it crashing. Thanks!

    Please take your coat and leave.

    **** mumblemumble****

    Oh! You haven't got one? Well then no reason to hang around then, bye!

  37. Anonymous Coward
    Anonymous Coward

    Three biggest causes of failing systems

    Dodgy wiring.

    Faulty Power Supply.

    Heat.

    Check those first and 9/10 you'll fix the problem.

  38. David Haworth
    Boffin

    Vacuum cleaners and processors that are afraid of the dark

    Had a PC installed at a customer's site once (a prototype system, which is why we developers were looking after it) that started rebooting for no apparent reason at night. But not every night. Turned out that the cleaning staff had decided that the power outlet the PC was plugged into was more convenient than the one just outside the door that they were supposed to use, and simply unplugged the machine when they wanted to clean the floor.

    And then there was the mainboard that was afraid of the dark. The PC failed one day: it just wouldn't start, no lights, nothing. Onto the bench, case off: works fine. Reassemble: won't start. After a couple of cycles of this (making sure that no connectors were getting disturbed during reassembly) we decided to reassemble it under power to see at what point it failed. Simply putting the cover on caused it to fail. Lifting the back of the cover to let a bit of light in: it starts working again. One of the guys came to the conclusion that the machine was afraid of the dark ;-) ... It turned out that an LED in the front panel had got one of its legs bent at some stage, and the insulation had gradually chafed away. Unfortunately that leg of the LED carried the 5V rail of the mainboard, and a short to the case took out the whole system, but the short was only present when the cover was fully pushed home.

  39. TeeCee Gold badge

    I recall....

    ..doing an installation in a quarry of an IBM cluster controller with a few screens 'n printers linked to a S/36 up the way over a BT leased line with BT sync modems.

    Couldn't get the bloody thing to work at all. To add insult to injury, every time I reported the line faulty to BT, they insisted that they'd tested it and it was OK. The only evidence of something being up was a set of cable clips running up both sides of a dividing wall and ending at a hole that didn't go through the (rather thick) wall.

    Eventually, I managed to get BT to get their engineer at the Exchange to call me directly while I was on-site. He was seriously pissed off 'cos he'd tested the same line repeatedly for the thick end of a month now and he normally only worked nights (big clue here). "It's up and fine, I can see carrier from both ends", he says. "So it is" say I. A couple of moments later, "It's bloody down now", I say. "F*** me! So it is." he exclaims. This up 'n down process continues for a while and he agrees to send an engineer on site.

    The clue was the cable clips. That was where the original installing engineer had found that his drill wouldn't reach through the wall (not even half-way through). He'd then noticed a phone in the back office where the modem was required and "borrowed" a couple of free pairs off the existing cable. One of said pairs was a tad dodgy, and every time a truck went over the weighbridge outside, the resulting vibrations caused an intermittent short and the line would drop.

  40. Mr Larrington
    Paris Hilton

    J Random Crasho

    Back when dinosaurs strolled through WC2, I was a BOFH on a site with a PDP 11/44, which would fall over with depressing frequency. Out would come the Field Circus bods, who would yank the machine out of its home in the rack and prod it, and poke it, and leave it hanging out of the rack for twenty-four hours, and nothing *ever* happened.

    "No problem" said the Field Circus bods. They closed the lid, pushed it back into the rack and went down the pub. Half an hour later the wretched thing would crash again.

    After roughly four months of this, with the WP department threatening to quit en masse, a Field Circus bloke slightly sharper than average figured that it *only* crashed when the lid was down and the machine was ensconced in the rack.

    Short circuit.

    He wedged a matchstick between a pair of neighbouring boards and the thing ran happily for the rest of my time working there.

  41. John
    Flame

    Only on hot days

    I had one of these "random problems" once. It too only happened on hot days, but only if a certain member of staff was not at work. An AppleTalk network between three Macs drifted in and out of connection. I was up there staring at the screen, and sure enough I watched the Macs appear and disappear on the network. As I was watching, wondering how this could be, I leant back and noticed an oscillating fan moving at the same rate that the computers were blinking on and off. The fan was on the other side of the desk from where an employee usually sat if he was at work. He wasn't there that day! Aha: I noticed that a plastic wall plate holding the network socket was lifting slowly when the fan blew air at it, unobstructed by the missing employee. I redid the dodgy network connection, put the wall plate back on the wall and voila!

    flames - because it's only ever on a hot day!

  42. Anonymous Coward
    Boffin

    Had a similar one with a monitor

    Many years ago, I was installing PC-based EPOS systems in various shops in Ireland. One customer in Dublin called saying that their monitor would "go blank" for an hour during their busy lunchtime period, but only on random days. Simple things like "turn it all off & back on again" had no effect. As I was working nearby that week, I asked them to call me next time it happened. Sure enough, they called a day or two later & I rushed over to witness the strange phenomenon for myself.

    It turned out that their description of the screen "going blank" wasn't entirely what I was expecting. More accurately, when the sun was shining, it would bounce off the mirrored windows of a nearby office block and overcome the screen's ability to compete with direct sunlight. As the sun moved, the light would move off the screen & presto, it could be seen again. Re-positioning the counter solved the problem...

  43. GrahamT
    Happy

    My random error story

    Back in the 80's I had to fly out to Hong Kong (yeah, life's a bitch) to sort out a problem on a network monitoring system we had installed. It too would crash "randomly", but this was in a huge air-conditioned computer suite on a UPS. We knew it wasn't really random as it only happened at night or early in the morning. As the computer room was freezing cold, no one wanted to sit up all night waiting for it to happen. Several days of scouring logs didn't help as each time it was a different bit of code executing when it crashed.

    However, putting a mains monitor on the supply showed that there were some mains spikes at about the time of the crash. The Operations Manager said it was impossible because they had a fantastic UPS that stopped all that sort of thing.

    Eventually we decided that one of the PFYs on night shift would sit shivering by the machine and wait for it to crash and see whatever else was happening at the time. Got it first night: at about 5 am, the cleaner came in, plugged her old Chinese hoover into a wall socket, switched on - Crash!

    She was plugging into a UPS-fed socket, and because the socket was near our server, we were getting the full mains spike; the UPS was designed to stop spikes from outside coming in, not internally generated ones. I guess UPSs and server power supplies are better at suppressing mains spikes now, but this was over 20 years ago.

    No one had thought of her because she was one of the army of invisible people that work when the rest of us are asleep.

    Still I got to sight-see in Hong Kong and eat lots of great Chinese food, the cleaner got a new hoover and instructions about which sockets not to use, and the crashes stopped, so everyone was happy.

  44. Strappy

    It sounds like an urban legend...

    ...but it really happened.

    Way back, the company I worked for had an NCR mini-tower server in the main office of a local theme park. We'd get regular weekend call-outs from staff based in other offices because the database had crashed. The server was fine, sitting there waiting for logins but the database process had terminated. Easy enough to restart the database but it was annoying as the software (Progress) was usually pretty stable.

    It didn't take long to check the server logs and find that it usually went down on the Friday evening, so I asked one of the guys in the main office to work late on Friday and see if anything happened.

    Six o'clock and the cleaning lady came in and asked if he minded her doing the office while he was there. "No", he replied, "go ahead". So first thing she does is unplug the server and plug her hoover into the socket.

    Problem solved for the cost of a sticker with "DO NOT UNPLUG" written on it.

  45. Anonymous Coward

    i got a stupid one for you

    5 or so years ago in my last year at university i was trying to compile some java code in a class and it wouldn't compile didn't show any errors just a run time problem. i'd compiled the code before and it should of worked fine. i called my teacher over and we spent awhile playing with it but it didn't work. so i compiled it on some one else machine showed him got my makes. but being engineers most of the class spent the next hour sitting in class tring to figure out why this one machine wouldn't compile my code.

    the machines are reimaged weekly and the machine should have been identical to it's working neighbours.

    i later found out why turning up to lectuer a day later early the teacher told of a grad student having a fit apparently he'd been "issued" the machine i'd been working on for his software programming masters project. and he'd loaded his whole project on to the machine and left it there.

    his project an enhanced version of the compiler the university uses( was actually built by a student as a project) with it's own library's

    he'd set it to use the same commands as the old compiler. the problem for him was my lecturer had ask for the machine to be rebuild as the compiler didn't work .

    (should probable explian before some one ask why not just use the jdk compiler the compiler in question was also a dev tool it was a gui that let you model the program in uml and then add code to the model then compile and run all in one. but you had to open it from the command line)

    AC in case that students reads the reg (was popular with the final year network and sys students don't know about the programmers though)

  46. michael

    I am shure I typed

    www.theregister.co.uk

    not

    www.anoldfokeshome.org

    must be wed morning

  47. Anonymous Coward

    @AC - i got a stupid one for you

    Are you sure you finished your university course? I thought someone who took, or at least seems to have taken, a computer science degree should know how to use punctuation and the shift key. The fact I had to parse the text two or three times to understand it means it clearly isn't any good!

  48. Peter Gathercole

    Random number generator

    Come on, all of you. All you need is a Bambleweeny 57 sub-meson brain attached to an atomic vector plotter suspended in a strong Brownian motion producer, say, a really hot cup of tea! (RIP Douglas)

    I had an experience with an educational computer-controlled robot arm that used IR sensors to make optical shaft encoders for the motors. It was a really good design of arm that did not use stepper motors, as was the rage at the time, but proper electric motors, so it was much faster and more impressive, and it had six separate independent movements. It worked really well, but unfortunately the IR emitter/detectors were covered in translucent plastic, which, when used in direct sunlight, caused ALL of the active motors to run to the end-stops of their respective movements. The whole arm contorted and dumped itself off the bench, which led to red faces and a difficult-to-justify repair bill!

  49. Jamie Kephalas

    @ Pooper Scooper

    ditto.

  50. GrahamT

    @michael

    At least us old "fokes" can spell folk's!

  51. Anonymous Coward

    Experience is directly related to the amount of equipment ruined!

    Now I've been in the business for a relatively short 7 years, and have seen all of the above situations.

    Servers dying in the heat

    Cleaners plugging into UPS circuits

    Dodgy wiring

    Everything

    Best one was one of mine... a hotel being renovated, electricians had butchered a telephone cable... I went in and replaced a 3m length of cable (at suitable cost). 2 weeks later we had a call from the same place - the phone was going dead regularly. I went out and investigated and it would stop working as it rang, and would then be completely dead for about 10 minutes... Eventually worked out that there was a nail through the cable almost touching the wires, so there was a micro arc with ringing voltage, which would be maintained by normal line voltage!

    Of course, the carpet and everything was back down by this stage so I had to put a new piece of wire around the skirting in the room... customer not too happy but also didn't want the inconvenience of pulling up the floor again!

    And I'll be amazed if anyone has actually read this far :)

  52. Anonymous Coward

    A story from the www.anoldfokeshome.org files

    In the mid-'80s a naval training establishment on the south coast had started its own ADP (Automatic Data Processing) section. We had cutting-edge equipment: a Sirius 1 / Victor 9000 with a 10MB HDD & twin 256K FDDs. I was the No 3. The bane of my life was a 7-user Epson computer. I was tasked with writing a report on this system, which had been in service for 6 months.

    Well, firstly, it crashed whenever someone wearing a woollen jumper walked by. As the department had about thirty people who all wore woollen jumpers, this was quite a frequent occurrence.

    This was cured by wrapping the perspex front cover with aluminium foil and earthing the foil.

    Secondly, for a multi-user system it was a bit of a failure. For one person to enter a record in the DB: 10 seconds; two people: 5 minutes; three people: 37 minutes. I didn't try testing four people saving records simultaneously.

    As I was only a qualified avionics technician, I obviously wasn't going to be trusted to open up one of these new-fangled computer thingies, so I couldn't look inside the box (cabinet) for any obvious signs of trouble. My report said that the system was crap, had always been crap and would always be crap. _AFTER_ handing in my report I found out that this pile of crap^W^W^W computer system was my boss's pride & joy. Needless to say I was crapped upon, _but_ the Epson was replaced with a Crystal 68000 running PICK. When the Epson was dismantled there were carbon deposits and evidence of major arcing all around the main power distribution point. The HDD power feed came directly from this PDU, so the HDD must have been suffering major power spikes all of the time. While the replacement system was technically far better than the Epson would ever have been, a huge amount of public money had been wasted because a techie was not allowed to look inside the box.

  53. Alan

    @ Anonymous Coward

    Didn't you consider for a minute that the person who wrote that is possibly from one of our foreign call centres, or may even have some illness that prevents him/her from spelling correctly. If you really cared you'd be sympathetic to his/her needs and not be so scathing and politically incorrect...

  54. Anonymous Coward

    More stories

    1. Some folks I knew had a phone line for a fax machine. The phone line had had trouble on and off over the ten years they had been using it. The phone company could never find the problem and kept telling them they had a bum fax machine. I got involved when they decided to get rid of the fax and use the line for a modem. I installed the modem and hooked everything up - no dial tone. Hmm. I checked all my cables, then, knowing that they'd had trouble on the phone line before, checked the phone connection through the building. The connection in the wall jack was OK, so I followed the cable through the house to a junction box outside. It looked OK too. On a random impulse, I gave the wires a tug. One came loose in my hand. It had been broken about 1/4 inch inside the insulation and had been making intermittent contact for 10 years - the ends of the broken wire were brown with corrosion. I stripped the wire and reconnected it - the modem got a dial tone, all good. TEN bloody years and the phone company hadn't found it.

    2. I started work in a new place. A couple of weeks in, I found that one of the buildings was getting network glitches - file corruption, lost Exchange connections, print jobs going haywire, all manner of crap. The users told me it had been that way for at least a couple of years. I started looking, and found that all the workstations were reporting a complete loss of network connection at the same moment - not at the same time every day, but all of them together, and usually in the afternoon. I called the network admin, and he checked the logs on the switches - at the same times that I was seeing network drop-outs, he was getting reboots on one of the switches. It happened to be the switch that the local servers connected through. I went and checked, and found the UPS units on the servers were all showing powerline spikes at about the same time as the network problems. Hmmm. So I spent the next afternoon in the switch room, watching the UPS and the switch. A boring two hours later, the AC kicked in, the UPS beeped, and the lights on the switch blinked, shut off, and came back on. I checked, and the PCs were all showing the network loss again. Somebody had wired the AC into the same outlet the switch was on, and that outlet didn't have a UPS. The AC threw spikes, and the switch threw a fit. The first thing I did was get a UPS on the switch, the second was to have building maintenance put in a separate line for the AC, and the third was to spend several hours clearing the junk out of the server room that had hidden all the power connections and kept everyone from noticing how the AC had been plugged in. It turned out later that the network admin was new, too. The previous fella was clueless, and the new guy hadn't yet got any monitoring software set up to manage all the switches.

  55. Fatman

    Cleaning crew strikes again

    Good old 'invisible' cleaning crews struck once again.

    A close friend of mine was plagued with server failures between 5 and 7 PM every weekday. The only variable was the exact time. This went on for about a week before one of the sysadmins noticed a correlation between the failures and a member of the cleaning crew.

    The sysadmin waited until the cleaning woman came into the area with the vacuum cleaner and plugged it into the UPS. She was about to turn it on when he shouted "STOP"!!

    A couple of questions later, they learned that she had been using that "plug" because it was easy to get to, and the closest other outlet was some 25 feet away. When I heard about it, I had an "idea". We asked the facility maintenance staff who did their electrical repairs, and whether or not the cleaning crew provided their own equipment. The repairs were done by in-house employees, and the cleaning crew were employees too. Perfect.

    I then suggested that the head electrician install some special locking plugs and receptacles for use by the cleaning crew, thus preventing them from plugging into the UPS. After that, vacuum cleaners could ONLY be plugged into specific outlets. No more server failures from clueless cleaners.

  56. Anonymous Coward

    @@AC - i got a stupid one

    What did you expect? He SAID he was a java programmer. I think I've dealt with exactly two in my life who didn't deserve to be homeless.

  57. Anonymous Coward

    @Alan re @ Anonymous Coward

    BAAAAAHHHHH!!!!!!

  58. Mark

    @@@AC - i got a stupid one

    Dude that is f*cking excellent!

    I think everyone in my building heard me laughing at that one!!

  59. Anonymous Coward

    Network Issues

    Here's my silly system story.

    Many years ago, I was working on a packet switch product that plugged into a PBX. The external interfaces were all X.25 links, but internally the cards talked to each other using Ethernet - big thick-wire Ethernet, with twenty-pin connectors.

    All the early integration tests had been run with a handful of cards, but the system tests called for a fully loaded configuration, which meant running many cards simultaneously - a double-digit count, certainly. It didn't work. Well, it worked a bit: the cards in the middle of the Ethernet bus could talk to all the others, but the cards at either end of the bus couldn't see the cards at the other end.

    The problem was the cables. They were custom cables which had emerged from the company's cable-making unit (it was a telecomms company - of course it had a cable-making unit). The cables were all the same length to make them simpler to build, but that length was loooong - two or three metres, certainly. String twenty or so of those together and you end up with an Ethernet bus longer than that version of Ethernet supported. Without reliable signal propagation, the cards would just drop off the bus.

    We got some shorter cables made, and it worked.

  60. Anonymous Coward

    A story of magic...

    http://www.catb.org/jargon/html/magic-story.html

    The toilet story as I heard it: a PC would reboot every time someone flushed the toilet. Remarkable as it sounds, it was verified to be true. It turned out that the home used well water, and flushing the toilet caused the well pump to kick in, which caused a temporary brownout.

  61. conunstradamus

    Did you hear the one

    about the PFY replacing a switch with a hub - in a CAD office? Shot that one was.

    Or the modem line with call waiting?

    Or troubleshooting network connectivity issues on a non-English desktop through a non-techie translator in a non-English environment - in the middle of the desert in the American southwest?

    Or the server room with regular water fire sprinklers?

    Or high (>110°F) ambient outdoor temperatures combined with a decades-old overhead copper wire plant and analog switches that rendered useless anything other than analog voice communication? A good wind in a dust storm nipped even the basic phone call.

  62. Andrew Moore

    Remember the BOFH gene???

    It was discussed here a lot a few weeks back. One of the gifts the gene gives you is the ability to feel heat and hear fans that are about to go pop.

    Back in the early days, one of my main pieces of diagnostic kit was a max/min thermometer...

  63. Anonymous Coward

    @conunstradamus

    "Or the server room with regular water fire sprinklers? ... Or high (>110°F) ambient outdoor temperatures..."

    Our server room has overhead sprinklers (heat triggered) and NO AC. Temperatures in this region hit 40 degrees C in summer. (Call it 105F or so.)

    The MD refuses to pay for AC because it's too expensive. Instead we have a couple of exhaust fans, and a couple of BIG fans pushing air around the room.

    Oh, and the building hot water heater is in the server room, about a half metre from one of the racks...

    Fortunately the facility is in the basement where it's fairly cool. Less fortunately when it rains there are ground water leaks that partially flood the basement.

    (breaks down sobbing inconsolably).

    My only consolation is that I've pointed out the inherent problems with this setup repeatedly, so at least I'll get to say "I told you so" before spending a week rebuilding the mess there will be if something goes wrong...

  64. Tim J

    Cleaners & their hoovers...

    ...certainly seem to cause a lot of problems. Two observations:

    (1) From many of the above stories it really does seem as though people never even consider the cleaners' actions as a possible cause of the problem, simply because they never ever consider the cleaners, i.e. the mere thought of the cleaners simply never passes through their heads. I daresay this says something quite profound about the mindset of many: their worldview fails to accommodate the people who quietly clean up their mess after them.

    (2) A few of the comments above do carry the tone of blaming the cleaners for their actions in unplugging servers or plugging their hoovers into the server room UPS. But are they really to be blamed if they've never been informed of what they should or shouldn't do when it comes to plugging their kit in? In many cases I doubt this has ever been communicated to them... but of course why would anyone ever think of communicating some information to people who are invisible...

    Full respect to the cleaners out there - I dare say most of them have done a lot more honest hard graft in their time than a lot of the slackers reading this here ever have.

  65. Anonymous Coward

    Three more war stories ('cos I'm old)

    First a weather one:

    About 20 years ago I installed a Xenix system for a firm that grew plants. They had a small office in the centre of several acres of land, a large amount of which was greenhouses. Installed in summer, it worked fine for several months and then started crashing. I went in many times to try to determine the problem, but it seemed to go away, only to recur some months later. While onsite, looking out of the window in the hope of inspiration, I spotted the cause. The greenhouses had vents, thermostatically controlled and operated by twenty-year-old electric motors. In summer they would be open all day or perhaps just close overnight. In winter they remained closed, but in the middle of spring and autumn they would open and close several times a day, spiking the mains and crashing the server. A small UPS was installed and the problem was solved.

    A human factors one:

    About 15 years ago I was working for a software house maintaining their old PDP-11 products, since they'd migrated the core product to Vax/VMS. Because of this I was one of the few who had access to the computer room, a large air-conditioned room in the basement with a handful of PDPs, Vaxen, HP, Nixdorf and *nix boxes. While waiting for a tape to rewind I pondered how hot it was and, looking round, spotted a thermostat on the wall set to 30C. "No wonder," I thought as I reached over and swiftly turned it to 20C, only to realise an ohnosecond later, from the sudden silence, that it was an over-temperature cutout.

    Being a quick-thinking sort of person, I turned it back up, dashed out of the room and across to the kitchen, and wandered out a minute later all innocent, mug in hand, Wally style, as the sysops charged down the stairs and into the computer room. I put down the coffee and joined them. "The power dies every other day for some reason and we've no idea why," they wailed. A while later I commented on the thermostat being set so high, and when they told me it was a cutout I suggested it be labelled as such. They did, and the problem never happened again.

    Finally, one from a friend of a friend:

    Back in the days when state-of-the-art was an IBM PC with dual floppies, a small local computer firm sells a small local business a word processor. After a few weeks the floppies fail, work is lost and the computer firm has to supply a new DOS disk. Another few weeks go by and the same thing happens. This repeats a few more times, with diagnostics proving fruitless and the supplier accusing the secretary of switching the computer off without removing the floppies. Finally there's an ugly scene at the golf club when the business owner collars the computer firm's owner and tells him in no uncertain terms that he wants it sorted.

    The computer firm sends in the techie with instructions to sit in their office reading the paper and keeping a discreet eye on the computer. All is done by the book until the secretary shuts down: she opens the drives, removes the floppies and carefully sticks them to the side of a filing cabinet with a couple of fridge magnets.

    Mine's the old moth-eaten one.
