Reminds me of a good one on Computer Stupidities some time ago (note, not my anecdote):
A customer logged a call that he occasionally found his VAX 11/725 (one of the few of that model in The Netherlands) powered down when he came in in the morning. As I was the engineer responsible for that customer's site, I went over to investigate the problem. It didn't seem to be one of the usual suspects: of course I'd read about janitors and cleaners unplugging power cords to run their vacuum cleaners or floor mops or what not. But in this case the machine was in a recess, side by side with a printer, and there was a perfectly good, unused wall socket in plain view, in the wall to the left of the recess. They'd have to stoop over the machine and unplug its power cord from the barely visible wall socket behind it to do that trick, and also plug it back in afterwards. Also, the power cord was snug; you couldn't trip the machine just by bumping into it.
But the machine did just power down occasionally, as evidenced by the console printout. No bug check or machine check, just OPCOM messages being printed, followed directly by the power-up sequence the next morning when the customer came in and powered it up again. Timestamps showed the machine quitting in the early evening, between 18:00 and 19:00. When it did, that is; it didn't do it every day.
OK, it's flaky somehow. But why that particular time? I put in a new power supply, as that'd be the most probable cause. Nope, that's not it. A couple of days later, the customer logged a repeat call, with the exact same symptom. I went on site again, exercised the machine, measured supply voltages. It ran without any sign of any problem. Looking over the possibilities, I wondered if it was an overheating or airflow condition. There's more than one sensor that can trip the machine the way it was tripped, so we hooked up a small logic probe that would show which one it actually was. And sure enough, a few days later it got tripped with an airflow problem. Now, I had already cleaned out the filters and the fans when I replaced the PSU -- pretty standard procedure to do whatever preventive maintenance you can when you go on site for a hardware call. So I couldn't imagine there would be a real airflow condition. But the sensor might have been wonky, so I checked it. It was a pair of thermal sensors, one exposed to the airflow, the other not. Pretty simple. No mechanical parts that might have bound or gotten stuck. So no problem there. For good measure I replaced a power harness that showed vague signs of chafing, and I also replaced the monitoring logic.
Didn't help. The customer called once more, and sure enough the probe showed an airflow condition. Support was still on the case, and they authorized a swap unit to be brought on site, so that I could take the ailing 725 to our product repair center and go over it with a fine-toothed comb. Which I did. Stripped it down to the bare chassis, cleaned every sensor, every connector, every slot, every everything. It was the squeaky-cleanest 11/725 in the Western hemisphere that wasn't fresh out of the factory. I inspected every wire, checked every fan, and replaced anything that wasn't to my liking. It was arranged that it could sit in the PRC for a few weeks, running, with power monitoring probes hooked up. It passed without a hitch. In the meantime the replacement unit was humming along nicely too, without any problems whatsoever. Quite a bit of head-scratching ensued. The temporary replacement was an 11/730, basically the same hardware in a different cabinet, so maybe that was a clue. Meanwhile, a power logger had been running at the customer site, to check whether the flakiness was coming in from the main power supply. It wasn't. So, we handed the 11/725 back to its rightful owner.
And sure enough, it tripped a few days later. Yes, early evening yet again.
Running out of ideas, one of us decided to go on site every day at closing time and just sit there to see it go. And sure enough, he observed the problem on the very first evening.
The cleaning crew came in. The vacuum cleaner was not the problem. The floor mop was not the problem. One of them took the waste bag from the paper shredder, tied it closed, and set it aside -- right in front of the air intake of the 11/725.