And a boss that doesn't understand what a 'cluster lock disk' means.
One node barfed, and the volume was unreadable. The entire cluster shut down.
"Allocate from new array, restore from backup"
Umm, no, give me a few minutes to look at the volume spindles and see what's happening.
"Just restore on new spindles, it's faster"
Umm, no. That's 400+ GB, and we only had 100 Mb Ethernet for the restore.
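For anyone wondering why "restore from backup" was not the fast option, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes decimal units (1 GB = 10^9 bytes), exactly 400 GB, and the full 100 Mb/s line rate with zero protocol or backup-software overhead, so any real restore would take longer than this. The function name and the `efficiency` knob are made up for illustration.

```python
def restore_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours to push size_gb gigabytes over a link_mbps megabit/s link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol and backup-software overhead; 1.0 means ideal line rate.
    """
    bits = size_gb * 1e9 * 8                      # decimal gigabytes -> bits
    bits_per_second = link_mbps * 1e6 * efficiency
    return bits / bits_per_second / 3600          # seconds -> hours

print(f"{restore_hours(400, 100):.1f} hours at best")  # prints "8.9 hours at best"
```

Call it nine hours of downtime in the best case, versus a few minutes spent actually looking at the spindles.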
The problem with large clusters, shared volumes, and cluster lock disks is that some idiot somewhere will *eventually* use a volume for the wrong thing, just because they can't see that it is in use elsewhere in the cluster. Just like the poor bugger who had added the cluster lock disk from the dead node to the DB tempspace elsewhere in the cluster. Because, you see, the idiot yelling at us to "restore from backup" had yelled at him to "Just add that unused #$%@#$% disk to the tempspace NOW" 25 minutes before the one node crashed. I was on the PM for that. With emails and, since we were using the in-house conference call tools, the audio recordings.
I've worked for some truly dickish humans. And I've worked for some amazing bosses. I've learned how to speak the right language.
And I always cover my ass.