Reply to post: The "leave me alone and I'll fix it" scenario

Hell desk to user: 'I know you're wrong. I wrote the software. And the protocol it runs on'

DougS Silver badge

The "leave me alone and I'll fix it" scenario

I've done this a few times, where I knew that explaining what I figured out would 1) take too long and 2) have people wanting me to prove some logical leaps I made.

The most memorable was back in 2000 when I was consulting on a SAP migration for a Fortune 500 company most everyone will have heard of. It was being migrated from Texas to Toronto, and it had just been brought live over Labor Day weekend - getting downtime for this was almost impossible so it had to succeed. It was a super complicated process that involved, among other things, a courier on a private jet taking the past week's redo logs from Texas to Toronto. I wasn't involved in this part of the process, so I was taking some much-needed vacation.

So on Labor Day I was doing some boating/drinking with some friends but had brought my laptop on the trip (to the condo, not the boat) 'just in case'. In the middle of the afternoon I get a call from the guy in charge of the project, but cell service on the lake isn't that great so I just ignored it at the time, figuring it was probably some question that could wait or someone else could answer. When we got back that evening and I was walking up from the dock he called again and was in a panic - apparently all hell had broken loose and SAP was down and the DB corrupted.

Once I called into the bridge and dialed in my laptop (good thing my friend's condo had a phone line) I learned none of the filesystems containing the DB could be mounted (this was before Oracle commonly used raw volumes) and everyone was running around like chickens with their heads cut off because thousands of users were going to come in tomorrow morning and expect to use the system. While my friends went out to the bars and had a good time I stayed behind and began trying to troubleshoot with the others.

After some back and forth for a while I eventually got an idea and did a little digging, and figured out what had happened. It's been so long I actually don't remember the details of what the problem was, something to do with the pairing relationships between the primary copies and the BCV copies (there were like 4 copies of each volume) on the Symmetrix having scrambled the Veritas disk group information so none of the volumes could be imported. The upshot was that the data was still there, nothing had been lost.

I told everyone on the call (there were probably at least 50 by this point) that I knew what happened and I could fix it, I just needed some time to concentrate. Cue a half dozen people wanting me to explain it, and me insisting that it would be easier for me to just try to fix it, and promising that what I did wouldn't change any data on the drives, so if it didn't work we'd be no worse off. I just had to make sure no one else was going to touch the storage in the meantime. There were like 500 primary volumes, so it would take forever to fix by hand, but luckily I was able to determine exactly how it got messed up, and I was able to write a script to reverse the process. Once I ran that I was able to import all the disk groups and mount the filesystems, and shared the good news. The Basis lead then checked things out, verified all was good, started it up and everything worked. All those users were able to log in the next day, none the wiser.

They wanted me to explain further but it was like 2am by this point and my friends were back so it was too noisy, so I just told them I'd explain it in a couple days but told them what NOT to do that created the situation in the first place so there wouldn't be a repeat! I wrote about five pages to include in the RCA as to what happened and how I fixed it, and spent countless hours in meetings explaining it, but I think only about three people really understood it...


Biting the hand that feeds IT © 1998–2019