Full tests are good
I did most of the technical design for the backup/recovery and DRM of UNIX systems at a UK Regional Electricity System back in the late '90s.
The design revolved around a structured backup system based on an incremental-forever server and a tape library.
One of the requirements of getting the operating license for the 1998 deregulated electricity market in the UK was passing a real disaster recovery test. A representative of the regulator turned up on a known day, and said "Restore enough of your environment to perform a transaction of type X". The exact transaction was not known in advance.
We had to get the required replacement hardware from the recovery company, put it on the floor, and then follow the complete process to recover all the systems from bare metal up. This included all of the required infrastructure necessary to perform the restore.
First, rebuild your backup server from an offsite OS backup and tape storage pool, and reconstruct the network (if necessary). Then rebuild your network install server using an OS backup and data stored in the backup server. Then rebuild the OS on all the required servers from the network install server and data from the backup server. All restores on the servers had to be consistent to a known point in time to be usable. Then run tests, and the requested transaction.
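For anyone who wants to write that ordering down rather than keep it in their head, here is a minimal sketch of the sequence above as a dependency graph. The component names and the `restore_order` helper are made up for illustration and are not what we actually used at the time. Treating the network as a prerequisite of the install server is also my assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical component names. Each key maps to the set of things
# that must already be restored before it can be rebuilt.
RESTORE_DEPENDENCIES = {
    "backup_server": set(),  # rebuilt first, from offsite OS backup and tapes
    "network": set(),        # reconstructed if necessary
    "network_install_server": {"backup_server", "network"},
    "application_servers": {"network_install_server", "backup_server"},
    "tests": {"application_servers"},
    "requested_transaction": {"tests"},
}


def restore_order(dependencies):
    """Return one valid rebuild order; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(restore_order(RESTORE_DEPENDENCIES), start=1):
        print(f"{step}. {component}")
```

The useful part is less the sorting than the fact that a circular dependency (for example, a backup server that can only be rebuilt from data held on itself) shows up as an error on paper instead of during the test.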
And where possible, do this using people other than the people who designed the backup process, from only the documentation that was stored offsite with backups, using hardware that was very different from the original systems (same system family, but that was all).
Apart from one (almost catastrophic) error in rebuilding the backup server, the process worked from beginning to end. The install admin account for the storage server solution had been disabled after the initial install. We informed the inspector, who allowed us to fix it and continue because we demonstrated, while he was there, that we could make a permanent change that overcame the problem. There was much running around with tapes (the kit from the DR company did not have a tape library large enough!) and a frantic 2 days (the time limit to restore the systems), but it was good fun and quite gratifying to see the hard work pay off. I would recommend that every system administrator goes through a similar operation at least once in their career.
We were informed afterwards that we were the only REC in the country to pass the test first time, even with my little faux pas!
When the supply and distribution businesses split, we used the DR plan to split the systems, so good plans like this are useful for more than just disasters. I've since done similar tests at other companies.