A note on message passing.
In IBM MF land message queues (msgq in the AS400 command language) are effectively named pipes which can link processes. They can expand if the "writer" is producing a lot more data than the "reader" can accommodate at any one time. IIRC they can also do character set translation (e.g. EBCDIC to ASCII), which is handy given that a lot of stuff is not EBCDIC as standard.
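As a rough illustration of those two points (not real MQ or AS400 code), here is a minimal Python sketch. It assumes an in-process queue.Queue standing in for the message queue, and Python's cp037 codec standing in for whichever EBCDIC code page is actually in use. The names writer and reader are made up for the example.

```python
import queue

def writer(q, records):
    """Put EBCDIC-encoded records on the queue as fast as they arrive."""
    for text in records:
        # cp037 is one common EBCDIC code page (an assumption for this sketch)
        q.put(text.encode("cp037"))

def reader(q):
    """Drain whatever has built up, translating EBCDIC bytes to ASCII bytes."""
    out = []
    while not q.empty():
        raw = q.get()
        out.append(raw.decode("cp037").encode("ascii"))
    return out

q = queue.Queue()          # maxsize=0 means unbounded, so the queue just grows
writer(q, ["HELLO", "FLIGHT 123"])
backlog = q.qsize()        # messages waiting because the reader has not run yet
messages = reader(q)       # reader catches up later and converts as it goes
```

The writer never has to care how far behind the reader is, which is the whole attraction.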
BTW there is also an MS version of MQ series.
I can't recall whether, if the reader dies, that can pause the writer process or the queue just keeps getting bigger (the simple programming option is that the MQ just deals with it, with no special case handling required).
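The two possible behaviours can be sketched with Python's standard queue module. This is only an analogy, assuming an in-process queue with no reader running. It says nothing about what MQ or an AS400 msgq actually does when a reader dies, and write_all is a made-up helper.

```python
import queue

def write_all(q, n):
    """Try to write n messages with no reader running; return how many were accepted."""
    accepted = 0
    for i in range(n):
        try:
            q.put_nowait(f"msg-{i}")
            accepted += 1
        except queue.Full:
            # Bounded queue: the writer has to stop (or block) at this point.
            break
    return accepted

unbounded = queue.Queue()          # option 1: the queue just keeps getting bigger
bounded = queue.Queue(maxsize=3)   # option 2: the queue pushes back on the writer

grew = write_all(unbounded, 10)    # all 10 accepted
stalled = write_all(bounded, 10)   # only 3 accepted before the queue is full
```

Swapping put_nowait for a plain put on the bounded queue would make the writer pause until a reader turns up, rather than raising an error.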
I can see the joker in the pack being different processes dying at different times, leaving different queues holding unsynchronised mixes of good and bad data, which makes it very difficult to decide which entries (BTW they are called "messages", but the definition of "message" is very flexible) to discard.
However these issues are completely predictable and MF devs and ops have been dealing with them for decades. BA should definitely have some tools to manage this and some procedures in the Ops manual to use them.
As for configuration, I find it very hard to believe that in 2017 a business this big does not have a set of daemons checking all its network hardware and recording the actual (working) configuration of each device.
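The recording half of such a daemon is not complicated. A minimal Python sketch follows, assuming the config has already been pulled off the device as text. How that is done (SNMP, SSH, a vendor API) is left out, and the device name, config lines and record_config function are all hypothetical. I have no idea what tooling BA actually runs.

```python
import hashlib
import time

def record_config(history, device, config_text, now=None):
    """Store a snapshot for device if its config differs from the last one recorded.

    Returns True if a new snapshot was stored, False if nothing changed.
    """
    digest = hashlib.sha256(config_text.encode("utf-8")).hexdigest()
    snapshots = history.setdefault(device, [])
    if snapshots and snapshots[-1]["sha256"] == digest:
        return False  # unchanged since the last poll
    snapshots.append({
        "time": time.time() if now is None else now,
        "sha256": digest,
        "config": config_text,
    })
    return True

history = {}  # in a real daemon this would be a database or a version-controlled repo
first = record_config(history, "switch-01", "vlan 10\nport 1 up\n")
again = record_config(history, "switch-01", "vlan 10\nport 1 up\n")
changed = record_config(history, "switch-01", "vlan 10\nport 1 down\n")
```

The point is that the last known working config is sitting there with a timestamp when someone needs to rebuild in a hurry.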
This is also one of those moments when labeling all those cables and power plugs with what they power and what they should be plugged into turns out to be quite a good idea.
So much HA and DR is not in the moment. It's in the months of prep before the moment.