Does no-one test anything any more? I don't understand how exponential consumption of finite resources can even happen. Did the code not have any form of load limiting?
When I was in mainframe land one of our 6-monthly jobs used to be to test the disaster recovery plan. We took the weekly tapes to the place that would hire us an IBM in a trailer, and timed how long it took to configure the hired gear (following an already prepared checklist), load the tapes, and run 1000 terminal-based transactions. Part of my job was to measure the current drain from the generator while we did that, and the fuel consumption.
Any new system installed, hardware or software, was given a resilience test. We would turn bits of it off and make a timed log of what failures occurred when, and what the consequences were. At weekends during the year we would repeat those failures on the production system and ensure there were no significant deviations in the timing or propagation. We would look for any failure to centrally diagnose or report faults, and any such 'hole' had to be demonstrably fixed within a month (even if this sometimes meant another red lamp over the operator's console and a long cable).
Every single bit of kit deployed had a failure policy, ranging from warranty and maintenance contracts to a minimum of two ways to continue production without it. We used to test the users, too, to make sure they knew how to find the failure policy and use the alternatives.
Managers have to learn that difficult != unnecessary, and that expensive != unnecessary.