Not a failure of testing - a failure of change enablement
To be fair, the root problem is not that the automated change system was poorly tested, though that was probably a contributing factor.
The real failure is that the change remediation, the roll-back, was not tested properly.
If you want to move quickly, and accept that failure is a possibility - a luxury not afforded to those running nuclear power stations - then you really do need a roll-back mechanism that is bulletproof.
That it took them six hours, and a site visit, to roll back the faulty configuration change shows that the roll-back was not properly designed and tested.
The moral of the story is that, if you're modifying BGP automatically, you first need to design the safety net: write, and test, code that will reset everything to its last known working state, reliably, every time.
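For what it's worth, here is a minimal Python sketch of that kind of safety net. It is an illustration only, not anyone's actual tooling: the `GuardedConfig` class, the `routes_still_advertised` check and the dictionary standing in for a router configuration are all hypothetical names of mine, and no real router API is involved. The idea is simply that a change is kept only if a health check passes, and otherwise the last known working state is restored automatically.

```python
from typing import Callable, Dict

Config = Dict[str, str]


class GuardedConfig:
    """Hypothetical wrapper holding the active config and the last known good one."""

    def __init__(self, initial: Config) -> None:
        self.active: Config = dict(initial)
        self.last_known_good: Config = dict(initial)

    def apply(self, candidate: Config, health_check: Callable[[Config], bool]) -> bool:
        """Apply candidate; keep it only if health_check passes, else roll back."""
        self.active = dict(candidate)
        try:
            healthy = health_check(self.active)
        except Exception:
            healthy = False  # a health check that crashes counts as a failed change
        if healthy:
            self.last_known_good = dict(self.active)  # promote to known good
            return True
        self.active = dict(self.last_known_good)  # automatic roll-back
        return False


def routes_still_advertised(config: Config) -> bool:
    # Stand-in health check (an assumption for illustration): the change is
    # healthy only if we are still advertising at least one prefix.
    return config.get("advertise_prefixes", "") != ""


router = GuardedConfig({"advertise_prefixes": "192.0.2.0/24"})
kept = router.apply({"advertise_prefixes": ""}, routes_still_advertised)
print(kept, router.active)
```

The part a sketch like this cannot show is the hard part: on a real network the roll-back has to run somewhere that does not depend on the change having worked, because a bad BGP change can cut off the very path you would use to undo it. Junos's `commit confirmed` works on a similar principle, reverting on the device itself unless the change is confirmed in time.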
To fail fast, you must be able to reset fast.