A few years ago a company I worked at engaged a consultant to carry out a health check on a UNIX platform. The platform had virtual instances which had LAN and SAN provisioned via redundant server instances on each box. The vendor consultants ran a script to health check each of the IO instances and noted as he ran the check on each server the SSH session disconnected. He carried on to run the script on all instances and then said these will run for a while and left for the night.
The reason the SSH session disconnected was a bug in the script that crashed the instances. This cut all the IO to the running guests in the platform. The guests suddenly lost all their LAN and storage and the net result was every production system went down with prejudice. The calls from customers were not long in starting and the escalation was immediate.
The situation was made worse by the fact the lead sysadmin was away on holiday and a call was made to see if he could come back in to help sort the mess out. He did turn up and sorted out the mess. What was less clear was why the vendor consultant kept running these scripts when the first attempt caused his SSH session to terminate, normal people would think that was a little odd.