GitHub lost a network link for 43 seconds, went TITSUP for a day

Phil O'Sophical Silver badge

Re: re: Why did GitHub take a day to resync

And this is why you shouldn't have automatic failover in disaster recovery situations. For local HA, with redundant equipment, automatic is fine when a disk, switch or server goes down. For long-distance DR it's well-nigh impossible for an automated system to have a full view of what happened (recoverable network outage versus primary site disappearing in a ball of nuclear fire, for example). With a person in the loop, someone could have looked at the situation, perhaps called an admin on the other coast, and said "oh, it's just a transient network outage, best solution is to wait until it comes back." Automate the changeover by all means, once the decision is taken, but don't make it automatic.
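
To make the "automated, not automatic" distinction concrete, here's a rough sketch of what I mean. Everything in it is made up for illustration (DrController, ask_operator, the step functions); it is not how GitHub or any particular failover tool works. Detection and the changeover steps are scripted, but nothing runs until a human has said so.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    WAIT = "wait"            # treat it as a transient outage and sit tight
    FAIL_OVER = "fail_over"  # primary really is gone, promote the DR site


@dataclass
class DrController:
    # Hypothetical hook: shows a summary to a human and returns their decision.
    ask_operator: Callable[[str], Decision]
    # Hypothetical changeover steps, run in order once a human approves.
    steps: List[Callable[[], None]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def on_primary_unreachable(self, seconds_down: float) -> Decision:
        # Detection and alerting are automatic; the decision is not.
        summary = f"primary unreachable for {seconds_down:.0f}s"
        self.log.append(f"alert: {summary}")
        decision = self.ask_operator(summary)
        self.log.append(f"operator decided: {decision.value}")
        if decision is Decision.FAIL_OVER:
            # The changeover itself is fully scripted, no hand-typed commands.
            for step in self.steps:
                step()
                self.log.append(f"ran: {step.__name__}")
        return decision
```

In real life ask_operator would page whoever is on call and block until they answer, which gives them time to phone the other coast first. If they answer WAIT, none of the steps run and the primary stays the primary.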

