GitHub lost a network link for 43 seconds, went TITSUP for a day

Phil O'Sophical Silver badge

Re: re: Why did GitHub take a day to resync

And this is why you shouldn't have automatic failover in disaster recovery situations. For local HA with redundant equipment, automatic is fine when a disk, switch or server goes down. For long-distance DR it's well-nigh impossible for an automated system to have a full view of what happened (a recoverable network outage versus the primary site disappearing in a ball of nuclear fire, for example). A person in the loop could have looked at the situation, perhaps called an admin on the other coast, and said "oh, it's just a transient network outage, best solution is to wait until it comes back." Automate the changeover by all means, once the decision is taken, but don't make the decision itself automatic.
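To make the distinction concrete, here is a rough sketch of "automated, but not automatic". Everything in it (the FailoverController class, the Scope names, the runbook steps) is made up for illustration and is not how GitHub or any particular product does it. The only point is that the DR path stops and waits for a human decision, while the changeover itself stays fully scripted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Scope(Enum):
    LOCAL_HA = "local_ha"    # redundant kit within one site
    REMOTE_DR = "remote_dr"  # long-distance disaster recovery


@dataclass
class FailoverController:
    # Hypothetical runbook: ordered steps that carry out the changeover.
    runbook: List[Callable[[], str]]
    log: List[str] = field(default_factory=list)

    def on_failure(self, scope: Scope, operator_approved: bool = False) -> bool:
        """Return True if the changeover ran, False if it is on hold."""
        if scope is Scope.REMOTE_DR and not operator_approved:
            # The monitor cannot tell a transient link loss from a lost site,
            # so it only raises the alarm and waits for a person to decide.
            self.log.append("paged operator; holding")
            return False
        # Decision taken (automatically for local HA, by a person for DR):
        # run the scripted changeover with no further manual steps.
        for step in self.runbook:
            self.log.append(step())
        return True


if __name__ == "__main__":
    ctl = FailoverController(
        runbook=[lambda: "fence old primary", lambda: "promote replica"]
    )
    ctl.on_failure(Scope.REMOTE_DR)                          # holds, pages a human
    ctl.on_failure(Scope.REMOTE_DR, operator_approved=True)  # runs the runbook
    print(ctl.log)
```

A 43-second link drop would land in the "holding" branch, and by the time anyone had picked up the phone the link would be back.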


Biting the hand that feeds IT © 1998–2019