Kubernetes bug ate my banking app! How code flaw crashed Brit upstart

Walter Bishop Silver badge

Rolling update causes outage

"apiserver timeouts after rolling-update of etcd cluster"

Back in the days of steam-driven computers, we were taught never, ever to update a live system directly. First test on a test rig before rolling out to the live system, or at least have a working roll-back procedure in place: one that won't fail because it can't find its configuration data after the rolling update has borked communications to the server.

"Beattie posted an analysis of the incident and lay the blame on Kubernetes"

NO NO NO, the blame lies with whoever at Monzo rolled out the update without first verifying it, or at least having a working roll-back procedure in place.

"To restore service, they turned to an updated version of linkerd being tested in the company's staging environment."

Is this the same 'staging environment' you didn't bother to test the rolling update on in the first place?

Biting the hand that feeds IT © 1998–2019