Re: So what was actually wrong?
More than that, you're thinking of transactions in a universe that functions differently from the one we live in. This is a well-known, long-understood problem: over an unreliable link, the sender has no way to be certain that a message has arrived at its destination, because the acknowledgement can be lost just as easily as the message itself. It's usually called the Two Generals' Problem (also the Two Armies Problem or the Coordinated Attack Problem), and it has been understood for decades.
I wrote this up almost four years ago. While it is probably not the exact problem with Horizon, it is a near enough description to explain the problems:
As I understand it, what happened was:
Correct functioning:
PO sends ‘credit £x’
HQ receives ‘credit £x’
HQ credits account
HQ sends ‘acknowledge credit £x’
PO receives ‘acknowledge credit £x’
PO removes item from queue
Failed functioning:
PO sends ‘credit £x’
HQ receives ‘credit £x’
HQ credits account
HQ sends ‘acknowledge credit £x’
PO /doesn’t/ receive acknowledge
PO retries
PO sends ‘credit £x’
HQ receives ‘credit £x’
HQ credits account
HQ sends ‘acknowledge credit £x’
PO receives ‘acknowledge credit £x’
PO removes item from queue
PO now has one ‘credit £x’ recorded, but HQ has two ‘credit £x’ recorded.
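For anyone who wants to see the failure run, here is a minimal Python sketch of the sequence above. The names (HQ, send_with_retry, ack_lost_on) are made up for illustration and have nothing to do with how Horizon was actually written; the "network" is just a function call whose acknowledgement can be thrown away.

```python
class HQ:
    """Hypothetical head-office ledger that applies every credit it receives."""

    def __init__(self):
        self.balance = 0
        self.credits_recorded = 0

    def receive_credit(self, amount):
        self.balance += amount
        self.credits_recorded += 1
        return "ack"


def send_with_retry(hq, amount, ack_lost_on):
    """Send a credit until an acknowledgement gets back to the PO.

    ack_lost_on is the set of attempt numbers whose acknowledgement is
    lost on the way back (an assumption used to simulate the bad link).
    Returns the number of attempts made.
    """
    attempt = 0
    while True:
        attempt += 1
        hq.receive_credit(amount)  # HQ receives and applies the credit
        if attempt in ack_lost_on:
            continue  # ack lost: the PO cannot tell what failed, so it retries
        return attempt  # ack received: PO removes the item from its queue


hq = HQ()
po_credits = []

attempts = send_with_retry(hq, 100, ack_lost_on={1})
po_credits.append(100)  # the PO records the credit once

print(attempts, sum(po_credits), hq.balance, hq.credits_recorded)
```

The PO side ends up with a single credit of 100 while HQ has applied it twice, which is exactly the mismatch described above.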
It’s a classic network transaction confirmation problem. In fact, a Networking 001 problem. It’s not even an undergraduate-level concept. How do you know where a failed message has failed? Has the message to HQ failed, or has the acknowledge failed? The sender cannot tell the two apart, so a retry has to be safe to apply twice. The solution is either to use a sequence chain (so HQ can recognise a repeated message and not apply it again), or to *not* transfer ‘change’ messages, but transfer ‘updated balance’ messages:
PO sends ‘account balance is £x’
HQ receives ‘account balance is £x’
HQ updates account
HQ sends ‘acknowledge account balance is £x’
PO /doesn’t/ receive acknowledge
PO retries
PO sends ‘account balance is £x’
HQ receives ‘account balance is £x’
HQ updates account
HQ sends ‘acknowledge account balance is £x’
PO receives ‘acknowledge account balance is £x’
PO removes item from queue
This results in the PO recording a balance update to £x and HQ recording a balance update to £x.
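Both fixes can be sketched in a few lines of Python. This is only an illustration under my own assumptions: HQAccount, credit, set_balance and the seq parameter are hypothetical names, and the sequence number is assumed to be something the PO attaches once per queued item and reuses on every retry.

```python
class HQAccount:
    """Hypothetical HQ account showing two retry-safe message styles."""

    def __init__(self):
        self.balance = 0
        self.seen = set()  # sequence numbers already applied

    def credit(self, seq, amount):
        """'Change' message with a sequence number.

        A repeated sequence number is acknowledged again but not applied again.
        """
        if seq not in self.seen:
            self.seen.add(seq)
            self.balance += amount
        return ("ack", seq)

    def set_balance(self, new_balance):
        """'Updated balance' message: applying it twice gives the same result."""
        self.balance = new_balance
        return ("ack", new_balance)


# Fix 1: sequence numbers. The first ack is lost, so the PO retries with seq=1.
seq_account = HQAccount()
seq_account.credit(seq=1, amount=100)  # ack lost on the way back
seq_account.credit(seq=1, amount=100)  # retry, recognised as a duplicate

# Fix 2: absolute balance. The retry simply overwrites with the same value.
bal_account = HQAccount()
bal_account.set_balance(100)  # ack lost on the way back
bal_account.set_balance(100)  # retry

print(seq_account.balance, bal_account.balance)
```

In both cases the retry leaves HQ at 100 rather than 200, because the repeated message either gets ignored or has no further effect.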
Of course, this has its own problems of multiple access/single resource (what happens if somebody else does a ‘balance is X’ between your retries?) but it is solid if you have exclusive access during the whole transaction. To do that you’d wrap it in ‘open for exclusive access’/‘close for exclusive access’.
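A rough Python sketch of that open/close wrapper, again purely illustrative: Account, open_exclusive, close_exclusive and the "PO-A"/"PO-B" identifiers are hypothetical, and the open and close calls are written so that repeating them is harmless, since those messages can lose their acknowledgements too.

```python
class Account:
    """Hypothetical account that accepts balance updates only from the
    party currently holding exclusive access."""

    def __init__(self, balance=0):
        self.balance = balance
        self.owner = None  # who currently holds exclusive access, if anyone

    def open_exclusive(self, who):
        # Safe to retry: re-opening by the current owner succeeds again.
        if self.owner is not None and self.owner != who:
            return False
        self.owner = who
        return True

    def set_balance(self, who, new_balance):
        if self.owner != who:
            return False  # rejected: caller does not hold exclusive access
        self.balance = new_balance
        return True

    def close_exclusive(self, who):
        # Safe to retry: closing an already-released account succeeds.
        if self.owner in (None, who):
            self.owner = None
            return True
        return False


account = Account()
opened = account.open_exclusive("PO-A")
account.set_balance("PO-A", 100)                 # ack lost, PO-A will retry

other_open = account.open_exclusive("PO-B")      # somebody else tries to get in
other_write = account.set_balance("PO-B", 250)   # and tries to write anyway

account.set_balance("PO-A", 100)                 # PO-A's retry
closed = account.close_exclusive("PO-A")

print(opened, other_open, other_write, closed, account.balance)
```

The second party is turned away both times, so the retried ‘balance is £100’ cannot be interleaved with somebody else's update.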
Tom Scott described it quite well here where it happened with ordering pizzas.