My best diagnosis was about 15-20 years ago, when I was working for a bank in Ireland. We were developing a new backend authorisation system for retail point of sale (POS) devices. They dialled into banks of analog modems and then generated an X25 payload into the backend system for authorisation.
We had a nagging 1-2% transaction failure rate in testing that was totally unpredictable and was driving me bananas.
After spending endless hours running live captures to catch the intermittent failures in flight, I eventually found that the X25 packet length for the failures was different (shorter) from that of all the successes. Working back from that, we established that a transaction only failed if the credit card number contained a particular value (the card number ended with 00 or something like that).
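For anyone chasing something similar, the comparison boils down to something like the rough Python sketch below. Everything in it is made up for illustration: the capture records, the card numbers, the 64 and 60 byte lengths and the helper names are all hypothetical, not what we actually ran. It just shows the two steps: split payload lengths by outcome, then look for what the failing card numbers have in common.

```python
from collections import Counter

# Hypothetical capture records: (card_number, payload_length_bytes, succeeded).
# The numbers and lengths are invented purely for illustration.
captures = [
    ("4000000000000002", 64, True),
    ("4000000000000101", 64, True),
    ("4000000000000100", 60, False),
    ("4000000000000036", 64, True),
    ("4000000000000200", 60, False),
]

def lengths_by_outcome(records):
    """Count payload lengths separately for successes and failures."""
    counts = {True: Counter(), False: Counter()}
    for _card, length, ok in records:
        counts[ok][length] += 1
    return counts

def common_suffix(strings):
    """Return the longest suffix shared by every string in the list."""
    if not strings:
        return ""
    shared = []
    # Walk the strings from the last character backwards, in lockstep.
    for chars in zip(*(s[::-1] for s in strings)):
        if len(set(chars)) != 1:
            break
        shared.append(chars[0])
    return "".join(shared)[::-1]

counts = lengths_by_outcome(captures)
failed_cards = [card for card, _length, ok in captures if not ok]

print("success lengths:", dict(counts[True]))
print("failure lengths:", dict(counts[False]))
print("suffix shared by failing cards:", common_suffix(failed_cards))
```

With only a 1-2% failure rate you need a lot of captures before the length difference stands out, which is why grouping by outcome first is worth the effort.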
So it was a software bug in the originating POS terminal software and nothing to do with the backend system we were testing. If the card number met a certain criterion, the POS terminal did not append a CRC value that was a required part of the transaction payload, which resulted in a payload that was 4 bytes too short and was rejected by the backend system.
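To make the failure mode concrete, here is a minimal sketch of the kind of check that would reject such a payload. It assumes the 4-byte trailer was a CRC-32 stored big-endian after the message body, which is purely my guess for illustration; the real algorithm and message layout may well have been different, and the function names and the 60-byte body length are hypothetical.

```python
import zlib

CRC_LEN = 4  # the bad payloads were 4 bytes short

def append_crc(body: bytes) -> bytes:
    """Build a payload with an assumed CRC-32 trailer (big-endian)."""
    return body + zlib.crc32(body).to_bytes(CRC_LEN, "big")

def check_payload(payload: bytes, expected_body_len: int) -> str:
    """Classify a payload as 'ok', 'bad length' or 'bad crc'."""
    if len(payload) != expected_body_len + CRC_LEN:
        return "bad length"  # what a missing CRC looks like on the wire
    body, trailer = payload[:-CRC_LEN], payload[-CRC_LEN:]
    if zlib.crc32(body).to_bytes(CRC_LEN, "big") != trailer:
        return "bad crc"
    return "ok"

body = b"x" * 60  # hypothetical 60-byte transaction body
print(check_payload(append_crc(body), 60))  # well-formed payload
print(check_payload(body, 60))              # terminal forgot the CRC
```

Logging the reason for the rejection (length versus checksum) on the backend would have pointed at the terminal a lot sooner than packet captures did.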
Serious needle in haystack stuff.