High pitch, accents, background noise -
If all of those make it difficult for ASR systems to do their job, it may be that they're no better or worse than mere humans. I sometimes find it difficult to understand people talking with an accent - I remember a London cabbie saying what sounded like "o'rite mite" when apparently they meant "alright mate". It can be hard to understand, no doubt.
Obviously, with accents a possible solution would be to train the ASR on all sorts of them, and perhaps add better heuristics for short phrases. With voice pitch, a higher fundamental frequency means fewer overtones fit in the audible spectrum, so maybe the signal quality really is worse. The test case for a fair comparison would be high-pitched male voices - say, boys'. And if it's so, the AI will simply have to increase its effort with the pitch of the voice. As we do.
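A back-of-the-envelope sketch of the overtone point, with purely illustrative fundamental frequencies and an assumed 8 kHz bandwidth (roughly what a 16 kHz ASR front end captures): the higher the pitch, the fewer harmonics the recognizer has to work with.

```python
# Count how many harmonics of a voice's fundamental frequency (f0)
# fit below a bandwidth limit. The f0 values below are ballpark
# figures for illustration, not measurements.

def harmonics_below(f0_hz, limit_hz=8000):
    """Number of harmonics (including the fundamental) under limit_hz."""
    return int(limit_hz // f0_hz)

voices = {"low male": 100, "average female": 210, "child": 300}
for name, f0 in voices.items():
    print(f"{name} (f0 = {f0} Hz): {harmonics_below(f0)} harmonics")
```

By this rough count a low male voice offers an 8 kHz channel about three times as many harmonics as a child's voice, which is at least consistent with the idea that higher-pitched speech gives the model less spectral detail to work with.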