"16 bit, 44.1kHz isn't an arbitrary playback standard. It's chosen to match the capabilities of the human ear - the complete capabilities of the perfect human ear."
Nope. It was chosen to match the vertical-blanking insertion period used by 60Hz U-Matic videotape equipment. In the late 1970s, it was the only affordable recording medium with the bandwidth to hold a CD master, so it dictated the sampling rate. ( https://cardinalpeak.com/blog/why-do-cds-use-a-sampling-rate-of-44-1-khz/ )
So, that gives a maximum reproducible frequency of 22.05 kHz (half the sampling rate). While it's true that few humans can sense audio signals over 20 kHz, there are many steps in the chain of reproduction that make 44.1 kHz not quite good enough to reproduce the full audio spectrum, especially if you wish to provide a stereo signal.
First off, before you can get any kind of digital signal, you need to encode it. That means sampling. However, before you sample a signal, you need to remove any signal components whose frequency is too high for you to sample (anything above half the sampling rate). If you don't do this, you get aliasing, and a worthless digital input (https://en.wikipedia.org/wiki/Aliasing). Thing is, the analogue filters you need to do this removal of un-sampleable signals do not have a perfect on/off response - in effect, if you want a filter that will pass frequencies of, say, 16 kHz, you may also have to allow frequencies as high as 25 kHz through too, because they're still within the tail-end of the filter's "pass band". You can make that cutoff sharper, but it can create "ripples" in your pass-band, and/or allow higher frequencies through again (analogue filter design is a special kind of hell...). But, if you were to raise your sampling rate to 48 kHz, then you've got at least 4 kHz of headroom above the highest frequency you need to preserve.
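To put numbers on the aliasing problem, here is a minimal Python sketch of where an unfiltered tone ends up after sampling. The function name `alias_frequency` is just something I made up for illustration, and it assumes an ideal sampler with no anti-aliasing filter at all.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after ideal sampling at fs_hz."""
    folded = f_hz % fs_hz               # a sampler cannot tell f from f + k*fs
    return min(folded, fs_hz - folded)  # reflect into the 0..fs/2 range

# Tones below fs/2 come through unchanged; tones above it fold back down.
for tone in (16_000, 21_000, 25_000, 30_000):
    print(tone, "Hz sampled at 44.1 kHz appears at",
          alias_frequency(tone, 44_100), "Hz")
```

So the 25 kHz component that leaked through the filter in the example above would land at 19.1 kHz, inside the band you were trying to protect. At 48 kHz the same tone would fold to 23 kHz instead.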
Down-converting a multiple of 48kHz to 44.1 kHz is possible, but if it's not done correctly (and it often isn't), it introduces similar artefacts to the aliasing problems during sampling.
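Part of why the down-conversion is awkward is the ratio between the two rates. A rough sketch, assuming a conventional rational resampler (upsample by one integer, low-pass filter, downsample by another); `resample_ratio` is a hypothetical helper name, not anything from a real library:

```python
from fractions import Fraction

def resample_ratio(src_hz, dst_hz):
    """Smallest integer (up, down) factors for converting src_hz to dst_hz."""
    ratio = Fraction(dst_hz, src_hz)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator

for src in (48_000, 96_000, 192_000, 88_200):
    up, down = resample_ratio(src, 44_100)
    print(f"{src} Hz -> 44100 Hz: upsample by {up}, downsample by {down}")
```

Going from 88.2 kHz is a clean divide-by-two, but from 48 kHz you need 147/160, and the low-pass filter in the middle of that conversion has to be good or you are back to aliasing.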
The second reason for higher rates is for better preservation of signal phase. The human auditory system uses phase differences between higher-frequency signals to determine the spatial positioning of sound sources, but phase and amplitude interfere with each other in digital sampling systems as you approach the maximum permitted signal frequency. The extreme case is that a signal with a frequency of exactly half your sampling frequency will not register at all if its zero-crossings line up with the sampling points, i.e. it is 90 degrees away from the phase that would put the samples on its peaks (you get 0,0,0,0,0... as your output). With mono, phase isn't usually an issue, which is why most sampling tutorials gloss over it; with stereo, phase accuracy is very important.
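The extreme case is easy to reproduce. This is a small Python sketch (the helper `sample_tone` is my own illustrative name) that samples a sine at exactly half the sampling rate with two different starting phases:

```python
import math

def sample_tone(freq_hz, fs_hz, phase_rad, n_samples):
    """Ideal samples of sin(2*pi*f*t + phase) taken at t = n / fs."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz + phase_rad)
            for n in range(n_samples)]

fs = 44_100
on_zero_crossings = sample_tone(fs / 2, fs, 0.0, 8)      # samples hit the zeros
on_peaks = sample_tone(fs / 2, fs, math.pi / 2, 8)       # shifted by 90 degrees

print([round(x, 6) for x in on_zero_crossings])
print([round(x, 6) for x in on_peaks])
```

Same tone, same amplitude: one phase gives samples that are all zero (to within floating-point error), the other gives full-scale samples alternating in sign. Anywhere in between, the recorded amplitude depends on the phase.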
The third reason is that most modern replay equipment processes its signal before converting it back to analogue. Equalisation, driver response correction (as used in "direct digital" speakers and headphones), room parameters, delay, noise cancellation and dynamic compression all happen on the digital signal, and all take their toll on the output. If you start with more information, even if that information is not audible, the accumulated errors from DSP will still be in the inaudible part of your signal (you don't get the same benefit by simply "upsampling" to 192 kHz/24-bit before processing, because upsampling itself cannot add information; in fact, it removes it).
Finally, your hearing isn't linear, but PCM audio is. 16 bits is about 96 dB of dynamic range, but your hearing has about 130 dB of dynamic range, albeit with a non-linear response. You could use non-linear PCM to extend the same 16 bits over a wider range of amplitudes, but that means non-linear DACs, which are much harder to make than linear ones (and it can increase audible distortion where high-amplitude, but very low frequency, tones are overlaid with higher frequency tones - as often occurs in music). It's easier to just use more bits, and capture the full dynamic range of human hearing.
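For the arithmetic: linear PCM gives roughly 6 dB per bit. A quick Python sketch, using the simple ratio between full scale and one quantisation step and ignoring dither and noise shaping (function names are mine, for illustration only):

```python
import math

def pcm_dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, in dB, for linear PCM."""
    return 20 * math.log10(2 ** bits)

def bits_for_range(target_db):
    """Smallest whole number of linear PCM bits covering target_db."""
    return math.ceil(target_db / (20 * math.log10(2)))

for bits in (16, 24):
    print(f"{bits} bits: {pcm_dynamic_range_db(bits):.1f} dB")
print("bits needed for 130 dB:", bits_for_range(130))
```

By this measure, covering the 130 dB figure quoted above takes 22 bits, which is why 24-bit is the natural step up from 16.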
With lossless coding, high bitrate audio doesn't take very much more space than 44.1/16 (mainly because the signal content is still only 0-24 kHz), and as it makes improved reproduction much simpler to implement, there are plenty of reasons to prefer it to 44.1.