Basically, Amazon, Apple, Google, and Microsoft are using 10- to 20-year-old technology (envisaged 30 years ago), made accessible via gadgets.
It's great to read about some progress.
There is more work to do on parsing phrases and context, so that we get more than fairly dumb (but speaker-independent) speech-to-text bolted onto an existing search engine. However, that starts to move into the edge of real AI.
Compare using Google Translate (essentially Rosetta Stone plus brute force) into your OWN language on a subject you are familiar with, versus trying to explain something to a speaker of another language (especially NOT English) who doesn't understand the subject. You'll realise that current speech-based real-time translation (which needs voice recognition and then text translation) is mostly hype.
There is decent speech synthesis, but the 2010 Kindle DXG is barely better than the 1980s. Decent synthesis, like recognition, ultimately needs phrase/sentence parsing, though for a different reason: intonation and timing, which are not marked in written dialogue or narration, and homographs such as "lead pipe" vs. "lead on a dog", or "polish wax" vs. "a Polish person". Spoken languages are not identical to written ones, certainly in English, where even written dialogue is nothing like real speech.
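The homograph problem above can be made concrete with a toy sketch. Everything here (the table, the tag names, the function) is an illustrative assumption, not any real TTS engine's API; the point is just that without a part-of-speech guess from phrase parsing, a synthesiser literally cannot choose between the two "lead"s.

```python
# Toy illustration: pronunciation depends on part of speech, which only
# phrase/sentence parsing can supply. Pronunciations are rough phonetic
# spellings, keyed on (word, part-of-speech).
HOMOGRAPHS = {
    ("lead", "NOUN"): "led",          # the metal: "a lead pipe"
    ("lead", "VERB"): "leed",         # to guide: "lead on a dog"
    ("polish", "VERB"): "PAH-lish",   # "polish the wax"
    ("Polish", "ADJ"): "POH-lish",    # "a Polish person"
}

def pronounce(word: str, pos: str) -> str:
    """Pick a pronunciation for (word, pos); fall back to the spelling."""
    return HOMOGRAPHS.get((word, pos), word.lower())

print(pronounce("lead", "NOUN"))  # the metal
print(pronounce("lead", "VERB"))  # to guide
```

Same spelling, two outputs: the synthesiser has to know the syntactic role first, which is exactly why plain text-to-phoneme lookup is not enough.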