Microsoft will phone home with lots of data? I'm shocked! Shocked, I tell you! I would never have guessed Microsoft would do something like this!
Posted on July 29, 2015
If you use Skype's AI-powered real-time translator, brief recordings of your calls may be passed to human contractors, who are expected to listen in and correct the software's translations to improve it. That means 10-second or so snippets of your sweet nothings, mundane details of life, personal information, family arguments …
... to a friend of mine who is the proud father of two teenage boys. He is not a techie. The boys play Xbox games.
Disclaimer: this does not affect me directly. I do not use Windows 10, or Xbox. I use Windows 7, on one laptop only, and that's only because I have no choice: professional grade photo processing software is only available for Windows or for Mac. I chose not to pay the Cupertino Tax. Windows Update is disabled, and I spent a lot of time removing the slurp-happy Updates from Windows 7. Yes, these exist.
Said father went to check on the Microsoft Voice Data Privacy Dashboard.
To his great shock and surprise, he discovered numerous voice recordings of his two sons, stored on Microsoft's servers. He does not recall ever giving Microsoft permission to record his sons' games conversations. Nor does he recall ever being asked - by Microsoft - if it would be OK to record.
There is, apparently, a way of disabling this blatant invasion of privacy feature. But there is no conclusive evidence that it works. Microsoft uses weasel-words to confuse the issue:
Clearing your voice history will not stop Microsoft from continuing to collect your voice data from its voice-enabled products and services and associate it with your Microsoft account. To stop voice data collection, refer to the settings offered by the Microsoft product or service you’re using. [ ... ] You should be aware that clearing your voice activity will remove your audio recordings but may not remove all information associated with your voice activity.
What does this mean? Clearing the voice recordings doesn't actually delete them? Are transcripts being stored and kept even after deletion? Disabling voice collection on the Xbox is done differently than on Windows 10? Why is it not possible to disable voice collection from the Microsoft Voice Data Privacy Dashboard? Is it being made intentionally difficult for, and confusing to, the user?
Microsoft gets customers’ permission before collecting and using their voice data.
Indeed, it's buried in the fine print out back, behind the shed. Or more accurately, in the terms and conditions fine print that users seldom read before clicking on the "ok.. let's go see what this does?" type of thing.
What it says in the title - spend more time screaming profanity at your voice-enabled device, using epithets, acting like a crime boss, announcing you'll commit crimes, and things like that. Have fun with it!
And the moment they turn you in, get the PRESS involved in your "joke" and show how they VIOLATE privacy!
(There's no way THEY can win this and YOU get to have fun with it!)
Or get a life?
If you want the voice response type functions whoever thought that they were developed by the tooth fairy measuring what you thought you said against what you really wanted to convey?
At some point a human has to be involved in removing errors. In straight dictation mode to the screen, you can correct the errors yourself. In voice mode it is not so easy, and who has the time to train an electronic box? When using translator services, who has enough language skills to verify the translation? If they do, why use the service in the first place?
Too many people are far too keen not to think things through and far too keen to get all hot under their collars.
I have never seen any value that a machine responding to my voice would bring to my life - I know from experience they do not understand my way of thinking or talking anyway, so it is easier to press a button or go without the action.
Hands-free on Nokia devices worked brilliantly using voice tags I provided, but since then hands-free calling has ceased to exist for me; it just fails to work. If simple stuff fails, the complex bits will be worse.
"If you want the voice response type functions whoever thought that they were developed by the tooth fairy measuring what you thought you said against what you really wanted to convey?"
There are two types of people: those who think it's magic, and those who expect feedback from the device to confirm with the user that what they said was correctly understood. The former are the target market.
What if one party to a Skype conversation gives permission but the other doesn't? Does it do the logical thing and block recording or does it wrongly take one person's permission as approval for all?
A naive question, perhaps, but you never know - there might be someone at Microsoft with a bit of integrity.
"samples awaiting playback and analysis, which are, apparently, scrubbed of any information that could identify those recorded"
And those samples have been scrubbed by what, again? A human, obviously. So I'm thrilled that MS employs someone to scrub the samples it submits to translators, and I understand that there isn't really any other way to do it, since machines have to be taught. I'm guessing MS is counting on the fact that everyone should know that only humans can possibly train a machine to translate, so it'll play dumb and run behind the EULA when confronted on this.
I take this as a storm in a teacup. Politicians are going to go ballistic over this? Don't think so. MS has a rather solid position from a legal standpoint, so you can try to raise a stink, but unless the users react negatively, it won't go very far.
And the users don't care. They're buying and using stuff that they know listens to them and they don't give a damn until it goes "wrong" from their point of view. And even then, they don't send it back.
Make them stop this, and tell them to hire plenty of people to speak into these things all day long, reading all sorts of stuff, so that they can optimise their speech recognition and translation software by recording them. Rather than using people's private conversations as your free test material.
Last time I looked, Apple, Microsoft, Google and Amazon weren't exactly short of cash that they could pay them with.
"tell them to hire plenty of people to speak into these things all day long, reading all sorts of stuff, so that they can optimise their speech recognition and translation software by recording them."
You know why that doesn't work right? It's not just a case of translating what someone said from one language to another. I'm English and I'm living in Scotland, but my accent carries North Yorkshire, Geordie and now a very small amount of Scottish. Google Assistant still knows what I am saying even though I am not necessarily speaking pure English. I am not alone; it would be impossible to recruit people with every variation of accent to cover all use cases.
My friend is from Glasgow, and it manages to understand him about 75% of the time, which to be fair is a better success rate than mine. I have to actually see him speaking to figure out what he's saying; if he phones me to have a conversation, I only understand every 3rd or 4th word and have to fill in the blanks based on context.
The fact that his Google Mini already has a better chance of understanding what he is saying than I do, is exactly why humans teaching the machine is necessary.
There are 2 possibilities - either 1) we all have to speak in unnatural ways to these devices and ensure we don't use any slang, a machine could probably learn this entirely without extra help. 2) the machine is constantly retrained with outside help and we can all speak to our devices the way we would speak to another person, without having to alter the way we construct our sentences. Option 2 is working reasonably well for most of us - but there is still some way to go. "Turn on the Greenroom lights" for example - constantly gets me "Sure, turning the lights green" which is both irritating and nonsensical since I don't have any lights that you can set the colour of.
"we all have to speak in unnatural ways to these devices and ensure we don't use any slang, a machine could probably learn this entirely without extra help."
Personally, I don't have a problem with that. I naturally adjust the way I'm speaking depending on the environment and the people I'm speaking to. It's the polite thing to do and the way I was brought up and taught at school. I'm not ashamed of my Geordie accent and use it with family and local friends, but I'm also very aware that when dealing with non-Geordies, a strong accent and slang words will likely be incomprehensible to them. Gan canny man!
Andrew Jones 2: "You know why that doesn't work right?" -- Individual people have different accents, etc.
Obviously, the tech companies just have to hire a lot of people with a lot of different accents. Think what it will do for employment in the UK! Fifty thousand or sixty thousand full-time positions for working to improve Cortana's accent, dialect, and regionalisms training!
Or MS, Apple, and Amazon could pay users to assist them in the task. No recording unless the company pays the user for the info. That's probably more reasonable.
But neither will happen, as has been discussed elsewhere. As long as user information can be "harvested" surreptitiously at no cost, that's what will happen. It's up to the user to protect himself: no voice assistants, lock down cell phones, block browser ads and tracking, falsify online info wherever possible and legal. Etc.
Personally, I don't need Skype. No Siri in my house, no Alexa, no Cortana. Well, I suppose Cortana lives in my grandpa-box where Win 10 lurks, but that box is not connected to any network. If "my" Cortana was human she'd be in solitary confinement, slowly going insane.
Sorry, but as someone who speaks two languages, I can say there is no way of "listening in" to "correct a few words". Language doesn't work like that - there are nuances, subtleties and a vast layer of uninflected meaning in spoken communication.
The real headline here (apart from how easy it is for the tech giants to snoop despite promising not to) is how pisspoor "AI" must be if it needs an army of humans to actually provide it.
Not really very "A", is it?
A much better solution would be to promote learning a second language - *any* second language - in schools.
I have heard them being done.
A company called Clickworker has assignments and quite a few have been for Microsoft and involve listening to difficult to understand or unusual voice commands.
A lot are accidental triggerings, but many are stupid questions, where a drunk asks a really random illogical question.
But I won't forget the Welsh chap asking for big gay dicks.
No privacy at all.
Algorithms that understand speech are still very much a work in progress. The practice of practically giving away voice assistants is a way of randomly seeding a large and diverse population which in turn will allow the collection of an enormous number of data samples. These samples are used to refine the speech recognition algorithms; you'd expect most speech samples to be identified correctly but those that are not need to be analyzed to figure out what went wrong. One way you can do this is in the companion application -- here you'll find everything you asked the assistant to do along with what it thought you were saying and you're invited to verify whether it heard you correctly. Those samples that were not understood need to be analyzed further so some human interaction is going to be needed.
I must be one of the few people who understand what these companies are doing and why but that's probably because I have had a lot of experience writing and testing software so recognize the methodology. I don't think of their actions as snooping, the prize that they're after is much bigger than listening in on any individual. For the paranoid I'd offer these crumbs of discomfort -- the combination of directional microphone arrays and speech recognition means that you can be tracked anywhere unless you have a strict vow of silence (and even then there's no guarantee the systems won't be able to recognize your footfall). Combine this with facial recognition and its spinoff, identifying people by the way they look and behave, and there truly will be nowhere to run or hide.
Nicely put, martinusher. If I may condense:
"...the prize that they're after is much bigger than listening in on any individual. ...the combination of directional microphone arrays and speech recognition means that you can be tracked anywhere... Combine this with facial recognition and its spinoff, identifying people by the way they look and behave, and there truly will be nowhere to run or hide."
I feel damned stupid not to have connected the dots and come up with the possibilities that you've noticed. Have a cold one.
It's called MTurk and I'm one of those people who listen. It is true that all identifying information is scrubbed from the audio, unless it is part of the audio itself: an address or name that someone says aloud is certainly identifiable. This really isn't a big deal; it's no different than listening to the radio, because the chance of me knowing the person I'm listening to and translating/transcribing for is very slim.
It is a lot of fun to hear these things. There are some interesting things to hear, but the most fun is when Alexa doesn't understand you and you request that Alexa dig deeper; that's when it comes to me. There is also another fun task where I am researching the top five of pretty much any subject. One can learn a lot.
But the bottom line is that I don't know you as much as you don't know me, so there's no big deal here.
Biting the hand that feeds IT © 1998–2019