Decryption is difficult and computationally expensive. So what if, instead of decrypting the content of a message, you found a correlation between the encrypted data and its meaning – without having to crack the code itself? Such an approach has been demonstrated by a group of University of North Carolina linguists working with …
British Sign Language?
Could you not sign your secret message? Naturally, using no sound whatsoever.
Surely this is easy to fix by padding each frame with random data so that they are all the same arbitrary but long length?
Only downside is some transmission overhead.
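A minimal sketch of that padding idea, assuming a hypothetical fixed frame size of 64 bytes and a 2-byte length prefix (neither comes from any real VoIP protocol). Padding has to happen before encryption so the ciphertext is fixed-length too; once encrypted, it hardly matters whether the filler is random or zeros. The `overhead` helper puts a number on the "transmission overhead" downside.

```python
import os
import struct

FRAME_SIZE = 64                # hypothetical fixed frame size in bytes (pre-encryption)
HEADER = struct.Struct(">H")   # 2-byte big-endian length prefix (an assumption)

def pad_frame(payload: bytes, frame_size: int = FRAME_SIZE) -> bytes:
    """Prefix the real payload length, then fill the rest of the frame with random bytes."""
    room = frame_size - HEADER.size
    if len(payload) > room:
        raise ValueError("payload larger than the fixed frame size")
    filler = os.urandom(room - len(payload))
    return HEADER.pack(len(payload)) + payload + filler

def unpad_frame(frame: bytes) -> bytes:
    """Read the length prefix and discard the filler."""
    (length,) = HEADER.unpack_from(frame)
    return frame[HEADER.size:HEADER.size + length]

def overhead(payload_sizes, frame_size: int = FRAME_SIZE) -> float:
    """Fraction of transmitted bytes that are header or padding rather than codec output."""
    sent = frame_size * len(payload_sizes)
    return 1 - sum(payload_sizes) / sent

frames = [os.urandom(n) for n in (20, 38, 55)]   # hypothetical codec frames
padded = [pad_frame(f) for f in frames]
print([len(p) for p in padded])          # [64, 64, 64]
print(round(overhead([20, 38, 55]), 3))  # 0.411
```

The fixed size has to cover the largest frame the codec can emit, so the quieter the speech, the more of each packet is padding.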
That's always the biggest problem with the likes of Skype - everything has to be near-real time and so presumably gives you limited options in terms of encryption.
Code-excited linear _prediction_
... Not projection
" ... patterns end up being reflected in the size of the data frame ... When the data created by CELP is encrypted, it retains the original frame size ... "
Even if you have no expertise in crypto at all (I certainly don't) that's pretty shocking. I suppose oversights can't really be avoided completely but this looks like a total absence of healthy paranoia.
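The quoted point, that encryption "retains the original frame size", can be seen with a toy length-preserving cipher. This sketch XORs each frame with a SHAKE-256 keystream as a stand-in for a real stream cipher such as AES-CTR (it is an illustration, not a vetted cipher), and the frame sizes are hypothetical VBR codec output.

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n keystream bytes with SHAKE-256 (illustration only, not a vetted cipher)."""
    return hashlib.shake_256(key + nonce).digest(n)

def toy_encrypt(key: bytes, nonce: bytes, frame: bytes) -> bytes:
    """XOR the frame with the keystream; output length always equals input length."""
    ks = toy_keystream(key, nonce, len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))

key = bytes(32)                     # hypothetical all-zero demo key
vbr_sizes = [12, 33, 33, 7, 41]     # hypothetical VBR codec frame sizes in bytes
frames = [bytes(n) for n in vbr_sizes]

cipher_sizes = [
    len(toy_encrypt(key, i.to_bytes(8, "big"), frame))  # per-frame counter as the nonce
    for i, frame in enumerate(frames)
]
print(cipher_sizes)  # [12, 33, 33, 7, 41]
```

An authenticated mode such as AES-GCM appends a fixed-size tag, which shifts every length by the same constant and so hides nothing about the pattern.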
It isn't shocking
to those of us who knew about encryption of communication systems before most people had heard of the internet
Someone had to say it.
Cunning linguists? But you're a vegan!
Someone just had to go there!
The problem with Skype and other VoIP traffic is that it has to deal with the dual constraints of data and time. Time, because voice communication has to be as close to real time as possible in order to be of any practical use. That doesn't leave a whole lot of wiggle room for dealing with the data constraint: frame sizes that necessarily vary (varying because of the need to optimize bandwidth in a tight pipeline like mobile or remote phoning).
Perhaps the best approach to the problem would be to find a computationally modest but acceptable-quality voice codec that outputs at a constant bitrate. Then, with help from key rotation, phoneme reconstruction gets stymied because cryptanalysts will have to deal with uniform packets.
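One rough way to see why a constant-bitrate codec helps is to measure how much information the packet sizes alone carry. This sketch computes the Shannon entropy of the observed size distribution for two hypothetical streams; it looks only at single-packet sizes (not their order or timing), and the key plays no part because the sizes are the same whatever key is used.

```python
from collections import Counter
from math import log2

def length_entropy(lengths):
    """Shannon entropy, in bits, of the empirical packet-length distribution."""
    counts = Counter(lengths)
    total = len(lengths)
    return sum((c / total) * log2(total / c) for c in counts.values())

vbr = [41, 33, 33, 7, 12, 41, 33, 7]  # hypothetical VBR packet sizes (bytes)
cbr = [38] * 8                        # hypothetical CBR packet sizes (bytes)

print(round(length_entropy(vbr), 3))  # 1.906
print(length_entropy(cbr))            # 0.0
```

Zero bits for the CBR stream means an eavesdropper learns nothing from the sizes; sequences of VBR sizes can leak more than this per-packet figure suggests.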
Randomly sending sections reversed, with a flag so they can be put the right way round when decoded. Let's see those linguists get their tongues around that ;)
Who's to say that this sort of analysis isn't already being done by the security services considering their resources?
How much of the original speech was recovered?
Did they get "wash tit gold hat the football catch vast weak" or a Peanuts-style "waaah wah waaa"?
Anon - in case it's a stupid question.
...wasn't there a rather significant prize on offer from the US's secret squirrels for cracking Sk(h)ype?
Has a group of people at the University of North Carolina suddenly got a lot richer?
Wouldn't something as simple as background music screw things up? The human ear is good at separating that sort of thing out, while digitally it's just more random data on the line: the gaps between words would disappear and the phonemes would be distorted.
but then you get a visit from the PRS...
"The human ear is good at separating that sort of thing out"
Maybe YOUR human ear... put background music in and I find hearing VERY difficult. (I am still unsure if this is to do with my hearing or my attention span...)
This is my best guess based on having previously read into the research this was based on (concerning VBR in VoIP).
The music would have to be loud enough and varied enough (e.g. DnB as opposed to classical) in order to make a significant impact upon the bitstream (such being the nature of VBR encoding) in relation to the voice. Not sure if that makes sense.
If you had two people speaking simultaneously with short pauses between words and they both spoke with the same loudness, it would be harder to separate the words. If one person said one word, and the other another, the resulting bits would be as if only one person had spoken, and what he/she spoke was a single messy mash of the two words.
Perhaps an analogy is in order... if quiet background music is represented by a drop of yellow paint, and loud voice is a pot full of blue paint, mix the two together and you get a very-slightly-green blue paint. The yellow wasn't substantial enough to significantly alter the result and anyone looking at the paint will say it's blue, despite there being some yellow in it.
If you have a *pot* of yellow paint (*loud* background music) and mix the two together, you get completely green paint. You have no idea whether this was the original colour or a combination of a range of colours, and it becomes much harder to determine what the original colours/shades were.
tl;dr - Music would need to be noisy and make your voice pretty indistinguishable to a machine
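The paint analogy can be mimicked with a toy VBR encoder that picks a frame size from each frame's loudness (RMS). The energy thresholds and byte sizes below are hypothetical, not taken from any real codec, and a steady tone stands in crudely for music. Quiet music leaves the speech/pause size pattern intact; loud music pushes every frame into the top tier, so the pattern disappears.

```python
import math

N = 160  # samples per frame (20 ms at 8 kHz, an assumption)
# Hypothetical VBR tiers: (RMS upper bound, frame size in bytes). Real codecs differ.
TIERS = [(0.05, 6), (0.2, 20), (0.5, 33), (float("inf"), 41)]

def tone(amplitude, period, n=N):
    """A sine wave of the given amplitude, repeating every `period` samples."""
    return [amplitude * math.sin(2 * math.pi * k / period) for k in range(n)]

def frame_size(samples):
    """Pick a frame size from the RMS energy of one frame."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    for threshold, size in TIERS:
        if rms < threshold:
            return size

def with_music(frames, music_amplitude):
    """Mix a steady 'music' tone into every speech frame."""
    music = tone(music_amplitude, 32)
    return [[s + m for s, m in zip(f, music)] for f in frames]

speech_levels = [0.0, 0.8, 0.2, 0.5, 0.0, 0.8]  # hypothetical word/pause loudness envelope
speech = [tone(a, 40) for a in speech_levels]

print([frame_size(f) for f in speech])                   # [6, 41, 20, 33, 6, 41]
print([frame_size(f) for f in with_music(speech, 0.05)])  # [6, 41, 20, 33, 6, 41]
print([frame_size(f) for f in with_music(speech, 0.8)])   # [41, 41, 41, 41, 41, 41]
```

Clipping is not modelled, and real music varies over time, so its own rhythm would still show up in the sizes; this only illustrates the drop-versus-pot point.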
Ballmer's gonna love the timing of this paper
How is this news?
Exactly how is this news? Statistical analysis and pattern matching to derive content or crack encryption have been in the standard toolkit of cryptanalysts and cryptologists forever. That is why a good encryption algorithm needs good diffusion. Unfortunately, good diffusion is not possible given the near-real-time, streaming nature of VoIP. In this case, streaming at a fixed bitrate would fix most of the problem.
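As a sketch of that kind of pattern matching, this matches an observed sequence of encrypted frame sizes against hypothetical per-word templates using dynamic time warping (DTW), which tolerates words spoken faster or slower. It is a plain nearest-template matcher for illustration, not the method used in the research the article describes, and the templates are made up.

```python
def dtw(a, b):
    """Dynamic time warping distance between two sequences of frame sizes."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def classify(observed, templates):
    """Return the template name whose size sequence is closest under DTW."""
    return min(templates, key=lambda name: dtw(observed, templates[name]))

# Hypothetical frame-size templates (bytes) learned from training recordings.
templates = {
    "hello": [41, 33, 33, 20],
    "goodbye": [20, 41, 41, 7, 33],
    "yes": [33, 7],
}
observed = [41, 41, 33, 33, 20]  # hypothetical sizes seen on the wire, spoken slowly

print(classify(observed, templates))  # hello
```

With a fixed bitrate every template collapses to a run of identical sizes, leaving nothing for a matcher like this to work with.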
Re: How is this news?
It's news because they have improved upon previous methods in such a way that the attack is more feasible, and its accuracy can be continually improved through sampling and training. Also, Skype is the main target for such an attack (popular and thought to be secure).
Don't Look At This Title
I'm no crypto expert, but surely when designing crypto for voice transmission, the primary design goal -- the very first thing you want out of it -- is that no-one be able to listen to the encrypted transmission and work out what it says?!
But with the constraint...
that it must be done and transmitted/received in near real time, which affects the possibilities quite drastically.
As with most side-channel attacks, these are generally either not thought of at the time or considered so theoretical that, given the application, they are safe to ignore.
I think in this case...
...it's partly due to time. The CELP family of voice codecs is rather old and was designed more around the need to optimize limited bandwidth. Security wasn't exactly in mind at the time, so perhaps a newer codec is needed. Then again, mobile devices have computing and power constraints as well, so developing one that is CBR, good quality, AND low bitrate will have some problems of its own.
...To brush up on my Navajo.
This doesn't work with Judoon speak.
(ROL, FOL GOL ROL LOL BOL ROL)
If you're going to do statistical analysis on spoken English, Indian call centres are going to mightily fuck up your statistics. Not to mention the ones based in Newcastle, Liverpool, Glasgow and Cardiff...
Where can I get a text-to-speech engine with a Geordie accent?
Just synthesise your voice through a Cheryl Cole Voice-o-matic generator. Apparently a whole nation canna unnerstan what' she sez, like!
Haddaway & Sh!te!
Steganography should fix it.
My reading of the paper suggests that the root of the problem is that a certain phoneme of a certain accent will encrypt to the same/similar string of bits every time with the same key. Essentially, they need to make the data streams look truly random. Steganographic techniques should do the trick.
Unfortunately, this will mean more data has to go to and fro, and/or more computation will be required.
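On the "same bits every time with the same key" point, a fresh nonce per frame is the standard way to stop identical plaintext producing identical ciphertext. This sketch reuses a toy SHAKE-256 keystream as a stand-in for a real stream cipher (illustration only), with a hypothetical repeated frame. Note that the lengths still match, so randomising the bytes alone does not hide frame sizes.

```python
import hashlib
import os

def toy_encrypt(key: bytes, nonce: bytes, frame: bytes) -> bytes:
    """XOR with a SHAKE-256 keystream (a toy stand-in for a real stream cipher)."""
    ks = hashlib.shake_256(key + nonce).digest(len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))

key = os.urandom(32)
frame = b"same phoneme bits"  # hypothetical codec output that repeats

# Reusing one nonce: identical frames produce identical ciphertext.
c1 = toy_encrypt(key, b"\x00" * 12, frame)
c2 = toy_encrypt(key, b"\x00" * 12, frame)

# A fresh random nonce per frame (sent alongside it): ciphertexts differ.
c3 = toy_encrypt(key, os.urandom(12), frame)
c4 = toy_encrypt(key, os.urandom(12), frame)

print(c1 == c2, c3 == c4, len(c3) == len(c4) == len(frame))  # True False True
```

The 12-byte nonce is itself extra data on every frame, in line with the "more data to and fro" caveat.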