Your health, tax, and search data siphoned

Google, Yahoo, Microsoft's Bing, and other leading websites are leaking medical histories, family income, search queries, and massive amounts of other sensitive data that can be intercepted even when encrypted, computer scientists revealed in a new research paper. Researchers from Indiana University and Microsoft itself were …

COMMENTS

This topic is closed for new posts.
  1. Ken Hagan Gold badge

    Nice work

    "They also showed how the auto-suggestion features in Google, Yahoo!, and Bing can leak the search terms users enter, even when traffic is encrypted over WPA. That's because the resulting packets are easy to identify by their 'web flow vectors.' [...] The most obvious solution is to 'pad' responses with superfluous data that confuses attackers trying to make sense of the traffic."

    Um, no. The most obvious solution is to stop auto-suggesting, at least from the server end. You could presumably continue to auto-suggest on the client, so the end-user might not even notice that the facility had disappeared. Even if they did, it's hardly the end of the world to have to type stuff out in full.

    Still, this is an unexpected leak, at least to me and probably also to the people who wrote these applications. Just as well independent security research isn't illegal, eh?

    1. Mark 65

      Surely

      Surely the point of the server suggestion is that in order to perform the same service at the client end the whole dataset needs to be present (or at least a suitably large one)? In which case the only real mitigation is to stop doing it.

      I also found the concept quite interesting and somewhat unexpected.

  2. Anonymous Coward

    A Generic Solution

    ... would be to profile the system for the request frequency and request lengths being transmitted. This would allow for a customized, random-length padding and decoy message strategy to be implemented.

    The safest strategy is to saturate the channel with random garbage as long as no useful data is to be transmitted. I bet Cisco and Juniper will like the latter strategy :-)
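A minimal sketch of the random-length padding idea above, in Python. The framing (a 2-byte length prefix), the function names, and the `max_pad` default are illustrative assumptions, not anything the article or the paper specifies. It only shows how a receiver could strip padding that the sender chose at random; the decoy messages are not modelled.

```python
import secrets

def pad_random(payload: bytes, max_pad: int = 256) -> bytes:
    """Frame payload as: 2-byte pad length, payload, random filler.

    max_pad is an assumed tuning knob (must fit in 2 bytes, i.e. <= 65535).
    """
    pad_len = secrets.randbelow(max_pad + 1)  # uniform in 0..max_pad
    return pad_len.to_bytes(2, "big") + payload + secrets.token_bytes(pad_len)

def unpad(message: bytes) -> bytes:
    """Recover the payload by dropping the prefix and the trailing filler."""
    pad_len = int.from_bytes(message[:2], "big")
    return message[2:len(message) - pad_len]
```

The padding has to be applied before encryption so that the ciphertext length, which is what an eavesdropper sees, varies independently of the payload length.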

  3. Anonymous from Mars
    Thumb Up

    This is pretty cool.

    This is pretty cool. Auto-suggest has like 37 common combinations per keystroke. It wouldn't be hard to map them all out in real-time.

  4. Combat Wombat
    Badgers

    So they are saying

    1) Wi-fi is insecure.

    2) Most Web masters deploy monkey-level security for their SSL (more likely their managers MAKE them do it)

    3) Web security is HARD man! (insert crying here)

    I want to get paid to produce these common-sense-level reports. I am sure this is stuff that has been said over and over by the people on the front line, but it takes a freaking consultant paid millions before anyone will listen.

    I say we sic the badgers on them!

    1. Ken Hagan Gold badge

      Re: So they are saying...

      Not as far as I can see, no. The whole thing depends on the fact that a lot of small packets are being transmitted in fairly formulaic conversations.

      Therefore you don't need to know the exact contents of the packets. You can guess what web-site the client is talking to, use public knowledge of how that site operates and what the possible questions and answers are, and then match those possibilities onto the actual conversation by using the lengths of sent and received fragments.

      It's an "almost-known" plaintext attack.
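The matching step described above can be sketched roughly as follows. This is a toy model, not the researchers' method: `size_of` stands in for a lookup the attacker would build by querying the public site themselves, and the prefixes and byte counts in `table` are made up for illustration.

```python
import string

def recover_prefixes(observed, size_of, alphabet=string.ascii_lowercase):
    """Keep only the typed prefixes whose response sizes match what was observed.

    observed: response sizes seen on the wire, one per keystroke.
    size_of:  hypothetical lookup from a prefix to its known response size.
    """
    candidates = [""]
    for size in observed:
        candidates = [
            prefix + ch
            for prefix in candidates
            for ch in alphabet
            if size_of(prefix + ch) == size
        ]
    return candidates

# Made-up response sizes for a handful of prefixes (illustration only).
table = {"f": 512, "fl": 498, "flu": 530, "g": 512, "go": 498, "gou": 477}
guesses = recover_prefixes([512, 498, 530], table.get)
```

With this table, the first two sizes are ambiguous between two prefixes and the third keystroke resolves it, which is why a sequence of small formulaic packets leaks more than any single one.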

  5. Roger Stenning
    FAIL

    So WiFi is insecure... again.

    Not very earth shattering, really. If you want something to remain private, you don't spread it over the airwaves, encrypted or not. You either use a landline connection, or you do it in person wherever the person you're dealing with conducts their business. Common sense, that.

    Fail, because these academics just can't see that it's old news.

    1. Gulfie
      FAIL

      Fail to you too...

      Did you not see the part of the article that points out that this works as a man-in-the-middle attack too? You don't need stuff to be put over the air, or even decrypted - you just need access to the data packets flowing between the web server and the end user, wifi or no.

  6. John Smith 19 Gold badge
    Thumb Up

    unencrypted, uncompressed back channel?

    The implication of this is that the updated "suggestions" are sent in clear and in effect you employ an inference approach to figure out who has conditions like this. Jeopardy in software, or "inverse design" if you have a CAD background.

    Given one of the aims of this is to reduce large downloads by only downloading options based on a series of questions, the *obvious* answer is to apply data compression to reduce the volume, padding to make *all* responses a standard size, and encryption to make the packets a lot harder to read.

    The question is whether these protocols *support* such options in the first place. If they do, some kind of frequency analysis and a fairly small dictionary should yield substantial compression. If you can't do this behind the scenes in the client then you're looking at some *major* rewriting.

    Good article. But *boy* has it taken some people a long time to consider this *might* cause trouble.
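A rough sketch of the compress-then-pad-to-a-standard-size idea, assuming a 1024-byte bucket and a 4-byte length header (both arbitrary choices for illustration). Encryption is left out; the point is only that every short response ends up the same length before it is encrypted.

```python
import zlib

BUCKET = 1024  # assumed "standard size" in bytes; a real value would need profiling

def pack(response: bytes, bucket: int = BUCKET) -> bytes:
    """Compress, prefix the compressed length, then zero-pad to a bucket multiple."""
    body = zlib.compress(response)
    framed = len(body).to_bytes(4, "big") + body
    padded_len = -(-len(framed) // bucket) * bucket  # round up to next multiple
    return framed.ljust(padded_len, b"\x00")

def unpack(packet: bytes) -> bytes:
    """Read the length header, discard the padding, and decompress."""
    n = int.from_bytes(packet[:4], "big")
    return zlib.decompress(packet[4:4 + n])
```

Responses larger than one bucket still reveal how many buckets they span, so the bucket size trades bandwidth against how much length information survives.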

  7. Anonymous Coward

    Scares the hell out of me

    Autocomplete just always felt like a privacy risk. Just too many superfluous packets going back and forth, and now here's vindication. Personally I have always avoided it by using Opera's shortcuts where possible (for ebay, google, etc), or else disabling javascript for the landing page.

    Trying to communicate this to friends (who insist that consulting Google on very personal health matters is somehow better than going to a GP) is a real struggle though.

    Back to patching the problem though; how about sending the typed characters in a bunch after a random number of them have been entered, e.g. every two, three or four. That ought to make the statistical analysis a bit harder with less of an overhead. Or, like the poster above suggests, send a larger dictionary based on the first character or two of each word, and possibly cache them at the client side. The dictionary of common words would probably still be smaller than the bunch of javascript Yahoo send with each of their Webmail client pages nowadays.
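The random batching suggestion might look something like this on the client side. The function name and the 2-to-4 range are taken from the wording above as assumptions; this is a sketch of the grouping logic only, not of any real auto-suggest client.

```python
import random

def batch_keystrokes(text, rng=None, low=2, high=4):
    """Split typed text into chunks of random size, one suggestion request per chunk.

    rng defaults to the OS-backed generator so chunk sizes are not predictable.
    """
    rng = rng or random.SystemRandom()
    batches, i = [], 0
    while i < len(text):
        n = rng.randint(low, high)  # inclusive on both ends
        batches.append(text[i:i + n])
        i += n
    return batches
```

An observer then sees fewer requests than keystrokes and cannot tell which prefix lengths they correspond to, although the response sizes themselves would still need padding to stop leaking.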

  8. Michael Wojcik Silver badge

    Good work, but not really surprising

    This is a classic resource-consumption side-channel attack. Kudos to the researchers for thinking of it and doing all the hard work to demonstrate it, but it shouldn't be surprising to anyone who pays attention to information security. Attacks of this class against computer security systems go back at least as far as the 1972 TENEX password attack (see e.g. http://osvdb.org/23199).

    What's worth noting is that once again the people who build these systems operated under the assumption that they had a "secure channel", as if that were some sort of absolute. What they had was a channel that raised the work factor for extracting bits of message entropy. An attacker who can perform traffic analysis on the channel and find patterns that correlate to the message contents has extracted some of those bits. If there isn't much entropy in the message to begin with...

  9. TSilva
    FAIL

    Perspective

    It is typical of mainstream media to make such a big deal of security issues like those described in this whitepaper, but for this technically inclined web site to title its article "Your health, tax, and search data siphoned" and scare readers by stating that everything on the internet is insecure is irresponsible, and it lowers the credibility of both the writer and the web site itself.

    If indeed there is a hacker that is capable of doing 10% of what the white paper suggests to intercept, decode and identify me and my personal data I would personally hire this very smart person as it would be someone who could actually benefit my company.

    I would think it would be a lot easier to steal your laptop and just access ALL of your information in one shot, or bribe your company's network administrator, who most likely monitors your email and web access anyway and has access to all of your files at any time that she wishes. Or your 11-year-old kid, who can probably hack into your computer faster than you can log in to it, access all your email, and take a quick look at the history of all the web sites you visited in the last couple of months (now you should really be scared).

    Stop selling news by creating fear of the unknown and banking on people's ignorance to gain one extra web hit. Place news items within context and into perspective!

  10. TobyG

    The sky is not falling - no surprise here!

    As Michael points out, it is no surprise that there are weaknesses in Web applications that let an attacker infer information even though encryption is in place. We find this quite typical here at VeriSign, and some of the described attacks also require a man-in-the-middle (MITM) attack to be performed first. Extended Validation SSL Certificates can go a long way in defeating the effectiveness of MITM attacks by reliably identifying that the user is connected directly to the desired site and not by way of a malicious MITM proxy seeking to steal information.

