A docket, tweet and selfie can reveal your identity, boffins find

Scientists have revealed it is possible to determine the identity of shoppers using credit card purchase and location metadata, in research that throws a spanner into national privacy laws. The research, published in the journal Science, found shopping receipts could be matched with four sources of external location data …

  1. Neoc

    "Intrepid reporters matched the de-anonymised ride information..."

    If the data was de-anonymised (i.e. had its anonymity removed), why did they need to match it with photos to find out who took what cab?

    1. Anonymous Coward

      They used the wrong word there. According to the original paper, it was anonymised credit card info. I'm assuming it's the same with the cab data.

    2. Anonymous Coward

      Nope

      The cab data was not - to my knowledge - de-anonymised.

      1) There was a published - anonymised - database of cab trips in NY over a certain period, with details of the start point and time and the end point and time.

      2) Some freak celebrity stalker matched the start (or end) points and times of these journeys with paparazzi photographs of celebrities.

      3) From that he guessed that Celebrity A, who was photographed stepping out of a taxi in front of a 5th Avenue hotel on July 21st around 2pm, probably took a taxi from a Greenwich Village flat a bit earlier (because only one taxi stopped at that location at that time).

      The taxi trip database was anonymised but the paparazzi photo was not! Hence the correlation.
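The matching described in 2) and 3) can be sketched as a filter over trip records: keep only trips whose drop-off lies within some radius and time window of the photo, and if exactly one survives, its pickup point is exposed. This is a minimal illustration under stated assumptions, not the stalker's actual method; the `Trip` record layout, the 50 m radius, the 5-minute window and all coordinates and dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Trip:
    # Hypothetical layout of an "anonymised" trip record: no passenger name,
    # only where and when the trip started and ended.
    pickup: tuple          # (lat, lon)
    pickup_time: datetime
    dropoff: tuple         # (lat, lon)
    dropoff_time: datetime


def distance_m(a, b):
    """Great-circle (haversine) distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def trips_matching_photo(trips, photo_place, photo_time,
                         radius_m=50, window=timedelta(minutes=5)):
    """Return every trip whose drop-off is close to the photo in both space and time."""
    return [t for t in trips
            if distance_m(t.dropoff, photo_place) <= radius_m
            and abs(t.dropoff_time - photo_time) <= window]


# Made-up data: a photo outside a hotel, and two trips in the "anonymised" table.
hotel = (40.7644, -73.9742)
trips = [
    Trip((40.7336, -74.0027), datetime(2013, 7, 21, 13, 40),
         (40.7645, -73.9741), datetime(2013, 7, 21, 14, 1)),
    Trip((40.7506, -73.9935), datetime(2013, 7, 21, 13, 50),
         (40.7128, -74.0060), datetime(2013, 7, 21, 14, 10)),
]
matches = trips_matching_photo(trips, hotel, datetime(2013, 7, 21, 14, 0))
if len(matches) == 1:
    # A single surviving trip links the photographed person to its pickup point.
    print("Pickup point revealed:", matches[0].pickup)
```

The photo is the only identified input; everything else comes from the anonymised table, which is exactly the correlation described above.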

  2. Anonymous Coward

    Stop the "big data scare" sensationalism.

    Or how to spin basic statistics into sensationalist non-news!

    What the "scientists" have discovered is a way to get talked about in the news...

    - If you have a non-anonymised data set that links your identity I to some feature F

    (John eats a lot of crisps and works near Woking station)

    - If you have an anonymised data set that links feature F to behaviour B

    (anonymous shopper X buys a lot of crisps at a Tesco near Woking station)

    - Then you can correlate behaviour B with identity I

    (anonymous shopper X has a high probability of being John)

    Duh... In other words, IF you already know John, then you can identify patterns that correspond to his behaviour in anonymised data. NOT the other way around.

    It is not by looking at an anonymised database of - let's say - Tube journeys and my Facebook posts of - let's say - cute kitten videos that you can get my name, address and other personal details!
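The three-step argument above amounts to a join on the shared feature F. A minimal sketch, with entirely made-up records and a naive 1/k probability (k being the number of anonymised records that share the feature), shows both halves of the point: the link only works because the identified data set already exists, and it is only as sharp as F is rare.

```python
from collections import defaultdict

# Identified data set: identity I -> feature F (hypothetical examples).
identified = {
    "John": ("crisps", "Woking station"),
    "Mary": ("salad", "Guildford"),
}

# "Anonymised" data set: pseudonym -> (feature F, behaviour B).
anonymised = {
    "shopper_x": (("crisps", "Woking station"), "Tesco near Woking, lots of crisps"),
    "shopper_y": (("salad", "Guildford"), "Waitrose, 3 salads a week"),
    "shopper_z": (("salad", "Guildford"), "Sainsbury's, 2 salads a week"),
}


def link(identified, anonymised):
    """Join the two data sets on feature F.

    Returns {identity: [(pseudonym, behaviour, probability), ...]}, where the
    probability is a naive 1/k over the k pseudonyms sharing that feature.
    """
    by_feature = defaultdict(list)
    for pseudonym, (feature, behaviour) in anonymised.items():
        by_feature[feature].append((pseudonym, behaviour))

    result = {}
    for name, feature in identified.items():
        candidates = by_feature.get(feature, [])
        result[name] = [(p, b, 1 / len(candidates)) for p, b in candidates]
    return result


links = link(identified, anonymised)
for name, candidates in links.items():
    print(name, candidates)
```

Here John links to shopper_x with probability 1.0, while Mary's feature is shared by two shoppers, so each gets 0.5.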

    1. Anonymous Coward

      Re: Stop the "big data scare" sensationalism.

      They "discovered" what NSA, GCHQ, etc. have been doing for the last 7 years.

      Metadata, if available in sufficient quantities, allows an individual to be identified and tracked. News at 10.

    2. BasicChimpTheory

      Re: Stop the "big data scare" sensationalism.

      "It is not by looking at an anonymised database of - let's say - Tube journeys and my Facebook posts of - let's say - cute kitten videos that you can get my name, address and other personal details!"

      But that is precisely the issue here, AC. The method as described in the report (not this article, description is pretty poor here) shows how using various "correctly" anonymised sources an actor can simply de-anonymise those anonymous behaviours. Let's look at a (purely hypothetical) perfectly anonymised dataset of public transport records, as per your example, and communication metadata. It is not difficult to see how an organisation that legitimately knows your details (for example your employer or a prospective insurance company) could pretty accurately determine your location/call history/spending habits from a legitimately acquired collection of anonymised data sets. Been talking to another business in the same field about a job? Your employer might be able to work that out. Regularly drop a relative off at the oncologist? Wouldn't like to see what the insurance company would make of that.

      This isn't even touching on ID fraud.

      The issue here is not a beat-up (the article is fairly crummy though).

      1. Anonymous Coward

        Re: Stop the "big data scare" sensationalism.

        It's also a bit of a stretch to describe some of this data as "metadata"; location and time attached to a photo may be, but credit card and purchase details on a receipt, or taxi ride details? That's real data describing the transaction. The problem here seems to be that some data controllers think that anonymisation just means removing names, and as a result are releasing datasets that are too easy to correlate.

      2. Elmer Phud

        Re: Stop the "big data scare" sensationalism.

        "But that is precisely the issue here, AC. The method as described in the report (not this article, description is pretty poor here) shows how using various "correctly" anonymised sources an actor can simply de-anonymise those anonymous behaviours."

        Yup, despite me having three separate 'identities' across the web, it's no real big deal to dig into FB data and find out who I really am (despite FB constantly asking for 'real' details).

        A little bit of work would cross-reference posts and messages, and digging into others' accounts would fine-tune my data.

        FB keeps asking for my mobile number - despite me putting it in messages where 'my mobile no.' precedes it. A little bot/bit (?) of a tweak ought to have a box coming up with 'is this your mobile'.
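The "little tweak" suggested here could be as simple as a pattern match on message text. A hypothetical sketch follows; the regular expression, which only recognises UK-style mobile numbers after the phrase "my mobile", is an assumption for illustration and not anything Facebook is known to run.

```python
import re

# Hypothetical pattern: "my mobile", optionally followed by "no." or "number",
# then a UK-style mobile number such as 07700 900123 or +44 7700 900123.
MOBILE_AFTER_PHRASE = re.compile(
    r"my mobile(?:\s+(?:no\.?|number))?\s*[:\-]?\s*"
    r"(?P<number>(?:\+44\s?7|07)\d{3}\s?\d{6})",
    re.IGNORECASE,
)


def suggest_mobile(message):
    """Return the number to offer in an 'is this your mobile?' prompt, or None."""
    match = MOBILE_AFTER_PHRASE.search(message)
    if match:
        return match.group("number").replace(" ", "")
    return None


print(suggest_mobile("Text me, my mobile no. 07700 900123"))
```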

        I do use cash a lot of the time. I only use 'loyalty' cards when paying by plastic, and I don't have a regular payment pattern, such as cash up to a certain amount and card above it.

        Paying for fuel by cash saves so much farting about - just bung cash on the counter and go 'Thirty quid, pump seven' and bugger off.

        (OK, the pump security CCTV will register the number plate and time, which will provide a link, but not a card trail.)

  3. Ragequit

    I had to skim through the original paper to get the gist of it. All data was anonymous except for the photos. Basically you can be stalked by using the GPS/time data on photos and correlating it with any other dataset that has time and location. The more photos there are, the better. Anyway, I don't think this is really a new concept; it's just that someone bothered to do the math behind it. Though depending on the precision of the data, I don't think you can necessarily be 100% accurate. What if you were on vacation and shopping with a friend? If you didn't buy anything but took pictures of the places where your friend did, you would get a false positive. All you can prove is that the pictures were taken at the same general time/place as the credit card purchases?
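The false-positive scenario can be made concrete with a subset test: a card is a candidate if it has a purchase at every (area, day) point seen in the photos. In this hypothetical sketch, with coarse made-up areas and days, the photographer bought nothing, so the only card that fits is the friend's: a unique match that is nonetheless the wrong person.

```python
from collections import defaultdict

# Hypothetical, already coarsened records: (card, area, day).
# The photographer bought nothing; card_B belongs to the friend who did the shopping.
purchases = [
    ("card_B", "Rome-Centro", 1),
    ("card_B", "Rome-Trastevere", 2),
    ("card_C", "Milan-Duomo", 1),
    ("card_C", "Rome-Centro", 2),
]

# (area, day) points recovered from the photographer's geotagged pictures.
photos = [("Rome-Centro", 1), ("Rome-Trastevere", 2)]


def candidate_cards(purchases, photos):
    """Return the cards with a purchase at every (area, day) point in the photos."""
    visits = defaultdict(set)
    for card, area, day in purchases:
        visits[card].add((area, day))
    wanted = set(photos)
    return {card for card, points in visits.items() if wanted <= points}


print(candidate_cards(purchases, photos))  # {'card_B'}: unique, but not the photographer
```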

  4. Rustident Spaceniak

    It really just automates snooping

    I think, as a phenomenon, this isn't all that new. In the olde days, when someone bought exactly one bottle of Glengrouse every week at the off-licence in Littleton-behind-the-woods, you could easily figure out from gossip that the buyer would be old Colonel Mumblewick. Now you just collect the same data electronically and profile everyone in Littleton the same way. Scary? Not very. Annoying? Probably a bit, to some. Useful? Dunno, might be at some point.

  5. Jim 59

    +1 to Mr Pauli for the word "docket".

    Would have been +2 for "chit".

  6. Doctor Syntax Silver badge

    90 percent accuracy

    Translate that into 10% errors.

    If someone's making enough attempts then there's a good chance that one of those is going to land on me. Or you. Or one of our families.
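The arithmetic behind this point, assuming (as a simplification) that each identification attempt is independent and right 90% of the time: the chance of at least one wrong identification in n attempts is 1 - 0.9^n, and the expected number of wrong ones is 0.1n.

```python
def p_at_least_one_error(accuracy, attempts):
    """Probability that at least one of `attempts` independent identifications is wrong."""
    return 1 - accuracy ** attempts


def expected_errors(accuracy, attempts):
    """Expected number of wrong identifications across `attempts` tries."""
    return attempts * (1 - accuracy)


for n in (1, 10, 100):
    print(n, round(p_at_least_one_error(0.9, n), 3))
```

With 10 attempts the chance of at least one wrong identification is already about 65%, and across 1,000 attempts you would expect around 100 wrong identifications.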

    1. ravenviz Silver badge

      Re: 90 percent accuracy

      Or Harry Buttle.

  7. Irongut

    So if you are not, and never have been, on FB, Twitter, etc., you're still anonymous?

    * smug face *

    1. choleric
      Black Helicopters

      No, for two reasons.

      1) Network analysis on the social relationships of people whom you know and who _are_ on FB, Twitter, etc. will reveal a missing link or gap. You are somebody's child/sibling/significant other/coworker/boss/underling/nemesis/other. You can be fairly straightforwardly identified as being the missing link, even though you yourself have not registered on the service.

      2) These days if you're not on FB, Twitter, etc. then sadly that means you are probably on some "and here's a list of all the 'weirdos' who chose not to register for some reason - we should keep an especially close eye on them" list in a TLA office not far from you...
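Point 1) can be illustrated with a toy graph: collect every person that members refer to (tags, mentions, uploaded contacts) who has no account of their own, and keep those referred to by several members. The data and the `min_refs` threshold below are hypothetical; real network analysis would be far more elaborate, but even this crude version recovers a non-member's social circle.

```python
from collections import defaultdict

members = {"alice", "bob", "carol", "dave"}

# Hypothetical: the people each member's posts, photo tags or contacts refer to.
references = {
    "alice": {"bob", "tom"},
    "bob": {"alice", "carol", "tom"},
    "carol": {"bob", "dave", "tom"},
    "dave": {"carol"},
}


def infer_non_members(members, references, min_refs=2):
    """Map each referenced non-member to the set of members who refer to them,
    keeping only those referred to by at least `min_refs` members."""
    seen_by = defaultdict(set)
    for member, refs in references.items():
        for person in refs - members:
            seen_by[person].add(member)
    return {person: who for person, who in seen_by.items() if len(who) >= min_refs}


gaps = infer_non_members(members, references)
print(gaps)
```

Tom never registered, yet the sketch still ties him to Alice, Bob and Carol.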

      1. Ragequit
        Joke

        "2) These days if you're not on FB, Twitter, etc. then sadly that means you are probably on some "and here's a list of all the 'weirdos' who chose not to register for some reason"

        See!? Now photographs do more than just steal your soul!

        Seriously though, anyone with enough time and resources could identify anyone. It's just that social media makes it possible for anyone with little time or money to identify someone. Group photos (the kind that friends/family are likely to post on social media) are less of a concern, though, as they do not help establish a one-to-one correlation. Candid photos taken by a stalker could certainly establish a correlation in a dataset. Fortunately it would require 4 or more photos. Unfortunately your stalker has probably posted hundreds. Fortunately (or unfortunately) they'll probably make their presence known sooner or later, and then you'll know your privacy has been compromised in more ways than one.
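Why a handful of photos can be enough: each extra (place, time) point shrinks the set of people whose records contain all the points seen so far. The sketch below uses a deliberately simple, hypothetical model (distinct points drawn uniformly over 10 places and 24 hours), so its numbers should not be read as a claim about how many points real data sets need; real behaviour is far less uniform.

```python
import random


def matches_per_clue_size(n_people=10_000, n_places=10, n_hours=24,
                          points_per_person=10, max_k=5, seed=1):
    """For k = 1..max_k, count people whose trace contains the target's first k points.

    Toy model (an assumption, not real data): every person visits
    `points_per_person` distinct (place, hour) cells chosen uniformly at random.
    """
    rng = random.Random(seed)
    n_cells = n_places * n_hours
    traces = [set(rng.sample(range(n_cells), points_per_person))
              for _ in range(n_people)]
    target = traces[0]                  # the person being stalked
    known = sorted(target)[:max_k]      # points learned, e.g., from photos
    counts = []
    for k in range(1, max_k + 1):
        clue = set(known[:k])
        counts.append(sum(1 for trace in traces if clue <= trace))
    return counts


print(matches_per_clue_size())
```

Every count is at least 1 because the target always matches their own points; once the count reaches 1, the trace is unique in this population.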

        This topic is food for thought but at the end of the day it just means people will be even more paranoid about GPS tracking on their own devices.

  8. channel extended
    Trollface

    Cash is still King!!!!!

  9. dsanchezURV

    In a recent Technical Comment published in Science (18 March 2016, p. 1274, http://science.sciencemag.org/content/351/6279/1274.1.full ), David Sánchez, Sergio Martínez and Josep Domingo-Ferrer (UNESCO Chair in Data Privacy, CRISES Research Group, Universitat Rovira i Virgili, Catalonia) demonstrate that the reidentification risk reported by de Montjoye et al. was significantly overestimated (due to a misunderstanding of the reidentification attack) and that the alleged ineffectiveness of anonymization is due to the choice of poor and undocumented methods and to a general disregard of 40 years of anonymization literature. The technical comment also shows how to properly anonymize data, in order to reduce unequivocal reidentifications to zero while retaining even more analytical utility than with the poor anonymization mechanisms employed by de Montjoye et al. In conclusion, data owners, subjects and users can be reassured that sound privacy models and anonymization methods exist to produce safe and useful anonymized data.

    Supplementary materials detailing the data set, algorithms and extended results of our study are available at: http://arxiv.org/abs/1511.05957 . Moreover, unlike de Montjoye et al.’s data set, which was never made available, our data, anonymized results, and anonymization algorithms can be freely downloaded at: http://crises-deim.urv.cat/opendata/SPD_Science.zip .
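For readers unfamiliar with that literature, one of its oldest privacy models is k-anonymity: every combination of quasi-identifying attributes must be shared by at least k records. The sketch below is a generic check plus a hypothetical coarsening step (shop to district, day to week, amount to a 10-pound band); it is not claimed to be the method used in the Technical Comment, and the records are made up.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def coarsen(record):
    """Hypothetical generalisation: district instead of shop, week instead of day,
    and a 10-pound band instead of the exact amount."""
    return {
        "district": record["district"],
        "week": record["day"] // 7,
        "band": int(record["amount"] // 10) * 10,
    }


raw = [
    {"shop": "Tesco Woking", "district": "Woking", "day": 3, "amount": 12.40},
    {"shop": "Co-op Woking", "district": "Woking", "day": 5, "amount": 14.99},
    {"shop": "Tesco Guildford", "district": "Guildford", "day": 9, "amount": 7.10},
    {"shop": "Lidl Guildford", "district": "Guildford", "day": 12, "amount": 8.75},
]
coarse = [coarsen(r) for r in raw]

print(is_k_anonymous(raw, ("shop", "day", "amount"), k=2))         # False
print(is_k_anonymous(coarse, ("district", "week", "band"), k=2))   # True
```

Coarsening buys anonymity at the cost of precision; keeping as much analytical utility as possible while doing so is the problem that literature addresses.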
