Didn't the AOL leak have pseudoidentities?
Officials from justice departments across the EU have been asked to explore to what extent the pseudonymisation of personal data can be used to "calibrate" businesses' obligations to data protection. Pseudonymisation (such as assigning fake names to people), as opposed to anonymisation (complete stripping of identity), allows …
How do you assign the same pseudo-identity to the same real person unless you keep a record linking the two? Hence it isn't anonymous at all.
Everyone's surname becomes a hash of their DoB and street address? Can't see how that will work when people move... plus DoB and (even vague) street address are probably still enough to identify someone...
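For what it's worth, here is a rough sketch of the hash idea. It assumes a keyed hash (HMAC-SHA256) rather than a plain one, so that someone without the key can't simply hash every plausible DoB/address pair and compare. The key, the input formats and the function name are all made up for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held by whoever runs the pseudonymisation.
SECRET_KEY = b"hypothetical-key-held-by-the-data-controller"

def pseudonym(dob: str, address: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from DoB and address with a keyed hash."""
    message = f"{dob}|{address.strip().lower()}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]

before = pseudonym("1970-01-01", "1 Example Street")
again = pseudonym("1970-01-01", "1 Example Street")
after_move = pseudonym("1970-01-01", "2 Other Road")

print(before == again)       # True: same inputs, same pseudonym, no lookup table
print(before == after_move)  # False: the link is lost once the person moves
```

Note that the key is still a "record of the two" in disguise: whoever holds it can recompute anyone's pseudonym from their DoB and address.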
Proper pseudonymisation would involve giving a new identifier for each session. Very few organisations need to know who is visiting their site every time. Of course, true pseudonymisation is the same as anonymisation, because there is no way to aggregate data. That is what the EU should be aiming for - there should be a set of requirements that must be met before anonymisation can be routinely breached, and "because our business depends on it" would not be sufficient.
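A minimal sketch of the per-session approach, assuming the identifier is just a fresh random token that is never stored against the real person (the function name is hypothetical):

```python
import secrets

def new_session_id() -> str:
    """Return a fresh random identifier; nothing ties it to earlier sessions."""
    return secrets.token_hex(16)

# Three visits by the same person produce three unrelated identifiers.
visits = [new_session_id() for _ in range(3)]
print(len(set(visits)) == 3)  # True: nothing to aggregate on
```

Since the tokens are random and not derived from anything about the visitor, there is no key or table that could be used to join the sessions back together later.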
Who, exactly, is objecting?
Is it "citizens in member states" or is it "business groups that have been lobbying the MEPs that should be representing their constituency voters in member states"?
The collection of this data already has a price on it. Ensuring it's also anonymous shouldn't be a big deal. More importantly, though, if a business can't afford a slight increase in the cost of collecting the data then it shouldn't be doing it anyway.
As I see it there are really two distinct groups that don't like this. The first is governmental, such as the police, who want data on persons of interest. The second is advertising groups that want to show a specific person a specific ad. In my opinion, they can both piss off.
We know that both groups pay for that kind of data already. Which means the increased cost really isn't about implementing anonymization algorithms; rather it's lost revenue because the data is no longer as valuable to the interested parties. And, again, they can piss off.
"allows the same individual...
"...to be assigned the same pseudonym across various data sets."
So if the pseudonym is compromised in one data set, then all the other data sets are compromised too.
Not exactly anonymous at all, then...
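To make the point concrete, a toy sketch with invented data: two unrelated data sets share the pseudonym "p-4821", and a single leaked pseudonym-to-name pair exposes that person's records in both. All names, fields and values here are hypothetical.

```python
# Two separate data sets that reuse the same pseudonym for the same person.
purchases = [
    {"pseudonym": "p-4821", "item": "book"},
    {"pseudonym": "p-9000", "item": "kettle"},
]
locations = [
    {"pseudonym": "p-4821", "town": "Leeds"},
    {"pseudonym": "p-7777", "town": "Bath"},
]

# One mapping leaks from just one of the data sets.
leaked = {"p-4821": "Alice Example"}

def reidentify(records, mapping):
    """Attach real names to every record whose pseudonym is in the mapping."""
    return [
        {**r, "name": mapping[r["pseudonym"]]}
        for r in records
        if r["pseudonym"] in mapping
    ]

exposed = reidentify(purchases, leaked) + reidentify(locations, leaked)
for record in exposed:
    print(record)
```

Both the purchase and the location record come back with a name attached, even though only one mapping leaked.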
Anonymous or pseudonymous, does it matter?
How long will it be before a specialist industry within data mining/marketing develops that takes all of this "anonymous"/"pseudonymous" data, matches it with other "anonymous"/"pseudonymous" data and Faecebook/Twatter/Reg posts, and is able to un-anonymise the data, then turn around and make all that information available to Faecebook/Twatter/Google*/Plod/whoever?
* Can't think of a suitable pejorative, any suggestions?
It's frighteningly easy to extract identities from supposedly-anonymous data sets. Some Univ. of Texas researchers showed it's fairly easy to take Netflix's anonymized preferences list, cross-index it with the IMDB, and identify a number of the subscribers (http://www.schneier.com/blog/archives/2007/12/anonymity_and_t_2.html). Given a 5-digit ZIP code, gender, and date of birth, you can identify over 85% of Americans (ibid). If you know where an American works and lives _to the census block_, you can establish identity almost 100% of the time (http://www.schneier.com/blog/archives/2009/05/on_the_anonymit.html).
Be _very_ suspicious of claims of anonymization. Schneier notes ``[A]nonymity systems shouldn't be fielded before being subjected to adversarial attacks.''
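As a rough illustration of why ZIP code plus gender plus date of birth is so identifying, the sketch below counts what fraction of rows in a data set are unique on a chosen set of fields. The records are invented and the function name is hypothetical; this is not the method used in the studies linked above, just the basic counting idea.

```python
from collections import Counter

# Invented "anonymized" records: no names, only quasi-identifiers.
records = [
    {"zip": "78701", "gender": "F", "dob": "1980-03-14"},
    {"zip": "78701", "gender": "F", "dob": "1980-03-14"},
    {"zip": "78701", "gender": "M", "dob": "1975-11-02"},
    {"zip": "78702", "gender": "F", "dob": "1991-07-30"},
]

def unique_fraction(rows, keys):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in rows)
    unique = sum(1 for r in rows if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(rows)

fraction = unique_fraction(records, ("zip", "gender", "dob"))
zip_only = unique_fraction(records, ("zip",))
print(fraction)  # 0.5
print(zip_only)  # 0.25
```

Any row that is unique on those fields can be matched to a named record in some other data set that carries the same fields, which is how the cross-indexing attacks get started.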