How do you anonymize personal databases and protect people's privacy

Tuesday 3rd November 2015 01:28 GMT Graham Marsden

Great, but...

... they miss one vital point:

Corporate America DOES NOT WANT THIS!

You are the product. Data about you is worth money. Your information can and will be bought, sold, traded, folded, stapled and mutilated in any way they want and all the "standards" you can name won't make a damn bit of difference if they are not backed up with serious legal penalties.

15 0 Reply

Tuesday 3rd November 2015 02:31 GMT Adam 1

It's easy. You just disable paste.

1 0 Reply
1. Tuesday 3rd November 2015 05:36 GMT Anonymous Coward
  
  Aaaaaarrrrrrrrrrgggggggghhhhhhh!
  
  0 0 Reply
2. Tuesday 3rd November 2015 06:47 GMT Anonymous Coward
  
  Disable paste
  
  And the government is proposing that these people keep logs of your browsing history for 12 months?
  
  2 0 Reply

Tuesday 3rd November 2015 03:01 GMT Speltier

Who is Bradley Cooper?

And why should I care? Can't you pick someone better known to show getting into a NY cab, like Elvis?

5 0 Reply

Tuesday 3rd November 2015 16:48 GMT James Micallef

Re: Who is Bradley Cooper?

"Can't you pick someone better known to show getting into a NY cab, like Elvis?"

Elvis is the one driving the cab, but shhhhhh don't tell anyone, don't want to blow his cover!

1 0 Reply
1. Tuesday 3rd November 2015 19:14 GMT Trigonoceps occipitalis
  
  Re: Who is Bradley Cooper?
  
  And he hands over to Lord "Lucky" Lucan at midnight.
  
  0 0 Reply

Tuesday 3rd November 2015 03:01 GMT Mark 85

The NIST might come up with a standard, but which agency gets to enforce it and will the others abide by it? Given the way the government is being run, I'd say that whatever they come up with will be totally ignored by every department, each with the excuse "we know how to do this better"... and then they get hacked...

I really wish we had a cynic icon.... instead, I'll use the result of this whole thing....

4 0 Reply

Tuesday 3rd November 2015 03:27 GMT Anonymous Coward

Based upon past performance, FTC. They're the designated whiner over our (total lack of) privacy enforcement here. Not that they can DO anything about the NSA, FBI, DHS, DEA, DIA, ... which is the crux of the Safe Harbour issue.

0 0 Reply

Tuesday 3rd November 2015 05:13 GMT I. Aproveofitspendingonspecificprojects

Wow! Six comments already...

Does that mean everyone else is busy working on it?

Or is everyone working on the obvious conclusion: "Getting into NSA will make me famous."

1 0 Reply

Tuesday 3rd November 2015 05:34 GMT Eddy Ito

Re: Wow! Six comments already...

Perhaps we just recognize mental masturbation when we see it.

2 1 Reply

Tuesday 3rd November 2015 05:42 GMT keristinium85

If people are interested, there are a few papers over at the IPC. The paper I've linked below covers a lot of the examples used by NIST and challenges the view that de-identified information will simply end up being re-identified later through aggregation with other data sets.

It's an interesting area of research, one that is very much in its infancy but is so important to progress with.

https://www.ipc.on.ca/images/Resources/anonymization.pdf

0 0 Reply

Tuesday 3rd November 2015 07:46 GMT John Smith 19

All of this is far too complicated for the UK government to understand

Who will continue to wap out any dataset they can with virtually zero privacy protection and trust "the market" will "Do No Evil (TM)" with it.

1 0 Reply

Tuesday 3rd November 2015 09:49 GMT Displacement Activity

Pseudonymised NHS data

I wrote some software a few years ago to let GPs/PCTs/CCGs/etc (ie. UK family doctors and the people who pay them and fund medical care) identify anomalies in referral patterns, hospital admissions, length of stay in hospital, "GP performance", "over-referrals" (largely a myth, BTW) and so on. It was funded and used by local GPs - ie. the NHS itself - and the base dataset was the NHS Spine data.

The software was great, but it was useless for the first year or so, because no-one would let me (ie. the GPs) see the raw data with DOBs and gender in it, and you can't do the stats without them. It took a year to get the authorisations, but without postcode (or, equivalently, deprivation) data. Much better, but you can't really be sure what's going on without post/zip code, which makes the data identifiable. I spent about a year trying to get the additional clearance, but there was so much politics in the local NHS that it was next to impossible. The whole system then imploded with the PCT/CCG changeover, and everyone's access to the data was withdrawn, and the funding went, and the NHS disappeared up its own backside.

So, the software has been unused for 2 years, and no-one in this area (and probably any other area) has any statistically valid way of finding out what's going on in primary care. The govt has now apparently decided that this is important again, so other people are now going to spend a couple of years dicking about trying to get the Spine data, before losing it again. And the whole pointless cycle will repeat again in another 5 years. And the base Spine dataset cost going on for a *billion* to create, plus maintenance.

So, if you're worried about the privacy of your NHS data - don't be. Everyone in charge is so stupid and paranoid that no-one's ever going to see it anyway.

6 1 Reply

Tuesday 3rd November 2015 11:02 GMT Just Enough

Re: Pseudonymised NHS data

So the basis for you calling everyone in the NHS stupid is that they are paranoid about data protection?

And that's a bad thing?

6 0 Reply

Tuesday 3rd November 2015 12:33 GMT Primus Secundus Tertius

Anonymous aggregates

I cannot imagine that lay people, i.e. politicians and journalists, will ever understand the difference between aggregated data and anonymous data.

Aggregated data would say, for example, the average blood pressure in postal area GU99 is P with standard deviation (*) Q. 'Anonymised' data says that Mr Z of GU99 9ZZ has blood pressure P. When the marketing droids also establish that Mr Z drives a red car and owns five computers, it all becomes uncomfortable.

It would be nice if database queries were restricted to aggregate data, but I don't see that as practical.

(*) Another term I have yet to see any journalist or politician understand.
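
To make the distinction concrete, here is a minimal Python sketch of an aggregate-only query. The records, the `aggregate_by_area` helper and the `min_group_size` threshold are all made up for illustration; nothing here comes from the NIST draft. Callers get a count, mean and standard deviation per postal area and never see a row, and areas with too few people are suppressed so that an "aggregate" of one person doesn't leak that person's value.

```python
from statistics import mean, stdev

# Hypothetical row-level records: (postcode, systolic blood pressure).
records = [
    ("GU99 9ZZ", 150),
    ("GU99 1AA", 120),
    ("GU99 2BB", 130),
    ("GU99 3CC", 140),
]

def aggregate_by_area(rows, min_group_size=3):
    """Return {area: (count, mean, stdev)} and drop groups that are too small.

    min_group_size should be at least 2, because the sample standard
    deviation is undefined for a single value.
    """
    groups = {}
    for postcode, value in rows:
        area = postcode.split()[0]  # outward part of the postcode, e.g. "GU99"
        groups.setdefault(area, []).append(value)
    return {
        area: (len(values), mean(values), stdev(values))
        for area, values in groups.items()
        if len(values) >= min_group_size
    }

summary = aggregate_by_area(records)
print(summary)
```

Even this is only a partial defence: someone who can run many overlapping aggregate queries can sometimes subtract one result from another to recover an individual value, which may be part of why restricting queries to aggregates is hard in practice.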

3 0 Reply

Tuesday 3rd November 2015 19:45 GMT UlfMattsson

Urgent need

I agree that "Given the growing interest in de-identification, there is a clear need for standards and assessment techniques that can measurably address the breadth of data and risks," but standards may take an additional 10 years to agree on and enforcing regulations is always difficult.

We know that NIST is concluding that "Many of the current techniques and procedures in use, such as the HIPAA Privacy Rule’s Safe Harbor de-identification standard, are not firmly rooted in theory." It may take many years to fix this issue.

We know that "the risk depends upon the availability of data in the future that may not be available now." So we need a policy driven approach that can be easily adjusted over time as more data is available.

I like to consider employing "a combination of several approaches to mitigate re-identification risk. These include technical controls." I've seen two interesting technical approaches that can provide a balanced combined solution to address the growing issue of privacy and access to data. The first approach is based on service-oriented, privacy-preserving data publishing. This service-oriented approach can provide policy-driven control over how combinations of different data are accessed and over the accumulated volume of data that is accessed. The second approach is based on data tokenization and dynamic masking, which can secure the data itself against misuse and theft.
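
As a rough illustration of the second approach, here is a toy Python sketch of vault-style tokenization and masking. The `TokenVault` class and `mask` function are hypothetical names invented for this example and do not represent any particular product; a real system would also need access control, persistence and key management.

```python
import secrets

class TokenVault:
    """Toy in-memory vault that swaps sensitive values for random tokens."""

    def __init__(self):
        self._token_for_value = {}
        self._value_for_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._token_for_value:
            token = secrets.token_hex(8)  # random, so not derivable from the value
            self._token_for_value[value] = token
            self._value_for_token[token] = value
        return self._token_for_value[value]

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can reverse a token.
        return self._value_for_token[token]

def mask(value: str, visible: int = 4) -> str:
    """Masking: hide everything except the last `visible` characters."""
    if visible <= 0:
        return "*" * len(value)
    return "*" * max(len(value) - visible, 0) + value[-visible:]

vault = TokenVault()
token = vault.tokenize("943 476 5919")
print(token, mask("943 476 5919"))
```

The token can be joined on and counted like the original identifier, but it reveals nothing without the vault; the masked form is what a lower-privilege user might be shown at query time.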

I think that a balance between the first and second approach can provide an attractive data centric solution for different sensitivity levels.

I agree that we need a "balance between providing privacy and useful data," and we are running out of time to fix this growing issue.

Ulf Mattsson, CTO Protegrity

0 0 Reply

Wednesday 4th November 2015 02:11 GMT Novatone

Quantization would be a good first step.

Researchers don't really need a birthdate accurate to one day; quantize to 1 or 1/2 month.

Group small zip code areas into slightly larger areas.

Quantize other location and time information.

Etc.
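
A minimal Python sketch of what that could look like. The function names and the particular granularities (whole month, three-character zip prefix, whole hour) are assumptions picked for illustration, not recommendations from the NIST draft.

```python
from datetime import date, datetime

def quantize_birthdate(d: date) -> date:
    """Coarsen a birthdate to the first day of its month."""
    return d.replace(day=1)

def coarsen_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters of a zip code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def quantize_time(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

print(quantize_birthdate(date(1980, 7, 23)))
print(coarsen_zip("90210"))
print(quantize_time(datetime(2015, 11, 4, 2, 11, 45)))
```

Truncating to a prefix is the crudest way to group zip codes; sparsely populated prefixes may still need merging with their neighbours.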

0 0 Reply

Wednesday 4th November 2015 14:08 GMT Just Enough

Quantization only works if the researchers know exactly how it was done while analysing.

If you quantize, for instance, birthdate by month, and then analyse births by day of the week, you're going to get false results.

If you group your zip codes, then analyse by latitude divisions that intersect your groupings, again the results will be meaningless.

Hopefully the results of these would be obviously weird. But the effects of other quantizing may not be so obvious, and could be missed.
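
The birthdate case is easy to demonstrate with a small Python sketch. The data is synthetic (one made-up birth on every day of 2015, so the true weekday distribution is as flat as it can be), and the quantization is assumed to snap each date to the first of its month.

```python
from collections import Counter
from datetime import date, timedelta

# Synthetic data: one birth on every day of 2015 (hypothetical, uniform by design).
births = [date(2015, 1, 1) + timedelta(days=i) for i in range(365)]

def weekday_counts(dates):
    """Count dates per weekday, where 0 is Monday and 6 is Sunday."""
    return Counter(d.weekday() for d in dates)

true_counts = weekday_counts(births)
# Quantize to month by snapping every date to the 1st, then repeat the analysis.
quantized_counts = weekday_counts(d.replace(day=1) for d in births)

print("true:     ", sorted(true_counts.items()))
print("quantized:", sorted(quantized_counts.items()))
```

On the raw dates the weekday counts differ by at most one. After quantizing, every birth lands on whichever weekday the 1st of its month fell on, so the counts swing wildly and say nothing about when people were actually born.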

0 0 Reply
