PDF redaction is hard, NSW Medical Council finds out - the hard way

Australian public sector agencies have a persistent problem trying to redact PDFs: this time, the guilty party is the Medical Council of NSW. The council breached the privacy of a doctor and her son, the Medical Tribunal found earlier this month, because it mishandled redacting their names out of a PDF it published on its …

  1. a_yank_lurker

    OOPS

    Actually, I can understand how this happened. Someone not familiar with how to properly redact text from a document could easily make this mistake. It seems to be more of an innocent error.

    It may also point to an underlying flaw in their document management. I doubt the PDF was the original form; more likely it was either a scanned copy or an electronic conversion.

    1. rh587

      Re: OOPS

      I've also seen it done the other way, where they knew they weren't adept with modern tech, so rather than risk getting it wrong, had printed the document, manually redacted PII and scanned it back in. Unfortunately, they hadn't used a sufficiently dense marker pen and you could still read the names through the black marks...

      Me? In the absence of a decent PDF editor that allows permanent edits (obviously the preferred option), I grab the page and put black marks over in MS Paint or Apple Preview, export as a flat jpg and reinsert the page back into the document, or re-export as a new PDF.

      Not elegant, but you can be absolutely sure that the PDF software hasn't just applied a layer of black "highlighting" or something - there's no indexing the data back out of a flat jpg (not without OCRing, and that won't work on the black blocks).

      1. Danny 14

        Re: OOPS

        Generally, "printing" said document to a PDF printer also permanently embeds the black box and removes the text underneath (sometimes you have to tick "small size" or equivalent so that it removes all the extra stuff like this).

        1. Anonymous Coward

          Re: OOPS

          You are the only poster that provided a workable solution using Adobe. You win!

          1. Hans 1
            Windows

            Re: OOPS

            Who in their right mind uses Adobe Acrobat ? I mean, SERIOUSLY ?

            I was hired for consultancy by a client, I finished ahead of schedule and asked the customer what else was a pain ... I noticed they used Acrobat, got them an alternative for their workflow .... the costs saved by an hour's job covered the costs of my consultancy work, and several years of support for the software I was originally hired for. I also got rid of a Windows Server in the process ...

            MS's & Adobe's MostHatedProfessional, I am!

          2. Fluffy Cactus

            Re: OOPS

            No he doesn't win. Because I don't pay for Adobe, only use the free Adobe Reader. The winner is the one who knows how to do this without paying. As if this isn't obvious.

    2. Anonymous Coward
      Linux

      Re: OOPS

      If you are hired to redact documents, then the least you could do is actually read up on the subject:

      "Redaction—Remove visible data from PDF files with Adobe"

      http://www.adobe.com/content/dam/Adobe/en/products/acrobat/pdfs/adobe-acrobat-xi-pdf-redaction-remove-visible-data-from-pdf-files-tutorial-ue.pdf

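      Else export to a PNG, then convert back to a PDF:

      # render the PDF to a flat 300 dpi image (note: -flatten merges all pages of a multi-page file into one image)
      $ convert -density 300 file.pdf -depth 8 -flatten -quality 85 file.png

      # wrap the flat image back into a PDF
      $ convert -units PixelsPerInch file.png -density 96 outfile.pdf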

  2. Anonymous Coward

    WYSI Not WYG

    1. Bc1609
  3. Anonymous Coward

    PDFs (and .doc etc.) can be tricky little buggers. Best to screenshot it and redact that. Can't really go wrong with a .jpg.

    1. Anonymous Coward

      DCT compression artefacts beg to differ, better make that a PNG.

    2. P. Lee

      The redaction problem is a vendor issue. The process doesn't do what it appears to do, which is pretty dumb.

      If there is a redaction in the document, it should remind you of the fact when you save the document, default to "save-as" and delete the original text during the save process.

      1. mathew42
        Facepalm

        It depends on how the redaction is being performed. I would doubt that the user is selecting the 'redact' tool and highlighting the section to be redacted. It is more likely that the user is drawing black rectangles. The editor maintains the text (objects) under the rectangle so that if the rectangle is deleted or the order of display is changed the objects can be displayed.

        As others have suggested the organisation should have a process and the tools to support that.
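
The behaviour described above can be sketched with a toy page model. This is only an illustration and not how any real PDF editor stores its data: the `Box`, `TextObject` and `BlackRect` classes and the `cover` and `redact` functions are hypothetical names invented for the example. It shows why drawing a rectangle leaves the text extractable, while dropping the overlapping text objects does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        # Two axis-aligned boxes overlap when they intersect on both axes.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class TextObject:
    box: Box
    text: str


@dataclass(frozen=True)
class BlackRect:
    box: Box


def extract_text(page):
    """What copy-and-paste sees: every text object, whatever is drawn on top."""
    return " ".join(obj.text for obj in page if isinstance(obj, TextObject))


def cover(page, box):
    """Cosmetic 'redaction': add a rectangle, leave the text underneath alone."""
    return page + [BlackRect(box)]


def redact(page, box):
    """Actual redaction: drop overlapping text objects, then draw the rectangle."""
    kept = [obj for obj in page
            if not (isinstance(obj, TextObject) and obj.box.overlaps(box))]
    return kept + [BlackRect(box)]


# Hypothetical page: a label followed by a made-up name.
page = [
    TextObject(Box(0, 0, 40, 10), "Patient:"),
    TextObject(Box(45, 0, 100, 10), "Jane Example"),
]
name_area = Box(45, 0, 100, 10)

covered_text = extract_text(cover(page, name_area))
redacted_text = extract_text(redact(page, name_area))
print(covered_text)   # the name is still there
print(redacted_text)  # the name is gone
```

Both versions would look identical on screen in this model; only the second one survives a select-all and copy.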

        1. rh587
          Facepalm

          "It is more likely that the user is drawing black rectangles."

          Or using the Highlight/Markup tools with the colour set to black, which would look to an inexperienced user like you're blocking over with a digital marker pen, but of course are designed to be wiped out once edits have been completed, and are entirely removable.

          1. TeeCee Gold badge
            Facepalm

            That's the one the Civil Service went for some time ago as reported here at the time. Black text on a black background. As all the pixels in the affected area are, er, black it looks perfect too.

            This was a heavily redacted report on something or other to do with our troops in Iraq at the time. Copy text, paste sans formatting into ${word_processor} and all the classified bits were magically converted into public domain bits. Oops.

            1. KeithR

              "That's the one the Civil Service went for some time ago as reported here at the time."

              Really? Which "the Civil Service" is this?

              Because my bit of it - which deals with between 6 and 7 figures' worth of SARs a year (depending on how you count requests: all requests for personal data are SARs, but we don't expect every one to be handled via the full formal SAR regime) - has, since I set up its original Data Protection Unit some fifteen years ago, done all of its redaction using the old-school-but-reliable "print off; redact with indelible ink marker; photocopy; CHECK; release to data subject" technique.

              (My FoI colleagues use a broadly similar approach for stuff that they release - except that the data are rescanned for electronic transmission).

              In that time, although we've had our fair share of RFAs from the ICO (though significantly fewer than any other comparable UK Gov Dept), NOT ONE has been related to fecked-up redaction.

              Staff properly trained by people who know what they're on about, can make up for most technical shortfalls.

              And lazy sweeping statements like yours irritate me.

      2. Holleritho

        Accessible PDF

        If you turn the PDF into an image, you are making it totally inaccessible. I know making a PDF disabled-accessible is not child's play, but it should be done, especially by an organisation funded by public money.

        1. Anonymous Coward

          Re: Accessible PDF

          "If you turn the PDF into an image, you are making it totally inaccessible."

          And rather pointless keeping it as a PDF too

  4. jtaylor

    Process failure

    This can't be the first time that this medical council has had to redact names when publishing medical information. They should have a standard process for doing so -- including tools and review before publication.

    As already noted, this is a schoolboy error, not a malicious act. Either the people assigned to publish this information were not trained, were not provided correct tools, or were not following a process.

    Plus ça change, plus c'est la même chose.

  5. Anonymous Coward

    analog methods have issues too...

    I think I still have Dick Cheney's social security number written down somewhere. Back when he shot his lawyer buddy in the face, the accident report was published online somewhere. The "SSN" field was obliterated with black marker, but when scanned, most of the numbers were decipherable. The one or two digits that weren't completely obliterated were down to a couple of possibilities.

  6. Anonymous Coward

    Another common error...

    Changing the text to be the same as the background colour.

    1. gollux

      Re: Another common error...

      Heh, the old cut'n paste to Notepad for redaction recovery, something I try every time I come across redacted text.

      1. Anonymous Coward
        Facepalm

        Re: Another common error...

        Funny you mention cut and paste in notepad to recover redactions... it would also work for making them and proof checking.

        Copy to notepad, redact, paste to new PDF (to remove any possible "tracked changes" and other malarkey software can be "automatically" helping you with).

  7. Anonymous Coward

    I wish I had $10 for every time.....

    I'd have at least $300 now.

    Regards, [BLACKSQUARE]

  8. gollux

    It isn't redacted unless the document is recreated with the redacted content removed, versioning explicitly turned off and meta information scrubbed.

    1. Pascal Monett Silver badge

      Totally that.

      And I marvel at the various solutions offered to redact a document.

      Is it really that hard to Find/Replace the name with <Redacted> and re-publish ? Seems to me that going back over the text and covering each occurrence of the name with a black rectangle is a lot more time-consuming, on top of being totally inefficient.
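
Where the editable source text is available, the find/replace approach might look like the sketch below. The function names, the sample sentence and the name "Jane Example" are all made up for illustration. It replaces every listed name regardless of case, then checks that nothing slipped through.

```python
import re


def redact_names(text, names, marker="<Redacted>"):
    # Longest names first, so "Jane Example" is replaced as a whole
    # before the shorter "Example" gets a chance to match inside it.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(marker, text)


def leftovers(text, names):
    """Return the names that still appear anywhere in the text."""
    lowered = text.lower()
    return [name for name in names if name.lower() in lowered]


# Hypothetical input, not taken from any real document.
source = "Dr Jane Example treated her son. JANE EXAMPLE later appealed."
names = ["Jane Example", "Example"]

clean = redact_names(source, names)
print(clean)
print(leftovers(clean, names))
```

A plain find/replace only catches the spellings it is given; initials, misspellings and nicknames still need a human read-through.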

      1. Pompous Git Silver badge

        "Is it really that hard to Find/Replace the name with <Redacted> and re-publish ? Seems to me that going back over the text and covering each occurrence of the name with a black rectangle is a lot more time-consuming, on top of being totally inefficient."

        When denied access to the appropriate software you do what you can. My last employer (government) did precisely that.

        1. Pompous Git Silver badge

          Perhaps whoever gave me the thumbs down would care to explain how they would go about being expected to edit a PageMaker document with MS Publisher...

  9. Steven Roper

    Not a user error

    I would put this down to a design flaw in the PDF editor's user interface design, not the ignorance or incompetence of the document creator or user.

    If the PDF editor (presumably Acrobat or whatever they're using) is giving the impression of redacting text by offering a feature to black it out, then at the very least the program should treat that as a delete-and-replace, removing the text and replacing it with the vector definition of the redaction block.

    While those of us who frequent this site are mostly technically literate, it's easy to forget that your average office worker isn't, nor is it fair to expect them to be. That's what they pay us for.

    Part of the art of programming and user interface design is asking the question, "When a user does this, what would most normal people expect to happen?" Finding the answers to that question and making the program behave that way is what makes an application intuitive and easy to use.

    This isn't just pandering to people's incompetence or stupidity. It's actively reducing the possibility of people making mistakes like this. Therefore in my view the best remedy is to redesign the PDF editor application to actually redact the text when a redaction square is drawn over it, not to chastise or retrain all the document users.

    1. Gordon861

      Re: Not a user error

      But the editor doesn't give the facility of redacting text, it lets you highlight. It even calls the tool a highlighting tool, which expressly implies that the text will still be there after you draw over it.

      Could Acrobat be better? Yes but this is an error of the user not the software.

      It looks like the latest versions of Acrobat do also have a tool for doing the job properly.

      1. Androgynous Cupboard Silver badge

        Re: Not a user error

        Acrobat has had redaction since Acrobat 8. While the UI to do this is certainly unintuitive, that doesn't make it any different to any other part of the Acrobat's UI, and the good news is in the next release they will almost certainly redesign it again to make a completely new type of unintuitive interface.

        1. Hans 1

          Re: Not a user error

          >Acrobat has had redaction since Acrobat 8. While the UI to do this is certainly unintuitive, that doesn't make it any different to any other part of the Acrobat's UI, and the good news is in the next release they will almost certainly redesign it again to make a completely new type of unintuitive interface.

          Which part of the Acrobat ui is intuitive ? I mean, EVEN SELECT & COPY is counter-intuitive!!!!!!!!!

        2. KeithR

          Re: Not a user error

          "Acrobat has had redaction since Acrobat 8"

          Acrobat, yes.

          But Acrobat Reader?

      2. Jimbo 6
        Joke

        highlighting tool

        This sorry saga definitely seems to have highlighted the tools at NSW Medical Council.

  10. Winkypop Silver badge
    Devil

    Black squares in a PDF

    I love telling authors they haven't redacted their documents properly (or at all).

    When you explain the problem it freaks them out.

    But then, I'm a bastard like that.

  11. frank ly

    I tried to repair a friend's hernia last week

    I bought some surgical scalpels, a retractor, a medical suturing kit and some chloroform. He recovered well and walked away. Two days later he collapsed and was taken to hospital. I was arrested and the doctors who treated my friend said I was stupid to try to do that without having had proper training in the use of those tools. I'm a qualified engineer dammit; it's a simple concept and I was very careful. Those medical people make things too difficult.

  12. Ken Moorhouse Silver badge

    Sending Spreadsheets to customers...

    ...with cost prices in hidden cells was a common blunder in years gone by.

    Do people still do that?

    1. Jason 24

      Re: Sending Spreadsheets to customers...

      Yup, they do, I have a filter on the email gateway to stop any excel sheets going out that don't have [ultimate] in the file name.
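
A filename rule of the kind described could be sketched as follows. The real gateway's configuration is not given in the comment, so the suffix list, the case-insensitive match and the `may_leave` function are assumptions made for the example.

```python
from pathlib import PurePath

# Assumed set of spreadsheet extensions; a real gateway may use a different list.
SPREADSHEET_SUFFIXES = {".xls", ".xlsx", ".xlsm", ".xlsb"}
REQUIRED_TAG = "[ultimate]"


def may_leave(filename):
    """Allow non-spreadsheets; allow spreadsheets only if the name carries the tag."""
    path = PurePath(filename)
    if path.suffix.lower() not in SPREADSHEET_SUFFIXES:
        return True
    return REQUIRED_TAG in path.name.lower()


for attachment in ["quote.xlsx", "quote [ultimate].xlsx", "notes.txt"]:
    print(attachment, "->", "send" if may_leave(attachment) else "block")
```

A rule like this only checks that someone renamed the file; it says nothing about whether the hidden cells were actually removed.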

  13. Anonymous Coward

    Yet another way to get it wrong...

    Redact content that can be seen, but forgot about things like the path name to the file that was used to embed an image appearing when the mouse hovers over it...

    1. Tim Jenkins

      Re: Yet another way to get it wrong...

      I once had to point out to a senior member of university staff that removing the identity of a student from the body of a 'sensitive' document didn't entirely prevent said individual from being identified if the footer showed a file path ending in a folder that was labelled with the student's name...
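
A pre-release check for this kind of leak could look something like the sketch below. It assumes the document's text has already been pulled apart into labelled fields (body, footer, metadata); the field names, the sample path and the `leaking_fields` helper are hypothetical.

```python
import re


def normalise(text):
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def leaking_fields(fields, name):
    """Return the labels of all fields that still contain the name in some form."""
    needle = normalise(name)
    return sorted(label for label, value in fields.items()
                  if needle in normalise(value))


# Hypothetical document, split into the places a name can hide.
document = {
    "body": "The student, [redacted], raised a complaint.",
    "footer": r"S:\Cases\2016\Jane_Example\complaint_v3.doc",
    "author": "Registry Office",
}

leaks = leaking_fields(document, "Jane Example")
print(leaks)
```

Stripping punctuation before comparing means "Jane_Example", "jane.example" and "JaneExample" are all caught, at the cost of the occasional false alarm.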

  14. Anonymous Coward

    Sent a letter to my MP. They, or an assistant, replied with a copy of a letter sent to another constituent on what they incorrectly thought was a similar issue. The other person's personal details were redacted with a black marker. Tilted the page in bright light and the redacted text reflected the light so it was perfectly readable.

    1. Anonymous Coward
      1. Francis Boyle Silver badge

        seems

        you missed the section about pronoun dropping.

    2. KeithR

      "Tilted the page in bright light and the redacted text reflected the light so it was perfectly readable."

      Yep - laser printer, right?

      This exact risk is why staff in my dept are instructed to redact, re-photocopy the document, then check the thing in exactly the way you did.

  15. Anonymous Coward

    This happens all the time with FOI requests in the UK

    The trouble is that the people who are required, by law, to release a document with names redacted from it are in many cases doing this for the first time. They are not trained specialists.

    Note that converting to PNG and removing the pixels is not really an adequate solution if the original used a proportional font: the width of the space where the word used to be reveals a lot of information if that width can be accurately measured. However, I can't think of a better solution that I could seriously recommend to a school secretary, for example, unless perhaps it would be:

    - print the document

    - cut out the sensitive information with scissors

    - put the paper on a non-flat surface

    - photograph it with a cheap hand-held camera

    That would make it a bit harder for anyone trying to reconstruct the missing words.
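
The width point can be made concrete with a small sketch. The advance widths below are invented numbers for an imaginary proportional font, and the candidate names are made up; a real attempt would need the actual font metrics. Given the measured width of a blanked-out gap, it lists which candidates would fit.

```python
# Hypothetical advance widths in thousandths of an em (not a real font).
ADVANCE = {
    "i": 278, "j": 278, "l": 278, "t": 278,
    "a": 556, "e": 556, "o": 556,
    "m": 833, "w": 722,
}
DEFAULT_ADVANCE = 500  # used for any letter missing from the table


def rendered_width(word, font_size_pt):
    """Width of the word in points, ignoring kerning and capital letters."""
    units = sum(ADVANCE.get(ch, DEFAULT_ADVANCE) for ch in word.lower())
    return units * font_size_pt / 1000


def candidates_for_gap(gap_pt, names, font_size_pt=12.0, tolerance_pt=0.5):
    """Names whose rendered width is within the tolerance of the measured gap."""
    return [name for name in names
            if abs(rendered_width(name, font_size_pt) - gap_pt) <= tolerance_pt]


names = ["Jill", "Emma", "Tom", "William"]
for name in names:
    print(name, round(rendered_width(name, 12.0), 2))

# A gap measured at about 33.3 pt leaves a single candidate in this toy font.
matches = candidates_for_gap(33.3, names)
print(matches)
```

Padding every redaction block to a fixed width, or replacing the word before rendering, removes this signal.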

    1. Anonymous Coward

      Re: This happens all the time with FOI requests in the UK

      OCR is sometimes possible for the recreation if you're printing to a hard copy.

      Or as I posted above, is copy/paste to a notepad then redacting, then back to PDF a solution? You lose some of the formatting, but retain all the content and it does a similar job to the paper and scissors.

      :P

    2. KeithR

      Re: This happens all the time with FOI requests in the UK

      "- print the document

      - cut out the sensitive information with scissors

      - put the paper on a non-flat surface

      - photograph it with a cheap hand-held camera"

      Not entirely practical when dealing with responses that might number hundreds - or on occasion, thousands - of pages...

  16. DavCrav

    I've always found that print out, black marker pen, photocopy, black marker pen, photocopy, scan back in, works quite well. Two rounds of photocopying are best I think, although you can probably get away with one as the photocopier ink all looks the same.

    1. Anonymous Coward

      Depending on the quality of the scan/text could you not drop the colour bit depth to 2? Should remove anything within the black squares. If it does not, it would be obvious and could be redone.
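
Reading "bit depth to 2" as "two levels, black and white" (an assumption), the idea can be sketched on a tiny grayscale grid. The pixel values and the `to_two_levels` function are made up for the example; 0 is black and 255 is white.

```python
def to_two_levels(pixels, cutoff=128):
    """Map every pixel below the cutoff to black (0) and the rest to white (255)."""
    return [[0 if value < cutoff else 255 for value in row] for row in pixels]


# Hypothetical scan: marker ink at 60, with a printed letter (20) showing through.
scan = [
    [250, 250, 250, 250],
    [60, 20, 60, 250],
    [60, 60, 20, 250],
    [250, 250, 250, 250],
]
flat = to_two_levels(scan)
marked = {value for row in flat[1:3] for value in row[:3]}
print(marked)  # a single value means the letter can no longer be told from the ink

# A faint marker (150) lands on the white side, so the letter stays visible.
faint = to_two_levels([[150, 20, 150]])
print(faint)
```

The second case is the "it would be obvious and could be redone" situation: the block turns white and the text stands out, so the failure is at least easy to spot.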

    2. Martin Summers Silver badge

      Nah, tipexing the words out on the screen is the best way.

  17. Tom 7

    Pointless Document Format.

    Does nothing it says on the tin.

  18. John Brown (no body) Silver badge

    Lack of training

    I'd put it all down to a lack of training.

    Employers just expect people to know how to use computers and the applications running on them. They only train people on new, custom applications. So-called run of the mill apps are used by "everyone" so "everyone" knows how to use them. Not helped, of course, by the OS and app installers plastering messages across the screen during install about how "intuitive" it all is, i.e. you don't need to be trained, you should just "know" by looking at the screen.

    A good example is Scotty in Star Trek IV: The Voyage Home where Scotty is trying to talk to a 20th cent. computer and the guy tells him to use the mouse. So he picks up the mouse and talks into it like a microphone. Intuitive? Only if you have previous experience to build on.

  19. Crazy Operations Guy

    Needs to be a feature

    Why hasn't Adobe just added a 'redact' tool to Acrobat (or whatever the hell they're calling it now)? A simple rectangle selector that removes the data it covers from all layers and any metadata. Maybe even add a bit of intelligence to it and ask to redact all other instances of the string the user just covered up.

    1. Anonymous Coward

      Re: Needs to be a feature

      Are you sure that's possible in the general case, where the PDF document was generated by some unknown software and is really weird?

    2. M7S

      Re: Needs to be a feature

      No doubt for copies not sold to USGovt there would be a secret "undo" feature intended for law enforcement that would, as with all backdoors, eventually be hacked, but the existence of which would continue to be formally denied.

  20. Lunatik
    Facepalm

    Yup, seen people get tripped up by this one before.

    Lack of knowledge about PDF layers and objects even gets some in the document processing industry who should really know better.

    All our redaction is done on a raster version of the page and output as a single image in the final PDF, regardless of source.

  21. Old Handle

    It's just a matter of using the right tool for the right job. If you're working with a scan, that's an image, so use an image editor (even Paint), if you have editable text... edit the text. Either just write [redacted] or replace it with xxxxx and do your "pretty" censorship on top of that.

  22. jzl

    Can't use a computer

    Yet another person who can't use a computer.

    1. Hans 1
      Megaphone

      Re: Can't use a computer

      ^ That link is great, read it, do it ... I have witnessed the same problem over here - they all know how to start apps in OS X, Linux, and Windows, get on the webs, do some basic shit, the teen can search for computer parts, yet, fails miserably in searching through wikipedia, for example. If it is not in a google results page, it's inaccessible.

      I have two pi's lying here that I have flashed multiple times, the teen could not care less ... guess what, no internet on his gaming rig today, if he wants internet, he will have to flash the pi, set up wifi, route traffic ... let's see what happens ... when he gets back from school.

      Last week, I told him that if he managed to "hack" into the Wifi router and disable parental control I would not enable it again, I would have thought he would have attempted to get into there immediately, but .... sadly no.

      Shit, in his position, I would have at least tried. The worst is, he can go onto my computer when, e.g., I go to the toilet (I do not always lock the screen, my bad), the password is in my password manager, if he is lucky, the browser session is still open ... but no :-(

      1. dajames

        Re: Can't use a computer

        "Shit, in his position, I would have at least tried. The worst is, he can go onto my computer when, e.g., I go to the toilet (I do not always lock the screen, my bad), the password is in my password manager, if he is lucky, the browser session is still open ... but no :-("

        ... but, on the other hand, if he reads El Reg he'll now have an idea of where to start ...

  23. Brian Allan

    Another good example of idiots using technology!

  24. Fluffy Cactus

    One of the free and easy ways to do this is to get:

    1) a free E-Fax Messenger (which allows you to receive Faxes, but not send them)

    2) the free Microsoft Office Document Image Writer, which installs as a printer, by downloading the free Microsoft Office Sharepoint Designer 2007. This prints from a PDF to a flattened TIF format.

    3) The free E-Fax Messenger comes with a Free Fax editor that edits the free TIF format which you save, and then it lets you print it to a free PDF format, at free fax quality 100 or 200 dots per inch.

    Even an idiot like myself can do it.

    Another method is to get the totally free and Open Source GIMP picture manipulating program, which will let you open PDFs and turn them into pictures in any of about 27 different file types. But since it's open source, the instructions on "how to edit" are so terribly complicated and unexplained that an idiot like me doesn't have the patience to figure it out.

    Anyway, if you are cheap or patient, those are some ways to go.

    1. KeithR

      "Even an idiot like myself can do it."

      That sentiment is probably at least partly right... .

      The world over, government employees are limited - by contract, by policy, by technical measures and by threat of immediate dismissal if they try to do anything about it - to using only the software Sys Admins (and management) have decided they "need".

      So - great idea and all, except... not.

  25. Adrian Midgley 1

    Not using PDFs...

    and indeed anything other than text where pictures are not needed seems to be indicated.
