We all hate Word docs and PDFs, but have they ever led you to being hit with 32 indictments?

Michael Wojcik Silver badge

Re: @ Dr Heinrich Backhausen

Scanned documents are images and thus stored at the image layer of a PDF rather than the text layer. You generally need to use OCR to convert them back to text.

Unless they were OCR'd after scanning, of course. Back in the day I did a fair bit of scan-and-OCR in academia (for purposes permitted by copyright, AFAIK - mostly archival stuff and/or with the holder's permission). We did the OCR immediately and saved as plain text for storage and transmission reasons.

These days, when most people think nothing of sending an email with a multi-megabyte attachment to a hundred recipients, "just scan it and send it as an image" is no doubt the most common case.
