Reply to post • Re: @ Dr Heinrich Backhausen • The Register Forums

Monday 26th February 2018 14:40 GMT Michael Wojcik

Re: @ Dr Heinrich Backhausen

Scanned documents are images and thus stored at the image layer of a PDF rather than the text layer. You generally need to use OCR to convert them back to text.

Unless they were OCR'd after scanning, of course. Back in the day I did a fair bit of scan-and-OCR in academia (for purposes permitted by copyright, AFAIK - mostly archival stuff and/or with the holder's permission). We did the OCR immediately and saved as plain text for storage and transmission reasons.

These days, when most people think nothing of sending an email with a multi-megabyte attachment to a hundred recipients, "just scan it and send it as an image" is no doubt the most common case.


We all hate Word docs and PDFs, but have they ever led you to being hit with 32 indictments?
