How a tax form kludge gifted the world 25 joyous years of PDF

HTML is the world's most common digital document file format. However, it's not the one everyone turns to when they want to create a precise document that looks, prints and behaves the same on any platform on any device. And it's hardly the format of choice for immediate offline reading, easy sharing or simple portability. For …

Silver badge

... Camelot - 'tis a silly name

but Elderberry would have been worse ... :-)

10
0
Reply
Anonymous Coward

Should have gone with cameltoe ;)

11
2
Reply
Bronze badge
Coat

Cameltoe

And no doubt the Open Source version would have been called LunchBox .....

9
1
Reply
Silver badge
Facepalm

Content creators have long been demanding a version of PDF that supports embedded HTML5-based media, interactivity and animation

For the love of $DIETY no, no and thrice no!

How many vulnerabilities have been in Acrobat reader due to the ability to execute arbitrary code? Please keep a document standard as that - something for reading and printing. Even the option for forms to fill in has piss-poor support and don't get me started on the shit that is the encrypted versions that only Adobe products can open.

71
1
Reply
Silver badge

Wasn't that the Flash idea?

9
0
Reply
Silver badge
Devil

For the love of $DIETY

Now with 30% less hellfire and damnation.

10
0
Reply
Silver badge

Content creators have long been demanding a version of PDF that supports embedded HTML5-based media, interactivity and animation

I'd love to see how those print out. I've already seen "Please click on this [link] for more information" in all its brain-dead-tree irony.

7
0
Reply

$DIETY ...

... like a $DIET ?

4
0
Reply

Since PDF's are normally designed for printed media, why would you want animated junk in it? If people want that kind of stuff, stick to HTML.

53
1
Reply
Silver badge

Joke enters stage left...

There are some examples around of people asking "how do I print off this video on page 3 of the PDF*"....

[edit]* By that I mean PowerPoint Slideshow, because AFAIK you cannot embed a video in PDF :P

8
0
Reply
Silver badge
Alert

Re: Joke enters stage left...

Click if you dare. Or if you've got a Mac it's probably this one.

3
0
Reply
Facepalm

Re: Joke enters stage left...

You can embed video in PDF. I used to have an unrewarding job webmastering for a large public sector organisation. There was a policy of "all website video needs approval", because accessibility and generally lousy production standards of most of the stuff comms/marketing droids were buying in or shooting.

However there was no such policy for PDFs. And all of a sudden some people were uploading suspiciously large PDFs with very few pages. Upon closer inspection these were found to contain embedded video.

"Can" != "should". Unless you're a JavaScript interpreter perhaps.

16
0
Reply
Silver badge

What really annoys me is people who don't understand the potential of the format. For example, years ago I was supporting a small business that used PDF. The company received forms via email and was then printing them out to fill them in. They were sent by everyone from companies they were customers of through to a trade organisation they were a member of. Most if not all of these were locked down, preventing filling them in electronically. These were just forms; there wasn't anything about them that involved intellectual property or anything like that. I pointed out to some of the companies concerned that they could put form fields in, but I don't think any of them had a clue what I was going on about. Some of them wanted the physical printed, filled-in copy sent by post and wouldn't accept an emailed version, let alone a fax.

10
0
Reply
Anonymous Coward

An interesting article. Tried to get my screen-scraping bot to read PDF text in a browser***. I have never found an easy way to get the text words and paragraphs from the accessible page data. The exposed HTML, if accessible to Selenium, merely places letters in precise positions on a page.

***which for technical reasons is now Chrome.
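For what it's worth, the "letters in precise positions" problem can be attacked by regrouping the glyphs yourself. This is only a rough sketch: the (x, y, width, char) input format and both tolerance values are assumptions for illustration, not anything a particular browser or Selenium actually hands you, and real pages with columns or rotated text would need more work.

```python
from collections import defaultdict

def glyphs_to_lines(glyphs, line_tol=2.0, space_gap=3.0):
    """Rebuild text lines from positioned glyphs.

    Hypothetical input: iterable of (x, y, width, char), where x is the
    left edge, y the baseline and width the advance, all in the same units.
    Assumes y grows downward, as in screen/DOM coordinates.
    """
    # Bucket glyphs whose baselines are close enough to count as one line.
    rows = defaultdict(list)
    for x, y, w, ch in glyphs:
        rows[round(y / line_tol)].append((x, w, ch))

    lines = []
    for key in sorted(rows):              # top of page first
        text = ""
        prev_end = None
        for x, w, ch in sorted(rows[key]):  # left to right
            # A horizontal gap wider than space_gap is treated as a word break.
            if prev_end is not None and x - prev_end > space_gap:
                text += " "
            text += ch
            prev_end = x + w
        lines.append(text)
    return lines

# Glyphs deliberately out of order, as a scraper might receive them.
sample = [
    (19, 10, 5, "o"), (0, 24, 5, "O"), (0, 10, 5, "H"),
    (14, 10, 5, "y"), (5, 24, 5, "K"), (5, 10, 5, "i"),
]
result = glyphs_to_lines(sample)
```

Paragraphs could then be guessed from unusually large vertical gaps between consecutive lines, but that heuristic is even shakier than the word-gap one.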

7
0
Reply
Silver badge

PDF has its uses I suppose

As the story says, if you just want to print it out like it's supposed to look then PDF is fantastic. Must be great for people who still own printers.

For everything else just no. When Amazon first started selling the Kindle I had this wonderful idea to load our existing technical documents onto it. Nope, they are all PDFs designed to look like A4 and resisted my every effort at resizing to fit the smaller screen. Even trying to edit a PDF is a series of unpleasant workarounds unless, I suppose, you bought a full copy of Acrobat.

It's got so much more difficult recently because of Adobe trying to "cloudify" everything, plus extend PDF far past simple document representation into interactive forms and their own version of electronic signing.

Funny that PDF originated from a way to print tax forms because as it happens the "Agencia Tributaria" (Spanish tax equivalent of HMRC) have done a thorough job of offering their users a PKI-based online tax system. Works well but its weakness is its dependence on PDFs and problems are guaranteed in those bits every time.

10
8
Reply
Silver badge

Re: PDF has its uses I suppose

Not just for print outs, communication.

So you want to do away with a standard, so when I refer you to page 404 of the HTML status code manual, you get something totally different because in your rendering of the manual the relevant material is on page 418 or even 1415...

Similar considerations apply when I try to refer to the same material across devices and formats.

14
6
Reply

Re: PDF has its uses I suppose

This comment deliberately left blank.

13
0
Reply
Silver badge

Re: PDF has its uses I suppose

Even trying to edit a PDF is a series of unpleasant workarounds unless, I suppose, you bought a full copy of Acrobat.

Thanks for reminding me that I need to migrate my copy of Acrobat 8 to my new PC. (The reasons that I have one are, today, entirely invalid, and, on reflection, probably were *always* entirely invalid, but they seemed like a good idea at the time.)

Oh, and yes, I own a printer. And when it went so far EOL that I couldn't get ink cartridges any more, I bought a different one to replace it. So I guess I own *two* printers, at least until I get around to taking the old one to the tip.

7
0
Reply
Silver badge

Re: PDF has its uses I suppose

That's why standards and legal documents use paragraph numbers.

As does the Bible and other holy books.

That particular problem was solved over a thousand years ago - in fact, before the concept of a "page" was invented.

12
2
Reply
Anonymous Coward

Re: @Steve The Cynic

So I guess I own *two* printers, at least until I get around to taking the old one to the tip.

If it's so far beyond use, then the correct disposal procedure is take it apart for s**ts & giggles, then take a pile of parts to the tip

10
0
Reply

Bronze badge

Re: @Steve The Cynic

Or, depending on the type of printer, taking it apart for all the nummy parts for making other things. (servos, steppers, smooth steel rods with linear bearings that are already fit for it, etc...)

I know of a few 3D printers that were built using scavenged dot matrix printer parts...

5
0
Reply
Silver badge

Re: PDF has its uses I suppose

"Even trying to edit a PDF is a series of unpleasant workarounds"

Trying to unscrew a welded joint is also tricky and for the same reason: they're both intended to be unchangeable.

1
1
Reply
Anonymous Coward

Re: PDF has its uses I suppose

> as it happens the "Agencia Tributaria" (Spanish tax equivalent of HMRC) have done a thorough job of offering their users a PKI-based online tax system. Works well but its weakness is its dependence on PDFs and problems are guaranteed in those bits every time.

Well, that's the Spaniards for you: take a good idea and implement it as poorly as possible. Have you had to deal with one of their electronic IDs?

0
0
Reply
Anonymous Coward

Re: PDF has its uses I suppose

> when I refer you to page 404 of the HTML status code manual, you get something totally different because

Because you are not referring to a page: you are applying a physical metaphor in an inappropriate context and failing to comprehend the implications. That is an example of misuse.

2
0
Reply
Anonymous Coward

Re: @Steve The Cynic

> I know of a few 3D printers that were built using scavenged dot matrix printer parts...

Intentionally or in the process of trying to put the original thing back together?

4
0
Reply
Anonymous Coward

Re: PDF has its uses I suppose

> Trying to unscrew a welded joint is also tricky and for the same reason: they're both intended to be unchangeable.

Nope. Again, that is one of the many and most common misconceptions about PDF.

The output was not designed to be changeable. That is an altogether different kettle of fish from "intended to be unchangeable". You are mistaking a non-goal for a requirement.

1
1
Reply
Silver badge

Re: PDF has its uses I suppose

So you want to do away with a standard, so when I refer you to page 404 of the HTML status code manual, you get something totally different because in your rendering of the manual the relevant material is on page 418 or even 1415...

The vapidity of this example (there is no "HTML status code manual") aside, the problems with using page numbers for citation have been well known since long before there were computers. That's why, when we're using responsive-layout documents, we don't use page numbers to cite passages.

This straw man was scattered to the winds long ago.

0
0
Reply
Silver badge

Re: PDF has its uses I suppose

>The vapidity of this example (there is no "HTML status code manual") aside, the problems with using page numbers for citation have been well known since long before there were computers.

@Michael - I think you need to get out into an office or classroom and listen to real people and look at the books/source materials they are using, especially the "contents" page. Also take a look at the covers of various magazines: "Your guide to Cloud - see p18". I'm not talking about citation, although looking through various academic papers, I do note many in their references include page numbers, which can be helpful in confirming that the paper's author was referring to the "2nd edition - reprinted with corrections" and not the 2nd edition.

BTW I know there is no "HTML status code manual", I chose it as computing reference books/tomes are the sorts of things I felt El reg readers could relate to. Also the page numbers were carefully chosen - lateral thinking needed for 1415 .. :)

0
0
Reply

Flow

"Now with people primarily reading on screens, (over 50% of eBooks on phones) and no standard screen size or resolution, like Letter and A4 on paper, layout needs to be "Responsive" and work with user selected rescaling (sharp vs poor eyesight)."

Most of the HTML I see these days shows every sign of the "web designer" fighting to stop users' browsers from applying their own formatting to fit the device & screen.

37
0
Reply

Re: Flow

Aye, there's the rub.

Do you want Jobs-style beautiful exact placement on the "page", or to be able to view things on many devices?

11
0
Reply
Silver badge

PDF is a WORM format, as far as I'm concerned.

We use it in work to say "This is it, this is the document, this is how it looks, nobody change it" and then offer that to customers knowing it will look the same no matter what they open it on and it can't be tweaked. Yes, we know you *can* edit them, but you can't edit them easily or nicely or guaranteeably.

Draft in Word, publish in PDF.

It's a great format for that. This is the version, no changes. Sign it if you have to. Beyond that, it's really just another format.

I refuse to buy Acrobat, though. I paid for Nitro once when it was cheap and that serves all my needs. For years (and still currently), I used PDFCreator and other freebie Ghostscript-based things to create PDFs if I needed them.

I don't see that the format needs much extension.

However, I was recently asked how to "stop people stealing our pictures out of PDFs" (and also website images). My solution was "don't put them in there" because you can't beat an analogue hole (screenshot tool) and PDFs you can suck the content out any time you like. They can't restrict "reading" permissions.

The biggest problem with Adobe is all the plug-in shite that tries to put such limitations and other DRM on you. I have one that literally interferes with EVERY PDF you print by watermarking it, whether or not it was part of the purchased PDFs that had that DRM. We stopped buying that stuff, fortunately.

Keep it to a display format. I mean, use the forms stuff if you have to but even that's a security risk (running Javascript and talking to outside websites, etc.). Anything more is really a nonsense and won't be used and will contribute to the long-term death of the format.

PDFs are fine. I mean anything would be fine, but XPS. But Adobe can't be making much money out of them at all.

24
3
Reply
Silver badge

>how to "stop people stealing our pictures out of PDFs" (and also website images).

Well this goes back to considering the intended purpose. As PDF's are really intended to be printed, there is generally no reason to have pictures/images that are of a higher resolution than is necessary to print the page. Similarly with websites, do you really need a high resolution image when most people will be viewing the content on a 1366x768 laptop display or a 3~5-inch mobile phone display.

Doesn't stop people stealing the pictures, just ensures the copy taken isn't that good.
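To put rough numbers on "necessary to print the page", here is a back-of-envelope sketch. The 300 dpi default is an assumed rule of thumb for ordinary office printing, not a figure from the article, and the function name is made up for illustration.

```python
def pixels_needed(width_in, height_in, dpi=300):
    """Pixel dimensions an image needs to print at the given physical size.

    dpi=300 is an assumed rule-of-thumb target, not a hard requirement.
    """
    return round(width_in * dpi), round(height_in * dpi)

# A 6 x 4 inch photo placed on the page.
photo = pixels_needed(6, 4)
# The same photo if a 150 dpi draft is acceptable.
draft = pixels_needed(6, 4, dpi=150)
```

At the assumed 300 dpi the 6 x 4 inch photo needs 1800 x 1200 pixels, which is already more than a 1366x768 display can show at once.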

7
3
Reply
Silver badge

Draft in LibreOffice, export as PDF.

8
0
Reply
Silver badge

"As PDF's are really intended to be printed, there is generally no reason to have pictures/images that are of a higher resolution than is necessary to print the page."

If all you have is a printer, that's fair enough. I view PDFs on Okular and that has a zoom control.

2
0
Reply
Silver badge

PDF is also great for presentations

I produce my PDF with LibreOffice. It is very efficient. A presentation produced by Impress (the LibreOffice counterpart of PowerPoint) decreases its file size 2-4 fold when exported to PDF with 90% quality which is good enough even for a big screen. Plus the slides look as intended on every computer.

The downside you do not have animation (with the exception of slide transitions). But most animations in presentations are a distraction anyway.

2
0
Reply
Silver badge

>I view PDFs on Okular and that has a zoom control.

This caused me to do a little research...

The image chosen to illustrate Okular's capabilities on Wikipedia made me smile. It nicely illustrates how narrow and limited many commenters' experience is; you wouldn't use Word to write a musical score, however, PDF allows those without the relevant application to read your score.

Which reminds me of other uses of PDF and history!

We forget just how painful Word was before the rise of PDF; yes Word allowed you to do object embedding, only problem was anyone else wanting to read your Word document and view all those Visio diagrams, etc. that you had so carefully embedded, had to have the relevant applications installed on their system. Obviously, you could paste-as-picture/image but that made updating a pain. However, print as PDF and everyone could view your document as you intended - okay that might have made feedback harder, but generally many people simply printed off the document, annotated it by hand and handed it back.

Similarly for MS Project plans, want to ensure everyone can read the current plan, just print to PDF (using a sensible page size).

Interestingly, I've never received a document that used, or a request for documents to be sent in, Microsoft's "PDF killer" XPS format. [Aside: I don't understand why MS haven't killed this off yet.]

Picking up on archival and OCR comments, one aspect of PDF not commented upon is its ability to contain document layers, so for imaging and workflow applications, PDF was an ideal format: paper could be scanned to TIFF, OCR'd and the two files combined into a single PDF file. This meant that you could search on the OCR'd text and if it didn't read well (i.e. it contained scan errors) you could view the original TIFF image.

2
2
Reply
Anonymous Coward

> I was recently asked how to "stop people stealing our pictures out of PDFs" (and also website images). My solution was "don't put them in there" because you can't beat an analogue hole

And I had this conversation with a friend. My solution was: use a free-culture licence so people are not "stealing" them.

For some unknown reason they actually did that and they realised that 1. the problem was not as big as they thought, it was just amplified by their anxiety and 2. their brand visibility in search rankings shot right up, perhaps as people felt more comfortable linking to them rather than lifting the content.

1
0
Reply
Anonymous Coward

> As PDF's are really intended to be printed, there is generally no reason to have pictures/images that are of a higher resolution than is necessary to print the page.

You don't know much about printing, do you?

1
0
Reply
Anonymous Coward

> I view PDFs on Okular

That's an automatic thumbs up.

1
0
Reply
Anonymous Coward

Re: PDF is also great for presentations

> I produce my PDF with LibreOffice. It is very efficient.

More importantly, it treats the output as a file not as a printer which obviously it isn't. This means that you can easily and conveniently control such things as the metadata, window title, table of contents, and of course, hyperlinks.

Few things make you look as incompetent as producing a PDF that goes like "and for this look on page 735¾" and not hyperlinking the stupid page reference (or not producing a table of contents, if appropriate).

1
0
Reply
Gold badge

"We forget just how painful Word was before the rise of PDF; yes Word allowed you to do object embedding, only problem was anyone else wanting to read your Word document and view all those Visio diagrams, etc. that you had so carefully embedded, had to have the relevant applications installed on their system. Obviously, you could paste-as-picture/image but that made updating a pain."

Interesting. The OLE rules actually said that embedded objects had to offer a rendering that did not require the relevant application and that containers had to save that rendering as part of the containing document, precisely to avoid the problem you've just described. I would say that it was almost impossible to actually program either the server or the container application without being aware of this, so it is interesting that you found yourself using a version of Office or Visio that had managed to screw this up.

(Update: It's probably also worth mentioning that the OLE libraries provide this capability for free. It requires conscious effort on the part of the programmer to avoid caching a graphic rendering. So that's someone going the extra mile to tick the box labelled "be annoying".)

0
0
Reply
Silver badge

The OLE rules actually said that embedded objects had to offer a rendering that did not require the relevant application

Agree, however, from memory, the 'rendering' often didn't look exactly like the source, hence it was easier and made for a smaller Word document, to do the paste-as-picture. Also given the performance of the systems back in the 90's and early 2000's and the ease with which you could overwhelm Word, it was often easier/quicker to update the original object in the source application and just paste the results into the Word document.

The other benefit of this approach was that reviewers/contributors had to tell you about corrections they wanted to make to such objects...

0
0
Reply
Silver badge

It nicely illustrates how narrow and limited many commenters' experience is; you wouldn't use Word to write a musical score, however, PDF allows those without the relevant application to read your score.

1. Terrible thing X is useless for application A.

2. Sometimes-useful thing Y is useful for application A', which is related to but distinct from A.

3. Therefore people who do not believe Y is wonderful have limited experience.

I think your syllogism needs work. Or, preferably, nuking from orbit. Care to try again?

0
0
Reply
Silver badge

@Michael "I think your syllogism needs work. Or, preferably, nuking from orbit. Care to try again?"

I think you and others understood perfectly what I was saying and implying; PDF is a universal display format for 'printed' material; in some respects it is digital paper; if whatever you wish to communicate can be represented on a piece of paper then it can be held in a PDF file. Provided the reader has a PDF reader then they can access the digital paper and view what you wrote on it.

Whilst I agree many PDF files do have some limitations which make reading simple text less than satisfying on small-screen smartphones and tablets, I suggest this limitation is more a limitation of people's usage and the tools available. For example, I've not looked at it, but given a PDF can have multiple layers, there is no reason why a viewer couldn't pull the text layer and display that instead of the printed page.

I suspect also if someone really wants the benefits of ePub (or another dynamic display format) whilst not also losing the benefits of PDF then perhaps the way forward is to Standardise ePub, promote an ePub printer that can be plugged in and used just like PDF printers and get the PDF Standard updated to include an ePub layer!

0
0
Reply
Silver badge

PDF bloat

I am proof-reading a draft of a magazine for a small charity. Its Editor uses a DTP system that turns 40 A5 pages into an 8 MByte file. Acrobat 9 (vintage 2009) reduces that to 321 KBytes with no obvious loss of visual appearance.

7
2
Reply
Silver badge

Ahem

PDF/X, the X was "X-change" I believe.

As for versions - it's even worse than you made out:

* Acrobat 8 was PDF 1.7, aka ISO 32000-1:2008

* Acrobat 9 was PDF 1.7 extension level 3 (there is no PDF 1.8)

* Acrobat X was PDF 1.7 extension level 8 (unpublished; extension level 5 was published, but as far as we know there was never an extension level 4, 6 or 7)

* Acrobat XI was... actually I never really figured that one out either. But we got EC sigs, which is nice.

Then you've got Acrobat DC 2015, Acrobat DC 2017, Acrobat DC 2018. Next year it will probably be Acrobat DC 1880 just to keep us guessing, or perhaps Acrobat DC 77πᵉ. With an entirely new user interface of course, with all the buttons rotating in a constant spiral around the center of the screen this time, because change!

Fortunately the file format means PDF is largely backwards compatible so you can largely forget about version numbers.

12
2
Reply

Re: Ahem

Everyone knows “X” sells!

8
0
Reply
Silver badge

Re: Ahem

I thought that was XXX

6
0
Reply
Silver badge

Re: Ahem

I have Acrobat X Pro. So far it reads all PDFs I've thrown at it. I am in the process of de-Adobifying my systems; I will not be getting a newer version of Acrobat. Ever.

My PDF needs are simple. I must be able to create PDFs from scans, including from scans from automatic document feeders on assorted scanners, copiers, and multifunction devices. I must be able to combine assorted elements into PDFs, including previously scanned files in PDF, PNG, JPG, TIFF, or GIF format. (You'd be amazed how many pretenders to the Acrobat throne can't handle GIFs...) I must be able to have basic OCR, which generates DOC, DOCX, or RTF files which don't have too many errors. (I can point the file to a dedicated OCR app, usually ReadIRIS, if necessary. The PDF just has to have good enough resolution.)

In particular I must be able to combine assorted other elements into a single PDF which has sufficient resolution to do OCR if necessary or to just be usable as is, depending on what we want to do with the assorted stuff. (This, of course, means that any image files MUST be scanned in or otherwise generated at a high enough resolution to be useful; anyone who hands us 100 dpi images gets laughed at. This in turn usually means that we get the original document and scan it in ourselves; most people simply scan in at far too low a resolution or use silly formats or both. GIF, I'm looking at you. And BMP. And the idiot who still uses PICT; yo! moron! Apple hasn't used PICT in nearly 20 years! Sigh.)

3
1
Reply


Biting the hand that feeds IT © 1998–2018