World's favourite open-source PDF interpreter needs patching (again)

Suricou Raven

Re: Hardly surprising.

PDF incorporates postscript, but PDF is not postscript.

PDF is, at core, a container format. It contains objects. Some of these objects are postscript documents (or rather, a simplified subset of postscript) which define the drawing of pages. Other objects are resources that this postscript can call upon, such as fonts or images. There are also objects involved in various extensions such as edit protection, version management or metadata.

Prior to PDF, postscript was rather cumbersome to use: it depended upon your document renderer (be it software, or firmware in your printer) having all of the required fonts installed, and being able to render images. This sort of worked, but it might mean that your postscript printed slightly differently on different printers because they used different fonts. PDF was intended to solve this by taking that postscript and bundling it into a file with all the fonts required, and limiting which features could be used to ensure compatibility. It actually worked very well, which is why it became so popular. It also added navigation data and dependency rules that make it possible to view any page at random - you don't run into problems rendering page 78 because it refers back to a template from page 60 which incorporates a logo from page 112, another curse of postscript.
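
To give a flavour of that navigation data: a PDF ends with a `startxref` keyword, a byte offset, and `%%EOF`, and that offset points at the cross-reference table listing where each object lives. A rough sketch of the first step a reader takes follows. The function name `find_startxref` and the 1024-byte tail window are my own assumptions for illustration, not anything a particular library does, and the sample file is a toy, not a valid PDF.

```python
import re


def find_startxref(data: bytes) -> int:
    """Return the cross-reference offset recorded near the end of a PDF.

    Hypothetical helper: only looks at the last 1024 bytes (an assumed
    window) and does no validation of the table it points at.
    """
    tail = data[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref marker found near end of file")
    # A file with incremental updates may carry several; the last one wins.
    return int(matches[-1])


# Toy file: just enough structure to show the lookup, nothing more.
sample = (
    b"%PDF-1.4\n"
    b"...objects...\n"
    b"xref\n0 1\n0000000000 65535 f \n"
    b"trailer\n<< /Size 1 >>\n"
    b"startxref\n23\n%%EOF\n"
)

offset = find_startxref(sample)
print(offset, sample[offset:offset + 4])
```

The point is that the reader starts from the end of the file and jumps straight to whatever it needs, instead of executing everything from page 1 onwards as a postscript interpreter would.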

Inside of PDF though, it's a horror. A horror which people are shielded from by libraries: Programmers need only call upon libpdf to extract those resources as needed, and need not look into the abyss.

How bad is it? Consider this: PDF supports a single-byte encoding (PDFDocEncoding, roughly ASCII plus extras) and two-byte UTF-16 encoding for text strings. Which one you have is signalled in-band by a byte-order mark at the front of the string, and escape codes can turn up inside it as well. This weird method is used because PDF predates the widespread adoption of UTF-8. It means that you cannot safely use the common string-handling functions when working on raw PDF objects.
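
A minimal sketch of what that means in practice, assuming the common case where a leading FE FF byte-order mark flags a UTF-16BE string and anything else is single-byte. The name `decode_pdf_text_string` is made up for illustration, and Latin-1 is used as a stand-in for PDFDocEncoding, which is an approximation and not an exact match.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the raw bytes of a PDF text string (hypothetical helper).

    Assumption: FE FF at the start means UTF-16BE; otherwise the bytes
    are treated as Latin-1, which only approximates PDFDocEncoding.
    """
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")


plain = b"Hi"
wide = b"\xfe\xff\x00H\x00i"

# Same text, but the raw lengths disagree, so len() on the raw bytes
# tells you nothing useful until you have sniffed the encoding.
print(len(plain), len(wide))
print(decode_pdf_text_string(plain) == decode_pdf_text_string(wide))
```

Any search, comparison or length check on the raw bytes has to go through something like this first, which is exactly the sort of thing you want a library to do for you.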
