1 post • joined Saturday 5th July 2008 09:36 GMT
OK for reading, no good for analysing
PDF has one major problem: no document structural markup, i.e. nothing marking what's a heading, where a paragraph starts and ends, etc.
Fine for viewing on screen and printing, but if you want a machine to read it, analyse it or extract information for search & discovery, then it's awful. It's even worse if you want to extract tables or other structured information.
HTML's actually a much better format for this.
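To show why: with HTML, the structure is sitting right there in the tags, so even Python's standard-library `html.parser` can pull out headings, paragraphs and table rows in a few dozen lines. This is a minimal sketch only, assuming well-formed HTML with explicit closing tags (it doesn't handle implicit `</p>`, nested tables, etc.); the names `StructureParser` and `extract_structure` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureParser(HTMLParser):
    """Hypothetical helper: collects headings/paragraphs and table rows."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # (tag, text) for each heading or paragraph
        self.rows = []     # each table row as a list of cell strings
        self._block = None  # heading/paragraph tag we're currently inside
        self._cell = None   # td/th tag we're currently inside
        self._row = None    # cells collected for the current <tr>
        self._buf = []      # text chunks for the current block or cell

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            self._block, self._buf = tag, []
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell, self._buf = tag, []

    def handle_endtag(self, tag):
        # Collapse runs of whitespace in whatever text was buffered.
        text = " ".join("".join(self._buf).split())
        if tag == self._block:
            self.blocks.append((tag, text))
            self._block = None
        elif tag == self._cell and self._row is not None:
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a structural element.
        if self._block or self._cell:
            self._buf.append(data)


def extract_structure(html):
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return parser.blocks, parser.rows


sample = """<h1>Results</h1><p>Some  text here.</p>
<table><tr><th>Item</th><th>Count</th></tr>
<tr><td>apples</td><td>3</td></tr></table>"""

blocks, rows = extract_structure(sample)
print(blocks)  # [('h1', 'Results'), ('p', 'Some text here.')]
print(rows)    # [['Item', 'Count'], ['apples', '3']]
```

The point isn't the parser itself but that the tags tell you what each bit of text *is*; a PDF that lacks structural markup gives you nothing equivalent to hook into.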
Unfortunately a lot of documents are getting published (and archived) in PDF only.