"What we need is an information shaman,"
No, actually we don't, we really don't.
Southampton is pushing to be the go-to place for expertise on linked data in the UK, and researchers at its main university launched a site earlier this month containing no less than 21 "non-confidential" datasets that underline that semantic web desire. The University of Southampton (UoS) is one of the first academic …
""What we need is an information shaman," .... No, actually we don't, we really don't." ... enigmatix Posted Tuesday 22nd March 2011 12:13 GMT
Err..... Yes, actually you do, you really do. And ideally providing virgin core source lode for intelligent conversion into semantic product placements for sublime use in future programs and live linked future data projects.
Are Southampton into Future Reality Sets via Linked Novel MetaDataBase Stations/Core Virgin Source Mines which could even be just Open Minds in Networks InterNetworking JOINT Applications ...... Joint Operations Investigating Network Transparency as a FailSafe Security Protocols rendering Secrecy redundant and unnecessary as a Future Control Lever.
Drilling down to the actual data I find information on bus routes, vending machines and a short list of other stuff already available on the Southampton web site, hardly worth a fawning two page article on El Reg. Come on guys where is your customary cynicism?
"hardly worth a fawning two page article on El Reg"
I count 3 pages.
And if you wanted to cross-reference these bus timetables against, say, course timetables, that would be easy already, right? Because the information is already on the web? Come on, it doesn't take a genius to work out how this stuff is going to change the web completely.
" "PDF is an embarrassment to our species," Gutteridge says of Adobe Software's once proprietary but now open standard for document exchange." *
For that I can almost forgive him for say "less than 1000" instead of "fewer".
I wonder if he knows that data 'R' plural?
I'm glad to see that TB-L does. Must have had a proper education.
* I've cursed PDF so often and so vehemently you'd think it was a Microsoft product.
" For that I can almost forgive him for say "less than 1000" instead of "fewer". "
There is almost certainly some kind of fundamental law governing the increased possibility of buggering up one's grammar while bashing someone else with it.
(also known as Hartman's Law of Prescriptivist Retaliation): "any article or statement about correct grammar, punctuation, or spelling is bound to contain at least one eror".
I hop that was deliberatte!
Huh? His, maybe, but I'd have thought that most people's displays were 16:9.
I believe PDF was designed to replace Postscript (which in the early days was a pain to use) in an HTML 2.0-like world where styling didn't exist and the mainstream software could display the same content radically differently (and, plus, remember frames? how do you print a framed site?).
For its original purpose, PDF has done well. It has become icky and bloated, but to insult a technology because nobody has devised something both better and universal shows a certain degree of stupidity.
But then we are talking academics getting excited over datasets. While I can understand the value of this, Joe Average just wants something to look at. Like, for example, Nuclear (poop) Boy instead of a string (<cough> array) of readings in what would appear to be an obscure format.
Completely agree, he also seems to miss the fact that many people use PDFs as a way to send an electronic document in a fixed form, e.g. a quote, invoice, contract etc, so you can be reasonably sure that it hasn't been altered (yes I know there are ways to do it, but most users wouldn't know them). In terms of portrait / landscape I can kind of see where he's coming from, however I think he's missing how people actually read. A column of text is far easier to read and scan through than a wide, long line of text; that's why, after all, many documents in A4 portrait have two columns.
Mixed 16:10 and 4:3 displays here, definitely no 99:70 ones to be seen...
"PDF is a brilliant way to simulate A4 or portrait views."
And if you change your page orientation to landscape before saving to PDF, it's a pretty good way to simulate those too... Blaming PDF for the dearth of landscape-formatted documentation is, even by the usual standards of academics infected by outoftouchwiththerealworlditis, really rather dumb.
There's a place for both styles of formatting - viewing some types of data works really well when it's spread laterally across a widescreen display, other data is unmanageable unless constrained into a narrower band. e.g. I see no technical reason why novels couldn't be printed in landscape format, but from a usability point of view it'd be hideous. Just because we can reformat data to fill the available space doesn't mean we necessarily should...
Not just sure that it hasn't been altered, but also confident that all the people who read it have seen the same thing. If you are sending an invoice in Euros, you don't want to find out that one reader had his viewer configured to automatically convert that to Yen at the current exchange rate, because your quote is for Euros. I know that sort of configuration would count as a bug, but it's a lot less likely when you send a PDF. For some things, what you actually said is more important than what you meant, and fewer filters/conversions is better.
"there are ways to do it, but most users wouldn't know them"
Hands up who has, when they've been faxed a document to sign, pasted in their pre-scanned signature and sent it back through the fax server?
People may be "reasonably sure that it hasn't been altered", but that belief is based on ignorance.
I can change the price in it anyway. You can't prove that what I got is not what you sent unless you use methods that are above and beyond PDF - so using PDF is just a way of sending money to Adobe, it helps neither of us.
As someone said, PDF is the new fax - only secure and reliable for the gullible. PDF turns a 21st-century computer into 19th-century paper. A bit like using your Ferrari to open your stable doors to go for a ride on your horse. What the man is suggesting is perhaps taking the combustion engine out of the Ferrari and using it to power a matter transporter for your data. All PDF is is a larger and larger filing cabinet to stick your data in before tying it to your horse's arse.
All this hatred for PDFs based upon odd uses of it.
Okay. On my mobile phone, I have a bunch of component/processor datasheets and other tech docs. They're all PDFs. They load up in the reader app, and they look exactly like they do on the PC.
Would those slagging off PDF care to come up with a multi-system compatible, easy (hence single-file) way of representing this sort of information? Should we go back to pure text files with a lack of styles and formatting and laughably bad ASCII art circuit diagrams?
There are some things PDF is useful for...
Joe Average probably wants his work calendar in an electronic format that can actually be processed by machines. Yeah, a print is handy and there's not much wrong on the PDF front there; want to actually use some of that data, however...
This isn't just an academic exercise. There are some real world uses starting to appear.
One example is the BBC who are using linked data in real live situations (e.g. 2010 World Cup site) see this blog post http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html and are starting to move more and more to a dynamic publishing framework that is built on top of a "news" ontology.
In this way "linked news" can be based on the ontology, rather than any predetermined relationships that the content authors may define.
Eben Moglen suggests people should be able to maintain their info in an encrypted container hosted by a "freedom box". That could mean a physical device or it could mean a piece of software running on your own PC or running on a trusted host of your choice. You could grant / deny access to the box on a granular level and your data would be distributed (encrypted of course) via P2P to make it easier to find.
It's not a bad idea at all but not one without issues. Biggest issue is that this is Eben Moglen and the FSF proposing it. This means the concept is practically DOA because anything the FSF touches gets bogged down in polemics. It needs somebody, preferably a startup, to embrace the idea, monetize it and have the pragmatism to see it to implementation. Probably the closest thing at the moment to a freedom box would be Diaspora (also Moglen inspired) but whether it can compete against Facebook is a massive question.
Information can be tagged. Those same tags can be used to enforce security. All you're doing is extending the model to include protective markings and clearance levels. For example:
Object = securable item.
Subject = something that consumes (view/print/edit) objects.
An object is tagged* with a protective marking, such as "No marking", "Private" or "Secret."
A subject is tagged* with a clearance level, such as "No clearance", "Friend" or "Family."
* Tagging of objects and subjects is done by the owner of the object.
Lastly a policy is created that limits access to any object marked as "secret" to subjects cleared to "Family."
The policy travels with the object in a single file. The object is encrypted, and the policy is readable. A central or federated server enforces policy. Much like this app does: http://wwww.wittenburg.co.uk/interact/
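The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `POLICY`, `SecurableObject`, `Subject` and `can_access` are hypothetical, the idea that clearances are ordered ("Family" above "Friend") is assumed, and the "Private" rule is invented for the example. Only the "Secret" to "Family" rule comes from the post. Encryption and the enforcing server are left out entirely.

```python
from dataclasses import dataclass

# Labels taken from the post; treating clearances as an ordered scale is an assumption.
MARKINGS = ("No marking", "Private", "Secret")
CLEARANCES = ("No clearance", "Friend", "Family")

# Hypothetical policy: minimum clearance required for each protective marking.
POLICY = {
    "No marking": "No clearance",
    "Private": "Friend",   # assumed for illustration, not stated in the post
    "Secret": "Family",    # the one rule the post actually gives
}

@dataclass(frozen=True)
class SecurableObject:
    name: str
    marking: str    # protective marking, set by the object's owner

@dataclass(frozen=True)
class Subject:
    name: str
    clearance: str  # clearance level, set by the object's owner

def can_access(subject: Subject, obj: SecurableObject) -> bool:
    """Return True if the subject's clearance meets the policy for the object's marking."""
    required = POLICY[obj.marking]
    return CLEARANCES.index(subject.clearance) >= CLEARANCES.index(required)

diary = SecurableObject("diary.txt", "Secret")
alice = Subject("alice", "Family")
bob = Subject("bob", "Friend")
```

With these definitions, `can_access(alice, diary)` is true and `can_access(bob, diary)` is false. A real system would also have to decide who is allowed to edit `POLICY` itself.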
We should own our own data.
We should be able to control who accesses or links to our data.
A new type of device is needed. A secure section of our home router or modem could contain a publicly accessible store.
Never fill in a form with your various personal data again, allow a site to access whichever limited data from your data-set. Change your personal information once only, it is always current everywhere. Stop permission for whichever sites you want when you want.
We might turn off our PCs, laptops, phones, pads from time to time, but our routers generally stay on, except for power outages or maintenance.
I have been pushing this barrow for many years now.
People or corporations can access our data when we want them to, or if there is a perceived value for how much we sell it to them.
You have a log of who accesses what data when.
It could end up being our storage in our part of our cloud. We own it, we control it. It might contain all of our personal data collected and created through our lifetime. Expand it as required, backup as required.
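For what it's worth, the core of that idea (grant a site limited fields, change data once, revoke when you like, keep a log) fits in a short Python sketch. Everything here is hypothetical: the class `PersonalDataStore`, its method names and the sample values are made up for illustration, and nothing about routers, networking or encryption is modelled.

```python
from datetime import datetime, timezone

class PersonalDataStore:
    """Hypothetical owner-controlled store: per-site field grants plus an access log."""

    def __init__(self, data):
        self._data = dict(data)
        self._grants = {}  # site -> set of field names the owner has allowed
        self.log = []      # entries of (timestamp, site, field, allowed)

    def grant(self, site, fields):
        """Allow a site to read the named fields."""
        self._grants.setdefault(site, set()).update(fields)

    def revoke(self, site):
        """Stop all access for a site."""
        self._grants.pop(site, None)

    def update(self, field, value):
        """Change a value once; every permitted reader sees the new value."""
        self._data[field] = value

    def read(self, site, field):
        """Return a field if the site is permitted; every attempt is logged."""
        allowed = field in self._grants.get(site, set())
        self.log.append((datetime.now(timezone.utc), site, field, allowed))
        if not allowed:
            raise PermissionError(f"{site} may not read {field}")
        return self._data[field]

# Placeholder data and site name, invented for the example.
store = PersonalDataStore({"email": "me@example.org", "postcode": "AB1 2CD"})
store.grant("shop.example", ["email"])
```

Here `store.read("shop.example", "email")` succeeds, while a request for `"postcode"` raises `PermissionError`, and both attempts end up in `store.log`.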
Facebook just got made open... so did everything else
Seeing as you can run your own OpenID provider behind your router, it shouldn't take much to move it onto the router itself.
This sounds like the path to hell if you ask me.
I have seen real world semantic web apps - there is a company in Switzerland that has stuck man-years into developing the very idea. The key to handling this sort of data is deciding first what information you actually need - without a use case to define the shape of the specific needles you seek you'll be simply stuck looking at hay.
BTW, re PDF - there is another reason why portrait persists: our own physical limitations. Our eyes have a limited width from which we read. If you had a text landscape you'd tire pretty soon when reading..
I agree - trying to read a Kindle landscape feels really wrong - put it back to portrait and all is well.
"But our screens are all A4 landscape yet there is this stupid insistence that the portrait way is still developed. It's a legacy thing and we haven't got around to getting rid of it yet"
While his tape measure is slightly off - I do agree with the "stupid insistence that the portrait way is still being developed" - I say this looking at the white sliver down the centre of the screen that is the reg website!
...the ones with 2 columns of text!
Who thinks that is a good idea? I'm reading it on a screen too small to display a full A4 page at a readable size, so halfway through each page I get to scroll back up again and start again. PDFs designed to be read onscreen (and which are created with linked chapters and text rather than images of text) are wonderful in comparison.
Why not read it in a CSV form, which this chap suggests is a replacement for PDF...
He's citing an example where CSV is used to exchange data between systems. The idea being, once you have the CSV you can format it any way you please when you display it. You can't do this with PDF as they're pre-formatted when the document is written.
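To make that concrete, here is a small Python sketch: one CSV source, two different presentations chosen at display time. The sample data, column names and the helper names `load_rows`, `as_table` and `as_sentences` are all made up for the example; only the standard-library `csv` module is real.

```python
import csv
import io

# Made-up sample data standing in for a real CSV export.
RAW = "route,stop,departs\nR1,Campus,08:15\nR2,Station,08:40\n"

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def as_table(rows):
    """Render the rows as a fixed-width text table."""
    headers = list(rows[0].keys())
    widths = [max(len(h), *(len(r[h]) for r in rows)) for h in headers]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for r in rows:
        lines.append("  ".join(r[h].ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)

def as_sentences(rows):
    """Render the same rows as plain sentences, e.g. for a narrow phone screen."""
    return [f"Route {r['route']} leaves {r['stop']} at {r['departs']}." for r in rows]

rows = load_rows(RAW)
```

Both `as_table(rows)` and `as_sentences(rows)` work from the same parsed data, which is the point being made: the layout is decided by the reader's software, not baked in by the author.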
PDF is horrible because it is, at its simplest, a vector-based picture of a printed page, from which it's almost impossible to extract useful information. So Adobe added a pile of extensions to pdf (and also, even more stupidly, to ttf fonts) to let you try to reverse engineer the original information out of the print representation. This is unnecessarily complex and still doesn't work reliably for unicode text using complex scripts.
RDF (specifically RDF/XML) is horrible (and consequently not widely used) because it is almost sadistically complicated. As the article says, the information model it encodes is simply triples of URIs, yet it provides myriad ways to write those triples down, described by a not-quite-finished specification. (I've had the misfortune of writing an RDF/XML parser so I've been bitten by all the corner cases that were brushed under the carpet without resolution in the rush to get the spec. published.)
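For anyone who hasn't met it, the "triples of URIs" model really is that small. A rough Python sketch, using plain tuples rather than any RDF library; the `example.org` URIs, the vocabulary terms and the `match` helper are hypothetical and invented for illustration. It deliberately ignores RDF/XML, literals and blank nodes, which is where the complexity lives.

```python
# Hypothetical URIs under example.org; none of these are real vocabulary terms.
EX = "http://example.org/"

# Each triple is (subject, predicate, object), all three being URIs.
triples = {
    (EX + "building/32", EX + "vocab/hasVendingMachine", EX + "machine/7"),
    (EX + "machine/7", EX + "vocab/sells", EX + "product/coffee"),
    (EX + "building/59", EX + "vocab/hasVendingMachine", EX + "machine/9"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

Calling `match(triples, p=EX + "vocab/hasVendingMachine")` returns the two building triples. Linking datasets amounts to two sets of triples sharing URIs.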
"The skills of taking a data system and understanding how to map it into RDF so that it can be useful is bloody hard. It requires someone who can see the data, understand the structure, understand how it will be used and then map between two spaces in their head."
Document what you want to do in Executable English, then _run_ it. Other people will be able to read what you did, and also get English explanations of the results of running it.
Here's an example that you can view, run and change, using a browser:
I've heard a whisper that the Wellcome Foundation now requires its grant-funded researchers to publish their raw data on the internet. Not that it'd be of much use to the general public, but it's the same kind of thinking. I'm concerned over data integrity in the cut-throat world of multi-million pound research grants where reputation is 95% of everything.
To be fair most things suck in some ways, but I've spent the last 10 years working to give people open access to research papers, and I hate the fact that PDF is the de facto standard. We should be able to read them easily on phones, iPads and Kindles and we can't. But there's top men working on it http://blogs.ch.cam.ac.uk/pmr/2011/03/11/scholarly-html-theme-and-presentations-today/
Not everyone can code; and you need a coder, or other unusual skills, to get the value from open data, but once the app (or whatever) is produced, we can all benefit. An information shaman is one of those rare people who (a) can understand complex diverse data sources and (b) is willing to build tools to help the other 99.9% of us benefit from them.
Some of our students produced this tool from the open data we made available:
(requires a recent browser!)
with evince. It's nice having an N900.
On the main content - so you finally bought into the semantic web idea then?
I remember that was the 'next big thing' back when I was a lowly student and you were moaning about how half the new intake didn't know how to use ftp from the command line any more...
Plus ça change etc.
Genuinely surprised to see your face peering out at me from the front page this morning though, nearly spat-up coffee all over the keyboard!
"PDF is an embarrassment to our species"
A completely blinkered view which really doesn't deserve arguing the case against....
...but briefly, speaking as a designer, there are loads of reasons why I love pdf as well as things that annoy me -- the same with design for other media (web/print etc).
That quote must have been said with no thought whatsoever about the words which were coming out of his mouth
nah, just in the context of scholarly communication and of communicating data. Getting data as a table in PDF really really sucks.
Using PDF in the current era of many sizes of viewport is just plain daft.
I think what most of the posters are getting at is that your comment was a tad too general and sweeping about PDFs. They have a use and, also being from a designer background, a bloody good use.
But I can see why when you receive a PDF it is generally useless and a pain in the proverbial to you.
Plus I have also made some good money out of organisations designing and building PDFs so don't dis an element of my trade too quickly, same goes for your semantic data. Don't be surprised if it's not received too warmly by people who can see their livelihoods slipping away from a new standard being employed. I would definitely stick to your comments about letting the politicians hammer it out but I would be very good friends with a Marketing bod if you want it to lift off.
Er, well *Page Description Language* kind of says it all. PDF is not a data extraction format. If it doesn't fit your page, then get another PDF, generated from the actual data, using the application. Maybe numpty was interviewed in the pub. I see scrollbars on my ElReg screen. Does that mean that HTML doesn't work?
"Ultimately, we provide the tools. Let the politicians do the arguments.".... Christopher Gutteridge
The wiser semantic web developer will completely avoid the politician, realising that they have no valid lead input to offer themselves into future linked data programming. They are useful tools though for leads which linked data sets provide, so they are not completely useless. In fact, there is probably good enough evidence to suggest that they are easily groomed to be quite convenient servants.
Sir, I have been accused of many things, being easy to groom is generally not one of them.
No one has seen the chat logs that I'm keeping for later.
Quite so, it is indeed an unusual semantic knack to perform better than just well, and can be difficult to probably impossible without the competent exercise of a particular and peculiar knowledge and provision of sticky sweet bait.
Information, lacking agency - and contrary to the popular (and annoyingly resistant to logic) meme - doesn't want to be free.
Some of the data in those silos is in silos because it has value to the people who collected it, and they certainly don't want it to be free.
So I would hope that there is some parallel group working on implementing a complementary micro-transaction framework so that on the day when the big switch is thrown on the brave new semantic web those of us who believe in swapping money for things of value are able to play.
Otherwise simply wishing for all the information to be free is like asking santa for a magic kitten that shits fairy dust.
Still, good luck to them. I can't wait to have another standard to choose from.
BTW, if anyone is actually in possession of a magic kitten that shits fairy dust and is willing to swap it for one that vomits what appear to be the remains of dead snails, do get in touch.
Best I can do is one that vomits undigested cat biscuits and leaves the liver of eaten creatures on your kitchen floor. With enough prodding can also perform a 'double-tap' on your smallest child's head.
Idealism yes but I'm not sure TimBL is the poster boy for "youth" these days :-)
Why I say your comment is poor is this, what semantic web switch are you talking about?
Your Facebook data is not about to be magically represented in RDF, made open and linked.
The Semantic Web is an effort to standardise the publishing of interlinked data in the same vein that the original web standardised the publishing of interlinked documents. You didn't have to publish web pages back then and no one forces people to publish linked data now.
Now the open access ideal for science publication/data is related but different, if your experiment/science isn't repeatable and observable it's not really following the scientific method anyway, so the idea that research data and articles should be locked away and unavailable (or behind some pay wall) is not really taking science forward. Allowing research to be open, linked and widely accessible seems likely to change the quality of research science.
As for other data sets like government and organisational (the OS stuff that was opened) It is simply OUR data in the first place, why lock it up? The one time Labour/Tory/Liberals have agreed in any meaningful way about tech is when they listen to Sir Tim BL about data.
Search Google for:
...4,350 results found
"a detail-obsessed librarian who's middle name is pedant"
Whose name is pedant again? I must have made a mistake here too, for ironic value. Unavoidable, apparently.
PDF is merely Postscript with a nice wrapper around it. The original intent was to provide a document format that people could create a print-ready document on one platform (Say, a mac with Pagemaker or some other desktop publishing program installed) and open it on, say, a PC or unix workstation for later viewing/printing. (although the latter still requires a bit of mucking about with the workstation's print drivers to make it come out right.)
It's somewhat handy for technical manuals, print-ready artwork for proofing, and when Adobe strapped on the forms capabilities for things like tax forms. It's not all that useful for ebooks, especially graphics heavy ebooks, at least on reader devices.
My own personal hate is the $&*#^! idiots who make a fill-able form, but disallow printing or saving a copy of the form data. WHAT IS THE POINT, PEOPLE.
> PDF is merely Postscript with a nice wrapper around it.
If only it were. Sadly it isn't quite: lack of a full programming capability being one of the bigger omissions.
> My own personal hate is the $&*#^! idiots who make a fill-able form, but disallow printing or saving a copy of the form data. WHAT IS THE POINT, PEOPLE.
This is usually because the person creating the form hasn't realised they bought the wrong version of Acrobat. So the form works for them and they never bother to test on a standard Reader.
/Mines the one with the blue cookbook in the pocket