Super-fast RDF search engine developed

The next generation of the internet is a step closer thanks to a major breakthrough in "semantic web" research in Ireland. The semantic web, or data web, is a machine readable version of the internet that makes it more efficient to conduct searches, using RDF (Resource Description Framework) statements, which are used to …


This topic is closed for new posts.

Semantics, meta-data... Who is going to enter it?

This reminds me of the new Windows search for Vista: you can find any data via a keyword search, but only if you have tagged your files with the appropriate keywords (OK, I'm paraphrasing).

If the search is fully automatic, then it is indexing plain data to metadata, and this does not seem fundamentally different from any other search index.

If the system relies on metadata, then either someone has to enter the metadata by hand, or it has to be produced by... an indexing system that translates data to metadata.

The advantage of this is that the indexing work is carried out on the site side and not on the search engine side.

The drawback is that the metadata "categories" are generated by criteria that are site-dependent rather than search-engine-independent, which means the search engine would probably have to reindex the metadata to fit its own categories.

The other point is that this would seem a good candidate for search index poisoning: the metadata from whatever feed links to a page could be extracts from a medical or technical journal, while the page referenced by this feed could be of questionable safety (porn, botnet installer, etc.). What you index may not be what you really get, unless you are reading the site's "digest" and not the associated complete page...
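One way a search engine might guard against that kind of mismatch is to check how much of the claimed metadata actually shows up on the linked page. The snippet below is only a rough sketch of that idea: the function name `keyword_coverage` and the sample keywords are made up for illustration, and a real crawler would need far more than exact word matching.

```python
import re


def keyword_coverage(claimed_keywords, page_text):
    """Fraction of claimed metadata keywords that appear as words on the page.

    Hypothetical helper: a low score suggests the feed's metadata
    does not describe the page it links to.
    """
    # Break the page into lowercase alphanumeric words.
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    claimed = [k.lower() for k in claimed_keywords]
    if not claimed:
        return 0.0
    found = sum(1 for k in claimed if k in words)
    return found / len(claimed)


honest = keyword_coverage(
    ["cardiology", "stent"],
    "A review of stent placement in cardiology.",
)
suspicious = keyword_coverage(
    ["cardiology", "stent"],
    "Click here to install a free toolbar",
)
```

A page scoring near zero could be flagged for a full reindex instead of being trusted on its digest alone.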




The metadata issue is nearly ancient...

Here's an example of a "modern" usage of a metadata indexed, semantic database:

Most medical packages. All patients, treatments, diagnoses, drugs, insurance providers, illnesses, etc., are stored in both native-language and machine-parseable codes. Having deployed several of these buggers, I can personally attest that the time spent setting up the machine codes *far* exceeds the time needed to input the native-language textual descriptions.

In addition to being tedious to remember, the codes are subject to many errors, because of their abstract nature and the unfamiliarity of most office staff with entering them.

I.e., the secretary who's been at the office since 1972 can remember "John Smith: Broken leg; set in cast, prescribed painkillers, scheduled for one-month followup" much more easily than she can remember "Patient 4829: Diagnosis 203871: Treatments: 8194, 16872."

So it creates additional programming overhead for the devs writing the program, who have to put in all kinds of error checking and breadcrumb/autocomplete setups to give people help and feedback as they enter the numeric codes.

However, the numeric codes are what give medical databasing programs the ability to easily talk to one another, and are thus important. That is to say, the benefits of semantic indexing within the program outweigh the administrative burden of including such data.

For medical systems. For programs that have the possibility of killing people if, say, a penicillin allergy isn't brought to the attention of an ER doctor.

Whether the process of entering a bunch of semantic data for non-essential, hobby and entertainment applications will take hold in any more than a few fields where it's shown to be enormously useful remains to be seen. Generally, one can assume that your average blogger or gamer, fifteen-year-old or eighty-year-old, will have neither the interest nor the desire to spend the time to do it.

I think that the semantic web will largely be limited to industry, science, and education. Which will be great for people like us, being able to say "I've got board 18503 that has problem 28974. Solution? 28405-832." But, for the average Myspacer, the knowledge of semantic indexing will be a healthy "0." Just like it is for every other tidbit of the inner workings of the Web.



Why do you need to use the numbers when you can just have a database table with for example

34734923947 | Broken leg

and then they can type in 'broken leg' and the computer will substitute the numbers in later (possibly verifying with the user first in case they made a spelling mistake or whatever). Typing in arcane numbers like that is just insane...?
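Something like the following is what that could look like. It is a minimal sketch, assuming a small in-memory table: the `CODES` entries (apart from the "Broken leg" row above) and the `resolve` function are invented for illustration, and the fuzzy matching uses Python's standard `difflib` rather than whatever a real medical package would use.

```python
import difflib

# Hypothetical code table; every row except "broken leg" is made up.
CODES = {
    "broken leg": 34734923947,
    "broken arm": 34734923948,
    "penicillin allergy": 50012345678,
}


def resolve(text, cutoff=0.8):
    """Map free text to (code, label, needs_confirmation), or None.

    needs_confirmation is True when only a near match was found,
    so the program should check with the user before storing the code.
    """
    # Normalise case and whitespace before looking anything up.
    key = " ".join(text.lower().split())
    if key in CODES:
        return CODES[key], key, False
    # Fall back to the closest label, to catch spelling mistakes.
    matches = difflib.get_close_matches(key, list(CODES), n=1, cutoff=cutoff)
    if matches:
        label = matches[0]
        return CODES[label], label, True
    return None


exact = resolve("Broken leg")
typo = resolve("brokn leg")
unknown = resolve("headache")
```

Here the exact entry resolves silently, the misspelt one comes back flagged for confirmation, and anything with no close label returns `None` so the user can be asked to pick a code by hand.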

Anonymous Coward

please read up on the topic before commenting

Please read some background information about RDF and semantic web before posting a comment
