User Supplied Metadata

From Creative Commons
(beginning of a write up of a spec)

Revision as of 02:53, 23 June 2010

Contact: Raphael Krut-Landau
Status: In Development


The story from the user's point of view

A moment like this is fairly common. You have asked a search engine what it knows about a particular query, "sustainability water" for instance. When the engine returns its list of results, you see one that could be categorized more effectively. You want to teach the engine a new fact about one of those ecological pages.

Requirements

With this feature, we allow you, as a user of DiscoverEd, to associate a new word with a search result that you see on your screen. Next to each search result there is a small link reading "Add a tag"; click it to open a brightly colored rectangular box where you can enter a new word. The box has a small "submit" link; click it and you immediately see the word alongside any other tags that the engine already associates with the result.

Implementation

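The brightly colored box mentioned above is an HTML form. The POST handler that accepts the user's submission writes a new RDF statement to the Jena quad store. Because the statement also records who submitted the tag, it is a quad rather than a plain triple, consisting of four strings: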

submitter_uri, result_uri, dct:subject, tag

Note that the word "subject" above might be confusing if you are familiar with RDF. In RDF, "subject" usually means the subject of a triple (subject, predicate, object). In Dublin Core Terms (DCT), "subject" means the topic of a resource. We use it to express the relation 'is tagged with'.
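As an illustration of the four-string statement described above, here is a minimal sketch in Python. It is not DiscoverEd's actual code: the names `TagQuad`, `build_tag_quad`, `handle_tag_submission` and the in-memory `quad_store` list are hypothetical stand-ins for the POST handler and the Jena quad store, and trimming whitespace and rejecting empty tags are assumptions rather than requirements stated in this spec. The only fixed value is the full URI that `dct:subject` abbreviates.

```python
from typing import List, NamedTuple

# Full URI that the dct:subject prefix abbreviates (Dublin Core Terms).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


class TagQuad(NamedTuple):
    """One user-submitted tag, in the order listed in the spec."""
    submitter_uri: str
    result_uri: str
    predicate: str
    tag: str


def build_tag_quad(submitter_uri: str, result_uri: str, raw_tag: str) -> TagQuad:
    """Build the quad for a submitted tag (hypothetical helper).

    Assumption: surrounding whitespace is trimmed and empty tags are rejected.
    """
    tag = raw_tag.strip()
    if not tag:
        raise ValueError("tag must not be empty")
    return TagQuad(submitter_uri, result_uri, DCT_SUBJECT, tag)


# Stand-in for the Jena quad store: a plain list, for illustration only.
quad_store: List[TagQuad] = []


def handle_tag_submission(submitter_uri: str, result_uri: str, raw_tag: str) -> TagQuad:
    """What the POST handler might do: build the quad, then store it."""
    quad = build_tag_quad(submitter_uri, result_uri, raw_tag)
    quad_store.append(quad)
    return quad
```

For example, `handle_tag_submission("http://example.org/users/18", "http://example.org/water-page", "conservation")` would append one quad whose third element is the `dct:subject` URI and whose fourth element is the tag text. The example URIs are made up.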

We want this new tag to appear whenever anybody, now or in the future, comes across the search result in question on this particular installation of the DiscoverEd search engine. Here is how the engine will do that. From time to time, a webmaster asks their copy of DiscoverEd to "crawl": that is, to download copies of web pages from the internet and put their text, and other information about them, into the search engine's Lucene index. We want the user-submitted tag to be included in the information we store in Lucene.

So there will be a bit of code that runs whenever you ask DiscoverEd to perform a crawl. This code looks in the Jena quad store for any tags stored there and adds them to Lucene. In the parlance of Lucene, it adds a new kind of field (you could think of it as a new column). The field is named something like 18__dct_subject, where 18 signifies the user who submitted the tag via the brightly colored box mentioned above.
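The crawl-time step can be sketched roughly as follows, again in Python and without touching Jena or Lucene. Everything here is an assumption made for illustration: the function names, the `submitter_ids` lookup that maps a submitter URI to a short identifier such as 18, and the rule that the field name is the identifier, two underscores, and the prefixed predicate with its colon replaced by an underscore. The spec only says the name is "something like" 18__dct_subject.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (submitter_uri, result_uri, predicate, tag), with the predicate in prefixed form.
Quad = Tuple[str, str, str, str]


def field_name(submitter_id: str, predicate: str) -> str:
    """Build a field name such as '18__dct_subject' (assumed naming rule)."""
    return f"{submitter_id}__{predicate.replace(':', '_')}"


def fields_for_result(
    quads: Iterable[Quad],
    result_uri: str,
    submitter_ids: Dict[str, str],
) -> Dict[str, List[str]]:
    """Collect the tag fields to add to the index entry for one crawled page."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for submitter_uri, quad_result, predicate, tag in quads:
        if quad_result != result_uri:
            continue  # this tag belongs to a different search result
        name = field_name(submitter_ids[submitter_uri], predicate)
        fields[name].append(tag)
    return dict(fields)


# Made-up example data standing in for the contents of the quad store.
quads = [
    ("http://example.org/users/18", "http://example.org/water", "dct:subject", "sustainability"),
    ("http://example.org/users/18", "http://example.org/water", "dct:subject", "conservation"),
    ("http://example.org/users/18", "http://example.org/other", "dct:subject", "unrelated"),
]
submitter_ids = {"http://example.org/users/18": "18"}

print(fields_for_result(quads, "http://example.org/water", submitter_ids))
# {'18__dct_subject': ['sustainability', 'conservation']}
```

Keeping the submitter in the field name means tags from different users stay distinguishable in the index, which appears to be the intent of the 18 prefix.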