User Supplied Metadata
Dithyramble (beginning of a write-up of a spec)
Revision as of 02:53, 23 June 2010
Field | Value |
---|---|
Contact | Raphael Krut-Landau |
Project | |
Status | In Development |
The story from the user's point of view
Here is a fairly common moment. You've asked a search engine what it knows about a query such as "sustainability water". When the engine returns its list of results, you notice one that could be categorized better, and you want to teach the engine a new fact about that page.
Requirements
In this feature, we allow you, as a user of DiscoverEd, to associate a new word (a tag) with a search result on your screen. Next to every search result there is a small link reading "Add a tag"; click it to open a brightly colored rectangular box where you can type a new word. The box has a small "submit" link; click it and the word immediately appears alongside any tags the engine already associates with that result.
Implementation
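The brightly colored box mentioned above is an HTML form. The POST handler that accepts the user's submission writes a new statement to the Jena quad store. The statement is an RDF quad rather than a bare triple: the triple (result_uri, dct:subject, tag) plus a fourth string, submitter_uri, naming the graph that records who made the assertion. In order, the four strings are: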
submitter_uri, result_uri, dct:subject, tag
Note that the word "subject" above might confuse you if you know RDF. In RDF, "subject" usually means the first element of a triple (subject, predicate, object). In the Dublin Core terms vocabulary (DCT), dct:subject means a topic. We use it as the predicate to express the relation "is tagged with".
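The POST handler's job can be sketched as follows. This is a minimal, stdlib-only Python illustration, not DiscoverEd's actual code (which talks to Jena from Java). The form field names `url` and `tag`, the `Quad` class, and the in-memory `QuadStore` standing in for the Jena quad store are all hypothetical. The sketch shows how the four strings line up as (graph, subject, predicate, object), with `dct:subject` expanded to its full Dublin Core terms URI.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs

# Full URI behind the dct:subject shorthand (Dublin Core terms namespace).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


@dataclass(frozen=True)
class Quad:
    """One statement: a triple plus the graph (context) it is stored in."""
    graph: str      # submitter_uri: who asserted the tag
    subject: str    # result_uri: the search result being tagged
    predicate: str  # dct:subject, read as "is tagged with"
    obj: str        # tag: the word the user typed


class QuadStore:
    """Hypothetical in-memory stand-in for the Jena quad store."""

    def __init__(self) -> None:
        self.quads: set[Quad] = set()

    def add(self, quad: Quad) -> None:
        self.quads.add(quad)


def handle_add_tag(store: QuadStore, submitter_uri: str, body: str) -> Quad:
    """Handle the form POST: parse the urlencoded body and store one quad.

    Assumes hypothetical form fields named 'url' (the result URI) and 'tag'.
    parse_qs drops blank values by default, so missing and empty fields
    both fall back to "" here.
    """
    form = parse_qs(body)
    result_uri = form.get("url", [""])[0].strip()
    tag = form.get("tag", [""])[0].strip()
    if not result_uri or not tag:
        raise ValueError("both a result URI and a non-empty tag are required")
    quad = Quad(submitter_uri, result_uri, DCT_SUBJECT, tag)
    store.add(quad)
    return quad
```

For example, a body of `url=http%3A%2F%2Fexample.org%2Fwater&tag=sustainability` submitted by `http://example.org/user/18` (both made-up URIs) yields a quad whose graph is the submitter, whose subject is `http://example.org/water`, and whose object is `sustainability`. Storing a set of quads means a user who submits the same tag twice for the same result adds nothing new.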
We want this new tag to appear whenever anybody, now or in the future, comes across that search result on this particular installation of DiscoverEd. Here's how the engine will do that. From time to time, a webmaster asks their copy of DiscoverEd to "crawl", that is, to download copies of web pages from the internet and store their text, along with other information about them, in the search engine's Lucene index. We want the user-submitted tag to be included in the information we store in Lucene.
So there will be a bit of code that runs whenever you ask DiscoverEd to perform a crawl. This code looks in the Jena quad store for any stored tags and adds them to Lucene. In Lucene's terms, it adds a new field to the document for each tagged page. The field is named something like 18__dct_subject, where 18 identifies the user who submitted the tag through the brightly colored box mentioned above.
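The crawl-time step can be sketched the same way. Again this is a hypothetical, stdlib-only illustration rather than DiscoverEd's code. It takes quads in the same order as the four strings the POST handler writes, assumes a `submitter_ids` mapping from submitter URIs to small integers (the "18" in the example field name), and builds each field name as `<id>__dct_subject` by replacing the colon in `dct:subject` with an underscore. Instead of calling Lucene, it returns a plain dict mapping each result URI to the extra fields that would be added to that page's document.

```python
from collections import defaultdict

# Full URI behind the dct:subject shorthand (Dublin Core terms namespace).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def field_name(submitter_id: int, predicate_curie: str = "dct:subject") -> str:
    """Build a per-submitter field name such as '18__dct_subject'."""
    return f"{submitter_id}__{predicate_curie.replace(':', '_')}"


def tag_fields(quads, submitter_ids):
    """Group stored tags into extra fields for each crawled page.

    quads: iterable of (submitter_uri, result_uri, predicate, tag) tuples.
    submitter_ids: hypothetical mapping from submitter URI to a numeric ID.
    Returns {result_uri: {field_name: [tag, ...]}}, keeping first-seen order
    and skipping duplicate tags from the same submitter.
    """
    fields = defaultdict(lambda: defaultdict(list))
    for submitter_uri, result_uri, predicate, tag in quads:
        if predicate != DCT_SUBJECT:
            continue  # only tags are copied into Lucene here
        name = field_name(submitter_ids[submitter_uri])
        if tag not in fields[result_uri][name]:
            fields[result_uri][name].append(tag)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {uri: dict(per_doc) for uri, per_doc in fields.items()}
```

During a real crawl, each list would become one or more values of that field on the page's Lucene document. Keeping the submitter's ID in the field name keeps tags from different people separable at search time, at the cost of one field name per submitter.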