Difference between revisions of "User Supplied Metadata"

From Creative Commons

Revision as of 03:32, 23 June 2010

Contact: Raphael Krut-Landau
Status: In Development


The story from the user's point of view

A moment like this is fairly common. You have asked a search engine what it knows about a particular query. When the engine returns its list of results, you notice one that could be categorized more usefully. You want to tell the search engine: "Bring up this result when someone searches for such-and-such a word."

Requirements

This feature lets you, as a user of DiscoverEd, associate a tag with a search result that you see on your screen. Next to each search result there is a small link reading "Add a tag"; click it to open a brightly colored box where you can enter the tag. The box has a small "submit" link; click it and the tag immediately appears alongside any other tags that the engine already associates with the result.

Implementation

The brightly colored box mentioned above is an HTML form. The POST handler that accepts the user's submission creates (if necessary) a new Jena triple store whose URI represents the person who filled in the form. The handler then inserts a new RDF triple into this triple store:

result_uri, dct:subject, tag

(Side note: The word "subject" above might confuse you a bit if you are into RDF. In RDF, "subject" usually means the first element of a triple (subject, predicate, object). In Dublin Core Terms (DCT), "subject" means the topic of a resource. We use it here to mean "is tagged with".)
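As a rough illustration of what the POST handler does, here is a minimal sketch in Python. It does not use Jena: the `stores` dictionary is a hypothetical stand-in for the per-user triple stores, and the function name, the user URI format, and the empty-tag check are all assumptions made for the example. Only the predicate URI, which is the standard Dublin Core Terms identifier for `dct:subject`, comes from an external specification.

```python
from collections import defaultdict

# Full URI of the dct:subject predicate (Dublin Core Terms).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Hypothetical stand-in for the per-user Jena triple stores:
# maps a user's URI to a set of (subject, predicate, object) triples.
stores = defaultdict(set)


def handle_tag_submission(user_uri, result_uri, tag):
    """Record that the user identified by user_uri tagged result_uri with tag."""
    tag = tag.strip()
    if not tag:
        # Assumed behaviour: reject empty tags rather than store them.
        raise ValueError("tag must not be empty")
    triple = (result_uri, DCT_SUBJECT, tag)
    # defaultdict creates the user's store on first use ("if necessary").
    stores[user_uri].add(triple)
    return triple


# Example: a user submits the tag "physics" for one search result.
handle_tag_submission(
    "http://example.org/users/18",
    "http://example.org/some-page",
    "physics",
)
```

Keeping one store per user means each tag remains attributable to the person who submitted it, which is what the field naming at crawl time relies on.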

We want this new tag to appear whenever anybody, now or in the future, comes upon the search result in question using this particular installation of the DiscoverEd search engine. Here's how the engine will do that. From time to time, a webmaster asks their copy of DiscoverEd to "crawl", that is, to download copies of web pages from the internet and put the pages' text, and other information about them, into the search engine's Lucene database. We want to make sure that the user-submitted tag is included in the information we store in Lucene.

So there will be a bit of code that runs whenever you ask DiscoverEd to perform a crawl. During the crawl, when we are inserting information about a particular URL into the Lucene database, this code looks in all the Jena triple stores for any tags associated with that URL and adds them to the Lucene document for that URL. In the parlance of Lucene, it adds a new field (you could think of it as a new column). The field is named something like 18__dct_subject, where 18 identifies the user who submitted the tag via the brightly colored box mentioned above.
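The crawl-time step can be sketched in the same spirit. This is plain Python rather than Jena or Lucene code: the function name is hypothetical, the stores are assumed to be keyed by a numeric user id, and the Lucene document is represented by an ordinary dictionary. The only detail taken from the description above is the field-name pattern, such as 18__dct_subject.

```python
# Full URI of the dct:subject predicate (Dublin Core Terms).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def tag_fields_for(url, stores):
    """Collect user-submitted tags for one crawled URL.

    `stores` is assumed to map a numeric user id to an iterable of
    (subject, predicate, object) triples. Returns a dict mapping
    field names such as "18__dct_subject" to a sorted list of tags.
    """
    fields = {}
    for user_id, triples in stores.items():
        tags = sorted(
            obj for subj, pred, obj in triples
            if subj == url and pred == DCT_SUBJECT
        )
        if tags:
            # One field per user, named after that user's id.
            fields[f"{user_id}__dct_subject"] = tags
    return fields


# Example: two users' stores, only one of which mentions the crawled URL.
example_stores = {
    18: {("http://example.org/some-page", DCT_SUBJECT, "physics")},
    23: {("http://example.org/other-page", DCT_SUBJECT, "maths")},
}

# A dictionary standing in for the Lucene document built during the crawl.
document = {"url": "http://example.org/some-page", "content": "page text"}
document.update(tag_fields_for(document["url"], example_stores))
```

A real implementation would query each Jena store and add fields to an actual Lucene document; the sketch only shows how tags from several stores end up as separate, per-user fields on one document.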