Difference between revisions of "User Supplied Metadata"

From Creative Commons
Revision as of 02:58, 23 June 2010

Contact: Raphael Krut-Landau
Status: In Development


The story from the user's point of view

A moment like this is fairly common. You've asked a search engine what it knows about a particular query, "sustainability water" for instance. When the engine returns its list of results, you notice one result that could be categorized more effectively. You want to teach the engine a new fact about that ecological page.

Requirements

In this feature, we allow you, as a user of DiscoverEd, to associate a tag with a search result that you see on your screen. Next to each search result there is a small link reading "Add a tag"; click this to open a brightly colored box where you can enter the tag. The box has a small "submit" link; click this and you immediately see your tag alongside any other tags that the engine already associates with the result.

Implementation

The brightly colored box mentioned above is an HTML form. The POST handler that accepts the user's submission creates (if it does not already exist) a Jena triple store whose URI represents the person who filled in the form. The handler then inserts a new RDF triple into this triple store:

result_uri, dct:subject, tag

(Side note: The word "subject" above might confuse you a bit if you are into RDF. In RDF, "subject" usually means the subject of a triple (subject, predicate, object). In the Dublin Core terms (DCT), subject means a topic. We use it to pick out the concept, 'is tagged with'.)
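The storage step above can be sketched in a few lines. This is only an illustration of the idea of one triple store per submitter, written in plain Python with an in-memory dictionary standing in for Jena; the class name UserTagStores, its methods, and the example URIs are hypothetical and are not part of DiscoverEd. The only real-world constant is the Dublin Core terms URI for "subject".

```python
from collections import defaultdict

# Full URI that the dct:subject shorthand expands to (Dublin Core terms).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


class UserTagStores:
    """Hypothetical in-memory stand-in for one triple store per submitter."""

    def __init__(self):
        # Maps a submitter URI to the set of triples that person has added.
        # A store is created lazily, i.e. "if necessary".
        self._stores = defaultdict(set)

    def add_tag(self, submitter_uri, result_uri, tag):
        """Record (result_uri, dct:subject, tag) in the submitter's store."""
        triple = (result_uri, DCT_SUBJECT, tag)
        self._stores[submitter_uri].add(triple)
        return triple

    def triples_for(self, submitter_uri):
        """Return a copy of the triples in one submitter's store."""
        return set(self._stores.get(submitter_uri, ()))


# Example: the person identified by a (made-up) URI tags a search result.
stores = UserTagStores()
stores.add_tag(
    "http://example.org/users/18",
    "http://example.org/pages/water-reuse",
    "sustainability",
)
```

Keeping the submitter outside the triple, as the key of the store, is what lets the stored statement stay a plain three-part triple.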

We want to ensure that this new tag appears whenever anybody now or in the future chances upon the search result in question using this particular installation of the DiscoverEd search engine. Here's how the engine will do that. From time to time, a webmaster asks his copy of DiscoverEd to "crawl" — that is, to download copies of web pages from the internet and put their text, and other information about them, into the search engine's Lucene database. We want to make sure that the user-submitted tag is included among that information we store in Lucene.

So there'll be a bit of code that runs whenever you ask DiscoverEd to perform a crawl. During the crawl, when we are inserting information about a particular URL into the Lucene database, this bit of code looks in all the Jena triple stores for any tags associated with that URL. It then inserts these tags into Lucene as well. In the parlance of Lucene, it adds a new field to the document associated with the URL we're crawling. The field is named something like 18__dct_subject, where 18 signifies the user who submitted the tag via the brightly colored box mentioned above.
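As a rough sketch of the crawl-time lookup, the following plain Python gathers every user's tags for one URL and groups them under per-user field names. It does not use Lucene or Jena; the function names, the dictionary of stores keyed by a numeric user id, and the example URLs are all assumptions made for illustration. Only the naming pattern 18__dct_subject comes from the description above.

```python
DCT_SUBJECT = "dct:subject"


def field_name(user_id, predicate):
    """Build a field name such as '18__dct_subject' (hypothetical helper)."""
    return f"{user_id}__{predicate.replace(':', '_')}"


def user_tag_fields(url, stores):
    """Collect tags for `url` from every user's store.

    `stores` maps a user id to an iterable of (subject, predicate, object)
    triples. Returns a dict of field name -> list of tag values, which a
    crawler could then add to the document it is building for `url`.
    """
    fields = {}
    for user_id, triples in stores.items():
        for subject, predicate, obj in triples:
            if subject == url and predicate == DCT_SUBJECT:
                fields.setdefault(field_name(user_id, predicate), []).append(obj)
    return fields


# Example data: two users have tagged the same page, one tagged another page.
stores = {
    18: [
        ("http://example.org/a", "dct:subject", "water"),
        ("http://example.org/b", "dct:subject", "energy"),
    ],
    7: [("http://example.org/a", "dct:subject", "sustainability")],
}
fields = user_tag_fields("http://example.org/a", stores)
```

Putting the user id in the field name keeps each person's tags separate inside a single document, so a later query can still distinguish who supplied which tag.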