User Supplied Metadata

From Creative Commons
Latest revision as of 04:26, 23 June 2010

Contact: Raphael Krut-Landau
Status: In Development


The story from the user's point of view

A moment like this is fairly common. You've asked a search engine what it knows about a particular query. When the engine returns its listing of results, you see a result that could be categorized more usefully. You want to tell the search engine: bring up this result whenever a user searches for such-and-such a word.

Requirements

In this feature, we allow you, as a user of DiscoverEd, to associate a tag with a search result that you see on your screen. Next to each search result there is a small link reading "Add a tag"; click this to open a brightly colored box where you can enter the tag. The box has a small "submit" link; click this and you immediately see your tag alongside any other tags that the engine already associates with the result.

Implementation

The brightly colored box mentioned above is an HTML form. The POST handler that accepts the user's submission creates (if necessary) a new Jena triple store whose URI represents the person who filled in the form. The handler then inserts a new RDF triple into this triple store:

result_uri, dct:subject, tag

(Side note: The word "subject" above might confuse you a bit if you are into RDF. In RDF, "subject" usually means the subject of a triple (subject, predicate, object). In the Dublin Core terms (DCT), subject means a topic. We use it here to mean "is tagged with".)
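The handler's behavior described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the DiscoverEd code: the class and method names (Triple, TagStoreSketch, addTag, tagsFor) and the example.org URIs are hypothetical, and a plain in-memory map stands in for the Jena triple stores so that the sketch runs without any libraries. The one fixed detail is the full URI behind the dct:subject shorthand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for an RDF triple; a real build would use Jena instead.
final class Triple {
    final String subject;   // the search result's URI
    final String predicate; // here always dct:subject
    final String object;    // the tag text

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class TagStoreSketch {
    // Full URI behind the dct:subject shorthand (Dublin Core Terms).
    static final String DCT_SUBJECT = "http://purl.org/dc/terms/subject";

    // One list of triples per tagger URI; stands in for "one triple store per user".
    private final Map<String, List<Triple>> storesByTagger = new LinkedHashMap<>();

    // What the POST handler does: create the tagger's store if necessary,
    // then insert (result_uri, dct:subject, tag) into it.
    void addTag(String taggerUri, String resourceUri, String tag) {
        storesByTagger
            .computeIfAbsent(taggerUri, uri -> new ArrayList<>())
            .add(new Triple(resourceUri, DCT_SUBJECT, tag));
    }

    // All tags that one tagger has attached to one resource.
    List<String> tagsFor(String taggerUri, String resourceUri) {
        List<String> tags = new ArrayList<>();
        for (Triple t : storesByTagger.getOrDefault(taggerUri, new ArrayList<>())) {
            if (t.subject.equals(resourceUri) && t.predicate.equals(DCT_SUBJECT)) {
                tags.add(t.object);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        TagStoreSketch stores = new TagStoreSketch();
        stores.addTag("http://example.org/users/18", "http://example.org/lesson", "algebra");
        System.out.println(stores.tagsFor("http://example.org/users/18", "http://example.org/lesson"));
    }
}
```

Keeping one store per tagger, rather than one shared store, is what later lets the engine tell which user contributed which tag.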

We want to ensure that this new tag appears whenever anybody, now or in the future, comes across the search result in question using this particular installation of the DiscoverEd search engine. Here's how the engine will do that. From time to time, a webmaster asks their copy of DiscoverEd to "crawl", that is, to download copies of web pages from the internet and put their text, and other information about them, into the search engine's Lucene database. We want to make sure that the user-submitted tag is included in the information we store in Lucene.

So there will be a bit of code that runs whenever you ask DiscoverEd to perform a crawl. During the crawl, when we are inserting information about a particular URL into the Lucene database, this code looks in all the Jena triple stores for any tags associated with that URL and inserts them into Lucene as well. In the parlance of Lucene, it adds a new field (you could think of it as a new column) to the Lucene document associated with the URL we're crawling. The field is named something like 18__dct_subject, where the number 18 signifies the user who submitted the tag via the brightly colored box mentioned above.
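As a rough sketch of that crawl-time step, the following Java collects, for one URL, the tags found in each user's store and groups them under a per-user field name of the form 18__dct_subject. Everything here is an assumption made for illustration: the class and method names are hypothetical, users are assumed to be identified by an integer, and nested maps stand in for both the Jena stores and the Lucene document, so no Lucene or Jena calls appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CrawlTagFieldsSketch {
    // Builds the field name for one tagger, e.g. 18 -> "18__dct_subject".
    static String fieldNameFor(int taggerId) {
        return taggerId + "__dct_subject";
    }

    // taggedByUser: tagger id -> (resource URL -> tags), a stand-in for the
    // per-user triple stores. The result maps field name -> values and is a
    // stand-in for the fields added to the document being indexed.
    static Map<String, List<String>> tagFieldsFor(
            String url, Map<Integer, Map<String, List<String>>> taggedByUser) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<Integer, Map<String, List<String>>> store : taggedByUser.entrySet()) {
            List<String> tags = store.getValue().get(url);
            if (tags == null || tags.isEmpty()) {
                continue; // this user never tagged the URL
            }
            fields.put(fieldNameFor(store.getKey()), new ArrayList<>(tags));
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, List<String>> user18 = new LinkedHashMap<>();
        user18.put("http://example.org/lesson", Arrays.asList("algebra"));

        Map<Integer, Map<String, List<String>>> stores = new LinkedHashMap<>();
        stores.put(18, user18);

        // Fields that would accompany the crawled page into the index.
        System.out.println(tagFieldsFor("http://example.org/lesson", stores));
    }
}
```

A URL that nobody has tagged yields an empty map, so the crawl stores nothing extra for it.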

What works

Look in the branch add_tagging_form (at the time of writing, this pointed to commit a2af4aea3270e4a663abc2eb89c310e1ab5148c8, http://gitorious.org/discovered/repo/commit/a2af4aea3270e4a663abc2eb89c310e1ab5148c8).

  • We can add a tag to the RdfStore and retrieve it, using the bean API for both adding and retrieving. (Nothing crazy-special.)
  • The search results JSP has the add-a-tag form.

What needs to be done

  • Make org.creativecommons.learn.test.AddATag.testCheckThatResourceIsSearchableViaTag pass.
  • Write a test verifying that the HTML form submits to a POST handler which adds a tag to the RdfStore. This code adds a tag: org.creativecommons.learn.Tag.add(taggerURI, resourceURI, tag);