| Contact | Asheesh Laroia |
|---|---|
| Project | AgShare |
| Status | In Development |
The people who run a DiscoverEd instance may wish to let users search specific metadata easily. For example, http://discovered.labs.creativecommons.org/ lets users search for works "tagged" with "banana" by searching for tag:banana. (The predicate behind "tag" is the term "subject" as specified by the Dublin Core.)
These prefixes, like tag:, are currently hard-coded in the DiscoverEd source. This feature aims to move them into a configuration file.
This feature was defined and developed during the June 2010 DiscoverEd Sprint.
Requirements
When DiscoverEd crawls feeds and resources and saves metadata such as the page title, it converts this information into RDF triples; those triples are eventually saved on disk in a triple store, namely Jena.
We will create a new configuration file that stores a list of mappings from predicate URIs to search prefixes. For example, we might list "method:" as a shorthand for the RDF predicate <http://purl.org/dc/terms/instructionalMethod>, a.k.a. "dct:instructionalMethod". At indexing time, a Lucene column called "method" will be created in the Lucene document for each resource that has the dct:instructionalMethod predicate set in the Jena store.
Then, at search time, Nutch's built-in query parser handles the query, e.g., "method:yaddayadda".
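The indexing and search steps described above can be illustrated with a small, self-contained Python sketch. It does not use Jena, Lucene, or Nutch: triples are plain tuples, a document is a dictionary of field names to value lists, and the names PREFIXES, build_fields, and search are hypothetical. Matching here is exact and case-insensitive, which is a simplification of what a real Lucene analyzer and the Nutch query parser would do.

```python
from collections import defaultdict

# Hypothetical mapping from predicate URI to search prefix,
# mirroring what the new configuration file would hold.
PREFIXES = {
    "http://purl.org/dc/terms/instructionalMethod": "method",
}


def build_fields(triples, prefixes):
    """Group object values by resource and by configured field name.

    `triples` is an iterable of (subject, predicate, object) strings.
    Triples whose predicate is not listed in `prefixes` are skipped.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        field = prefixes.get(predicate)
        if field is not None:
            docs[subject][field].append(obj)
    return docs


def search(docs, query):
    """Return sorted subjects matching a 'prefix:value' query."""
    field, _, value = query.partition(":")
    wanted = value.strip('"').lower()
    return sorted(
        subject
        for subject, fields in docs.items()
        if wanted in (v.lower() for v in fields.get(field, []))
    )


# Made-up sample data; example.org URLs are placeholders.
triples = [
    ("http://example.org/a", "http://purl.org/dc/terms/instructionalMethod", "Experiential learning"),
    ("http://example.org/a", "http://purl.org/dc/terms/title", "Page A"),
    ("http://example.org/b", "http://purl.org/dc/terms/instructionalMethod", "Lecture"),
]
docs = build_fields(triples, PREFIXES)
print(search(docs, 'method:"Experiential learning"'))
```

Only predicates listed in the mapping become searchable fields, so the dct:title triple in the sample data is ignored until a prefix is configured for it.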
How to use
Note: There is an implementation of this in the current version of DiscoverEd (as of 2010-08-30), but it ignores the excludecurator argument.
Let's say you want to allow users to perform this query:
method:"Experiential learning"
and retrieve all web pages in your index that have a metadatum with predicate <http://purl.org/dc/terms/instructionalMethod> and value "Experiential learning".
To do so, first edit conf/nutch-site.xml. Add this XML inside the <configuration> block.
<property>
  <name>query.basic.method.boost</name>
  <value>1.0</value>
</property>
This block of XML tells Nutch to accept the "method:" prefix in search queries. The value of this property indicates the weight the search engine should assign to this term.
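To see which prefixes a given configuration enables, the properties can be read with a short script. The following is a minimal sketch, assuming the query.basic.<prefix>.boost naming pattern shown above and the <configuration>/<property> layout. The function name enabled_prefixes and the inline sample (including the example-crawler value) are hypothetical, and this is not how Nutch itself loads its configuration.

```python
import re
import xml.etree.ElementTree as ET

# Inline stand-in for conf/nutch-site.xml; contents are illustrative only.
SAMPLE = """
<configuration>
  <property>
    <name>query.basic.method.boost</name>
    <value>1.0</value>
  </property>
  <property>
    <name>http.agent.name</name>
    <value>example-crawler</value>
  </property>
</configuration>
"""

# Assumed naming pattern, taken from the property shown in this page.
BOOST_NAME = re.compile(r"^query\.basic\.(?P<prefix>[^.]+)\.boost$")


def enabled_prefixes(xml_text):
    """Return {prefix: boost} for every property matching the pattern."""
    root = ET.fromstring(xml_text)
    boosts = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        match = BOOST_NAME.match(name)
        if match:
            boosts[match.group("prefix")] = float(prop.findtext("value"))
    return boosts


print(enabled_prefixes(SAMPLE))
```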
Next, edit conf/discovered-search-prefixes.xml. Add this XML inside the <configuration> block.
<property>
  <name>http://purl.org/dc/terms/instructionalMethod</name>
  <value>method</value>
</property>
This block of XML tells DiscoverEd to copy the values of this predicate out of the Jena store and into a Lucene column named "method", where Nutch's basic query parser can find them.
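Because the two files must agree on the prefix name, a quick consistency check can catch a prefix that is mapped but not enabled for search. This is a rough sketch under the assumptions that both files use the <property> name/value layout shown above and that every prefix needs a matching query.basic.<prefix>.boost property. The helper names load_prefix_map and missing_boosts are hypothetical and are not part of DiscoverEd.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for conf/discovered-search-prefixes.xml.
PREFIXES_XML = """
<configuration>
  <property>
    <name>http://purl.org/dc/terms/instructionalMethod</name>
    <value>method</value>
  </property>
</configuration>
"""


def load_prefix_map(xml_text):
    """Return {predicate URI: prefix} from <property> name/value pairs."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for prop in root.iter("property"):
        predicate = (prop.findtext("name") or "").strip()
        prefix = (prop.findtext("value") or "").strip()
        if predicate and prefix:
            mapping[predicate] = prefix
    return mapping


def missing_boosts(prefix_map, nutch_property_names):
    """List prefixes that lack a query.basic.<prefix>.boost property."""
    names = set(nutch_property_names)
    return sorted(
        prefix
        for prefix in set(prefix_map.values())
        if f"query.basic.{prefix}.boost" not in names
    )


prefix_map = load_prefix_map(PREFIXES_XML)
print(prefix_map)
print(missing_boosts(prefix_map, ["query.basic.method.boost"]))
print(missing_boosts(prefix_map, []))
```

An empty list from missing_boosts means every mapped prefix has a corresponding boost property in the list of names passed in.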
Implementation
- Added a sample configuration file
- Added code to our IndexFilter that looks for relevant triples and stores them in the Lucene document
- Fixed the underlying bug that had kept these columns from appearing in Lucene.
Next steps
- Rewrite this feature to be compatible with the excludecurator argument.
This page was last modified on 30 August 2010, at 19:24.