Field Query Mapping
Contact | Asheesh Laroia |
---|---|
Status | In Development |
People who run a DiscoverEd instance may wish to let users search specific metadata easily. For example, http://discovered.labs.creativecommons.org/ lets users search for works "tagged" with "banana" by searching for tag:banana. (In particular, the predicate for "tag" is the term "subject" as specified by the Dublin Core.)
These prefixes, like tag:, are currently hard-coded in the DiscoverEd source. This feature aims to move them into a configuration file.
This feature was defined and developed during the June 2010 DiscoverEd Sprint.
Requirements
When DiscoverEd crawls feeds and resources and saves metadata such as the page title, it converts this information into RDF triples; those triples are eventually saved on disk in a triple store, namely Jena.
We will create a new configuration file that stores a list of mappings from RDF predicate URIs to search prefixes. For example, we might list "method:" as a shorthand for the RDF predicate <http://purl.org/dc/terms/instructionalMethod>, a.k.a. "dct:instructionalMethod". At indexing time, a Lucene field (called a "column" on this page) named "method" will be created in the Lucene document for each resource that has the dct:instructionalMethod predicate set in the Jena store.
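The indexing step described above can be sketched roughly as follows. This is only an illustration in Python, not DiscoverEd's actual Java IndexFilter: the function name `fields_for_resource`, the `PREFIX_MAP` dictionary, the tuple representation of triples, and the example.org URIs are all assumptions made for the example.

```python
# Hypothetical mapping that mirrors the example in the text:
# RDF predicate URI -> search prefix, which is also the index field name.
PREFIX_MAP = {
    "http://purl.org/dc/terms/instructionalMethod": "method",
}


def fields_for_resource(resource_uri, triples, prefix_map):
    """Collect index fields for one resource.

    `triples` is an iterable of (subject, predicate, object) tuples,
    standing in for what would be read from the triple store.
    """
    fields = {}
    for subject, predicate, obj in triples:
        if subject != resource_uri:
            continue  # triple describes a different resource
        field_name = prefix_map.get(predicate)
        if field_name is None:
            continue  # predicate has no configured search prefix
        fields.setdefault(field_name, []).append(obj)
    return fields


# Made-up sample data for illustration only.
triples = [
    ("http://example.org/lesson1",
     "http://purl.org/dc/terms/instructionalMethod",
     "Experiential learning"),
    ("http://example.org/lesson1",
     "http://purl.org/dc/terms/title",
     "Lesson 1"),
    ("http://example.org/lesson2",
     "http://purl.org/dc/terms/instructionalMethod",
     "Lecture"),
]

doc_fields = fields_for_resource("http://example.org/lesson1", triples, PREFIX_MAP)
print(doc_fields)  # {'method': ['Experiential learning']}
```

Only predicates listed in the mapping produce a field, so the title triple is ignored here. A resource with no matching triples simply gets no extra fields.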
Then, at search time, Nutch's built-in query parser handles the query, e.g., "method:yaddayadda".
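To make the search-time behaviour concrete, here is a deliberately simplified Python sketch of what a prefixed query amounts to. It is not Nutch's query parser, which handles much more syntax. The helper name `split_prefixed_query` and the `known_prefixes` set are hypothetical.

```python
def split_prefixed_query(query, known_prefixes):
    """Split 'prefix:value' into (prefix, value).

    Returns (None, query) when the query has no recognised prefix,
    so it would be treated as an ordinary full-text query.
    """
    prefix, sep, rest = query.partition(":")
    if sep and prefix in known_prefixes:
        # Drop surrounding whitespace and optional phrase quotes.
        return prefix, rest.strip().strip('"')
    return None, query


known = {"method"}  # prefixes assumed to be configured

print(split_prefixed_query("method:yaddayadda", known))
print(split_prefixed_query('method:"Experiential learning"', known))
print(split_prefixed_query("banana", known))
```

A recognised prefix selects the index field to search, and the remainder of the query is the value to match in that field.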
How to use
Note: There is an implementation of this in the current version of DiscoverEd (as of 2010-08-30), but it ignores the excludecurator argument.
Let's say you want to allow users to perform this query:
method:"Experiential learning"
and retrieve all web pages in your index that have a metadatum with predicate <http://purl.org/dc/terms/instructionalMethod> and value "Experiential learning".
To do so, first edit conf/nutch-site.xml. Add this XML inside the <configuration> block:
    <property>
      <name>query.basic.method.boost</name>
      <value>1.0</value>
    </property>
This block of XML tells Nutch to accept the "method:" prefix in search queries. The value of this property indicates the weight the search engine should assign to this term.
Next, edit conf/discovered-search-prefixes.xml. Add this XML inside the <configuration> block:
    <property>
      <name>http://purl.org/dc/terms/instructionalMethod</name>
      <value>method</value>
    </property>
This block of XML tells DiscoverEd to copy the values of that predicate out of the Jena store and into a Lucene field named "method", where Nutch's basic query parser can find them.
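Since the two files have to agree on the prefix name, a quick consistency check can be useful. The sketch below is a standalone Python illustration, not part of DiscoverEd. It parses inline copies of the two snippets above instead of reading the real files, and the helper names `read_properties` and `prefixes_missing_boost` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline copies of the two snippets above, wrapped in <configuration>.
NUTCH_SITE = """<configuration>
  <property>
    <name>query.basic.method.boost</name>
    <value>1.0</value>
  </property>
</configuration>"""

SEARCH_PREFIXES = """<configuration>
  <property>
    <name>http://purl.org/dc/terms/instructionalMethod</name>
    <value>method</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a <configuration> block."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.findall("property")
    }


def prefixes_missing_boost(prefix_props, nutch_props):
    """List prefixes that have no matching query.basic.<prefix>.boost property."""
    return sorted(
        prefix
        for prefix in prefix_props.values()
        if f"query.basic.{prefix}.boost" not in nutch_props
    )


prefix_props = read_properties(SEARCH_PREFIXES)
nutch_props = read_properties(NUTCH_SITE)

print(prefix_props)
print(prefixes_missing_boost(prefix_props, nutch_props))  # [] means the files agree
```

To check a real installation, the same functions could be fed the contents of conf/nutch-site.xml and conf/discovered-search-prefixes.xml instead of the inline strings.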
Implementation
- Added a sample configuration file
- Added code to our IndexFilter that looks for relevant triples and stores them in the Lucene document
- We had a problem with these columns not appearing in Lucene, but we fixed the underlying bug that caused that.
Next steps
- Rewriting this to be compatible with excludecurator.