Difference between revisions of "Field Query Mapping"
Dithyramble (talk | contribs) (→How to use) |
Paulproteus (talk | contribs) |
Latest revision as of 20:24, 30 August 2010
Contact | Asheesh Laroia |
---|---|
Status | In Development |
People who run a DiscoverEd instance may wish to let users search specific metadata easily. For example, http://discovered.labs.creativecommons.org/ lets users search for works "tagged" with "banana" by searching for tag:banana. (In particular, the predicate behind "tag" is the Dublin Core term "subject".)
Prefixes like tag: are currently hard-coded in the DiscoverEd source. This feature aims to move them into a configuration file.
This feature was defined and developed during the June 2010 DiscoverEd Sprint.
Requirements
When DiscoverEd crawls feeds and resources and saves metadata such as the page title, it converts this information into RDF triples; those triples are eventually saved on disk in a triple store, namely Jena.
We will create a new configuration file that stores a list of mappings from RDF predicate URIs to search prefixes. For example, we might list "method:" as a shorthand for the RDF predicate <http://purl.org/dc/terms/instructionalMethod>, a.k.a. "dct:instructionalMethod". At indexing time, a Lucene column called "method" will be created in the Lucene document for each resource that has the dct:instructionalMethod predicate set in the Jena store.
Then, at search time, Nutch's built-in query parser handles the query, e.g., "method:yaddayadda".
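The indexing step described above can be sketched roughly as follows. This is a minimal Python illustration, not DiscoverEd's actual Java code: the names PREFIXES and build_document, the example.org URIs, and the tuple layout of the triples are all assumptions made for the example.

```python
# Hypothetical mapping from predicate URI to search prefix (column name).
PREFIXES = {
    "http://purl.org/dc/terms/instructionalMethod": "method",
}


def build_document(resource_uri, triples, prefixes=PREFIXES):
    """Collect values of mapped predicates for one resource into index columns.

    `triples` is an iterable of (subject, predicate, object) string tuples,
    standing in for what would be read from the Jena store.
    """
    document = {}
    for subject, predicate, obj in triples:
        if subject != resource_uri:
            continue  # triple describes some other resource
        column = prefixes.get(predicate)
        if column is None:
            continue  # predicate has no configured prefix
        document.setdefault(column, []).append(obj)
    return document


# Made-up triples for illustration only.
triples = [
    ("http://example.org/lesson",
     "http://purl.org/dc/terms/instructionalMethod",
     "Experiential learning"),
    ("http://example.org/lesson",
     "http://purl.org/dc/terms/title",
     "A lesson"),
    ("http://example.org/other",
     "http://purl.org/dc/terms/instructionalMethod",
     "Lecture"),
]
doc = build_document("http://example.org/lesson", triples)
```

Only the predicate that appears in the mapping ends up as a column; the title triple and the triple about the other resource are skipped.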
How to use
Note: There is an implementation of this in the current version of DiscoverEd (as of 2010-08-30), but it ignores the excludecurator argument.
Let's say you want to allow users to perform this query:
method:"Experiential learning"
and retrieve all web pages in your index that have a metadatum with predicate <http://purl.org/dc/terms/instructionalMethod> and value "Experiential learning".
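To make the shape of such a query concrete, here is a small Python sketch that splits a prefixed query into its prefix and value. Nutch's own query parser does this work in a real deployment; the function name split_prefix_query and the regular expression are assumptions made purely for illustration.

```python
import re

# A prefix, a colon, then either a double-quoted phrase or a single bare word.
_PREFIX_QUERY = re.compile(
    r'^(?P<prefix>\w+):(?:"(?P<phrase>[^"]*)"|(?P<word>\S+))$'
)


def split_prefix_query(query):
    """Return (prefix, value) for a prefixed query, or None if there is no prefix."""
    match = _PREFIX_QUERY.match(query.strip())
    if match is None:
        return None
    phrase = match.group("phrase")
    value = phrase if phrase is not None else match.group("word")
    return match.group("prefix"), value


parsed = split_prefix_query('method:"Experiential learning"')
```

The prefix selects which column to search, and the value is what must match within that column.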
To do so, first edit conf/nutch-site.xml. Add this XML inside the <configuration> block.
    <property>
      <name>query.basic.method.boost</name>
      <value>1.0</value>
    </property>
This block of XML tells Nutch to accept the "method:" prefix in search queries. The value of this property indicates the weight the search engine should assign to this term.
Next, edit conf/discovered-search-prefixes.xml. Add this XML inside the <configuration> block.
    <property>
      <name>http://purl.org/dc/terms/instructionalMethod</name>
      <value>method</value>
    </property>
This block of XML tells DiscoverEd to copy the values of that predicate out of the Jena store and into the Lucene document under the name "method", where Nutch's basic query parser can find them.
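Because the two files must agree, a quick consistency check can help when adding a prefix. The Python sketch below is an assumption-laden illustration, not part of DiscoverEd: the helper names read_properties and prefixes_missing_boost are hypothetical, and it assumes both files use the <configuration>/<property>/<name>/<value> layout shown above.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse a <configuration> document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.iter("property")
    }


def prefixes_missing_boost(prefix_xml, nutch_site_xml):
    """Return prefixes that are mapped but have no query.basic.<prefix>.boost."""
    prefixes = set(read_properties(prefix_xml).values())
    site = read_properties(nutch_site_xml)
    return sorted(p for p in prefixes if f"query.basic.{p}.boost" not in site)


# Inline stand-ins for conf/discovered-search-prefixes.xml and conf/nutch-site.xml.
PREFIX_XML = """<configuration>
  <property>
    <name>http://purl.org/dc/terms/instructionalMethod</name>
    <value>method</value>
  </property>
</configuration>"""

NUTCH_SITE_XML = """<configuration>
  <property>
    <name>query.basic.method.boost</name>
    <value>1.0</value>
  </property>
</configuration>"""

missing = prefixes_missing_boost(PREFIX_XML, NUTCH_SITE_XML)
```

An empty result means every mapped prefix also has a boost entry; any prefix listed in the result would need a matching property in conf/nutch-site.xml.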
Implementation
- Added a sample configuration file
- Added code to our IndexFilter that looks for relevant triples and stores them in the Lucene document
- These columns initially did not appear in Lucene; we fixed the underlying bug that caused that.
Next steps
- Rewrite this to be compatible with excludecurator.