Difference between revisions of "Field Query Mapping"

Revision as of 21:06, 21 June 2010

Contact	Asheesh Laroia
Status	In Development

People who run a DiscoverEd instance may wish to let users search specific metadata fields easily. For example, http://discovered.creativecommons.org/search/ lets users find works "tagged" with "banana" as a Dublin Core subject by searching for tag:banana.

These prefixes, like tag:, are currently hard-coded in DiscoverEd. This feature aims to move them into a configuration file.

This feature was defined and developed during the June 2010 DiscoverEd Sprint.

Requirements

When DiscoverEd crawls feeds and resources and saves metadata such as the page title, it converts this information into RDF triples; those triples are eventually saved on disk in a triple store, Jena.

We will create a new configuration file that stores a list of mappings from search prefixes to RDF predicate URIs. For example, a mapping could state that "method:" is shorthand for the RDF predicate http://purl.org/dc/terms/instructionalMethod (AKA dct:instructionalMethod). At indexing time, a Lucene field (column) called "method" will be created in the Lucene document for each resource that has the dct:instructionalMethod predicate set.
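The mapping idea can be illustrated with a short Python sketch. DiscoverEd itself is built on Nutch, so this is not the project's code: the names FIELD_MAPPINGS and build_index_fields, the tuple representation of triples, and the example.org resource are all hypothetical, and a plain dict stands in for the configuration file, whose format is not specified here.

```python
from collections import defaultdict

# Hypothetical mapping table: search prefix -> RDF predicate URI.
# In the planned feature this would come from the configuration file.
FIELD_MAPPINGS = {
    "method": "http://purl.org/dc/terms/instructionalMethod",
}


def build_index_fields(triples, mappings=FIELD_MAPPINGS):
    """Group the objects of mapped predicates by resource and field name.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    Triples whose predicate has no mapping are ignored.
    """
    predicate_to_field = {uri: name for name, uri in mappings.items()}
    docs = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        field = predicate_to_field.get(predicate)
        if field is not None:
            docs[subject][field].append(obj)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {subject: dict(fields) for subject, fields in docs.items()}


# Made-up example data: one mapped predicate and one unmapped predicate.
triples = [
    ("http://example.org/course/1",
     "http://purl.org/dc/terms/instructionalMethod", "lecture"),
    ("http://example.org/course/1",
     "http://purl.org/dc/terms/title", "Intro to Botany"),
]
fields = build_index_fields(triples)
```

In this sketch only the dct:instructionalMethod triple produces a "method" entry for the resource; the title triple is skipped because no prefix maps to it.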

Then, at search time, Nutch's built-in query parser handles the query.
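To make the intended search-time behavior concrete, here is a minimal Python sketch of how a prefixed query such as method:lecture could be split into a field name and a term. This is not Nutch's query parser and does not reflect its API; the function split_prefixed_query and the known_fields argument are hypothetical and only illustrate the prefix convention described above.

```python
def split_prefixed_query(query, known_fields):
    """Split 'prefix:term' into (prefix, term) when the prefix is configured.

    Returns (None, query) for plain queries or unknown prefixes, so that
    something like 'http://example.org' is not mistaken for a field search.
    """
    prefix, sep, term = query.partition(":")
    if sep and term and prefix in known_fields:
        return prefix, term
    return None, query


field, term = split_prefixed_query("method:lecture", {"method"})
plain_field, plain_term = split_prefixed_query("banana", {"method"})
```

Here the first call yields the field "method" with the term "lecture", while the unprefixed query is passed through unchanged as an ordinary full-text search.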

Implementation

Added a sample configuration file
Added code to our IndexFilter that looks for relevant triples and stores them in the Lucene document
Problem: The Lucene documents do not seem to show our column, so we're going back to the drawing board and carefully reading the relevant Nutch documentation (http://wiki.apache.org/nutch/HowToMakeCustomSearch) to make sure we're using the APIs correctly

Deferred until later

Handling provenance with regard to this feature.

