Difference between revisions of "Metadata Retriever Plugins"

(Deferred until later)
Line 1: Line 1:
 
{{DiscoverEd Specification
 
{{DiscoverEd Specification
|status=In Development
+
|contact=Nathan Yergler
 +
|project=AgShare, FSKN
 +
|status=Complete
 
}}
 
}}
==First Phase==
+
Operators may wish to include metadata about Resources from other sources, including web services (ie, Semantic Analysis), databases, etc.  This describes a plugin system for adding sources of information at aggregation time.
Operators may wish to include metadata about Resources from other sources, including web services (ie, Semantic Analysis), databases, or something else.  This describes how to implement a plugin to import this information.
 
  
A plugin implements IMetadataImporter, which provides a single method, loadMetadata.  loadMetadata takes a Resource, and is responsible for adding any fields to it.  loadMetadata is not responsible for persisting the Resource.
+
== Requirements ==
  
The initial prototype will log URLs as it goes.
+
Metadata retriever plugins are Nutch plugins which implement the  MetadataRetriever extension point.  MetadataRetriever extensions must implement a single method, <code>retrieve</code>.  <code>retrieve</code> takes a Resource as an argument, and may add additional metadata to it.  <code>retrieve</code> is not responsible for persisting the Resource.
  
The first working plug-in could exercise the del.icio.us API to retrieve tags that delicious users have placed on a URI.  Example: <code>api.del.icio.us</code> <code>/v1/posts/suggest?url=http://ocw.nd.edu/computer-applications/applied-multimedia-technology</code>
+
We will implement two demonstration plugins: a "dummy" plugin which logs the URLs being passed to it, and a functional demonstration which uses the Delicious API to [http://delicious.com/help/api#posts_suggest retrieve suggested and popular tags] for a page.
  
To accomplish this...
+
== Implementation ==
*enable resources to allow dynamic metadata to be attached (by plug-ins) through a new method on "resource"
 
**method that adds an assertion to resource
 
***checks to see if the assertion exists, uses if exists, creates if not exist
 
***populates assertion with data
 
  
**method that gets all the assertions from resource
+
* Added support for storing arbitrary metadata on Resource objects
*associate each custom metadata field with a URI-namespace to avoid cardinality
+
* Added support to TripleStore for serialization an de-serialization
*Create plug-in; call from aggregate step; use resource's save method;
+
* Implemented the org.creativecommons.learn.plugin.MetadataRetriever extension point and org.creativecommons.learn.plugin.MetadataRetrievers extension loader
 +
* Implemented test plugins
  
 
==Deferred until later==
 
==Deferred until later==
*Allow configuration of plug-ins
+
 
*Alter order in which plug-ins run (to determine which are authoritative)
+
* Configuration parameters for plugins (the Delicious plugin reads from discovered.xml, but the plugin.xml manifest doesn't state that it needs parameters).
 +
* Allow sequencing of MetadataRetriever plugins, to determine which are authoritative

Revision as of 14:15, 17 June 2010

Contact: Nathan Yergler
Project: AgShare, FSKN
Status: Complete

Operators may wish to include metadata about Resources from other sources, such as web services (e.g., semantic analysis) or databases. This page describes a plugin system for adding such sources of information at aggregation time.

Requirements

Metadata retriever plugins are Nutch plugins that implement the MetadataRetriever extension point. A MetadataRetriever extension must implement a single method, retrieve, which takes a Resource as an argument and may add metadata to it. The method is not responsible for persisting the Resource.

We will implement two demonstration plugins: a "dummy" plugin which logs the URLs being passed to it, and a functional demonstration which uses the Delicious API to retrieve suggested and popular tags for a page.
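
As a rough illustration of the contract described above, the following is a minimal sketch of a one-method retriever interface and a "dummy" retriever that logs each URL. It is not the DiscoverEd source: the Resource class here is a hypothetical stand-in that models only a URL, the exact signature of retrieve is assumed, and the seenUrls helper exists purely so the behaviour can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for the DiscoverEd Resource class; only the URL is modelled.
class Resource {
    private final String url;

    Resource(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

// Assumed shape of the extension point: a single method that may enrich
// the Resource but never persists it.
interface MetadataRetriever {
    void retrieve(Resource resource);
}

// "Dummy" retriever: logs each URL it is handed and adds no metadata.
class LoggingRetriever implements MetadataRetriever {
    private static final Logger LOG =
            Logger.getLogger(LoggingRetriever.class.getName());

    // Kept only so callers can check what was seen (illustrative helper).
    private final List<String> seen = new ArrayList<>();

    @Override
    public void retrieve(Resource resource) {
        seen.add(resource.getUrl());
        LOG.info("retrieve called for " + resource.getUrl());
    }

    List<String> seenUrls() {
        return seen;
    }
}

class LoggingRetrieverDemo {
    public static void main(String[] args) {
        LoggingRetriever retriever = new LoggingRetriever();
        retriever.retrieve(new Resource("http://example.org/course"));
        System.out.println(retriever.seenUrls());
    }
}
```

Because the retriever does not save anything itself, the caller (here, the aggregation step) stays in charge of when and whether the Resource is written back.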

Implementation

  • Added support for storing arbitrary metadata on Resource objects
  • Added support to TripleStore for serialization and de-serialization
  • Implemented the org.creativecommons.learn.plugin.MetadataRetriever extension point and org.creativecommons.learn.plugin.MetadataRetrievers extension loader
  • Implemented test plugins
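
The pieces listed above might fit together roughly as follows. This is a sketch under stated assumptions, not the actual implementation: the addField and getField methods, the retrieveAll method, and the predicate URI are all hypothetical names chosen for illustration, and the real MetadataRetrievers loader presumably discovers extensions through the Nutch plugin machinery rather than taking a list in its constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Resource that stores arbitrary metadata keyed by predicate URI.
class Resource {
    private final String url;
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    Resource(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }

    // Append a value under the given predicate, creating the entry if needed.
    void addField(String predicateUri, String value) {
        fields.computeIfAbsent(predicateUri, k -> new ArrayList<>()).add(value);
    }

    // Values stored under the predicate, or an empty list if there are none.
    List<String> getField(String predicateUri) {
        List<String> values = fields.get(predicateUri);
        return values == null ? Collections.<String>emptyList() : values;
    }
}

// Assumed shape of the extension point.
interface MetadataRetriever {
    void retrieve(Resource resource);
}

// Simplified stand-in for the extension loader: runs every retriever in turn.
class MetadataRetrievers {
    private final List<MetadataRetriever> retrievers;

    MetadataRetrievers(List<MetadataRetriever> retrievers) {
        this.retrievers = new ArrayList<>(retrievers);
    }

    // Enriches the Resource in memory; persisting it is left to the caller.
    void retrieveAll(Resource resource) {
        for (MetadataRetriever retriever : retrievers) {
            retriever.retrieve(resource);
        }
    }
}

class MetadataRetrieversDemo {
    // Hypothetical predicate URI, used only for this example.
    static final String TAG = "http://example.org/ns#tag";

    public static void main(String[] args) {
        MetadataRetriever tagger = r -> r.addField(TAG, "multimedia");
        Resource resource = new Resource("http://example.org/course");
        new MetadataRetrievers(Arrays.asList(tagger)).retrieveAll(resource);
        System.out.println(resource.getField(TAG));
    }
}
```

Keying each field by a full URI keeps values from different plugins from colliding, and running the retrievers in list order is the simplest policy until sequencing is specified.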

Deferred until later

  • Configuration parameters for plugins (the Delicious plugin reads from discovered.xml, but the plugin.xml manifest doesn't state that it needs parameters).
  • Allow sequencing of MetadataRetriever plugins, to determine which are authoritative