Difference between revisions of "Metadata Retriever Plugins"

From Creative Commons
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
{{DiscoverEd Specification
 
{{DiscoverEd Specification
|status=In Development
+
|contact=Nathan Yergler
 +
|project=AgShare, FSKN
 +
|status=Complete
 
}}
 
}}
==First Phase==
 
Operators may wish to include metadata about Resources from other sources, including web services (ie, Semantic Analysis), databases, or something else.  This describes how to implement a plugin to import this information.
 
  
A plugin implements IMetadataImporter, which provides a single method, loadMetadata.  loadMetadata takes a Resource and is responsible for adding any fields to it.  loadMetadata is not responsible for persisting the Resource.
+
DEd sites may wish to include metadata about Resources from other sources, including web services (e.g., Semantic Analysis), databases, etc. This describes a plugin system for adding sources of information at aggregation time.
  
The initial prototype will log URLs as it goes.
+
This feature was defined and developed during the [[DiscoverEd Sprint (June, 2010)|June 2010 DiscoverEd Sprint]]
  
The first working plug-in could exercise the del.icio.us API to retrieve tags that delicious users have placed on a URI.  Example: <code>api.del.icio.us</code> <code>/v1/posts/suggest?url=http://ocw.nd.edu/computer-applications/applied-multimedia-technology</code>
+
== Requirements ==
  
To accomplish this...
+
Metadata retriever plugins are [http://wiki.apache.org/nutch/PluginCentral Nutch plugins] which implement the <code>MetadataRetriever</code> extension point. MetadataRetriever extensions must implement a single method, <code>retrieve</code>.  <code>retrieve</code> takes a Resource as an argument, and may add additional metadata to it.  <code>retrieve</code> is not responsible for persisting the Resource.
*enable resources to allow dynamic metadata to be attached (by plug-ins) through a new method on "resource"
 
**method that adds an assertion to resource
 
***checks to see if the assertion exists, uses if exists, creates if not exist
 
***populates assertion with data
 
  
**method that gets all the assertions from resource
+
We will implement two demonstration plugins: a "dummy" plugin which logs the URLs being passed to it, and a functional demonstration which uses the Delicious API to [http://delicious.com/help/api#posts_suggest retrieve suggested and popular tags] for a page.
*associate each custom metadata field with a URI-namespace to avoid cardinality
+
 
*Create plug-in; call from aggregate step; use resource's save method;
+
== Implementation ==
 +
 
 +
* Added support for storing arbitrary metadata on Resource objects
 +
* Added support to TripleStore for serialization and de-serialization
 +
* Implemented the org.creativecommons.learn.plugin.MetadataRetriever extension point and org.creativecommons.learn.plugin.MetadataRetrievers extension loader
 +
* Implemented test plugins
  
 
==Deferred until later==
 
==Deferred until later==
Allow configuration of plug-ins
+
 
*enable/disable individual plug-ins
+
* Configuration parameters for plugins (the Delicious plugin reads from discovered.xml, but the plugin.xml manifest doesn't state that it needs parameters).
*add additional or custom plug-ins
+
* Allow sequencing of MetadataRetriever plugins, to determine which are authoritative
*alter order in which plug-ins run (to determine which are authoritative)
 

Latest revision as of 17:55, 18 June 2010

Contact: Nathan Yergler
Project: AgShare, FSKN
Status: Complete


DEd sites may wish to include metadata about Resources from other sources, including web services (e.g., Semantic Analysis), databases, etc. This describes a plugin system for adding sources of information at aggregation time.

This feature was defined and developed during the June 2010 DiscoverEd Sprint.

Requirements

Metadata retriever plugins are Nutch plugins which implement the MetadataRetriever extension point. MetadataRetriever extensions must implement a single method, retrieve. retrieve takes a Resource as an argument, and may add additional metadata to it. retrieve is not responsible for persisting the Resource.
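The contract described above can be sketched roughly as follows. This is a minimal illustration, not the actual DiscoverEd code: the Resource class here is a hypothetical stand-in, the addMetadata helper and the void return type of retrieve are assumptions, and the Nutch plugin wiring (plugin.xml, extension loading) is left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the DiscoverEd Resource: a URL plus arbitrary metadata fields.
class Resource {
    private final String url;
    private final Map<String, List<String>> metadata = new LinkedHashMap<>();

    Resource(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }

    // Assumed helper: attach one value to a named metadata field.
    void addMetadata(String field, String value) {
        metadata.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    Map<String, List<String>> getMetadata() {
        return metadata;
    }
}

// Single-method contract: take a Resource and optionally add metadata to it.
// Persisting the Resource is the caller's job, not the retriever's.
interface MetadataRetriever {
    void retrieve(Resource resource); // return type is an assumption
}

// A "dummy" retriever in the spirit of the logging demonstration plugin:
// it records each URL it is handed and adds no metadata.
class LoggingRetriever implements MetadataRetriever {
    final List<String> seen = new ArrayList<>();

    @Override
    public void retrieve(Resource resource) {
        seen.add(resource.getUrl());
        System.out.println("retrieve called for " + resource.getUrl());
    }
}
```

Because retrieve only mutates the Resource it is given, the aggregation step stays in control of when, and whether, the enriched Resource is saved.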

We will implement two demonstration plugins: a "dummy" plugin which logs the URLs being passed to it, and a functional demonstration which uses the Delicious API to retrieve suggested and popular tags for a page.
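As a rough illustration of the Delicious demonstration, the sketch below only builds the request URL for the posts/suggest call that an earlier revision of this page gave as an example (api.del.icio.us /v1/posts/suggest?url=...). The class and method names are hypothetical, the https scheme and the percent-encoding of the url parameter are assumptions, and authentication, the HTTP request itself, and response parsing are not shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the actual plugin.
class DeliciousSuggestUrl {
    // Host and path come from the example request; the https scheme is an assumption.
    static final String ENDPOINT = "https://api.del.icio.us/v1/posts/suggest";

    // Builds the request URL for one page, percent-encoding the page URL
    // so it survives as a single query parameter.
    static String forPage(String pageUrl) {
        try {
            return ENDPOINT + "?url=" + URLEncoder.encode(pageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

A retriever built on this would call forPage with the Resource's URL, fetch the result, and add each returned tag to the Resource as metadata.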

Implementation

  • Added support for storing arbitrary metadata on Resource objects
  • Added support to TripleStore for serialization and de-serialization
  • Implemented the org.creativecommons.learn.plugin.MetadataRetriever extension point and org.creativecommons.learn.plugin.MetadataRetrievers extension loader
  • Implemented test plugins

Deferred until later

  • Configuration parameters for plugins (the Delicious plugin reads from discovered.xml, but the plugin.xml manifest doesn't state that it needs parameters).
  • Allow sequencing of MetadataRetriever plugins, to determine which are authoritative
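Sequencing was deferred, so nothing below reflects an implemented design. It is one possible policy, sketched under stated assumptions: retrievers run in a configured order, and the first retriever to set a field is treated as authoritative for it. All type and method names here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-valued resource used only for this sketch.
class SequencedResource {
    final String url;
    private final Map<String, String> fields = new LinkedHashMap<>();

    SequencedResource(String url) {
        this.url = url;
    }

    // First writer wins: a later retriever cannot overwrite an earlier value.
    void offer(String field, String value) {
        fields.putIfAbsent(field, value);
    }

    String get(String field) {
        return fields.get(field);
    }
}

// Same single-method shape as MetadataRetriever, renamed to keep the sketch standalone.
interface SequencedRetriever {
    void retrieve(SequencedResource resource);
}

class RetrieverSequence {
    // Runs the retrievers in list order; earlier entries are the more authoritative ones.
    static void runInOrder(List<SequencedRetriever> ordered, SequencedResource resource) {
        for (SequencedRetriever retriever : ordered) {
            retriever.retrieve(resource);
        }
    }
}
```

The opposite policy (last writer wins) would only change offer to overwrite existing values; which of the two is preferable depends on how operators expect to express authority in configuration.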