Metadata Retriever Plugins

From Creative Commons
Revision as of 17:55, 18 June 2010 by Nathan Yergler (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
Contact Contact::Nathan Yergler
Project ,|project_name|Project Driver::project_name}}
Status Status::Complete

DEd sites may wish to include metadata about Resources from other sources, including web services (ie, Semantic Analysis), databases, etc. This describes a plugin system for adding sources of information at aggregation time.

This feature was defined and developed during the June 2010 DiscoverEd Sprint


Metadata retriever plugins are Nutch plugins which implement the MetadataRetriever extension point. MetadataRetriever extensions must implement a single method, retrieve. retrieve takes a Resource as an argument, and may add additional metadata to it. retrieve is not responsible for persisting the Resource.

We will implement two demonstration plugins: a "dummy" plugin which logs the URLs being passed to it, and a functional demonstration which uses the Delicious API to retrieve suggested and popular tags for a page.


  • Added support for storing arbitrary metadata on Resource objects
  • Added support to TripleStore for serialization an de-serialization
  • Implemented the org.creativecommons.learn.plugin.MetadataRetriever extension point and org.creativecommons.learn.plugin.MetadataRetrievers extension loader
  • Implemented test plugins

Deferred until later

  • Configuration parameters for plugins (the Delicious plugin reads from discovered.xml, but the plugin.xml manifest doesn't state that it needs parameters).
  • Allow sequencing of MetadataRetriever plugins, to determine which are authoritative