Metadata Retriever Plugins
Operators may wish to include metadata about Resources from other sources, such as web services (e.g., semantic analysis) or databases. The following describes a plugin system for adding sources of information at aggregation time.
Metadata retriever plugins are Nutch plugins that implement the MetadataRetriever extension point. A MetadataRetriever extension must implement a single method, retrieve.
retrieve takes a Resource as an argument and may add additional metadata to it.
retrieve is not responsible for persisting the Resource.
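The contract described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Resource class here is a hypothetical stand-in, the void return type of retrieve is an assumption (the text only says it takes a Resource), and LoggingRetriever, FixedTagRetriever, and applyAll are invented names. The Nutch extension-point wiring is omitted entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

public class MetadataRetrieverSketch {

    /** Hypothetical stand-in for the real Resource class: a URL plus arbitrary metadata. */
    public static class Resource {
        private final String url;
        private final Map<String, List<String>> metadata = new HashMap<>();

        public Resource(String url) { this.url = url; }

        public String getUrl() { return url; }

        public void addMetadata(String key, String value) {
            metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        public List<String> getMetadata(String key) {
            List<String> values = metadata.get(key);
            return values == null ? Collections.<String>emptyList() : values;
        }
    }

    /** Assumed shape of the extension point: one method that may enrich the Resource in place. */
    public interface MetadataRetriever {
        void retrieve(Resource resource);
    }

    /** "Dummy" retriever: only logs the URL it was handed and adds nothing. */
    public static class LoggingRetriever implements MetadataRetriever {
        private static final Logger LOG = Logger.getLogger(LoggingRetriever.class.getName());

        @Override
        public void retrieve(Resource resource) {
            LOG.info("retrieve called for " + resource.getUrl());
        }
    }

    /** Offline placeholder for a tag-fetching retriever: adds one fixed tag, no network call. */
    public static class FixedTagRetriever implements MetadataRetriever {
        private final String tag;

        public FixedTagRetriever(String tag) { this.tag = tag; }

        @Override
        public void retrieve(Resource resource) {
            resource.addMetadata("tag", tag);
        }
    }

    /** Runs each retriever in list order. Persisting the Resource is left to the caller. */
    public static void applyAll(Resource resource, List<MetadataRetriever> retrievers) {
        for (MetadataRetriever retriever : retrievers) {
            retriever.retrieve(resource);
        }
    }

    public static void main(String[] args) {
        Resource resource = new Resource("http://example.org/page");
        applyAll(resource, Arrays.asList(new LoggingRetriever(), new FixedTagRetriever("demo")));
        System.out.println(resource.getMetadata("tag"));
    }
}
```

Keeping persistence out of retrieve means a retriever only needs to know how to look up and attach metadata; the aggregation code that calls it decides when and where the enriched Resource is stored.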
We will implement two demonstration plugins: a "dummy" plugin which logs the URLs being passed to it, and a functional demonstration which uses the Delicious API to retrieve suggested and popular tags for a page.
- Added support for storing arbitrary metadata on Resource objects
- Added support to TripleStore for serialization and deserialization
- Implemented the org.creativecommons.learn.plugin.MetadataRetriever extension point and org.creativecommons.learn.plugin.MetadataRetrievers extension loader
- Implemented test plugins
Deferred until later
- Configuration parameters for plugins (the Delicious plugin reads from discovered.xml, but the plugin.xml manifest doesn't state that it needs parameters).
- Allow sequencing of MetadataRetriever plugins, so that operators can determine which are authoritative