PuSH Feed Type
Operators of a DiscoverEd instance may want to be updated almost immediately when a curator publishes new resources. Curators and publishers may want to notify consumers of their feeds (aggregators, indexers, other tools) when changes occur.
Right now, DiscoverEd instances aggregate feeds and crawl periodically, often manually at the behest of the search engine operator. PubSubHubbub (PuSH) provides a way for a subscriber (like the DiscoverEd instance) to subscribe to feeds and receive automatic, nearly instantaneous notification of new information in a feed. This can be built on top of the Atom/RSS feeds that curators already publish. A subscriber registers its interest in a feed with a "hub" and is notified when the feed changes. Curators and publishers may either ping the hub when changes occur, or the hub may poll the feed periodically; in both cases the hub distributes notifications to subscribers.
A complete implementation of this specification would provide the following:
- DiscoverEd can discover a PuSH hub mentioned in a feed.
- DiscoverEd can register itself as a subscriber to that feed on that hub. (To do so, it must provide a callback URL on the DiscoverEd instance to which the hub will POST when the feed is updated.)
- When the hub pings DiscoverEd to say there is an update to that feed, DiscoverEd re-aggregates data from that feed, does a crawl, and merges the index.
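The first two items above can be sketched in a few lines of Python. This is only an illustration, not DiscoverEd code: the function names find_hubs and build_subscribe_request and the example.org URLs are made up for this sketch. It assumes the feed advertises its hub with an Atom link element whose rel is "hub", and that the hub accepts a form-encoded POST with hub.mode, hub.topic and hub.callback. Some hubs expect further parameters (for example, PuSH 0.3 hubs also expect hub.verify), so the sketch lets the caller pass extras.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def find_hubs(feed_xml):
    """Return the href of every Atom <link rel="hub"> element in the feed."""
    root = ET.fromstring(feed_xml)
    # iter() walks the whole tree, so an atom:link nested inside an
    # RSS <channel> is found as well as one at the top of an Atom feed.
    return [
        link.get("href")
        for link in root.iter("{%s}link" % ATOM_NS)
        if link.get("rel") == "hub" and link.get("href")
    ]


def build_subscribe_request(hub_url, topic_url, callback_url, extra=None):
    """Build (but do not send) the subscription POST for one feed.

    topic_url is the feed being subscribed to; callback_url is the
    (hypothetical) endpoint on the DiscoverEd instance that the hub
    will contact. extra holds any additional hub.* parameters.
    """
    params = {
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    }
    params.update(extra or {})
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned request to urllib.request.urlopen would send it to the hub. The request is built separately from sending so that it can be inspected or logged first.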
Aggregation and crawling are currently two different steps in the pipeline. Implementing this will require us to examine the way they interact. This should not be difficult from a code perspective (we've made progress on the aggregation side, and the Nutch API is relatively sane), but will require us to update the index in place (as opposed to merging).
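To make the interaction concrete, here is a rough Python sketch of what the callback side might look like. Everything in it is an assumption for illustration: handle_verification and handle_notification are hypothetical names, and reaggregate, crawl and update_index are stand-in hooks, not existing DiscoverEd or Nutch functions. It assumes the hub confirms a subscription with a GET request carrying hub.mode, hub.topic and hub.challenge, which the subscriber answers by echoing the challenge, and that each subscription's callback URL lets DiscoverEd tell which feed a notification is about.

```python
import urllib.parse


def handle_verification(query_string, expected_topics):
    """Answer the hub's GET request that confirms a subscription.

    Returns a (status, body) pair. The challenge is echoed back only
    for topics this instance actually asked to subscribe to.
    """
    params = urllib.parse.parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode == "subscribe" and topic in expected_topics and challenge:
        return 200, challenge
    return 404, ""


def handle_notification(topic_url, reaggregate, crawl, update_index):
    """Run the pipeline steps for one updated feed, in order.

    The three callables are hypothetical hooks into the pipeline:
    reaggregate(feed_url) returns the resource URLs to fetch,
    crawl(urls) returns the fetched documents, and
    update_index(docs) applies them to the live index.
    """
    urls = reaggregate(topic_url)
    docs = crawl(urls)
    update_index(docs)
    return 204  # any 2xx status tells the hub the ping was received
```

In practice the notification handler would probably queue the work and return right away rather than crawl inside the HTTP request, since a hub may treat a slow response as a failed delivery.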
- Seeking partner to support development and test use of a hub.