Difference between revisions of "PuSH Feed Type"

From Creative Commons
Line 1: Line 1:
 {{DiscoverEd Specification
 |contact=Asheesh Laroia
-|project=AgShare
+|project=
 |status=Draft
 }}
-The people who run a DiscoverEd instance may wish to be updated nearly-immediately when there are new resources published by a curator.
-Right now, DiscoverEd instances aggregate feeds and crawl every once in a while, often manually at the behest of the search engine operator. PubSubHubBub provides a way for the DiscoverEd instance to subscribe feeds and receive automatic, nearly-instantaneous notification of new information in the feed.
+The people who run a DiscoverEd instance may wish to be updated nearly-immediately when there are new resources published by a curator. Curators and publishers may wish to notify upstream consumers (aggregators, indexers, other tools) when changes occur.
-This can be built on top of existing Atom/RSS feeds that curators already publish.
+Right now, DiscoverEd instances aggregate feeds and crawl periodically, often manually at the behest of the search engine operator. [https://code.google.com/p/pubsubhubbub/ PubSubHubbub] (PuSH) provides a way for a subscriber (like the DiscoverEd instance) to subscribe feeds and receive automatic, nearly-instantaneous notification of new information in the feed. This can be built on top of existing Atom/RSS feeds that curators already publish. Subscribers can register their interest in a feed and receive notifications when a change occurs.  Curators and publishers may either notify subscribers (through a "hub") when changes occur, or the hub will periodically poll and distribute notifications to subscribers.
-This feature was defined and developed during the fun DC meeting thing. (Nathan, did that meeting have a name?)
+This feature was initially described during a meeting on the [http://learningregistry.org Learning Registry] with [http://nsdl.org NSDL], [http://www.adlnet.gov/ ADLnet], and the US Department of Education.
 == Requirements ==
Line 20: Line 19:
 * When the hub pings DiscoverEd to say there is an update to that feed, it re-aggregates data from that feed, does a crawl, and merges the index.
-== Status ==
+== Potential Issues ==
-* This draft document has been written. That's all.
+Aggregation and crawling are currently two different steps in the pipeline. Implementing this will require us to examine the way they interact. This should not be difficult from a code perspective (we've made progress on the aggregation side, and the Nutch API is relatively sane), but will require us to update the index in place (as opposed to merging).
-* NSDL is interested in trying this with us.
-== Questions ==
+== Status ==
-* Can we make things as simple as this:
+* Seeking partner to support development and test use of a hub.
-** OER Africa adds <link rel="hub"...>
-** They do nothing else.
-** The chosen hub polls the feed, and when there are updates, pings us.
-** Then we get real-time updates with basically no effort from OER Africa.

Revision as of 18:39, 8 September 2010

Contact: Asheesh Laroia
Status: Draft


The people who run a DiscoverEd instance may wish to be notified almost immediately when a curator publishes new resources. Curators and publishers, in turn, may wish to notify upstream consumers (aggregators, indexers, other tools) when changes occur.

Right now, DiscoverEd instances aggregate feeds and crawl periodically, often manually at the behest of the search engine operator. PubSubHubbub (PuSH) provides a way for a subscriber (such as a DiscoverEd instance) to subscribe to feeds and receive automatic, nearly instantaneous notification of new information in the feed. This can be built on top of the existing Atom/RSS feeds that curators already publish. Subscribers register their interest in a feed with a "hub" and receive notifications when a change occurs. Curators and publishers may either notify the hub when changes occur, or the hub may poll the feed periodically; in both cases the hub distributes the notifications to subscribers.
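
The publisher-side notification can be very small. The following is a minimal sketch in Python, assuming the "publish" ping described in the PubSubHubbub 0.3 draft (a form-encoded POST carrying hub.mode and hub.url). The function names and the example URLs are hypothetical and are not part of DiscoverEd.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request, urlopen


def build_publish_ping(feed_url):
    """Build the form-encoded body a publisher sends to its hub.

    Assumes the PuSH 0.3 draft, where a publisher announces a change
    with hub.mode=publish and hub.url=<the feed that changed>.
    """
    return urlencode({"hub.mode": "publish", "hub.url": feed_url})


def send_publish_ping(hub_url, feed_url):
    """POST the ping to the hub and return the HTTP status code.

    Hypothetical helper: the hub URL is whatever the curator's feed
    advertises in its <link rel="hub"> element.
    """
    request = Request(
        hub_url,
        data=build_publish_ping(feed_url).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only builds the body; nothing is sent over the network here.
    print(build_publish_ping("http://curator.example.org/feed.atom"))
```

If the curator prefers to do nothing beyond advertising a hub in the feed, this step is skipped and the hub polls the feed instead.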

This feature was initially described during a meeting on the Learning Registry with NSDL, ADLnet, and the US Department of Education.

Requirements

A complete implementation of this specification would provide the following things.

  • DiscoverEd can discover a PuSH hub mentioned in a feed.
  • DiscoverEd can register itself as a subscriber to that feed on that hub. (To do that, it has to provide a callback URL on the DiscoverEd instance that the hub should POST to when the feed is updated.)
  • When the hub pings DiscoverEd to say there is an update to that feed, DiscoverEd re-aggregates data from that feed, does a crawl, and merges the index.
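
The subscriber side of these requirements could be sketched as follows. This is an illustration in Python rather than DiscoverEd code, and it assumes the parameter names of the PubSubHubbub 0.3 draft (hub.mode, hub.topic, hub.callback, hub.verify, hub.challenge). The function names, the sample feed, and the example.org URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, parse_qs

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical curator feed that advertises a hub.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example curator feed</title>
  <link rel="self" href="http://curator.example.org/feed.atom"/>
  <link rel="hub" href="http://hub.example.org/"/>
</feed>"""


def discover_hubs(feed_xml):
    """Return the href of every Atom <link rel="hub"> found in the feed.

    RSS feeds usually advertise hubs with the same Atom-namespaced
    element, so searching the whole tree covers both formats.
    """
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.iter(ATOM_NS + "link")
        if link.get("rel") == "hub" and link.get("href")
    ]


def build_subscription_request(topic_url, callback_url):
    """Build the form-encoded body of a subscription POST to the hub.

    hub.verify is required by the 0.3 draft; later drafts drop it.
    """
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
        "hub.verify": "sync",
    })


def handle_verification(query, expected_topic):
    """Answer the hub's verification GET on the callback URL.

    The hub sends a challenge; echoing it back with a 2xx status
    confirms the subscription. Anything unexpected is refused.
    Returns a (status_code, response_body) pair.
    """
    if query.get("hub.topic") == expected_topic and "hub.challenge" in query:
        return 200, query["hub.challenge"]
    return 404, ""


if __name__ == "__main__":
    topic = "http://curator.example.org/feed.atom"
    for hub in discover_hubs(SAMPLE_FEED):
        body = build_subscription_request(
            topic, "http://discovered.example.org/push-callback")
        print("POST to", hub, "with body", body)
```

Once the subscription is verified, a later POST from the hub to the callback URL is the trigger for re-aggregating that feed; how that hooks into the aggregation and crawl steps is left open here.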

Potential Issues

Aggregation and crawling are currently two different steps in the pipeline. Implementing this will require us to examine the way they interact. This should not be difficult from a code perspective (we've made progress on the aggregation side, and the Nutch API is relatively sane), but it will require us to update the index in place (as opposed to merging).

Status

  • Seeking a partner to support development and test the use of a hub.