MediaWiki Feed Type
Publishers who store and edit text in MediaWiki may want a particular category from their wiki aggregated into a DiscoverEd instance. This specification describes:
- How publishers can set up their wikis to be aggregated by DiscoverEd
- How DiscoverEd instance maintainers can aggregate content from such a wiki
- The changes that we made to DiscoverEd to support this.
It's pretty easy, all around. Relax, and read on.
Guide for publishers
DiscoverEd can pull content from a category in your wiki. Our code relies on the following:
- The MediaWiki API, at least its read-only portion, must be enabled. (This is the default; see the MediaWiki API documentation for details.)
- A <link rel="search"> tag pointing to your wiki's OpenSearch description. This tag is also present by default, but customizing your wiki's skin (theme) can accidentally remove it. Please keep it; we need it.
We detect the MediaWiki API URL by fetching a page on your wiki and looking for a link to another MediaWiki PHP file, from which we derive the URL of the API. Right now, the <link rel="search"> tag is the only link we use for that detection, so if you remove the OpenSearch tag, we cannot find the API URL.
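The detection step can be sketched as follows. This is a minimal Python illustration, not DiscoverEd's actual implementation; the names `find_api_url` and `_SearchLinkFinder` are hypothetical, and the sketch assumes, as in a standard MediaWiki install, that `api.php` sits in the same directory as the `opensearch_desc.php` file the search link points to.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _SearchLinkFinder(HTMLParser):
    """Collects the href of the first <link rel="search"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here, because
        # HTMLParser's default handle_startendtag calls handle_starttag.
        if tag != "link" or self.href is not None:
            return
        attrs = dict(attrs)
        # rel may hold several space-separated values; values may be None
        rels = (attrs.get("rel") or "").lower().split()
        if "search" in rels and attrs.get("href"):
            self.href = attrs["href"]


def find_api_url(page_url, html):
    """Derive the api.php URL from a wiki page's OpenSearch <link> tag.

    Returns None if the page has no <link rel="search"> tag.
    """
    finder = _SearchLinkFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    # Resolve e.g. "/w/opensearch_desc.php" against the page URL
    search_url = urljoin(page_url, finder.href)
    # Assumption: api.php lives next to opensearch_desc.php
    return urljoin(search_url, "api.php")
```

Deriving `api.php` from the search link, rather than guessing a fixed path such as `/w/api.php`, copes with wikis installed under any script path.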
Guide for DiscoverEd sysadmins
- There is a new feed type: mediawiki-category. When you add a feed of that type, set the feed's URL to the URL of the MediaWiki category page.
- The provenance of each Resource found in the category is the category URL.
- The code logs its exceptions extensively, so if data you expected is missing, check the log.
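Because the feed URL is a category page rather than an API query, the aggregator has to recover the category's title before it can ask the API for the category's members. A minimal Python sketch, assuming the two common MediaWiki URL shapes (`/wiki/Category:...` short URLs and `index.php?title=...` long URLs); `category_title_from_url` is a hypothetical helper, not DiscoverEd's code, and it does not check that the title really is in the Category namespace.

```python
from urllib.parse import parse_qs, unquote, urlparse


def category_title_from_url(category_url):
    """Extract a title such as 'Category:Physics' from a category page URL.

    Handles two common MediaWiki URL shapes (an assumption, not exhaustive):
      /wiki/Category:Physics             (short "pretty" URLs)
      /w/index.php?title=Category:Physics (long URLs)
    """
    parsed = urlparse(category_url)
    query = parse_qs(parsed.query)  # also undoes percent-encoding
    if "title" in query:
        title = query["title"][0]
    else:
        # Take the last path segment and undo percent-encoding
        title = unquote(parsed.path.rsplit("/", 1)[-1])
    # MediaWiki URLs use underscores where titles have spaces
    return title.replace("_", " ")
```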
DiscoverEd code changes
- Factored out the RSS feed aggregation into a separate class.
- Created a new valid feed type, mediawiki-category.
- Created a MediaWikiCategory class which can, starting from a category page, find the API URL, query it for all the pages in that category, and create a Resource representing each such page.
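The MediaWikiCategory flow (category page → API URL → category members → one Resource each) can be sketched in Python with the MediaWiki API's `list=categorymembers` query. This is an illustration under assumptions, not the DiscoverEd implementation: the `Resource` dataclass, `category_resources`, `page_url_for`, and `http_get_json` are hypothetical names; the `index.php?title=` page URLs assume a standard install where `index.php` sits next to `api.php`; and the `fetch` parameter exists only so the network call can be swapped out.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode
from urllib.request import urlopen


@dataclass
class Resource:
    """Hypothetical stand-in for a DiscoverEd Resource."""
    url: str
    title: str
    provenance: str  # the category URL the page was found in


def http_get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def page_url_for(api_url, title):
    """Build a page URL, assuming index.php sits next to api.php."""
    base = api_url.rsplit("/", 1)[0]
    return base + "/index.php?" + urlencode({"title": title.replace(" ", "_")})


def category_resources(api_url, category_title, category_url, fetch=http_get_json):
    """Yield one Resource per page in a category, following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,
        "cmtype": "page",  # skip subcategories and files
        "cmlimit": "500",
        "format": "json",
        "continue": "",  # opt in to the "continue" continuation format
    }
    while True:
        data = fetch(api_url + "?" + urlencode(params))
        for member in data["query"]["categorymembers"]:
            yield Resource(
                url=page_url_for(api_url, member["title"]),
                title=member["title"],
                provenance=category_url,
            )
        if "continue" not in data:
            break
        # Copy cmcontinue (and the updated 'continue') into the next request
        params.update(data["continue"])
```

Following `continue` matters because the API returns at most 500 category members per request to ordinary clients, so a large category spans several requests.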
Things that would be nice, but that the world will probably never see:
- It would be nice if MediaWiki had a <link rel="api"> or similar that unambiguously pointed automated agents to the API.
- It would be interesting if MediaWiki just created an RSS feed, in "MIT OCW format", for each category.
- Right now we extract very little metadata from the pages: just the title. It would be nice if there were a reasonable way to store metadata in wiki pages and extract it; a convention that we settle on could make a big difference.