Towards a Global Infrastructure For Sharing Learning Resources

From Creative Commons

This document was developed by the participants of the OER Search and Discovery meeting, based on discussions at the meeting.

Introduction

This document aims to help those who ask, "What should I be doing when I publish my OERs so that they are searchable and discoverable by the OER community and (perhaps) the whole world?" More specifically, it is aimed at those who hold a collection of such OERs.

There are several options for many of the fundamental questions. Luckily, the number of options is actually quite limited. For nearly all producers we recommend adopting one of the choices outlined below. These have significant adoption and deployment, increasing the reach and exposure of your OER. We will develop bridges between the different approaches; indeed, many of these bridges already exist.

In essence, the choices boil down to:

  • If you are looking for a repository to store the OERs and make them available to the world, then you can contact ARIADNE or Connexions, who will gladly do so on your behalf or provide you with the code to run your own repository.
  • If you have your own repository, then we strongly suggest that you make your content and metadata available for harvesting. As an alternative or in addition, you can also make your repository available for federated search. We also strongly suggest that you register your repository (see Registration below).
  • For your content, we make no specific assumptions: HTML documents, OpenOffice documents, Microsoft Office documents, MPEG video clips, PDF documents, MP3 sound files, etc.; anything goes. Of course, in the spirit of open educational resources, we certainly encourage you to publish in as open a format as feasible.
  • For the metadata, we strongly suggest using either Learning Object Metadata (LOM) or Dublin Core (DC). For specific niche domains, other formats such as MPEG may apply as well. You may also define a so-called 'application profile' that captures the specific metadata requirements of your repository, so that you can enforce them. In concrete terms, such metadata can be expressed as XML or RDF.

Harvesting

Harvesting is a technique that allows a software agent to collect resources (content or metadata) from a repository. Some harvesting protocols such as OAI-PMH enable the requesting agent to retrieve only a specific subset of the metadata or content (for instance based on selection criteria, such as a date range or a named set, that identify what is relevant).

The major advantage of harvesting is that the harvester then has all the metadata or content available locally, so that queries from end users can be processed without any further need to contact the harvested repositories. Especially in infrastructures with dozens of repositories or more, this can be quite important, as the response time involved in contacting every repository in order to answer an end user request becomes prohibitively long. Moreover, when queries are forwarded to local repositories for federated search, the local repository may incur considerable load servicing third-party queries.

The major concern with harvesting for some organisations is that this approach allows third parties to collect the metadata and content from a repository. In an OER context, this is probably not much of a concern, but it can still raise issues about visibility for the party being harvested.

In concrete terms, there are two important harvesting protocols:

  • OAI-PMH: The Open Archives Initiative Protocol for Metadata Harvesting is widely used by repositories of scholarly material and learning resources. It supports selective harvesting, so a harvester can restrict a request by date range or by set rather than retrieving everything.
  • RSS/Atom: RSS and Atom are syndication formats, commonly used for web feeds that can be read by software such as Bloglines or Google Reader. They may be used more generically to let software know about updates to a repository as resources are created or changed.
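
As a rough illustration of the OAI-PMH side, the sketch below builds a ListRecords request and reads record identifiers and titles out of a response. The verb and the metadataPrefix, from and set parameters come from the OAI-PMH specification; the base URL, the helper names and the sample response are assumptions made up for this example, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build a ListRecords URL; from_date and set_spec narrow the harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def identifiers_and_titles(response_xml):
    """Return (identifier, title) pairs for each record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(OAI + "header/" + OAI + "identifier")
        title_el = record.find(".//" + DC + "title")
        pairs.append((identifier, title_el.text if title_el is not None else None))
    return pairs

# Hypothetical, trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2009-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Introduction to Algebra</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai", from_date="2009-01-01"))
print(identifiers_and_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens for large result sets and handle protocol error responses, both of which are left out here.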

We suggest that you adopt OAI-PMH if you have no strong reason to prefer RSS, as OAI-PMH offers more features that can be important for third party developers. If you also want to provide feeds of new objects that are deposited in your repository, then an additional RSS feed is quite useful.
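
To give an idea of how a feed can signal new deposits, the following sketch picks out the entries of an Atom feed that changed after a given moment. The feed content, the URLs and the function name are invented for illustration, and the feed is trimmed (a complete Atom feed carries further elements such as id and author). The comparison assumes that all timestamps are UTC and use the same format, so that plain string comparison orders them correctly.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical, trimmed feed from an imaginary repository.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>New resources</title>
  <updated>2009-02-01T12:00:00Z</updated>
  <entry>
    <title>Cell Biology Basics</title>
    <link href="http://example.org/oer/42"/>
    <updated>2009-02-01T12:00:00Z</updated>
  </entry>
  <entry>
    <title>Intro to Statistics</title>
    <link href="http://example.org/oer/17"/>
    <updated>2009-01-20T09:30:00Z</updated>
  </entry>
</feed>"""

def entries_updated_since(feed_xml, since):
    """Return (title, link) pairs for entries whose <updated> is later than `since`."""
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated")
        # Same-format UTC timestamps sort correctly as strings (assumption).
        if updated is not None and updated > since:
            link = entry.find(ATOM + "link")
            href = link.get("href") if link is not None else None
            changed.append((entry.findtext(ATOM + "title"), href))
    return changed

print(entries_updated_since(SAMPLE_FEED, "2009-01-31T00:00:00Z"))
```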

Registration

In order to enable harvesting software to contact a provider of content and metadata, this software must be aware of

  • the location of the provider (typically a URL), as well as
  • the protocol(s) that the provider supports (OAI-PMH, RSS).

This information is typically maintained in a so-called registry. In a way, a registry is a meta-repository: a repository with information about repositories. Its main goal is to enable other services to discover repositories. To this effect, it sometimes also includes additional information about the content repositories, such as data about

  • the collections they hold: their domain, the number of resources, etc.
  • when the repository was last updated,
  • etc.

We strongly encourage you to register your repository with ARIADNE or OERfeeds.info.
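
The kind of information a registry holds can be pictured with a small data structure. This is only a sketch: the field names, the repository names and the URLs are hypothetical and do not reflect how ARIADNE or OERfeeds.info actually store their entries.

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    name: str                  # human-readable repository name
    url: str                   # location of the provider
    protocols: frozenset       # protocols the provider supports
    resource_count: int = 0    # optional: size of the collection
    last_updated: str = ""     # optional: date of the last update

# Hypothetical registry content.
REGISTRY = [
    RegistryEntry("Repository A", "http://a.example.org/oai", frozenset({"OAI-PMH"}), 1200, "2009-01-10"),
    RegistryEntry("Repository B", "http://b.example.org/feed", frozenset({"RSS"}), 300, "2009-02-01"),
    RegistryEntry("Repository C", "http://c.example.org/oai", frozenset({"OAI-PMH", "RSS"}), 50, "2008-12-05"),
]

def endpoints_for(registry, protocol):
    """Return the URLs of all registered repositories that support `protocol`."""
    return [entry.url for entry in registry if protocol in entry.protocols]

# A harvester would ask the registry where it can harvest with OAI-PMH.
print(endpoints_for(REGISTRY, "OAI-PMH"))
```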

Metadata

Metadata describe the content. We strongly suggest that you use one of the following two metadata schemas, both of which define an extensive set of metadata elements:

  • LOM: The IEEE Learning Object Metadata standard defines circa 70 elements that can be used to describe general, technical, educational and other aspects.
  • DC: Dublin Core metadata are more generic in nature. Specific extensions for learning content are discussed in the DCMI Education Community.

By adopting one of these common schemas, you make it easy for software to process the metadata. For specific niche content, special-purpose standards such as MPEG (for audio-visual material) may be more appropriate. In that case, a mapping to LOM or DC enables interoperability with learning-specific tools and infrastructures.

In order to enforce your requirements, you can define an "application profile". This typically involves making some metadata elements mandatory, or imposing constraints on the values that they can hold, etc.
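
A minimal sketch of what enforcing an application profile might look like, assuming metadata records are held as simple dictionaries. The profile contents, the element names and the validate function are illustrative assumptions, not part of LOM, Dublin Core, or any existing validation service.

```python
# Hypothetical profile: three mandatory elements and one restricted vocabulary.
PROFILE = {
    "mandatory": ["title", "language", "rights"],
    "vocabularies": {"language": {"en", "fr", "nl"}},
}

def validate(record, profile):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element in profile["mandatory"]:
        if not record.get(element):
            problems.append("missing mandatory element: " + element)
    for element, allowed in profile["vocabularies"].items():
        value = record.get(element)
        if value and value not in allowed:
            problems.append("value %r not allowed for %s" % (value, element))
    return problems

good = {"title": "Introduction to Algebra", "language": "en", "rights": "CC BY"}
bad = {"title": "Introduction to Algebra", "language": "de"}

print(validate(good, PROFILE))
print(validate(bad, PROFILE))
```

Returning a list of problems rather than a single yes/no answer lets a repository report everything that is wrong with a record in one pass.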

Finally, the metadata are typically expressed in a machine-readable format, sometimes called a "binding". Common bindings include:

  • XML: the Extensible Markup Language that is at the heart of many Web document formats (like XHTML).
  • RDF: the Resource Description Framework allows you to describe the properties of your resources and the relationships between them. The information is commonly serialized as XML (RDF/XML). If you are publishing XHTML content, we recommend using RDFa to include the metadata with the content. See "ccREL: The Creative Commons Rights Expression Language" for a discussion of applications enabled by co-locating metadata and content.
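
As an example of an XML binding, the sketch below serializes a few Dublin Core elements. The namespace URI is the standard one for the Dublin Core element set; the wrapping "metadata" element, the function name and the sample values are assumptions chosen for this illustration, and a real deployment would follow whatever container its harvesting protocol prescribes.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the conventional "dc" prefix

def to_dc_xml(record):
    """Serialize a dict of Dublin Core element names and values as XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in record.items():
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")

xml_text = to_dc_xml({
    "title": "Introduction to Algebra",
    "language": "en",
    "rights": "http://creativecommons.org/licenses/by/3.0/",
})
print(xml_text)
```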

If you want to make sure that the metadata you expose adhere to your self-imposed requirements, we strongly suggest leveraging a validation service, such as ARIADNE's validation service.