Difference between revisions of "DiscoverEd Metadata"

From Creative Commons
 
NOTE: The sample Atom and RSS 2.0 feeds below mostly implement the minimum elements required by the respective specification plus the fields that ccLearn needs. For our purposes, a feed must minimally contain most of the elements in the examples below, but may also contain any other valid elements. Also, though we prefer an Atom feed, there is no reason that another type of feed cannot be used, as long as it is able to include all of the data CC needs AND includes the data in such a way that the Universal Feed Parser (http://feedparser.org) can extract it in a normalized way.
 

Revision as of 21:34, 14 July 2008

Overview

This document outlines some of the metadata that ccLearn uses for the Universal Education Search project. It applies to the RSS and Atom feeds that we use to direct the crawl, as well as to metadata embedded in pages as RDFa.

Supported formats include Atom, RSS and RDFa. See below for format-specific encoding information.

Presently, ccLearn is exposing the following data:

  • Title (DC:title): A brief descriptive title for the resource.
  • Summary: A relatively short summary/synopsis of the resource.
  • License: The URL of the work's license; e.g., http://creativecommons.org/licenses/by/3.0/.
  • Education level (DCT:educationLevel): What grade(s) or age-level(s) this material is suitable for.
  • Language (xml:lang, DCT:language): The language(s) of the referenced resource (not of your site).
  • Subject (cc:subject): The subject(s) of the resource; e.g., math.

Encoding

Syndication Feeds

The CC-specific fields do not have native Atom or RSS element definitions. We suggest embedding these fields as category or tag specifications (<category> in Atom) with a specific prefix. They have the general format:

cc:<field>:<data>

For example, the <category> content for Subject would become something like:

cc:subject:Math

The Creative Commons-specific fields build upon existing category/tag support in feeds. Therefore any cc: field may be specified multiple times if needed. The fields we currently use for refining search results include:

  • Subject: cc:subject:<data>
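
A consumer of such a feed needs to split the prefixed terms back into a field name and its data. The following is only a minimal sketch in Python of how that might look, using the standard library alone; the function name parse_cc_term is hypothetical and is not part of any ccLearn tool or of the Universal Feed Parser.

```python
def parse_cc_term(term):
    """Split a 'cc:<field>:<data>' category term into (field, data).

    Returns None for terms that do not follow the cc: convention, so
    ordinary (un-prefixed) categories can simply be skipped.
    """
    # Split at most twice so that any colon inside <data> is preserved.
    parts = term.split(":", 2)
    if len(parts) != 3 or parts[0] != "cc" or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


# A cc: field may be specified multiple times, so collect a list per field.
terms = ["cc:subject:Math", "cc:subject:Algebra", "homework"]
fields = {}
for term in terms:
    parsed = parse_cc_term(term)
    if parsed is not None:
        field, data = parsed
        fields.setdefault(field, []).append(data)
```

Collecting a list per field reflects the rule above that any cc: field may appear more than once in an entry.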

RDFa

ccLearn's Universal Education Search will be indexing RDFa when pages are crawled, making this our preferred way of encoding machine readable metadata. We believe this will have the broadest possible exposure for current and future software agents.

Vocabulary

Specifying Subject

The subject refers to the actual content in the resource; i.e., what is this resource about? For many resources, more than one subject will be necessary; in this case, specify multiple subject elements. We ask that you limit the subjects to those that objectively reflect the entire resource. Other types of categories (opinions, metrics, etc.) may be better served by other, more appropriate vocabularies.

Specifying Education level

The education level should indicate all levels (student ages) for which the resource is deemed appropriate. The education level should be labeled using the DCT:educationLevel term.

Though we will accept any descriptions that seem appropriate to you, please consider using one of the following schemas:

  • primary, secondary, tertiary, adult;
  • K,1,2,3,...,20 (where the number refers to the actual grade-level).

You may include equivalent terms as well by specifying more than one DCT:educationLevel <category>. For example, you might include a DCT:educationLevel for 9, 10, and secondary.
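
As a rough illustration, a feed producer could generate one <category> element per level as sketched below in Python. The term prefix dc:educationLevel is an assumption copied from the Atom example on this page, and the helper name education_level_categories is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def education_level_categories(levels, prefix="dc:educationLevel"):
    """Build one Atom <category> element per education level.

    The default prefix is assumed from the sample feed on this page.
    """
    elements = []
    for level in levels:
        category = ET.Element("{%s}category" % ATOM_NS)
        category.set("term", "%s:%s" % (prefix, level))
        elements.append(category)
    return elements


# Equivalent terms (grades 9 and 10 plus "secondary") for one resource.
categories = education_level_categories(["9", "10", "secondary"])
category_terms = [c.get("term") for c in categories]
```

Each returned element can be appended to an entry element before the feed is serialized.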

Specifying Language

When specifying the language for a resource, the value should be specified as described by RFC 4646. For example, use en for English. To distinguish English (United States) from English (United Kingdom), the language would be specified as en-US and en-GB, respectively.

In an Atom 1.0 feed, the language is specified as the xml:lang attribute of the content element. Multiple languages in a single entry are not supported.
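
To show where that attribute ends up once a feed is parsed, here is a small Python sketch using the standard library's ElementTree. The sample entry and the helper name entry_language are made up for illustration; this is not how the ccLearn crawler is known to work.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# xml:lang belongs to the reserved XML namespace, which ElementTree
# expands to this qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical entry used only for this illustration.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Math 101</title>
  <content type="text" xml:lang="en-US">The content</content>
</entry>"""


def entry_language(entry):
    """Return the xml:lang value of the entry's <content>, or None."""
    content = entry.find("{%s}content" % ATOM_NS)
    if content is None:
        return None
    return content.get(XML_LANG)


language = entry_language(ET.fromstring(ENTRY))
```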

Embedding license data

Since the licensing of a resource is expected to be conveyed via URL, we can leverage the Atom <link> element. However, we must mark up the link element so that it is identified as a license URL. This is accomplished by adding the attribute rel="license" to the <link> element. For example:

<link rel="license" href="http://creativecommons.org/licenses/by/3.0/" />

See the complete CC with syndication formats documentation for more information.
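
On the consuming side, the license link can be told apart from the resource link by its rel attribute. The Python sketch below illustrates this with the standard library; the sample entry and the helper name license_urls are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical entry with a plain resource link and a license link.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <link href="http://ocw.org/math/101" />
  <link rel="license" href="http://creativecommons.org/licenses/by/3.0/" />
</entry>"""


def license_urls(entry):
    """Return the href of every <link rel="license"> in the entry."""
    return [
        link.get("href")
        for link in entry.findall("{%s}link" % ATOM_NS)
        if link.get("rel") == "license"
    ]


licenses = license_urls(ET.fromstring(ENTRY))
```

The link without a rel attribute is skipped, so only the license URL is returned.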

Examples

Atom 1.0 Example

Here is a sample one-entry Atom 1.0 feed that implements the guidelines above.


<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://oersite.org/cc/</id>
  <title>OER Aggregation Web Site</title>
  <updated>2008-01-16T12:00:00Z</updated>
  <link rel="self" href="http://oersite.org/cc/atom.xml" type="application/atom+xml" />
  <author>
    <name>John Q. Public</name>
    <email>webmaster@oersite.org</email>
  </author>
  <entry>
    <id>tag:ocw.org,2007-10-15:/math/101</id>
    <updated>2007-10-15T12:00:00Z</updated>
    <link href="http://ocw.org/math/101" />
    <title>Math 101</title>
    <summary>Basic mathematics for 5th graders</summary>
    <link rel="license" href="http://creativecommons.org/licenses/by/3.0/" />
    <category term="cc:subject:Math" />
    <category term="dc:educationLevel:5" />
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">The content</div>
    </content>
  </entry>
</feed>