DiscoverEd Metadata

From Creative Commons

Revision as of 21:59, 1 April 2009


Overview

This document outlines some of the metadata ccLearn uses for the DiscoverEd project. The metadata is collected from embedded RDFa at crawl time. We also look for it in feeds (RSS, Atom) and OAI-PMH harvests, although we consider these a way to bootstrap the metadata corpus. The metadata store may include additional information from resources, but the following fields are exposed by default in the search results.

  • Title (DCT:title): A brief descriptive title for the resource.
  • Summary (DCT:description): A relatively short summary or synopsis of the resource.
  • License (DCT:license, cc:license, xhtml:license): The URL of the work's license; e.g., http://creativecommons.org/licenses/by/3.0/.
  • Education level (DCT:educationLevel): What grade(s) or age-level(s) this material is suitable for.
  • Language (xml:lang, DCT:language): The language(s) of the referenced resource (not of your site).
  • Subject (DCT:subject): The subject(s) of the resource; e.g., math.

Our preferred and suggested metadata encoding/transport is XHTML+RDFa. We believe this format exposes the metadata to the broadest range of current and future software agents.

Vocabulary

Specifying Subject

The subject refers to the actual content of the resource; i.e., what is this resource about? Many resources need more than one subject; in that case, specify multiple subject elements. We ask that you limit the subjects to those that objectively reflect the entire resource. Other types of categories (opinions, metrics, etc.) may have other, more appropriate vocabularies available.
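As a minimal sketch of specifying more than one subject in RDFa, each subject can be placed in its own element carrying the dc:subject property. The subject values below are illustrative only, and the dc prefix is assumed to be bound to http://purl.org/dc/terms/ as in the full examples later in this document.

```xml
<!-- Fragment only: assumes xmlns:dc="http://purl.org/dc/terms/" is declared on an ancestor element. -->
<!-- Each span yields one dc:subject value for the same resource. -->
<p>Subjects:
  <span property="dc:subject">Math</span>,
  <span property="dc:subject">Arithmetic</span>
</p>
```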

Specifying Education level

The education level should indicate all levels (student ages) for which the resource is deemed appropriate. The education level should be labeled using the DCT:educationLevel term.

Although we will accept any descriptions that seem appropriate to you, please consider using one of the following schemes:

  • primary, secondary, tertiary, adult;
  • K,1,2,3,...,20 (where the number refers to the actual grade-level).

You may also include equivalent terms by specifying more than one value for DCT:educationLevel. For example, you might include separate DCT:educationLevel values for 9, 10, and secondary.
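The 9, 10, and secondary case could be marked up in RDFa roughly as follows. This is a sketch rather than a required layout; it assumes the dc prefix is bound to http://purl.org/dc/terms/ on an ancestor element, and the surrounding paragraph text is illustrative.

```xml
<!-- Fragment only: assumes xmlns:dc="http://purl.org/dc/terms/" is declared on an ancestor element. -->
<!-- Three dc:educationLevel values: two grade numbers plus the equivalent broad term. -->
<p>Grade levels:
  <span property="dc:educationLevel">9</span>,
  <span property="dc:educationLevel">10</span>
  (<span property="dc:educationLevel">secondary</span>)
</p>
```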

Specifying Language

When specifying the language of a resource, use a value as described by RFC 4646; for example, en for English. To distinguish English (United States) from English (United Kingdom), specify en-US and en-GB, respectively.

In an Atom 1.0 feed, the language is specified as the xml:lang attribute of the content element. Multiple languages in a single entry are not supported.
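The two fragments below sketch how a region-specific tag such as en-US might appear in each format. They are separate fragments, not one document; the RDFa fragment assumes the dc prefix is bound to http://purl.org/dc/terms/, and the human-readable text is illustrative.

```xml
<!-- XHTML+RDFa fragment: the content attribute carries the RFC 4646 tag; -->
<!-- the element text is only for human readers. -->
<p>Language: <span property="dc:language" content="en-US">English (United States)</span></p>

<!-- Atom 1.0 fragment: the same tag as the xml:lang attribute of the entry's content element. -->
<content type="text" xml:lang="en-US">The content</content>
```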

Embedding license data

The resource license should be communicated using the stable URL of the license. For example, http://creativecommons.org/licenses/by/3.0/.

See the CC with syndication formats documentation for more information on including this in a bootstrap feed.
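For illustration, the license URL can be attached with rel="license" in either format, mirroring the complete examples in this document. The two lines below are independent fragments, and the link text is illustrative.

```xml
<!-- XHTML+RDFa fragment: a visible link whose rel attribute marks it as the license. -->
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Attribution 3.0</a>

<!-- Atom 1.0 fragment: a link element inside the entry. -->
<link rel="license" href="http://creativecommons.org/licenses/by/3.0/" />
```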

Examples

[X]HTML + RDFa

The following is an example of how a resource at http://ocw.example.org/math/101 could be annotated with machine-readable metadata. This is our preferred manner for encoding this information, as it exposes the metadata to a much wider range of clients.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/">
  <head>
   <title>OER Site</title>
  </head>

  <body>
     <h1 property="dc:title">Math 101</h1>
     <h2>by <span property="dc:creator">John Q. Public</span></h2>
     <p property="dc:description">Basic mathematics for 5th graders</p>
     <p>Subjects: <span property="dc:subject">Math</span></p>
     <p>Grade level: <span property="dc:educationLevel">5</span></p>
     <p>Language: <span property="dc:language" content="en">English</span></p>
     <p>License: <a href="http://creativecommons.org/licenses/by/3.0/" rel="license">Attribution 3.0</a></p>

     <p>Lorem ipsum, etc, etc.</p>

  </body>
</html>

If a site aggregates resources such that the metadata appears on a page other than the actual resource, the about attribute can be used to indicate that the metadata is about a different resource. For example, the following page could be published at http://commons.oer.example.org/math/101 and still refer to the same resource as the previous example:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/">
  <head>
   <title>OER Site</title>
  </head>

  <body>
     <div about="http://ocw.example.org/math/101">
       <h1 property="dc:title">Math 101</h1>
       <h2>by <span property="dc:creator">John Q. Public</span></h2>
       <p property="dc:description">Basic mathematics for 5th graders</p>
       <p>Subjects: <span property="dc:subject">Math</span></p>
       <p>Grade level: <span property="dc:educationLevel">5</span></p>
       <p>Language: <span property="dc:language" content="en">English</span></p>
       <p>License: <a href="http://creativecommons.org/licenses/by/3.0/" rel="license">Attribution 3.0</a></p>
     </div>

     <p>Lorem ipsum, etc, etc.</p>

  </body>
</html>


Atom 1.0 Example

Here is a sample one-entry Atom 1.0 feed that implements the guidelines above. Note that including additional metadata in the feed is optional and considered inferior to including it with the resource using RDFa.

<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://oersite.example.org/cc/</id>
  <title>OER Aggregation Web Site</title>
  <updated>2008-01-16T12:00:00Z</updated>
  <link rel="self" href="http://oersite.example.org/cc/atom.xml" type="application/atom+xml" />
  <author>
    <name>John Q. Public</name>
    <email>webmaster@oersite.example.org</email>
  </author>
  <entry>
    <id>tag:ocw.org,2007-10-15:/math/101</id>
    <updated>2007-10-15T12:00:00Z</updated>
    <link href="http://ocw.example.org/math/101" />
    <title>Math 101</title>
    <summary>Basic mathematics for 5th graders</summary>
    <link rel="license" href="http://creativecommons.org/licenses/by/3.0/" />
    <category term="dc:subject:Math" />
    <category term="dc:educationLevel:5" />
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml">The content</div></content>
  </entry>
</feed>