Difference between revisions of "DiscoverEd Data"

From Creative Commons
 
(12 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{incomplete}}
 
 
[http://oesearch.creativecommons.org Open Education Search] is a project of [http://learn.creativecommons.org ccLearn].  You can find more information on the project at http://learn.creativecommons.org/projects/oesearch/.
 
 
This page documents ways in which developers may use the data gathered by the project for other purposes, including integration and customization.
 
 
 
== Data Gathered ==
 
  
The Open Education Search (OES) project is a web-scale search of Open Educational Resources (OER).  As such, it utilizes a web-wide index, promoting results which have been identified as OER.  ccLearn is serving as an aggregation point for other organizations which have identified or produced OER. At this time we gather:
+
The DiscoverEd project is a scalable search engine for educational resources, with a special emphasis on Open Educational Resources (OER). It uses a web-wide index and promotes results that have been identified as OER. DiscoverEd.creativecommons.org serves as an aggregation point for organizations that have identified or produced OER.
 
 
* URLs or URL patterns
 
* Subject annotations, sometimes called labels
 
 
 
We are not currently attempting to aggregate rich metadata sets.
 
 
 
The data gathered is currently available in a format suitable for use in a [http://google.com/coop/cse/ Google Custom Search Engine]; ccLearn is committed to making it available in a "raw" format for further reuse.
 
 
 
== Integrating Open Education Search ==
 
 
 
Any site may include Open Education Search on their website by pointing to our CSE context definition.  For example, the following block of HTML will produce a search box with the same semantics as Open Education Search.  For more information on customizing the results format or hosting the results on your site, see the [http://google.com/coop/docs/cse/cref.html Linked CSE documentation] from Google.
 
 
 
<code><pre>
 
<form id="cref" action="http://google.com/cse">
 
  <input type="hidden" name="cref"
 
    value="http://oercloud.creativecommons.org/api/posts_context"
 
    />
 
  <input type="text" name="q" size="40" />
 
  <input type="submit" name="sa" value="Search" />
 
</form>
 
<script type="text/javascript"
 
  src="http://google.com/coop/cse/brand?form=cref"></script>
 
</pre></code>
 
 
 
== Customizing OE Search ==
 
 
 
Another re-use scenario is the use of the dataset, with adjustments made to result weighting.  We use labels for subject annotations, as well as annotating the source of the URL.  For example, URLs received from [http://cnx.org Connexions] are labeled with <code>connexions</code>.  Using these labels you can promote the results from your site while still maintaining the web-scale breadth of Open Education Search.
 
 
 
Weight adjustment requires publishing a context file which provides the basic definition of your search engine.  Within this definition you can define a relative weighting for a label:
 
 
 
<code><pre>
 
<Label name="connexions" mode="BOOST" weight="0.8"></Label>
 
</pre></code>
 
 
 
Additionally, your definition can refer to our lists of URL annotations:
 
 
 
<code><pre>
 
<Include type="Annotations" href="http://oercloud.creativecommons.org/api/posts/coop?p=0"/>
 
...
 
</pre></code>
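
As a rough illustration of how the two fragments above might fit into one context file, the following Python sketch assembles a definition with a boosted label and an annotations include. The <code>Label</code> and <code>Include</code> elements come from the examples above; the wrapper element names (<code>GoogleCustomizations</code>, <code>CustomSearchEngine</code>, <code>Context</code>, <code>BackgroundLabels</code>), the title, and the helper <code>build_context</code> are assumptions made for illustration, so check them against the Linked CSE documentation before publishing a file.

```python
import xml.etree.ElementTree as ET


def build_context(title, boost_label, weight, annotations_href):
    """Return a context definition as an XML string.

    Hypothetical helper: only the Label and Include elements are taken
    from this page; the wrapper element names are assumed.
    """
    root = ET.Element("GoogleCustomizations")
    cse = ET.SubElement(root, "CustomSearchEngine")
    ET.SubElement(cse, "Title").text = title

    # Relative weighting for one label, as in the Label example above.
    context = ET.SubElement(cse, "Context")
    labels = ET.SubElement(context, "BackgroundLabels")
    ET.SubElement(labels, "Label", name=boost_label, mode="BOOST",
                  weight=str(weight))

    # Reference to the shared list of URL annotations.
    ET.SubElement(root, "Include", type="Annotations", href=annotations_href)
    return ET.tostring(root, encoding="unicode")


context_xml = build_context(
    "My OER Search",  # placeholder title
    "connexions",
    0.8,
    "http://oercloud.creativecommons.org/api/posts/coop?p=0",
)
print(context_xml)
```

Generating the file from a script keeps the label weights in one place, which may help if you tune them over time.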
 
  
'''NB: We currently require the "p=n" query string to reduce processing load; we will be publishing a single annotation file you can include by October 15, 2007.'''
 
== Additional Resources and Information ==
 
* [http://oercloud.creativecommons.org/api/posts/coop_context Open Education Search CSE Context]
* [http://google.com/coop/docs/cse/cref.html Linked CSE documentation]
 
+
Data are aggregated from several sources, including:
+
* RSS and Atom feeds (title, description and subject information)
+
* [http://www.openarchives.org/pmh/ OAI-PMH] repositories (OAI-DC metadata)
+
* Crawled pages (embedded [[RDFa]])
+
You can read more details about our [http://wiki.creativecommons.org/CcLearn_Search_Metadata metadata specifications]. The aggregated information, along with source annotations, is stored in a triple store.  This information is available as a SPARQL endpoint.
 
  
 
[[Category:Learn]]
 
 
[[Category:Developer]]
 
 +
[[Category:DiscoverEd]]

Latest revision as of 21:30, 18 June 2010

Data Gathered

The DiscoverEd project is a scalable search engine for educational resources, with a special emphasis on Open Educational Resources (OER). It uses a web-wide index and promotes results that have been identified as OER. DiscoverEd.creativecommons.org serves as an aggregation point for organizations that have identified or produced OER.

Data are aggregated from several sources, including:

  • RSS and Atom feeds (title, description and subject information)
  • OAI-PMH repositories (OAI-DC metadata)
  • Crawled pages (embedded RDFa)
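
To make the feed case concrete, here is a minimal Python sketch of pulling the title, description and subject information out of an Atom feed. It is not DiscoverEd's aggregator: the sample feed, the function name <code>extract_entries</code>, and the mapping of Atom <code>summary</code> to description and <code>category</code> terms to subjects are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace

# Made-up sample feed; a real feed would be fetched from a provider.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example OER feed</title>
  <entry>
    <title>Intro to Algebra</title>
    <link href="http://example.org/algebra"/>
    <summary>A short open course on algebra.</summary>
    <category term="mathematics"/>
  </entry>
</feed>"""


def extract_entries(feed_xml):
    """Return one dict per Atom entry with url, title, description, subjects."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        entries.append({
            "url": link.get("href") if link is not None else None,
            "title": entry.findtext(ATOM + "title", default=""),
            # Assumption: the Atom summary stands in for the description.
            "description": entry.findtext(ATOM + "summary", default=""),
            # Assumption: category terms stand in for subject annotations.
            "subjects": [c.get("term")
                         for c in entry.findall(ATOM + "category")],
        })
    return entries


entries = extract_entries(SAMPLE_FEED)
print(entries)
```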

You can read more details about our metadata specifications. The aggregated information, along with source annotations, is stored in a triple store. This information is available as a SPARQL endpoint.
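
As a sketch of what querying such an endpoint could look like, the Python below builds a SPARQL protocol GET URL for a simple query. The endpoint address is a placeholder, since this page does not give the real one, and the use of Dublin Core <code>dc:title</code> and <code>dc:subject</code> predicates is an assumption based on the OAI-DC metadata mentioned above; consult the metadata specifications for the vocabulary actually stored.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real address is not given on this page.
ENDPOINT = "http://example.org/sparql"

# Assumed vocabulary: Dublin Core title and subject predicates.
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?resource ?title ?subject
WHERE {
  ?resource dc:title ?title ;
            dc:subject ?subject .
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client; the result format depends on what the endpoint supports.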