Difference between revisions of "Metadata Scraper"

From Creative Commons

Latest revision as of 19:44, 8 January 2018


Description of Software: The Creative Commons Metadata Scraper is a simple crawler used by the license engine to detect metadata stored in pages.
Bug Tracker: https://github.com/cc-archive/metadata_scraper/issues/
Code Repository: https://github.com/cc-archive/metadata_scraper
Mailing List: https://creativecommons.email/mailman/listinfo/cc-devel

1 Decommissioned
2 What it Did
3 When it Happened
4 Developer Information
5 Contact

Decommissioned

The Creative Commons Metadata Scraper was decommissioned on Monday 8th January 2018. Due to the changing nature of the web, in particular the widespread adoption of HTTPS, the Scraper had seen greatly reduced use. For potential future directions in metadata presentation, see Creative Commons' Open Ledger project.

What it Did

The metadata scraper was used by the license deeds to extract embedded metadata from pages. It scanned the page for RDFa using rdfadict and pyRdfa.
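As a rough illustration of what scanning a page for RDFa involves, the sketch below pulls a few RDFa attributes out of HTML using only Python's standard library. It is not the scraper's code and does not use rdfadict or pyRdfa; the class name `RdfaLiteScanner`, the `scan` helper, and the sample URLs are hypothetical, and only a small subset of RDFa (`about`, `property`, `rel`, `href`, `content` on a single element) is handled.

```python
from html.parser import HTMLParser


class RdfaLiteScanner(HTMLParser):
    """Collect (subject, predicate, value) triples from a tiny subset of RDFa.

    Assumption: `about` is honored only on the element that carries it;
    prefix mappings, subject inheritance, and datatypes are ignored.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.triples = []
        self._pending = None  # (subject, predicate) waiting for element text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Without an explicit subject, statements are about the page itself.
        subject = attrs.get("about") or self.base_url
        href = attrs.get("href")
        if href:
            # rel may hold several space-separated predicates.
            for predicate in (attrs.get("rel") or "").split():
                self.triples.append((subject, predicate, href))
        prop = attrs.get("property")
        if prop:
            if attrs.get("content") is not None:
                self.triples.append((subject, prop, attrs["content"]))
            else:
                self._pending = (subject, prop)

    def handle_data(self, data):
        # The first non-blank text after a `property` element is its value.
        if self._pending and data.strip():
            subject, predicate = self._pending
            self.triples.append((subject, predicate, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        # Stop waiting for text once any element closes.
        self._pending = None


def scan(html, base_url):
    """Return the triples found in `html`, using `base_url` as default subject."""
    scanner = RdfaLiteScanner(base_url)
    scanner.feed(html)
    scanner.close()
    return scanner.triples
```

A full RDFa processor resolves prefixes to complete URIs and tracks subjects across nested elements, which is why a dedicated library is normally preferable to a hand-rolled parser like this one.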

When it Happened

When a creator or copyright holder selects a license, they have the opportunity to provide additional metadata about their work. This includes information such as creator and medium, as well as how the creator would like to be attributed. If this information is provided, it is encoded in the HTML generated using RDFa.
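To make the encoding step concrete, here is a minimal sketch of how attribution details might be written into license HTML as RDFa. The function name `license_notice` and its parameters are hypothetical, and the attribute vocabulary (`rel="license"`, `cc:attributionName`, `cc:attributionURL`) follows the ccREL terms as an assumption about what the generated markup resembled, not a copy of the license chooser's actual output.

```python
from html import escape

CC_NS = "http://creativecommons.org/ns#"  # ccREL namespace (assumed vocabulary)


def license_notice(license_url, license_name,
                   attribution_name=None, attribution_url=None):
    """Build an HTML license notice that carries RDFa attributes."""
    pieces = ["This work"]
    if attribution_name:
        name = escape(attribution_name)
        if attribution_url:
            # One element states both the name (property) and the URL (rel).
            pieces.append(
                f' by <a xmlns:cc="{CC_NS}" href="{escape(attribution_url)}" '
                f'property="cc:attributionName" rel="cc:attributionURL">{name}</a>'
            )
        else:
            pieces.append(
                f' by <span xmlns:cc="{CC_NS}" '
                f'property="cc:attributionName">{name}</span>'
            )
    pieces.append(
        f' is licensed under a <a rel="license" '
        f'href="{escape(license_url)}">{escape(license_name)}</a>.'
    )
    return "".join(pieces)
```

When no attribution details are supplied, the notice contains only the `rel="license"` link, so a page without extra metadata still identifies its license.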

To provide more context-relevant information for visitors, the license deeds loaded a script that looked at the referring page for metadata. If metadata was found, it was used to update the deed and display additional attribution or CC+ details.
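The referrer-based lookup can be sketched roughly as follows. This is an illustration under stated assumptions, not the deed's actual script: the function `attribution_for_deed`, the injected `lookup` callable (standing in for a request to the scraper), the result keys, and the rule that the referring page must link to the deed being viewed are all hypothetical.

```python
from urllib.parse import urlparse


def attribution_for_deed(referrer, deed_url, lookup):
    """Return extra details to show on a deed, or {} if none apply.

    referrer: URL of the page the visitor came from (may be None)
    deed_url: URL of the deed being displayed
    lookup:   callable mapping a page URL to (subject, predicate, value) triples
    """
    if not referrer or urlparse(referrer).scheme not in ("http", "https"):
        return {}  # no usable referring page, so nothing to add

    details = {}
    licensed_here = False
    for _subject, predicate, value in lookup(referrer):
        if predicate == "license":
            # Assumption: only trust metadata from pages that link to this deed.
            if value.rstrip("/") == deed_url.rstrip("/"):
                licensed_here = True
        elif predicate == "cc:attributionName":
            details["attribution_name"] = value
        elif predicate == "cc:attributionURL":
            details["attribution_url"] = value
        elif predicate == "cc:morePermissions":
            details["more_permissions_url"] = value  # CC+ details
    return details if licensed_here else {}
```

Passing `lookup` in as a parameter keeps the sketch free of network access, so it can be exercised with a stub that returns canned triples.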

Developer Information

Metadata Scraper was implemented in Python using the CherryPy web framework. The source code is available on GitHub at https://github.com/cc-archive/metadata_scraper .

Contact

Contact Rob Myers with questions about the scraper.

Retrieved from "https://wiki.creativecommons.org/index.php?title=Metadata_Scraper&oldid=116559"

Categories: Developer

@@ Line 1: / Line 1: @@
 {{Software Project
 |Description=The Creative Commons Metadata Scraper is a simple crawler used by the license engine to detect metadata stored in pages.
-|Bug tracker=http://code.creativecommons.org/issues/
+|Bug tracker=https://github.com/cc-archive/metadata_scraper/issues/
-|Code repository=http://code.creativecommons.org/viewsvn/metadata_scraper/
+|Code repository=https://github.com/cc-archive/metadata_scraper
-|Mailing list=http://lists.ibiblio.org/mailman/listinfo/cc-devel
+|Mailing list=https://creativecommons.email/mailman/listinfo/cc-devel
 }}
-== What it Does ==
-The metadata scraper is used by the license deeds to extract embedded metadata from pages.  It currently scans the page for [[RDFa]] using [http://pypi.python.org/pypi/rdfadict rdfadict] and [http://www.w3.org/2007/08/pyRdfa/ pyRdfa].
+== Decommissioned ==
-== When it Happens ==
+The Creative Commons Metadata Scraper was decommissioned on Monday 8th January 2018. Due to the changing nature of the web, in particular the widespread adoption of HTTPS, the Scraper had seen greatly reduced use. For potential future directions in metadata presentation, see Creative Commons' [https://github.com/creativecommons/open-ledger Open Ledger] project.
-When a creator or copyright holder [http://creativecommons.org/license selects a license], they have the opportunity to provide additional metadata about there work.  This includes information such as creator and medium, as well as how the creator would like to be attributed.  If this information is provided, it is encoded in the HTML generated using [[RDFa]].
+== What it Did ==
-In order to provide more context relevant information for visitors, the license deeds load a script which looks at the referring page for metadata.  If found, the metadata is used to update the deed and display additional attribution or CC+ details.
+The metadata scraper was used by the license deeds to extract embedded metadata from pages.  It scanned the page for [[RDFa]] using [http://pypi.python.org/pypi/rdfadict rdfadict] and [http://www.w3.org/2007/08/pyRdfa/ pyRdfa].
+== When it Happened ==
+When a creator or copyright holder [http://creativecommons.org/license selects a license], they have the opportunity to provide additional metadata about their work.  This includes information such as creator and medium, as well as how the creator would like to be attributed. If this information is provided, it is encoded in the HTML generated using [[RDFa]].
+In order to provide more context relevant information for visitors, the license deeds loaded a script which looks at the referring page for metadata.  If found, the metadata was used to update the deed and display additional attribution or CC+ details.
 == Developer Information ==
-Metadata Scraper is implemented in Python using the [http://cherrypy.org CherryPy] web framework.  The source code is available in the <code>metadata_scraper</code> module in [[Source_Repository_Information#Subversion|subversion]].  You can [http://code.creativecommons.org/svnroot/metadata_scraper/ checkout] the source or [http://code.creativecommons.org/viewsvn/metadata_scraper/ browse it].
+Metadata Scraper was implemented in Python using the [http://cherrypy.org CherryPy] web framework.  The source code is available on GitHub at https://github.com/cc-archive/metadata_scraper .
 == Contact ==
-Contact [[User:Nathan Yergler|Nathan Yergler]] with questions about the scraper.
+Contact [[User:CCID-rob|Rob Myers]] with questions about the scraper.
 [[Category:Developer]]
