Difference between revisions of "Tracker CC Indexing"
(→Progress) |
|||
(17 intermediate revisions by 4 users not shown) | |||
Line 2: | Line 2: | ||
[[Category:Developer]] | [[Category:Developer]] | ||
[[Category:metadata]] | [[Category:metadata]] | ||
+ | [[Category:Tracker]] | ||
+ | {{template:Merge|Embedding Specifications}} | ||
+ | {{template:SMW}} | ||
== Google Summer of Code Project: “Indexing Embedded License Claims in Tracker” == | == Google Summer of Code Project: “Indexing Embedded License Claims in Tracker” == | ||
Line 7: | Line 10: | ||
Here's some relevant (now revised) sections of the Summer of Code application: | Here's some relevant (now revised) sections of the Summer of Code application: | ||
− | == | + | == License Metadata Summary == |
− | |||
<table border="1"> | <table border="1"> | ||
<tr> | <tr> | ||
<td><strong>Format</strong></td><td><strong>Form of Metadata</strong></td><td><strong>Location of Metadata</strong></td><td><strong>Links</strong></td> | <td><strong>Format</strong></td><td><strong>Form of Metadata</strong></td><td><strong>Location of Metadata</strong></td><td><strong>Links</strong></td> | ||
</tr> | </tr> | ||
+ | <tr><td colspan="4"><strong>Audio</strong></td></tr> | ||
<tr> | <tr> | ||
<td>MP3</td> | <td>MP3</td> | ||
Line 33: | Line 36: | ||
<td>[http://flac.sourceforge.net/format.html#metadata_block_vorbis_comment FLAC Format Spec]</td> | <td>[http://flac.sourceforge.net/format.html#metadata_block_vorbis_comment FLAC Format Spec]</td> | ||
</tr> | </tr> | ||
+ | <tr> | ||
+ | <td>Monkey's Audio (APE)</td> | ||
+ | <td>Native Vorbis-like comment field</td> | ||
+ | <td>AFAIK, there is no standard tag spec</td> | ||
+ | <td></td> | ||
+ | </tr> | ||
+ | <tr><td colspan="4"><strong>Images</strong></td></tr> | ||
<tr> | <tr> | ||
<td>JPEG</td> | <td>JPEG</td> | ||
Line 40: | Line 50: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td>JPEG | + | <td>JPEG 2000</td> |
<td>XMP</td> | <td>XMP</td> | ||
<td>UUID Box</td> | <td>UUID Box</td> | ||
Line 62: | Line 72: | ||
<td>Application block</td> | <td>Application block</td> | ||
<td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>SVG</td> | ||
+ | <td>RDF</td> | ||
+ | <td>/svg/metadata/rdf</td> | ||
+ | <td>[http://wiki.creativecommons.org/SVG CC Wiki, SVG, based on Inkscape]</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
Line 67: | Line 83: | ||
<td>XMP</td> | <td>XMP</td> | ||
<td>Resource block</td> | <td>Resource block</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr><td colspan="4"><strong>Video</strong></td></tr> | ||
+ | <tr> | ||
+ | <td>AVI</td> | ||
+ | <td>?</td> | ||
+ | <td>?</td> | ||
<td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
</tr> | </tr> | ||
Line 79: | Line 102: | ||
<td>Native tag</td> | <td>Native tag</td> | ||
<td>kMDItemCopyright(old)/kUserDataTextCopyright(new) tag</td> | <td>kMDItemCopyright(old)/kUserDataTextCopyright(new) tag</td> | ||
− | <td>[http://developer.apple.com/documentation/QuickTime/Conceptual/QT7UpdateGuide/Chapter03/chapter_3_section_1.html#//apple_ref/doc/uid/TP40001163-CH314-553378 Quicktime 7 API Reference]</td> | + | <td>[http://developer.apple.com/documentation/QuickTime/Conceptual/QT7UpdateGuide/Chapter03/chapter_3_section_1.html#//apple_ref/doc/uid/TP40001163-CH314-553378 Quicktime 7 API Reference] |
+ | [http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>OGG</td> | ||
+ | <td>No metadata standard</td> | ||
+ | <td></td> | ||
+ | <td>[http://wiki.xiph.org/Metadata Ogg Metadata Draft]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Theora</td> | ||
+ | <td colspan="2">Theora comments (similar to Vorbis comments)</td> | ||
+ | <td>[http://www.theora.org/doc/Theora_I_spec.pdf Theora Spec]</td> | ||
</tr> | </tr> | ||
+ | <tr> | ||
+ | <td>Flash</td> | ||
+ | <td>RDF</td> | ||
+ | <td>?</td> | ||
+ | <td></td> | ||
+ | </tr> | ||
+ | <tr><td colspan="4"><strong>Documents</strong></td></tr> | ||
<tr> | <tr> | ||
<td>PDF</td> | <td>PDF</td> | ||
Line 100: | Line 142: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td> | + | <td>SMIL</td> |
<td>RDF</td> | <td>RDF</td> | ||
− | <td>/ | + | <td>/smil/head/metadata@id="meta-rdf"/RDF</td> |
− | <td>[http:// | + | <td>[http://web.resource.org/cc/modules/smil/ CreativeCommons SMIL Module]</td> |
+ | </tr> | ||
+ | <tr> | ||
+ | <td>RSS 1.0</td> | ||
+ | <td colspan="2">/RDF/channel/license or /RDF/channel/item/license</td> | ||
+ | <td>[http://web.resource.org/rss/1.0/modules/cc/ CreativeCommons RSS 1.0 Module]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>RSS 2.0</td> | ||
+ | <td colspan="2">/rss/channel/cc:license or /rss/channel/item/cc:license</td> | ||
+ | <td>[http://backend.userland.com/creativeCommonsRssModule CreativeCommons RSS 2.0 Module]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Atom</td> | ||
+ | <td colspan="2">/feed/entry/link@rel=license</td> | ||
+ | <td>[http://ietfreport.isoc.org/idref/draft-snell-atompub-feed-license/ Atom License Extension]</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
Line 117: | Line 174: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td>MS Office</td> | + | <td>MS Office (2003)</td> |
<td>DocumentSummaryInformation Infile</td> | <td>DocumentSummaryInformation Infile</td> | ||
<td>CreativeCommons_LicenseURL property</td> | <td>CreativeCommons_LicenseURL property</td> | ||
<td>[http://www.microsoft.com/downloads/details.aspx?FamilyID=113b53dd-1cc0-4fbe-9e1d-b91d07c76504&displaylang=en Office Add-in]</td> | <td>[http://www.microsoft.com/downloads/details.aspx?FamilyID=113b53dd-1cc0-4fbe-9e1d-b91d07c76504&displaylang=en Office Add-in]</td> | ||
</tr> | </tr> | ||
+ | <tr> | ||
+ | <td>MS Office OpenXML (2007)</td> | ||
+ | <td>?</td> | ||
+ | <td>?</td> | ||
+ | <td>[http://www.ecma-international.org/publications/standards/Ecma-376.htm OpenXML Spec] | ||
+ | |||
+ | [http://lists.ibiblio.org/pipermail/cc-devel/2007-June/000466.html Relevant mailing list post]</td> | ||
+ | </tr> | ||
+ | |||
</table> | </table> | ||
+ | == Indexing Licenses in Tracker Summary == | ||
− | |||
<table border="1"> | <table border="1"> | ||
<tr> | <tr> | ||
Line 155: | Line 221: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td> | + | <td>Done</td> |
<td>TIFF</td> | <td>TIFF</td> | ||
<td>Extend the Imagemagick extractor, using 'convert file.tif xmp:-' to read XMP</td> | <td>Extend the Imagemagick extractor, using 'convert file.tif xmp:-' to read XMP</td> | ||
Line 161: | Line 227: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td> | + | <td>Done</td> | ||
<td>PNG</td> | <td>PNG</td> | ||
<td>Extend the PNG extractor, adding a check for XML:com.adobe.xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.)</td> | <td>Extend the PNG extractor, adding a check for XML:com.adobe.xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.)</td> | ||
Line 173: | Line 239: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td> | + | <td>Done</td> |
<td>PDF</td> | <td>PDF</td> | ||
<td>Extend the current PDF extractor (which uses Poppler) to read the metadata field.</td> | <td>Extend the current PDF extractor (which uses Poppler) to read the metadata field.</td> | ||
Line 179: | Line 245: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td> | + | <td>Done</td> |
<td>HTML</td> | <td>HTML</td> | ||
<td>Write a new HTML extractor, using libxml2, and scan for RDFa</td> | <td>Write a new HTML extractor, using libxml2, and scan for RDFa</td> | ||
Line 203: | Line 269: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td> | + | <td>Done</td> |
− | <td>MS Office</td> | + | <td>MS Office (old format)</td> |
<td>Extend existing msoffice extractor</td> | <td>Extend existing msoffice extractor</td> | ||
<td>[http://www.microsoft.com/downloads/details.aspx?FamilyID=113b53dd-1cc0-4fbe-9e1d-b91d07c76504&displaylang=en MSOffice Add-in]</td> | <td>[http://www.microsoft.com/downloads/details.aspx?FamilyID=113b53dd-1cc0-4fbe-9e1d-b91d07c76504&displaylang=en MSOffice Add-in]</td> | ||
</tr> | </tr> | ||
</table> | </table> | ||
Latest revision as of 18:45, 21 September 2007
This article has been identified as a candidate for merging with Embedding Specifications.
The contents of this article have been identified as candidates for conversion to Semantic Markup. You can help Creative Commons by splitting the article and constructing ask queries. See the Semantic MediaWiki page for more information.
Google Summer of Code Project: “Indexing Embedded License Claims in Tracker”
Here's some relevant (now revised) sections of the Summer of Code application:
License Metadata Summary
Format | Form of Metadata | Location of Metadata | Links |
Audio | |||
MP3 | XMP / Native id3 tags | The PRIV,XMP field / WCOP tag | XMP Spec ID3v2.3 Spec |
Vorbis | XMP / Native comment field | XMP comment field / LICENSE comment field | Ogg Vorbis Docs |
FLAC | Native comment fields (id3v2 or vorbis-style comments) | Same as with MP3 for id3v2 or Vorbis for vorbis-style comments | FLAC Format Spec |
Monkey's Audio (APE) | Native Vorbis-like comment field | AFAIK, there is no standard tag spec | |
Images | |||
JPEG | XMP | APP1 Markers | XMP Spec |
JPEG 2000 | XMP | UUID Box | XMP Spec |
TIFF | XMP | XMLPacket tag | XMP Spec |
PNG | XMP | iTXt chunk with the XML:com.adobe.xmp keyword | XMP Spec |
GIF | XMP | Application block | XMP Spec |
SVG | RDF | /svg/metadata/rdf | CC Wiki, SVG, based on Inkscape |
PSD (Adobe Photoshop) | XMP | Resource block | XMP Spec |
Video | |||
AVI | ? | ? | XMP Spec |
Matroska | Native tag | COPYRIGHT tag | Matroska Tagging Spec |
Quicktime | Native tag | kMDItemCopyright(old)/kUserDataTextCopyright(new) tag | Quicktime 7 API Reference XMP Spec |
OGG | No metadata standard | | Ogg Metadata Draft |
Theora | Theora comments (similar to Vorbis comments) | Theora Spec | |
Flash | RDF | ? | |
Documents | |||
PDF | XMP | metadata field | XMP Spec |
Postscript/EPS | XMP | Document-level metadata | XMP Spec |
HTML | RDFa | <a rel="license" href="..."></a> | CC Wiki, RDFa |
SMIL | RDF | /smil/head/metadata@id="meta-rdf"/RDF | CreativeCommons SMIL Module |
RSS 1.0 | /RDF/channel/license or /RDF/channel/item/license | CreativeCommons RSS 1.0 Module | |
RSS 2.0 | /rss/channel/cc:license or /rss/channel/item/cc:license | CreativeCommons RSS 2.0 Module | |
Atom | /feed/entry/link@rel=license | Atom License Extension | |
Any XML | XMP | Wherever valid | XMP Spec |
OpenOffice.org (OASIS) | OO.org CC License Add-In SoC Project is working on the spec | ||
MS Office (2003) | DocumentSummaryInformation Infile | CreativeCommons_LicenseURL property | Office Add-in |
MS Office OpenXML (2007) | ? | ? | OpenXML Spec Relevant mailing list post |
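The feed rows in the table above give document paths rather than byte offsets, so they can be illustrated with a small XML lookup. The following is only a sketch in Python using the standard library, not the Tracker extractor itself. The function name `feed_license_urls` is hypothetical, and the RSS 2.0 namespace URI is assumed to be the same as the module URL linked in the table.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: the RSS 2.0 module namespace matches the module URL in the table.
CC_RSS2_NS = "http://backend.userland.com/creativeCommonsRssModule"


def feed_license_urls(xml_text):
    """Return license URLs found in an Atom or RSS 2.0 feed (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    urls = []
    if root.tag == f"{{{ATOM_NS}}}feed":
        # Atom: <link rel="license" href="..."/>. This also picks up
        # feed-level links, not only /feed/entry/link.
        for link in root.iter(f"{{{ATOM_NS}}}link"):
            if link.get("rel") == "license" and link.get("href"):
                urls.append(link.get("href"))
    elif root.tag == "rss":
        # RSS 2.0: <cc:license> under channel or under an item.
        for element in root.iter(f"{{{CC_RSS2_NS}}}license"):
            if element.text and element.text.strip():
                urls.append(element.text.strip())
    return urls
```

Results come back in document order, so a channel-level license precedes any item-level ones. RSS 1.0 and SMIL are left out here because they embed RDF, which needs its own handling.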
Indexing Licenses in Tracker Summary
Status | Format | Extraction Method | Test content |
Done, GStreamer patch pending | MP3 | Reading native tags already complete. Maybe extend GStreamer extractor to read XMP. | XMP embedded with Exempi / Tags embedded with id3v2 |
In progress | Vorbis | Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10). | XMP embedded with vorbiscomment |
Done, GStreamer patch pending | FLAC | Native tags already extracted through the GStreamer extractor. Maybe extend GStreamer extractor to read XMP. | embedded with id3v2 or metaflac |
In progress | JPEG | Extend the Imagemagick extractor, using 'convert file.jpg xmp:-' to read XMP | XMP embedded with Exempi |
Done | TIFF | Extend the Imagemagick extractor, using 'convert file.tif xmp:-' to read XMP | XMP embedded with Exempi (Note: there's a bug in Adobe's XMP SDK that prevents Exempi from embedding valid XMP) |
Done | PNG | Extend the PNG extractor, adding a check for XML:com.adobe.xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.) | XMP embedded with Exempi |
In progress | GIF | Would need to write a GIF extractor | Palimpsest |
Done | PDF | Extend the current PDF extractor (which uses Poppler) to read the metadata field. | XMP embedded with Exempi |
Done | HTML | Write a new HTML extractor, using libxml2, and scan for RDFa | Various actual sites, including creativecommons.org |
In progress | SVG | I could specifically parse the XML, checking for the RDF schema used by Inkscape. Should I also check for XMP? | Inkscape |
Any XML | Write a generic XML extractor (and/or extractor for each particular format), scanning with libxml2 | ||
Awaiting spec | OpenOffice.org (OASIS) | Extend OASIS extractor | OO.org Add-In |
Done | MS Office (old format) | Extend existing msoffice extractor | MSOffice Add-in |
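The PNG row describes checking iTXt chunks for the XMP keyword, which the XMP specification spells `XML:com.adobe.xmp`. The sketch below shows that check in plain Python without libpng, so it is an illustration of the chunk layout rather than the code used in the Tracker extractor. The function name `read_png_xmp` is hypothetical, and CRCs are not verified.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = b"XML:com.adobe.xmp"


def read_png_xmp(data):
    """Return the XMP packet from a PNG iTXt chunk, or None (hypothetical helper)."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if chunk_type == b"IEND":
            break
        if chunk_type != b"iTXt":
            continue
        keyword, separator, rest = body.partition(b"\x00")
        if not separator or keyword != XMP_KEYWORD or len(rest) < 2:
            continue
        compressed = rest[0] == 1
        # Skip the compression flag and method bytes, then the
        # null-terminated language tag and translated keyword.
        _language, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compressed:
            text = zlib.decompress(text)
        return text.decode("utf-8")
    return None
```

The returned string is the raw XMP packet; finding the license claim inside it still requires parsing the RDF it contains.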