Difference between revisions of "Tracker CC Indexing"
(35 intermediate revisions by 4 users not shown) | |||
+ | [[Category:Technology]] | ||
+ | [[Category:Developer]] | ||
+ | [[Category:metadata]] | ||
+ | [[Category:Tracker]] | ||
+ | {{template:Merge|Embedding Specifications}} | ||
+ | {{template:SMW}} | ||
+ | |||
== Google Summer of Code Project: “Indexing Embedded License Claims in Tracker” == | == Google Summer of Code Project: “Indexing Embedded License Claims in Tracker” == | ||
Here's some relevant (now revised) sections of the Summer of Code application: | Here's some relevant (now revised) sections of the Summer of Code application: | ||
+ | == License Metadata Summary == | ||
+ | <table border="1"> | ||
+ | <tr> | ||
+ | <td><strong>Format</strong></td><td><strong>Form of Metadata</strong></td><td><strong>Location of Metadata</strong></td><td><strong>Links</strong></td> | ||
+ | </tr> | ||
+ | <tr><td colspan="4"><strong>Audio</strong></td></tr> | ||
+ | <tr> | ||
+ | <td>MP3</td> | ||
+ | <td>XMP / Native id3 tags</td> | ||
+ | <td>The PRIV,XMP field / WCOP tag</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec] | ||
+ | [http://www.id3.org/id3v2.3.0 ID3v2.3 Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Vorbis</td> | ||
+ | <td>XMP / Native comment field</td> | ||
+ | <td>XMP comment field / LICENSE comment field</td> | ||
+ | <td>[http://xiph.org/vorbis/doc/v-comment.html Ogg Vorbis Docs]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>FLAC</td> | ||
+ | <td>Native comment fields (id3v2 or vorbis-style comments)</td> | ||
+ | <td>Same as with MP3 for id3v2 or Vorbis for vorbis-style comments</td> | ||
+ | <td>[http://flac.sourceforge.net/format.html#metadata_block_vorbis_comment FLAC Format Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Monkey's Audio (APE)</td> | ||
+ | <td>Native Vorbis-like comment field</td> | ||
+ | <td>AFAIK, there is no standard tag spec</td> | ||
+ | <td></td> | ||
+ | </tr> | ||
+ | <tr><td colspan="4"><strong>Images</strong></td></tr> | ||
+ | <tr> | ||
+ | <td>JPEG</td> | ||
+ | <td>XMP</td> | ||
+ | <td>APP1 Markers</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>JPEG 2000</td> | ||
+ | <td>XMP</td> | ||
+ | <td>UUID Box</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>TIFF</td> | ||
+ | <td>XMP</td> | ||
+ | <td>XMLPacket tag</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>PNG</td> | ||
+ | <td>XMP</td> | ||
+ | <td>iTXt, XML:com:adobe:xmp field</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>GIF</td> | ||
+ | <td>XMP</td> | ||
+ | <td>Application block</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>SVG</td> | ||
+ | <td>RDF</td> | ||
+ | <td>/svg/metadata/rdf</td> | ||
+ | <td>[http://wiki.creativecommons.org/SVG CC Wiki, SVG, based on Inkscape]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>PSD (Adobe Photoshop)</td> | ||
+ | <td>XMP</td> | ||
+ | <td>Resource block</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr><td colspan="4"><strong>Video</strong></td></tr> | ||
+ | <tr> | ||
+ | <td>AVI</td> | ||
+ | <td>?</td> | ||
+ | <td>?</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Matroska</td> | ||
+ | <td>Native tag</td> | ||
+ | <td>COPYRIGHT tag</td> | ||
+ | <td>[http://www.matroska.org/technical/specs/tagging/index.html Matroska Tagging Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Quicktime</td> | ||
+ | <td>Native tag</td> | ||
+ | <td>kMDItemCopyright(old)/kUserDataTextCopyright(new) tag</td> | ||
+ | <td>[http://developer.apple.com/documentation/QuickTime/Conceptual/QT7UpdateGuide/Chapter03/chapter_3_section_1.html#//apple_ref/doc/uid/TP40001163-CH314-553378 Quicktime 7 API Reference] | ||
+ | [http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>OGG</td> | ||
+ | <td>No metadata standard</td> | ||
+ | <td></td> | ||
+ | <td>[http://wiki.xiph.org/Metadata Ogg Metadata Draft]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Theora</td> | ||
+ | <td colspan="2">Theora comments (similar to Vorbis comments)</td> | ||
+ | <td>[http://www.theora.org/doc/Theora_I_spec.pdf Theora Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Flash</td> | ||
+ | <td>RDF</td> | ||
+ | <td>?</td> | ||
+ | <td></td> | ||
+ | </tr> | ||
+ | <tr><td colspan="4"><strong>Documents</strong></td></tr> | ||
+ | <tr> | ||
+ | <td>PDF</td> | ||
+ | <td>XMP</td> | ||
+ | <td>metadata field</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Postscript/EPS</td> | ||
+ | <td>XMP</td> | ||
+ | <td>Document-level metadata</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>HTML</td> | ||
+ | <td>RDFa</td> | ||
+ | <td><a rel="license" href="..."></a></td> | ||
+ | <td>[http://wiki.creativecommons.org/RDFa CC Wiki, RDFa]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>SMIL</td> | ||
+ | <td>RDF</td> | ||
+ | <td>/smil/head/metadata@id="meta-rdf"/RDF</td> | ||
+ | <td>[http://web.resource.org/cc/modules/smil/ CreativeCommons SMIL Module]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>RSS 1.0</td> | ||
+ | <td colspan="2">/RDF/channel/license or /RDF/channel/item/license</td> | ||
+ | <td>[http://web.resource.org/rss/1.0/modules/cc/ CreativeCommons RSS 1.0 Module]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>RSS 2.0</td> | ||
+ | <td colspan="2">/rss/channel/cc:license or /rss/channel/item/cc:license</td> | ||
+ | <td>[http://backend.userland.com/creativeCommonsRssModule CreativeCommons RSS 2.0 Module]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Atom</td> | ||
+ | <td colspan="2">/feed/entry/link@rel=license</td> | ||
+ | <td>[http://ietfreport.isoc.org/idref/draft-snell-atompub-feed-license/ Atom License Extension]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Any XML</td> | ||
+ | <td>XMP</td> | ||
+ | <td>Wherever valid</td> | ||
+ | <td>[http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf XMP Spec]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>OpenOffice.org (OASIS)</td> | ||
+ | <td colspan="2">OO.org CC License Add-In SoC Project is working on the spec</td> | ||
+ | <td></td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>MS Office (2003)</td> | ||
+ | <td>DocumentSummaryInformation Infile</td> | ||
+ | <td>CreativeCommons_LicenseURL property</td> | ||
+ | <td>[http://www.microsoft.com/downloads/details.aspx?FamilyID=113b53dd-1cc0-4fbe-9e1d-b91d07c76504&displaylang=en Office Add-in]</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>MS Office OpenXML (2007)</td> | ||
+ | <td>?</td> | ||
+ | <td>?</td> | ||
+ | <td>[http://www.ecma-international.org/publications/standards/Ecma-376.htm OpenXML Spec] | ||
+ | [http://lists.ibiblio.org/pipermail/cc-devel/2007-June/000466.html Relevant mailing list post]</td> | ||
+ | </tr> | ||
+ | </table> | ||
+ | == Indexing Licenses in Tracker Summary == | ||
+ | <table border="1"> | ||
+ | <tr> | ||
+ | <td><strong>Status</strong></td><td><strong>Format</strong></td><td><strong>Extraction Method</strong></td><td><strong>Test content</strong></td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Done, GStreamer patch pending</td> | ||
+ | <td>MP3</td> | ||
+ | <td>Reading native tags already complete. Maybe extend GStreamer extractor to read XMP.</td> | ||
+ | <td>XMP embedded with Exempi / Tags embedded with id3v2</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>In progress</td> | ||
+ | <td>Vorbis</td> | ||
+ | <td>Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10).</td> | ||
+ | <td>XMP embedded with vorbiscomment</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Done, GStreamer patch pending</td> | ||
+ | <td>FLAC</td> | ||
+ | <td>Native tags already extracted through the GStreamer extractor. Maybe extend GStreamer extractor to read XMP.</td> | ||
+ | <td>embedded with id3v2 or metaflac</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>In progress</td> | ||
+ | <td>JPEG</td> | ||
+ | <td>Extend the ImageMagick extractor, using 'convert file.jpg xmp:-' to read XMP</td> | ||
+ | <td>XMP embedded with Exempi</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Done</td> | ||
+ | <td>TIFF</td> | ||
+ | <td>Extend the ImageMagick extractor, using 'convert file.tif xmp:-' to read XMP</td> | ||
+ | <td>XMP embedded with Exempi (Note: there's a bug in Adobe's XMP SDK that prevents Exempi from embedding valid XMP)</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Done</td> | ||
+ | <td>PNG</td> | ||
+ | <td>Extend the PNG extractor, adding a check for XML:com:adobe:xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.)</td> | ||
+ | <td>XMP embedded with Exempi</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>In progress</td> | ||
+ | <td>GIF</td> | ||
+ | <td>Would need to write a GIF extractor</td> | ||
+ | <td>Palimpsest</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Done</td> | ||
+ | <td>PDF</td> | ||
+ | <td>Extend the current PDF extractor (which uses Poppler) to read the metadata field.</td> | ||
+ | <td>XMP embedded with Exempi</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Done</td> | ||
+ | <td>HTML</td> | ||
+ | <td>Write a new HTML extractor, using libxml2, and scan for RDFa</td> | ||
+ | <td>Various actual sites, including creativecommons.org</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>In progress</td> | ||
+ | <td>SVG</td> | ||
+ | <td>Parse the XML directly, checking for the RDF schema used by Inkscape. Open question: should XMP be checked as well?</td> | ||
+ | <td>Inkscape</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td></td> | ||
+ | <td>Any XML</td> | ||
+ | <td>Write a generic XML extractor (and/or extractor for each particular format), scanning with libxml2</td> | ||
+ | <td></td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Awaiting spec</td> | ||
+ | <td>OpenOffice.org (OASIS)</td> | ||
+ | <td>Extend OASIS extractor</td> | ||
+ | <td>OO.org Add-In</td> | ||
+ | </tr> | ||
+ | <tr> | ||
+ | <td>Done</td> | ||
+ | <td>MS Office (old format)</td> | ||
+ | <td>Extend existing msoffice extractor</td> | ||
+ | <td>[http://www.microsoft.com/downloads/details.aspx?FamilyID=113b53dd-1cc0-4fbe-9e1d-b91d07c76504&displaylang=en MSOffice Add-in]</td> | ||
+ | </tr> | ||
+ | </table> |
Latest revision as of 18:45, 21 September 2007
Google Summer of Code Project: “Indexing Embedded License Claims in Tracker”
Here's some relevant (now revised) sections of the Summer of Code application:
License Metadata Summary
Format | Form of Metadata | Location of Metadata | Links |
Audio | |||
MP3 | XMP / Native id3 tags | The PRIV,XMP field / WCOP tag | XMP Spec ID3v2.3 Spec |
Vorbis | XMP / Native comment field | XMP comment field / LICENSE comment field | Ogg Vorbis Docs |
FLAC | Native comment fields (id3v2 or vorbis-style comments) | Same as with MP3 for id3v2 or Vorbis for vorbis-style comments | FLAC Format Spec |
Monkey's Audio (APE) | Native Vorbis-like comment field | AFAIK, there is no standard tag spec | |
Images | |||
JPEG | XMP | APP1 Markers | XMP Spec |
JPEG 2000 | XMP | UUID Box | XMP Spec |
TIFF | XMP | XMLPacket tag | XMP Spec |
PNG | XMP | iTXt, XML:com:adobe:xmp field | XMP Spec |
GIF | XMP | Application block | XMP Spec |
SVG | RDF | /svg/metadata/rdf | CC Wiki, SVG, based on Inkscape |
PSD (Adobe Photoshop) | XMP | Resource block | XMP Spec |
Video | |||
AVI | ? | ? | XMP Spec |
Matroska | Native tag | COPYRIGHT tag | Matroska Tagging Spec |
Quicktime | Native tag | kMDItemCopyright(old)/kUserDataTextCopyright(new) tag | Quicktime 7 API Reference XMP Spec |
OGG | No metadata standard | | Ogg Metadata Draft |
Theora | Theora comments (similar to Vorbis comments) | Theora Spec | |
Flash | RDF | ? | |
Documents | |||
PDF | XMP | metadata field | XMP Spec |
Postscript/EPS | XMP | Document-level metadata | XMP Spec |
HTML | RDFa | <a rel="license" href="..."></a> | CC Wiki, RDFa |
SMIL | RDF | /smil/head/metadata@id="meta-rdf"/RDF | CreativeCommons SMIL Module |
RSS 1.0 | /RDF/channel/license or /RDF/channel/item/license | CreativeCommons RSS 1.0 Module | |
RSS 2.0 | /rss/channel/cc:license or /rss/channel/item/cc:license | CreativeCommons RSS 2.0 Module | |
Atom | /feed/entry/link@rel=license | Atom License Extension | |
Any XML | XMP | Wherever valid | XMP Spec |
OpenOffice.org (OASIS) | OO.org CC License Add-In SoC Project is working on the spec | ||
MS Office (2003) | DocumentSummaryInformation Infile | CreativeCommons_LicenseURL property | Office Add-in |
MS Office OpenXML (2007) | ? | ? | OpenXML Spec Relevant mailing list post |
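Many of the formats in the table above carry the license claim inside an XMP packet. The following is a minimal sketch in Python, not Tracker's actual extractor code, of how such a packet could be located by scanning a file's raw bytes for the `<?xpacket begin ... ?>` and `<?xpacket end ... ?>` markers. It assumes the packet is stored uncompressed and in an ASCII-compatible encoding. The function names and the URL pattern (which simply looks for anything under `creativecommons.org/licenses/`) are illustrative assumptions, not part of any specification.

```python
import re
from typing import List

# An XMP packet is wrapped in "<?xpacket begin=... ?>" and "<?xpacket end=... ?>"
# processing instructions. This pattern captures everything between them.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Heuristic (an assumption for this sketch): treat any URL under
# creativecommons.org/licenses/ found inside the packet as a license claim.
LICENSE_URL_RE = re.compile(rb"https?://creativecommons\.org/licenses/[A-Za-z0-9./-]+")


def find_xmp_packets(data: bytes) -> List[bytes]:
    """Return the body of every XMP packet found in the raw file bytes."""
    return [match.group(1) for match in XPACKET_RE.finditer(data)]


def find_license_urls(data: bytes) -> List[str]:
    """Return the distinct license URLs that appear inside XMP packets."""
    urls: List[str] = []
    for packet in find_xmp_packets(data):
        for match in LICENSE_URL_RE.finditer(packet):
            url = match.group(0).decode("ascii")
            if url not in urls:
                urls.append(url)
    return urls
```

Calling `find_license_urls(open(path, "rb").read())` on a JPEG, TIFF or PDF with an uncompressed packet would be the typical use. A real extractor would parse the RDF inside the packet rather than pattern-match URLs, and would need format-specific handling wherever the packet is compressed or split across blocks.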
Indexing Licenses in Tracker Summary
Status | Format | Extraction Method | Test content |
Done, GStreamer patch pending | MP3 | Reading native tags already complete. Maybe extend GStreamer extractor to read XMP. | XMP embedded with Exempi / Tags embedded with id3v2 |
In progress | Vorbis | Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10). | XMP embedded with vorbiscomment |
Done, GStreamer patch pending | FLAC | Native tags already extracted through the GStreamer extractor. Maybe extend GStreamer extractor to read XMP. | embedded with id3v2 or metaflac |
In progress | JPEG | Extend the ImageMagick extractor, using 'convert file.jpg xmp:-' to read XMP | XMP embedded with Exempi |
Done | TIFF | Extend the ImageMagick extractor, using 'convert file.tif xmp:-' to read XMP | XMP embedded with Exempi (Note: there's a bug in Adobe's XMP SDK that prevents Exempi from embedding valid XMP) |
Done | PNG | Extend the PNG extractor, adding a check for XML:com:adobe:xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.) | XMP embedded with Exempi |
In progress | GIF | Would need to write a GIF extractor | Palimpsest |
Done | PDF | Extend the current PDF extractor (which uses Poppler) to read the metadata field. | XMP embedded with Exempi |
Done | HTML | Write a new HTML extractor, using libxml2, and scan for RDFa | Various actual sites, including creativecommons.org |
In progress | SVG | Parse the XML directly, checking for the RDF schema used by Inkscape. Open question: should XMP be checked as well? | Inkscape |
Any XML | Write a generic XML extractor (and/or extractor for each particular format), scanning with libxml2 | ||
Awaiting spec | OpenOffice.org (OASIS) | Extend OASIS extractor | OO.org Add-In |
Done | MS Office (old format) | Extend existing msoffice extractor | MSOffice Add-in |
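The HTML row above describes scanning a page for RDFa license links of the form `<a rel="license" href="...">`. As a rough illustration of that scan, here is a small sketch in Python. It uses the standard library's `html.parser` as a stand-in for the libxml2-based extractor mentioned in the table, and the class and function names are hypothetical. It only looks at `rel` and `href` on `a` and `link` elements and does not attempt full RDFa processing.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collects the href of every <a> or <link> element whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list, e.g. rel="license nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_tokens and href:
            self.licenses.append(href)


def extract_license_links(html_text):
    """Return license URLs claimed by the page, in document order."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.licenses
```

Feeding the function the text of a page that contains `<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">` returns a list holding that URL. An indexer would then store the result as the document's license property.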