Tracker CC Indexing
Revision as of 05:16, 10 July 2007 by Jason Kivlighn
Google Summer of Code Project: “Indexing Embedded License Claims in Tracker”
Here are some relevant (now revised) sections of the Summer of Code application:
License Metadata Summary
Format | Form of Metadata | Location of Metadata | Links |
Audio | |||
MP3 | XMP / Native id3 tags | The PRIV,XMP field / WCOP tag | XMP Spec, ID3v2.3 Spec |
Vorbis | XMP / Native comment field | XMP comment field / LICENSE comment field | Ogg Vorbis Docs |
FLAC | Native comment fields (id3v2 or vorbis-style comments) | Same as with MP3 for id3v2 or Vorbis for vorbis-style comments | FLAC Format Spec |
Monkey's Audio (APE) | Native Vorbis-like comment field | AFAIK, there is no standard tag spec | |
Images | |||
JPEG | XMP | APP1 Markers | XMP Spec |
JPEG 2000 | XMP | UUID Box | XMP Spec |
TIFF | XMP | XMLPacket tag | XMP Spec |
PNG | XMP | iTXt, XML:com:adobe:xmp field | XMP Spec |
GIF | XMP | Application block | XMP Spec |
SVG | RDF | /svg/metadata/rdf | CC Wiki, SVG, based on Inkscape |
PSD (Adobe Photoshop) | XMP | Resource block | XMP Spec |
Video | |||
AVI | ? | ? | XMP Spec |
Matroska | Native tag | COPYRIGHT tag | Matroska Tagging Spec |
Quicktime | Native tag | kMDItemCopyright (old) / kUserDataTextCopyright (new) tag | Quicktime 7 API Reference, XMP Spec |
OGG | No metadata standard | | Ogg Metadata Draft |
Theora | Theora comments (similar to Vorbis comments) | | Theora Spec |
Flash | RDF | ? | |
Documents | |||
PDF | XMP | metadata field | XMP Spec |
Postscript/EPS | XMP | Document-level metadata | XMP Spec |
HTML | RDFa | <a rel="license" href="..."></a> | CC Wiki, RDFa |
SMIL | RDF | /smil/head/metadata@id="meta-rdf"/RDF | CreativeCommons SMIL Module |
RSS 1.0 | | /RDF/channel/license or /RDF/channel/item/license | CreativeCommons RSS 1.0 Module |
RSS 2.0 | | /rss/channel/cc:license or /rss/channel/item/cc:license | CreativeCommons RSS 2.0 Module |
Atom | | /feed/entry/link@rel=license | Atom License Extension |
Any XML | XMP | Wherever valid | XMP Spec |
OpenOffice.org (OASIS) | OO.org CC License Add-In SoC Project is working on the spec | ||
MS Office (2003) | DocumentSummaryInformation Infile | CreativeCommons_LicenseURL property | Office Add-in |
MS Office OpenXML (2007) | ? | ? | OpenXML Spec, relevant mailing list post |
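Most of the formats above carry the license inside an XMP packet, and the packet is delimited by `<?xpacket begin=...?>` and `<?xpacket end=...?>` markers wherever it is stored. The following is a minimal Python sketch of a format-independent fallback that scans raw bytes for such a packet and pulls out a license URL. It is an illustration, not the Tracker implementation. It assumes the packet is stored uncompressed and UTF-8 encoded, and it assumes the license appears as a `cc:license` property in the `http://creativecommons.org/ns#` namespace; the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the license is stored as cc:license in this namespace.
CC_NS = "http://creativecommons.org/ns#"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packet(data: bytes) -> Optional[bytes]:
    """Return the body of the first XMP packet found in data, or None."""
    match = PACKET_RE.search(data)
    return match.group(1).strip() if match else None


def license_from_xmp(packet: bytes) -> Optional[str]:
    """Return the cc:license value from an XMP packet body, or None."""
    try:
        root = ET.fromstring(packet)
    except ET.ParseError:
        return None  # not well-formed XML, so nothing to index
    for elem in root.iter():
        # Attribute form: <rdf:Description cc:license="...">
        value = elem.get(f"{{{CC_NS}}}license")
        if value:
            return value
        # Element form: <cc:license rdf:resource="..."/> or text content
        if elem.tag == f"{{{CC_NS}}}license":
            resource = elem.get(f"{{{RDF_NS}}}resource")
            return resource or (elem.text or "").strip() or None
    return None
```

Reading a file's bytes and passing them through `find_xmp_packet` and then `license_from_xmp` gives a license URL or `None`. A format-aware extractor is still preferable where one exists, since byte scanning cannot see compressed or re-encoded packets.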
Indexing Licenses in Tracker Summary
Status | Format | Extraction Method | Test content |
Done, GStreamer patch pending | MP3 | Reading native tags already complete. Maybe extend GStreamer extractor to read XMP. | XMP embedded with Exempi / Tags embedded with id3v2 |
In progress | Vorbis | Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10). | XMP embedded with vorbiscomment |
Done, GStreamer patch pending | FLAC | Native tags already extracted through the GStreamer extractor. Maybe extend GStreamer extractor to read XMP. | Tags embedded with id3v2 or metaflac |
In progress | JPEG | Extend the Imagemagick extractor, using 'convert file.jpg xmp:-' to read XMP | XMP embedded with Exempi |
Done | TIFF | Extend the Imagemagick extractor, using 'convert file.tif xmp:-' to read XMP | XMP embedded with Exempi (Note: there's a bug in Adobe's XMP SDK that prevents Exempi from embedding valid XMP) |
Done | PNG | Extend the PNG extractor, adding a check for XML:com:adobe:xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.) | XMP embedded with Exempi |
In progress | GIF | Would need to write a GIF extractor | Palimpsest |
Done | PDF | Extend the current PDF extractor (which uses Poppler) to read the metadata field. | XMP embedded with Exempi |
Done | HTML | Write a new HTML extractor, using libxml2, and scan for RDFa | Various actual sites, including creativecommons.org |
In progress | SVG | I could parse the XML directly, checking for the RDF schema used by Inkscape. Should I also check for XMP? | Inkscape |
| Any XML | Write a generic XML extractor (and/or an extractor for each particular format), scanning with libxml2 | |
Awaiting spec | OpenOffice.org (OASIS) | Extend OASIS extractor | OO.org Add-In |
Done | MS Office (old format) | Extend existing msoffice extractor | MSOffice Add-in |
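To make the HTML row concrete, here is a rough Python sketch of scanning a page for `rel="license"` links. The extractor described in the table uses libxml2; this sketch swaps in the standard library's `html.parser` so that it runs without extra dependencies. The class and function names are hypothetical, and accepting `<link>` elements in addition to `<a>` is an assumption rather than something the table specifies.

```python
from html.parser import HTMLParser
from typing import List


class LicenseLinkParser(HTMLParser):
    """Collect href values of elements marked rel="license"."""

    def __init__(self) -> None:
        super().__init__()
        self.licenses: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Assumption: only <a> and <link> are considered license carriers.
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "license nofollow".
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "license" in rels and href:
            self.licenses.append(href)


def extract_license_links(html_text: str) -> List[str]:
    """Return every license URL claimed in html_text, in document order."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.licenses
```

Calling `extract_license_links` on a page's source returns a list of URLs, which an indexer could then store as the file's license property. An empty list means the page makes no claim in this form.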