Tracker CC Indexing

Google Summer of Code Project: “Indexing Embedded License Claims in Tracker”

Here are some relevant (now revised) sections of the Summer of Code application:

License Metadata Summary

Format	Form of Metadata	Location of Metadata	Links
Audio
MP3	XMP / Native id3 tags	The PRIV,XMP field / WCOP tag	XMP Spec ID3v2.3 Spec
Vorbis	XMP / Native comment field	XMP comment field / LICENSE comment field	Ogg Vorbis Docs
FLAC	Native comment fields (id3v2 or vorbis-style comments)	Same as with MP3 for id3v2 or Vorbis for vorbis-style comments	FLAC Format Spec
Monkey's Audio (APE)	Native Vorbis-like comment field	AFAIK, there is no standard tag spec
Images
JPEG	XMP	APP1 Markers	XMP Spec
JPEG-2000	XMP	UUID Box	XMP Spec
TIFF	XMP	XMLPacket tag	XMP Spec
PNG	XMP	iTXt chunk with the keyword XML:com.adobe.xmp	XMP Spec
GIF	XMP	Application block	XMP Spec
PSD (Adobe Photoshop)	XMP	Resource block	XMP Spec
Video
AVI	?	?	XMP Spec
Matroska	Native tag	COPYRIGHT tag	Matroska Tagging Spec
Quicktime	Native tag	kMDItemCopyright(old)/kUserDataTextCopyright(new) tag	Quicktime 7 API Reference XMP Spec
OGG	No metadata standard		Ogg Metadata Draft
Documents
PDF	XMP	metadata field	XMP Spec
Postscript/EPS	XMP	Document-level metadata	XMP Spec
HTML	RDFa	<a rel="license" href="..."></a>	CC Wiki, RDFa
SMIL	RDF	/smil/head/metadata@id="meta-rdf"/RDF	CreativeCommons SMIL Module
SVG	RDF	/svg/metadata/rdf	CC Wiki, SVG, based on Inkscape
Any XML	XMP	Wherever valid	XMP Spec
OpenOffice.org (OASIS)	OO.org CC License Add-In SoC Project is working on the spec
MS Office	DocumentSummaryInformation Infile	CreativeCommons_LicenseURL property	Office Add-in
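
As an illustration of the HTML row above, here is a minimal sketch of collecting license URLs from rel="license" links. It uses Python's standard html.parser rather than the libxml2-based extractor planned for Tracker, and the names LicenseLinkParser and find_license_links are hypothetical, not part of Tracker. Treating <link> elements the same way as <a> elements is an assumption.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a> and <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="license nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "license" in rel_tokens and href:
            self.licenses.append(href)


def find_license_links(html_text):
    """Return every license URL claimed in the given HTML text (hypothetical helper)."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.licenses


sample = (
    '<p>Licensed under <a rel="license" '
    'href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>.</p>'
)
print(find_license_links(sample))
```

A page can carry more than one license link, so the sketch returns a list and leaves the choice of which claim to index to the caller.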

Indexing Licenses in Tracker Summary

Status	Format	Extraction Method	Test content
Done, GStreamer patch pending	MP3	Reading native tags already complete. Maybe extend GStreamer extractor to read XMP.	XMP embedded with Exempi / Tags embedded with id3v2
In progress	Vorbis	Extend the GStreamer extractor to check for the presence of an XMP comment field. GStreamer places this within the EXTENDED_COMMENTS tag (requires GStreamer 0.10.10).	XMP embedded with vorbiscomment
Done, GStreamer patch pending	FLAC	Native tags already extracted through the GStreamer extractor. Maybe extend GStreamer extractor to read XMP.	embedded with id3v2 or metaflac
In progress	JPEG	Extend the ImageMagick extractor, using 'convert file.jpg xmp:-' to read XMP	XMP embedded with Exempi
Patch pending	TIFF	Extend the ImageMagick extractor, using 'convert file.tif xmp:-' to read XMP	XMP embedded with Exempi (Note: there's a bug in Adobe's XMP SDK that prevents Exempi from embedding valid XMP)
Pending release of libpng-1.3/Pending Tracker patch	PNG	Extend the PNG extractor, adding a check for an iTXt chunk with the keyword XML:com.adobe.xmp. (For backwards compatibility, the ability to read iTXt in libpng is disabled by default until version 1.3.)	XMP embedded with Exempi
In progress	GIF	Would need to write a GIF extractor	Palimpsest
Poppler/Tracker patch pending	PDF	Extend the current PDF extractor (which uses Poppler) to read the metadata field.	XMP embedded with Exempi
Patch pending	HTML	Write a new HTML extractor, using libxml2, and scan for RDFa	Various actual sites, including creativecommons.org
In progress	SVG	Parse the XML directly, checking for the RDF schema used by Inkscape. Open question: should XMP be checked as well?	Inkscape
	Any XML	Write a generic XML extractor (and/or extractor for each particular format), scanning with libxml2
Awaiting spec	OpenOffice.org (OASIS)	Extend OASIS extractor	OO.org Add-In
Patch pending	MS Office	Extend existing msoffice extractor	MSOffice Add-in

Timeline

UPDATE: It is difficult to estimate when each piece will be done. Patches are being sent upstream to several of Tracker's dependencies, so completion for a given format depends on feedback and approval from those upstream projects, as well as on Jamie's approval of patches into Tracker itself. While waiting for feedback or approval on one part, I'll work on other parts of the project in parallel.

See the progress table above for the current status of each format.

By week 12: Tie things up for submission of code

Implementation

XMP

The following is an excerpt of raw XMP describing a work licensed under the CC Attribution 3.0 license.

<?xpacket begin='' id=''?>
<x:xmpmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

<rdf:Description rdf:about=''
 xmlns:xapRights='http://ns.adobe.com/xap/1.0/rights/'>
 <xapRights:Marked>True</xapRights:Marked>
</rdf:Description>

<rdf:Description rdf:about=''
 xmlns:dc='http://purl.org/dc/elements/1.1/'>
 <dc:rights>
  <rdf:Alt>
   <rdf:li xml:lang='x-default'>This work is licensed under a Creative Commons Attribution 3.0 License.</rdf:li>
  </rdf:Alt>
 </dc:rights>
</rdf:Description>

<rdf:Description rdf:about=''
 xmlns:cc='http://web.resource.org/cc/'>
 <cc:license rdf:resource='http://creativecommons.org/licenses/by/3.0/'/>
</rdf:Description>

</rdf:RDF>
</x:xmpmeta>
<?xpacket end='r'?>

To fit Tracker's metadata structure, the extracted license claim would be placed into the File:License field in a form such as: "This work is licensed under a Creative Commons Attribution 3.0 License (http://creativecommons.org/licenses/by/3.0/)". Such a format allows for searching the license field by both license name and URL.

Specifically, the values selected by

//rdf:Description/dc:rights/rdf:Alt/rdf:li/text()

and

//rdf:Description/cc:license/@rdf:resource

(with the rdf, dc and cc prefixes bound to the namespaces declared in the XMP excerpt) will be stored in Tracker's File:License field.
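
The combination described here can be sketched as follows. This is a Python illustration using the standard xml.etree.ElementTree module, not the C code that would go into Tracker. The function name license_field_from_xmp is hypothetical, only the cc namespace from the XMP excerpt is handled, and dropping the trailing period of the rights statement is an assumption made so that the result matches the form quoted for File:License.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://web.resource.org/cc/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def license_field_from_xmp(xmp_text):
    """Build a 'statement (URL)' string from an XMP packet (hypothetical helper)."""
    root = ET.fromstring(xmp_text)
    statement = root.find(".//rdf:Description/dc:rights/rdf:Alt/rdf:li", NS)
    license_el = root.find(".//rdf:Description/cc:license", NS)

    text = ""
    if statement is not None and statement.text:
        # Collapse line breaks and runs of spaces, then drop the final period.
        text = " ".join(statement.text.split()).rstrip(".")
    url = license_el.get(RDF_RESOURCE, "") if license_el is not None else ""

    if text and url:
        return "%s (%s)" % (text, url)
    return text or url or None


SAMPLE_XMP = """<x:xmpmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>
 <dc:rights>
  <rdf:Alt>
   <rdf:li xml:lang='x-default'>This work is licensed under a Creative Commons
Attribution 3.0  License.</rdf:li>
  </rdf:Alt>
 </dc:rights>
</rdf:Description>
<rdf:Description rdf:about='' xmlns:cc='http://web.resource.org/cc/'>
 <cc:license rdf:resource='http://creativecommons.org/licenses/by/3.0/'/>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>"""

print(license_field_from_xmp(SAMPLE_XMP))
```

If only one of the two values is present, the sketch falls back to whichever exists, so a file with a license URL but no rights statement can still be found by URL.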

Links

Related works:

 cclookup (http://wiki.creativecommons.org/CcLookup) - Python application for extracting license RDF metadata or license metadata from MP3s. Its code may be adapted for parsing license claims in C.
 ccpublisher (http://wiki.creativecommons.org/CcPublisher) - Licenses embedded by ccpublisher should all be correctly extracted.
