Flickr Autocuration

From Creative Commons

Overview

LiveContent 2.0 incorporates the "autocuration" process for the photo-sharing website Flickr.com. Autocuration automatically pulls down CC-licensed photos from Flickr for inclusion in the LiveContent daily build. Developers use Flickr's API to set up the content autocuration.

You can look at our code for this in Subversion.

Autocuration happens when Creative Commons builds a CD image. When you burn a LiveContent disc, it contains the autocurated photos from the day the image was created, not the day you downloaded it. This way, the LiveContent disc can start up and show you the automatically selected photos without needing Internet access.

Process

When LiveContent 2.0 is built, the build script (kickstart file) talks to the autocuration package. The autocuration program:

  • Asks Flickr.com's API for the top 500 "Interesting" photos
  • Removes the photos that are not CC-licensed
  • Asks the API to provide author (username + real name) information of the photos
  • Generates URLs that can point someone back to the photo on flickr.com
  • Asks the API to help it find the largest available size for the photo
  • Downloads all the photos, and saves the metadata (like URL, author name, photo name) into a separate file.
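
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in Subversion: the function names, the JSON output format, and the sample field names are assumptions chosen to mirror the shape of Flickr's JSON responses, and the set of license ids treated as Creative Commons should be checked against what Flickr's own license listing returns. No network calls are made here; the helpers only build the request URL and process data that has already been fetched.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://api.flickr.com/services/rest/"

# Assumption: Flickr license ids 1 to 6 denote the Creative Commons licenses.
CC_LICENSE_IDS = {"1", "2", "3", "4", "5", "6"}


def interesting_request_url(api_key, count=500):
    """Build the REST URL that asks for the top `count` "Interesting" photos."""
    params = {
        "method": "flickr.interestingness.getList",
        "api_key": api_key,
        "per_page": count,
        "extras": "license,owner_name",
        "format": "json",
        "nojsoncallback": 1,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def keep_cc_licensed(photos):
    """Drop every photo whose license id is not in CC_LICENSE_IDS."""
    return [p for p in photos if str(p.get("license")) in CC_LICENSE_IDS]


def photo_page_url(photo):
    """Generate a URL that points back to the photo's page on flickr.com."""
    return f"https://www.flickr.com/photos/{photo['owner']}/{photo['id']}"


def largest_size(sizes):
    """Pick the entry with the greatest pixel area from a list of sizes."""
    return max(sizes, key=lambda s: int(s["width"]) * int(s["height"]))


def metadata_record(photo, author, sizes):
    """Collect the metadata that is saved alongside a downloaded photo."""
    return {
        "title": photo["title"],
        "page_url": photo_page_url(photo),
        "username": author["username"],
        "realname": author.get("realname", ""),
        "download_url": largest_size(sizes)["source"],
    }


def metadata_as_json(records):
    """Serialize the records for the separate metadata file (format assumed)."""
    return json.dumps(records, indent=2)
```

Keeping the filtering, URL generation, and size selection as pure functions means they can be exercised with sample data, without an API key or Internet access.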

Concerns

  • Attribution string: It would be nice if Flickr let users suggest a particular attribution string for downstream CC license users to use.
  • API keys: Right now, the autocuration program needs a Flickr API key to run. The requests it makes are not difficult for Flickr to serve, so it would be nice if anyone could use it without having to register with Flickr first.
  • Invalid XML: Flickr sometimes returns invalid XML, usually due to text encoding issues. I had to write flickrmonkey.py (available in Subversion) to work around it, and my workaround isn't perfect. In the most recent version of flickrmonkey, I modify the XML only if it does not parse; that keeps me from creating problems where none existed.
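
The "only modify the XML if it does not parse" idea can be illustrated with a short Python sketch. This is not the code from flickrmonkey.py: the function name is hypothetical, and the repair step (replacing bytes that are not valid UTF-8) is an assumed stand-in for whatever fix the real workaround applies.

```python
import xml.etree.ElementTree as ET


def parse_flickr_xml(raw_bytes):
    """Parse a response, altering the bytes only if the first parse fails."""
    try:
        # Well-formed responses take this path and are never modified.
        return ET.fromstring(raw_bytes)
    except ET.ParseError:
        # Assumed failure mode: bytes that are not valid UTF-8. Swap them
        # for U+FFFD (the replacement character) and try once more.
        repaired = raw_bytes.decode("utf-8", errors="replace")
        return ET.fromstring(repaired.encode("utf-8"))
```

If the repaired text still does not parse, the second ParseError propagates to the caller, so a response that cannot be fixed this way fails visibly.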


Questions?

If you have any questions, please email me - asheesh at creativecommons.org.