Flickr Autocuration

From Creative Commons
Revision as of 18:17, 3 March 2008 by Paulproteus (talk | contribs) (Process)


Overview

LiveContent 2.0 incorporates an "autocuration" process for the photo-sharing website Flickr.com. Autocuration automatically downloads CC-licensed photos from Flickr for inclusion in the LiveContent daily build. The autocuration code talks to Flickr through Flickr's API.

You can look at our code for this in Subversion.

Process

When LiveContent 2.0 is built, the build script (kickstart file) talks to the autocuration package. The autocuration program:

  • Asks Flickr.com's API for the top 500 "Interesting" photos
  • Removes the photos that are not CC-licensed
  • Asks the API for author information (username and real name) for each remaining photo
  • Generates URLs that point back to each photo's page on flickr.com
  • Asks the API for the largest available size of each photo
  • Downloads all the photos, and saves the metadata (like URL, author name, photo name) into a separate file.
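
As a rough illustration of the filtering, URL-building and size-picking steps above, here is a minimal sketch. It is not the code from autocurate_flickr.py. It works on hand-written dictionaries instead of calling Flickr, so no API key is needed to run it. The field names (license, owner, ownername, realname, width, height, source), the set of CC license ids, and the photo page URL pattern are assumptions based on how Flickr's API is commonly described; check them against Flickr's documentation before relying on them.

```python
# Flickr license ids treated as CC licenses. Assumption: ids "1" to "6";
# verify against flickr.photos.licenses.getInfo before use.
CC_LICENSE_IDS = {"1", "2", "3", "4", "5", "6"}


def keep_cc_licensed(photos):
    """Drop every photo whose license id is not in the CC set."""
    return [p for p in photos if str(p.get("license")) in CC_LICENSE_IDS]


def photo_page_url(photo):
    """Build a link back to the photo's page on flickr.com (assumed pattern)."""
    return "https://www.flickr.com/photos/%s/%s" % (photo["owner"], photo["id"])


def largest_size(sizes):
    """Pick the size entry with the most pixels."""
    return max(sizes, key=lambda s: int(s["width"]) * int(s["height"]))


def build_metadata(photo, sizes):
    """Collect the fields that get saved alongside the downloaded image."""
    return {
        "title": photo.get("title", ""),
        "username": photo.get("ownername", ""),
        "realname": photo.get("realname", ""),
        "page_url": photo_page_url(photo),
        "download_url": largest_size(sizes)["source"],
    }


# Hand-written sample data standing in for API responses (hypothetical values).
photos = [
    {"id": "100", "owner": "12345@N00", "title": "Sunset", "license": "4",
     "ownername": "alice", "realname": "Alice Example"},
    {"id": "101", "owner": "67890@N00", "title": "Private", "license": "0",
     "ownername": "bob", "realname": "Bob Example"},
]
sizes = [
    {"label": "Small", "width": "240", "height": "160",
     "source": "https://img.example.org/100_small.jpg"},
    {"label": "Large", "width": "1024", "height": "683",
     "source": "https://img.example.org/100_large.jpg"},
]

kept = keep_cc_licensed(photos)
record = build_metadata(kept[0], sizes)
print(record["page_url"])
```

Keeping these steps as pure functions over plain dictionaries makes them easy to test without network access; the actual API calls and downloads would wrap around them.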

Concerns

  • Attribution string: It would be nice if Flickr let users suggest a particular attribution string for downstream CC license users to use.
  • API keys: Right now, the autocuration program cannot run without a Flickr API key. The requests it makes are not difficult for Flickr to serve, so it would be nice if anyone could use it without having to register with Flickr first.
  • Flickr sometimes returns invalid XML, usually due to text encoding issues. I had to write flickrmonkey.py (available in Subversion) to work around it, and my workaround isn't perfect. The most recent version of flickrmonkey modifies the XML only if it fails to parse, which keeps the workaround from introducing problems into responses that were already valid.
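
The "only touch the XML if it does not parse" idea can be sketched as follows. This is not the code in flickrmonkey.py; the function name parse_leniently and the specific repairs (replacing undecodable bytes and dropping control characters that XML 1.0 forbids) are assumptions chosen for illustration, and they will not rescue every kind of malformed response.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow (everything below 0x20
# except tab, line feed and carriage return).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_leniently(raw_bytes):
    """Parse XML bytes, repairing them only if the first attempt fails."""
    try:
        # Fast path: valid responses are returned exactly as received.
        return ET.fromstring(raw_bytes)
    except ET.ParseError:
        pass
    # Slow path (hypothetical repair): replace bytes that are not valid
    # UTF-8, strip forbidden control characters, then try once more.
    text = raw_bytes.decode("utf-8", errors="replace")
    text = _ILLEGAL_XML_CHARS.sub("", text)
    return ET.fromstring(text.encode("utf-8"))


good = parse_leniently(b"<rsp><photo title='ok'/></rsp>")
bad_encoding = parse_leniently(b"<rsp><photo title='caf\xe9'/></rsp>")
bad_control = parse_leniently(b"<rsp>a\x01b</rsp>")
```

If the repaired text still does not parse, the second ET.fromstring call raises ET.ParseError, so the caller finds out instead of silently receiving damaged data.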

Recommendations for other web services

  • Make it easy to get a list of CC-licensed content that you recommend!
    • Note that "most recent uploads" is not the same as what you'd recommend. Recommendable work probably has a lot of viewers or has been hand-selected.
    • You can implement this as an Atom/RSS feed that links to "interesting"/recommendable content rather than just the top
  • Make sure it's easy for us to detect which license different works are under.
  • Don't generate broken XML or feeds. I'm happy to help you diagnose issues related to XML or feed correctness.

Make sure we can get, for any work you want us to put on the disc:

  • The title of the work
  • A permanent link to information about the work (like an album page on the web)
  • The author's name and other attribution information
  • The CC license the work is under
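
To make the four requirements above concrete, here is a sketch of how a consumer might read them out of an Atom feed. The feed content, the example.org URLs, and the function name entries_with_attribution are all made up for illustration. Using a link with rel="license" to carry the license URL is one possible convention (the Atom License Extension), not something this page prescribes.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed that a web service might publish.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Recommended CC-licensed works</title>
  <entry>
    <title>Example Album</title>
    <link rel="alternate" href="https://media.example.org/albums/42"/>
    <link rel="license" href="https://creativecommons.org/licenses/by/3.0/"/>
    <author><name>Example Artist</name></author>
  </entry>
  <entry>
    <title>Work With No License Link</title>
    <link rel="alternate" href="https://media.example.org/albums/43"/>
    <author><name>Another Artist</name></author>
  </entry>
</feed>
"""


def entries_with_attribution(feed_xml):
    """Return the entries that carry title, permalink, author and license."""
    root = ET.fromstring(feed_xml)
    works = []
    for entry in root.findall(ATOM + "entry"):
        # Atom treats a link without a rel attribute as rel="alternate".
        links = {link.get("rel", "alternate"): link.get("href")
                 for link in entry.findall(ATOM + "link")}
        work = {
            "title": entry.findtext(ATOM + "title"),
            "permalink": links.get("alternate"),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
            "license": links.get("license"),
        }
        # Skip anything that is missing one of the four required fields.
        if all(work.values()):
            works.append(work)
    return works


works = entries_with_attribution(SAMPLE_FEED)
```

The second entry is dropped because it has no license link, which is the situation the list above asks services to avoid.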

Above and beyond

If you'd like, you can write the autocurate module for your web service yourself. It should be easy to write if you know Python; a readable example is in autocurate_flickr.py. That would make it extremely easy for us to integrate your site into the next LiveContent release.

Questions?

If you have any questions, please email me - asheesh at creativecommons.org.