Difference between revisions of "Flickr Autocuration"

From Creative Commons
Latest revision as of 04:00, 22 July 2013


Overview

LiveContent 2.0 incorporates the "autocuration" process for the photo-sharing website Flickr.com. Autocuration automatically pulls down CC-licensed photos from Flickr for inclusion in the LiveContent daily build. Developers use Flickr's API to set up the content autocuration.

You can look at our code for this in Subversion: http://cctools.svn.sf.net/viewvc/cctools/autocurate/trunk/autocurate_flickr.py

Autocuration happens when Creative Commons builds a CD image, so a LiveContent disc that you burn contains the autocurated photos from the day the image was created, not the day you downloaded it. This way, the LiveContent disc can start up and show you the automatically selected photos without needing Internet access.

Process

When LiveContent 2.0 is built, the build script (kickstart file) talks to the autocuration package. The autocuration program:

  • Asks Flickr.com's API for the top 500 "Interesting" photos
  • Removes the non-CC licensed photos
  • Asks the API to provide author (username + real name) information of the photos
  • Generates URLs that can point someone back to the photo on flickr.com
  • Asks the API to help it find the largest available size for the photo
  • Downloads all the photos, and saves the metadata (like URL, author name, photo name) into a separate file.
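The steps above can be sketched roughly as follows. This is an illustrative sketch, not the code in autocurate_flickr.py: the function names and the dictionary shapes are hypothetical, the network calls to the API are left out, and it assumes that Flickr license IDs 1 through 6 denote Creative Commons licenses and that each size entry carries `width`, `height`, and `source` fields.

```python
# Illustrative sketch only; not taken from autocurate_flickr.py.
# Assumption: license IDs 1-6 are Flickr's Creative Commons licenses
# (0 is "All Rights Reserved").
CC_LICENSE_IDS = {"1", "2", "3", "4", "5", "6"}


def keep_cc_licensed(photos):
    """Drop photos whose license ID is not a CC license."""
    return [p for p in photos if str(p.get("license")) in CC_LICENSE_IDS]


def photo_page_url(photo):
    """Build a URL that points back to the photo's page on flickr.com."""
    return "https://www.flickr.com/photos/%s/%s/" % (photo["owner"], photo["id"])


def largest_size(sizes):
    """Pick the size entry that covers the most pixels."""
    return max(sizes, key=lambda s: int(s["width"]) * int(s["height"]))


def build_metadata(photo, author, sizes):
    """Collect the fields saved alongside each downloaded photo."""
    return {
        "title": photo["title"],
        "author_username": author["username"],
        "author_realname": author["realname"],
        "page_url": photo_page_url(photo),
        "download_url": largest_size(sizes)["source"],
    }
```

Keeping these steps as small functions over plain dictionaries means the filtering and metadata logic can be checked without an API key or network access.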

Concerns

  • Attribution string: It would be nice if Flickr let users suggest a particular attribution string for downstream CC license users to use.
  • API keys: Right now, Flickr requires an API key to use the autocuration program. The operations involved are not difficult for Flickr to perform, so it would be nice if anyone could use the program without having to register with Flickr first.
  • Invalid XML: Flickr sometimes returns invalid XML, usually due to text encoding issues. I had to write flickrmonkey.py (available in Subversion: http://cctools.svn.sf.net/viewvc/cctools/autocurate/trunk/flickrmonkey.py) to work around it, and my workaround isn't perfect. The most recent version of flickrmonkey modifies the XML only if it does not parse, so it avoids creating problems where none existed.
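The "only modify the XML if it does not parse" idea from the last concern might look like the minimal sketch below. It is not the logic in flickrmonkey.py: the function name is hypothetical, and the repair step (replacing undecodable bytes and re-encoding as UTF-8) is an assumed stand-in for whatever fix the real workaround applies.

```python
import xml.etree.ElementTree as ET


def parse_with_fallback(raw_bytes):
    """Parse the response untouched; repair it only if parsing fails."""
    try:
        # Well-formed responses go through unmodified.
        return ET.fromstring(raw_bytes)
    except ET.ParseError:
        # Assumed repair: swap undecodable bytes for U+FFFD, then retry.
        cleaned = raw_bytes.decode("utf-8", errors="replace").encode("utf-8")
        return ET.fromstring(cleaned)
```

Because the repair runs only in the `except` branch, valid responses are never altered. A response that still fails after the repair raises `ParseError` to the caller.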


Questions?

If you have any questions, please email me - asheesh at creativecommons.org.