Images of People, Networks of Sharing: How Does Creative Commons Impact Flickr?

Applicants: Chimaera Institute
Affiliation: Chimaera Institute
CC affiliated? No
Contact: Remy Cross, Ben Lind, Kelly Ramsey
Coordinator: Ben Lind
Project Start: 2011/08/15
Project End: 2011/02/15
Describe the project you are proposing as clearly as possible in just five sentences.

We propose to study how Creative Commons (CC) licensed content is consumed and shared by audiences. Our intent is to study the photo-sharing service Flickr to assess (a) whether CC licensed materials reach more viewers than non-CC licensed materials, and (b) whether CC licensed materials encourage larger, more robust social network communities than non-CC licensed materials. Because Flickr allows users to offer and share a variety of important network information, we would like to use this community as a test case to better understand CC media production and consumption. We will collect the data that Flickr makes freely available to the public, analyze it, and produce a report that will contribute to the understanding of how CC material is made and consumed.

Detail the tangible project output (e.g., paper, blog post, written materials, video/film, etc.; this would be in addition to the final written report that successful grant recipients will be expected to deliver to CC at the conclusion of the project).

This project has two important forms of output: the data and the analysis. Our intention is to make both freely available to any and all interested parties so that our methods will be open to all and any follow-up research will have an excellent starting point. We will license the data, the code for data collection and analysis, and our report under a Creative Commons Attribution-NonCommercial-ShareAlike license.

In terms of data, we will gather a large amount of raw network data using a script that scrapes Flickr. The data for each photograph include a page view count, its CC or non-CC license, the date uploaded, and comments, as well as the Flickr user network in which it is embedded. After this initial dataset has been gathered, we will also use it to produce a more refined, accessible dataset suitable for a variety of statistical and network analyses. Both the raw data and the cleaned dataset will be freely available, and our intention is to host the data through our website as we collect and process it.

The analysis will be presented as a white paper describing how CC licensed material is shared and remixed by a community such as Flickr. It will include a statistical comparison of CC and non-CC licensed works to test for differences in their respective online communities.
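
As a rough illustration of what such a statistical comparison could look like, the sketch below runs a two-sided permutation test on the difference in mean view counts between two groups of photos. The choice of a permutation test, the function name permutation_test, and the view counts in the usage lines are all assumptions made for illustration; the proposal does not commit to a particular test, and the project itself plans to use R rather than Python.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value). This is an illustrative choice
    of test, not necessarily the one the study will use.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical view counts, invented purely for illustration.
cc_views = [120, 95, 310, 40, 77, 150]
non_cc_views = [60, 88, 45, 130, 20, 52]
difference, p_value = permutation_test(cc_views, non_cc_views)
print(f"difference in means: {difference:.1f}, p-value: {p_value:.3f}")
```

A permutation test is used here only because it makes no distributional assumptions about view counts, which tend to be heavily skewed; other tests may suit the final data better.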

Describe the community you are targeting. How would the project benefit the community?

Our beneficiaries will be websites that host - or are thinking of hosting - large amounts of user-contributed media content. Our analysis will show how the public accesses and organizes around this content when it is available under a CC license. This will help website operators and potential users alike by showing them how their choice of licensing affects the kinds of communities that form around the content they share. If our findings encourage further usage of CC licenses, this would yield an increase in content available for the broader public to distribute, commercialize, or make derivative works.

What is your relationship with the community you are targeting? Why are you the best individual/organization to lead this project? Do you have prior experience in related projects?

We are part of a 501(c)(3) (approval pending) nonprofit chartered for educational purposes. Our core mission is to conduct research and educate the public on issues of sustainable communities, open access to knowledge, and informed consumption of goods and services both tangible and virtual.

Our mission of providing open access to knowledge and informed consumption has led us to become advocates of Creative Commons-style production and practices.

We have over 20 years of combined research experience in both quantitative and qualitative research methods in our areas of emphasis, which include information technologies, social networks, and complex organizations.

How will you measure and evaluate your project’s impact - on your main participants? Other contributors? On the larger community?

The simplest measure will be the amount of traffic generated as we make the data and the final product available. Because we are also part of a wider community of non-profits and academic institutions, as well as fellow CC enthusiasts, we will also send occasional notifications to these groups to make sure they are aware of the ongoing progress of the project.

How many participants do you expect to be involved in your project? How will you seek and sustain their involvement?

This project will directly involve only the three primary researchers conducting the data collection and analysis activities. However, it will involve the indirect participation of Flickr users numbering in the thousands, as the study pertains to their image sharing patterns. All information gained from these users is freely available to the general public through the Flickr website, and all users agree to these terms. Flickr is also indirectly involved, as we will write robots to extract data from its website. Our robots will conform to Flickr's manifest on robot conduct. Additionally, we are open to collaboration and assistance from any interested parties who feel they have something to add to the project.
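
One way a robot can check a site's robot-conduct rules before requesting a page is sketched below using Python's standard urllib.robotparser module. The robots.txt content, the user-agent name "research-bot", and the example.org URLs are hypothetical and do not represent Flickr's actual policy; the project itself plans to write its robots in R, so this is only an illustration of the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# this is NOT Flickr's actual policy.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
for url in ("https://example.org/photos/1", "https://example.org/private/1"):
    verdict = "allowed" if is_allowed(parser, "research-bot", url) else "disallowed"
    print(url, verdict)
```

In practice the rules would be read from the site's own robots.txt file before any scraping begins, and any URL the rules disallow would simply be skipped.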

Describe how your project will benefit Creative Commons' mission to increase the amount of creativity (cultural, educational, and scientific content) in "the commons".

This project will create payoffs for the scientific, cultural, and educational communities served by Creative Commons by providing valuable research data on how users incorporate CC licensed images in their Flickr experience.

We hope the end product will lower the barriers for future CC licensed projects and products and encourage more people to release their work under CC licenses. In this way it should increase all three types of creativity in “the commons”.

Describe what technologies and tools your project will use. What kinds of technical skills and expertise do you bring to the project? What are your technical needs?

Our project will use web-scraping robots to collect the data as well as statistical and network software for the analysis. The web-scraping robots will be created through code written in R, relying largely upon its "XML" package. One of our researchers brings expertise in writing robots in R, as he has created similar robots to extract newspaper data from ProQuest and Google News Archive while acting as a research assistant for National Science Foundation grants at the University of California, Irvine.

The data analyses will also be performed in R. R is an open-source statistical software package commonly used in academia. Beyond its powerful base packages that allow for a great range of statistical analyses, it has a rich community of contributors who write specific software packages. For our network analyses, we will draw upon the "sna" and "network" packages.

The three lead researchers have over twenty combined years of experience with statistical and network analysis during their advanced graduate training at the University of California, Irvine. This includes coursework, research assistantships, and direct research pertaining to publications, conference presentations, and theses.

What challenges do you expect to face, and how do you plan to overcome them?

We expect that Flickr will require some delay between requests when its webpages are accessed by an automated script (robot). To overcome this challenge, we will incorporate page access delays into the script's code. This will lessen the burden on the site's bandwidth.
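
The page access delays described above could be built into a collection loop along the following lines. This is a minimal sketch in Python rather than the R code the project plans to write; the function name fetch_with_delay, the two-second default, and the injectable fetch and sleep arguments are all assumptions made for illustration, and the appropriate delay would depend on what the site actually requires.

```python
import time


def fetch_with_delay(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is any callable that takes a URL and returns its content.
    `sleep` is injectable so the pause can be replaced when testing.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Usage with a stand-in fetch function, so no network access is needed.
pages = fetch_with_delay(
    ["https://example.org/a", "https://example.org/b"],
    fetch=lambda url: f"<contents of {url}>",
    delay_seconds=0.1,
)
print(len(pages))
```

Passing the fetch function in as an argument keeps the delay logic separate from the details of how a page is downloaded and parsed.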

We also anticipate some difficulty in producing a representative sample that includes CC licensed photos, as collecting data on the entire population would be impractical and unwieldy. We will overcome this challenge by sampling from Flickr's recently uploaded photos both at large and specific to each of the CC-Attribution, CC-Attribution-NoDerivs, CC-Attribution-NonCommercial-NoDerivs, CC-Attribution-NonCommercial, CC-Attribution-NonCommercial-ShareAlike, and CC-Attribution-ShareAlike licenses.
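
Drawing a fixed number of photos from each license category can be sketched as a simple stratified sample. The record layout (dictionaries with "id" and "license" keys), the function name sample_by_license, and the example records are hypothetical; the real sampling would draw on Flickr's per-license listings of recent uploads, and the project plans to implement it in R rather than Python.

```python
import random
from collections import defaultdict


def sample_by_license(photos, per_stratum, seed=0):
    """Draw up to `per_stratum` photos at random from each license group.

    `photos` is a list of dicts with at least a "license" key
    (an assumed record layout, for illustration only).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for photo in photos:
        strata[photo["license"]].append(photo)
    sample = {}
    for license_name, group in strata.items():
        # Never request more photos than the group contains.
        k = min(per_stratum, len(group))
        sample[license_name] = rng.sample(group, k)
    return sample


# Hypothetical records, invented purely for illustration.
photos = (
    [{"id": i, "license": "CC-Attribution"} for i in range(10)]
    + [{"id": 100 + i, "license": "CC-Attribution-ShareAlike"} for i in range(3)]
    + [{"id": 200 + i, "license": "All rights reserved"} for i in range(50)]
)
sample = sample_by_license(photos, per_stratum=5)
for license_name, group in sample.items():
    print(license_name, len(group))
```

Sampling each license separately guarantees that less common licenses appear in the data, at the cost of needing to reweight if estimates for the whole site are wanted.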

Lastly, we anticipate some challenge in finding adequate community involvement behind a number of these photos. Indeed, many of the uploaded photos have relatively few views and comments and have never been "favorited" by Flickr users. To overcome this obstacle, we will also sample from Flickr's "interesting" photos by date. Sampling from these photos ensures ample community involvement and allows us to test how frequently CC licensed materials meet Flickr's "interestingness" threshold.

How do you plan to sustain your project after the Creative Commons funding has ended? Detail specific plans. How do you plan to raise revenue to continue your efforts in the future?

Should we be awarded Creative Commons funding, after the funding period we plan to sustain the project in two ways. First, we will apply for additional grants to extend the study to other mediums such as user-generated music and film. Second, because all of our output from this project will be CC-licensed, we encourage future researchers to use and modify our data and methods.

How can this project be scalable, or have a scalable impact?

This project is scalable in that its sample may grow as researchers wish to analyze more photos and Flickr users, either by drawing more photos in each sampling round or by sampling more frequently. Further, the code may be adapted for other websites featuring different media. While pairing users across different web services is highly difficult, we can compare aggregate findings and trends across different websites.

What resources and support do you expect Creative Commons to provide to your project to ensure its success (if any)?

Our hope is that Creative Commons will provide some funds to pay for the time of a programmer who will write the automation scripts and for the time and expertise of the data analysts who will clean and process the collected data. We also hope that Creative Commons will provide online web-hosting support to aid in disseminating our materials and knowledge gained.

Describe how your organization currently communicates with its community members and network partners. (100 words)

Currently we maintain an email list for interested parties and our board members. We are in the process of building our website and are looking at a late summer launch for it. We are also actively engaged in the professional communities of our interest and are regular attendees at professional conferences and seminars.