Difference between revisions of "Metrics/License statistics"

From Creative Commons
Revision as of 03:17, 10 June 2009

See metrics for a broader discussion of CC adoption and impact.

Caveats

Estimating license adoption is a very inexact science. There is no authoritative source and we neither control nor have inside knowledge of the construction and volatility of the most comprehensive sources -- web search engines. From 2003 through 2005 we relied primarily on Yahoo! link: queries (Google's link: operator obtains very incomplete results). In late 2005 Google added CC queries via their advanced search page and also via their API (Yahoo! also allows this, but in this case Google's results seem more comprehensive).

NOTE: Currently the best analysis of data similar to that discussed below (based on a snapshot independently gathered in January 2007) may be found in Giorgos Cheliotis' presentation on CC statistics from June 2007.

Raw search engine query data

Creative Commons has irregularly run programs that collect estimated total results from search engine link:{license_uri} queries and queries filtered by license property (Yahoo! and Google advanced search support filtering by license).
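
As a rough illustration of the link:{license_uri} queries described above, the sketch below builds a query string from license components. The URI layout follows the example URI given in the CSV column guide on this page; the function names are hypothetical and are not taken from the stats module.

```python
def license_uri(name, version, jurisdiction=None):
    """Build a license URI such as
    http://creativecommons.org/licenses/by-nc/1.0/jp/ (hypothetical helper)."""
    uri = f"http://creativecommons.org/licenses/{name}/{version}/"
    if jurisdiction:
        # Jurisdiction-specific licenses carry a trailing country code.
        uri += f"{jurisdiction}/"
    return uri


def link_query(uri):
    """Return the search-engine query string for pages linking to `uri`."""
    return f"link:{uri}"


if __name__ == "__main__":
    print(link_query(license_uri("by-nc", "1.0", "jp")))
```

The resulting string is what would be submitted to a search engine; how the result-count estimate is read back differs per engine and is not shown here.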

Linkback data

This data is in the public domain. To read more about open data, see the Science Commons Protocol for Implementing Open Access Data.

You can download raw MySQL dumps that are generated nightly from http://labs.creativecommons.org/~paulproteus/sql-dumps/all.sql.gz -- this includes all data gathered programmatically by CC to date.

Single day data is available in CSV format from http://labs.creativecommons.org/~paulproteus/csv-dumps/. Here is a guide to the columns in the file:

  1. internal ID number for this row (e.g., 5041)
  2. License URI (e.g., http://creativecommons.org/licenses/by-nc/1.0/jp/)
  3. search engine (e.g., All The Web)
  4. number of hits (e.g., 4680)
  5. date and time this linkback query run started (e.g., 2004-Apr-04 0:00:00)
  6. short form of license jurisdiction (e.g., jp)
  7. short form of license name (e.g., by-nc)
  8. license version (e.g., 1)
  9. long form of license jurisdiction (e.g., Japan)
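
The column guide above can be turned into a small reader. This is a minimal sketch that assumes the files are comma-delimited with no header row and that timestamps look like the example shown (2004-Apr-04 0:00:00); neither assumption is confirmed here, and the field names are our own labels. The sample row is assembled from the example values in the guide.

```python
import csv
import io
from collections import namedtuple
from datetime import datetime

# Field names are hypothetical labels for the nine documented columns.
LinkbackRow = namedtuple(
    "LinkbackRow",
    "row_id license_uri engine hits started "
    "jurisdiction_code license_name version jurisdiction_name",
)


def parse_linkbacks(lines):
    """Yield one LinkbackRow per well-formed nine-column CSV record."""
    for rec in csv.reader(lines):
        if len(rec) != 9:
            continue  # skip blank or malformed records
        yield LinkbackRow(
            int(rec[0]),
            rec[1],
            rec[2],
            int(rec[3]),
            # Assumed timestamp layout, e.g. "2004-Apr-04 0:00:00".
            datetime.strptime(rec[4], "%Y-%b-%d %H:%M:%S"),
            rec[5],
            rec[6],
            rec[7],
            rec[8],
        )


if __name__ == "__main__":
    sample = (
        "5041,http://creativecommons.org/licenses/by-nc/1.0/jp/,"
        "All The Web,4680,2004-Apr-04 0:00:00,jp,by-nc,1,Japan\n"
    )
    for row in parse_linkbacks(io.StringIO(sample)):
        print(row.engine, row.license_name, row.hits, row.started.date())
```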

WARNING: There are gaps in the data and results from any given method may be volatile to extremely volatile. Take the raw numbers with a huge grain of salt.

Graphs of the linkback data

Every night, we generate graphs of this linkback data. Right now there is no particularly friendly interface to these graphs, but for each jurisdiction (as well as all jurisdictions), and for each search engine, we generate these graphs:

  • Pie graph of the ratio of the different jurisdictions (meaningless for a particular jurisdiction, but meaningful for the "all" pseudo-jurisdiction)
  • Pie graph of different licenses used within that jurisdiction
  • Time-series chart of license usage (by short name, aggregating the various versions) within that jurisdiction (log base 10)
  • Time-series chart of the relative popularity of the different license versions
  • Time-series chart of the linkbacks to each license by short name (aggregating the various license versions)
  • Bar graph of the properties in use by the licenses (e.g., "How many licensed works contain Share-Alike?")
  • Time-series chart of total linkbacks to any license at all
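
The aggregations behind two of these graphs can be sketched as follows: summing hits per license short name across versions, then splitting each short name into its properties (by, nc, nd, sa). This is an illustration only, not the chart code in the stats module; the input tuples are made-up numbers and the function names are hypothetical.

```python
from collections import defaultdict


def hits_by_short_name(rows):
    """Sum hits per license short name, aggregating all versions.

    `rows` is an iterable of (short_name, version, hits) tuples.
    """
    totals = defaultdict(int)
    for short_name, _version, hits in rows:
        totals[short_name] += hits
    return dict(totals)


def hits_by_property(totals):
    """Credit each license's hits to every property in its short name."""
    props = defaultdict(int)
    for short_name, hits in totals.items():
        for prop in short_name.split("-"):  # "by-nc-sa" -> by, nc, sa
            props[prop] += hits
    return dict(props)


if __name__ == "__main__":
    # Hypothetical hit counts for one jurisdiction and one search engine.
    rows = [("by-nc", "1.0", 100), ("by-nc", "2.0", 900), ("by-sa", "2.0", 500)]
    totals = hits_by_short_name(rows)
    print(totals)
    print(hits_by_property(totals))
```

Because the underlying numbers are search-engine hit estimates, the property totals approximate pages linking to licenses with a given property rather than an exact count of works.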

You can browse these charts by going to http://a6.creativecommons.org/~paulproteus/charts/ and clicking on a date. With a date selected, you can choose a jurisdiction (or the "all" pseudo-jurisdiction) and see a big, ugly page with all the graphs we have calculated.

Flickr data

Information generated from Flickr is also available here, either as:

Software

Data gathering

The code used to gather the above data is available in the stats module of our SourceForge repository.

If you want to run it yourself, here's what to do:

Check out the software:

Configure database access:

  • Rename dbconfig_EXAMPLE.py to dbconfig.py and fill in the MySQL database and password you want to use (if you want to use a non-MySQL database, note that most of the tools we use are database-agnostic and require only tiny changes)
  • mysql -h dbserver -uusername -p databasename < create_tables.sql

Check for dependencies. Note that the script expects Tor to be running on localhost!

  • python sanity_check.py
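
If you want to confirm separately that Tor is reachable before crawling, a check along these lines may help. It is not part of sanity_check.py; it simply tests whether something accepts TCP connections on localhost port 9050, which is Tor's default SOCKS port. Whether the stats scripts use that port is an assumption, so adjust it to match your Tor configuration.

```python
import socket


def port_is_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    9050 is Tor's default SOCKS port; that the crawl scripts use it
    is an assumption, not something this page states.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on 127.0.0.1:9050")
    else:
        print("Nothing is listening on 127.0.0.1:9050; is Tor running?")
```

A successful connection shows only that some service is listening on that port, not that it is Tor.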

Do a stats crawl!

  • python link_counts.py log

Charting

The stats module also contains some chart generating code. To run this code:

  • Import all.sql.gz into a MySQL database
  • Configure the database in dbconfig.py
  • Run chart generation software from the stats/reports/ directory.
  • Warning: the output is raw and the program takes a long time to run

Baseline numbers from specific collections

We also know the number of works licensed at various content curators. Based on recent (December 2006) figures, the largest of these for each format may be the following (though there could easily be a larger CC-licensed video collection than Revver):

Licensed work counts at leading repositories
Repository 2005-08 2005-11 2005-12 2006-01 2006-04 2006-05 2006-07 2006-09 2006-12 2007-03 2007-06 2008-01
Flickr (graph) (photos) 4.1m 7.1m 10.8m 12.7m 19.7m 25.5m 32.5m 38.7m 57.9m
Soundclick (graph) (audio) 159k 200k 220k 249k 294k 324k 372k 430k
Revver* (graph) (video) na 0 19k 119k 214k 296k 417k

Also see Jamendo stats and Magnatune stats.

* The Revver figure is an overestimate: it is probably the total number of uploads to date, some of which may have been removed or never published.

License property charts

These charts show a breakdown of the types of licenses deployed and the properties of deployed licenses, based on Yahoo! queries as of 2006-06-13. (As noted above, the Google API is now superior for an aggregate count, but Yahoo! link: searches are superior for measuring the relative deployment of specific licenses and thus of specific license types.)

Distribution of licenses deployed. Those without 'by' (attribution) were not versioned past 1.0 (excepting public domain, which is not versioned).
Distribution of license properties across licenses deployed, e.g., 3% non-by are public domain and non-by 1.0 licenses.

Estimates over time

2008-07-01 -- 130 million total works estimated using Ankit's implementation of Giorgos' scaling algorithm.

2008-05-02 -- 67 million photos licensed at Flickr http://flickr.com/creativecommons

2007-06-14 -- Multifaceted metrics presented at iSummit [1]

2007-03-31 -- 33 million photos licensed at Flickr and growth over 1 year [2]

Based on a swivel.com user's data collection from http://flickr.com/creativecommons

2006-06-13 -- 140 million pages licensed [3]

Based on Google queries.

2005-12 -- 45 million pages licensed [4]

Based on Google queries.

2005-08-09 -- 53 million pages licensed [5]

Again based on Yahoo! queries, this number turned out to be overstated as Yahoo! tuned their results estimation after growing their index.

2005-06-13 -- CC search query breakdown[6]

Breakdown of search requests and desired license properties -- people searching for video want the least freedom.

2005-05-27 -- CC in Yahoo! Advanced Search[7]

Yahoo! queries say 16m pages linking to a CC license.

2005-03-23 -- Yahoo! Search for Creative Commons[8]

Close to 14m pages link to a CC license according to Yahoo! queries.

2005-03-07 -- CC search index breakdown[9]

Breakdown of (small) CC-nutch index -- audio publishers are most permissive, video publishers least.

2005-02-25 -- License Distribution [10]

Based on Yahoo! queries there are now 10m licensed documents. Pie chart of what those licenses are.

2005-02-18 -- How many pages link to a CC license? [11]

Based on Yahoo! queries, "well over 5m." At the end of 2003 it was 1m.

2004-09-17 -- Searching for Creative Commons on Yahoo![12]

4.7m pages link to CC licenses according to Yahoo! queries.

2003-12

1m

Issues

Fixed

  • Until 2008-07-01, the backlinks (e.g. [13] and [14] for 2004-04-01, Yahoo and Google respectively) between 2004-04-01 and 2005-06-20 were incorrectly labeled.
    • The problem was bad importing between data formats in 2005.
    • The issue was fully corrected by 2008-07-01.

Confirmed

  • Google API queries are not working properly right now (as of 2008-06-25).