Data and CC licenses
Where CC0 is not desired for whatever reason (business requirements, community wishes, institutional policy…), CC licenses can and should be used for data and databases — with the important caveat that CC 3.0 license conditions do not apply to uses of data and databases that do not implicate copyright. Read more about this here.
- 1 Data and CC license use cases
- 1.1 Australia Federal Government
- 1.2 Australia Queensland State Government
- 1.3 Austrian government
- 1.4 ChEMBL
- 1.5 DBpedia
- 1.6 Finnish Libraries
- 1.7 Freebase
- 1.8 Geocommons
- 1.9 German railway
- 1.10 Google
- 1.11 Greece Government
- 1.12 Italian Government
- 1.13 MusicBrainz
- 1.14 Mydosis Portal
- 1.15 New Zealand Government
- 1.16 Open Directory Project (dmoz)
- 1.17 OpenStreetMap
- 1.18 Paleobiology Database
- 1.19 Powerhouse Museum
- 1.20 Spain (Basque) Government - Open Data Euskadi
- 1.21 Stack Overflow
- 1.22 Uniprot
- 1.23 United Kingdom Government
- 2 Other public datasets
Data and CC license use cases
Three of the largest sources of Australian federal government data sets — Australian Bureau of Statistics (ABS), Geoscience Australia and the still beta data.gov.au — are all licensed by default under CC Attribution. Together these sites provide free access to all of Australia's census data, official geoscientific information and knowledge, and other miscellaneous government data (such as the location of public toilets). The ABS and Geoscience Australia have detailed copyright and attribution guidelines to assist with user implementation. data.gov.au played a major role in the Mashup Australia competition run by Australia's Government 2.0 Taskforce. Results from the contest (over 50 datasets) were released on data.gov.au.
Various datasets from the Office of Economic and Statistical Research in the Australian state of Queensland are licensed under CC Attribution. The Queensland Government Information Licensing Framework (GILF) seeks to create and implement a new standardized CC licensing arrangement for all Queensland Government information.
DBpedia is a community-organized effort to extract structured data from Wikipedia and make it available on the web so that it can be queried and linked to other datasets. DBpedia currently describes 3.5 million things, and is available for download under CC Attribution-ShareAlike.
Several Finnish libraries have opened up their data via the CC BY-SA license:
Freebase is a collaborative project that imports structured data from a variety of sources on the web, including Wikipedia, Wikimedia Commons, and the Stanford University Library. Freebase currently contains information about 20 million topics, or entities, and its data is available for reuse under CC Attribution.
Google Ngram Viewer has released its dataset under CC BY: http://ngrams.googlelabs.com/datasets.
Greece has opened up its geospatial data by implementing CC on geodata.gov.gr/geodata. The data is available under CC Attribution or CC Attribution-ShareAlike according to the type of data. Greek geodata is also available at opengeodata.gr under CC Attribution-ShareAlike, an implementation of the INSPIRE directive.
Italy's National Institute of Statistics has released all data on its site under the CC Attribution license. The Italian Chamber of Deputies shares its data via CC BY-SA. The Italian Ministry of Education, University and Research also launched its Open Data portal under CC BY.
MusicBrainz is a user-maintained database of information about artists and their music, including title, artist, release date, format, and other data. Depending on the type of data, the data on MusicBrainz is available either as public domain material, free to be reused without restrictions, or under the CC Attribution-NonCommercial-ShareAlike license. The distinctions between types of data are explained here.
The contents of the Mydosis database are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
The New Zealand Ministry for the Environment’s Land Cover Database and Land Environments New Zealand classification were released under a CC Attribution license on the Koordinates website. More info is available at CC New Zealand.
Open Directory Project (dmoz)
OpenStreetMap is a user-generated map of the world, amassing geodata collaboratively from around the globe. Its dataset is available under CC Attribution-ShareAlike. After the earthquake in Haiti, OpenStreetMap found an immediate niche to fill, launching their Project Haiti page in an effort to map out what was, at the time, a largely incomplete geographical picture, helping those on the ground in Haiti get to where they needed to be with greater accuracy.
One of the largest compendia of fossil data assembled to date is the Paleobiology Database (PBDB), founded in 1998 by John Alroy and Charles Marshall. The PBDB has since grown to include an international group of more than 150 contributing scientists with diverse research agendas. Collectively, this body of volunteer and grant-supported investigators has spent more than 9 continuous person-years entering more than 280,000 taxonomic names, nearly 500,000 published opinions on the status and classification of those names, and over 1.1 million taxonomic occurrences. After a year of community feedback and discussion, the Paleobiology Database decided that “All records are made available to the public based on a Creative Commons license that requires attribution before use.” The Paleobiology Database is now licensed under CC BY 4.0.
In 2009, the Basque government opened up its data via the portal Open Data Euskadi, licensing all of its public data under CC Attribution. The reasons the Basque government gave for opening its data were to "generate value and wealth," "create transparency," and "facilitate interoperability between administrations." The government especially encourages reuse of its data by the private sector, other public administrations, and stakeholders to promote transparency in government.
- Stack Overflow: CC BY-SA
UniProt, the world’s most comprehensive catalog of information on proteins, is available for reuse under CC Attribution-NoDerivs. The license applies to all copyrightable parts of UniProt's database.
Through data.gov.uk, the United Kingdom has made available a growing number of government datasets (currently 5,400) under terms that are interoperable with the CC Attribution license. This portal includes all affiliated websites, such as the Ordnance Survey's maps.
Other public datasets
A list on GitHub of "awesome" public datasets, not necessarily under CC licenses: https://github.com/caesar0301/awesome-public-datasets.