This page supersedes Databases and Creative Commons.
Much of the potential value of data is to society at large — more data has the potential to facilitate enhanced scientific collaboration and reproducibility, more efficient markets, increased government and corporate transparency, and overall to speed discovery and understanding of solutions to planetary and societal needs.
A big part of the potential value of data, in particular its society-wide value, is realized by use across organizational boundaries. How does this occur legally? Many sites give narrow permission to use data via terms of service. Much ad hoc data sharing also occurs among researchers. And increasingly, sharing of data is facilitated by distribution under standard, public legal tools used to manage copyright and similar restrictions that might otherwise limit dissemination or reuse of data, e.g. CC licenses or the CC0 public domain dedication.
Many organizations, institutions, and governments are using CC tools for data. For case studies about how these tools are applied, see:
Contents
- 1 Frequently asked questions about data and CC licenses
- 1.1 Can databases be released under CC licenses?
- 1.2 When a CC license is applied to a database, what is being licensed?
- 1.3 How do I apply a CC legal tool to a database?
- 1.4 How do the different CC license elements operate for a CC-licensed database?
- 1.5 Can I conduct text/data mining on a CC-licensed database?
- 1.6 How does the treatment of sui generis database rights vary in prior versions of CC licenses?
- 1.7 What is the difference between the Open Data Commons licenses and the CC 4.0 licenses?
- 2 Frequently asked questions about data, generally
- 2.1 Which components of databases are protected by copyright?
- 2.2 How do I know whether a particular use of a database is restricted by copyright?
- 2.3 If my use of a database is restricted by copyright, how do I comply with the license?
- 2.4 Which components of a database are protected by sui generis database rights?
- 2.5 How do I know whether a particular use of a database is restricted by sui generis database rights?
- 2.6 What constitutes a “substantial portion” of a database?
- 2.7 If my use of a database is restricted by sui generis database rights, how do I comply with the license?
- 3 Notes
Frequently asked questions about data and CC licenses
Can databases be released under CC licenses?
Yes, CC licenses can be used to license databases. The most recent version (4.0) may be used to license databases subject to copyright and, where applicable, sui generis database rights. Sui generis database rights prevent the copying and reuse of substantial parts of a database (including frequent extraction of insubstantial parts). However, unlike copyright, database rights protect the maker's investment, not originality.
CC does not recommend use of its NonCommercial (NC) or NoDerivatives (ND) licenses on databases intended for scholarly or scientific use. See [link] below for more information.
In addition to our licenses, the CC0 Public Domain Dedication may be used on databases to maximize reuse of databases. When applied, the effect is to waive all copyright and related rights in the database and its contents, placing it as close as possible into the worldwide public domain. In certain domains, such as science and government, there are important reasons to consider using CC0. Waiving copyright and related rights eliminates all uncertainty for potential users, encouraging maximal reuse and sharing of information.
When a CC license is applied to a database, what is being licensed?
The license terms and conditions apply to the database structure (its selection and arrangement, to the extent copyrightable), its contents (if copyrightable), and, where the database maker has sui generis database rights, those rights as well. Nevertheless, licensors can choose to license some rather than all of the rights they have in a database. Creative Commons advises against this practice. However, if a licensor chooses to do so anyway, we strongly encourage the licensor to clearly demarcate what is and is not licensed. See below for more information regarding how to provide clear notice of what is licensed.
How do I apply a CC legal tool to a database?
Before making a database available under a CC license, a database provider should first make sure she has all rights necessary to do so. Often, the database provider is not the original author of the database contents. If that is the case, the database provider should secure separate permission from the other author(s) before publishing the database under a CC legal tool. If a database provider decides to license the database without securing permission from the author(s) of the database contents, she should clearly indicate the material for which permission has not been secured and clearly mark the material as not being offered under the terms of the license. For more information, read our pre-licensing guidelines.
A database provider should also consider carefully which elements of the database she wants covered by the CC legal tool and identify those elements in a manner that reusers will see and understand. Please see our marking page for more information on how to clearly distinguish unlicensed content.
How do the different CC license elements operate for a CC-licensed database?
Under version 4.0, if an NC license has been applied then any use of the licensed database or its contents that is restricted by copyright law or sui generis database rights requires compliance with the NC term, even if the database is not publicly shared. The other license elements (BY, ND, and SA, as applicable) must be complied with only if your use is so restricted and public sharing is involved. Learn more about how to comply when your use implicates copyright and/or sui generis database rights.
Prior CC license versions do not require compliance with the license restrictions or conditions when only sui generis database rights (and not copyright) are implicated. Please see below for more detail about how this works in the current and prior versions of the licenses.
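The version 4.0 rule described above can be sketched as a small decision function. This is an illustrative simplification, not legal advice and not an official CC tool: the function name `conditions_to_observe_v4` and its parameters are hypothetical, and whether a use actually implicates copyright or sui generis database rights remains a legal question the caller has to answer first.

```python
def conditions_to_observe_v4(elements, implicates_copyright,
                             implicates_database_rights, shared_publicly):
    """Return the license elements to comply with under a 4.0 license.

    `elements` is the set of elements in the license, e.g. {"BY", "NC", "SA"}.
    All names here are hypothetical and used for illustration only.
    """
    elements = set(elements)
    # A use is "restricted" if it implicates copyright or database rights.
    restricted = implicates_copyright or implicates_database_rights
    if not restricted:
        # No licensed right is exercised, so no condition is triggered.
        return set()
    required = set()
    # NC applies to any restricted use, even without public sharing.
    if "NC" in elements:
        required.add("NC")
    # BY, ND, and SA apply only when public sharing is involved.
    if shared_publicly:
        required |= elements & {"BY", "ND", "SA"}
    return required


# Hypothetical BY-NC-SA database, use implicating only database rights.
private_use = conditions_to_observe_v4({"BY", "NC", "SA"}, False, True, False)
public_use = conditions_to_observe_v4({"BY", "NC", "SA"}, False, True, True)
print(sorted(private_use))
print(sorted(public_use))
```

The sketch deliberately covers version 4.0 only, since earlier versions treat sui generis database rights differently.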
Can I conduct text/data mining on a CC-licensed database?
Yes. However, you should be aware that whether you have to comply with the CC license terms and conditions will depend on whether the type of mining activity you conduct implicates copyright or any applicable sui generis database rights. If you are not exercising an exclusive right held by the database maker, then you do not need to rely on the license to mine. Because there are many different methods for conducting text and data mining, however, there may be some types of mining activities that will implicate the licensed rights.
If and only if your particular use is one that would require permission, you should note the following:
- Permission: All six of the 4.0 licenses allow for text and data mining by granting express permission to privately reproduce, extract, and reuse the contents of a licensed database and create adapted databases.
- Commercial purposes: If you are conducting text and data mining for commercial purposes, you should not mine NC-licensed databases or other material.
- Outputs: If you publicly share the results of your mining activity or the data you mined, you should attribute the rights holder. If what you publicly share qualifies as an adaptation of the licensed material, you should not mine ND-licensed material. If you share an adaptation of material under an SA license, you must apply the same license to the adaptation that results.
If your use is not one that requires permission under the license, you may conduct text and data mining activity without regard to the above considerations.
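The considerations above can be read as a short checklist. The sketch below is one hypothetical way to encode it; the function `mining_checklist` and its parameters are assumptions made for illustration, and the function does not decide whether a given mining activity requires permission in the first place.

```python
def mining_checklist(elements, requires_permission, commercial_purpose,
                     shares_publicly, shares_adaptation):
    """Return the considerations that apply to a text/data mining project.

    `elements` is the set of license elements, e.g. {"BY", "NC"}.
    `shares_adaptation` means that what is publicly shared qualifies as an
    adaptation of the licensed material. All names are hypothetical.
    """
    elements = set(elements)
    if not requires_permission:
        # The license is not relied on, so none of the considerations apply.
        return []
    notes = []
    if commercial_purpose and "NC" in elements:
        notes.append("Do not mine NC-licensed material for commercial purposes.")
    # Sharing an adaptation is one form of public sharing.
    if shares_publicly or shares_adaptation:
        notes.append("Attribute the rights holder.")
    if shares_adaptation and "ND" in elements:
        notes.append("Do not mine ND-licensed material if the shared output is an adaptation.")
    if shares_adaptation and "SA" in elements:
        notes.append("Apply the same license to the shared adaptation.")
    return notes


# Hypothetical noncommercial project that shares an adapted BY-SA database.
for note in mining_checklist({"BY", "SA"}, True, False, True, True):
    print(note)
```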
How does the treatment of sui generis database rights vary in prior versions of CC licenses?
As explained above, the current version of the CC license suite (4.0) licenses sui generis database rights in addition to copyright and other closely related rights. Past versions of CC licenses operate differently with respect to sui generis database rights.
In the CC version 3.0 licenses, the legal treatment of sui generis database rights varies, but the practical result is always the same: compliance with the license restrictions and conditions is not required where sui generis database rights (but not copyright) are implicated. This means that if someone extracts a substantial portion of a CC-licensed database and uses it in a way that does not implicate copyright (e.g., by rearranging purely factual data), the license does not require her to attribute the licensor or comply with any other restrictions or conditions, even if the database is protected by sui generis database rights.
While this result is the same across all CC version 3.0 licenses, the reason for this outcome varies. In the 3.0 licenses ported to the laws of EU jurisdictions, the scope of the licenses expressly covers databases subject to copyright and/or sui generis database rights. However, the conditions of the license are explicitly waived when use of the licensed work only involves the exercise of database rights.
By contrast, the 3.0 unported licenses and all other ported licenses do not expressly license sui generis database rights. As a result, those licenses do not apply when sui generis database rights alone are implicated. This means a licensee may need separate permission to use the database in a way that implicates sui generis database rights (although arguably an implied license to exercise those rights may be deemed granted in some jurisdictions).
More information on the policy decision underlying the treatment of sui generis database rights in the 3.0 licenses can be found on our wiki (.pdf).
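The differences between license versions described in this section can be summarized in a small lookup table. This is only a sketch of the text above: the variant labels, the `DatabaseRightsTreatment` class, and the helper function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseRightsTreatment:
    licenses_database_rights: bool      # are sui generis rights expressly licensed?
    conditions_apply_if_only_sgdr: bool  # must conditions be met when only those rights are implicated?
    note: str


# Hypothetical labels for the three cases described in this section.
TREATMENT = {
    "4.0": DatabaseRightsTreatment(
        True, True,
        "Sui generis database rights are licensed alongside copyright."),
    "3.0-eu-ported": DatabaseRightsTreatment(
        True, False,
        "Rights are licensed, but conditions are waived when only they are exercised."),
    "3.0-unported-or-other-ported": DatabaseRightsTreatment(
        False, False,
        "Rights are not expressly licensed; separate permission may be needed."),
}


def must_comply_when_only_database_rights(variant):
    """Return True if license conditions apply when only database rights are implicated."""
    return TREATMENT[variant].conditions_apply_if_only_sgdr


for variant, treatment in TREATMENT.items():
    print(variant, must_comply_when_only_database_rights(variant), treatment.note)
```

Note that the last row differs from the second in an important way: the conditions do not apply because the license itself does not reach those rights, not because the conditions are waived.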
What is the difference between the Open Data Commons licenses and the CC 4.0 licenses?
The Open Database License (ODbL) and the Open Data Commons Attribution License (ODC-BY) are licenses designed specifically for use on databases and not on other types of material. There are many differences between those licenses and CC licenses, but the most important to be aware of relate to license scope and operation. The ODC licenses apply only to sui generis database rights and any copyright in the database structure; they do not apply to the individual contents of the database. The latest version of the CC licenses, on the other hand, applies to sui generis database rights and all copyright and neighboring rights in the database structure as well as the contents. (See above for more detail about how past versions of CC licenses vary with respect to sui generis database rights.)
Another important difference is that ODC licenses may create contractual obligations even in jurisdictions where database rights do not exist and where, but for the license, permission would not be necessary. CC has crafted its licenses to ensure that they never impose obligations where permission is not otherwise required to use the licensed material.
Frequently asked questions about data, generally
Which components of databases are protected by copyright?
With databases, there are likely four components to consider: (1) the database model or structure, (2) the data entry and output sheet, (3) field names, and (4) the data or other content.
The database model refers to how a database is structured and organized, including database tables and table indexes. The selection, coordination, and arrangement of the database is subject to copyright if it is sufficiently original. The originality threshold is fairly low in many jurisdictions. For example, while courts in the United States have held that an alphabetical telephone directory was insufficiently original to merit copyright protection, an organized directory of Chinese-American businesses in a particular area was held to be sufficiently original. These determinations are very fact-specific (no pun intended) and vary by jurisdiction.
The data entry and output sheets contain questions, and the answers to these questions are stored in a database. For example, a web page asking a scientist to enter a gene’s name, its pathway information, and its ontology would constitute a data entry sheet. The format and layout of these sheets are protected by copyright according to the same standard of originality used to determine if the database model is copyrightable.
Field names describe the contents or data. For example, “address” might be the name of the field for street address information. These are less likely to be protected by copyright because they often lack sufficient originality.
The data or other contents contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names or city populations) would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content.
How do I know whether a particular use of a database is restricted by copyright?
When the database structure or its contents are subject to copyright, reproducing, distributing, or modifying the database will often be restricted by copyright law. However, it is important to note that some uses of a copyrighted database will not be restricted by copyright. It may be possible, for example, to rearrange or modify the uncopyrightable data in a way that does not implicate the copyright in the database structure. For example, while (as noted above) a court in the United States held that a directory of Chinese-American businesses was protected by copyright, the same court went on to hold that a directory that duplicated hundreds of its listings was not infringing because the listings were categorized and arranged in a sufficiently dissimilar way. In those situations, compliance with the license conditions is not required unless the database contents are themselves restricted by copyright.
Similarly, even where database contents are subject to copyright and published under a CC license, use of the facts and ideas embedded within the contents will not require attribution (or compliance with other applicable license conditions), unless doing so implicates copyright in the database structure as explained above. This important limitation of all CC licenses is highlighted on the license deeds in the Notice section, where we emphasize that compliance with the license is not required for elements of the material in the public domain.
If my use of a database is restricted by copyright, how do I comply with the license?
All CC licenses require that you attribute the licensor when your use involves public sharing. Your other obligations depend on the particular CC license applied to the database. If it is an NC license, any regulated use must be limited to noncommercial purposes only. If it is an ND license, you may produce an adapted database but cannot share it publicly. If it is a ShareAlike (SA) license, you must apply the same or a compatible license to any adaptation of the database you share publicly.
Which components of a database are protected by sui generis database rights?
In contrast to copyright, sui generis database rights are designed to protect a maker's substantial investment in a database. In particular, the right prevents the unauthorized extraction and reuse of a substantial portion of the contents.
How do I know whether a particular use of a database is restricted by sui generis database rights?
When a database is subject to sui generis database rights, extracting and reusing a substantial portion of the database contents is prohibited absent some express exception.
It is important to remember that sui generis database rights exist in only a few countries outside the European Union, such as Korea and Mexico. Generally, if you are using a CC-licensed database in a location where those rights do not exist, you do not have to comply with license restrictions or conditions unless copyright (or some other licensed right) is implicated.
Note that if you are using a database in a jurisdiction where you must respect database rights, and you receive a CC-licensed work from someone located in a jurisdiction without database rights, you should determine whether database rights exist and have been licensed. If so, you need to properly mark and attribute as the license requires, since the person from whom you received the database may not have been required to keep that information. If you are using a licensed database and you do not have to comply with the license terms because such rights do not exist in your jurisdiction, we recommend that you retain this information where possible. Doing so assists downstream reusers who are required to provide it when they share further.
What constitutes a “substantial portion” of a database?
There is no bright-line test for what constitutes a “substantial portion”. The answer will depend on the law in the relevant jurisdiction. Note that what constitutes a substantial portion is determined both quantitatively and qualitatively. Also, repeated use of several insubstantial portions can add up to a substantial portion.
If my use of a database is restricted by sui generis database rights, how do I comply with the license?
If the database is released under the current version (4.0) of CC licenses, you must attribute the licensor if you share a substantial portion of the database contents. The other requirements depend on the particular license applied to the database. Under the NC licenses, you may not extract and reuse a substantial portion of the database contents for commercial purposes. The ND licenses prohibit you from including a substantial portion of the database contents in another publicly shared database in which you have sui generis database rights of your own. And finally, the SA licenses require you to apply the same or a compatible license to any database (but not its individual contents) you share publicly and in which you include a substantial portion of the licensed database contents.
Notes
- Key Publications, Inc. v. Chinatown Today Publishing Enterprises Inc., 945 F.2d 509 (2d Cir. 1991).