Difference between revisions of "Data"

From Creative Commons
m (Protected "Data" (‎[edit=autoconfirmed] (indefinite) ‎[move=autoconfirmed] (indefinite)))
(Clarify and simplify by avoiding long sentences and words such as 'however', 'notwithstanding', etc. Replace some very specific terminology with text that will be understood by non-Americans. Correct various errors in grammar and typos.)
 
(51 intermediate revisions by 9 users not shown)
Line 1: Line 1:
''This page supersedes the 2006 document [[Databases and Creative Commons]].''
+
<noinclude>
 +
''This page supersedes [[Databases and Creative Commons]].''
  
Much of the potential value of data is to society at large: more data has the potential to facilitate enhanced scientific collaboration and reproducibility, more efficient markets, increased government and corporate transparency, and overall to speed discovery and understanding of solutions to planetary and societal needs.
+
In today's society, the potential value of data is very high. Access to more data facilitates enhanced scientific collaboration and reproducibility, more efficient markets, and increased government and corporate transparency. This contributes to accelerated discovery and understanding of solutions to planetary and societal needs.
  
A big part of the potential value of data, in particular its society-wide value, is realized by use across organizational boundaries. How does this occur (legally)? Many sites give narrow permission to use data via terms of service. Much ad hoc data sharing occurs among researchers. And increasingly, open data is facilitated by sharing under public terms to manage copyright restrictions that might otherwise limit dissemination or reuse of data, e.g. [http://creativecommons.org/licenses/ CC licenses] or the [[CC0]] public domain dedication.
+
A large part of the potential value of data is realized when data is used across organizational boundaries. There are legal aspects involved in this. Many sites give limited permission to use data via terms of service. There is much ad hoc data sharing among researchers. Increasingly, sharing of data is facilitated by distribution under standard public legal tools. These tools are used to manage copyright and similar restrictions that might otherwise limit dissemination or reuse of data, e.g. [http://creativecommons.org/licenses/ CC licenses] or the [[CC0]] public domain dedication.
  
 
Many organizations, institutions, and governments are using CC tools for data. For case studies about how these tools are applied, see:
 
Many organizations, institutions, and governments are using CC tools for data. For case studies about how these tools are applied, see:
  
===[[Data_and_CC_licenses|Uses of CC Licenses with Data and Databases]]===
+
:[[Data_and_CC_licenses|Uses of CC Licenses with Data and Databases]]
===[[CC0_use_for_data|Uses of CC0 with Data and Databases]]===
+
:[[CC0_use_for_data|Uses of CC0 with Data and Databases]]
 +
</noinclude>
 +
<includeonly>==</includeonly>==Frequently asked questions about data and CC licenses==<includeonly>==</includeonly>
 +
<includeonly>==</includeonly>===Can databases be released under CC licenses?===<includeonly>==</includeonly>
 +
[[Frequently_Asked_Questions#Can_I_apply_a_Creative_Commons_license_to_databases.3F|CC licenses can be used to license databases]]. The most recent version (4.0) may be used to license databases subject to copyright and, where applicable, sui generis database rights. Sui generis database rights prevent copying and reusing of [[Data#What_constitutes_a_.E2.80.9Csubstantial_portion.E2.80.9D_of_a_database.3F|substantial parts]] of a database (including frequent extraction of insubstantial parts). Unlike copyright, database rights protect the maker's investment and not their originality.
  
You can also read more about [http://creativecommons.org/weblog/entry/26283 Creative Commons' most up-to-date thinking on data and databases], and what you can do to contribute.
+
CC does not recommend use of its NonCommercial (NC) or NoDerivatives (ND) licenses on databases intended for scholarly or scientific use.
  
==Frequently asked questions about data==
+
In addition to our licenses, the [[CC0_FAQ|CC0 Public Domain Dedication]] may be used to maximize reuse of databases.  When applied, the effect is to waive all copyright and related rights in the database and its contents, placing it as close as possible into the worldwide public domain. In certain domains, such as science and government, there are important reasons to consider using CC0. Waiving copyright and related rights eliminates all uncertainty for potential users, encouraging maximal reuse and sharing of information.
  
==='''''Can databases be released under CC licenses?'''''===
+
<includeonly>==</includeonly>===When a CC license is applied to a database, what is being licensed?===<includeonly>==</includeonly>
 +
The license terms and conditions apply to the database structure (its selection and arrangement, [[Data#Which_components_of_databases_are_protected_by_copyright.3F|to the extent copyrightable]]), its contents (if copyrightable), and in those instances where the database maker has [[Data#Which_components_of_a_database_are_protected_by_sui_generis_database_rights.3F|sui generis database rights]], to the rights that are granted those makers. It is possible for licensors to license some rather than all of the rights they have in a database. Creative Commons advises against this practice.  If a licensor chooses to do so, we strongly encourage licensors to clearly demarcate what is and is not licensed.  See [[Data#How_do_I_apply_a_CC_legal_tool_to_a_database.3F|below]] for more information regarding how to provide clear notice of what is licensed.
  
Yes, CC licenses can be used on any copyrighted work, including a copyrighted database. A CC license may be applied to any or all copyrighted aspects of a database and its contents. See [[Data#How_do_I_apply_a_CC_legal_tool_to_a_database.3F|below]] for more information regarding how to provide clear notice of what is licensed. Any use of the licensed database or its contents that is restricted by copyright law requires compliance with the relevant license conditions (BY, SA, NC, ND). In their current version (3.0) CC licenses do not require compliance with the license conditions when only sui generis database rights (and not copyright) are implicated. Additionally, the international and "ported" version 3.0 licenses, excluding EU jurisdiction ports, do not grant any permissions where sui generis database rights are implicated. Please see [[Data#How_.28if_at_all.29_are_sui_generis_database_rights_addressed_in_CC_licenses.3F|below]] for more detail.
+
<includeonly>==</includeonly>===How do I apply a CC legal tool to a database?===<includeonly>==</includeonly>
  
CC0, the public domain dedication, can also be used on databases. The effect is to waive all copyright and related rights in the database, placing it as close as possible into the worldwide public domain. In certain domains, such as science and government, there are important reasons to consider using tools like CC0. Waiving copyright and related rights eliminates all uncertainty for potential users, encouraging maximal reuse and sharing of information. Where waiver is not a viable option and some conditions on reuse are necessary, rights holders should [http://www.ivir.nl/publications/eechoud/CC_PublicSectorInformation_report_v3.pdf consider] using CC licenses that give the public more freedom to reuse and remix the content.  
+
Before making a database available under a CC license, database providers should first make sure they have all rights necessary to do so. Often, the database provider is not the original author of the database contents. If that is the case, the database provider should secure separate permission from the other author(s) before publishing the database under a CC legal tool. If database makers decide to license the database without securing permission from the author(s) of the database contents, they should clearly indicate the material for which permission has not been secured and clearly mark the material as not being offered under the terms of the license.  For more information, read our [[Considerations_for_licensors_and_licensees#Considerations_for_licensors|pre-licensing]] guidelines.
  
==='''''Which components of a database are protected by copyright?'''''===
+
Database providers should also consider carefully what elements of the database they want covered by the CC legal tool and identify those elements in a manner that reusers will see and understand. Please see our [[Marking_your_work_with_a_CC_license|marking page]] for more information on how to clearly distinguish unlicensed content.
  
With databases, there are likely four components to consider: (1) the database model or structure, (2) the data entry and output sheet, (3) field names, and (4) the data.
+
<includeonly>==</includeonly>===How do the different CC license elements operate for a CC-licensed database?===<includeonly>==</includeonly>
 +
Under version 4.0, if an NC license has been applied then any use of the licensed database or its contents [[Data#How_do_I_know_whether_a_particular_use_of_a_database_is_restricted_by_copyright.3F|that is restricted by copyright law]] or [[Data#How_do_I_know_whether_a_particular_use_of_a_database_is_restricted_by_sui_generis_database_rights.3F|sui generis database rights]] requires compliance with the [[Frequently_Asked_Questions#Does_my_use_violate_the_NonCommercial_clause_of_the_licenses.3F|NC term]], even if the database is not publicly shared.  The other license elements (BY, ND, and SA, as applicable) must be complied with only if your use is so restricted and public sharing is involved. Learn more about how to comply when [[Data#If_my_use_of_a_database_is_restricted_by_copyright.2C_how_do_I_comply_with_the_license.3F|your use implicates copyright]] and/or [[Data#If_my_use_of_a_database_is_restricted_by_sui_generis_database_rights.2C_how_do_I_comply_with_the_license.3F|sui generis database rights]].  
  
The '''database model''' is a specification describing how a database is structured and organized, including database tables and table indexes. The selection, coordination, and arrangement of the contents is subject to copyright if it is sufficiently original. The threshold of originality required for copyright is fairly low in many jurisdictions. For example, while courts in the United States have held that an alphabetical telephone directory did not have sufficient originality to merit copyright protection, an organized directory of Chinese-American businesses in a particular area did.  These determinations are very fact-specific (no pun intended) and vary by jurisdiction.
+
Prior CC license versions do not require compliance with the license restrictions or conditions when only sui generis database rights (and not copyright) are implicated. Please see below for more detail about [[Data#If_my_use_of_a_database_is_restricted_by_sui_generis_database_rights.2C_how_do_I_comply_with_the_license.3F|how this works in the current]] and [[Data#How_does_the_treatment_of_sui_generis_database_rights_vary_in_prior_versions_of_CC_licenses.3F|prior versions]] of the licenses.
  
The '''data entry and output sheets''' contain questions, and the answers to these questions are stored in a database. For example, a web page asking a scientist to enter a gene’s name, its pathway information, and its ontology would constitute a data entry sheet. The format and layout of these sheets are protected by copyright according to the same standard of originality used to analyze copyright in the database model.
+
<includeonly>==</includeonly>===Can I conduct text/data mining on a CC-licensed database?===<includeonly>==</includeonly>
  
'''Field names''' describe data sets. For example, “address” might be the name of the field for street address information. These are less likely to be protected by copyright because they often do not reflect originality.
+
It is possible to conduct mining activities on a CC-licensed database. Whether you have to comply with the CC license terms and conditions will depend on whether the type of mining activity you conduct implicates copyright or any applicable sui generis database rights. If you are not exercising an exclusive right held by the database maker, then you do not need to rely on the license to mine. As there are many different methods for conducting text and data mining, there may be some types of mining activities that will implicate the licensed rights.  
  
The '''data''' contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names without more) contained in a database would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content.
+
'''''If your particular use is one that would require permission''''', you should note the following:
 +
* ''Permission:'' All six of the 4.0 licenses allow for text and data mining by granting express permission to privately reproduce, extract, and reuse the contents of a licensed database and create adapted databases.
 +
* ''Commercial purposes:'' If you are conducting text and data mining for [[Frequently_Asked_Questions#Does_my_use_violate_the_NonCommercial_clause_of_the_licenses.3F|commercial purposes]], you should not mine NC-licensed databases or other material.
 +
* ''Outputs:'' If you publicly share the results of your mining activity or the data you mined, you should attribute the rights holder. If what you publicly share qualifies as an adaptation of the licensed material, you should not mine ND-licensed material. If you share an adaptation of material under an SA license, you must apply the same license to this adaptation.  
  
==='''''How do I know whether a particular use of a database is restricted by copyright?'''''===
+
[[Frequently_Asked_Questions#Do_I_always_have_to_comply_with_the_license_terms.3F_If_not.2C_what_are_the_exceptions.3F|If your use is not one that requires permission under the license]], the above considerations do not apply and you may conduct text and data mining activity.
  
When the database structure or contents are subject to copyright, reproducing, distributing, or modifying the database will often be restricted by copyright law. If the database is released under a CC license, that means reproduction, distribution, or modification will likely require compliance with the relevant license conditions, including attribution.
+
<includeonly>==</includeonly>===How does the treatment of sui generis database rights vary in prior versions of CC licenses?===<includeonly>==</includeonly>
  
However, it is important to note that some uses of a copyrighted database will not be restricted by copyright. It may be possible, for example, to rearrange or modify the uncopyrightable data in a way that does not implicate the copyright in the database structure. For example, while (as noted above) a court in the United States held that a directory of Chinese-American businesses was restricted by copyright, the same court went on to hold that a directory that duplicated hundreds of its listings was not infringing because the listings were categorized and arranged in a sufficiently dissimilar way. In those situations, compliance with the license conditions is not required unless the database contents are themselves restricted by copyright.
+
As explained [[Data#Can_databases_be_released_under_CC_licenses.3F|above]], the current version of the CC license suite (4.0) licenses sui generis database rights in addition to copyright and other closely related rights. Past versions of CC licenses operate differently with respect to sui generis database rights.  
  
Similarly, even where database contents are subject to copyright and published under a CC license, use of the facts and ideas found in the contents will not require attribution (or compliance with other applicable license conditions), unless doing so implicates copyright in the database structure as explained above. This important limitation of CC licenses is reflected in the license deed, where it indicates that the license does not extend to those elements of the work in the public domain.  
+
In the CC version 3.0 licenses, the legal treatment of sui generis database rights varies, but the practical result is always the same: compliance with the license restrictions and conditions is not required where sui generis database rights - but not copyright - are implicated. This means that if a substantial portion of a CC-licensed database is extracted and used in a way that does not implicate copyright (e.g., by rearranging purely factual data), the license does not require the user to attribute the licensor or comply with any other restrictions or conditions, even if the database is protected by sui generis database rights.
  
==='''''What are sui generis database rights?'''''===
+
While this result is the same across all CC version 3.0 licenses, the reason for this outcome varies. In the 3.0 licenses ported to the laws of EU jurisdictions, the scope of the licenses expressly covers databases subject to copyright and/or sui generis database rights. The conditions of the license are explicitly waived when use of the licensed work only involves the exercise of database rights.
  
Sui generis database rights are different from, but often overlap with, copyright. Sui generis database rights exist to recognize the investment required to compile a database, whether or not the database meets the originality requirement in copyright law.  Established by [http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31996L0009:EN:HTML Directive 96/9/EC] of the European Parliament, sui generis database rights prohibit the extraction or reutilization of a substantial portion (defined in both qualitative and quantitative terms) of the contents of a database. The Directive has been implemented in the national legislation of all EU member countries.  Outside of the European Union, similar database-like rights have been established in several countries, including Mexico and South Korea.  
+
By contrast, the 3.0 unported licenses and all other ported licenses do not expressly license sui generis database rights. As a result, those licenses do not apply when sui generis database rights alone are implicated. This means a licensee may need separate permission to use the database in a way that implicates sui generis database rights (although arguably an implied license to exercise those rights may be deemed granted in some jurisdictions).
  
==='''''How (if at all) are sui generis database rights addressed in CC licenses?'''''===
+
More information on the policy decision underlying the treatment of sui generis database rights in the 3.0 licenses can be found [[Media:V3_Database_Rights.pdf|on our wiki (.pdf)]].
  
The treatment of sui generis database rights varies among the CC version 3.0 licenses, but the practical result is always the same: compliance with the license conditions is not required where sui generis database rights - but not copyright - are implicated. This means that if someone takes a substantial portion of a CC-licensed database and uses it in a way that does not implicate copyright (e.g., by rearranging purely factual data), she does not have to attribute the licensor or comply with the other license conditions, even if the database is protected by sui generis database rights.
+
<includeonly>==</includeonly>===What is the difference between the Open Data Commons licenses and the CC 4.0 licenses?===<includeonly>==</includeonly>
  
While this treatment is the same across all CC version 3.0 licenses, the reason for this outcome varies. In the 3.0 licenses ported to the laws of EU jurisdictions, works subject to copyright and/or sui generis database rights are licensed and subject to the CC license terms and conditions. In those ported licenses, however, the conditions of the license are explicitly waived when use of the licensed work only involves the exercise of database rights and not copyright.
+
The [http://opendatacommons.org/licenses/odbl/1.0/ Open Database License (ODbL)] and the [http://opendatacommons.org/licenses/by/1.0/ Open Data Commons Attribution License (ODC-BY)] are licenses designed specifically for use on databases and not on other types of material. There are many differences between those licenses and CC licenses, but the most important are related to license scope and operation. The ODC licenses apply only to sui generis database rights and any copyright in the database structure. These licenses do not apply to the individual contents of the database.  The latest version of the CC licenses, on the other hand, applies to sui generis database rights and all copyright and neighboring rights in the database structure as well as the contents. See [[Data#How_does_the_treatment_of_sui_generis_database_rights_vary_in_prior_versions_of_CC_licenses.3F|above]] for more detail about how past versions of CC licenses vary with respect to sui generis database rights.
  
By contrast, all other 3.0 licenses (including ported licenses for non-EU jurisdictions and the international licenses) do not license sui generis database rights at all. As a result, the license conditions do not (nor could they) attach to uses implicating database rights and not copyright. It also means a licensee may need separate permission if they plan to use the database in a way that implicates database rights (although there may arguably be an implied right to do so).
+
Another important difference is that ODC licenses may create contractual obligations even in jurisdictions where database rights would not otherwise exist and would be necessary only for the license permission. CC has crafted its licenses to ensure that they [[Frequently_Asked_Questions#How_do_CC_licenses_operate.3F|never impose obligations where permission is not otherwise required]] to use the licensed material.
  
CC is [https://creativecommons.org/weblog/entry/29639 leaning toward] changing how its licenses treat sui generis database rights in version 4.0. If pursued, those rights would be fully licensed and subject to the same terms and conditions as copyright, without any waiver of the license conditions where only those rights are implicated.  Read more about the issue -- including important limitations that would avoid imposing restrictions where those rights do not exist -- on the [[4.0/License subject matter|version 4.0 wiki]].  
+
<includeonly>==</includeonly>==Frequently asked questions about data in general==<includeonly>==</includeonly>
 +
<includeonly>==</includeonly>===Which components of databases are protected by copyright?===<includeonly>==</includeonly>
 +
There are four components of a database to consider: (1) the database model or structure, (2) the data entry and output sheet, (3) field names, and (4) the data or other content.
  
==='''''How do I apply a CC legal tool to a database?'''''===
+
The '''database model''' refers to how a database is structured and organized, including database tables and table indexes. The selection, coordination, and arrangement of the database is subject to copyright if it is sufficiently original. The originality threshold is fairly low in many jurisdictions. For example, while courts in the United States found an alphabetical telephone directory to be insufficiently original to merit copyright protection, an organized directory of Chinese-American businesses in a particular area was considered to meet this criterion.<ref>Key Publications, Inc. v. Chinatown Today Publishing Enterprises Inc., 945 F.2d 509 (2d Cir. 1991).</ref>  These determinations are very fact-specific and vary by jurisdiction.
  
Before making a database available under a CC license, a database provider must first make sure she has all rights necessary to do so. Often, the database provider is not the original author of the database contents, which may mean the database provider needs separate permissions from third parties before publishing the database under a CC legal tool. For more information, read our [[Before_Licensing|pre-licensing]] guidelines.
+
The '''data entry and output sheets''' contain questions, and the answers to these questions are stored in a database. For example, a web page asking a scientist to enter a gene’s name, its pathway information, and its ontology would constitute a data entry sheet. The format and layout of these sheets are protected by copyright according to the same standard of originality used to determine if the database model is copyrightable.
  
Also, the database provider must consider what elements of the database she wants to be covered by the CC legal tool and identify those elements in a manner that reusers will see and understand. Please see our [[Marking/Creators|marking page]] for more information on how to clearly distinguish unlicensed content.
+
'''Field names''' describe the contents or data. For example, “address” might be the name of the field for street address information. These are less likely to be protected by copyright because they often lack sufficient originality.
  
 +
The '''data''' or other contents contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names or city populations) would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content.
  
 +
<includeonly>==</includeonly>===How do I know whether a particular use of a database is restricted by copyright?===<includeonly>==</includeonly>
 +
When the database structure or its contents is subject to copyright, the reproduction, distribution, or modification of the database will often be restricted by copyright law. It is important to note that some uses of a copyrighted database will not be restricted by copyright. It may be possible, for example, to rearrange or modify the uncopyrightable data in a way that does not implicate the copyright in the database structure. For instance, the court in the United States that (as noted above) held that a directory of Chinese-American businesses was protected by copyright went on to hold that a directory that duplicated hundreds of its listings was not infringing because the listings were categorized and arranged in a sufficiently dissimilar way. In those situations, compliance with the license conditions is not required unless the database contents are themselves restricted by copyright.
  
 +
Similarly, even where database contents are subject to copyright and published under a CC license, use of the facts and ideas embedded within the contents will not require attribution (or compliance with other applicable license conditions), unless doing so implicates copyright in the database structure as explained above. This [[Frequently_Asked_Questions#How_do_CC_licenses_operate.3F|important limitation of all CC licenses]] is highlighted on the license deeds in the Notice section, where we emphasize that compliance with the license is not required for elements of the material in the public domain.
  
 +
<includeonly>==</includeonly>===If my use of a database is restricted by copyright, how do I comply with the license?===<includeonly>==</includeonly>
 +
All CC licenses require that you attribute the licensor when your use involves public sharing.  Your other obligations depend on the particular CC license applied to the database. If it is an NC license, any regulated use must be limited to [[Frequently_Asked_Questions#Does_my_use_violate_the_NonCommercial_clause_of_the_licenses.3F|noncommercial purposes]]. If it is an ND license, you may produce an adapted database but cannot share it publicly.  If it is a ShareAlike (SA) license, you must apply the same or a [[FAQ#If_I_derive_or_adapt_material_offered_under_a_Creative_Commons_license.2C_which_CC_license.28s.29_can_I_use.3F|compatible license]] to any adaptation of the database you share publicly.
  
 +
<span id="What_are_sui_generis_database_rights.3F"></span>
 +
<includeonly>==</includeonly>===Which components of a database are protected by sui generis database rights?===<includeonly>==</includeonly>
 +
In contrast to copyright, sui generis database rights are designed to protect a maker's substantial investment in a database.  In particular, these rights prevent the unauthorized extraction and reuse of a [[Data#What_constitutes_a_.E2.80.9Csubstantial_portion.E2.80.9D_of_a_database.3F|substantial portion]] of the contents.
  
 +
<includeonly>==</includeonly>===How do I know whether a particular use of a database is restricted by sui generis database rights?===<includeonly>==</includeonly>
  
 +
When a database is subject to sui generis database rights, extracting and reusing a [[Data#What_constitutes_a_.E2.80.9Csubstantial_portion.E2.80.9D_of_a_database.3F|substantial portion]] of the database contents is prohibited without express exception.
  
 +
'''It is important to remember that sui generis database rights exist in only a few countries outside the [http://en.wikipedia.org/wiki/Database_Directive#Implementation European Union], such as Korea and Mexico. Generally, if you are using a CC-licensed database in a location where those rights do not exist, you do not have to comply with license restrictions or conditions unless copyright (or some other licensed right) is implicated.'''
  
 +
Note that if you are using a database in a jurisdiction where you must respect database rights, and you receive a CC-licensed work from someone located in a jurisdiction without database rights, you should determine whether database rights exist and have been licensed.  If so, you need to properly mark and attribute as the license requires, since the person from whom you received the database may not have been required to keep that information. If you are using a licensed database and you do not have to comply with the license terms because such rights do not exist in your jurisdiction, we recommend that you retain this information where possible.  Doing so assists downstream reusers who are required to provide this information when they share further.
  
 +
<includeonly>==</includeonly>===What constitutes a “substantial portion” of a database?===<includeonly>==</includeonly>
 +
There is no clearly defined rule or standard for what constitutes a “substantial portion”.  The answer will depend on the law in the relevant jurisdiction. Note that what constitutes a substantial portion is determined both quantitatively and qualitatively. Also, using several insubstantial portions can add up to a substantial portion.
  
 +
<span id="How_(if_at_all)_are_sui_generis_database_rights_addressed_in_CC_licenses.3F'"></span>
 +
<includeonly>==</includeonly>===If my use of a database is restricted by sui generis database rights, how do I comply with the license?===<includeonly>==</includeonly>
  
 +
If the database is released under the current version (4.0) of CC licenses, you must attribute the licensor if you share a [[Data#What_constitutes_a_.E2.80.9Csubstantial_portion.E2.80.9D_of_a_database.3F|substantial portion]] of the database contents. The other requirements depend on the particular license applied to the database. Under the NC licenses, you may not extract and reuse a substantial portion of the database contents for [[Frequently_Asked_Questions#Does_my_use_violate_the_NonCommercial_clause_of_the_licenses.3F|commercial purposes]]. The ND licenses prohibit you from including a substantial portion of the database contents in another publicly shared database in which you have sui generis database rights of your own.  The SA licenses require you to apply the same or a [[FAQ#If_I_derive_or_adapt_material_offered_under_a_Creative_Commons_license.2C_which_CC_license.28s.29_can_I_use.3F|compatible license]] to any database you share publicly and in which you include a substantial portion of the licensed database contents. Note that this does '''not''' require you to ShareAlike any copyright or other rights you have in the individual contents of the database.
  
 +
== Notes ==
 +
<references/>
  
 
+
<noinclude>
 
+
[[Category:FAQ]]
 
+
</noinclude>
 

Latest revision as of 10:29, 23 October 2019

This page supersedes Databases and Creative Commons.

In today's society, the potential value of data is very high. Access to more data facilitates enhanced scientific collaboration and reproducibility, more efficient markets, and increased government and corporate transparency. This contributes to accelerated discovery and understanding of solutions to planetary and societal needs.

A large part of the potential value of data is realized when data is used across organizational boundaries. There are legal aspects involved in this. Many sites give limited permission to use data via terms of service. There is much ad hoc data sharing among researchers. Increasingly, sharing of data is facilitated by distribution under standard public legal tools. These tools are used to manage copyright and similar restrictions that might otherwise limit dissemination or reuse of data, e.g. CC licenses or the CC0 public domain dedication.

Many organizations, institutions, and governments are using CC tools for data. For case studies about how these tools are applied, see:

Uses of CC Licenses with Data and Databases
Uses of CC0 with Data and Databases

Frequently asked questions about data and CC licenses

Can databases be released under CC licenses?

CC licenses can be used to license databases. The most recent version (4.0) may be used to license databases subject to copyright and, where applicable, sui generis database rights. Sui generis database rights prevent copying and reusing of substantial parts of a database (including frequent extraction of insubstantial parts). Unlike copyright, database rights protect the maker's investment and not their originality.

CC does not recommend use of its NonCommercial (NC) or NoDerivatives (ND) licenses on databases intended for scholarly or scientific use.

In addition to our licenses, the CC0 Public Domain Dedication may be used to maximize reuse of databases. When applied, the effect is to waive all copyright and related rights in the database and its contents, placing the database as nearly as possible in the worldwide public domain. In certain domains, such as science and government, there are important reasons to consider using CC0. Waiving copyright and related rights eliminates all uncertainty for potential users, encouraging maximal reuse and sharing of information.

When a CC license is applied to a database, what is being licensed?

The license terms and conditions apply to the database structure (its selection and arrangement, to the extent copyrightable), its contents (if copyrightable), and, where the database maker has sui generis database rights, to the rights granted to that maker. It is possible for licensors to license some rather than all of the rights they have in a database. Creative Commons advises against this practice. Licensors who nevertheless choose to do so are strongly encouraged to clearly demarcate what is and is not licensed. See below for more information regarding how to provide clear notice of what is licensed.

How do I apply a CC legal tool to a database?

Before making a database available under a CC license, database providers should first make sure they have all rights necessary to do so. Often, the database provider is not the original author of the database contents. If that is the case, the database provider should secure separate permission from the other author(s) before publishing the database under a CC legal tool. If database makers decide to license the database without securing permission from the author(s) of the database contents, they should clearly indicate the material for which permission has not been secured and clearly mark the material as not being offered under the terms of the license. For more information, read our pre-licensing guidelines.

Database providers should also consider carefully what elements of the database they want covered by the CC legal tool and identify those elements in a manner that reusers will see and understand. Please see our marking page for more information on how to clearly distinguish unlicensed content.

How do the different CC license elements operate for a CC-licensed database?

Under version 4.0, if an NC license has been applied then any use of the licensed database or its contents that is restricted by copyright law or sui generis database rights requires compliance with the NC term, even if the database is not publicly shared. The other license elements (BY, ND, and SA, as applicable) must be complied with only if your use is so restricted and public sharing is involved. Learn more about how to comply when your use implicates copyright and/or sui generis database rights.
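As a rough illustration, the version 4.0 rule described above can be sketched in a few lines of Python. The function name `elements_to_comply_with` and its parameters are hypothetical and chosen only for this example; the sketch is a simplified reading of the paragraph above, not an official CC tool, and it does not decide whether a use is actually restricted by copyright or sui generis database rights.

```python
def elements_to_comply_with(elements, restricted, shared_publicly):
    """Return the license elements a reuser must comply with under version 4.0.

    elements: the elements of the applied license, e.g. {"BY", "NC", "SA"}.
    restricted: True if the use is restricted by copyright or by
        sui generis database rights (an input, not decided here).
    shared_publicly: True if the use involves public sharing.
    """
    if not restricted:
        # The use does not need the license, so no condition applies.
        return set()
    if shared_publicly:
        # A restricted use with public sharing triggers every element.
        return set(elements)
    # A restricted use without public sharing only triggers NC, if present.
    return set(elements) & {"NC"}
```

For example, `elements_to_comply_with({"BY", "NC", "SA"}, restricted=True, shared_publicly=False)` yields only `{"NC"}`, reflecting that the NC term applies even when the database is not publicly shared.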

Prior CC license versions do not require compliance with the license restrictions or conditions when only sui generis database rights (and not copyright) are implicated. Please see below for more detail about how this works in the current and prior versions of the licenses.

Can I conduct text/data mining on a CC-licensed database?

It is possible to conduct mining activities on a CC-licensed database. Whether you have to comply with the CC license terms and conditions will depend on whether the type of mining activity you conduct implicates copyright or any applicable sui generis database rights. If you are not exercising an exclusive right held by the database maker, then you do not need to rely on the license to mine. As there are many different methods for conducting text and data mining, there may be some types of mining activities that will implicate the licensed rights.

If your particular use is one that would require permission, you should note the following:

  • Permission: All six of the 4.0 licenses allow for text and data mining by granting express permission to privately reproduce, extract, and reuse the contents of a licensed database and create adapted databases.
  • Commercial purposes: If you are conducting text and data mining for commercial purposes, you should not mine NC-licensed databases or other material.
  • Outputs: If you publicly share the results of your mining activity or the data you mined, you should attribute the rights holder. If what you publicly share qualifies as an adaptation of the licensed material, you should not mine ND-licensed material. If you share an adaptation of SA-licensed material, you must apply the same license to this adaptation.

If your use is not one that requires permission under the license, the above considerations do not apply and you may conduct text and data mining activity.
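The considerations above for text and data mining can be sketched as a small Python checklist. The function `mining_considerations`, its parameters, and the wording of the returned notes are hypothetical and introduced only for illustration; the sketch assumes a version 4.0 license and takes as an input the judgment of whether the mining requires permission at all.

```python
def mining_considerations(elements, needs_permission, commercial,
                          shares_publicly, shares_adaptation):
    """List which of the considerations above apply to a mining activity.

    elements: elements of the 4.0 license, e.g. {"BY", "NC"}.
    needs_permission: True if the mining exercises a licensed right
        (copyright or sui generis database rights).
    commercial: True if the mining is for commercial purposes.
    shares_publicly: True if results or mined data are publicly shared.
    shares_adaptation: True if what is shared is an adaptation.
    """
    if not needs_permission:
        # No licensed right is exercised, so none of the points apply.
        return []

    notes = []
    if commercial and "NC" in elements:
        notes.append("do not mine NC-licensed material for commercial purposes")
    if shares_publicly:
        notes.append("attribute the rights holder")
        if shares_adaptation and "ND" in elements:
            notes.append("do not mine ND-licensed material")
        if shares_adaptation and "SA" in elements:
            notes.append("apply the same license to the adaptation")
    return notes
```

An empty list means that, on the stated assumptions, none of the listed considerations is triggered; it says nothing about rights other than those covered by the license.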

How does the treatment of sui generis database rights vary in prior versions of CC licenses?

As explained above, the current version of the CC license suite (4.0) licenses sui generis database rights in addition to copyright and other closely related rights. Past versions of CC licenses operate differently with respect to sui generis database rights.

In the CC version 3.0 licenses, the legal treatment of sui generis database rights varies, but the practical result is always the same: compliance with the license restrictions and conditions is not required where sui generis database rights - but not copyright - are implicated. This means that if a substantial portion of a CC-licensed database is extracted and used in a way that does not implicate copyright (e.g., by rearranging purely factual data), the license does not require the user to attribute the licensor or comply with any other restrictions or conditions, even if the database is protected by sui generis database rights.

While this result is the same across all CC version 3.0 licenses, the reason for this outcome varies. In the 3.0 licenses ported to the laws of EU jurisdictions, the scope of the licenses expressly covers databases subject to copyright and/or sui generis database rights. The conditions of the license are explicitly waived when use of the licensed work only involves the exercise of database rights.

By contrast, the 3.0 unported licenses and all other ported licenses do not expressly license sui generis database rights. As a result, those licenses do not apply when sui generis database rights alone are implicated. This means a licensee may need separate permission to use the database in a way that implicates sui generis database rights (although arguably an implied license to exercise those rights may be deemed granted in some jurisdictions).

More information on the policy decision underlying the treatment of sui generis database rights in the 3.0 licenses can be found on our wiki (.pdf).

What is the difference between the Open Data Commons licenses and the CC 4.0 licenses?

The Open Database License (ODbL) and the Open Data Commons Attribution License (ODC-BY) are licenses designed specifically for use on databases and not on other types of material. There are many differences between those licenses and CC licenses, but the most important are related to license scope and operation. The ODC licenses apply only to sui generis database rights and any copyright in the database structure. These licenses do not apply to the individual contents of the database. The latest version of the CC licenses, on the other hand, applies to sui generis database rights and all copyright and neighboring rights in the database structure as well as the contents. See above for more detail about how past versions of CC licenses vary with respect to sui generis database rights.

Another important difference is that ODC licenses may create contractual obligations even in jurisdictions where database rights would not otherwise exist and would be necessary only for the license permission. CC has crafted its licenses to ensure that they never impose obligations where permission is not otherwise required to use the licensed material.

Frequently asked questions about data in general

Which components of databases are protected by copyright?

There are four components of a database to consider: (1) the database model or structure, (2) the data entry and output sheet, (3) field names, and (4) the data or other content.

The database model refers to how a database is structured and organized, including database tables and table indexes. The selection, coordination, and arrangement of the database are subject to copyright if they are sufficiently original. The originality threshold is fairly low in many jurisdictions. For example, while courts in the United States found an alphabetical telephone directory to be insufficiently original to merit copyright protection, an organized directory of Chinese-American businesses in a particular area was considered to meet this criterion.[1] These determinations are very fact-specific and vary by jurisdiction.

The data entry and output sheets contain questions, and the answers to these questions are stored in a database. For example, a web page asking a scientist to enter a gene’s name, its pathway information, and its ontology would constitute a data entry sheet. The format and layout of these sheets are protected by copyright according to the same standard of originality used to determine if the database model is copyrightable.

Field names describe the contents or data. For example, “address” might be the name of the field for street address information. These are less likely to be protected by copyright because they often lack sufficient originality.

The data or other contents contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names or city populations) would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content.

How do I know whether a particular use of a database is restricted by copyright?

When the database structure or its contents are subject to copyright, the reproduction, distribution, or modification of the database will often be restricted by copyright law. It is important to note that some uses of a copyrighted database will not be restricted by copyright. It may be possible, for example, to rearrange or modify the uncopyrightable data in a way that does not implicate the copyright in the database structure. For example, the United States court that held (as noted above) that a directory of Chinese-American businesses was protected by copyright went on to hold that a directory duplicating hundreds of its listings was not infringing, because the listings were categorized and arranged in a sufficiently dissimilar way. In those situations, compliance with the license conditions is not required unless the database contents are themselves restricted by copyright.

Similarly, even where database contents are subject to copyright and published under a CC license, use of the facts and ideas embedded within the contents will not require attribution (or compliance with other applicable license conditions), unless doing so implicates copyright in the database structure as explained above. This important limitation of all CC licenses is highlighted on the license deeds in the Notice section, where we emphasize that compliance with the license is not required for elements of the material in the public domain.

If my use of a database is restricted by copyright, how do I comply with the license?

All CC licenses require that you attribute the licensor when your use involves public sharing. Your other obligations depend on the particular CC license applied to the database. If it is an NC license, any regulated use must be limited to noncommercial purposes. If it is an ND license, you may produce an adapted database but cannot share it publicly. If it is a ShareAlike (SA) license, you must apply the same or a compatible license to any adaptation of the database you share publicly.

Which components of a database are protected by sui generis database rights?

In contrast to copyright, sui generis database rights are designed to protect a maker's substantial investment in a database. In particular, these rights prevent the unauthorized extraction and reuse of a substantial portion of the contents.

How do I know whether a particular use of a database is restricted by sui generis database rights?

When a database is subject to sui generis database rights, extracting and reusing a substantial portion of the database contents is prohibited unless an express exception applies.

It is important to remember that sui generis database rights exist in only a few countries outside the European Union, such as Korea and Mexico. Generally, if you are using a CC-licensed database in a location where those rights do not exist, you do not have to comply with license restrictions or conditions unless copyright (or some other licensed right) is implicated.

Note that if you are using a database in a jurisdiction where you must respect database rights, and you receive a CC-licensed work from someone located in a jurisdiction without database rights, you should determine whether database rights exist and have been licensed. If so, you need to properly mark and attribute as the license requires, since the person from whom you received the database may not have been required to keep that information. If you are using a licensed database and you do not have to comply with the license terms because such rights do not exist in your jurisdiction, we recommend that you retain this information where possible. Doing so assists downstream reusers who are required to provide this information when they share further.

What constitutes a “substantial portion” of a database?

There is no clearly defined rule or standard for what constitutes a “substantial portion”. The answer will depend on the law in the relevant jurisdiction. Note that what constitutes a substantial portion is determined both quantitatively and qualitatively. Also, using several insubstantial portions can add up to a substantial portion.

If my use of a database is restricted by sui generis database rights, how do I comply with the license?

If the database is released under the current version (4.0) of CC licenses, you must attribute the licensor if you share a substantial portion of the database contents. The other requirements depend on the particular license applied to the database. Under the NC licenses, you may not extract and reuse a substantial portion of the database contents for commercial purposes. The ND licenses prohibit you from including a substantial portion of the database contents in another publicly shared database in which you have sui generis database rights of your own. The SA licenses require you to apply the same or a compatible license to any database you share publicly and in which you include a substantial portion of the licensed database contents. Note that this does not require you to ShareAlike any copyright or other rights you have in the individual contents of the database.

Notes

  1. Key Publications, Inc. v. Chinatown Today Publishing Enterprises Inc., 945 F.2d 509 (2d Cir. 1991).