Data

From Creative Commons
Revision as of 21:21, 21 December 2011 by CCID-shinchpearson

Much of the potential value of data accrues to society at large: more data has the potential to facilitate enhanced scientific collaboration and reproducibility, more efficient markets, and increased government and corporate transparency, and, overall, to speed discovery and understanding of solutions to planetary and societal needs.

A big part of the potential value of data, in particular its society-wide value, is realized by use across organizational boundaries. How does this occur (legally)? Many sites give narrow permission to use data via terms of service. Much ad hoc data sharing occurs among researchers. And increasingly, open data is facilitated by sharing under public terms, such as CC licenses or the CC0 public domain dedication, which manage copyright restrictions that might otherwise limit dissemination or reuse of data.

Many organizations, institutions, and governments are using CC tools for data. For case studies about how the tools are applied, see the CC data case studies. You can also read more about Creative Commons' most up-to-date thinking on data and databases, and what you can do to contribute.

Frequently asked questions about data

Can databases be released under CC licenses?

Yes, CC licenses can be used on any copyrighted work, including a copyrighted database. A CC license may be applied to any or all copyrighted aspects of a database and its contents; see below for more information on how to provide clear notice of what is licensed. Any use of the licensed database or its contents that is restricted by copyright law requires compliance with the relevant license conditions (BY, SA, NC, ND). In their current version (3.0), CC licenses do not require compliance with the license conditions when only sui generis database rights (and not copyright) are implicated. Additionally, the international and "ported" version 3.0 licenses, excluding EU jurisdiction ports, do not grant any permissions where sui generis database rights are implicated. Please see below for more detail.

CC0, the public domain dedication, can also be used on databases. Its effect is to waive all copyright and related rights in the database, placing it as nearly as possible in the worldwide public domain. In certain domains, such as science and government, there are important reasons to consider using tools like CC0: waiving copyright and related rights eliminates uncertainty for potential users, encouraging maximal reuse and sharing of information. Where waiver is not a viable option and some conditions on reuse are necessary, rights holders should consider using CC licenses that give the public more freedom to reuse and remix the content.

Which components of a database are protected by copyright?

With databases, there are likely four components to consider: (1) the database model or structure, (2) the data entry and output sheet, (3) field names, and (4) the data.

The database model is a specification describing how a database is structured and organized, including database tables and table indexes. The selection, coordination, and arrangement of the contents is subject to copyright if it is sufficiently original. The threshold of originality required for copyright is fairly low in many jurisdictions. For example, while courts in the United States have held that an alphabetical telephone directory did not have sufficient originality to merit copyright protection, an organized directory of Chinese-American businesses in a particular area did. These determinations are very fact-specific (no pun intended) and vary by jurisdiction.

The data entry and output sheets contain questions, and the answers to these questions are stored in the database. For example, a web page asking a scientist to enter a gene’s name, its pathway information, and its ontology would constitute a data entry sheet. The format and layout of these sheets are protected by copyright only if they meet the same standard of originality used to analyze copyright in the database model.

Field names describe the data in a data set. For example, “address” might be the name of the field for street address information. Field names are less likely to be protected by copyright because they often lack originality.

The data contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names without more) contained in a database would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content.

How do I know whether a particular use of a database is restricted by copyright?

When the database structure or contents are subject to copyright, reproducing, distributing, or modifying the database will often be restricted by copyright law. If the database is released under a CC license, that means reproduction, distribution, or modification will likely require compliance with the relevant license conditions, including attribution.

However, it is important to note that some uses of a copyrighted database will not be restricted by copyright. It may be possible, for example, to rearrange or modify the uncopyrightable data in a way that does not implicate the copyright in the database structure. For example, while (as noted above) a court in the United States held that a directory of Chinese-American businesses was protected by copyright, the same court went on to hold that a directory that duplicated hundreds of its listings was not infringing because the listings were categorized and arranged in a sufficiently dissimilar way. In those situations, compliance with the license conditions is not required unless the database contents are themselves protected by copyright.

Similarly, even where database contents are subject to copyright and published under a CC license, use of the facts and ideas found in the contents will not require attribution (or compliance with other applicable license conditions), unless doing so implicates copyright in the database structure as explained above. This important limitation of CC licenses is reflected in the license deed, where it indicates that the license does not extend to those elements of the work in the public domain.

What are sui generis database rights?

Sui generis database rights are different from, but often overlap with, copyright. Sui generis database rights exist to recognize the investment required to compile a database, whether or not the database meets the originality requirement in copyright law. Established by Directive 96/9/EC of the European Parliament and of the Council, sui generis database rights prohibit the extraction or reutilization of a substantial portion (defined in both qualitative and quantitative terms) of the contents of a database. The Directive has been implemented in the national legislation of all EU member countries. Outside of the European Union, similar database rights have been established in several countries, including Mexico and South Korea.

How (if at all) are sui generis database rights addressed in CC licenses?

The treatment of sui generis database rights varies among the CC version 3.0 licenses, but the practical result is always the same: compliance with the license conditions is not required where sui generis database rights, but not copyright, are implicated. This means that if someone takes a substantial portion of a CC-licensed database and uses it in a way that does not implicate copyright (e.g., by rearranging purely factual data), she does not have to attribute the licensor or comply with the other license conditions, even if the database is protected by sui generis database rights.

While this treatment is the same across all CC version 3.0 licenses, the reason for this outcome varies. In the 3.0 licenses ported to the laws of EU jurisdictions, works subject to copyright and/or sui generis database rights are licensed and subject to the CC license terms and conditions. In those ported licenses, however, the conditions of the license are explicitly waived when use of the licensed work involves only the exercise of database rights and not copyright.

By contrast, all other 3.0 licenses (including ported licenses for non-EU jurisdictions and the international licenses) do not license sui generis database rights at all. As a result, the license conditions do not (nor could they) attach to uses implicating database rights and not copyright. It also means a licensee may need separate permission if they plan to use the database in a way that implicates database rights (although there may arguably be an implied right to do so).
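The 3.0 treatment described in the last three answers boils down to two questions per use: do the license conditions attach, and might separate permission be needed for the database right? The Python sketch below restates that logic as a tiny decision function. It is a hypothetical illustration only: the names `Outcome`, `cc30_outcome`, `eu_port`, `copyright_implicated`, and `database_right_implicated` are invented here, and the sketch models nothing beyond what these answers state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    # Whether the BY/SA/NC/ND conditions must be complied with for this use.
    conditions_apply: bool
    # Whether the license leaves a sui generis database right unlicensed,
    # so separate permission may be needed (an implied right is arguable).
    separate_permission_may_be_needed: bool


def cc30_outcome(eu_port: bool, copyright_implicated: bool,
                 database_right_implicated: bool) -> Outcome:
    """Hypothetical summary of how a CC 3.0 license treats one use of a database.

    eu_port: True for a 3.0 license ported to an EU jurisdiction; False for
    the international licenses and non-EU ports.
    """
    # Conditions attach only to uses restricted by copyright. Where only
    # database rights are implicated, EU ports waive the conditions and
    # other 3.0 licenses never licensed those rights in the first place.
    conditions_apply = copyright_implicated
    # Only EU ports license sui generis database rights; under every other
    # 3.0 license, a use implicating those rights is outside the grant.
    separate_permission_may_be_needed = database_right_implicated and not eu_port
    return Outcome(conditions_apply, separate_permission_may_be_needed)
```

For example, rearranging purely factual data taken from a database that is covered by sui generis rights and released under an international 3.0 license corresponds to `cc30_outcome(eu_port=False, copyright_implicated=False, database_right_implicated=True)`: no license conditions attach, but separate permission may be needed.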

CC is leaning toward changing how its licenses treat sui generis database rights in version 4.0. If pursued, those rights would be fully licensed and subject to the same terms and conditions as copyright, without any waiver of the license conditions where only those rights are implicated. Read more about the issue, including important limitations that would avoid imposing restrictions where those rights do not exist, on the version 4.0 wiki.

How do I apply a CC legal tool to a database?

Before making a database available under a CC license, a database provider must first make sure she has all rights necessary to do so. Often, the database provider is not the original author of the database contents, which may mean the database provider needs separate permissions from third parties before publishing the database under a CC legal tool. For more information, read our pre-licensing guidelines.

Also, the database provider must consider what elements of the database she wants to be covered by the CC legal tool and identify those elements in a manner that reusers will see and understand. Please see our marking page for more information on how to clearly distinguish unlicensed content.