A Guide to Mining CC-Licensed Material

This guide summarizes whether and how CC licenses operate in the context of text and data mining. The analysis begins with a fundamental principle of CC licensing: the licenses apply only when permission is needed under the copyright or similar rights being licensed. If permission is not necessary to undertake the particular mining activity, then the CC license does not apply and you can disregard its terms and conditions. Even if the CC license does not apply, there may be other agreements (such as website terms of use) that restrict what you may do with the material. While this guide does not address how those may affect your ability to mine CC-licensed content, you should be aware that those agreements may exist and adversely affect your ability to do so.[1]

To determine what you can do with CC-licensed material, ask the three questions given below. Question #1 will tell you if permission under the license is necessary. If no permission is necessary, then you do not need to answer the other questions because the license does not apply and you are free to text and data mine (at least as a matter of copyright and similar rights[2]). If permission is necessary, the license applies and you should answer Question #2 and Question #3 to determine any CC license-imposed limits on your use of the licensed material. See Table 1, below, for a summary of how CC licenses operate in the context of text and data mining.

Q1. Is permission necessary to undertake the particular mining activity? Under the 4.0 licenses, a licensor grants the public permission to exercise rights under copyright, neighboring rights, and similar rights closely related to copyright, including sui generis database rights.[3] The license only applies when at least one of these rights held by the licensor applies to the use made by the licensee. Note that in many (if not most) cases, text and data mining falls under an applicable exception or limitation to copyright, such as fair use in the United States or the exception for non-commercial research in the United Kingdom. In such cases, permission is not necessary unless another licensed right applies to your use, such as database rights.[4]
If the answer is no, the CC license does not restrict the text and data mining activity and the analysis is complete. You do not need to answer Q2 or Q3.
If the answer is yes, then you should answer the following questions to determine how, if at all, the license terms and conditions affect your text and data mining activities.
Q2. Is the text and data mining activity being undertaken primarily for commercial purposes?
If the answer is no, you can freely mine material licensed under any of the six CC licenses.
If the answer is yes, you may not mine material licensed under a CC NonCommercial license (BY-NC, BY-NC-SA, BY-NC-ND).
Q3. Are any outputs of your mining activity an adaptation of the underlying material you mined? Not all outputs of text and data mining are necessarily adaptations of the licensed material: for example, an analysis based on your findings would not constitute an adaptation. The dataset resulting from material you mined, however, may be an adaptation if it is a modified version of the original dataset.
If the answer is no, you can share all outputs from mining material licensed under any of the six CC licenses so long as you provide proper attribution and otherwise abide by the CC license terms and conditions.
If the answer is yes, you can still use the adaptation, but you can share it with others only if the CC license on the material mined permits sharing adaptations (BY, BY-SA, BY-NC, BY-NC-SA). The NoDerivatives licenses (BY-ND, BY-NC-ND) do not.
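
The three questions above can be sketched as a small decision procedure. The following Python is only an illustration of the flow described in this guide, not legal advice or an official CC tool: the function name assess_mining, the MiningAssessment type, and all parameter names are hypothetical, and the answers to Q1 through Q3 are assumed to be supplied by you after your own legal analysis.

```python
from dataclasses import dataclass

# The six CC licenses, grouped by the conditions relevant to Q2 and Q3.
ALL_LICENSES = {"BY", "BY-SA", "BY-ND", "BY-NC", "BY-NC-SA", "BY-NC-ND"}
NONCOMMERCIAL = {"BY-NC", "BY-NC-SA", "BY-NC-ND"}
NO_DERIVATIVES = {"BY-ND", "BY-NC-ND"}


@dataclass
class MiningAssessment:
    """Hypothetical result type for this sketch."""
    license_applies: bool
    may_mine: bool
    may_share_outputs: bool
    note: str


def assess_mining(license_code: str,
                  permission_needed: bool,
                  primarily_commercial: bool,
                  output_is_adaptation: bool) -> MiningAssessment:
    """Walk through Q1, Q2, and Q3 in order for one CC license."""
    if license_code not in ALL_LICENSES:
        raise ValueError(f"unknown CC license code: {license_code}")

    # Q1: if no permission is needed (for example, an exception or
    # limitation applies), the license does not apply and the analysis
    # ends. Other agreements, such as terms of use, may still restrict you.
    if not permission_needed:
        return MiningAssessment(False, True, True,
                                "License does not apply to this activity.")

    # Q2: mining primarily for commercial purposes is not permitted
    # under the NonCommercial licenses.
    if primarily_commercial and license_code in NONCOMMERCIAL:
        return MiningAssessment(True, False, False,
                                "NonCommercial license bars commercial mining.")

    # Q3: an adaptation may be used, but shared only if the license
    # permits sharing adaptations (that is, it is not a NoDerivatives license).
    if output_is_adaptation and license_code in NO_DERIVATIVES:
        return MiningAssessment(True, True, False,
                                "Adaptation may be used but not shared.")

    return MiningAssessment(True, True, True,
                            "Share with attribution and per license terms.")
```

For example, assess_mining("BY-ND", True, False, True) reports that mining is permitted but that the resulting adaptation may not be shared, while any call with permission_needed set to False ends at Q1 regardless of the license.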


  1. Creative Commons cannot control or prohibit separate agreements or understandings that involve or affect our standard licenses, since CC is not a party to the licenses. CC strongly discourages the imposition of additional terms and conditions that carve back on the permissions granted or that add reuse restrictions. See the CC License Modification Policy for more information.
  2. Copyright and similar rights are defined in the version 4.0 licenses in Section 1(c).
  3. Prior versions of CC licenses operate the same way, although pre-4.0 licenses cover fewer categories of rights.
  4. Sui generis database rights only exist in the European Union and a few other countries, such as Mexico and South Korea. If you are using CC-licensed content and are located in a jurisdiction where sui generis database rights are not granted by national law, then it is unlikely these rights apply to your use.