Good data practices: Removing barriers to data reuse with CC0 licensing

Guest post contributed by Bryan Gee, Data Curator at Dryad.

Why is CC0 a great choice for open data? Learn to love this frequently misunderstood waiver.

Authors who submit data to Dryad are asked to consent to the publication of their data under the Creative Commons Public Domain Dedication, more commonly known as CC0. In doing so, authors confirm that any materials previously published by another author or working group were published under conditions compatible with CC0, and that they agree to publish any previously unpublished materials under this waiver.

Given the continually evolving research landscape, our curation team frequently receives questions from researchers about what CC0 means for their data. Let’s review the advantages of CC0, as well as some common concerns and misconceptions that we encounter, to guide researchers in data sharing and to explain why we only publish data under CC0.

CC0 as a waiver, not a license

Creative Commons (CC) licenses are a widely adopted standard for scholarly outputs and are also employed for a wide range of other digitally disseminated media (many images on Wikipedia are hosted under a CC license, for example). The CC BY (Attribution) license is particularly common in research communities, as it is the license under which open access articles are frequently published. There are also many other standard open licenses, such as those specifically for software.

In contrast, CC0 is not a license but rather a waiver of the owner or creator’s copyright. It dedicates a work to the public domain without restrictions or conditions for reuse, modification, or redistribution. 

CC0 is the best choice to maximize data reuse

Data reuse is the ideal outcome of open data sharing. Improper assertions of copyright and licensing restrictions over material that is unlikely to fall under the umbrella of copyright law place the onus on potential users to decipher legal texts in order to ascertain whether content is in fact copyrightable. This can be rather onerous and may create aversion to reuse out of fear of legal action over improper use, particularly for more restrictive licensing. Even if there is little ambiguity over whether a copyright claim can be made, uncertainty about how to follow the prescribed conditions can also stifle reuse. For example, all CC licenses require attribution, but this must be done in the specific fashion prescribed by the creator(s). When many works with such licensing are compiled, the ambiguity and the burden on users grow with each added source.

With CC0, there is no ambiguity about restrictions on the data. This does not give potential users license to ignore established community standards like citation or collaboration. Additionally, CC0 avoids the complications of so-called “attribution stacking,” which are becoming increasingly common as researchers compile large datasets sourced from many independently licensed works (a typical downside of CC BY compared to CC0). Lastly, it relieves the publisher of the data of any legal burden to monitor their data’s reuse and, if need be, to seek legal recourse against perceived improper actions (which many individuals likely lack the time or resources to do).

How will I get credit for my data?

The concept of waiving copyright to scholarly outputs is frequently unnerving to researchers, who expect to be accorded credit for their work. In particular, authors sometimes ask why we do not publish datasets under CC BY, which requires attribution to the original creator but imposes no other conditions.

The expectation that one be cited for one’s scholarly outputs, such as a hypothesis published in an article, is a standard research community practice. It is not, however, a legal requirement, and, therefore, should not be conflated with attribution, which is a legal requirement. Ultimately, the best way to promote practices like citation is to lean on the weight of community norms, not the threat of legal action.

Will I get “scooped” if I publish my data under CC0?

Some authors are concerned that releasing a dataset they intend to use in future projects may lead them to be “scooped” by another researcher. Scooping is more of a community problem and a moral ill than it is a legal violation. In many cases, restrictive licensing will not prevent scooping because most licenses do not require a potential user to request permission from the creator of an output. CC BY, for example, does not offer any extra protection relative to CC0 to guard against another researcher using your published data for their own study, whether you intended to do a similar study or not.

CC0 does not exempt or preclude users of outputs published under this waiver from observing established community standards, of which proper citation is but one of many. Expectations of citation and the practice of doing so should be thought of as a positive contribution to a research community, not as an action taken under duress out of fear of legal action. 

The bottom line

Dryad has long used and advocated for the publication of research data under CC0 because we believe that it is the best approach for ensuring and maximizing accessibility and reuse potential for datasets. Under CC0, there is no improper or ambiguous assertion of copyright when none may in fact be applied, and potential users need not fear legal actions taken in relation to their reuse. CC0 does not preclude users from following established community norms related to recognition of data generation; instead, by removing potential obstacles to reuse, publishing data under CC0 supports broad community reuse in tandem with continual practice of community norms like citation.

###

Feedback and questions are always welcome at hello@datadryad.org. To keep in touch with the latest updates from Dryad, follow us on Twitter, Facebook or LinkedIn.