BMJ Open: a new partner and an expanded scope

Dryad is pleased to welcome BMJ Open as a new partner journal, reflecting the recently expanded scope of the repository, which now includes all of the basic and applied biosciences, including medicine. BMJ Open is a new online-only, open access journal from the esteemed London-based BMJ Group. It is dedicated to publishing medical research from all disciplines and therapeutic areas, utilizing fully open peer review and immediate online publication.

BMJ Open authors are now being strongly encouraged to deposit the data underlying their articles in Dryad or a more specialized repository, as appropriate.  Authors submitting articles to the journal will benefit from Dryad’s journal submission integration, the process by which data deposit is streamlined for authors through behind-the-scenes communication between the journal and the repository.

An extremely important issue with archiving medical data is, of course, the need to protect patient privacy. To assist its authors, BMJ Open is providing special guidance on data sharing.  Authors must be able to release data to the public domain as with all data in Dryad, and the repository will err on the side of caution by turning back any data that may compromise patient privacy.

To quote from the BMJ Group press release:

Data sharing aims to help scientists and doctors validate and scrutinise researchers’ findings in a bid to prevent fraud and eradicate the kind of selective reporting that has enabled some treatments to acquire regulatory approval, based on incomplete and biased data. In some cases this lack of transparency has prompted the subsequent restriction or withdrawal of certain treatments because of patient safety or effectiveness concerns, which were already evident in the unpublished data.  Data repositories also allow researchers to develop new methods of analysis and use the data to answer questions that the original researchers have not thought of. They also facilitate the acquisition of data for meta analysis (more in-depth comparative reviews).

Commenting on the move, Dr Trish Groves, editor in chief of BMJ Open, said: “Since launch, BMJ Open has championed transparency in medical research through open peer review, open access, and full reporting of studies’ methods and results, all exemplified by last week’s paper on the safety (or not) of medical devices (doi:10.5061/dryad.585t4)…”

This data package in Dryad, which illustrates the tremendous value of medical data for informing medical policy and practice without compromising patient privacy, is available at:

  • Heneghan C, Thompson M, Billingsley M, Cohen D (2011) Data from: Medical-device recalls in the UK and the device-regulation process: retrospective review of safety notices and alerts. Dryad Digital Repository. doi:10.5061/dryad.585t4

Groves goes on to say:

We strongly encourage authors to share their datasets, and now we’re delighted to be making that easier to do, with the help of DryadUK.

Kudos to the Dryad UK project team, based at the British Library, for facilitating this pioneering partnership.

Request for input: archiving and licensing software

Behind a scientific finding, in addition to unique data, there is often unique software. If Dryad archives data in part to allow others to validate the findings reported in the literature, then should we not also enable researchers to archive the software that was used to process, analyze and, in the case of simulations, create those data?

Some users have already deposited software source code alongside their data (e.g. doi:10.5061/dryad.8384, doi:10.5061/dryad.18) [1]. If users are willing and able to release their code under a CC-Zero waiver [2], then there is nothing stopping this practice. In fact, Creative Commons and the Free Software Foundation have recently stated that CC-Zero is appropriate for release of software to the public domain [3].

Yet, a number of journal partners and users have requested that Dryad provide more, or different, options for software, and that authors should not be required to waive legal rights with CC-Zero. Since software is clearly a creative work, source code unambiguously carries copyrightable intellectual property. Enabling a greater range of licensing options could open the door to more authors archiving software that is integral to their paper, and this would further Dryad’s mission of enabling scientists to validate and build upon previous work. So, how should we do that?

One important consideration is that we aim to make the submission process as easy as possible for users. This would be compromised by presenting a confusing array of licensing options, and having those differ between types of files.

The principal desiderata of a license for deposited software are more or less the same as for data: freedom to reuse, modify (analogous to “recombine” for data), and redistribute (in original or modified form), with no more than attribution expected or required. It turns out that these are also the principles common to all licenses approved by the Open Source Initiative, or OSI [4].

So, could we just pick one of the minimally restrictive OSI-approved licenses (since we want to facilitate reuse rather than hamper it), and require release of software under those terms? We are currently of the opinion that the answer is “no”, for a couple of reasons:

(1) Some, though not all, software will already be licensed. Asking a user to choose a different one would clearly be a burden, since changing a license requires express consent from all copyright holders, including possibly the employer or funder.

(2) If the software includes third-party code to which a ‘share-alike’ license has been assigned (e.g. the GNU General Public License, or GPL [5]), then the user is required to release the code under equivalent licensing terms. Unlike for data, it would be highly unusual to combine software source code from many different sources, and so this does not pose an insurmountable barrier to archiving and reuse for scientific purposes.

Given the above, our current thinking is that Dryad should enable users to select any OSI-approved license they deem appropriate. However, we also wish to strongly guide users, when there is no prior license assigned to any part of their software, to choose either a non-share-alike OSI license or a CC-Zero waiver. It is currently unclear whether dedicating software to the public domain with CC-Zero would be of as much value as it is for data [6]. We’d welcome your thoughts on that.
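To make the proposed guidance concrete, here is a minimal sketch of the decision logic described above. It is only an illustration, not an implemented Dryad feature: the function name suggest_licenses, its parameters, and the short license sets (written as SPDX-style identifiers) are all assumptions, and the sets are deliberately incomplete examples rather than the full OSI list.

```python
# Hypothetical sketch of the license guidance discussed in this post.
# The license sets are small illustrative samples, not the full OSI list.
PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}   # non-share-alike examples
SHARE_ALIKE = {"GPL-2.0-only", "GPL-3.0-only"}       # share-alike examples
OSI_APPROVED = PERMISSIVE | SHARE_ALIKE


def suggest_licenses(existing=None, third_party=()):
    """Return the license options a depositor would be offered.

    existing    -- license already assigned to the software, if any
    third_party -- licenses of any third-party code included in it
    """
    # Reason (1): software that is already licensed keeps its license,
    # since relicensing needs consent from every copyright holder.
    if existing is not None:
        if existing in OSI_APPROVED:
            return [existing]
        raise ValueError(f"{existing!r} is not in the example OSI set")

    # Reason (2): included share-alike code dictates equivalent terms.
    inherited = sorted(set(third_party) & SHARE_ALIKE)
    if inherited:
        return inherited

    # No prior license anywhere: steer toward CC-Zero or a
    # non-share-alike OSI license.
    return ["CC0-1.0"] + sorted(PERMISSIVE)
```

Under these assumptions, a depositor with unlicensed code and no third-party components would be shown CC-Zero first, followed by the permissive options, while anyone whose code carries or inherits a license would simply see that license confirmed.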

There are some other considerations on our plate, as well:

  • We want to be careful to avoid steering users away from using a public source code repository when that is more appropriate [7]. Is it better for Dryad to host code snapshots, or to direct users to specific versions of software in a public code repository?
  • Some users bundle software and data together in tarballs or zip archives. Since we cannot easily assign different terms to the data and software within such a combined file, it could increase the burden on users to separate these components out.
  • In addition to software, there is other content that publishers host in Supplemental Materials that some of our partner journals would like Dryad to host, instead. To the extent that some of this content is neither data nor software, should we be recognizing a third category of intellectual property, to which a license such as CC-BY [8] would be assigned?

If you have opinions or ideas, we would like to encourage you to share them with us as public comments on this blog. What’s the best way to accommodate software (and other non-data material) within Dryad?


[1] Some software source code in Dryad is already available under grandfathered license terms, such as in doi:10.5061/dryad.18.

[2] Dryad currently requires users to assign CC-Zero to all archived files. This waives all copyright and related rights in the data (to the extent legally possible in an author’s jurisdiction), effectively dedicating the data to the public domain. The use of CC-Zero is predicated on most data being “facts”, and facts in most jurisdictions cannot be copyrighted, although this is not universally true (e.g. photographs). Note that Dryad has a policy that the original article and the data package are to be cited when the data are reused, but we feel that this is most appropriately enforced through scholarly practice, not through a license.

[3] According to the Creative Commons FAQ, CC-Zero “is suitable for dedicating your copyright and related rights in computer software to the public domain, to the fullest extent possible under law. Unlike CC licenses, which should not be used for software, CC0 is compatible with many software licenses, including the GPL”.



[6] For the motivation behind the recommended use of CC-Zero for data, see the Science Commons Protocol for Implementing Open Access Data.

[7] Public open source code repositories include generic ones, such as Sourceforge, as well as those specific to particular types of code, such as R-forge for R, and CPAN for Perl. For more about best practices in scientific software development, see Baxter SM, Day SW, Fetrow JS, Reisinger SJ (2006) Scientific Software Development Is Not an Oxymoron. PLoS Comput Biol 2(9): e87. doi:10.1371/journal.pcbi.0020087


[9] Many thanks to H. Lapp for starting this post. I (T. Vision) take responsibility for the opinions expressed here, as well as any sins of omission or commission.

A new creature in the biodiversity world: the data paper

Dryad is happy to announce a new initiative with Pensoft Publishers, the pioneering publisher behind ZooKeys and other rapid-publication open access journals, including BioRisk, Comparative Cytogenetics, International Journal of Myriapodology, Journal of Hymenoptera Research, NeoBiota, PhytoKeys, and Subterranean Biology.  Dryad is working with Pensoft to support publication of data papers in the area of biodiversity, together with the Global Biodiversity Information Facility and the Barcode of Life.  Through this effort, we aim to make the data publishing experience as smooth and rewarding as possible for authors, while at the same time making sure these important data are vetted through peer review and available for reuse in public repositories.  The full press release from Pensoft is below.

Data publishing policies and guidelines for biodiversity data published by Pensoft

Pensoft Publishers announced a data publishing project for biodiversity data in response to the increasing demands from institutions and scientists to open scientific data to anyone who would be interested in using them.

“An opinion survey amongst the authors, readers and editors of the Pensoft journal ZooKeys carried out in April convinced us that the majority of participants (84%) are willing to publish their data, so that to make them available to anyone to use, share or integrate with other data” said Dr Lyubomir Penev, managing director of Pensoft Publishers. Among the most important incentives to publish data, the scientists mentioned that “open data increases transparency and the overall quality of science, the potential for collaborative research as well as an opportunity to increase academic credit in the form of citations. Therefore, providing a service to ensure a permanent publication record for published data is of key importance for the success of the project”, adds Dr Penev.

The core of the project is the concept of the “Data Paper”, developed in cooperation with the Global Biodiversity Information Facility (GBIF). Data Papers are peer-reviewed scholarly publications that describe the published datasets and provide an opportunity for data authors to receive academic credit for their efforts. Currently, Pensoft offers the opportunity to publish Data Papers describing biodiversity data, Barcode of Life genome data and biodiversity-related software tools, such as interactive keys and others.

Pensoft reached an agreement for cooperation in data hosting and the development of data publishing workflows with the GBIF, the Dryad Data Repository and the Consortium for the Barcode of Life.

“Data publishing becomes increasingly important and already affects the policies of the world’s leading science funding frameworks and organizations. Opening and integrating biodiversity data will be the future basis to increase efficiency of monitoring the processes of global change, conservation of nature and saving life on our planet” concluded Dr Vincent Smith, coordinator of the European Union FP7 project ViBRANT, in the framework of which a part of the work has been carried out.