Dryad is a general-purpose repository for data underlying scholarly publications. Each new submission we receive is reviewed by our curation team before the data are archived. Our main priority is to ensure compliance with Dryad’s Terms of Service, but we also strongly believe that curation activities add value to your data publication, since curated data are more likely to be FAIR (findable, accessible, interoperable, and reusable).


Before we register a DOI, a member of our curation team will check each data package to ensure that the data files can be opened, that they appear to contain information associated with a scientific publication, and that metadata for the associated publication are technically correct. We prefer common, non-proprietary file types and thorough documentation, and we may reach out if we are unable to view files as provided.

Our curators are also on the lookout for sensitive information such as personally identifiable human subjects data or protected location information, and for files that contain copyright and license statements that are incompatible with our required CC0 waiver.

To make the data archiving process more straightforward for authors, our curation team has authored sets of guidelines that may be consulted when preparing a data submission for a public repository such as Dryad. We hope these guidelines will help you as you prepare your Dryad data package, and that they will lessen the amount of time from point of submission to registered data DOI!

A series of blog posts will highlight each of the guidelines we’ve created. First up is our best practices for sharing human subjects data in an open access repository, from former Dryad curator Rebecca Kameny.

— Erin Clary, Senior Curator – curator@datadryad.org


Preparing human subject data for open access

Collecting, cleaning, managing, and analyzing your data is one thing, but what happens when you are ready to share your data with other researchers and the public?

Because our researchers come from fields that run the gamut of academia (from biology, ecology, and medicine to engineering, agriculture, and sociology), and because almost any field can make use of data from human subjects, we’ve provided guidance for preparing such data for open access. We based our recommendations and requirements on well-respected national and international sources from government institutions, universities, and peer-reviewed publications.

Dryad curators will review data files for compliance with these recommendations and may make suggestions to authors. However, authors who submit data to Dryad are ultimately responsible for ensuring that their data are properly anonymized and can be shared in a public repository.

In a nutshell, Dryad does not allow any direct identifiers, but we do allow up to three indirect identifiers. Sound simple? It’s not. If the study involves a vulnerable population (such as children or indigenous people), if the number of participants is small, or if the data are sensitive (e.g., HIV status, drug use), three indirect identifiers may be too many. We evaluate each submission on a case-by-case basis.

If you have qualitative data, you’ll want to pay close attention to open-ended text, and may need to replace names with pseudonyms or redact identifiable text.

Quick tips for preparing human subjects data for sharing

  • Ensure that there are no direct identifiers.
  • Remove any nonessential identifying details.
  • Reduce the precision of a variable – e.g., remove day and month from date of birth; use county instead of city; add or subtract a randomly chosen number.
  • Aggregate variables that are potentially revealing, such as age.
  • Restrict the upper or lower ranges of a continuous variable to hide outliers by collapsing them into a single code.
  • Combine variables by merging data from two variables into a summary variable.
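A few of the tips above can be sketched in code. The following is a minimal illustration only, not a Dryad tool: the field names (`name`, `email`, `date_of_birth`, `age`), the ten-year age bands, and the cap of 90 are all assumptions chosen for the example, and the right choices will depend on your study.

```python
def generalize_birth_date(iso_date):
    """Reduce precision: keep only the year from a YYYY-MM-DD date."""
    return iso_date[:4]


def age_band(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record, age_cap=90):
    """Return a cleaned copy of `record`; the input dict is not modified.

    Assumes hypothetical keys: name, email, date_of_birth, age.
    """
    # Drop direct identifiers entirely (hypothetical field names).
    direct_identifiers = {"name", "email"}
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}

    # Reduce the precision of date of birth to the year only.
    cleaned["birth_year"] = generalize_birth_date(cleaned.pop("date_of_birth"))

    # Aggregate age into bands, and collapse the upper range into
    # a single code so that outliers are hidden.
    age = cleaned.pop("age")
    cleaned["age_group"] = f"{age_cap}+" if age >= age_cap else age_band(age)
    return cleaned
```

Whatever transformations you apply, it helps to describe them in your README so that users know how the shared file differs from the original dataset.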

It’s also good research practice to provide clear documentation of your data in a README file. Your README should define your variables and allowable values, and can be used to alert users to any changes you made to the original dataset to protect participant identity.

Our guidelines expand upon the tips above, and link to some useful references that will provide further guidance to anyone who would like to share human subjects data safely.

Over the past 3 ½ years, Dryad has become an independent organization with a committed team and organizational capacity. During this time, integrations and partnerships have expanded and sustainability plans have grown. From these efforts, we increased the amount of curated and openly published data available to the public. With great pride and bittersweet feelings, I will be moving on to pursue a new opportunity on Feb 23.

Working with the staff has meant collaborating with a group of committed, mission-driven professionals. Leading this group to become a collegial and very high-functioning team has been my absolute pleasure. I have also been honored to be accepted as an equal in the field of open data advocates and crafters of scholarly communication workflows, and to be able to share my vision of Dryad as a critical service. The support, encouragement, and concern of the Dryad board of directors was always behind me, and I’ve been energized at what we’ve accomplished in support of curated, open, and FAIR data.

A search for a new Executive Director has begun. This person will have the opportunity to develop mission-critical business strategies and to offer an innovative vision for promoting data openness in the scientific community and securing Dryad’s place as a key facilitator of data sharing. With the Dryad board’s support, Elizabeth Hull, Dryad Operations Manager, is filling in during the interim.

I want to thank our incredibly supportive community of submitters, members, partners, and collaborators for their dedication to open data and to Dryad’s mission. This next phase for the organization is now beginning. We invite you to join us and grow Dryad!


Dryad is a curated, non-profit, general-purpose repository specifically for data underlying scientific and medical publications, mainly journal articles. As such, we place great importance on linking data packages to the articles with which they are associated, and we try our best to encourage authors and journals to link back to the Dryad data from the article, ideally in the form of a reference in the works cited section. (There’s still a long way to go in this latter effort; see this study from 2016 for evidence.)

Submission integration provides closer coordination between Dryad and journals throughout the publishing workflow, and simplifies the data submission process for authors. We’ve already implemented this free service with 120 journals. If you’re interested in integrating your journal, please contact us.

We’re excited to share a few recent updates that are helping to make our data-article linkages more efficient, discoverable, and re-usable by other publishers/systems.

The Automated Publication Updater

One of the greatest housekeeping challenges for our curation team lies in finding out when the articles associated with Dryad data packages become available online. Once they do, we want to add the article citation and DOI link to our record as quickly as possible, and to release any data embargoes placed “until the article appears.” Historically, we’ve achieved this through a laborious patchwork of web searches, journal alert emails, and notifications from authors or editors themselves.

But over the past year or so, we’ve built and refined a webapp that we call the APU (or Automated Publication Updater). This super-handy tool essentially compares data packages in the Dryad workflow with publication metadata available at Crossref. When a good match is found, it automatically updates article-related fields in the Dryad record, and then sends our curation team an email alert so that they can validate the match and finalize the record. The webapp can be easily run by curators as often as needed (usually a few times a week).

While the APU doesn’t find everything, it has dramatically improved both the efficiency with which we add article information and links to Dryad records and our curators’ happiness levels. Big win. (If you’re interested in the technical details, you can find them on our wiki.)
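To give a feel for what “a good match” might involve, here is a rough sketch of one way to compare a data package against candidate publication records. It is not the APU’s actual logic: the record fields (`journal`, `article_title`, `title`, `doi`), the title-similarity approach, and the 0.9 threshold are all assumptions made for illustration, and the candidates are plain dictionaries rather than live Crossref responses.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and replace punctuation with single spaces."""
    kept = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(kept.split())


def title_similarity(a, b):
    """Similarity ratio between two normalized titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_best_match(package, candidates, threshold=0.9):
    """Return (candidate, score) for the best match, or None.

    Only candidates from the same journal are considered, and the
    best title similarity must reach `threshold` (an assumed value).
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        if candidate["journal"].lower() != package["journal"].lower():
            continue
        score = title_similarity(package["article_title"], candidate["title"])
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None
```

In a workflow like the one described above, a result from `find_best_match` would be a suggestion for a curator to validate, not a final answer.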


Dryad is also pleased to be a contributor to Scholix, or Scholarly Link Exchange, an initiative of the Research Data Alliance (RDA) and the World Data System (WDS). Scholix is a high-level interoperability framework for exchanging information about the links between scholarly literature and data.

  • The problem: Many disconnected sources of scholarly output, with different practices including various persistent identifier (PID) systems, ways of referencing data, and timing of citing data.
  • The Scholix solution: A standard set of guidelines for exposing and consuming data-article links, using a system of hubs.

Here’s how it works:

  1. As a DataCite member repository, Dryad provides our data-publication links to DataCite, one of the Scholix Hubs. 
  2. Those links are made available via Scholix aggregators such as the DLI service.
  3. Publishers can then query the DLI to find datasets related to their journal articles, and generate/display a link back to Dryad, driving web traffic to us, increasing data re-use, and facilitating research discovery.

Crossref publishers, DataCite repositories/data centers, and institutional repositories can all participate — information on how is available on the Scholix website.

Programmatic data access by ISSN

Did you know that content in Dryad is available via a variety of APIs (Application Program Interfaces)? Details are available at the “Data Access” page on our wiki.

The newest addition to this list is the ability to access Dryad data packages via journal ISSN. So, for example, if you wanted access to all Dryad content associated with the journal Evolution Letters, you would format your query as follows:


If you’re a human instead of a machine, you might prefer to visit our “journal page” for Evolution Letters:



Dryad is committed to values of openness, collaboration, standardization, seamless integration, reduction of duplication and effort, and increased visibility of research products (okay, data especially). The above examples are just some of the ways we’re working in this direction.

If you’re part of an organization that shares these values, please contact us to find out how you can be part of Dryad.

Today we celebrate our Board of Directors, and introduce three new members whose expertise and wide-ranging skills will help advance Dryad’s mission to provide free and easy access to data.

Dryad’s 12-member BOD supports and promotes our mission to make the data underlying scientific publications discoverable, freely reusable, and citable. The Board is composed of diverse stakeholders, representing publishing, research, policy development, data networks, private funding, and scholarly organizations. BOD members are nominated by Dryad members and are elected or re-elected each year. They do not represent the organizations to which they belong; rather, they act as individuals in their involvement in the strategic planning and fiscal oversight of the company.

Who are the new members for 2017?

Adding to our esteemed Board of Directors this summer, we introduce our newest members:

Brian Hole (Class of 2020) will serve as treasurer of the Board. He is the CEO of Ubiquity Press, an open access publisher that focuses on alternative research outputs such as data, software, hardware, and bioresources. Previously, he managed the DryadUK project at the British Library, which focused on establishing a sustainable business model and publisher integrations, and also on building cost models for digital preservation. Brian brings a valued data-centric research background and detailed knowledge of open access publishing to Dryad this year.

Fiona Murphy (Class of 2020) will serve as secretary of the Board. She is an independent research data and publishing consultant for institutions, societies, and commercial publishing companies and an Associate Fellow at the University of Reading. Fiona has written and presented widely on data publishing, open data, and open science. She has been involved in several research projects including PREPARDE, Data2Paper, and the Scholarly Commons Working Group. As an active member and sometime Co-Chair for several Research Data Alliance Groups focusing on data publishing policies, workflows, and accreditation systems, Fiona has organized several data-related events and sessions at scientific meetings.

Carly Strasser (Class of 2020) is a Program Officer at the Gordon and Betty Moore Foundation and is especially interested in open science and scholarly communication. She works in the Data-Driven Discovery Initiative, which is focused on promoting both the researchers and the practices required for high impact data-driven research. Previously, Carly was a Research Data Specialist at the California Digital Library where she was involved in development and implementation of many of the University of California Curation Center’s services, and worked to promote data sharing and good data management practices. Carly’s prior experience as a researcher in marine science and mathematical ecology has informed her work of ushering in the new era of open, transparent, and collaborative science.

We wish to thank our current and past members for bringing their expertise and passion to help advance Dryad’s mission and we look forward to their contributions and to another exciting year of open data.

In 2011 Peggy Schaeffer penned an entry for this blog titled “Why does Dryad use CC0?” While 2011 seems like a long time ago, especially in our rapidly evolving digital world, the information in that piece is still as valid and relevant now as it was then. In fact, Dryad curators routinely direct authors to that blog entry to help them understand and resolve licensing issues. Since dealing with licensing matters can be confusing, it seems about time to revisit this briefly from a practical perspective.

Dryad uses Creative Commons Zero (CC0) to promote the reuse of data underlying scholarly literature. CC0 provides consistent, clear, and open terms of reuse for all data in our repository by allowing researchers, authors, and others to waive all copyright and related rights for a work and place the work in the public domain. Users know they can reuse any data available in Dryad with minimal impediments; authors gain the potential for more citations without having to spend time responding to requests from those wishing to use their data. In other words, CC0 helps eliminate the headaches associated with copyright and licensing issues for all stakeholders, leading to more data reuse.

So what does this mean in practical terms? Dryad’s curators have come up with a few suggestions to keep in mind as you prepare your data for submission. These tips can help you manage the CC0 requirements and avoid any problems:


  • Make sure any software included with your submission can be released under CC0. For example, licenses such as GPL or MIT are common and are not compatible with CC0. Be sure there are no licensing statements displayed in the software itself or in associated readme files.
  • Be aware that there are software applications out there that automatically place any output produced by the software under a non-CC0 compatible license. Consider this when you are deciding which software to use to prepare your data.
  • Know the terms of use for any information you get from a website or database.
  • Ensure that any images, videos, or other media that are not your own work can be released under CC0.
  • Be sure to clean up your data before submitting it, especially if you are compressing it using a tool such as zip or tar. Remove anything that can’t be released under CC0, along with any other extraneous materials, such as user manuals for hardware or software tools. Not only does removing extraneous files lessen the chance something will conflict with Dryad’s CC0 policy, it also makes your data more streamlined and easier to use.


  • Don’t add text anywhere in your data submission requiring permission or attribution for reuse. Community norms do a great job of putting in place the expectation that anyone reusing your data will provide the proper citations. CC0 actually encourages citation by keeping the process as simple as possible.
  • Don’t include your entire manuscript or parts of your manuscript in your data package. Most publications have licensing that restricts reuse and is not compatible with CC0.
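If your data package contains many text files, a quick automated scan can help you spot licensing statements before you submit. The sketch below is only an illustration: the patterns are a small, assumed sample and will miss many license texts, the function names are made up for this example, and it works on file contents you have already read into memory. A clean result does not guarantee that a package is CC0-compatible.

```python
import re

# A small, assumed sample of patterns; real license texts vary widely.
LICENSE_PATTERNS = {
    "GPL": re.compile(r"GNU General Public License|\bGPL", re.IGNORECASE),
    "MIT": re.compile(r"\bMIT License\b", re.IGNORECASE),
    "copyright notice": re.compile(r"copyright\s+(\(c\)|©|\d{4})", re.IGNORECASE),
    "all rights reserved": re.compile(r"all rights reserved", re.IGNORECASE),
}


def find_license_statements(text):
    """Return the sorted labels of all patterns found in `text`."""
    return sorted(
        label for label, pattern in LICENSE_PATTERNS.items()
        if pattern.search(text)
    )


def scan_files(files):
    """Map each flagged filename to the labels found in its text.

    `files` is a dict of filename -> file contents (as a string).
    Files with no matches are left out of the result.
    """
    flagged = {}
    for filename, text in files.items():
        labels = find_license_statements(text)
        if labels:
            flagged[filename] = labels
    return flagged
```

Anything the scan flags still needs a human decision: either remove the statement or file, or confirm that the material can be released under CC0.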

I hope this post leaves you with a little more understanding about why Dryad uses CC0 and with a few tips that will help make following Dryad’s CC0 requirement easier.


Keeping research data open and accessible has always been our goal at Dryad. Now, we’ve partnered with Data Archiving and Networked Services (DANS) to ensure long-term preservation of curated data. We are proud to be taking this step to safeguard open data and ensure future discoverability.

Public content on Dryad servers, currently over 15,000 data packages and 50,000 files, will soon be backed up in the DANS archive regularly (with multiple copies in different locations), to add an extra layer of protection.

DANS will also serve as Dryad’s successor archive, to ensure that functionality of Dryad Digital Object Identifiers (DOIs) is maintained for the long term. Metadata will be available in open access format to all researchers using the DANS online archiving system, EASY.

This partnership ensures that data in Dryad will remain accessible and linked to the scholarly literature in the unlikely case of disruption of Dryad services. DANS has proven to be a natural fit for us in this effort. Dryad and DANS share a deep commitment to the stewardship of global scientific data on behalf of more than 50,000 researchers who trust us with their data and hundreds of publishing partners working with Dryad.

Henk Harmsen, Deputy director of DANS, says:

Together with Dryad we are committed to making digital research data and related outputs Findable, Accessible, Interoperable, and Reusable (FAIR). This collaboration minimizes the risk of loss or corruption of data over time. We are pleased to extend our capacity and data archive by partnering with Dryad.

We present a guest post from researcher Falk Lüsebrink highlighting the benefits of data sharing. Falk is currently working on his PhD in the Department of Biomedical Magnetic Resonance at the Otto-von-Guericke University in Magdeburg, Germany. Here, he talks about his experience of sharing early MRI data and the unexpected impact that it is having on the research community.

Early release of data

The first time I faced a decision about publishing my own data was while writing a grant proposal. One of our proposed objectives was to acquire ultrahigh resolution brain images in vivo, making use of an innovative development: a combination of an MR scanner with ultrahigh field strength and a motion correction setup to remediate subject motion during data acquisition. While waiting for the funding decision, I simply could not resist acquiring a first dataset. We scanned a highly experienced subject for several hours, allowing us to acquire in vivo images of the brain with a resolution far beyond anything achieved thus far.

MRI data showing the cerebellum in vivo at (a) neuroscientific standard resolution of 1 mm, (b) our highest achieved resolution of 250 µm, and (c) state-of-the-art 500 µm resolution.

When our colleagues saw the initial results, they encouraged us to share the data as soon as possible. Through Scientific Data and Dryad, we were able to do just that. The combination of a peer-reviewed open access journal and an open access digital repository for the data was perfect for presenting our initial results.

17,000 downloads and more

‘Sharing the wealth’ seems to have been the right decision; in the three months since we published our data, there has been an enormous amount of activity:

A distinct need for data re-use

MRI studies are highly interdisciplinary, opening up numerous opportunities for sharing and re-using data. For example, our data might be used to build MR brain atlases and illustrate brain structures in much greater detail, or even for the first time. This could advance our understanding of brain functions. Algorithms used to quantify brain structures needed in the research of neurodegenerative disorders could be enhanced, increasing accuracy and reproducibility. Furthermore, by making available raw signals measured by the MR scanner, image reconstruction methods could be used to refine image quality or reduce the time it takes to collect the data.

There are also opportunities beyond those that our particular dataset offers. A recent emerging trend in MRI comes from the field of machine learning. Neural networks are being built to perform and potentially improve all kinds of tasks, from image reconstruction, to image processing, and even diagnostics. To train such networks, huge amounts of data are necessary; these data could come from repositories open to the public. Such re-use of MRI data by researchers in other disciplines is having a strong impact on the advancement of science. By publicly sharing our data, we are allowing others to pursue new and exciting directions.

Download the data for yourself and see what you can do with it. In the meantime, I am still eagerly awaiting the acceptance of the grant application . . . but that’s a different story.

The data: http://dx.doi.org/10.5061/dryad.38s74

The article: http://dx.doi.org/10.1038/sdata.2017.32

— Falk Lüsebrink