U.S. Policy: Dryad’s role in the NIH’s new Policy for Data Management and Sharing

Building on its longstanding commitment to data sharing, the National Institutes of Health (NIH) has introduced an updated Policy for Data Management and Sharing, with the goal of expediting “the translation of research results into knowledge, products, and procedures to improve human health” and enhancing public trust in publicly funded research. Because NIH is the world’s largest public funder of biomedical research, its policies have an outsize impact on research practices.

Under the policy, from January 25, 2023, all projects seeking NIH funding will be required to submit a data management and sharing plan at the application stage and to follow it over the course of the project lifecycle. The policy sets a clear expectation for researchers “to maximize the appropriate sharing of scientific data.” Let’s unpack a few features of “appropriate” data sharing according to the policy, and take a look at how Dryad helps researchers readily comply with the requirements.

  1. Use of established repositories. The policy “strongly encourages the use of established repositories” for data sharing. Data hosted on personal or institutional devices or cloud storage accounts is vulnerable to loss or alteration, and may lack the metadata and contextual material needed for its interpretation and reuse.

    Dryad has over a decade of experience providing high-quality data publishing, curation, and preservation. All data published with Dryad is open by default, preserved in the CoreTrustSeal-certified Merritt repository, and curated by our expert team to ensure it has appropriate metadata and context for discovery and reuse. As an independent and multidisciplinary repository, Dryad welcomes all researchers, regardless of institutional affiliation, research area, or funding source.

  2. Timely publication. The policy asks researchers to make data accessible “as soon as possible, and no later than the time of an associated publication, or the end of performance period” and for the length of time they anticipate it being useful for the larger research community and/or the broader public.

    Dryad’s team of expert curators works to minimize the delay from submission to publication and can work with authors to time data to go live at the same time as an associated publication. All data published with Dryad are retained indefinitely, mirrored in multiple locations, and routinely curated to ensure bit-level integrity over time. 

  3. Data quality assurance. Data sharing is more than a box to check: it’s a practice intended to enable and promote reuse by other researchers. The policy therefore asserts that, to be compliant, “data should be of sufficient quality to validate and replicate research findings.”

    Dryad’s hands-on curation process ensures that data published with us meet this standard. Our team of expert curators reviews each submission with an eye towards metadata quality, usability of files and code, and identification of any sensitive data. Where needed, they correspond with authors to resolve issues and enhance metadata quality. Dryad provides DOIs; permits dataset versioning; and links the dataset with associated research outputs and any software or code needed for replication. Each of these features contributes to a high-quality data publication that is meant to be shared, reused, and built upon.

Dryad’s focus is always on how to make high-quality data publishing as easy as possible for the researcher. By building best practices into our infrastructure and workflows, we ensure that researchers, and the institutions that support them, can trust us to steward their data in compliance with current and future funder policies. 

Dryad is grateful to have had the opportunity to offer feedback on the design of this policy and to be helping other generalist repositories establish common approaches to supporting NIH-funded researchers as part of the NIH Generalist Repository Ecosystem Initiative (GREI).

A new tool for fighting the file drawer effect

Until recently, Mark Hahnel was a PhD student in stem cell biology. Frustrated by seeing how much of his own research output didn’t make it to publications, he endeavored to do something about it by developing a scientific file sharing platform called FigShare. Recently, Mark and FigShare were taken under the wing of Digital Science, a Nature Publishing Group spinoff, and a sleek new FigShare was relaunched in January 2012 with many more features and an ambitious scope.

FigShare allows researchers to publish all of their research outputs in seconds in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. By opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient.

Users do not have to pay for access to the content: public data is made available under the terms of a CC0 waiver and other content under CC-BY.  And FigShare is currently providing unlimited public space and 1GB of private storage space for free.

This is a promising solution for getting negative and otherwise unpublished results out into the world (figures, tables, data, etc.) in a way that is discoverable and citable.  Importantly, much of this content would not be appropriate for Dryad, since it is not associated with (and not documented by) an authoritative publication.

There are clearly some challenges to the FigShare model.  A big one, shared with many other Open Science experiments that disseminate prior to peer review, is ensuring that there is adequate documentation for users to assess fitness for reuse.  Another challenge that Dryad is greatly concerned about is guaranteeing that the content will still be usable, and that there will be the means to host it, ten or twenty years down the road.  These are reflections of larger unanswered questions about how the research community can best take advantage of the web for scholarly communication, and how to optimize filtering, curating or preserving such communications. To answer these questions, the world of open data needs many more innovative projects like FigShare.

Considering FigShare’s relaunch suggests a few strengths of the Dryad model:

  • Dryad works with journals to integrate article and data submission, streamlining the deposit process.
  • Dryad curators review files for technical problems before they are released, and ensure that their metadata enables optimal retrieval.
  • Dryad’s scope is focused on data files associated with published articles in the biosciences (plus software scripts and other files important to the article).
  • Dryad can make data securely available during peer review, at the request of the journal.
  • Dryad is community-led, with priorities and policies shaped by the members of the Dryad Consortium, including scientific societies, publishers, and other stakeholder organizations.
  • Dryad can be accessed programmatically through a sitemap or OAI-PMH interface.
  • Dryad content is searchable and replicated through the DataONE network, and it handshakes with other repositories to coordinate data submission.
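
To make the point about programmatic access concrete, here is a minimal sketch of how a client might page through an OAI-PMH interface. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 protocol itself; the base URL, the function names, and the sample response are illustrative assumptions, not Dryad's actual endpoint or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (helper name is hypothetical).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent with it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response, used only to show the expected shape of the XML.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    # Placeholder base URL; substitute the repository's real OAI-PMH endpoint.
    print(build_list_records_url("https://repository.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would fetch the first URL, parse the response, and keep requesting with the returned token until none comes back.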

For more about Dryad, browse the repository or see Why Should I Choose Dryad for My Data?

A file sharing platform and a data repository are different animals, to be sure; both have a place in a lively open data ecosystem. We wish success to the Digital Science team, and look forward to both working together, and challenging each other, to better meet the needs of the research community.  To see what other options are out there for different disciplines and types of data, DataCite provides an updated list of research data repositories.

Proteomics journals mandate data sharing

Several journals in the field of proteomics have decided to mandate data sharing at the time of publication. These journals are leading the way toward data sharing out of a conviction that “the provenance of data sets and their proper citation is central to the research process,” as described in a recent Bio-IT World commentary, “Share the Data: Making Large-Scale Proteomics Data Widely Available.”

Mass Spectrometer, photo from U.S. Department of Energy Genome Programs

Now “authors who publish a manuscript containing mass spectrometry data in Molecular and Cellular Proteomics (MCP) must submit the raw data to a publicly accessible site.”   The journal Proteomics also requires data deposit in a public archive.

There are several specialized data repositories in the field, and several are working together as ProteomeXchange “to provide a single point of submission to proteomics repositories, and encourage the data exchange and sharing of identifiers between the repositories so that the community may easily find datasets in the participating repositories.”

Nature journals include Dryad among recommended repositories

Nature journals now list Dryad among their suggested data repositories. Citing the principle that “others should be able to replicate and build upon the authors’ published claims,” the editorial policies mandate data sharing and archiving.

The policy on data sets reads:

A condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to others without preconditions.

Data sets must be made freely available to readers from the date of publication, and must be provided to editors and peer-reviewers at submission, for the purposes of evaluating the manuscript.

For the following types of data set, submission to a community-endorsed, public repository is mandatory. Accession numbers must be provided in the paper. Examples of appropriate public repositories are listed below.

Logo links journal article to data in repository

PANGAEA (Publishing Network for Geoscientific & Environmental Data) is a repository for geoscience data with many features similar to Dryad, including use of DOIs for data files.  A recent press release reports that Elsevier and PANGAEA have implemented reciprocal linking between data in the repository and journal articles.   Research data sets deposited at PANGAEA are now automatically linked to the corresponding articles in Elsevier journals on its electronic platform ScienceDirect and vice versa.   The data are freely available from the publication’s page in ScienceDirect, without a login or subscription.

Try it out:

  1. From this PANGAEA record, follow the DOI to the article in ScienceDirect (citations and abstracts only, unless you or your institution have subscription access)
  2. The PANGAEA link is to the right of the article with Supplementary Data beside it

This valuable two-way connectivity between data and article is most easily achieved when the data are captured at the time of article submission.  See this previous post for more on Dryad’s approach to this problem, which is designed to work across multiple publishers.

Just as the PANGAEA logo appears in the online version of the article, we are toying with the idea of calling attention to the link in the opposite direction by placing journal cover images next to article DOIs in the Dryad display.  We’d like to hear your thoughts on that.  Is it helpful signage?  Or distracting eye candy?