For authors: Keep your data current with Dryad’s data versioning feature

Guest post by Dryad curator Molly Hirst

Have you ever published a manuscript or dataset and later realized something should have been changed or implemented better? We have all been there. While we can’t help you edit your manuscript, making changes to your published datasets on Dryad is easy and encouraged.

Each time you change your published dataset, a new version is created. Dataset versions allow users of the data to see the history of changes made to the dataset, strengthening research integrity and data management practices.

When to version a dataset

After your dataset has been curated and published on Dryad, you can make changes to the published dataset. Changes that result in a new dataset version might include but are not limited to: 

  • corrections to your data, 
  • the inclusion of additional data, 
  • or the addition of output from new analyses.

Minor changes to the metadata, such as adding the DOI of an associated manuscript, will not create a new version of the dataset. Whether you’re updating your dataset or the metadata, the Dryad DOI will remain the same.

Maintaining versions of datasets and properly documenting the historical differences between versions ensures the integrity of data hosted on Dryad. As such, it’s important for authors to also update their README to detail any changes they made while creating a new version of their dataset. Updating your README to include this information is vital for transparency and reproducibility and will ensure that your data can be understood by a diverse audience well into the future.

Why versioning is important

Versioning is crucial for proper data management. First, it ensures reproducibility and transparency in research by documenting all changes to the data, thus maintaining trust in the scientific process. This comprehensive documentation allows other researchers to accurately understand and replicate studies. Second, versioning greatly facilitates compliance with funders and publishers. Many funding bodies and publishers mandate that data be regularly updated and made accessible, and versioning helps meet these requirements by enabling efficient data sharing and preservation. Finally, versioning enhances the discoverability and citability of datasets. Because each dataset is assigned a unique, permanent DOI regardless of its version, both current and previous versions remain accessible and are linked to a single DOI. This ensures that data remains discoverable and usable over time, making it easier for researchers to locate and cite the datasets they need.

How to version a dataset in Dryad

To create a new version of your dataset at Dryad, follow these simple steps:

  1. Log in to your Dryad dashboard.
  2. On the “My datasets” page, find the dataset you need to version and click “create new version”. 
  3. Make your changes, submit, and voilà! 

Our curation team will begin evaluating the versioned dataset as soon as possible, and pending any questions or requests from our team, the new version will be made publicly available for download through the “Download dataset” button. Previous versions can still be accessed in the “Data files” section, which is sorted by publication date. To ensure that users can easily identify differences between versions, it’s essential that the README file is updated to detail the modifications made since the previously published version.

Best practices for data versioning

In addition to following our best data practices, here are some important notes to keep in mind while versioning your dataset: 

1. Document the Changes

It is vital to keep detailed logs of the changes made between versions so that users can quickly understand what has changed and choose the version that best suits their needs. This is especially important if previous versions contain errors, such as miscalculations.

We recommend maintaining clear and descriptive change logs in your README file, with the date listed for each version change. Here’s an excellent example of a Dryad dataset with a version change log in the README: https://doi.org/10.5061/dryad.2z34tmptf
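As a rough illustration of what a dated change log entry might look like, here is a minimal Python sketch that formats an entry and appends it to README text. It assumes a Markdown README whose change log is the last section; the heading text, the function names `format_entry` and `add_entry`, and the sample file and column names are all hypothetical, not a Dryad requirement or tool.

```python
from datetime import date

# Assumed heading; use whatever heading your README already has.
CHANGELOG_HEADING = "## Change log"


def format_entry(version: int, changes: list[str], when: date) -> str:
    """Return one dated change-log entry as Markdown text."""
    lines = [f"### Version {version} ({when.isoformat()})"]
    lines += [f"- {change}" for change in changes]
    return "\n".join(lines)


def add_entry(readme_text: str, entry: str) -> str:
    """Append an entry to the end of the README.

    Creates the change-log section if it is missing. Assumes the
    change log is (or will be) the last section of the README.
    """
    text = readme_text.rstrip("\n")
    if CHANGELOG_HEADING not in text:
        text += f"\n\n{CHANGELOG_HEADING}"
    return f"{text}\n\n{entry}\n"


# Hypothetical README content and change description.
readme = "# Example dataset\n\nDescription of files and variables.\n"
entry = format_entry(
    2,
    ["Corrected a miscalculation in column `mass_g` of measurements.csv"],
    date(2024, 1, 15),
)
updated = add_entry(readme, entry)
print(updated)
```

Listing each version with its date and a short description of what changed gives readers enough context to decide which version they need.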

2. Use Standard Formats

If you’re adding new data files, it’s important to ensure that the file formats are standard, common, and non-proprietary (e.g., CSV or TIFF), and can be opened with free, open-source software. This practice helps ensure long-term accessibility and interoperability of the data hosted on Dryad.
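Before uploading new files, a quick check of file extensions can catch proprietary formats. The sketch below is only an illustration: the set of extensions is a small, non-exhaustive example chosen for this post, not an official Dryad list, and the function name `flag_nonstandard` and the sample filenames are hypothetical.

```python
from pathlib import PurePath

# Illustrative, non-exhaustive set of open formats (an assumption,
# not an official Dryad list). Adjust it to suit your field.
OPEN_EXTENSIONS = {".csv", ".tsv", ".txt", ".json", ".tif", ".tiff", ".png", ".md"}


def flag_nonstandard(filenames):
    """Return the filenames whose extension is not in OPEN_EXTENSIONS."""
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower() not in OPEN_EXTENSIONS
    ]


# Hypothetical file list for a new dataset version.
files = ["measurements.csv", "plate_01.TIFF", "analysis.xlsx", "README.md"]
flagged = flag_nonstandard(files)
print(flagged)  # prints ['analysis.xlsx']
```

A flagged file is a prompt to consider exporting to an open equivalent (for example, a spreadsheet to CSV), not a hard rule.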

3. Community Engagement

By encouraging feedback from the research community, you can help to improve your data quality and relevance. For example, you can ask a colleague or peer who may not be familiar with your data to take a look at your published dataset, and see if it makes sense to them!

Conclusion

Researchers are continually refining, updating, and building upon their work; this is the goal of research at its core. Versioning datasets, and including an adequate change log, is a way to capture this continual learning process and allow for better collaboration throughout the research community. Taking the time to update your data and its associated metadata enhances the usability, reliability, and impact of your research. Dryad’s dataset versioning feature is easy to use and enables you to adopt best practices for reproducibility and transparency in your research.

Have questions? 

We’re here to help! Get in touch with our Helpdesk.

About the author
Molly Hirst holds a Ph.D. in Ecology and Evolutionary Biology from the University of Michigan, where she studied genomics and sperm biology in hybridizing Platyrrhine primates. She gained extensive experience and a passion for curation as a graduate student curatorial assistant for nearly all UM Museum of Zoology and Herbarium divisions. Molly can be found cuddling her cats, reading, traveling the world as a naturalist, gardening, ice skating, and spoiling her nephew.

Funding
This work was, in part, funded by the U.S. National Institutes of Health, Office of Data Science Strategy and the Generalist Repository Ecosystem Initiative (GREI) OTA-21-00 [3OT2DB000005-01S3]. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH.

Feedback and questions are always welcome at hello@datadryad.org.
