For authors: Creating a README for rapid data publication

The top reason for publication delays: Insufficient README files

The most common reason we can’t publish a Dryad data submission straight away is that the README file contains insufficient information (e.g., it does not describe all files or does not define abbreviations). When this happens, Dryad curators must request a revised version from the submitting author, introducing a delay in publication. The purpose of a README file is to clearly explain all data included within your submission. A highly detailed, well-organized README file is important because it helps ensure that anyone who is interested in reusing your data, as well as the Dryad curators who evaluate your submission, can easily understand it. It can also help to remind you of your own dataset organization in the future. Ultimately, this helps save time for everyone.

Here, we provide you with the information and resources you need to get your README right the first time so that you can publish on time.

Writing README files for broad understanding and reusability

While the most likely users of your data are subject or methods specialists in your discipline, keep in mind that your data may be accessed and reused by researchers outside of your subject for purposes other than reproducing the results of your study. Future users of your data might be new to the field and unfamiliar with discipline-specific common terminology and metrics or your logic for file organization. In keeping with recommended practices for open and inclusive data, approach your README documentation with a broader audience in mind, including students, teachers, and data scientists.

Scientific discovery can be achieved faster and more easily when researchers can focus their time on doing the science, not puzzling over the meaning of empty cells or lab-specific shorthand terminology. Also, because datasets on Dryad are published as independent publications, it’s important that the data can be easily understood without relying on links or references to another source or published work (e.g., “see manuscript”), especially since associated outputs may not provide those specifics anyway.

For our team of experienced curators to evaluate data submissions easily, efficiently, and comprehensively, a detailed README file that clearly describes the full contents and nature of the submission is essential (and required for submission). Our curators carefully assess submissions to:

1) verify that data are accessible, organized, and comprehensively explained to ensure ease of understanding and readiness for reuse

2) confirm compliance with ethical standards for publication: human subjects data must be properly anonymized, and species data must be carefully compiled to prevent any risk or threat to vulnerable populations

3) ensure data follow FAIR principles

Curators also help authors navigate data publication requirements and raise questions when there are concerns about reuse, about where or how the data were collected, or about the interoperability of the data submitted for publication in Dryad.

Tips for crafting outstanding README files

Here are some important tips to keep in mind when creating a README file for your data.

Do

Use Markdown (.md) format
Use a template as a guide (Dryad template, Cornell template)
Keep it concise, yet informative
Use headings, line breaks, tables, and bullet points for readability; avoid long paragraphs
Include clear instructions on accessing and using the data
Define the hierarchy of nested contents (e.g., MATLAB arrays, Excel sheets, Prism dataframes, RData objects/lists)
Explain special formatting (e.g., highlighting in Excel) and scoring keys (categorical data in general)
Define the variable list: give the full name and definition of each column heading in tabular data (spell out abbreviated words), state units of measurement, and explain abbreviations (in general) and any empty cells
Describe any scripts, code, and notebooks, the software used to run them (e.g., R, Python, Mathematica, MATLAB), and the software versions, including packages, that you used to run those files
Provide links to publications that cite or use the data, other publicly accessible locations of the data and/or the related research article
List other sources, if any, that the data was derived from
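
As one way to get started on the variable list described above, the following Python sketch reads the column headings of a CSV file and prints a Markdown table skeleton with one row per column, plus a count of empty cells that the README should explain. It is only an illustration under stated assumptions: the function name `variable_table`, the table layout, and the example data are hypothetical and are not part of the Dryad template. The TODO entries still need to be filled in by hand.

```python
import csv
import io


def variable_table(csv_text: str) -> str:
    """Return a Markdown table skeleton with one row per column heading."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [
        "| Column | Full name and definition | Units | Empty cells |",
        "| --- | --- | --- | --- |",
    ]
    for index, name in enumerate(header):
        # Count cells that are missing or blank so the README can explain them.
        empty = sum(
            1 for row in data if index >= len(row) or not row[index].strip()
        )
        lines.append(f"| {name} | TODO | TODO | {empty} |")
    return "\n".join(lines)


# Hypothetical example data; the column names are invented for illustration.
example = "site_id,temp_c,spp\nA1,21.5,Quercus alba\nA2,,Acer rubrum\n"
print(variable_table(example))
```

In this example the `temp_c` row reports one empty cell, which is a prompt to state in the README what an empty cell means (for instance, not measured versus not applicable).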

Do not

Assume that variables, abbreviations/shorthand, acronyms, units, scoring keys, etc. are always used in the same way or universally understood
Include author names or other identifying information (initials, email addresses, ORCIDs) if the journal follows a double-blind review process
Include the Abstract or Methods sections of your manuscript as a substitute for explaining your data
Include statements that are phrased in a way that suggests any legal imperative for attribution (e.g., “required,” “must”) or other conditions for reuse; instead, encourage potential users to contact you or cite the data for additional information or potential for collaboration
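
One of the gaps mentioned at the top of this post, a README that does not describe all files, can be caught before submission with a quick check. The Python sketch below lists every file in a dataset folder whose name never appears in the README text. It is a rough aid rather than a Dryad tool: the function name `undescribed_files`, the default `README.md` file name, and the demonstration files are assumptions made for illustration, and a file name appearing in the README does not by itself mean the file is adequately described.

```python
from pathlib import Path
import tempfile


def undescribed_files(dataset_dir: Path, readme_name: str = "README.md") -> list[str]:
    """List files whose names never appear in the README text."""
    readme_text = (dataset_dir / readme_name).read_text(encoding="utf-8")
    missing = []
    for path in sorted(dataset_dir.rglob("*")):
        # Skip directories and the README itself; match on the bare file name.
        if path.is_file() and path.name != readme_name and path.name not in readme_text:
            missing.append(path.relative_to(dataset_dir).as_posix())
    return missing


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("## Files\n- counts.csv: raw counts\n", encoding="utf-8")
    (root / "counts.csv").write_text("site,count\nA,3\n", encoding="utf-8")
    (root / "analysis.R").write_text("# model fitting\n", encoding="utf-8")
    print(undescribed_files(root))
```

Here the script flags `analysis.R`, because the README mentions only `counts.csv`.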

Resources for preparing README files

We recommend using Dryad’s README file template, which can help you understand exactly what information you need to provide.

There are lots of great example README files on the Dryad platform. Here are a few:

Induction of Sis1 promotes fitness but not feedback in the heat shock response
https://doi.org/10.5061/dryad.b2rbnzsm6
Tropical plant–hummingbird interactions withstand short-term experimental removal of a common flowering plant
https://doi.org/10.5061/dryad.jwstqjqbh (tabular data)
New indicators of ecological resilience and invasion resistance to support prioritization and management in the sagebrush biome, United States
https://doi.org/10.5061/dryad.h18931zpb (mixed data and code)
Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal
https://doi.org/10.5061/dryad.dz08kps2x (nested directories)

Watch our training session with Dryad Senior Curator Bryan Gee, which covers all of these tips and more.

Our curators are here to help. Contact help@datadryad.org with any questions, and review our full guidance on good data practices. Curious about other ways to avoid publication delays when sharing data with Dryad? Check out our top 5 reasons we can’t publish your data (yet).

Funding
This work was, in part, funded by the U.S. National Institutes of Health, Office of Data Science Strategy and the Generalist Repository Ecosystem Initiative (GREI) OTA-21-00 [3OT2DB000005-01S3]. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH.

Feedback and questions are always welcome at hello@datadryad.org.

To keep in touch with the latest updates from Dryad, follow us on LinkedIn, Mastodon, and Bluesky and subscribe to our quarterly newsletter.
