For authors: Creating a README for rapid data publication

The top reason for publication delays: Insufficient README files

The most common reason we can’t publish a Dryad data submission straight away is that the README file contains insufficient information (e.g., it does not describe all files or does not define abbreviations). When this happens, Dryad curators must request a revised version from the submitting author, which delays publication. The purpose of a README file is to clearly explain all data included within your submission. A highly detailed, well-organized README file is important because it helps ensure that anyone who is interested in reusing your data, as well as the Dryad curators who evaluate your submission, can easily understand it. It can also help to remind you of your own dataset organization in the future. Ultimately, this helps save time for everyone.

Here, we provide you with the information and resources you need to get your README right the first time so that you can publish on time. 

Writing README files for broad understanding and reusability

While the most likely users of your data are subject or methods specialists in your discipline, keep in mind that your data may be accessed and reused by researchers outside of your subject for purposes other than reproducing the results of your study. Future users of your data might be new to the field and unfamiliar with discipline-specific common terminology and metrics or your logic for file organization. In keeping with recommended practices for open and inclusive data, approach your README documentation with a broader audience in mind, including students, teachers, and data scientists. 

Scientific discovery can be achieved faster and more easily when researchers can focus their time on doing the science rather than puzzling over the meaning of empty cells or lab-specific shorthand terminology. Also, because datasets on Dryad are published as independent publications, it’s important that the data can be easily understood without relying on links or references to another source or published work (e.g., “see manuscript”), especially since associated outputs may not provide those details anyway.

For our team of experienced curators to evaluate data submissions easily, efficiently, and comprehensively, a detailed README file that clearly describes the full contents and nature of the submission is essential (and required for submission). Our curators carefully assess submissions to:

1) verify that data are accessible, organized, and comprehensively explained to ensure ease of understanding and readiness for re-use

2) confirm compliance with ethical standards for publication: human subjects data must be properly anonymized, and species data must be carefully compiled to prevent any risk or threat to vulnerable populations

3) ensure data follow FAIR principles

Curators also help authors navigate data publication requirements and raise questions when there are concerns about reuse, about where or how the data was collected, or about the interoperability of the data submitted for publication in Dryad.

Tips for crafting outstanding README files

Here are some important tips to keep in mind when creating a README file for your data.

Do

  • Use Markdown (.md) format
  • Use a template as a guide (Dryad template, Cornell template)
  • Keep it concise, yet informative
  • Use headings, line breaks, tables, and bullet points for readability; avoid long paragraphs
  • Include clear instructions on accessing and using the data
  • Define the hierarchy of nested contents (e.g., MATLAB arrays, Excel sheets, Prism dataframes, RData objects/lists)
  • Explain special formatting (e.g., highlighting in Excel) and scoring keys (or categorical data in general)
  • Define the variable list: give full names and definitions of column headings for tabular data (spell out abbreviated words), state units of measurement, and explain abbreviations in general and any empty cells
  • Describe any scripts, code, and notebooks, the software used to run them (e.g., R, Python, Mathematica, MATLAB), and the software versions, including packages, that you used to run those files
  • Provide links to publications that cite or use the data, other publicly accessible locations of the data and/or the related research article
  • List other sources, if any, that the data was derived from
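
Several of the tips above (describe every file, define every column heading) can be partly checked before you submit. The following is a minimal sketch in Python, not a Dryad tool: the function name `undocumented_items` and the assumption that your data files sit together in one folder are ours, for illustration only. It reports data files and CSV column headings that never appear in the README text.

```python
import csv
from pathlib import Path


def undocumented_items(readme_text, data_dir):
    """Return file names and CSV column headings that the README never mentions.

    Hypothetical helper for illustration; it uses plain substring matching,
    so a short heading such as "id" can be matched by an unrelated word.
    """
    missing = []
    for path in sorted(Path(data_dir).iterdir()):
        # Skip subfolders and the README itself.
        if not path.is_file() or path.name.lower().startswith("readme"):
            continue
        if path.name not in readme_text:
            missing.append(f"file: {path.name}")
        # For CSV files, also check each column heading in the first row.
        if path.suffix.lower() == ".csv":
            with path.open(newline="", encoding="utf-8") as handle:
                header = next(csv.reader(handle), [])
            for column in header:
                if column and column not in readme_text:
                    missing.append(f"column: {path.name} -> {column}")
    return missing
```

To use it, read your README into a string and pass it along with the folder holding your data, for example `undocumented_items(Path("README.md").read_text(encoding="utf-8"), ".")`. An empty list only means every name appears somewhere in the README; it does not show that the definitions, units, or explanations of empty cells are adequate, so a careful read-through is still needed.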

Do not

  • Assume that variables, abbreviations/shorthand, acronyms, units, scoring keys, etc. are always used in the same way or universally understood
  • Include author names or other identifying information (initials, email addresses, ORCIDs) if the journal follows a double-blind review process
  • Include the Abstract or Methods sections of your manuscript as a substitute for explaining your data
  • Include statements that are phrased in a way to suggest any legal imperative for attribution (e.g., “required,” “must”) or other conditions for reuse; instead encourage potential users to contact you or cite the data for additional information or potential for collaboration
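
If your journal uses double-blind review, a quick scan can catch some identifying details before submission. The sketch below is an illustration under our own assumptions, not a Dryad check: the function name `identifying_strings` is hypothetical, and the two patterns are deliberately simple. It finds email addresses and ORCID-style identifiers only; it cannot detect author names or initials.

```python
import re

# Simple, permissive pattern for email addresses.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# ORCID iDs are four groups of four characters; the final character may be X.
ORCID = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def identifying_strings(readme_text):
    """Return email addresses, then ORCID-style identifiers, found in the text."""
    return EMAIL.findall(readme_text) + ORCID.findall(readme_text)
```

An empty result does not mean the README is anonymized, so still read it through for names, initials, and institutional details.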

Resources for preparing README files

We recommend using Dryad’s README file template, which can help you understand exactly what information you need to provide.

There are lots of great example README files on the Dryad platform. Here are a few:

Watch our training session with Dryad Senior Curator Bryan Gee, which covers all of these tips and more.

Our curators are here to help. Contact help@datadryad.org with any questions, and review our full guidance on good data practices. Curious about other ways to avoid publication delays when sharing data with Dryad? Check out our top 5 reasons we can’t publish your data (yet).

Feedback and questions are always welcome at hello@datadryad.org.

To keep in touch with the latest updates from Dryad, follow us on Twitter, Facebook, or LinkedIn and subscribe to our quarterly newsletter.