What’s in a README? Why your README matters, and how to create the best one possible

README files are a simple but crucial component of data sharing. Without a README, the benefits of open data — improved understanding, reproducibility, trust, and attention — become moot. That’s because data cannot be interpreted, validated, or reused without certain key details that a README provides.

Data without context

Have you ever downloaded a dataset and wondered ‘what am I looking at here?’ What time period and locality does this data cover? What do the abbreviations stand for? Is there a difference between the .xlsx and the .rmd files, given that they have the exact same file name? And what R package — packages? — am I going to need to install to actually analyze this file?

You’re not alone. That’s why README files are so important. A README file is simply a text document that accompanies your dataset. It functions as a kind of quick start guide to your data, a short compendium of the essential information another researcher might need to understand, reanalyze, or repurpose your dataset.

Without that context, prospective readers could spend hours combing through the related manuscript to decode your data files, and searching online to work out what your acronyms refer to. Or, more likely, they’ll move on to another dataset, one that’s presented more clearly.

What’s in a README?

A good README should give researchers both in and outside of your discipline enough context to interpret, validate, or reuse your dataset.

Usually, that means including:

A short summary of the related research investigation
An overview of files and folder structure
A key to file and column labels, variables, data codes, and units of measurement
Details of processing and analysis, such as the software and packages used
Links to other data sources, if applicable

Of course, don’t hesitate to include anything else a reader might want or need to know in order to effectively work with your data.
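
As a rough illustration, the checklist above can be turned into a starter file. The Python sketch below is not an official Dryad template: the function name build_readme, the section wording, and the "TODO" placeholders are all assumptions made for this example. It takes a dataset title and a list of file paths, and returns a plain-text skeleton to fill in by hand.

```python
# Hypothetical README skeleton generator. Section names follow the checklist
# above; the prompts and layout are illustrative assumptions, not Dryad rules.
SECTIONS = [
    ("Summary", "Briefly describe the related research investigation."),
    ("Files and folder structure", "List each file and what it contains."),
    ("Variables, codes, and units",
     "Define column labels, abbreviations, data codes, and units of measurement."),
    ("Processing and analysis",
     "Name the software and packages needed to open or analyze the files."),
    ("Related data sources", "Link to other data sources, if applicable."),
]


def build_readme(title, files):
    """Return a plain-text README skeleton for the given title and file paths."""
    lines = [title, "=" * len(title), ""]
    for heading, prompt in SECTIONS:
        lines += [heading, "-" * len(heading), prompt]
        if heading == "Files and folder structure":
            # One placeholder entry per file, in alphabetical order.
            lines += [f"- {name}: TODO describe" for name in sorted(files)]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example file names are made up for demonstration.
    print(build_readme("Example dataset", ["data/observations.csv", "code/analysis.R"]))
```

The skeleton is only a starting point. The descriptions themselves, which are what make a README useful, still have to be written by someone who knows the data.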

Preparing your README

If you work with a large team you may already have some of the required information documented — maybe in an email or a lab notebook.

If not, don’t worry. Preparing your README is simply a matter of describing the dataset underlying your investigation. Clear your mind, and do your best to approach the project with fresh eyes, providing the necessary information for someone coming to the work with no prior knowledge.

Sometimes when you’re deep in an investigation, it can be hard to imagine what elements might be confusing to someone else. If you’re at a loss as to what to include, try imagining yourself returning to the work after a five-year hiatus. What information would you need to get back up to speed? That’s what belongs in your README.

Another good option is asking for a review from the newest member of your lab. Being newer to the work, they may have insight into what aspects are intuitive and which require further explanation. Even if your newest colleague has been around a little while, they may still remember initial questions and points of confusion.

When in doubt, err on the side of detail. An insufficiently detailed README is the most common reason Dryad curators request revisions. Read our tips for creating a README that checks all the boxes so you can publish your data without delay.

A well-prepared README is one of the simplest, most effective ways to ensure your data lives up to its full potential, not just for availability, but for understanding and usability too. A thoughtful approach will not only save others time and frustration, but maximize the impact of your study.

Looking for inspiration? Explore these example README files from published Dryad datasets:

For code: consult the README of Evolution of intraspecific floral variation in a generalist-specialist pollination system, by Marion Leménager.
What makes it great: Features notably detailed descriptions of code files and how to open and use them.
For .csv: consult the README of Effects of ecology on the sociality of coral dwelling gobies, genus Gobiodon by Catheline Froehlich et al.
What makes it great: Uses Markdown table formatting to present variables.
For .xlsx: consult the README of Data from: High foraging fidelity and plant-pollinator network dominance of non-native honeybees (Apis mellifera) in the Ecuadorian Andes by Erin Rankin et al.
What makes it great: Well-organized and consistent, formatted in a way that’s easy to understand and review.
For .mat: consult the README of Data from: Interactions between circuit architecture and plasticity in a closed-loop cerebellar system by Hannah Payne et al.
What makes it great: Logical file naming convention, clear explanation of variables and units.
For large datasets: consult the README of Data from: A synthetic biology and green bioprocess approach to recreate agarwood sesquiterpenoid mixtures by Sergio Gutiérrez et al.
What makes it great: Well-organized and clearly formatted explanation of complex folders and multiple formats.

Feedback and questions are always welcome at hello@datadryad.org.

To keep in touch with the latest updates from Dryad, follow us on LinkedIn, Mastodon, and Bluesky and subscribe to our quarterly newsletter.

Dryad news

The latest from the open data publishing platform & community committed to the open availability and routine re-use of all research data
