Guest post by Dryad Senior Data Curator Molly Hirst
A 2017 study revealed that over one-third of researchers surveyed admitted to some level of research misconduct, such as falsifying or omitting data. These alarming findings underscore the systemic challenges researchers face, including intense pressure to publish frequently, a lack of affordable infrastructure for preserving and sharing data, limited support for best practices, and the absence of standardized guidelines. This environment, often referred to as the “publish or perish” culture, can unintentionally foster misconduct and compromise research integrity.
Upholding scientific integrity is more crucial than ever, especially in an era of rapid technological advancement. The credibility, reproducibility, and reliability of research hinge on data availability and transparency. By adhering to FAIR (Findable, Accessible, Interoperable, and Reusable) principles, researchers can address these challenges head-on. Dryad offers a solution for those committed to these principles, providing an open-access platform designed to enhance transparency and foster trust in research.
Data availability and reusability
Open access to data not only enables verification but also fuels innovation and collaboration. At Dryad, our experienced team of data curators works with researchers to ensure their data and metadata meet rigorous standards and align with FAIR principles. By publishing datasets under a CC0 license waiver, Dryad guarantees unrestricted access to data for reuse by the global research community.
One powerful example of the value of Dryad’s open-access approach comes from the field of genomics. An initial dataset containing genomic SNP data of domesticated pigs from around the world was published via Dryad. Later, another research team reused the data to uncover patterns of deliberate introduction of invasive feral pigs across the United States and Canada. This reuse highlights not only the scientific impact of open data but also the trust and interconnectivity fostered by Dryad’s platform.
The original dataset: https://doi.org/10.5061/dryad.30tk6
The new dataset reusing the original: https://doi.org/10.5061/dryad.b2rbnzsq9
Transparency in research data
Transparency is a cornerstone of scientific integrity. By making it easier to verify and replicate studies, transparency not only prevents misconduct but also builds trust within the research community and with the public. Researchers can foster this trust by prioritizing key elements of transparency in their data practices:
Clearly documented methods and README
A transparent research process begins with a thorough description of the methods used to collect, analyze, and interpret data. Precise documentation ensures that others can replicate the study or build upon its findings. Additionally, defining each aspect of the data files—such as the file names, organization, and the units of measurement for each variable—makes it easier for users to understand, analyze, and reuse the data efficiently.
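One way to put the advice on defining every variable into practice is to check, before submission, that each column in a data file has a matching entry in the README's data dictionary. The sketch below is a hypothetical helper, not a Dryad tool: it assumes the data dictionary has been written down as a simple mapping from column name to description (including units), and it reports both undocumented columns and dictionary entries that no longer match any column.

```python
import csv
import io


def check_dictionary(csv_text: str, data_dictionary: dict) -> tuple:
    """Compare a CSV header against a data dictionary (column name -> description).

    Returns (missing, unused): columns with no description, and described
    names that do not appear in the file.
    """
    # Read only the header row; an empty file yields an empty header.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = [name.strip() for name in header]
    missing = [name for name in columns if name not in data_dictionary]
    unused = [name for name in data_dictionary if name not in columns]
    return missing, unused


# Hypothetical example file and data dictionary.
sample_csv = "site_id,body_mass_g,date\nA1,512.3,2023-05-01\n"
dictionary = {
    "site_id": "Field site identifier (see site map in README)",
    "body_mass_g": "Body mass in grams",
    "temp_c": "Air temperature in degrees Celsius",
}
print(check_dictionary(sample_csv, dictionary))  # (['date'], ['temp_c'])
```

Running a check like this on every file in a dataset is a quick way to catch a renamed column or a variable whose units were never written down.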
Comprehensive data reporting
Transparency requires sharing data when possible and providing a complete and accurate account of all elements within those data. Researchers should ensure that all data points are reported and that any processes involving data manipulation or cleaning are meticulously documented. Dryad facilitates this by encouraging researchers to include detailed README files, which clarify the context and handling of datasets.
Disclosure of potential conflicts of interest
Researchers should fully disclose any conflicts of interest, including funding sources, affiliations, or other factors that might influence their data or findings. This openness is critical for maintaining credibility and objectivity.
Balancing human and AI-generated data
As artificial intelligence becomes increasingly integrated into research, the principles of transparency remain vital. The rise of AI-generated data introduces new challenges, such as ensuring the clarity of data origin, the documentation of algorithms, and ethical considerations around data use.
A recent Nature article highlights the enduring importance of transparency: “The principles and guidelines regulating the rise of AI will need to remain grounded in the basics of good science – how data are collected, treated, and used” (Hanson et al., 2023). Researchers can navigate these challenges by adhering to robust data practices and leveraging platforms like Dryad, which support the FAIR principles for both human- and AI-generated datasets.
By committing to transparency, researchers not only uphold the integrity of their work but also contribute to a culture of openness and reproducibility, ensuring that science continues to be a reliable and trusted source of knowledge.
Implementation strategies for transparent research data
- Pre-register studies to specify hypotheses, data collection methods, and analysis plans in advance. This helps to prevent data dredging or p-hacking by committing to a predefined research plan.
- Adopt transparent peer review processes to ensure accountability. For example, Dryad’s “private for peer review” (PPR) feature enables the sharing of data alongside manuscripts under peer review in scientific journals. This fosters an open dialogue between authors, reviewers, editors, and readers, ensuring that the data is scrutinized thoroughly and transparently.
By focusing on these key elements and implementation strategies, researchers can significantly enhance the transparency of their research data through Dryad, contributing to the overall integrity and reliability of scientific research.
FAIR principles at Dryad
The FAIR principles (Findable, Accessible, Interoperable, Reusable) serve as a framework for managing and sharing research data effectively. At Dryad, we prioritize these principles to ensure that datasets are not only open-access but also optimized for discovery, integration, and reuse.
Findability: Data should be easily located by both humans and machines. This requires robust metadata, descriptive titles, and carefully chosen keywords. Well-annotated datasets increase visibility and help researchers quickly find relevant data for their work. Authors can connect Dryad datasets with related research objects like published articles, data management and sharing plans, software code, or supplementary information, allowing users to easily access all components of a research project through a single, citable link, usually a Digital Object Identifier (DOI). At Dryad, we provide DOIs that are widely indexed in Google Scholar, Google Dataset Search, Scopus, Web of Science, and more, in addition to being fully discoverable on the web.
Accessibility: Clear access procedures and comprehensive metadata are essential for making data accessible and reusable. Dryad encourages researchers to provide detailed README files and methods, document software versions, and use open-source tools to ensure datasets remain accessible long-term. All Dryad datasets are downloadable via datadryad.org and our open API.
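For readers who want to retrieve dataset metadata programmatically, here is a minimal sketch. It assumes the Dryad API v2 pattern in which a dataset is addressed by its DOI, prefixed with `doi:` and URL-encoded as a single path segment under `https://datadryad.org/api/v2/datasets/`; please confirm the base URL and endpoints against Dryad's current API documentation before relying on them. The helper names are hypothetical, and only the URL-building function is exercised below, since fetching requires network access.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the Dryad API v2; check Dryad's API docs.
DRYAD_API_BASE = "https://datadryad.org/api/v2"


def dataset_metadata_url(doi: str) -> str:
    """Build the (assumed) Dryad API v2 metadata URL for a dataset DOI."""
    # Accept a bare DOI, a "doi:" DOI, or a https://doi.org/ link.
    for prefix in ("https://doi.org/", "doi:"):
        doi = doi.removeprefix(prefix)
    # Encode "doi:<DOI>" as one path segment (":" -> %3A, "/" -> %2F).
    encoded = urllib.parse.quote(f"doi:{doi}", safe="")
    return f"{DRYAD_API_BASE}/datasets/{encoded}"


def fetch_metadata(doi: str) -> dict:
    """Download and parse the JSON metadata for a dataset (network required)."""
    with urllib.request.urlopen(dataset_metadata_url(doi), timeout=30) as resp:
        return json.load(resp)


print(dataset_metadata_url("https://doi.org/10.5061/dryad.30tk6"))
# https://datadryad.org/api/v2/datasets/doi%3A10.5061%2Fdryad.30tk6
```

Normalizing the DOI first means the same helper works whether a user copies the citable link or the bare identifier from a dataset landing page. The code requires Python 3.9 or later for `str.removeprefix`.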
Interoperability: Standardized, open formats and controlled vocabularies are critical for data integration across systems. Dryad supports the inclusion of data dictionaries and standardized metadata via our submission platform, enabling seamless collaboration and reuse of datasets in diverse research contexts. Dryad adheres to a community-accepted metadata standard (the DataCite Metadata Schema) and uses persistent identifiers.
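To make the idea of standardized metadata concrete, the sketch below checks a record for the six properties that the DataCite Metadata Schema (version 4.x) marks as mandatory: Identifier, Creator, Title, Publisher, PublicationYear, and ResourceType. The record is a simplified, hypothetical Python dictionary whose keys loosely follow those property names; it is not the official DataCite XML or JSON serialization, and it is not how Dryad validates submissions.

```python
# The six mandatory properties of the DataCite Metadata Schema (v4.x),
# written as keys of a simplified, hypothetical record format.
MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory properties that are absent or empty in the record."""
    return [key for key in MANDATORY if not record.get(key)]


# Example built from the first dataset in the reference list; resourceType is omitted.
record = {
    "identifier": "10.5061/dryad.30tk6",
    "creators": ["Yang, Bin"],
    "titles": ["Data from: Genome-wide SNP data unveils the globalization of domesticated pigs"],
    "publisher": "Dryad",
    "publicationYear": 2018,
}
print(missing_mandatory(record))  # ['resourceType']
```

Because every DataCite-registered dataset carries at least these fields, indexing services can harvest and compare records from different repositories without custom handling for each one.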
Reusability: Dryad ensures that datasets are published under a CC0 license waiver, maximizing their potential for reuse without restrictions. Rich metadata helps future researchers understand the data, fostering reproducibility and facilitating discoveries. The README file is one of the most important files in each dataset for Dryad curators and future data users. Our team of experienced data curators meticulously checks each README file to verify that data are accessible, organized, and comprehensively explained, so that they are easy to understand and ready for reuse.
Conclusion
Research integrity is the foundation of credible and impactful science. Upholding it requires a commitment to transparency, ethical practices, and the adoption of robust frameworks like the FAIR principles. By adhering to FAIR principles, Dryad empowers researchers to contribute data that is not only open-access but also primed for meaningful impact. By sharing data through platforms like Dryad, researchers can ensure their data is accessible, reusable, and trustworthy, fostering a culture of openness that drives innovation and strengthens the scientific community.
References
- Thiese MS, Walker S, Lindsey J. (2017) Truths, lies, and statistics. J Thorac Dis. 9(10): 4117-4124. DOI: https://doi.org/10.21037/jtd.2017.09.24
- Yang, Bin et al. (2018). Data from: Genome-wide SNP data unveils the globalization of domesticated pigs [Dataset]. Dryad. https://doi.org/10.5061/dryad.30tk6
- Giglio, Rachael (2024). Feral Swine Genotypes and Metadata Used for Identifying Translocations in the United States [Dataset]. Dryad. https://doi.org/10.5061/dryad.b2rbnzsq9
- GO FAIR. “FAIR Principles.” GO FAIR, https://www.go-fair.org/fair-principles/
- Hanson B, et al. (2023). Garbage in, garbage out: mitigating risks and maximizing benefits of AI in research. Nature. 623: 28-31. DOI: https://doi.org/10.1038/d41586-023-03316-8
About the author
Molly Hirst holds a Ph.D. in Ecology and Evolutionary Biology from the University of Michigan, where she studied genomics and sperm biology in hybridizing Platyrrhine primates. She gained extensive experience and a passion for curation as a graduate student curatorial assistant for nearly all UM Museum of Zoology and Herbarium divisions. Molly can be found cuddling her cats, reading, traveling the world as a naturalist, gardening, ice skating, and spoiling her nephew.
Funding
This work was, in part, funded by the U.S. National Institutes of Health, Office of Data Science Strategy and the Generalist Repository Ecosystem Initiative (GREI) OTA-21-00 [3OT2DB000005-01S3]. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH.
Feedback and questions are always welcome at hello@datadryad.org.
To keep in touch with the latest updates from Dryad, follow us on LinkedIn, Mastodon, and Bluesky and subscribe to our quarterly newsletter.