Are you curious about what’s involved in depositing data in Dryad? Looking for a quick way to show colleagues how straightforward data archiving can be? Dryad’s new 2-minute video demonstrates the data deposit process from start to finish.
Archive for the ‘Data availability’ Category
It is a good idea to know how to deposit your files in a data repository, and to be ready to do so, because this month marks the implementation of the Joint Data Archiving Policy. The policy, endorsed by a consortium of prominent journals and societies, states that journals will require, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive.
The policy can be customized by each journal, and it allows both embargoes and editorial discretion to grant special exceptions. Blanket exemptions apply to sensitive data such as identifiable human records and endangered species localities.
The journals (and corresponding societies) implementing the policy this month are:
- The American Naturalist (American Society of Naturalists)
- Evolution (Society for the Study of Evolution)
- Evolutionary Applications
- Heredity (The Genetics Society)
- Journal of Evolutionary Biology (European Society for Evolutionary Biology)
- Molecular Biology and Evolution (Society for Molecular Biology and Evolution)
- Molecular Ecology
- Systematic Biology (Society for Systematic Biology)
A sampling of the revised Instructions to Authors includes:
- The American Naturalist: “The American Naturalist requires authors to deposit the data associated with accepted papers in a public archive. For gene sequence data and phylogenetic trees, deposition in GenBank or TreeBASE, respectively, is required. There are many possible archives that may suit a particular data set, including the Dryad repository for ecological and evolutionary biology data (http://datadryad.org). All accession numbers for GenBank, TreeBASE, and Dryad must be included in accepted manuscripts before they go to Production. Any impediments to data sharing should be brought to the attention of the editors at the time of submission.”
- Journal of Evolutionary Biology: “The editors and publisher of this journal expect authors to make the data underlying published articles available. An investigator who feels that reasonable requests have not been met by the authors should correspond with the Editor-in-Chief. Authors must use the appropriate database to deposit detailed information supplementing submitted papers, and quote the accession number in their manuscripts.”
- Molecular Ecology: “Data Accessibility: To enable readers to locate archived data from Molecular Ecology papers, as of January 2011 we will require that authors include a ‘Data Accessibility’ section after their references. This should list the data base and respective accession numbers for all data from the manuscript that has been made publicly available…. Please note that this section must be complete prior to the submission of the final version of your manuscript. Papers lacking this section will not be sent to Production.”
At Dryad, we have been working for some time now with editors and publishers at these and other partner journals to support the implementation of this policy. If you submit an article to a “JDAP journal,” you will be invited to simultaneously submit your data to Dryad. This may occur either prior to review or, depending on the journal, at the time your article is accepted. Dryad and the journal communicate behind the scenes to make it as easy as possible for you to deposit your data, and also ensure that a permanent, resolvable, and citable data identifier is published in the final article. That way, in the future, no one need be frightened by the question “do you know where your data are?”
An international group of major health research funders have made a “joint statement of purpose” announcing, in strong and clear terms, their intent to promote greater sharing of research data.
As public and charitable funders of this research, we believe that making research data sets available to investigators beyond the original research team in a timely and responsible manner, subject to appropriate safeguards, will generate three key benefits: faster progress in improving health, better value for money, and higher quality science.
The 17 signatories (so far) include many major governmental funding agencies (e.g. the US National Institutes of Health, the Centers for Disease Control, the UK Medical Research Council, Australia’s National Health and Medical Research Council, the Canadian Institutes of Health Research, France’s National Institute for Health and Medical Research, and the German Research Foundation), private foundations (e.g. the Wellcome Trust, the Bill & Melinda Gates Foundation and the Hewlett Foundation) and even international organizations such as the World Bank. The group has invited additional funders to sign on to the statement.
Some of the long-term goals articulated in the document are near and dear to our hearts, in particular:
To the extent possible, datasets underpinning research papers in peer-reviewed journals are archived and made available to other researchers in a clear and transparent manner.
The human and technical resources and infrastructures needed to support data management, archiving and access are developed and supported for long-term sustainability.
An accompanying comment in The Lancet by Mark Walport of the Wellcome Trust and Paul Brest of the Hewlett Foundation (Sharing research data to improve public health, DOI:10.1016/S0140-6736(10)62234-9) raises some of the hard, but by now familiar, questions that will drive the approaches taken by the funding organizations: how to balance the rights and responsibilities of data generators and data users; how to safeguard and further the interests of the data subjects themselves; and how to ensure that the benefits of data sharing justify the expense and burden involved.
It will be very interesting to watch how the funding organizations work singly and in concert to overcome decades of cultural familiarity with data hoarding in the health sciences and, as Walport and Brest put it, “mend their ways.”
We’ve created a new Twitter feed for announcing all new data packages added to Dryad. It’s @datadryadnew; follow it if you want to keep an eye on what is going into the repository.
Our @datadryad feed is also available, for updates on the Dryad repository and data sharing in general.
BioMed Central, publisher of over 200 peer-reviewed journals, has issued a draft statement on data sharing and open data, inviting comments from the scientific community. BMC’s Iain Hrynaszkiewicz consulted with several Dryad team members in the formulation of the statement. A related editorial in BMC Research Notes names Dryad as an example of a repository where data are assigned a unique identifier and “available in perpetuity with permanence guaranteed.” BMC Research Notes is seeking to encourage greater data sharing by waiving the publication fee for all articles which use or link to open data that is prepared in line with a community-accepted standard.
The draft statement supports data deposition in repositories assigning permanent identifiers to data, such as the DOI used by Dryad. BMC endorses the publishers’ role of providing “clear and permanent links to data hosted in repositories” and is working on a list of the available repositories.
Furthermore, the statement says that “a way forward would be to require that from a specific date, any author submitting to a BioMed Central journal agrees to dedicate the data elements of their article and supplementary material to the public domain and apply the CC0 licence.” This proposed policy aligns closely with the Joint Data Archiving Policy (JDAP) already adopted by several Dryad partner journals.
Comments on the statement can be directed to the BMC blog.
Several journals in the field of proteomics have decided to mandate data sharing at the time of publication. These journals are leading the way toward data sharing out of a conviction that “the provenance of data sets and their proper citation is central to the research process,” as described in a recent commentary in Bio-IT World, Share the Data: Making Large-Scale Proteomics Data Widely Available.
Now “authors who publish a manuscript containing mass spectrometry data in Molecular and Cellular Proteomics (MCP) must submit the raw data to a publicly accessible site.” The journal Proteomics also requires data deposit in a public archive.
There are several specialized data repositories in the field, and several are working together as ProteomeXchange “to provide a single point of submission to proteomics repositories, and encourage the data exchange and sharing of identifiers between the repositories so that the community may easily find datasets in the participating repositories.”
A new commentary piece, Linking big: the continuing promise of evolutionary synthesis, in the journal Evolution describes the promise of “synthetic science,” which includes re-use of data sets, research results, or unconnected methods or concepts, leading to new discoveries or trends. The authors, who all are affiliated with the National Evolutionary Synthesis Center (NESCent), argue for removing the cultural and technological barriers to enable new breakthroughs.
“By putting together pieces of prior research, it is possible to transform how you do science and open the doors to findings that previously were unattainable,” said Brian Sidlauskas, a fish biologist from Oregon State University and lead author on the Evolution article. “But such an approach runs counter to the way science traditionally has been conducted, so pursuing synthetic science is somewhat risky.”
“We need to reduce the risk, remove the barriers, and encourage more pursuit of synthesis because the potential,” he added, “is staggering.”
Sidlauskas cites access to actionable data as one of the major obstacles. “When you’re looking to synthesize data from several hundred individual studies, data formatting, storage and accessibility become huge issues,” he said. He says that “…the vast majority of data supporting previous studies are unavailable, often because the data are lost or preserved in inaccessible forms (notebooks, floppy disks).”
The article refers to Dryad as
… working to alleviate the problem of data availability by providing an open-access home for ecological and evolutionary data that does not fit into more specialized repositories. Dryad actively works with a coalition of journals and scientific societies to make deposition of all data a normal part of the research workflow. As more journals require data deposition as part of the manuscript publication process, the opportunities for potential syntheses linking such data will increase substantially.
“It’s kind of an open-source approach to science,” he added. “Data archives may require some kind of proprietary protection for a few months or years, but after a certain amount of time, they should become public domain. Only by saving the data that underlie today’s science will we allow future scientists to use those data in ways that may far exceed what the original researchers envisioned.”
Other authors on the commentary piece include Ganeshkumar Ganapathy, of the National Evolutionary Synthesis Center (NESCent); Einat Hazkani-Covo, Duke University Medical Center; Kristin P. Jenkins, NESCent; Hilmar Lapp, NESCent; Lauren W. McCall, NESCent; Samantha Price, University of California-Davis; Ryan Scherle, NESCent; Paula A. Spaeth, Northland College; and David M. Kidd, NERC Centre for Population Biology, Imperial College London.
CITATION: Sidlauskas, B., G. Ganapathy, et al. (2010). “Linking big: The continuing promise of evolutionary synthesis.” Evolution doi: 10.1111/j.1558-5646.2009.00892.x.
A strong editorial on data archiving is now available online in the February issue of The American Naturalist.
Authors Michael C. Whitlock, Mark A. McPeek, Mark D. Rausher, Loren Rieseberg, and Allen J. Moore present the case for the importance of data archiving in science. This is the first of several coordinated editorials soon to appear in major journals:
To promote the preservation and fuller use of data, The American Naturalist, Evolution, the Journal of Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology will soon introduce a new data‐archiving policy. The policy has been enacted by the Executive Councils of the societies owning or sponsoring the journals.
Citation: Am Nat 2010. Vol. 175, pp. 145–146. DOI: 10.1086/650340
“Because the state of natural systems is never repeated, data losses, or missed data collection opportunities can never be corrected.” So says the American Geophysical Union (AGU), in a statement recently reaffirming the importance of data availability and preservation.
The statement offers strong support for data archiving and publication as a routine part of the research process.
The cost of collecting, processing, validating, and submitting data to a recognized archive should be an integral part of research and operational programs. Such archives should be adequately supported with long-term funding. Organizations and individuals charged with coping with the explosive growth of Earth and space digital data sets should develop and offer tools to permit fast discovery and efficient extraction of online data, manually and automatically, thereby increasing their user base. The scientific community should recognize the professional value of such activities by endorsing the concept of publication of data, to be credited and cited like the products of any other scientific activity, and encouraging peer-review of such publications.
The full statement from the AGU Council can be found here.