Linking big

A new commentary piece, “Linking big: the continuing promise of evolutionary synthesis,” in the journal Evolution describes the promise of “synthetic science,” which re-uses data sets and research results, or combines previously unconnected methods or concepts, to reach new discoveries or identify new trends. The authors, all of whom are affiliated with the National Evolutionary Synthesis Center (NESCent), argue for removing the cultural and technological barriers that stand in the way of such breakthroughs.

“By putting together pieces of prior research, it is possible to transform how you do science and open the doors to findings that previously were unattainable,” said Brian Sidlauskas, a fish biologist from Oregon State University and lead author on the Evolution article. “But such an approach runs counter to the way science traditionally has been conducted, so pursuing synthetic science is somewhat risky.”

“We need to reduce the risk, remove the barriers, and encourage more pursuit of synthesis because the potential,” he added, “is staggering.”

Sidlauskas cites access to actionable data as one of the major obstacles. “When you’re looking to synthesize data from several hundred individual studies, data formatting, storage and accessibility become huge issues,” he said. He notes that “…the vast majority of data supporting previous studies are unavailable, often because the data are lost or preserved in inaccessible forms (notebooks, floppy disks).”

The article refers to Dryad as

… working to alleviate the problem of data availability by providing an open-access home for ecological and evolutionary data that does not fit into more specialized repositories. Dryad actively works with a coalition of journals and scientific societies to make deposition of all data a normal part of the research workflow. As more journals require data deposition as part of the manuscript publication process, the opportunities for potential syntheses linking such data will increase substantially.

“It’s kind of an open-source approach to science,” Sidlauskas added. “Data archives may require some kind of proprietary protection for a few months or years, but after a certain amount of time, they should become public domain. Only by saving the data that underlie today’s science will we allow future scientists to use those data in ways that may far exceed what the original researchers envisioned.”

Other authors on the commentary piece include Ganeshkumar Ganapathy, of the National Evolutionary Synthesis Center (NESCent); Einat Hazkani-Covo, Duke University Medical Center; Kristin P. Jenkins, NESCent; Hilmar Lapp, NESCent; Lauren W. McCall, NESCent; Samantha Price, University of California-Davis; Ryan Scherle, NESCent; Paula A. Spaeth, Northland College; and David M. Kidd, NERC Centre for Population Biology, Imperial College London.

CITATION: Sidlauskas, B., G. Ganapathy, et al. (2010). “Linking big: The continuing promise of evolutionary synthesis.” Evolution doi: 10.1111/j.1558-5646.2009.00892.x.

Editorial on data archiving

A strong editorial on data archiving is now available online in the February issue of The American Naturalist.

Authors Michael C. Whitlock, Mark A. McPeek, Mark D. Rausher, Loren Rieseberg, and Allen J. Moore present the case for the importance of data archiving in science. This is the first of several coordinated editorials soon to appear in major journals:

To promote the preservation and fuller use of data, The American Naturalist, Evolution, the Journal of Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology will soon introduce a new data‐archiving policy. The policy has been enacted by the Executive Councils of the societies owning or sponsoring the journals.

Citation: Am Nat 2010. Vol. 175, pp. 145–146. DOI: 10.1086/650340

Making data submission (almost) as easy as falling off a log.

To make data submission to Dryad as easy as possible for authors, the system piggybacks in an innovative way on the journal submission process. The key is that most authors will be submitting their data to Dryad immediately after they learn that their final manuscript has been accepted by the journal. Through behind-the-scenes communication with the journal, Dryad will already know the “vital information” about that paper before the author comes to Dryad to submit data. This spares authors the laborious and error-prone task of re-entering the paper details at Dryad. We call this process “submission integration”, and it is one of the fundamental services provided to partner journals.

A screenshot of the Dryad submission page.

Most journals employ one of a small number of manuscript management software systems to interact with authors, editors and reviewers. These systems regularly use customizable email form letters to communicate among the various parties. Through emails that are automatically sent, and automatically processed upon receipt, Dryad can ensure that authors need not re-enter information that is already available to the journal, that the journal knows the web address authors can use to reach the submission page for that specific article, and, once data have been submitted, that the journal and the author receive notice of the record identifier to include in print.
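To make the “automatically processed upon receipt” step more concrete, here is a minimal sketch of how an incoming acceptance email might be turned into structured metadata. It assumes a hypothetical form letter laid out as “Label: value” lines with labels such as `Journal`, `Manuscript Number`, `Article Title`, `Corresponding Author` and `All Authors`; real manuscript management systems use their own templates, and this is not Dryad’s actual parser. The manuscript number `MEC-09-1234` is likewise made up.

```python
import re
from dataclasses import dataclass

# Hypothetical "Label: value" layout for an acceptance form letter.
# Real manuscript management systems use their own templates, so each
# journal would need its own field mapping.
FIELD_LINE = re.compile(r"^(?P<label>[A-Za-z ]+):\s*(?P<value>.+)$")
REQUIRED = {"Journal", "Manuscript Number", "Article Title", "Corresponding Author"}


@dataclass
class AcceptanceNotice:
    journal: str
    manuscript_number: str
    title: str
    corresponding_author: str
    authors: list[str]


def parse_acceptance_email(body: str) -> AcceptanceNotice:
    """Turn the body of an automated acceptance email into structured metadata."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match:
            # Keep the first occurrence of each label; later repeats are ignored.
            fields.setdefault(match.group("label").strip(), match.group("value").strip())
    missing = REQUIRED - set(fields)
    if missing:
        # Flag the message for a human instead of guessing at metadata.
        raise ValueError(f"acceptance email lacks fields: {sorted(missing)}")
    authors = [name.strip() for name in fields.get("All Authors", "").split(";") if name.strip()]
    return AcceptanceNotice(
        journal=fields["Journal"],
        manuscript_number=fields["Manuscript Number"],
        title=fields["Article Title"],
        corresponding_author=fields["Corresponding Author"],
        authors=authors,
    )


if __name__ == "__main__":
    sample = """Dear Dr. Author,
Journal: Molecular Ecology
Manuscript Number: MEC-09-1234
Article Title: An example title: with a subtitle
Corresponding Author: A. Author
All Authors: A. Author; B. Author
"""
    print(parse_acceptance_email(sample))
```

Because the label pattern stops at the first colon, titles that themselves contain colons survive intact, and a message missing any required field is rejected rather than producing a half-filled record.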

We’re happy to report that after several months of testing, this system is ready to roll out. The first guinea pig was The American Naturalist, which publishes a relatively small number of papers with associated data, followed by Molecular Ecology, which publishes a whole lot more. We are now setting up submission integration with a long list of partner journals, thanks to Tim Vines of Molecular Ecology, who has written easy-to-follow instructions for the many journals that use the popular Manuscript Central software.

As a teaser for things to come, we are working to make data archiving even more like falling off a log by implementing one-stop data deposition, through Dryad, to one or more specialized repositories required by our partner journals. Techniques like submission integration and handshaking should make submission to those repositories much easier and make the resulting data records more useful.

For the curious, here’s a little more detail on how submission integration works. First, the journal automatically sends an email to Dryad upon acceptance of a manuscript. Dryad parses the incoming email and creates an (empty) record for each new article, with a unique identifier based upon the manuscript number.

Second, the author receives the link to the submission page for that article. Since the bibliographic information about the paper is already stored in Dryad, all the author needs to do is follow the link, log in, and upload their data files. Not only does this save the author the needless time of re-entering author names, paper title and so on, but it also helps ensure that the information is accurate and properly formatted. Ideally, the author also provides a ReadMe document to promote reuse, and optional metadata to make the data easier to discover.

Third, upon submission, unique identifiers such as Handles or Digital Object Identifiers (DOIs) are assigned to the data. These identifiers resolve to web addresses. The identifier for the whole record, or what we call the “data package”, is then included in the article according to the conventions of each journal, so that readers of the article can easily find the record in Dryad. Most data packages will become available as soon as the issue comes out, although some may be embargoed for up to one year. For more gory details, see our wiki pages.
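The three steps can be sketched as a small record lifecycle: a placeholder keyed by manuscript number, an author upload that needs no re-entry of bibliographic details, and identifier assignment plus a release date capped at one year of embargo. This is a simplified illustration under stated assumptions, not Dryad’s implementation: the submission URL, the `manuscript` query parameter, the `10.9999/example` identifier prefix and the sequential suffixes are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from itertools import count
from typing import Optional
from urllib.parse import urlencode

# Hypothetical settings for illustration; not Dryad's real values.
SUBMIT_BASE_URL = "https://repository.example.org/submit"
DOI_PREFIX = "10.9999/example"       # placeholder prefix for illustration
MAX_EMBARGO = timedelta(days=365)    # "an embargo of up to one year"

_serial = count(1)                   # source of unique identifier suffixes


@dataclass
class DataPackage:
    manuscript_number: str
    title: str
    authors: list[str]
    files: list[str] = field(default_factory=list)
    identifier: Optional[str] = None
    release_date: Optional[date] = None


def create_placeholder(registry: dict[str, DataPackage], manuscript_number: str,
                       title: str, authors: list[str]) -> str:
    """Step 1: make an empty record keyed by manuscript number; return the author's link."""
    if manuscript_number in registry:
        raise ValueError(f"a record already exists for {manuscript_number}")
    registry[manuscript_number] = DataPackage(manuscript_number, title, list(authors))
    return f"{SUBMIT_BASE_URL}?{urlencode({'manuscript': manuscript_number})}"


def submit_data(registry: dict[str, DataPackage], manuscript_number: str,
                files: list[str], issue_date: date,
                embargo: timedelta = timedelta(0)) -> DataPackage:
    """Steps 2 and 3: attach the author's files, mint an identifier, set the release date."""
    package = registry[manuscript_number]    # bibliographic data is already here
    if not files:
        raise ValueError("at least one data file is required")
    if not timedelta(0) <= embargo <= MAX_EMBARGO:
        raise ValueError("embargo must be between zero days and one year")
    package.files.extend(files)
    package.identifier = f"doi:{DOI_PREFIX}.{next(_serial)}"
    # Public when the issue appears, unless the author requested an embargo.
    package.release_date = issue_date + embargo
    return package


if __name__ == "__main__":
    registry: dict[str, DataPackage] = {}
    link = create_placeholder(registry, "MEC-09-1234", "An example title",
                              ["A. Author", "B. Author"])
    print("Link sent to author:", link)
    package = submit_data(registry, "MEC-09-1234", ["measurements.csv", "ReadMe.txt"],
                          issue_date=date(2010, 3, 1), embargo=timedelta(days=180))
    print(package.identifier, "public on", package.release_date)
```

Keying the placeholder by manuscript number is what lets the journal’s email and the author’s later visit meet at the same record, so the only thing the author has to supply is the data itself.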