Data files in Dryad don’t just get dumped in there. Someone is there to look after the accuracy and completeness of the metadata, to migrate data files into new formats when necessary, to help users with new submissions, and generally to mind the details so that others can find and reuse the data files down the road. This activity is called curation, and it is a critical behind-the-scenes function of a digital repository. Here, we’d like to take this opportunity to introduce Dryad’s lead curator, Elena Feinstein.
Elena, who hails from Atlanta, has degrees in biology from NYU, education from Emory, and library & information science from the University of North Carolina (UNC) at Chapel Hill. Before coming to Dryad, she taught high school and was a science librarian at UNC. Now, Elena works with the UNC Metadata Research Center curating Dryad’s content and continually improving all aspects of the way the repository manages its metadata.
When she’s not working on Dryad, Elena volunteers with the Durham Central Market co-op grocery store, and cooks and bakes until the wee hours.
Next time you submit data to Dryad, rest assured it will receive some quality attention from Elena.
“What is Digital Curation?”, Digital Curation Centre.
If you’re in London this week, don’t miss Science Online London on Friday and Saturday, Sept. 3-4. Hosted by the British Library, Mendeley, and Nature, this meeting is an opportunity not just to listen but to connect, engage, and interact.
Stop by the British Library booth to find out more about Dryad’s expansion under the new JISC grant involving Oxford University and the British Library.
Meeting topics include:
- How is the web changing the way we conduct, communicate, share, and evaluate research? How can we employ these trends for the greater good?
- How is the internet changing the way we work with data?
- How are blogs and social networking facilitating scientific discussion? What challenges do we face?
- What challenges and opportunities are there when engaging with the public?
In particular, these sessions on Friday may be of interest to those involved in data sharing:
- Breakout 1: Publishing primary research data
- Breakout 8: Connecting scientific resources
Follow the conference on Twitter @soloconf (comment with hashtag #solo10).
Science is international. Science publishing is international. And so it stands to reason that science data repositories should be international as well.
We are pleased to report that the Joint Information Systems Committee (JISC) in the UK has made an award to the Dryad project through its Managing Research Data Program. Through this program, JISC seeks to “fund projects to explore and pilot innovative technical and organizational models for enhanced research data publications… to stimulate the better management, more open sharing and easier reuse of research data.”
The UK partners in the project include Oxford University and the British Library (BL), with participation from the Digital Curation Centre, Charles Beagrie Ltd, and a number of major scientific publishing houses. The project director, Dr. David Shotton, heads the Image Bioinformatics Research Group (IBRG) at Oxford and has been a leader in the application of Web and Semantic Web technologies to enhance biological research data and publications.
The project will result in a UK mirror of the Dryad repository based at the BL, improve the tools available for the publication and citation of data, expand the disciplinary range of participating journals (particularly into epidemiology and infectious diseases), and further develop the business framework for an international organization dedicated to long-term data preservation.
The proposal (PDF here) emerged out of a Dryad-UK discussion meeting cosponsored by the Research Information Network and held in London in April 2010.
I’ve been puzzling just now over FRPAA, the Federal Research Public Access Act of 2009, which was the topic of some lively congressional testimony yesterday afternoon. Most commentary has focused on the immediate, and contentious, issue of whether to mandate open access to articles that are commercially published. But I think there is another issue here. FRPAA perpetuates the misunderstanding, which seems common to much of the policy debate over open access publishing, that scientific research output is limited to whatever fits in the pages of a journal.
According to the proposed law, all federal agencies in the US with big-ticket extramural research budgets would be obligated to require their funding recipients to make final peer-reviewed manuscripts freely available online within six months. The act specifically excludes “laboratory notes, preliminary data analyses, notes of the author, phone logs, or other information used to produce final manuscripts”. So where does that leave the final dataset reported in the publication? Good question: it doesn’t seem to be on the radar in this debate at all.
And that’s a pity, because the disposition of these data is something that funding agencies and publishers actually do agree on: “The Association of Learned and Professional Society Publishers (ALPSP) and The International Association of Scientific, Technical, & Medical Publishers (STM) issued a joint statement presenting the views of scholarly and scientific publishers concerning access to research data, including that submitted with research papers. The statement recommends that research data should be as widely available as possible…”
Well said, ALPSP/STM! I hope some congressional staffers are reading this. If you are one of them, then please: don’t forget about the data.
In a recently published editorial in Biotropica, Emilio Bruna makes the case for data archiving in tropical biology.
In his words, “… tropical ecosystems are undergoing myriad, rapid, and unprecedented environmental changes. The data collected by Biotropica’s authors could provide an invaluable resource to the scientists and decision-makers studying global change phenomena and designing conservation and management strategies.”
To read more, see: Bruna EM (2010) Scientific Journals can Advance Tropical Biology and Conservation by Requiring Data Archiving. Biotropica 42(4): 399–401. http://dx.doi.org/10.1111/j.1744-7429.2010.00652.x
Nature journals now list Dryad among their suggested data repositories. Noting that “an inherent principle of publication is that others should be able to replicate and build upon the authors’ published claims,” the editorial policies mandate data sharing and archiving.
The policy on data sets reads:
A condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to others without preconditions.
Data sets must be made freely available to readers from the date of publication, and must be provided to editors and peer-reviewers at submission, for the purposes of evaluating the manuscript.
For the following types of data set, submission to a community-endorsed, public repository is mandatory. Accession numbers must be provided in the paper. Examples of appropriate public repositories are listed below.
PANGAEA (Publishing Network for Geoscientific & Environmental Data) is a repository for geoscience data with many features similar to Dryad’s, including the use of DOIs for data files. A recent press release reports that Elsevier and PANGAEA have implemented reciprocal linking between data in the repository and journal articles. Research data sets deposited at PANGAEA are now automatically linked to the corresponding articles in Elsevier journals on Elsevier’s electronic platform, ScienceDirect, and vice versa. The data are freely available from the publication’s page in ScienceDirect, without a login or subscription.
Try it out:
- From this PANGAEA record, follow the DOI to the article in ScienceDirect (citations and abstracts only, unless you or your institution have subscription access)
- On the article page, the PANGAEA link appears to the right of the article, with Supplementary Data beside it
This valuable two-way connectivity between data and article is most easily achieved when the data are captured at the time of article submission. See this previous post for more on Dryad’s approach to this problem, which is designed to work across multiple publishers.
Just as the PANGAEA logo appears in the online version of the article, we are toying with the idea of calling attention to the link in the opposite direction by placing journal cover images next to article DOIs in the Dryad display. We’d like to hear your thoughts on that. Is it helpful signage? Or distracting eye candy?
“For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.” The just-released Panton Principles propose that “data related to published science should be explicitly placed in the public domain.”
The creators recommend “adopting and acting on the following principles:”
- When publishing data make an explicit and robust statement of your wishes.
- Use a recognized waiver or license that is appropriate for data.
- If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
- Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.
These principles were written by Peter Murray-Rust, Cameron Neylon, Rufus Pollock and John Wilbanks at the Panton Arms in Cambridge, UK, and then refined by the Open Knowledge Foundation Working Group on Open Data in Science. There are open data web buttons available, and individuals and organizations can endorse the principles here.
There are lots of opinions on, and answers to, the question of what keeps researchers from sharing their data. For starters, here’s a lively blog post responding to this editorial from last April. Consider also this blog post.
What do you think are the barriers to data sharing?
Data from: Thompson S, Daniels K. 2010. A porous convection model for small-scale grass patterns. American Naturalist 175: E10–E15. Dryad Digital Repository. http://hdl.handle.net/10255/dryad.857
The Journal of Evolutionary Biology, the journal of the European Society for Evolutionary Biology, has just published an editorial supporting data archiving. The editorial is now available online:
Moore AJ, McPeek MA, Rausher MD, Rieseberg L, Whitlock MC (2010) The need for archiving data in evolutionary biology. Journal of Evolutionary Biology. Published online Feb. 9, 2010.