
We encourage individuals and project teams seeking to comply with data management planning mandates to consider Dryad as the destination repository for published data from their research. Dryad is not only a widely applicable, best-practice solution for research data management; it is also quick and easy to use!

Research datasets associated with a publication in any biological or biomedical field are welcome in Dryad, regardless of file type. Archived data files may include spreadsheets or other tables, images or maps, alignments, character matrices, etc.

Data files deposited in Dryad are permanently preserved, publicly available with no legal restrictions on re-use, and uniquely identified for attribution.

Data submission is simple, quick, and easy. Data files may be uploaded to Dryad in any file format, with a short README and a few metadata terms.

Finally, using an established best-practice data repository like Dryad facilitates a simple description in a data management plan. For example, grant applicants can use language like this to describe their intention to archive data in Dryad:

We plan to use the Dryad public repository for the long-term preservation and dissemination of data underlying publications from this funded research project. Data submitted to Dryad is made publicly available upon online publication** of the associated article. All data in Dryad is released to the public domain without legal restrictions on reuse, through a Creative Commons Zero waiver. There is a (legally non-binding) expectation of attribution of the Dryad data record and associated article. A one-time data deposit charge is paid by the authors or the associated journals, which allows Dryad data to be available for download without cost to users.

**Researchers may instead choose to stipulate an embargo period of 1 year.
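The release rule in the sample plan text (data become public on online publication of the article, or one year later if the optional embargo is chosen) amounts to a small date calculation. The sketch below assumes the embargo is counted from the article's online publication date; `release_date` and `add_one_year` are hypothetical helpers for illustration, not part of any Dryad interface.

```python
from datetime import date


def add_one_year(d: date) -> date:
    """Same calendar day one year later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only Feb 29 can fail here, because the next year is not a leap year.
        return d.replace(year=d.year + 1, day=28)


def release_date(publication: date, embargo_one_year: bool = False) -> date:
    """Hypothetical rule: public at publication, or one year later if embargoed."""
    return add_one_year(publication) if embargo_one_year else publication
```

For example, `release_date(date(2011, 1, 18), embargo_one_year=True)` gives `date(2012, 1, 18)`.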

If your funding agency allows it, don’t forget to budget for data preservation (data submission to Dryad is free through 2011).

Data deposited in Dryad can help researchers meet these policies and expectations:

  • the (US) National Science Foundation requires that data management plans include provisions for data archiving and preservation, and access policies and provisions for secondary use
  • the Wellcome Trust “expects all of its funded researchers to maximise the availability of research data with as few restrictions as possible”
  • the (US) National Institutes of Health data sharing policies state that “Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.”
  • the (UK) Medical Research Council policy on data sharing and preservation states: “Where possible, published results should include links to the associated data. Investigators must show how data will be preserved and their strategies for sharing, e.g. by depositing it in a community database.”

Summaries of funding agencies’ data policies can be found here:

Resources on data management & sharing:

Questions about the role of the Dryad repository in data management planning can be directed to the Dryad team.

Sample data file, Gilbert J and Manica A (2010) Data from: Parental care trade-offs and life history relationships in insects. Dryad Digital Repository. doi:10.5061/dryad.1451

Credit: adamthelibrarian, from Flickr

This is an important month, because a host of our partner journals are implementing new policies on data archiving, and, in the U.S., the National Science Foundation is asking its new grantees to have explicit data management plans.  There are over 1000 data files from over 50 journals now in Dryad, and much of this content has been submitted only within the past year. Clearly, Dryad’s role in supporting the growing data archiving mandates from journals and funders continues to expand.

New Features
In the past few months, several new features have been added to Dryad.  Users can now save an incomplete submission and come back later to complete it.  They can see a listing of their completed and in progress submissions.  Users can download data citations to their favorite bibliography management programs and upload them to their favorite social bookmarking tools.  A new “faceted search” interface allows users to find data more easily, and also displays related content in other repositories, including ecological and environmental science data (from the Knowledge Network for Biocomplexity) and phylogenetic data (from TreeBASE). To provide an early indication of scientific impact, users can see how often data have been viewed and downloaded.

An important new feature is “handshaking”, which is what we call the process whereby authors upload some of their data to Dryad, and the information is conveyed behind the scenes to a specialized repository. The aim of handshaking is to reduce the time and effort needed to deposit data when different repositories manage different aspects of the data. Handshaking also enables persistent linkages among data in the different repositories. As a first foray into handshaking, we now offer users the option of initiating a deposit in TreeBASE, the primary repository for published phylogenetic data, whenever a NEXUS file is uploaded to Dryad. Alternatively, users can deposit in another repository first and report the identifiers to Dryad, to ensure that users can find all the data relevant to a given article. We will be working in the months ahead to handshake with other specialized repositories required by our partner journals.
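To make the trigger concrete, here is a rough sketch of how an upload step might decide when to offer a TreeBASE deposit. It relies only on the fact that NEXUS files open with a `#NEXUS` header line; the function names, the routing dictionary, and passing file contents in as strings are all hypothetical simplifications, not Dryad's actual implementation.

```python
def looks_like_nexus(text: str) -> bool:
    """True if the first non-blank line is the '#NEXUS' header that opens NEXUS files."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped.upper().startswith("#NEXUS")
    return False


def plan_handshake(uploads: dict) -> dict:
    """Hypothetical routing: every upload stays in Dryad; NEXUS files also get a TreeBASE offer.

    `uploads` maps file names to their text contents.
    """
    plan = {"dryad": sorted(uploads), "offer_treebase": []}
    for name in sorted(uploads):
        if looks_like_nexus(uploads[name]):
            plan["offer_treebase"].append(name)
    return plan
```

Note that in this sketch the NEXUS file is kept in Dryad as well; the handshake only adds an offer to deposit in the specialized repository, which matches the idea of linking data across repositories rather than moving it.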

See our recent blog post about these features for more details.

Data Deposit in Three Easy Steps: The Movie
Are you looking for a way to show a colleague how straightforward data archiving can be? We’ve added a short (2-minute) video to the site that walks users through the deposit process in three easy steps. The video is also available at SciVee.

Journals Implement Joint Data Archiving Policy
Starting this month, a number of Dryad partner journals have implemented a Joint Data Archiving Policy that requires, as a condition of publication, that authors deposit the data underlying their article in a public repository.  Some of the journals implementing this policy include: The American Naturalist, Evolution, Evolutionary Applications, Heredity, Journal of Evolutionary Biology, and Molecular Ecology. A recent TREE article by Michael Whitlock suggests how “data generators, data re-users, and journals can maximize the fairness and scientific value of data archiving.”

A growing number of journals now integrate their submission process with Dryad, meaning that the repository and journal exchange information to facilitate the author’s data deposition process and to ensure persistent linkage between articles and data. The current list includes The American Naturalist, The Biological Journal of the Linnean Society, Evolution, Journal of Evolutionary Biology, Journal of Heredity, Molecular Ecology, and Molecular Ecology Resources. And more are on the way (stay tuned).

NSF Data Management Plan Mandate
Starting this month, the U.S. National Science Foundation is requiring grant applicants to provide a data management plan describing how data will be collected, preserved and made available, and these plans will be subject to peer review.  We encourage applicants to leverage Dryad in their data management plans as a solution for the long-term preservation and dissemination of the data associated with their publications.  There are some pointers to resources for data management planning on the Dryad website.

Dryad UK Project
The Joint Information Systems Committee (JISC) in the UK has made an award to Dryad, through Oxford University and the British Library, to expand the scope of the journals involved, including into the areas of infectious disease and epidemiology, and to create a UK mirror of Dryad. More information is here and at the Dryad UK site.

New Twitter Feed for Data Deposits
Interested in keeping up with new data available in Dryad?  Follow our Twitter feed (@datadryadnew) or subscribe to our RSS feed. We also Tweet general news about the repository and the world of data science as @datadryad.

Browse and search the repository at http://datadryad.org/
Follow Dryad on Twitter http://twitter.com/datadryad

This blog post is the first issue of the Dryad newsletter, summarizing recent achievements and milestones of the data repository.  If you’d like to receive future newsletters by email, please sign up for the Dryad Users mailing list.

Researchers working in data-intensive science, as well as science editors and publishers thinking about data policies, may want to take note of a new article by Michael Whitlock, “Data archiving in ecology and evolution: best practices,” in the current issue of Trends in Ecology & Evolution.

Whitlock has long been a leader in advocating for data archiving and is the current Chair of the Dryad Consortium Board.  In this article he presents concrete suggestions for the what, how and when of data archiving.

But archiving is only half the equation.  Whitlock attempts to articulate sensible guidelines for data reuse, as well. Under what circumstances should researchers contact the original creators of the data set they are re-using, and when is co-authorship appropriate? How should authors properly acknowledge the original creators of the data?

Journals, editors, and publishers have an important role in promoting both data archiving and responsible data reuse.  One problem that merits broader discussion is how journals can conduct peer review so as to prevent data misuse.  Should researchers be given a chance to review manuscripts that report on new results reusing data that they originally published?  Or is it better to avoid the potential for conflict of interest (e.g. “how dare they not replicate my findings!”) and instead recruit independent experts?

Although the article is especially timely for those working in evolutionary biology and ecology, due to the recent adoption of mandatory data archiving at many of the leading journals in the field, these best practice recommendations are relevant across the sciences.

Michael C. Whitlock (2011) Data archiving in ecology and evolution: best practices, Trends in Ecology & Evolution,  26 (2): 61-65.  doi:10.1016/j.tree.2010.11.006.

Are you curious about what’s involved in depositing data in Dryad? Looking for a quick way to show colleagues how straightforward data archiving can be? Dryad’s new 2-minute video demonstrates the data deposit process from start to finish.

How to deposit data in Dryad

The video is embedded on the Dryad website, and also available on SciVee. Feel free to link to it and share it with colleagues.

It’s January 2011: do you know where your data are?

It would be a good idea to know, and to be ready to deposit your files in a data repository, because this month marks the implementation of the Joint Data Archiving Policy. The policy, endorsed by a consortium of prominent journals and societies, states that journals will require

as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive.

The policy can be customized by each journal, and enables both embargoes and editorial discretion to make special exceptions. Blanket exemptions apply to sensitive data such as identifiable human records and endangered species localities.

The journals (and corresponding societies) implementing the policy this month are:

  • The American Naturalist (American Society of Naturalists)
  • Evolution (Society for the Study of Evolution)
  • Evolutionary Applications
  • Heredity (The Genetics Society)
  • Journal of Evolutionary Biology (European Society for Evolutionary Biology)
  • Molecular Biology and Evolution (Society for Molecular Biology and Evolution)
  • Molecular Ecology
  • Systematic Biology (Society for Systematic Biology)

A sampling of the revised Instructions to Authors includes:

  • The American Naturalist: “The American Naturalist requires authors to deposit the data associated with accepted papers in a public archive. For gene sequence data and phylogenetic trees, deposition in GenBank or TreeBASE, respectively, is required. There are many possible archives that may suit a particular data set, including the Dryad repository for ecological and evolutionary biology data (http://datadryad.org). All accession numbers for GenBank, TreeBASE, and Dryad must be included in accepted manuscripts before they go to Production. Any impediments to data sharing should be brought to the attention of the editors at the time of submission.”
  • Journal of Evolutionary Biology: “The editors and publisher of this journal expect authors to make the data underlying published articles available. An investigator who feels that reasonable requests have not been met by the authors should correspond with the Editor-in-Chief. Authors must use the appropriate database to deposit detailed information supplementing submitted papers, and quote the accession number in their manuscripts.”
  • Molecular Ecology: “Data Accessibility: To enable readers to locate archived data from Molecular Ecology papers, as of January 2011 we will require that authors include a ‘Data Accessibility’ section after their references. This should list the data base and respective accession numbers for all data from the manuscript that has been made publicly available…. Please note that this section must be complete prior to the submission of the final version of your manuscript. Papers lacking this section will not be sent to Production.”

At Dryad, we have been working for some time now with editors and publishers at these and other partner journals to support the implementation of this policy. If you submit an article to a “JDAP journal,” you will be invited to simultaneously submit your data to Dryad. This may occur either prior to review or, depending on the journal, at the time your article is accepted. Dryad and the journal communicate behind the scenes to make it as easy as possible for you to deposit your data, and also ensure that a permanent, resolvable, and citable data identifier is published in the final article.  That way, in the future, no one need be frightened by the question “do you know where your data are?”


Statue of Asclepius, the Greek God of Medicine, from the Museum of Epidaurus Theatre. Image from: Wikimedia Commons, Licensed under: GFDL 1.3.

An international group of major health research funders have made a “joint statement of purpose” announcing, in strong and clear terms, their intent to promote greater sharing of research data.

As public and charitable funders of this research, we believe that making research data sets available to investigators beyond the original research team in a timely and responsible manner, subject to appropriate safeguards, will generate three key benefits: faster progress in improving health, better value for money, and higher quality science.

The 17 signatories (so far) include many major funding agencies (e.g. the US National Institutes of Health, the US Centers for Disease Control, the UK Medical Research Council, Australia’s National Health and Medical Research Council, the Canadian Institutes of Health Research, France’s National Institute for Health and Medical Research, and the German Research Foundation), private foundations (e.g. the Wellcome Trust, the Bill & Melinda Gates Foundation, and the Hewlett Foundation), and even international organizations such as the World Bank. The group has invited additional funders to sign on to the statement.

Some of the long-term goals articulated in the document are near and dear to our hearts, in particular:

To the extent possible, datasets underpinning research papers in peer-reviewed journals are archived and made available to other researchers in a clear and transparent manner.

and

The human and technical resources and infrastructures needed to support data management, archiving and access are developed and supported for long-term sustainability.

An accompanying comment in The Lancet by Mark Walport of the Wellcome Trust and Paul Brest of the Hewlett Foundation (Sharing research data to improve public health, DOI:10.1016/S0140-6736(10)62234-9) raises some of the hard, but by now familiar, questions that will drive the approaches taken by the funding organizations: how to balance the rights and responsibilities of data generators and data users; how to safeguard and further the interests of the data subjects themselves; and how to ensure that the benefits of data sharing justify the expense and burden involved.

It will be very interesting to watch how the funding organizations work singly and in concert to overcome decades of cultural familiarity with data hoarding in the health sciences and, as Walport and Brest put it, “mend their ways.”

Ever wonder what happens to your Dryad data behind the scenes? Here’s a quick overview.

Once a depositor has uploaded their data files and finalized their submission, the Dryad curator is notified of the new content. The curator looks at the uploaded files to make sure they really do contain data (and not, say, the article manuscript or pictures of kittens). The curator then applies quality control to the metadata, that is, the description of the article and data files. She corrects errors, such as typos or formatting tags that display incorrectly, and may enrich the metadata by adding taxon name keywords, for example. Advanced metadata enrichment includes the tricky realm of name authority control, which ensures that all works by a given author are gathered together despite varying forms of the author’s name.

Once the curator approves the submission, the metadata description of the data goes live in the repository. The status of the data files themselves depends upon the embargo options selected by the depositor. Dryad DOIs (Digital Object Identifiers) are sent to the depositor and, in the case of our integrated partner journals, to the journal editors, so that the DOIs can be included in all forms of the final published article, allowing readers of the article to find the supporting data.

After the article is published, the curator adds complete article citation information, including a hyperlinked article DOI, to the Dryad record, and updates any data file embargoes, if needed.
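The curation steps above follow a fixed order, which can be modeled as a tiny state machine. This is a sketch under simplifying assumptions: the three states, the `DataPackage` class, and its method names are hypothetical, and embargo handling is left out entirely. It only illustrates that approval must precede linking the published article.

```python
from enum import Enum, auto
from typing import Optional


class Status(Enum):
    SUBMITTED = auto()  # depositor finalized the submission; curator is notified
    APPROVED = auto()   # curator checked files and metadata; metadata record goes live
    LINKED = auto()     # article published; citation and article DOI added to the record


# Allowed moves, in the order the post describes. Embargoes are not modeled.
TRANSITIONS = {
    Status.SUBMITTED: {Status.APPROVED},
    Status.APPROVED: {Status.LINKED},
    Status.LINKED: set(),
}


class DataPackage:
    """Hypothetical record for one data package moving through curation."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.status = Status.SUBMITTED
        self.article_doi: Optional[str] = None

    def _move(self, new_status: Status) -> None:
        # Reject out-of-order steps, e.g. linking an article before approval.
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status.name} to {new_status.name}")
        self.status = new_status

    def approve(self) -> None:
        """Curator confirmed the files contain data and corrected or enriched the metadata."""
        self._move(Status.APPROVED)

    def link_article(self, article_doi: str) -> None:
        """After publication, attach the article DOI to the data record."""
        self._move(Status.LINKED)
        self.article_doi = article_doi
```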

The outcome is data files, which

  • are securely deposited in the repository, and linked to the journal article,
  • have a unique, permanent identifier that can be cited, and
  • can be discovered independently of the article, as well as through the article.

Additionally, authors can now track the views and downloads of their data files.   Dryad displays the number of times the data package has been viewed, and the number of times each component data file has been both viewed and downloaded.
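The statistics come at two levels: views for the data package as a whole, and views plus downloads for each file in it. A minimal sketch of that bookkeeping, assuming simple in-memory counters (the `UsageStats` class is hypothetical and says nothing about how Dryad actually stores usage data):

```python
class UsageStats:
    """Hypothetical counters mirroring the two levels of statistics Dryad displays."""

    def __init__(self, file_names: list) -> None:
        self.package_views = 0                                  # views of the package page
        self.file_views = {name: 0 for name in file_names}      # per-file views
        self.file_downloads = {name: 0 for name in file_names}  # per-file downloads

    def view_package(self) -> None:
        self.package_views += 1

    def view_file(self, name: str) -> None:
        self.file_views[name] += 1  # raises KeyError for a file not in the package

    def download_file(self, name: str) -> None:
        self.file_downloads[name] += 1

    def report(self) -> dict:
        """Package-level view count plus (views, downloads) for each file."""
        return {
            "package_views": self.package_views,
            "files": {n: (self.file_views[n], self.file_downloads[n]) for n in self.file_views},
        }
```

Keeping views and downloads as separate counters means a download is not assumed to imply a view, so the two numbers can be reported independently for each file.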

The US National Science Foundation (NSF) has released its revised policy on Dissemination and Sharing of Research Results.

Starting January 18, 2011, NSF grant proposals must include a data management plan to describe “how the proposal will conform to NSF policy on the dissemination and sharing of research results.”  Data management plans will be reviewed with the grant application by program officers and peers, and implementation (or lack thereof) may influence subsequent award decisions.

The revised Grant Proposal Guide suggests several items for inclusion in a project’s data management plan:  an inventory of research output the project will create, standards applied for describing and storing the data, policies for sharing, provisions for reuse, and plans for preservation.  This is helpful, but very high-level.

Luckily, the NSF and several Directorates have provided supplementary documents with much more detail on expectations of the NSF in general, and individual Directorates in particular.  The Directorate Guidance documents provide a variety of suggestions (and sometimes requirements), including definitions about what is considered “data”, when the data needs to be made available, and what types of sharing or archive locations are appropriate.  As intended, these guidelines differ between Directorates, reflecting a variety of community norms.

Let’s look at expectations for timeliness of data availability, as a specific example.  The general FAQ states, “the expectation is that all data will be made available after a reasonable length of time,” where “what constitutes a reasonable length of time will be determined by the community of interest through the process of peer review and program management.”  The FAQ further suggests that one reasonable standard is to make data accessible immediately upon study publication.  The ENG (Engineering) guidance recommendation mirrors this.  The expectation of the OCE (Ocean Sciences) is different:  data should be submitted as soon as possible, but no later than two years after collection, with more stringent requirements for some programs.  Using yet a different milestone, the SES (Social and Economic Sciences) suggests that quantitative social and economic datasets be submitted within one year of the expiration of the grant award.  These concrete expectations will clearly assist investigators writing data management plans, and provide a common ground for reviewers.
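The three milestones above are anchored to different events (publication, collection, award expiration), which is easy to lose track of when writing a plan. The sketch below encodes a simplified reading of each rule as a latest-sharing date. It is illustrative only: `latest_sharing_date` is a hypothetical helper, the FAQ's publication standard is treated as a firm deadline although the FAQ offers it only as one reasonable standard, and the stricter OCE program-specific requirements are not modeled.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def latest_sharing_date(
    directorate: str,
    publication: Optional[date] = None,
    collection: Optional[date] = None,
    award_end: Optional[date] = None,
) -> date:
    """Latest date data should be available under a simplified reading of each guideline."""
    if directorate in ("FAQ", "ENG"):
        anchor, years = publication, 0  # upon study publication
    elif directorate == "OCE":
        anchor, years = collection, 2   # no later than two years after collection
    elif directorate == "SES":
        anchor, years = award_end, 1    # within one year of award expiration
    else:
        raise ValueError(f"no rule sketched for {directorate!r}")
    if anchor is None:
        raise ValueError(f"the {directorate} rule needs its reference date")
    return add_years(anchor, years)
```

For instance, ocean data collected on 15 June 2011 would, under this reading, need to be submitted by 15 June 2013.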

In several places, the documents explicitly mention that what constitutes an acceptable plan is expected to evolve, as standards, technologies, resources, and community norms change over time.

Nicely done, NSF.

Note: The Directorate for Biological Sciences has not issued guidance as of this writing.

Update: The guidance from the Directorate for Biological Sciences was issued June 15, 2011.

For more information:

January 2011 Policy

Commentary and related documents

We’ve created a new Twitter feed for announcing all new data packages added to Dryad. It’s @datadryadnew. Follow it if you want to keep an eye on what is going into the repository.

Our @datadryad feed is also available, for updates on the Dryad repository and data sharing in general.

Data files in Dryad don’t just get dumped in there.  Someone is there to look after the accuracy and completeness of the metadata, to migrate data files into new formats when necessary, to help users with new submissions, and generally mind the details so that others can find and reuse the data files down the road.  This activity is called curation, and it is a critical behind-the-scenes function of a digital repository [1].   Here, we’d like to take this opportunity to introduce Dryad’s lead curator, Elena Feinstein.

Elena, who hails from Atlanta, has degrees in biology from NYU, education from Emory, and library & information science from the University of North Carolina (UNC) at Chapel Hill. Before coming to Dryad, she taught high school and was a science librarian at UNC. Now, Elena works with the UNC Metadata Research Center curating Dryad’s content and continually improving all aspects of the way the repository manages its metadata.

When she’s not working on Dryad, Elena volunteers with the Durham Central Market co-op grocery store, and cooks and bakes until the wee hours.

Next time you submit data to Dryad, rest assured it will receive some quality attention from Elena.

[1] What is Digital Curation?, Digital Curation Centre
