Archive for the ‘Data management plans’ Category

As a non-profit repository dependent on support from members and users, Dryad is greatly concerned with the economics and sustainability of data services. Our business model is built around Data Publishing Charges (DPCs), designed to recover the basic costs of curating and preserving data. A Dryad DPC is covered in one of three ways:

  1. The DPC is waived if the submitter is based in a country classified by the World Bank as a low-income or lower-middle-income economy.
  2. For many journals, the society or publisher will sponsor the DPC on behalf of their authors (to see whether this applies, look up your journal).
  3. In the absence of a waiver or a sponsor, the DPC is US$120, payable by the submitter.
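The three rules above work as a simple decision procedure that is checked in order. Here is a minimal Python sketch of it. It assumes the submitter's World Bank income group and the journal's sponsorship status are already known. The function and constant names are hypothetical and are not part of any Dryad system.

```python
from typing import Tuple

# Hypothetical constants mirroring the rules described above.
WAIVER_INCOME_GROUPS = {"low income", "lower-middle income"}  # World Bank classifications
STANDARD_DPC_USD = 120  # DPC payable by the submitter at the time of writing


def dpc_coverage(income_group: str, journal_sponsors_dpc: bool) -> Tuple[str, int]:
    """Return (how the DPC is covered, amount in USD paid by the submitter).

    Hypothetical helper: the rules are checked in the order listed above.
    """
    # Rule 1: waiver for submitters in low- or lower-middle-income economies.
    if income_group.strip().lower() in WAIVER_INCOME_GROUPS:
        return ("waived", 0)
    # Rule 2: the journal's society or publisher sponsors the DPC.
    if journal_sponsors_dpc:
        return ("sponsored", 0)
    # Rule 3: no waiver or sponsor, so the submitter pays the standard charge.
    return ("submitter", STANDARD_DPC_USD)


print(dpc_coverage("High income", False))  # ('submitter', 120)
```

Only the third branch leaves the submitter with the full charge. This is the group the survey described below focuses on.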

Our long-term aim is to increase sponsorships and reduce the financial responsibility of individual researchers.

Last year, we launched a pilot study sponsored by the US National Science Foundation to test the feasibility of having a funding agency directly sponsor the DPC. We conducted a survey of Dryad submitters as part of the pilot, hoping to learn more about how researchers plan and pay for data archiving.

Initial survey results

We first want to say a hearty THANK YOU to our participants for giving us so much good information to work with! (Ten participants were randomly selected to receive gift cards as a sign of our appreciation.) Respondents were located around the world, and nearly all were based at academic institutions.

Survey respondents' positions

A word about the selection of survey participants. We know that approximately one-third of all Dryad data publications do not have a sponsor or waiver, meaning the researcher is responsible for covering the $120 charge. We wanted to learn more about payment methods and funding sources for these non-sponsored DPCs.

We specifically invited researchers to take our survey who had 1) submitted to Dryad in the previous year and 2) paid their Data Publishing Charge directly (via credit card or voucher code). The survey questions focused on a few topics:

  • Grant funding and Data Management Plans,
  • Where the money for their Data Publishing Charges ultimately came from, and
  • Whether funding concerns affect their data archiving behavior.

A few highlights are presented below; we intend to dig deeper into the survey results (and other information gathered as part of the pilot study) and report on them publicly in the coming months.

Planning for data in grant proposals

Nearly 72% of respondents indicated that the research associated with their publication/data was supported by a grant. We wanted to know how (or whether) researchers planned ahead for archiving their data in their grant proposals, and the results were enlightening:

  • 43% did not include a Data Management Plan (DMP) as part of their proposal for funding.
  • Of those who did submit a DMP, only about 46% committed to archiving their data as part of that plan.
  • A whopping 96% said they did not specifically budget for data archiving in their proposal.
  • Only 41% were able to archive their data within the grant funding period, while 59% were unable to, or were unsure.

As these results indicate, data management and stewardship are still not a high priority at the grant proposal stage. Even when researchers plan for data deposition, they rarely budget for the associated costs. And even if they do (hypothetically) have funding specifically for data, the timing may not allow them to use it before the grant expires.

These factors suggest that if funding agencies want to prioritize supporting data stewardship, they should make funds available for this purpose outside the traditional grant structure.

Show me the money

When submitters pay the Dryad Data Publishing Charge themselves, where does that money come from? Are submitters being reimbursed? If so, how/by whom?

Our results showed that, unfortunately, about a quarter of our participants paid their DPCs out-of-pocket and did not receive any reimbursement. Approximately the same number paid themselves but were reimbursed (by their institution, a grant, or some combination of these), and 37% of DPCs were paid directly by the institution (using an institutional credit card or voucher code).

How was the Dryad DPC paid?


Some respondents view self-funding of data publication as worthwhile:

My belief is that scientific data should be publicly available and I am willing to cover the costs myself if supervisors (grant holders) do not.

As long as the cost is reasonable, in the worse case scenario I pay from my pocket. Better the data are safe and easily accessible for years to come than stored in spurious formats and difficult-to-access servers.

But for many others, covering the payment can be a real pain point:

I paid the processing charge myself mainly because our University’s reimbursement process was so laborious, I felt it easier just to get it over and done with myself and absorb the relatively small cost personally.

I just have to beg and plead for funding support each time.

If I am publishing after the postdoc ends then I am no longer paid to work on the project. Since I have had four postdocs, each lasting less than two years, this has happened for all my publications.

Examples from the “other” payment category shown above illustrate the scrappiness of researchers in finding funding:

I paid this from flexible research funds that were recently awarded by my institution. Had that not occurred, I would have had to pay personally and not be reimbursed.

I used my RTF (research trust fund) since I didn’t have dedicated grant funding.

Scavenged money from other projects.

Key takeaways

Our preliminary results show that at a time of more and stronger open data policies, paying for data publication remains far from straightforward, with much of the burden passed along to individual researchers.

Concerns about funding for open data can have real impacts on research availability and publication choice. More than 15% of our participants indicated that they have collected data in the last few years that they have been unable to archive due to lack of funds. Meanwhile, over 40% say that when choosing which journal(s) to submit to, sponsorship of the Dryad DPC does, or at least may, influence their decision.

The good news is that during our 8-month pilot implementation period, the US National Science Foundation sponsored nearly 200 Data Publishing Charges for which researchers would otherwise have been responsible.

We at Dryad are committed to finding and implementing solutions, and very much appreciate the feedback and support we receive from the research and publishing community. Stay tuned for more lessons learned.


Are you a librarian wondering what Dryad can do for you, and what you can do for Dryad? Please see our guest post on “Dryad for the Science Librarian” over at the New England eScience Portal.



We encourage individuals and project teams seeking to comply with data management planning mandates to consider Dryad as the destination repository for published data from their research.  Dryad is not only a widely applicable, best-practice solution for research data management, it is also a quick and easy solution!

Research datasets associated with a publication in any biological or biomedical field are welcome in Dryad, regardless of file type. Archived data files may include spreadsheets or other tables, images or maps, alignments, character matrices, etc.

Data files deposited in Dryad are permanently preserved, publicly available with no legal restrictions on re-use, and uniquely identified for attribution.

Data submission is simple, quick, and easy. Data files may be uploaded to Dryad in any file format, with a short README and a few metadata terms.

Finally, using an established best-practice data repository like Dryad facilitates a simple description in a data management plan. For example, grant applicants can use language like this to describe their intention to archive data in Dryad:

We plan to use the Dryad public repository for the long-term preservation and dissemination of data underlying publications from this funded research project. Data submitted to Dryad is made publicly available upon online publication** of the associated article. All data in Dryad is released to the public domain without legal restrictions on reuse, through a Creative Commons Zero waiver. There is a (legally non-binding) expectation of attribution of the Dryad data record and associated article. A one-time data deposit charge is paid by the authors or the associated journals, which allows Dryad data to be available for download without cost to users.

**Researchers may instead choose to stipulate an embargo period of 1 year.

If your funding agency allows it, don’t forget to budget for data preservation (data submission to Dryad is free through 2011).

Data deposited in Dryad can help researchers meet these policies and expectations:

  • the (US) National Science Foundation requires that data management plans include provisions for data archiving and preservation, and access policies and provisions for secondary use
  • the Wellcome Trust “expects all of its funded researchers to maximise the availability of research data with as few restrictions as possible”
  • the (US) National Institutes of Health data sharing policies state that “Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.”
  • the (UK) Medical Research Council policy on data sharing and preservation states: “Where possible, published results should include links to the associated data. Investigators must show how data will be preserved and their strategies for sharing, e.g. by depositing it in a community database.”

Summaries of funding agencies’ data policies can be found here:

Resources on data management & sharing:

Questions about the role of the Dryad repository in data management planning can be directed to the Dryad team.

Sample data file, Gilbert J and Manica A (2010) Data from: Parental care trade-offs and life history relationships in insects. Dryad Digital Repository. doi:10.5061/dryad.1451


The US National Science Foundation (NSF) has released its revised policy on Dissemination and Sharing of Research Results.

Starting January 18, 2011, NSF grant proposals must include a data management plan to describe “how the proposal will conform to NSF policy on the dissemination and sharing of research results.”  Data management plans will be reviewed with the grant application by program officers and peers, and implementation (or lack thereof) may influence subsequent award decisions.

The revised Grant Proposal Guide suggests several items for inclusion in a project’s data management plan:  an inventory of research output the project will create, standards applied for describing and storing the data, policies for sharing, provisions for reuse, and plans for preservation.  This is helpful, but very high-level.

Luckily, the NSF and several Directorates have provided supplementary documents with much more detail on the expectations of the NSF in general, and of individual Directorates in particular. The Directorate Guidance documents provide a variety of suggestions (and sometimes requirements), including definitions of what is considered “data”, when the data need to be made available, and what types of sharing or archive locations are appropriate. As intended, these guidelines differ between Directorates, reflecting a variety of community norms.

Let’s look at expectations for timeliness of data availability, as a specific example.  The general FAQ states, “the expectation is that all data will be made available after a reasonable length of time,” where “what constitutes a reasonable length of time will be determined by the community of interest through the process of peer review and program management.”  The FAQ further suggests that one reasonable standard is to make data accessible immediately upon study publication.  The ENG (Engineering) guidance recommendation mirrors this.  The expectation of the OCE (Ocean Sciences) is different:  data should be submitted as soon as possible, but no later than two years after collection, with more stringent requirements for some programs.  Using yet a different milestone, the SES (Social and Economic Sciences) suggests that quantitative social and economic datasets be submitted within one year of the expiration of the grant award.  These concrete expectations will clearly assist investigators writing data management plans, and provide a common ground for reviewers.

In several places, the documents explicitly mention that what constitutes an acceptable plan is expected to evolve, as standards, technologies, resources, and community norms change over time.

Nicely done, NSF.

Note: The Directorate for Biological Sciences has not issued guidance as of this writing.

Update: The guidance from the Directorate for Biological Sciences was issued June 15, 2011.

For more information:

January 2011 Policy

Commentary and related documents
