Five-ish Minutes With: Charles Fox

In our latest post, our Executive Director Melissanne Scheld sits down with Dryad’s Board of Directors Chair, Professor Charles Fox, to discuss the challenges researchers face today, how Dryad is helping alleviate some of those pain points, why Dryad has had such staying power in a quickly changing industry . . . and then we move on to dessert.

Chuck Fox

Can you tell us a little about your professional background and how that intersects with Dryad’s mission?

I wear two hats in my professional life – I am an evolutionary ecologist who studies various aspects of insect biology at the University of Kentucky, and I am a journal editor (Executive Editor of Functional Ecology).

My involvement with open data and Dryad began fortuitously in 2006. The British Ecological Society was invited to send a representative to a Data Registry Workshop, organized by the Ecological Society of America, to be held that December in Santa Barbara, California. I am (and was at that time) an editor of one of the British Ecological Society’s journals, Functional Ecology, and I live in the U.S. So Lindsay Haddon, who was Publications Manager for the BES, asked me to attend the workshop as their representative. Before that meeting I don’t recall having thought much about open data or data archives, but I was excited to attend in part because the topic intrigued me and, selfishly, because my parents live in southern California and this was an opportunity to visit them. The discussions at that meeting, plus those at a couple of follow-up meetings over the next few years, including one at NESCent in Durham, North Carolina, and another in Vancouver, convinced me that data publishing, and open data more generally, should be a part of research publication. So I began lobbying the BES to adopt an open data policy and become a founding member of Dryad. I wrote a proposed data policy – just a revision of the Joint Data Archiving Policy (JDAP) that many ecology and evolution journals adopted – and submitted that proposal to the BES’ publication committee. It took a few years, but in 2011 the BES adopted that data policy across their suite of journals and became a member of Dryad. The BES has since been a strong supporter of open data and has required data publication as a condition of publishing a manuscript in one of their journals. Probably because I was a vocal proponent of data policies at BES meetings (along with a few others, most notably Tim Coulson), I was nominated to be a Dryad board member, and was elected to the board in 2013.

As an educator, what are some of the biggest changes you’ve seen in the classroom during your career?

When I started teaching, first as a graduate student (teaching assistant) and then as a young university professor, we didn’t have PowerPoint and digital projectors. So I made heavy use of a chalkboard (or dry erase board) during lecture, and used an overhead projector for more complicated graphics. Students had to take detailed notes on the lecture, which required them to write furiously all throughout the class. Nowadays I produce detailed PowerPoint slides that include most of the material I cover, so I write very little on the chalkboard. And, because I can provide my slides to students before class – as a pdf that they can print and bring to class – the students are freed from scribbling furiously to capture every detail. Students still need to take some notes (my slides do not include every detail), but they are largely free to listen to the lecture and participate in class discussions. I am not convinced, though, that these changes have led to improved learning, at least not in all students. Having information too easily available, including downloadable class materials, seems to cause some students to actually disengage from class, and ultimately do poorly – possibly because they think they don’t need to attend class, or engage when they do attend, since they have all of the materials easily accessible outside the classroom.

What do you think the biggest challenges are for open science research today?

I have been amazed at how quickly open data has become accepted as the standard in the ecology and evolution research communities. When data policies were first proposed to journals there was substantial resistance to their adoption – journals were nervous about possibly driving away authors, and editors (who are also researchers) shared the views that were common in the community regarding ownership of their own data – but over just a few years the resistance largely disappeared among editors, societies and publishers, such that a large proportion of the top journals in the field have adopted policies requiring data to be published alongside research manuscripts. That said, some significant challenges remain, both on the researcher side and on the repository side. On the repository side, sustainable funding remains the largest hurdle. Data repositories cost money to run, such as for staff and infrastructure. Dryad has been relying on a mix of data publication charges (DPCs) and grants to fund its mission. This has worked for us so far, but constantly chasing grants is a lot of work for those writing grants, and the cost to researchers paying DPCs, albeit small, is not trivial for those without grant support.

On the researcher side, though data publishing has mostly become an accepted part of research publication in the community, there remain many important cultural and practical challenges to making open data universally practiced. These include the development of standards for data citation and reuse (not restrictions on data reuse, but community expectations for citation and collaboration), balancing views of data ownership with the needs of the community, balancing the concerns of researchers who produce long-term datasets with those of the community, and others. We also need to improve education about data, such as teaching our students how to organize and properly annotate their datasets so that they are useful for other researchers after publication. Even when data are made available by researchers, actually using those data can be challenging if they are not well organized and annotated.

When researchers are deciding in which repository to deposit their research data, what values and functions should they consider?

Researchers should choose a repository that best fits the type of data they have to deposit and the community that will likely be reusing it. There are many repositories that handle specialized data types, such as genetic sequence data or data to be used for phylogenetic analysis. If your data suits a specialized archive, choose that. But the overwhelming majority of data generated by ecologists don’t fit into specialized archives. It’s for these types of data that Dryad was developed.

So what does Dryad offer researchers? From the perspective of the dataset author, Dryad links your dataset directly to the manuscript you have published about the dataset. This provides users detailed metadata on the contents of your dataset, helping them understand the dataset and use it correctly for future research. Dryad also ensures that your dataset is discoverable, whether you start at the journal page, on Dryad’s site, or at any of a large number of collaborator services. The value of Dryad to the dataset user is similar – easy discoverability of data and clear links to the data collection details (i.e., links to the associated manuscripts).

You’ve held several roles on Dryad’s Board of Directors – what about this organization compels you to volunteer your free time?

My experiences as a scientist, a journal editor, and a participant in open data discussions have convinced me that data publication is an essential part of research publication. For decades, or even centuries, we’ve relied on a publishing model where researchers write manuscripts that describe the work they have done and summarize their results and conclusions for the broader community. That’s the typical journal paper, and it was the limit of what could be done in an age where everything had to fit onto the printed page and be distributed on paper. Nowadays we have near infinite space in a digital medium to not just summarize our results, but also provide all of the details, including the actual data, as part of the research presentation. It will always be important to have an author summarize their findings and place their work into context – that intellectual contribution is an essential part of communicating your research – but there’s no reason that’s where we need to stop. I imagine a world where a reader can click on a figure, or table, or other part of a manuscript and be taken directly to the relevant details – the actual data presented in the figure, the statistical models underlying the analyses, more detailed descriptions of study sites or organisms, and possibly many other types of information about the experiment, data collection, equipment used, results, etc. We shouldn’t be constrained by historical limitations of the printed page. We’re not yet even close to where I think we can and should be going, but making data an integral part of research publication is a huge step in the right direction. So I enthusiastically support journal mandates that require data to be published alongside each manuscript presenting research results. And facilitating this is a core part of Dryad’s mission, which leads me to enthusiastically support both Dryad’s mission and the organization itself!

Pumpkin or apple pie?  

Those are my two favorite pies, so it’s a tough question. If served a la mode, i.e., with ice cream, then I’d most often pick apple pie. But, without ice cream, I’d have to choose pumpkin pie.

Stay tuned for future conversations with industry thought leaders and other relevant blog posts here at Dryad News and Views.


How do researchers pay for data publishing? Results of a recent submitter survey

As a non-profit repository dependent on support from members and users, Dryad is greatly concerned with the economics and sustainability of data services. Our business model is built around Data Publishing Charges (DPCs), designed to recover the basic costs of curating and preserving data. Dryad DPCs can be covered in three ways:

  1. The DPC is waived if the submitter is based in a country classified by the World Bank as a low-income or lower-middle-income economy.
  2. For many journals, the society or publisher will sponsor the DPC on behalf of their authors (to see whether this applies, look up your journal).
  3. In the absence of a waiver or a sponsor, the DPC is US$120, payable by the submitter.

Our long-term aim is to increase sponsorships and reduce the financial responsibility of individual researchers.

Last year, we launched a pilot study sponsored by the US National Science Foundation to test the feasibility of having a funding agency directly sponsor the DPC. We conducted a survey of Dryad submitters as part of the pilot, hoping to learn more about how researchers plan and pay for data archiving.

Initial survey results

We first want to say a hearty THANK YOU to our participants for giving us so much good information to work with! (10 participants were randomly selected to receive gift cards as a sign of our appreciation). Respondents were located around the world, with nearly all based at academic institutions.

Survey respondents' positions

A word about selection of survey participants. We know that approximately 1/3 of all Dryad data publications do not have a sponsor or waiver, meaning the researcher is responsible for covering the $120 charge. We wanted to learn more about payment methods and funding sources for these non-sponsored DPCs.

We specifically solicited researchers for our survey who had 1) submitted to Dryad in the previous year and 2) paid their Data Publishing Charge directly (via credit card or voucher code). The survey questions focused on a few topics:

  • Grant funding and Data Management Plans
  • Where the money for their Data Publishing Charges ultimately came from, and
  • Whether funding concerns affect their data archiving behavior.

A few highlights are presented below; we intend to dig deeper into the survey results (and other information gathered as part of the pilot study) and report on them publicly in the coming months.

Planning for data in grant proposals

Nearly 72% of respondents indicated that the research associated with their publication/data was supported by a grant. We wanted to know how (or whether) researchers planned ahead for archiving their data in their grant proposals, and the results were enlightening:

  • 43% did not include a Data Management Plan (DMP) as part of their proposal for funding.
  • Of those who did submit a DMP, only about 46% committed to archiving their data as part of that plan.
  • A whopping 96% said they did not specifically budget for data archiving in their proposal.
  • Only 41% were able to archive their data within the grant funding period, while 59% were unable to, or were unsure.

As these results indicate, data management/stewardship is still not a high priority at the grant proposal stage. Even when researchers plan for data deposition, they don’t consider the costs associated. And even if they do (hypothetically) have funding specifically for data, the timing may not allow them to use it before the grant expires.

These factors suggest that if funding agencies want to prioritize supporting data stewardship, they should make funds available for this purpose outside the traditional grant structure.

Show me the money

When submitters pay the Dryad Data Publishing Charge themselves, where does that money come from? Are submitters being reimbursed? If so, how/by whom?

Our results showed that, unfortunately, about a quarter of our participants paid their DPCs out-of-pocket and did not receive any reimbursement. Approximately the same number paid themselves but were reimbursed (by their institution, a grant, or some combination of these), and 37% of DPCs were paid directly by the institution (using an institutional credit card or voucher code).

How was the Dryad DPC paid?


Some respondents view self-funding of data publication as worthwhile:

My belief is that scientific data should be publicly available and I am willing to cover the costs myself if supervisors (grant holders) do not.

As long as the cost is reasonable, in the worst-case scenario I pay from my pocket. Better the data are safe and easily accessible for years to come than stored in spurious formats and difficult-to-access servers.

But for many others, covering the payment can be a real pain point:

I paid the processing charge myself mainly because our University’s reimbursement process was so laborious, I felt it easier just to get it over and done with myself and absorb the relatively small cost personally.

I just have to beg and plead for funding support each time.

If I am publishing after the postdoc ends then I am no longer paid to work on the project. Since I have had four postdocs, each lasting less than two years, this has happened for all my publications.

Examples from the “other” payment category shown above illustrate the scrappiness of researchers in finding funding:

I paid this from flexible research funds that were recently awarded by my institution. Had that not occurred, I would have had to pay personally and not be reimbursed.

I used my RTF (research trust fund) since I didn’t have dedicated grant funding.

Scavenged money from other projects.

Key takeaways

Our preliminary results show that at a time of more and stronger open data policies, paying for data publication remains far from straightforward, with much of the burden passed along to individual researchers.

Concerns about funding for open data can have real impacts on research availability and publication choice. More than 15% of our participants indicated that they have collected data in the last few years that they have been unable to archive due to lack of funds. Meanwhile, over 40% say that when choosing which journal(s) to submit to, sponsorship of the Dryad DPC does, or at least may, influence their decision.

The good news is that during our 8-month pilot implementation period, the US National Science Foundation sponsored nearly 200 Data Publishing Charges for which researchers would otherwise have been responsible.

We at Dryad are committed to finding and implementing solutions, and very much appreciate the feedback and support we receive from the research and publishing community. Stay tuned for more lessons learned.

How can the Dryad repository help researchers’ data management plans?

We encourage individuals and project teams seeking to comply with data management planning mandates to consider Dryad as the destination repository for published data from their research. Dryad is not only a widely applicable, best-practice solution for research data management, it is also a quick and easy solution!

Research datasets associated with a publication in any biological or biomedical field are welcome in Dryad, regardless of file type. Archived data files may include spreadsheets or other tables, images or maps, alignments, character matrices, etc.

Data files deposited in Dryad are permanently preserved, publicly available with no legal restrictions on re-use, and uniquely identified for attribution.

Data submission is simple, quick, and easy. Data files may be uploaded to Dryad in any file format, with a short README and a few metadata terms.

Finally, using an established best-practice data repository like Dryad facilitates a simple description in a data management plan. For example, grant applicants can use language like this to describe their intention to archive data in Dryad:

We plan to use the Dryad public repository for the long-term preservation and dissemination of data underlying publications from this funded research project. Data submitted to Dryad is made publicly available upon online publication** of the associated article. All data in Dryad is released to the public domain without legal restrictions on reuse, through a Creative Commons Zero waiver. There is a (legally non-binding) expectation of attribution of the Dryad data record and associated article. A one-time data deposit charge is paid by the authors or the associated journals, which allows Dryad data to be available for download without cost to users.

**Researchers may instead choose to stipulate an embargo period of 1 year.

If your funding agency allows it, don’t forget to budget for data preservation (data submission to Dryad is free through 2011).

Data deposited in Dryad can help researchers meet these policies and expectations:

  • the (US) National Science Foundation requires that data management plans include provisions for data archiving and preservation, and access policies and provisions for secondary use
  • the Wellcome Trust “expects all of its funded researchers to maximise the availability of research data with as few restrictions as possible”
  • the (US) National Institutes of Health data sharing policies state that “Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.”
  • the (UK) Medical Research Council policy on data sharing and preservation states: “Where possible, published results should include links to the associated data. Investigators must show how data will be preserved and their strategies for sharing, e.g. by depositing it in a community database.”

Summaries of funding agencies’ data policies can be found here:

Resources on data management & sharing:

Questions about the role of the Dryad repository in data management planning can be directed to the Dryad team.

Sample data file, Gilbert J and Manica A (2010) Data from: Parental care trade-offs and life history relationships in insects. Dryad Digital Repository. doi:10.5061/dryad.1451

NSF policy on dissemination and sharing of research results

The US National Science Foundation (NSF) has released its revised policy on Dissemination and Sharing of Research Results.

Starting January 18, 2011, NSF grant proposals must include a data management plan to describe “how the proposal will conform to NSF policy on the dissemination and sharing of research results.”  Data management plans will be reviewed with the grant application by program officers and peers, and implementation (or lack thereof) may influence subsequent award decisions.

The revised Grant Proposal Guide suggests several items for inclusion in a project’s data management plan:  an inventory of research output the project will create, standards applied for describing and storing the data, policies for sharing, provisions for reuse, and plans for preservation.  This is helpful, but very high-level.

Luckily, the NSF and several Directorates have provided supplementary documents with much more detail on expectations of the NSF in general, and individual Directorates in particular.  The Directorate Guidance documents provide a variety of suggestions (and sometimes requirements), including definitions about what is considered “data”, when the data needs to be made available, and what types of sharing or archive locations are appropriate.  As intended, these guidelines differ between Directorates, reflecting a variety of community norms.

Let’s look at expectations for timeliness of data availability, as a specific example.  The general FAQ states, “the expectation is that all data will be made available after a reasonable length of time,” where “what constitutes a reasonable length of time will be determined by the community of interest through the process of peer review and program management.”  The FAQ further suggests that one reasonable standard is to make data accessible immediately upon study publication.  The ENG (Engineering) guidance recommendation mirrors this.  The expectation of the OCE (Ocean Sciences) is different:  data should be submitted as soon as possible, but no later than two years after collection, with more stringent requirements for some programs.  Using yet a different milestone, the SES (Social and Economic Sciences) suggests that quantitative social and economic datasets be submitted within one year of the expiration of the grant award.  These concrete expectations will clearly assist investigators writing data management plans, and provide a common ground for reviewers.

In several places, the documents explicitly mention that what constitutes an acceptable plan is expected to evolve, as standards, technologies, resources, and community norms change over time.

Nicely done, NSF.

Note:  The Directorate for Biological Sciences has not issued a guidance as of this writing.

Update: The guidance from the Directorate for Biological Sciences was issued June 15, 2011.

For more information:

January 2011 Policy

Commentary and related documents