As a non-profit repository dependent on support from members and users, Dryad is greatly concerned with the economics and sustainability of data services. Our business model is built around Data Publishing Charges (DPCs), designed to recover the basic costs of curating and preserving data. Dryad DPCs can be covered in 3 ways:

  1. The DPC is waived if the submitter is based in a country classified by the World Bank as a low-income or lower-middle-income economy.
  2. For many journals, the society or publisher will sponsor the DPC on behalf of their authors (to see whether this applies, look up your journal).
  3. In the absence of a waiver or a sponsor, the DPC is US$120, payable by the submitter.
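
The three routes above amount to a short decision procedure. The sketch below illustrates it in Python; it is not Dryad's actual billing logic. The `Submission` fields, the function name, and the order in which the waiver and sponsorship checks run are assumptions made for this example. Only the three routes and the US$120 amount come from the text.

```python
from dataclasses import dataclass

DPC_USD = 120  # base Data Publishing Charge stated in the post

# World Bank income groups that qualify for a waiver, per the post.
WAIVER_INCOME_GROUPS = {"low", "lower-middle"}


@dataclass
class Submission:
    # Hypothetical fields, introduced only for this illustration.
    country_income_group: str  # e.g. "low", "lower-middle", "upper-middle", "high"
    journal_sponsors_dpc: bool


def dpc_outcome(sub):
    """Return (who covers the DPC, amount in USD owed by the submitter)."""
    # Route 1: waiver based on the submitter's country classification.
    if sub.country_income_group in WAIVER_INCOME_GROUPS:
        return ("waived", 0)
    # Route 2: the journal's society or publisher sponsors the charge.
    if sub.journal_sponsors_dpc:
        return ("sponsor", 0)
    # Route 3: no waiver and no sponsor, so the submitter pays.
    return ("submitter", DPC_USD)
```

For instance, `dpc_outcome(Submission("high", False))` falls through to the third route, which is the case the survey described below focuses on.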

Our long-term aim is to increase sponsorships and reduce the financial responsibility of individual researchers.

Last year, we launched a pilot study sponsored by the US National Science Foundation to test the feasibility of having a funding agency directly sponsor the DPC. We conducted a survey of Dryad submitters as part of the pilot, hoping to learn more about how researchers plan and pay for data archiving.

Initial survey results

We first want to say a hearty THANK YOU to our participants for giving us so much good information to work with! (Ten participants were randomly selected to receive gift cards as a sign of our appreciation.) Respondents were located around the world, with nearly all based at academic institutions.

Survey respondents' positions

A word about selection of survey participants. We know that approximately 1/3 of all Dryad data publications do not have a sponsor or waiver, meaning the researcher is responsible for covering the $120 charge. We wanted to learn more about payment methods and funding sources for these non-sponsored DPCs.

We specifically solicited researchers for our survey who had 1) submitted to Dryad in the previous year and 2) paid their Data Publishing Charge directly (via credit card or voucher code). The survey questions focused on a few topics:

  • Grant funding and Data Management Plans
  • Where the money for their Data Publishing Charges ultimately came from, and
  • Whether funding concerns affect their data archiving behavior.

A few highlights are presented below; we intend to dig deeper into the survey results (and other information gathered as part of the pilot study) and report on them publicly in the coming months.

Planning for data in grant proposals

Nearly 72% of respondents indicated that the research associated with their publication/data was supported by a grant. We wanted to know how (or whether) researchers planned ahead for archiving their data in their grant proposals, and the results were enlightening:

  • 43% did not include a Data Management Plan (DMP) as part of their proposal for funding.
  • Of those who did submit a DMP, only about 46% committed to archiving their data as part of that plan.
  • A whopping 96% said they did not specifically budget for data archiving in their proposal.
  • Only 41% were able to archive their data within the grant funding period, while 59% were unable to, or were unsure.

As these results indicate, data management/stewardship is still not a high priority at the grant proposal stage. Even when researchers plan for data deposition, they don’t consider the costs associated. And even if they do (hypothetically) have funding specifically for data, the timing may not allow them to use it before the grant expires.

These factors suggest that if funding agencies want to prioritize supporting data stewardship, they should make funds available for this purpose outside the traditional grant structure.

Show me the money

When submitters pay the Dryad Data Publishing Charge themselves, where does that money come from? Are submitters being reimbursed? If so, how/by whom?

Our results showed that, unfortunately, about a quarter of our participants paid their DPCs out-of-pocket and did not receive any reimbursement. Approximately the same number paid themselves but were reimbursed (by their institution, a grant, or some combination of these), and 37% of DPCs were paid directly by the institution (using an institutional credit card or voucher code).

How was the Dryad DPC paid?

 

Some respondents view self-funding of data publication as worthwhile:

My belief is that scientific data should be publicly available and I am willing to cover the costs myself if supervisors (grant holders) do not.

As long as the cost is reasonable, in the worse case scenario I pay from my pocket. Better the data are safe and easily accessible for years to come than stored in spurious formats and difficult-to-access servers.

But for many others, covering the payment can be a real pain point:

I paid the processing charge myself mainly because our University’s reimbursement process was so laborious, I felt it easier just to get it over and done with myself and absorb the relatively small cost personally.

I just have to beg and plead for funding support each time.

If I am publishing after the postdoc ends then I am no longer paid to work on the project. Since I have had four postdocs, each lasting less than two years, this has happened for all my publications.

Examples from the “other” payment category shown above illustrate the scrappiness of researchers in finding funding:

I paid this from flexible research funds that were recently awarded by my institution. Had that not occurred, I would have had to pay personally and not be reimbursed.

I used my RTF (research trust fund) since I didn’t have dedicated grant funding.

Scavenged money from other projects.

Key takeaways

Our preliminary results show that at a time of more and stronger open data policies, paying for data publication remains far from straightforward, with much of the burden passed along to individual researchers.

Concerns about funding for open data can have real impacts on research availability and publication choice. More than 15% of our participants indicated that they have collected data in the last few years that they have been unable to archive due to lack of funds. Meanwhile, over 40% say that when choosing which journal(s) to submit to, sponsorship of the Dryad DPC does, or at least may, influence their decision.

The good news is that during our 8-month pilot implementation period, the US National Science Foundation sponsored nearly 200 Data Publishing Charges for which researchers would otherwise have been responsible.

We at Dryad are committed to finding and implementing solutions, and very much appreciate the feedback and support we receive from the research and publishing community. Stay tuned for more lessons learned.

As the new year begins, we take note of the increasing diversity of fields represented in data archived at Dryad and review the numbers for 2016.

Dryad Grows into a General Repository

We are excited to see Dryad’s role in the preservation of data expand into new areas and fields in 2016. Researchers submitted more data involving human subjects and data from social media. In addition, a quick look at our most popular data shows that two of the top five downloaded packages were from the fields of cardiology and science journalism. While Dryad’s origins are in the life sciences, it is increasingly being used as a general repository for data from a myriad of fields.

Let’s take a look at the numbers for 2016:

Increase in Number of Data Packages and Data Files

Our curators were busy! The total number of published data packages (sets of data files associated with a publication) at the end of the year was a whopping 15,325. Our curators meticulously archived 4,307 packages, a 10% increase from 2015. The size of data packages also continued to grow – from an average of 481MB to an average of 573MB, an increase of about 20%.

summary of Dryad data packages 2016

At the end of 2016, we were closing in on 50,000 archived data files; by January of this year, we passed that mark.

In a future blog, we’ll talk about the integration of new journals into the Dryad submission process, new members, and new partnerships. For now, we’ll just note that there was a 22% increase in the number of journals that have data in Dryad linking back to the article.

New Fields

We’ve seen a significant uptick in human subjects data and social media data this year, which has prompted us to develop an FAQ on cleaning and de-identification of human subjects data for public access. As the idea of what data should be preserved continues to broaden, submissions of these kinds of data will only increase. We’ll keep you updated about this trend in future blogs.

Top Downloads

Let’s take a look at the most popular data published in 2016, in terms of downloads. The top 5 downloads include data on plant genetics, the early history of ray-finned fishes, and, not surprisingly in this age, the effects of climate change on boreal forests.

Also of interest are data from an article in Science evaluating how people make use of Sci-Hub, an open source scholarly library. Our guest blog on these data by science journalist John Bohannon generated a lot of interest this year and was one of our most popular blog posts ever.

Another significant development in 2016 came from the medical sciences. A comparison of coronary diagnostic techniques marked Dryad’s first submission from one of the top five cardiology journals, JACC: Cardiovascular Interventions.

The fact that 2 of the 5 top downloads come from fields outside of life sciences clearly indicates that data in Dryad now cover a broad range of fields.

Top 5 Downloads of Data Archived in 2016

Article | Dryad DOI | Number of Downloads
Wagner MR et al. (2016) Host genotype and age shape the leaf and root microbiomes of a wild perennial plant. Nature Communications 7: 12151. | http://doi.org/10.5061/dryad.g60r3 | 3,123
Bohannon J et al. (2016) Who’s downloading pirated papers? Everyone. Science 352(6285): 508-512. | http://doi.org/10.5061/dryad.q447c | 2,969
D’Orangeville L et al. (2016) Northeastern North America as a potential refugium for boreal forests in a warming climate. Science 352(6292): 1452-1455. | http://doi.org/10.5061/dryad.785cv | 741
Johnson NP et al. (2016) Continuum of vasodilator stress from rest to contrast medium to adenosine hyperemia for fractional flow reserve assessment. JACC: Cardiovascular Interventions 9(8): 757-767. | http://doi.org/10.5061/dryad.f76nv | 453
Lu J et al. (2016) The oldest actinopterygian highlights the cryptic early history of the hyperdiverse ray-finned fishes. Current Biology 26(12): 1602–1608. | http://doi.org/10.5061/dryad.t6j72 | 423

Overall, we’ve had a great year and are delighted to be seeing a broader range of data from an increasing number of journals and fields. Thanks to our Board of Directors, members, and of course our staff for providing their support to make 2016 a notable year for Dryad!

We’re beginning a series highlighting researchers who use Dryad to openly publish their research data. We ask them about their current projects, why they believe in open science, and why they choose Dryad.

photo of Zach Gompert

Zach Gompert

For our first researcher profile, we talked with Dr. Zach Gompert, assistant professor in the Department of Biology at Utah State University, about how his work ties in with open science:

Dryad: What is your area of research and what’s your current focus?

Gompert: The overarching goal in my lab is to advance understanding of the extent, organization, causes, and consequences of variation in nature. Some of the issues we are investigating are:

  • What are the evolutionary consequences of hybridization?
  • How does the evolution of novel ecological interactions affect biodiversity?
  • Is temporal variation in natural selection a key determinant of genetic diversity levels in natural populations?

We address these questions through population genomic analyses of natural and experimental populations, and through development of new theory and statistical methods. Our work on Lycaenid butterflies shows that hybridization can be a key creative force in animal evolution and that evolutionary histories are not always well represented by the ‘evolutionary tree’ metaphor. In other words, lineages don’t just split, they come back together.

We have quite a few datasets in Dryad now, including partial genome sequences from over a thousand butterflies.

butterfly in field

Lycaeides melissa

Dryad: What do you think about open science in general? What are advantages of open science? 

Gompert: Science has always been a communal endeavor. Large-scale collaboration is vital now for a number of reasons:

  • Diverse expertise. Many key questions require a diverse group of investigators. This results in big, multifaceted datasets and necessitates rapid sharing of data, methods, and findings.
  • Re-purposing data. It’s common now for data and methods to have applications beyond those that they were originally collected or developed for. Open science allows these to be used by other investigators, accelerating the rate of discovery.
  • Data integrity. Openness ensures a higher level of quality and integrity. When data and methods are available for scrutiny, possible errors are more likely to be identified and corrected. This is particularly relevant for large-scale, multi-investigator projects.
  • Public funding and access. Since much of science is funded by the public, I think scientists have an ethical duty to make the products of research available to everyone.

Dryad: In your opinion, what are disadvantages or concerns about open science?

Gompert: There are two common concerns:

  • Getting scooped. Researchers can be scooped if another group analyzes and publishes the data they generated. While this has some validity, sufficient safeguards and community standards are in place to minimize this problem, and it’s minor compared to the advantages of openness.
  • Poor documentation. I think data archiving is in better shape than it once was, but much of archived data or code are not sufficiently documented to truly be useful to others. Enhancing documentation of data is a big area where we as a community need to do more.

Dryad: You have over 20 datasets archived in Dryad. What do you see as the benefits of data sharing in Dryad?

Gompert: The primary strength of Dryad is its flexibility, specifically the ability to archive diverse types of data (and computer code) in a single location and to link to other more specialized databases such as NCBI. With Dryad, researchers have a central location where they can find all of the data associated with a publication.

We’re coming off of a big month which included a two-day Dryad board meeting, International Data Week in Denver, and the Open Access Publishers meeting (COASP) in Arlington, VA. Combined with Open Access Week, we’ve been basking in all things #openscience at Dryad.

International Data Week 2016

International Data Week was a collection of three different events: SciDataCon 2016, International Data Forum, and the 8th Research Data Alliance Plenary Meeting. While it was my first time attending RDA and SciDataCon, it wasn’t the first time for the many Dryad board members who have been actively participating in these forums for years.

Dryad staff had the pleasure of participating in a few panels over the week. As part of SciDataCon, Elizabeth Hull discussed protecting human subjects in an open data repository. In another panel, as part of the RDA 8th Plenary, I participated in a discussion of the challenges surrounding sustainability of data infrastructure. (The talk is available on the RDA website; the panel starts at minute 30.)

Participating in IDW reminded me how important our diverse community of stakeholders and members is to furthering the adoption of open data. Dryad members create a community and support our mission. Our members benefit by receiving discounts on data publication fees and by relying on a repository that stays current with the evolving needs and mandates that surround open data. We work together to help make open data easy and affordable for authors.

Asking OA publishers to be more open

Following International Data Week, I had the opportunity to participate for the first time in the Open Access Scholarly Publishers Association meeting, COASP 2016. Heather Joseph, Executive Director of SPARC, kicked off the meeting with a keynote that urged attendees to consider how they would complete the phrase “Open in order to . . .” as a way to ensure that we all keep our sights on working toward something more than just ‘open for the sake of open’. Some of the other memorable talks addressed the challenges of mapping connections from articles to other related outputs, and discussed the growing interest in alternative revenue models to article processing charges (APCs). I had the privilege of delivering a keynote entitled “Be More Open”, which highlighted the connections between the Open Access and Open Data movements, and I encouraged OASPA to add open data policies to their membership requirements.

I’d like to thank the organizers and sponsors of International Data Week and COASP 2016 for making these important conversations possible. I would also like to encourage any interested stakeholders to join Dryad and support open data.

We are pleased to have received a Sustaining Award from the U.S. National Science Foundation. Sustaining Awards are an innovative proposal track, developed within NSF’s Advances in Bioinformatics program, that provides “limited support for the cost of ongoing operations and maintenance of existing cyberinfrastructure that is critical for the continued advance of priority biological research.”

The award is to the University of North Carolina at Chapel Hill with Dryad as a subawardee. The grant provides approximately $762K in funding over three years (starting 1-Sep-2016).

From the abstract:

This award will enable Dryad to achieve the scale required for sustainability through continued growth and extension to new research communities. At the same time, it will enable the continued growth of the repository’s valuable collection of diverse and high-quality data for research and education.

The full project description is publicly available and more information about the award is at the NSF Funding Database.

We are grateful to NSF, who have generously supported the Dryad Digital Repository since its inception in 2008, including a recently funded small-scale pilot study to explore direct sponsorship of data publication charges.

 

One of the most rewarding things about working for Dryad is collaborating with talented and passionate professionals from across the globe who are dedicated to increasing the availability of open data. This summer, two new people were officially elected to serve on Dryad’s Board of Directors and we are excited to have them on our governance team.

Jennifer Lin, Director of Product Management at Crossref, comes to us with lots of experience in product development and management, community outreach, scholarly communications, and more. Based in California, USA, Jennifer was instrumental in helping Dryad integrate our data submission system with PLOS journals during her tenure there. She is a data sharing evangelist, and passionate about tools for making data reusable and discoverable. We are thrilled to have her direct her energy and enthusiasm Dryad’s way.

Johan Nilsson is also new to the Dryad board and comes from the Oikos Editorial Office, a society-owned publishing foundation based at Lund University, Sweden. Johan’s past work has been as a research scientist in evolutionary ecology. He has a strong interest in scientific communication and social media engagement and focuses particularly on how the benefits of open science (and open data in particular) can be better expressed to researchers. We value his expertise and perspective on how Dryad can best serve its users.

We would be remiss if we didn’t also publicly welcome Ingrid Dillo, who was appointed to the board early in 2016. Ingrid is deputy director at DANS (Data Archiving and Networked Services). She holds a PhD in history and has a long record of policy development at DANS, the National Library of the Netherlands, and the Dutch Ministry of Education, Culture and Science. She is especially interested in research data management and the certification of trustworthy digital repositories. We are already relying on Ingrid’s expertise and learning from her work with groups like the Research Data Alliance.

Candidates for Dryad’s 12-member Board of Directors are nominated by Member organizations, and four of the Directors are elected or re-elected every year. Once on the Board, Directors serve as individuals rather than organizational representatives. The 12-member rotating Board aims for both diversity of perspective and depth of expertise. We are delighted to have achieved both with our new Directors. We welcome them onboard and wish to extend a heartfelt thanks to Directors past, present, and future for their contributions and dedication to Dryad’s mission.

The question of who should pay for the preservation and stewardship of open research data remains unresolved, at a time when journals and funders alike are adopting strong open data policies. As a non-profit repository that relies on financial support from members and users, we at Dryad deal with this question daily, and are eager to help find new and sustainable solutions.

Along these lines, if you submit your data to Dryad, you will soon notice that we will ask for information about your grant support. That’s because we’re running a pilot project with the US National Science Foundation (NSF) to test the feasibility of having a funding organization directly sponsor Data Publication Charges (DPCs).

During this pilot implementation, if your research was supported by a grant from the US NSF, and your DPC would not otherwise be waived or sponsored by another organization, this grant information can be used to charge the DPC directly to a fund set aside as part of this project.

Entering grant information at data submission is optional. Nonetheless, we encourage researchers to fill out the funding information in order to benefit from NSF funds, to enable awardees to receive credit from their institutions and funders for the open availability and reuse of the data, and to promote the data’s discoverability.

Direct funder sponsorship of data archiving has some significant features:

Researchers also stand to benefit: they have an interest in seeing their data responsibly curated and preserved, even if they publish and archive data after their grant funds have expired. And we are excited by the prospect of increasing the proportion of data packages for which the DPC is sponsored or waived (which is currently just over 2/3).

We aim to work out the details of achieving the goals above, and to evaluate any downsides, as part of the pilot. We will also be surveying researchers to better understand what happens when data is not sponsored by a payment plan. From that, we will be able to develop recommendations for what Dryad, funding organizations, and institutions can do to facilitate the DPC payment process for researchers.

We are grateful to the NSF Advances in Bioinformatics program for the supplemental funding behind this project, and we hope that many researchers will take advantage of the opportunity to have their DPC covered by the NSF funds, which will be available at least through February 2017. Please let me know (at director@datadryad.org) if you have any questions or feedback!