Community update, July 2022

Here are a few updates from Dryad that we hope folks will find of interest. If you have any questions or comments, please don’t hesitate to contact us, via hello [at] datadryad [dot] org.


First up: Our long-time Product Manager Daniella Lowenberg is taking on an exciting new appointment at the U.S. Department of Health and Human Services as Senior Advisor for Data Governance and will be leaving Dryad later this month. In her new role, Daniella will lead the development of strategy and systems to support public and restricted access to human services data. For more details about her assignment, please check out the University of California announcement here. While we’re sorry to lose her, we’re also so excited for Daniella to take on this amazing opportunity.

It’s difficult to overstate the extent of Daniella’s contributions across our organisation; I know she’s been a valuable, responsive, and knowledgeable collaborator for many of you, as well as a driving force behind data publishing and standards-setting over the last several years. Her loss will be felt deeply – as deeply as her energy, character and expertise have infused Dryad since 2018 and will energise us as we move forward. Please join me in congratulating Daniella on her fabulous new role. She can be reached via Daniella.Lowenberg [at] ucop [dot] edu.

Daniella leaves us in style – posting just last week the outcomes of her collaboration with data scientist and ecologist Karthik Ram and plans to make Dryad more data science friendly. We’ll be improving data quality at submission, considering substantial changes to the interface, and exploring feature sets around file manifests, tabular file previews, rendered READMEs, README templates, and much more.  

Welcome to new team & community members

Many of you will now have met Mark Kurtz, Dryad Head of Business Operations, who joined us in March. Not one for fanfare, Mark didn’t want us to press-release his joining the team, but we must say how thrilled we are to have him on board and what a difference it’s made to have such a skilled and experienced operator on hand. You can learn a little more about Mark on our team page. We’re soon to be joined by a new Senior Full Stack Developer and a Head of Partnership Development (for which we’re still inviting applications). We hope to announce all the new members of the team (including Mark!) in the Autumn. 

Dryad is also pleased to welcome a number of new members to our growing community: the Australian Wine Research Institute (AUS); Hindawi (UK); Northwestern University (USA); Rockefeller University (USA); University of Rochester (USA); and the U.S. Fish & Wildlife Service (USA).

NIH GREI Initiative

At the beginning of the year, the U.S. National Institutes of Health (NIH) Office of Data Science Strategy announced the Generalist Repository Ecosystem Initiative (GREI), which we are pleased to be a part of. We’re working with five other generalist repositories “to establish consistent metadata, develop use cases for data sharing, train and educate researchers on FAIR data and the importance of data sharing” and look forward to working closely with NIH in preparation for the roll-out of the updated NIH Data Management and Sharing Policy in 2023.

Bits and bobs

Catching up after COVID, we’ve now released our Annual Report for the fiscal year 2021 (FY21) (summer 2020 to summer 2021). FY22 is coming soon.

And, finally, at a Database Sustainability Symposium hosted by Phoenix Bioinformatics in March, Jen spoke about Dryad’s community of support, our commitment to the Principles for Open Scholarly Infrastructure (POSI), and what she has come to grasp about our 15-year history. If you’re interested, take a look.

Making Dryad More Data Science Friendly

Daniella Lowenberg & Karthik Ram

As we enter year three of the pandemic, it has become clear that many aspects of our lives have permanently changed. Travel and fieldwork, especially in remote locations, were never easy to begin with. Now, these efforts have become much more challenging to organize and execute, serving as a constant reminder that the data we collect must be carefully curated and reliably made available to future researchers. 

The Dryad team has been making steady improvements to platform infrastructure for many years, kicking off with the CDL partnership and platform re-launch in 2019. Through a series of outdoor meetings during lockdowns, we explored ways to make Dryad even more researcher friendly, especially in the context of data quality and data reuse. The last few years of Dryad integrations have focused heavily on submission, in line with goals to increase awareness and feasibility of publishing data: publisher integrations, integration with Zenodo for software and supplementary information, and tabular data checks with Frictionless Data. These integrations have been powerful and necessary for supporting research data publishing. Now, it’s time to focus that level of investment on researcher reuse of Dryad datasets.

In Q2 of 2022, we carried out a detailed analysis of the Dryad corpus and the API. Dryad hosts more than a million data files across over 48,000 data publications. Tabular data files (csv, tsv, and Excel) make up at least 30% of submitted files (far more are inside compressed files), followed by various image formats and miscellaneous supporting files (scripts, notes, and README files). At least 13% of files are opaque zipped files that contain collections of tabular or FASTA files. Usage instructions were sparse, and README files have historically been poorly structured.
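For readers who want to try a similar breakdown on their own file listings, here is a minimal sketch in Python. It is not the analysis code we ran: the extension sets, the category names, and the helper functions `categorize` and `category_shares` are all assumptions made for illustration, and it only looks at file names, so it cannot see what sits inside a compressed archive.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed extension groups; the post only names csv, tsv, and Excel as tabular.
TABULAR = {".csv", ".tsv", ".xls", ".xlsx"}
COMPRESSED = {".zip", ".gz", ".tar", ".7z"}


def categorize(filename):
    """Map a file name to a coarse category based on its final extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in TABULAR:
        return "tabular"
    if suffix in COMPRESSED:
        return "compressed"
    return "other"


def category_shares(filenames):
    """Return the fraction of files in each category (empty dict if no files)."""
    counts = Counter(categorize(name) for name in filenames)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical file listing, not real Dryad data.
files = ["plots.csv", "Sites.TSV", "raw.zip", "README.md"]
shares = category_shares(files)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.0%}")
```

A name-only count like this will undercount tabular data wherever it is bundled into archives, which is the same caveat that applies to the 30% figure above.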

In 2021 Dryad partnered with the Frictionless project to run data validation across all new submissions. An analysis of 46,823 tabular files revealed that 85% had no obvious validation issues, 10% had problems, and 4% had more serious errors. Dryad continues to run Frictionless validation during the submission process but doesn’t yet enforce compliance before submission.
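To give a feel for the kinds of problems tabular validation looks for, here is a small, hedged sketch using only the Python standard library. It is not the Frictionless library or its API, and it is not what Dryad runs at submission; the function `check_table` and the three checks it performs (blank header labels, duplicate header labels, and blank or ragged rows) are illustrative assumptions.

```python
import csv
import io


def check_table(text):
    """Return a list of (row_number, message) problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [(0, "file is empty")]

    problems = []
    header = rows[0]
    seen = set()
    for label in header:
        name = label.strip()
        if not name:
            problems.append((1, "blank header label"))
        elif name in seen:
            problems.append((1, f"duplicate header label: {name}"))
        seen.add(name)

    # Data rows start at row 2 because the header is row 1.
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append((number, "blank row"))
        elif len(row) != len(header):
            problems.append(
                (number, f"expected {len(header)} cells, found {len(row)}")
            )
    return problems


# Hypothetical sample with a repeated column name and a short row.
sample = "site,count,count\nA,3,4\nB,5\n"
for number, message in check_table(sample):
    print(f"row {number}: {message}")
```

Reporting problems as a list rather than raising on the first one mirrors the idea of flagging issues during submission without blocking it.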

From these results and from listening to various research communities it’s clear that with any data publisher, and especially with Dryad, the value needs to lie in the usability of published datasets. Dryad has put a plan in place to improve data quality at submission, a time when researchers are best equipped to address any problems with their datasets. We have also put a plan in motion to make substantial changes to the API and the interface. In the future, we will explore feature sets around file manifests, tabular file previews, rendered READMEs, README templates, and much more.  

The last decade has proved that researchers can be brought to comply with open data policies en masse: tossing their data over a wall to the repository, including a data availability statement (rarely with a data citation; insert Daniella’s many rants, some of which are available here), and feeling like they’ve met the mandate. But at what point is this useful? It isn’t if the data aren’t being reused, and especially not if the data can’t be reused.

Dryad’s mission remains to advance scientific discovery through curated open data access. To drive this forward, we will focus on feature sets centered on reusability, machine usability, and pluggability. This includes aligning with popular data science tools, educating researchers throughout the submission process with more complex checks and automated tooling for quality, and rethinking how users access and compute with data published in Dryad.

As the adoption of executable notebooks becomes more mainstream in the research community, Dryad is committed to meeting these researchers where they are headed, with a data-science-friendly research repository.