Metadata Game Changers, Stanford University, and Dryad receive NSF Funding to improve metadata quality and connect repositories

Metadata Game Changers, the Center for Expanded Data Annotation and Retrieval (CEDAR) at Stanford University, and Dryad are thrilled to announce their joint National Science Foundation two-year EAGER award focused on increasing the quality of disciplinary metadata and bridging the gap between generalist and disciplinary data repositories. 

Increasing the quality of metadata for past and future datasets requires interoperable, seamless workflows. This team will be working together to pilot approaches that increase the quality of datasets deposited into the Dryad data repository while streamlining the submission processes. Examples of these pilot projects include displaying the right scientific metadata schema for datasets depending on standard fields in Dryad, as well as using metadata fields to flag datasets that would be best fit at a disciplinary repository (and piloting integrations with those repositories).
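As a rough illustration of using metadata fields to flag datasets that may fit better at a disciplinary repository, the sketch below matches dataset keywords against a small rule table. Everything in it is an assumption made for illustration: the rule table, the function name `suggest_repository`, and the substring matching are hypothetical, and the project described here plans to learn such patterns from existing Dryad metadata rather than hand-write them. The two repositories named are ones mentioned elsewhere on this blog as discipline-specific repositories.

```python
from typing import List, Optional

# Hypothetical keyword fragments mapped to a disciplinary repository.
# These rules are illustrative only, not the project's actual logic.
RULES = {
    "oceanograph": "BCO-DMO",
    "arctic": "Arctic Data Center",
}


def suggest_repository(keywords: List[str]) -> Optional[str]:
    """Return a suggested disciplinary repository, or None if no rule matches."""
    for keyword in keywords:
        lowered = keyword.lower()
        for fragment, repository in RULES.items():
            if fragment in lowered:
                return repository
    return None
```

A dataset with no matching keyword returns `None`, which in this sketch simply means it stays in the generalist repository.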

The approach will rely on the ability of the CEDAR technology to acquire and encode standardized metadata for different scientific communities, using established reporting guidelines for different classes of experiments and standard terms for authoring metadata. Much of this work also depends on developing learning algorithms based on the last 40,000+ published datasets in Dryad.

“Existing metadata are an exciting learning set for discerning patterns that can be used to streamline the metadata creation process. We want to ensure that data are being submitted to the best fit repository with the right metadata,” said Ted Habermann.

The goal of these pilot explorations is to understand and build interoperable, open source approaches that repositories can use to improve the quality of published metadata and datasets, and to begin finding ways that disciplinary and general repositories can collaborate to more effectively support the research community.

The work ahead will be led by Principal Investigator Dr. Ted Habermann and the team welcomes feedback or ideas from the community. For more information, follow along at our blogs (Metadata Game Changers, Dryad, and CEDAR) or get in touch at ted@metadatagamechangers.com.

COVID Tracking Project Data Now Available in Dryad

Following on the news of the collaboration between The COVID Tracking Project at the Atlantic (CTP), UCSF, and California Digital Library, Dryad is proud to announce our partnership with CTP to provide an accessible, citable, and long-term home for the data that has guided policy and expanded the capacity of the medical and scientific community to respond to COVID-19. Over the course of the pandemic, CTP tracked and made available national testing data for more than two months when these counts were otherwise not available, along with other factors needed to help the medical community better understand the epidemiology and public health impacts of COVID-19. CTP was the main organization to have compiled data from all the U.S. states, as opposed to using federal or county data. These unique data, reported from every state from March 7th, 2020 to March 7th, 2021, are now available in Dryad.

The Chan Zuckerberg Initiative (CZI), a philanthropy that is leveraging technology, community-driven solutions, and collaboration to build a more inclusive, just, and healthy future, was an early supporter of the CTP.

“In a time of unprecedented uncertainty, the volunteers at the COVID Tracking Project provided consistent, timely, and meaningful data on the direction of the pandemic. CZI was proud to support their comprehensive COVID-19 racial data tracker, and ultimately, their efforts to derive lessons that will help navigate future public health crises,” said Kishore Hari, Community Engagement Strategist at the Chan Zuckerberg Initiative. “We are thrilled to see this project archived at Dryad, UCSF, and California Digital Library, allowing researchers across the world to continue exploring the data and organizational records amassed by this incredible team.”

As the world continues to recognize the importance of data-driven and evidence-based research and policy, it’s become increasingly clear that broad access to data is essential to the advancement of research and clinical practice. Access includes having robust metadata to understand and reuse the data, in accessible file formats, and with the assurance of long-term preservation. 

CZI is a member of the Dryad community, covering the costs of curation and preservation for their grantees and supporting them in following best practices for open data. 

“Open infrastructures for scholarly outputs is a critical component of an open, reproducible, and verifiable scientific ecosystem, and CZI is proud to support our grantees like The COVID Tracking Project with a place to store, preserve, discover, and link research datasets,” said Carly Strasser, Open Science Program Manager at the Chan Zuckerberg Initiative.

Acknowledging shared values as community-driven organizations, CTP and Dryad found great similarities in approaches to making data openly available and reusable. By hosting these data, Dryad enables users around the world to analyze, cite, and build on these COVID counts that otherwise would not have been available during the first year of the pandemic.

The COVID Tracking Project at the Atlantic’s Amanda French said, “We’re publishing COVID Tracking Project data in Dryad for two main reasons: to provide an authoritative citable version of the data we compiled from U.S. states in the first year of the pandemic and to make sure that the data will be available long after our website and GitHub organization have gone away. There are many copies of CTP’s data all around the internet in different versions (some of which probably have different values from each other), so now that our crisis response organization has largely disbanded, it’s important to us to know that a thoroughly vetted version of the critically important U.S. state COVID data that hundreds of concerned citizens helped to compile has a permanent home where researchers in all kinds of fields can find and use it.”

As CTP continues to wind down and further datasets are curated, they will continue to become available in Dryad.

Announcing Dryad & eLife’s seamless data publishing integration

Crossposted at “Inside eLife”

eLife and Dryad have long supported making research publicly accessible and reusable. Over recent years, Dryad has increasingly curated and published datasets supporting eLife publications. As the open science landscape continues to evolve, with a growing emphasis on best practices and making all research components openly available, both organizations recognize that the workflows need to be simplified. Working with eJournalPress, eLife and Dryad are pleased to announce Dryad’s first platform-based integration, allowing authors to deposit datasets to Dryad seamlessly through eLife’s submission process.

As authors submit research to eLife, they will be prompted about data availability during the full submission. Authors are welcome to deposit their data to any suitable disciplinary repository and, if data do not yet have a home, authors will have the opportunity to upload their data to Dryad.

When authors click the button to submit data, relevant metadata from the manuscript submission will auto-populate Dryad’s form. Authors will be able to edit metadata, add additional metadata specific to the dataset, upload data (up to 300GB) to be curated, and upload any software or supplemental information that will be published at Zenodo.

After finishing the dataset submission, authors will be automatically brought back into the eJournalPress platform with the dataset DOI and citation filled in. If authors choose to keep their data private during the peer review process, the access URL will be included here as well. Dryad will check with eLife to learn when the related manuscript has been accepted and then automatically release the dataset to be curated and published.
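The hold-and-release step in this workflow can be pictured as a small state change. The sketch below is illustrative only: the state names, the `Dataset` class, and `release_if_accepted` are hypothetical and are not taken from Dryad’s or eJournalPress’s actual code or API.

```python
from dataclasses import dataclass

# Hypothetical state and status names; the real values exchanged between
# the journal platform and Dryad are not described in this post.
PRIVATE = "private_for_peer_review"
QUEUED = "queued_for_curation"
ACCEPTED = "accepted"


@dataclass
class Dataset:
    doi: str
    state: str = PRIVATE


def release_if_accepted(dataset: Dataset, manuscript_status: str) -> Dataset:
    """Move a held dataset into the curation queue once the manuscript is accepted."""
    if dataset.state == PRIVATE and manuscript_status == ACCEPTED:
        dataset.state = QUEUED
    return dataset
```

Any status other than acceptance leaves the dataset private, which mirrors the behavior described above for data held during peer review.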

Removing barriers to publishing includes removing outstanding costs, and eLife will continue to support its authors in publishing their data by covering the costs of data submissions to Dryad. We are very pleased to better support our joint and growing research communities in making open access article and data publishing workflows more accessible and see this work as an important step towards improving the reusability and reproducibility of research.

Dryad’s Enhanced Features for Data + Related Research Objects

At Dryad we continue to focus on the two key pillars of data publishing: curation of data and seamless, easy workflows. We are committed to designing solutions centered on meeting researchers’ needs for easy and responsible data publication. Understanding that research data is one component of open science, we have prioritized partnerships and integrations that allow Dryad to continue focusing on curated research data while providing support for non-data objects submitted in tandem. In February, we launched the first of our Zenodo integrations, allowing software related to Dryad datasets to be published with proper license and citation. Soon after, we began collaborating with one of our new partners, Open Knowledge Foundation and the Frictionless Data team, to explore automated data quality checks. Thinking through the evolving needs of researchers in publishing their data and other works, as well as our work ahead with Frictionless Data and Zenodo, we recognized the need to modernize and upgrade our submission interface and underlying technologies.

We are excited to share our enhanced upload features at Dryad

Dryad’s new interface, using a React framework, combines all types of uploads on one page, making it clear what types of files researchers would like to submit and which ones will be triaged and published at Zenodo. This allows for Dryad to now accept multiple types of related works for submitted datasets: software and now supplementary information. Because of our deep roots with publishers, we have long accepted supplementary files like figures and non-data that do not require or are not applicable for curation or a CC0 license. We are thrilled to be able to better support these submissions, in addition to code and software that are related to Dryad datasets.

Users are now welcome to upload data, or data plus any combination of software and/or supplementary information

As with our previous release, these related files are queued up at Zenodo, available in our private for peer review access URL, and published in conjunction with the dataset

And the related identifiers are automatically added to the related works on the Dryad landing page as well as further linked up in our metadata that we send to DataCite

These enhancements allow us to immediately support our submitting authors, and they also allow us to build on a more nimble framework for our future feature releases: we are busy developing our integration with Frictionless Data to auto-validate tabular data files submitted to Dryad for curation, and we are gearing up for our summer of journal integrations with eJournalPress and Editorial Manager.
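To give a sense of what auto-validating a tabular file can involve, here is a minimal sketch of structural checks on CSV text. It uses only the Python standard library as a stand-in and does not use or reproduce the Frictionless Data tooling; the function name `check_table` and the specific checks chosen are assumptions made for illustration.

```python
import csv
import io
from typing import List


def check_table(text: str) -> List[str]:
    """Return a list of human-readable structural problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header contains a blank column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")

    # Every data row should have the same number of fields as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems
```

An empty list means none of these particular checks failed; it says nothing about whether the values themselves are correct.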

As always, if you have any feedback or feature ideas that we should consider for supporting best practices in data and software publishing, please get in touch. 

Doing it Right: A Better Approach for Software & Data

The Dryad and Zenodo teams are proud to announce the launch of our first formal integration. As we’ve noted over recent years, we believe that the best way to support the broad scientific community in publishing their outputs is to leverage each other’s strengths and build together. Our plan has always been to find ways to seamlessly connect software publishing and data curation in ways that are both easy enough that the features will be used and beneficial to the researchers re-using and building on scientific discoveries. This month, we’ve released our first set of features to support exactly that.

Uploading to Zenodo Through Dryad

Researchers submitting data for curation and publication at Dryad will now have the option to upload code, scripts, and software packages on a new tab “Upload Software”. Anything uploaded here will be sent directly to Zenodo. Researchers will also have the opportunity to select the proper license for their software, as opposed to Dryad’s CC0 license.

The Dryad upload form now includes an option to upload code files that will be triaged and sent to Zenodo

Those familiar with Dryad may know that Dryad has a feature to keep datasets private during the peer review period, with a double blind download URL that allows for journal offices and collaborators to access the data prior to manuscript acceptance. Zenodo hosted software will be included in this private URL and will be held from the public until the dataset is ready to be published.

Before submitting, researchers are able to preview all uploaded files
Private for Peer Review link allows for auto download of the Dryad data as well as the software files in Zenodo

After curation and publication of the dataset, the Dryad and Zenodo outputs are linked publicly on each landing page and indexed with DataCite metadata. Versioning and updating of either package can happen at any time through the Dryad interface.

Published dataset at Dryad prominently allows researchers to navigate to and download code files from Zenodo
Software package is downloadable at Zenodo, with proper license, and linked to the Dryad dataset

Elevating Software

Throughout our building together, we worked with researchers across scientific disciplines to both test the look and feel of the features but also to understand how data and software are used together. Through conversations with folks at Software Sustainability Institute (SSI), rOpenSci, Research Software Alliance (ReSA), US Research Software Sustainability Institute (URSSI) and leaders in the software citation space, we understood that while researchers may not always think of their R or Python scripts as a piece of software, integrations like this are essential to elevate software as a valued, published, and citable output. 

“This work between the organizations represents a massive win for open science and reproducibility. Besides the lack of incentives to share, a significant source of friction for researchers is the burden of preparing research artifacts for different repositories. By simplifying this process and linking research objects, Dryad and Zenodo are not only making it easier to share code and software, but also dramatically enhancing discoverability and improving data and software citation.”
– Karthik Ram, Director of rOpenSci & URSSI lead

Looking Forward

This release is the first set of features in our path ahead working together to best support our global researcher base. While we are building feature sets around Supporting Information (non-software and non-data files) for journal publishers, we know that this space is evolving quickly and our partnership will respond to both the needs of researchers and the development of best practices from software and data initiatives. We will keep the community apprised of our future developments and we are always looking to expand our reach and iterate on what we’ve built. If you believe there are ways that Dryad and Zenodo can better support research data and software publishing, please get in touch.

Dryad & Zenodo: Our Path Ahead

In July 2019, we were proud to announce a funded partnership between Dryad and Zenodo. Today, we are excited to give an update on our future together.

Dryad and Zenodo have both been leading the way in open-source data, software, and other research outputs publishing for the last decade. While our focus and adoption mechanisms may have been different, we’ve had similar values and goals all along: publish and archive non-traditional research outputs in an open and accessible way that promotes best practices. 

In looking to expand our capacities for sharing data and software, it became clear that we could each benefit from the other’s expertise. Dryad has long focused on research data, curating each dataset published, and working in close coordination with publishers and societies to support journal data policies. Zenodo, based at CERN, builds on strong infrastructure capacity and has focused on software publishing and citation. It was clear that by working together, leveraging each other’s expertise, we could better achieve our goals.

Notably, we believe researchers should have an opportunity to publish curated data, software, and other research outputs at a trusted, open source set of repositories in a seamless way.

At the beginning of February, we brought our two teams together to understand the repository systems and roadmaps, and to map our work ahead. We have broken this work into a couple of segments and will begin with our first project, listed on our GitHub as “DJ D-Zed: Mixing Up Repositories”. In other words, we will be integrating our two systems to lower the barrier for researchers who want to follow best practices publishing their software, data, and supporting information. The first direction of focus is publishing from Dryad to Zenodo.

So, what does it all look like?

This project entails re-imagining the Dryad upload interface to expand the scope of upload to accommodate researchers uploading more than data. Within this interface, through a series of declarations and machine reading, we will triage data, software, and supporting (other) files. Data should be curated and published at Dryad. Software requires a series of different license options, metadata, and other attributes, and supporting files benefit from a previewer, so these files are more appropriately published at Zenodo.
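The triage described here can be sketched as a small routing function. The routing table follows the text (data to Dryad, software and supporting files to Zenodo), but everything else is an assumption for illustration: the `Kind` enum, the extension list standing in for “machine reading”, and the choice to default undeclared files to data are all hypothetical.

```python
import os
from enum import Enum
from typing import Optional


class Kind(Enum):
    DATA = "data"
    SOFTWARE = "software"
    SUPPORTING = "supporting"


# Routing as described in the post.
DESTINATION = {
    Kind.DATA: "Dryad",
    Kind.SOFTWARE: "Zenodo",
    Kind.SUPPORTING: "Zenodo",
}

# Hypothetical extension hints used when the researcher declares nothing.
SOFTWARE_EXTENSIONS = {".py", ".r", ".ipynb", ".sh"}


def triage(filename: str, declared: Optional[Kind] = None) -> str:
    """Return the repository a file should be sent to.

    A researcher's declaration wins; otherwise fall back to the extension
    hint, treating anything unrecognized as data (an assumption).
    """
    if declared is None:
        extension = os.path.splitext(filename)[1].lower()
        declared = Kind.SOFTWARE if extension in SOFTWARE_EXTENSIONS else Kind.DATA
    return DESTINATION[declared]
```

Letting the declaration override the guess keeps the researcher in control, which matters for files such as a PDF figure that no extension rule would classify reliably.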

After curation, once the items are ready to be published, it is essential that we link the works to each other through their DOIs and citations. As Dryad and Zenodo each mint DOIs for published works, it is our responsibility to expose the relationship between the software, data, and other citations so users can find all related work. Having separate citations for software and data will allow for more specific citation practices at journals, in preprints, etc.
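One way to expose the relationship between two DOIs is through reciprocal related-identifier entries in each record’s DataCite metadata. The sketch below builds such entries using DataCite’s `relatedIdentifier`, `relatedIdentifierType`, and `relationType` properties. The choice of the `IsSupplementedBy` / `IsSupplementTo` pair is an assumption, since this post does not say which relation types are used, and the function name `link_records` is hypothetical.

```python
from typing import Dict, List


def link_records(dataset_doi: str, software_doi: str) -> Dict[str, List[Dict[str, str]]]:
    """Build reciprocal related-identifier entries for a dataset and its software."""

    def related(doi: str, relation: str) -> Dict[str, str]:
        return {
            "relatedIdentifier": doi,
            "relatedIdentifierType": "DOI",
            "relationType": relation,
        }

    # Relation types are an illustrative choice, not confirmed by the post.
    return {
        dataset_doi: [related(software_doi, "IsSupplementedBy")],
        software_doi: [related(dataset_doi, "IsSupplementTo")],
    }
```

Recording the link on both sides means a reader who lands on either record can find the other.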

It is essential that we acknowledge the importance of user testing. We have identified our minimum viable product, but the look and feel of this relies on close collaboration with our user experience teams and researcher user testing. This integration can only succeed if researchers find the benefits of using one entry point for two repositories, and are educated along the way about best practices for data and software. We’ll be planning opportunities for feedback at specific milestones, and appreciate comments via email or GitHub comments along the way.

What happens next

Our partnership relies on cross-organization co-development. Our teams have been spending time to understand how Dryad and Zenodo both function to ensure we are building for success for each of our user communities. Our initial user testing is about to ramp up, and we have begun the exploration into backend development to tie our systems closer together. As avid open-source supporters, we will track all of our work publicly on GitHub. Our code and documentation will also be available as new features are released.

User testing our workflows with researchers will help guide our development, but we also need to understand how this work can support Dryad and Zenodo’s larger communities: institutions, libraries, publishers, societies, funding agencies, and others that have a stake in research data and software publishing. We will have regular opportunities for feedback and we hope you will weigh in.

Check out our blogs for updates as well as our Twitter to hear about upcoming meetings we will be presenting at. And if you have feedback, please get in touch with our Product Managers at Dryad and Zenodo, as always.

Welcoming the Chan Zuckerberg Initiative to Our Member Community

We are proud to welcome our first philanthropy to join the Dryad member community: Chan Zuckerberg Initiative (CZI).

By joining Dryad, CZI will cover the cost of curation and preservation for their grantees’ data publications, supporting them in following best practices for open data.

“The discoverability and availability of research data is a critical component of an open, reproducible, and verifiable scientific ecosystem, and Dryad provides essential infrastructure in support of this mission. The Chan Zuckerberg Initiative is pleased to join the Dryad community.”

– Alex Wade, Open Science Program Manager (CZI)

We are thrilled that CZI will be joining publishers and institutional members in lowering the barrier for researchers to publish their curated research data. For more information about our memberships and to join the community, get in touch.

Deep Roots & Strong Branches: A Recap and Preview of Dryad’s Development Plans

Happy 2020! Kicking off the new year, our product development team wanted to take a moment to introduce our development processes and provide a glimpse into Dryad’s future directions. 2019 was an exciting year with our growth of 15% in submissions and the release of our new Dryad. This release was the culmination of a year and a half of work building a new, combined product development team (at Dryad and CDL) and developing new features to support Dryad’s user base. Since then, the work has not stopped. Our team has been working to continually meet user needs and better our services. 

Members of the Product Development Team launching the new Dryad in September 2019 (left to right: Daniella Lowenberg, Ryan Scherle, Marisa Strong, Scott Fisher, Brian Riley)

The Dryad development process

The Dryad product development team follows agile methodologies, working and releasing in two-week sprints. This means we prioritize feature development and bug fixes based on user needs (which are ever evolving). This work is tracked on our public project board here. Feature development also includes working with our user experience team to design interfaces that are both accessible for and understood by our users. Outward-facing features are tested for specific user groups (researchers, curators, members, etc) before development and before each release. At the end of each sprint, we post our release notes covering at a high (and sometimes technical) level what was completed.

This type of development work means that we depend on community feedback to help identify the features necessary for making data publishing as easy as possible and for ensuring that published datasets are usable. There are hundreds of features we would love to build or enhance, and hearing productive feedback from the community helps to guide our development priorities. If you have a feature request, or would like to report a bug, you may log a ticket here. Our product manager regularly grooms the cards and will be in touch with more questions when that work is prioritized.

What we’ve been building

In the last three months, we have been primarily focused on ensuring the new platform can support the growing Dryad community. This means building up a robust, accessible platform and enhancing researcher facing features.

One of Dryad’s key strengths is its high adoption rate. This means that the platform receives heavy traffic loads. To support these loads over the long term and as the user base grows, we have been putting in various reinforcement features like load balancing our servers, improving reliability of our downloads, and actively monitoring/blocking bots as necessary to ensure the site can avoid any downtime.

Our other development work has included addressing accessibility and feature optimization, including:

  • Adjustments to our interface to be a more accessible service for our users
  • Enhancements for the auto-fill features (journal name, institutional affiliations) to reduce lag and better the author submission process
  • Updating our DataCite schema, allowing Dryad to send author institutional affiliations (RORs) to DataCite, which enables better tracking of dataset publications by affiliation and supports consumption by initiatives like FREYA and Make Data Count.
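For readers curious what sending a ROR affiliation to DataCite can look like, the sketch below builds one creator entry carrying an affiliation identifier. The `affiliationIdentifier` and `affiliationIdentifierScheme` properties come from the DataCite metadata schema; the function name `creator_with_ror`, the exact shape of the record Dryad sends, and the placeholder ROR value in the test are assumptions made for illustration.

```python
from typing import Any, Dict

ROR_PREFIX = "https://ror.org/"


def creator_with_ror(name: str, institution: str, ror_id: str) -> Dict[str, Any]:
    """Build a creator entry whose affiliation is identified by a ROR ID."""
    if not ror_id.startswith(ROR_PREFIX):
        raise ValueError("expected a ROR ID of the form https://ror.org/...")
    return {
        "name": name,
        "affiliation": [
            {
                "name": institution,
                "affiliationIdentifier": ror_id,
                "affiliationIdentifierScheme": "ROR",
            }
        ],
    }
```

Because the affiliation is an identifier rather than a free-text name, publications from the same institution can be grouped even when authors spell the institution differently.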

This foundational work is key to strengthen the system and prepare for new feature development work in 2020 and beyond. 

Where we are headed

Continuing to work in our two-week sprints, we will be building essential features for the researchers using Dryad (e.g., integrations, geolocation) as well as more complex functionality for our growing institutional and publisher member communities (e.g., integrations, reporting, data metrics aggregation). We also have embarked on a couple of larger projects that we are excited to share.

  • Zenodo – Dryad Partnership: Following on our announcement in July 2019, we have embarked on a project to integrate Zenodo and Dryad, with a goal to provide researchers with a more seamless data, code, and other materials publishing process. While the initial work has already been scoped, our official kick-off meeting is in a couple of weeks and we will update the community shortly thereafter with our project plans.
  • Editorial Manager & ScholarOne Integrations: Since many Dryad authors publish data in conjunction with an article, we have been building a direct integration with Editorial Manager, a leading journal submission platform. This work will allow for researchers submitting to a journal that uses Editorial Manager to have the option to publish their data at Dryad without actually leaving the Editorial Manager (article submission) system. We look forward to sharing more information about this implementation in the spring. We have also been working to map a similar integration with ScholarOne that will enable thousands of journals to integrate directly with Dryad.

Our open REST APIs are documented and available for use. We have been talking with undergraduate and graduate level students looking for coding projects to build integrations into our platform with R, Python, Jupyter, rOpenSci, and Binder. If you are interested in working with our APIs, get in touch!
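As a starting point for anyone exploring the API from Python, the sketch below builds the URL for a single dataset record. The base URL and the path layout (a percent-encoded `doi:` identifier under `/datasets/`) are assumptions that should be checked against the API documentation before use, and `dataset_url` is a hypothetical helper name. No request is sent; the function only constructs the address.

```python
from urllib.parse import quote

# Assumed base URL for the Dryad API; verify against the API documentation.
API_BASE = "https://datadryad.org/api/v2"


def dataset_url(doi: str) -> str:
    """Build the URL for one dataset record from a bare DOI.

    The identifier is percent-encoded so that ':' and '/' stay inside a
    single path segment.
    """
    identifier = quote("doi:" + doi, safe="")
    return f"{API_BASE}/datasets/{identifier}"
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen` from the standard library.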

We have a busy year ahead and we look forward to working with both researchers and research supporting communities, continuing to make data publishing as seamless as possible. Follow along on our blog and Twitter for further updates.

NSF Workshop Overview: Focusing on Researcher Perspectives

Since its founding, Dryad has hosted a researcher-led, open data publishing community and service. With the California Digital Library partnership in 2018, and reflecting on a decade of Dryad’s existence, we have spent time exploring what it means to remain a community-owned data publishing platform. By convening publishers, institutions, and other scholarly communications stakeholders to discuss the meaning of community-ownership, we have begun to understand how research-supporters see their role in the Dryad community and leadership. But to better understand the meaning of “researcher-led”, we wanted to hear about researchers’ perspectives on community-led open infrastructure. 

With the support of a National Science Foundation Community Meeting grant (award #1839032), we hosted a meeting on October 4th, 2019, with folks from the founding Dryad research communities. Going back to our roots, gathering both researchers that founded Dryad as well as early career researchers in Ecology and Evolutionary Biology, we held a day-long event centered around asking a diverse group of researchers: what does it mean for Dryad to remain researcher-led?

Focusing on research perspectives 

Kicking this off, we found it essential to hear from researchers themselves on how they use data, what their policies are, and their thoughts on how data re-use could be better suited to their use cases. Listening to researchers at different stages of their careers, we could see broad similarities but also meaningful variance: even within the Ecology and Environmental Biology fields, there are very different needs and uses for similar research data.

We explored these dynamics through a series of presentations. Ashley Asmus, a graduate student involved in the DroughtNet and NutNet projects, explained the large amount of data they depend on across 27 countries, which could benefit from a more mature data management infrastructure. Dr. Lizzie Wolkovich introduced her lab’s new data management policy, requiring open sharing of data. And Dr. Karthik Ram explained his perspective on what the data world could learn from the software world in terms of making things as easy as possible, with a bottom-up approach.

Dr. Karthik Ram presenting on his experience working with open source software

Dryad and the disciplinary repository landscape

Before diving into Dryad-specific discussions, we took time to have a large-format discussion with guests from BCO-DMO, a repository for oceanographic data, as well as folks from Arctic Data Center, both National Science Foundation funded discipline-specific repositories. It was evident that researchers do not feel they have proper guidance on which repository to use, even when funders feel this piece is clearly stated. Beyond it being a mandate, it’s important for researchers to submit to these repositories, as discipline-specific repositories typically provide richer curation than multi-disciplinary “general” repositories. A major theme that emerged was how Dryad and others that are embedded in the article publishing processes could ensure submitted data are going to the right home.

Meeting user needs

Splitting the room based on user interests in submitting and publishing data or re-using data in Dryad, we turned the event space walls into post-it note exhibits. Researchers wrote down as many features and use cases as they could think of for either submitting data or using data. Within their groups they then clustered and prioritized these features. Interestingly, the majority of participants chose to focus on data re-use, reflecting the change in open data acceptance amongst the community they represent. Some of the highest priority features in this arena were about integrations and development of software tools that make the curated data more usable. For those focusing on submission, the top-rated features were around crediting back to funders and institutions, as well as relations to the scripts and code used to analyze the data.

Dr. Sally Otto representing the “Publishing Data” group discussion

Researchers clustering and prioritizing data re-use features

Maintaining a researcher-led community and platform

Circling back to the opening question, we prompted the group to think about their perceptions of what it means for researchers to be leading the Dryad community. Many of these perspectives centered around transparency in marketing, true costs, and the added values. A major point was how we can overcome barriers such as a lack of funding to publish data. Researchers raised the point that they may not be able to cover the cost of a data publishing charge, even at a respected US-based institution. Questions of how curation, integration, and open-source values can be inclusive of these communities struggling for funding prompted us to consider how disparate and diverse scientific research may be, even within the same domain. We received innovative ideas related to business models for supporting a broader audience of researchers as well as outreach ideas reflecting the need to integrate deeper within the open-source software community.

Working in conjunction with the open repositories (BCO-DMO, Arctic Data Center) and repository networks (DataONE) present at the workshop, and continuing to be led in the forms of governance and product management by researchers, Dryad and California Digital Library are striving to both understand and promote proper practices for community-ownership in open source data publishing. While this was a one-day event, we aim to continue to engage with broader research communities and encourage any researcher to get in touch with us if you have feedback or ideas for how you can get involved in our community.

CDL and Dryad thank the National Center for Ecological Analysis and Synthesis (NCEAS) for giving us the space to hold this meeting as well as the National Science Foundation for granting meeting funds.