Dryad & Zenodo: Our Path Ahead

In July 2019, we were proud to announce a funded partnership between Dryad and Zenodo. Today, we are excited to give an update on our future together.

Dryad and Zenodo have both led the way in open-source publishing of data, software, and other research outputs for the last decade. While our focus and adoption mechanisms may have differed, we’ve had similar values and goals all along: to publish and archive non-traditional research outputs in an open and accessible way that promotes best practices.

In looking to expand our capacities for sharing data and software, it became clear that we could each benefit from the other’s expertise. Dryad has long focused on research data, curating each dataset published, and working in close coordination with publishers and societies to support journal data policies. Zenodo, based at CERN, builds on strong infrastructure capacity and has focused on software publishing and citation. It was clear that by working together, leveraging each other’s expertise, we could better achieve our goals.

Notably, we believe researchers should have the opportunity to publish curated data, software, and other research outputs seamlessly at a trusted, open-source set of repositories.

At the beginning of February, we brought our two teams together to understand each other’s repository systems and roadmaps and to map our work ahead. We have broken this work into a couple of segments and will begin with our first project, listed on our GitHub as “DJ D-Zed: Mixing Up Repositories”. In other words, we will be integrating our two systems to lower the barrier for researchers who want to follow best practices when publishing their software, data, and supporting information. The first direction of focus is publishing from Dryad to Zenodo.


So, what does it all look like?

This project entails re-imagining the Dryad upload interface, expanding its scope to accommodate researchers uploading more than data. Within this interface, through a series of declarations and machine reading, we will triage data, software, and supporting (other) files. Data should be curated and published at Dryad. Software requires different license options, metadata, and other attributes, and supporting files benefit from a previewer, so both are more appropriately published at Zenodo.
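As a rough illustration only, the triage step could be sketched as follows. The extension lists, the `declared` parameter, and the function names are hypothetical; the post does not describe the actual rules, which may look quite different.

```python
from pathlib import PurePosixPath

# Hypothetical extension lists, chosen only for illustration.
SOFTWARE_EXTENSIONS = {".py", ".r", ".jl", ".m", ".sh", ".ipynb"}
DATA_EXTENSIONS = {".csv", ".tsv", ".txt", ".json", ".xlsx", ".nc"}
CATEGORIES = {"data", "software", "supporting"}


def triage(filename, declared=None):
    """Return 'data', 'software', or 'supporting' for an uploaded file.

    An explicit declaration by the researcher wins; otherwise the file
    extension is used as a simple stand-in for "machine reading".
    """
    if declared in CATEGORIES:
        return declared
    extension = PurePosixPath(filename).suffix.lower()
    if extension in SOFTWARE_EXTENSIONS:
        return "software"
    if extension in DATA_EXTENSIONS:
        return "data"
    return "supporting"


def destination(category):
    """Map a category to a repository: data to Dryad, everything else to Zenodo."""
    return "Dryad" if category == "data" else "Zenodo"
```

For example, `destination(triage("analysis.R"))` routes an R script to Zenodo, while a declaration such as `triage("notes.txt", declared="supporting")` overrides the extension-based guess.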

After curation, once the items are ready to be published, it is essential that we link the works to one another through their DOIs and citations. As Dryad and Zenodo each mint DOIs for published works, it is our responsibility to expose the relationship between the software, data, and other citations so users can find all related work. Having separate citations for software and data will allow for more specific citation practices at journals, in preprints, etc.
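One way such a relationship could be expressed is with reciprocal related-identifier entries in each record's DOI metadata. The sketch below uses the DataCite-style property names `relatedIdentifier`, `relatedIdentifierType`, and `relationType`; the choice of relation types, the function names, and the two placeholder DOIs are assumptions for illustration, not the integration's actual design.

```python
def related_identifier(doi, relation_type):
    """Build one DataCite-style related-identifier entry pointing at a DOI."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def link_records(data_doi, software_doi):
    """Return reciprocal links so each record points at the other.

    The relation types used here are an assumption; other pairs from the
    DataCite vocabulary could be equally appropriate.
    """
    return {
        data_doi: [related_identifier(software_doi, "IsSupplementedBy")],
        software_doi: [related_identifier(data_doi, "IsSupplementTo")],
    }


# Placeholder DOIs (hypothetical, not real records).
links = link_records("10.5061/dryad.example", "10.5281/zenodo.0000000")
```

With both directions recorded, a reader who lands on either the dataset or the software can follow the link to the other.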


It is essential that we acknowledge the importance of user testing. We have identified our minimum viable product, but its look and feel relies on close collaboration with our user experience teams and on user testing with researchers. This integration can only succeed if researchers see the benefits of using one entry point for two repositories and are educated along the way about best practices for data and software. We’ll be planning opportunities for feedback at specific milestones, and we appreciate comments via email or GitHub along the way.

What happens next

Our partnership relies on cross-organization co-development. Our teams have been taking time to understand how Dryad and Zenodo both function, to ensure we are building for success for each of our user communities. Our initial user testing is about to ramp up, and we have begun exploring the backend development needed to tie our systems closer together. As avid open-source supporters, we will track all of our work publicly on GitHub. Our code and documentation will also be available as new features are released.

User testing our workflows with researchers will help guide our development, but we also need to understand how this work can support Dryad and Zenodo’s larger communities: institutions, libraries, publishers, societies, funding agencies, and others that have a stake in research data and software publishing. We will have regular opportunities for feedback and we hope you will weigh in.

Check out our blogs for updates, and follow us on Twitter to hear about upcoming meetings where we will be presenting. If you have feedback, please get in touch, as always, with our Product Managers at Dryad and Zenodo.

 

Deep Roots & Strong Branches: A Recap and Preview of Dryad’s Development Plans

Happy 2020! To kick off the new year, our product development team wanted to take a moment to introduce our development processes and provide a glimpse into Dryad’s future directions. 2019 was an exciting year, with 15% growth in submissions and the release of our new Dryad. This release was the culmination of a year and a half of work building a new, combined product development team (at Dryad and CDL) and developing new features to support Dryad’s user base. Since then, the work has not stopped. Our team has been working continually to meet user needs and improve our services.


Members of the Product Development Team launching the new Dryad in September, 2019 (Left to Right: Daniella Lowenberg, Ryan Scherle, Marisa Strong, Scott Fisher, Brian Riley)

 

The Dryad development process

The Dryad product development team follows agile methodologies, working and releasing in two-week sprints. This means we prioritize feature development and bug fixes based on user needs (which are ever evolving). This work is tracked on our public project board here. Feature development also includes working with our user experience team to design interfaces that are both accessible to and understood by our users. Outward-facing features are tested with specific user groups (researchers, curators, members, etc.) before development and before each release. At the end of each sprint, we post our release notes covering at a high (and sometimes technical) level what was completed.

This type of development work means that we depend on community feedback to help identify the features necessary for making data publishing as easy as possible and for ensuring that published datasets are usable. There are hundreds of features we would love to build or enhance, and hearing productive feedback from the community helps to guide our development priorities. If you have a feature request, or would like to report a bug, you may log a ticket here. Our product manager regularly grooms the cards and will be in touch with more questions when that work is prioritized.

What we’ve been building

In the last three months, we have been primarily focused on ensuring the new platform can support the growing Dryad community. This means building up a robust, accessible platform and enhancing researcher-facing features.

One of Dryad’s key strengths is its high adoption rate. This means that the platform receives heavy traffic loads. To support these loads over the long term and as the user base grows, we have been putting in various reinforcement features like load balancing our servers, improving reliability of our downloads, and actively monitoring/blocking bots as necessary to ensure the site can avoid any downtime.

Our other development work has included addressing accessibility and feature optimization, including:

  • Adjustments to our interface to be a more accessible service for our users
  • Enhancements for the auto-fill features (journal name, institutional affiliations) to reduce lag and better the author submission process
  • Updating our DataCite schema, allowing Dryad to send author institutional affiliations (RORs) to DataCite, enabling better tracking of dataset publications by affiliation and supporting consumption by initiatives like FREYA and Make Data Count.

This foundational work is key to strengthening the system and preparing for new feature development work in 2020 and beyond.

Where we are headed

Continuing to work in our two-week sprints, we will be building essential features for the researchers using Dryad (e.g., integrations, geolocation) as well as more complex functionality for our growing institutional and publisher member communities (e.g., integrations, reporting, data metrics aggregation). We also have embarked on a couple of larger projects that we are excited to share.

  • Zenodo – Dryad Partnership: Following on our announcement in July, 2019, we have embarked on a project to integrate Zenodo and Dryad, with a goal to provide researchers with a more seamless data, code, and other materials publishing process. While the initial work has already been scoped, our official kick-off meeting is in a couple of weeks and we will update the community shortly thereafter with our project plans.
  • Editorial Manager & ScholarOne Integrations: Since many Dryad authors publish data in conjunction with an article, we have been building a direct integration with Editorial Manager, a leading journal submission platform. This work will allow for researchers submitting to a journal that uses Editorial Manager to have the option to publish their data at Dryad without actually leaving the Editorial Manager (article submission) system. We look forward to sharing more information about this implementation in the spring. We have also been working to map a similar integration with ScholarOne that will enable thousands of journals to integrate directly with Dryad.

Our open REST APIs are documented and available for use. We have been talking with undergraduate and graduate level students looking for coding projects to build integrations into our platform with R, Python, Jupyter, rOpenSci, and Binder. If you are interested in working with our APIs, get in touch!

We have a busy year ahead and we look forward to working with both researchers and research-supporting communities, continuing to make data publishing as seamless as possible. Follow along on our blog and Twitter for further updates.

 

New Dryad is Here

The Dryad team has worked over the past year to understand what features are required to best support the research community’s ever-evolving needs. We are proud to announce the launch of our new Dryad platform and we are excited to share with the research community the enhancements that we have made!  

Dryad’s newest features are centered around making data publishing as easy as possible for researchers:

  • In addition to supporting datasets as part of a journal submission, Dryad now also supports datasets being submitted independently
  • Data can be uploaded from cloud storage or lab servers 
  • Datasets can be as large as 300GB
  • Datasets can easily be updated or versioned at any time in our process
  • Standardized data usage and citation statistics are updated and displayed for each published dataset 
  • Data can be submitted and downloaded through our new REST APIs

Since our beginning, Dryad has curated, published, and archived nearly 30,000 datasets underlying scholarly articles. While Dryad began and flourished in the ecology and evolutionary biology communities, it now encompasses the life and biomedical sciences and is gaining larger traction in the broader science and publishing landscapes.  As Dryad expands its disciplinary scope, we are taking into consideration the evolution of data management and publication practices throughout the sciences. 

New features to support current research practices

Because of these changing needs, we believe it is essential to allow for datasets to be submitted and published at any point in the research process. We see the need for datasets to be submitted or published at the point of preprints, micropublications, project completion, null result findings, or in preparation for submitting a manuscript (to name a few). 


We also understand that data and research are dynamic, so it is important to support versioning and enhanced descriptor fields for these datasets throughout the research process. As part of the new Dryad, we have expanded the fields for usage notes, methods, and standard vocabularies (e.g., funder), and increased file size limits, all to enable consistent updating and improvement.


We believe these changes can allow researchers to make their datasets as usable and understandable (FAIR) as possible, treating each dataset as a citable and valued research output. 

“We aim to build in best practices for research data so researchers don’t have to think about compliance and making their data discoverable.”

-Daniella Lowenberg, Dryad Product Manager

 

The road ahead

Dryad has long been embedded in the scientific communities through support by and for the outputs of funding bodies (e.g., National Science Foundation, National Institutes of Health, European Commission, private funding agencies, etc.) and journal publishers and societies (e.g., The Royal Society, British Ecological Society, American Association for the Advancement of Science). Going forward, Dryad aims to further build these connections through institutional memberships. We welcome our newest institutional members, and look forward to growing our membership, to support the costs and data publishing needs of their researchers.

“The features and new capabilities encompassed in our new platform reflect Dryad’s long standing commitment to working with the data sharing community to build a premier data repository service that reflects the evolving needs of researchers, their funders and institutions. I look forward to welcoming new members to this growing global community.”

– Caroline Sutton, Dryad Board Chair

 

The new Dryad platform is just the beginning for our roadmap to make data publishing both robust and seamless. We have already started building integrations with publishing platforms such as Editorial Manager, ScholarOne, and PubSweet that will enable more journals to integrate with Dryad at the point of article submission. 

We will also be working with data analysis and computing spaces like Jupyter, Binder, WholeTale, and rOpenSci to allow published datasets to be usable within researcher workspaces. To further ensure discoverability of datasets, we are also working with PubMed to allow Dryad datasets related to articles to be searchable. As we announced earlier this year, we will be building on our partnership with Zenodo to make software and data publishing a more connected and easier process for both researchers and publishers.

Get involved

The road ahead is exciting and will take us closer to our goal of supporting researchers and making data publishing easier. We invite you to share user feedback and to discuss potential integrations with us.

Check out our GitHub, follow us on Twitter, and get in touch.

 

Funded Partnership Brings Dryad and Zenodo Closer

By Daniella Lowenberg (Cross posted at Zenodo)

With increasing mandates and initiatives around open data and software, researchers commonly have to make a choice about where to deposit their non-article outputs. Unfortunately, systems that are built to accommodate these objects work separately and can make the process more difficult. As a result, data, code, figures, and other outputs go to a variety of disconnected places or improper homes (e.g., code with the wrong license or data that is not curated). To tackle this issue, and make open research best practices more seamless for researchers, we are thrilled to announce a partnership between Dryad and Zenodo.

Dryad is a leader in data curation and data publishing. For the last ten years, Dryad has focused primarily on research data, supporting a CC0 license and manually curating each incoming dataset. Zenodo, a general-use repository hosted at CERN, has been paving the way in software citation and publishing. As long-time players in the open science movement, we believe that we can advance open science and open-source projects further by working together. Instead of working individually to broaden each of our scopes, building competitive features, and inefficiently using our limited resources, Dryad and Zenodo will be working together to support more seamless workflows that make the process easier for researchers.

To jumpstart this collaboration, we are proud to have been awarded an Alfred P. Sloan Foundation grant that will enable us to co-develop new solutions focused on supporting researcher and publisher workflows as well as best practices in data and software curation. By focusing on integrations between our systems, leveraging data and software expertise, we can both extend the reach of our services and open up more opportunities for broader research communities.  We are looking forward to re-imagining the submission process for researchers and how we can better support our journal publishing and institutional communities along the way.

Our leadership teams are dedicated to the future of our co-development projects. “Dryad has long admired the work Zenodo does in our shared space and we are thrilled to finally find a way to collaborate on a project that benefits researchers around the globe. The Dryad-Zenodo integration is an excellent example of how two like-minded organizations can join together in a shared vision,” says Melissanne Scheld, Executive Director at Dryad. 

“Dryad and Zenodo have always shared the same Open Science values, this is why we are very excited to partner up with such a talented team and bring the future of scientific publication one step closer to reality. We look forward to this inspiring collaboration with Dryad as well as helping the research community to move science forward,” says Jose Benito Gonzalez, Head of Digital Repositories at CERN/Zenodo.

As we embark on this open-source project and partnership together, we invite community feedback and input.  

The way forward at Dryad


Melissanne Scheld, Executive Director, takes time to reflect on the Dryad/CDL partnership and to share thoughts on the direction of this collaborative effort.

It’s been a fast two months since I joined Dryad at this pivotal and exciting juncture. As previously announced, this spring Dryad entered into a formal partnership with California Digital Library (CDL) to ensure long-term sustainability for Dryad and to reinforce two essential, shared goals:

  1. Create sustainability for open-source, community-owned, data curation & publication infrastructure
  2. Drive adoption of curated data publishing in the research community.

Where we are

For the past decade, Dryad has served as a highly regarded, non-profit, curated repository for research data across disciplines. None of that is changing!

Going forward we need to better meet researchers within their own workflows. We need to make the action of submitting research data even easier so that it becomes a seamless step within the publishing process.

We are currently working to migrate the Dryad system onto CDL’s Dash platform. Using an Agile framework, developers from both Dryad and CDL are collaborating to build an open-source, nimble service that will offer a higher level of administrative functionality, an improved curation layer, and various submission options.

Where we’re going


Researchers will find our new offering continues to meet funder requirements and sets the bar in best practices for data sharing. Using the FAIR data principles as a guide, the curation we perform on each dataset deposited eases findability and usability, while the new levels of enhanced integrations we plan to develop (more on this below) will further improve submitters’ workflows.

For institutions, we want to offer an infrastructure that supports local research data management through features including campus single sign-on, bespoke reporting, integration with local repositories, and campus co-branding. The global network of libraries, which CDL is part of, will help us reach a wider range of institutions that are also looking for data management solutions.

Dryad has always had strong publisher support; our new offering will improve these partnerships through enhanced API integrations. Going forward we will build upon our publishing partners while also working with platform providers to develop direct integrations. This will provide a more automated submission process around the transmission of metadata and DOIs.

We want to build modular infrastructure that is future-proof. We should be thinking about data publishing both as its own entity and in conjunction with article publishing. There are many avenues for circulating research, and data publishing should be a part of all of them. Publishing data should be as ‘easy’ and ‘standardized’ as article publishing.

Along with more robust infrastructure, we need to rethink how we build Dryad’s sustainability.  As a small, lean, non-profit, we need to build financial models that don’t overburden any single segment of our community, but still allow us to support the high level of curation and preservation infrastructure for which Dryad is known.

We are currently market testing new models within our community and have been talking with institutions and publishers to hear how we can best support their data publishing needs and what shared costs might look like. We know that there has been a lot of talk lately in our wider community about membership models; early feedback from our partners indicates this is still the most favorable method for investing in long-term sustainability.

What will success look like for us?  

The Dryad/CDL partnership aims to create a self-sustaining, curated, digital data repository for researchers across all fields of inquiry, based on the needs of and supported by institutional and publisher community members. We are building from a strong foundation, have created a thoughtful roadmap through community feedback, and are confident we are on a pathway to sustainability.

Personally, I’m very excited about all of these changes and know that, in partnership with CDL, we will be able to better serve our community. I look forward to updating you on future developments, but in the meantime, please don’t hesitate to reach out to me at director@datadryad.org with any questions or comments.

Technical update — Schema.org and Google Dataset Search


Image by Pete

A core part of Dryad’s mission is to make our data available as widely as possible. Although most users find Dryad content through our website or via links from journal articles, many users also find Dryad content through search aggregators and other third-party services. For our content to be available to these external services, we follow the FAIR principle of Interoperability and make metadata available through a number of machine-readable mechanisms, including OAI-PMH, the DataONE API, and RSS.

This year, we added support for a new machine-readable mechanism, the Schema.org metadata format. This format was originally developed by representatives of major search engines, including Google, Bing, and Yahoo. It has recently been endorsed by a number of data repositories, including Dryad. The Schema.org metadata format allows us to embed machine-readable descriptions of data directly into the same web pages that users use to view Dryad content.

For example, for this recently deposited data package, you can visit the web page to view information optimized for human users. But if you use your web browser’s option to “view source” on the page, you will find the following metadata embedded in the Schema.org format:

{
    "@context" : "http://schema.org/",
    "@type" : "Dataset",
    "@id" : "https://doi.org/10.5061/dryad.70d46",
    "name" : "Data from: Biodiverse cities: the nursery industry, homeowners, and neighborhood differences drive urban tree composition",
    "author" : [ {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-2649-9159",
        "givenName" : "Meghan",
        "familyName" : "Avolio"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0001-7209-514X",
        "givenName" : "Diane",
        "familyName" : "Pataki"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-5215-4947",
        "givenName" : "Tara",
        "familyName" : "Trammell"
    }, {
        "@type" : "Person",
        "givenName" : "Joanna",
        "familyName" : "Endter-Wada"
    } ],
    "datePublished" : "2017-12-18",
    "description" : "In arid and semi-arid regions, where few if any trees are native, city trees are largely human-planted. Societal factors such as resident preferences for tree traits, nursery offerings, and neighborhood characteristics are potentially key drivers of urban tree community composition and diversity....",
    "keywords" : [ "urban tree diversity" ],
    "citation" : {
        "@type" : "Article",
        "identifier" : "doi:10.1002/ecm.1290"},
    "publisher" : {
        "@type" : "Organization",
        "name" : "Dryad Digital Repository",
        "url" : "https://datadryad.org"}
}

The Schema.org metadata is available for any search engines or other interested users to collect and use. Last week, we saw the first major use of this metadata, with the launch of the Google Dataset Search service. Although Google Dataset Search is still in beta, the initial version is promising. It is easy to search and find content from Dryad and other data repositories all within a single system.
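For readers who want to collect this metadata themselves, here is a minimal sketch using only the Python standard library. It assumes the metadata is embedded in a `<script type="application/ld+json">` element, which is the usual convention for JSON-LD but is not stated above. The class and function names are hypothetical, and the sample page is a trimmed stand-in rather than a real Dryad page.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_datasets(html):
    """Return all embedded Schema.org records whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return [
        record for record in parser.records
        if isinstance(record, dict) and record.get("@type") == "Dataset"
    ]


# Trimmed stand-in for a landing page (assumed markup, not copied from Dryad).
sample_page = """<html><head><script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Dataset",
 "@id": "https://doi.org/10.5061/dryad.70d46",
 "datePublished": "2017-12-18"}
</script></head><body></body></html>"""

datasets = extract_datasets(sample_page)
```

Each returned record is a plain dictionary, so fields such as `"@id"` or `"datePublished"` can be read directly.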

We are proud to make Dryad content available through the Dataset Search, and we look forward to other organizations making use of our data in new and exciting ways!

Improvements in data-article linking

Dryad is a curated, non-profit, general-purpose repository specifically for data underlying scientific and medical publications, mainly journal articles. As such, we place great importance on linking data packages to the articles with which they are associated, and we try our best to encourage authors and journals to link back to the Dryad data from the article, ideally in the form of a reference in the works cited section. (There’s still a long way to go in this latter effort; see this study from 2016 for evidence).

Submission integration provides closer coordination between Dryad and journals throughout the publishing workflow, and simplifies the data submission process for authors. We’ve already implemented this free service with 120 journals. If you’re interested in integrating your journal, please contact us.

We’re excited to share a few recent updates that are helping to make our data-article linkages more efficient, discoverable, and re-usable by other publishers/systems.

The Automated Publication Updater

One of the greatest housekeeping challenges for our curation team lies in finding out when the articles associated with Dryad data packages become available online. Once they do, we want to add the article citation and DOI link to our record as quickly as possible, and to release any data embargoes placed “until the article appears.” Historically, we’ve achieved this through a laborious patchwork of web searches, journal alert emails, and notifications from authors or editors themselves.

But over the past year or so, we’ve built and refined a webapp that we call the APU (or Automated Publication Updater). This super-handy tool essentially compares data packages in the Dryad workflow with publication metadata available at Crossref. When a good match is found, it automatically updates article-related fields in the Dryad record, and then sends our curation team an email alert so that they can validate the match and finalize the record. The webapp can be easily run by curators as often as needed (usually a few times a week).

While the APU doesn’t find everything, it has dramatically improved both the efficiency with which we add article information and links to Dryad records and our curators’ happiness levels. Big win. (If you’re interested in the technical details, you can find them on our wiki).
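To give a flavor of the matching idea, here is a small sketch that compares a data package title against candidate article titles. It is not the APU's actual algorithm. The similarity measure (`difflib.SequenceMatcher`), the 0.9 threshold, and the shape of the candidate records are all assumptions, and no Crossref request is made.

```python
from difflib import SequenceMatcher

DATA_PREFIX = "data from:"


def normalize(title):
    """Lowercase, drop a leading 'Data from:' prefix, and collapse whitespace."""
    title = title.strip().lower()
    if title.startswith(DATA_PREFIX):
        title = title[len(DATA_PREFIX):]
    return " ".join(title.split())


def best_match(package_title, candidates, threshold=0.9):
    """Return (candidate, score) for the closest title at or above threshold, else None."""
    target = normalize(package_title)
    best = None
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate["title"])).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (candidate, score)
    return best


# Candidate records in a simplified, hypothetical shape.
candidates = [
    {"title": "Biodiverse cities: the nursery industry, homeowners, and "
              "neighborhood differences drive urban tree composition",
     "doi": "10.1002/ecm.1290"},
    {"title": "An unrelated article about river sediments",
     "doi": "10.0000/example"},  # placeholder DOI
]

match = best_match(
    "Data from: Biodiverse cities: the nursery industry, homeowners, and "
    "neighborhood differences drive urban tree composition",
    candidates,
)
```

A match above the threshold would still go to a curator for validation, as described above; the threshold only decides which candidates are worth a human look.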

Scholix

Dryad is also pleased to be a contributor to Scholix, or Scholarly Link Exchange, an initiative of the Research Data Alliance (RDA) and the World Data System (WDS). Scholix is a high-level interoperability framework for exchanging information about the links between scholarly literature and data.

  • The problem: Many disconnected sources of scholarly output, with different practices including various persistent identifier (PID) systems, ways of referencing data, and timing of citing data.
  • The Scholix solution: A standard set of guidelines for exposing and consuming data-article links, using a system of hubs.

Here’s how it works:

  1. As a DataCite member repository, Dryad provides our data-publication links to DataCite, one of the Scholix Hubs. 
  2. Those links are made available via Scholix aggregators such as the DLI service
  3. Publishers can then query the DLI to find datasets related to their journal articles, and generate/display a link back to Dryad, driving web traffic to us, increasing data re-use, and facilitating research discovery.
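The three steps above can be pictured with a toy, in-memory model of data-article links. The field names below are simplified and hypothetical, not the Scholix schema, and the lookup function stands in for a query against an aggregator rather than calling any real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_id: str     # identifier of the dataset
    target_id: str     # identifier of the article
    relationship: str  # how the source relates to the target (assumed value)
    provider: str      # who asserted the link


def datasets_for_article(links, article_doi):
    """Return dataset identifiers linked to an article (case-insensitive DOI match)."""
    wanted = article_doi.lower()
    return [link.source_id for link in links if link.target_id.lower() == wanted]


# One link built from the dataset and article DOIs shown earlier in this post.
links = [
    Link(
        source_id="10.5061/dryad.70d46",
        target_id="10.1002/ecm.1290",
        relationship="IsSupplementTo",
        provider="Dryad",
    ),
]

found = datasets_for_article(links, "10.1002/ECM.1290")
```

A publisher could use the returned identifiers to display a link from the article page back to the dataset.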

Crossref publishers, DataCite repositories/data centers, and institutional repositories can all participate — information on how is available on the Scholix website.

Programmatic data access by ISSN

Did you know that content in Dryad is available via a variety of APIs (Application Programming Interfaces)? Details are available at the “Data Access” page on our wiki.

The newest addition to this list is the ability to access Dryad data packages via journal ISSN. So, for example, if you wanted access to all Dryad content associated with the journal Evolution Letters, you would format your query as follows:

https://datadryad.org/api/v1/journals/2056-3744/packages

If you’re a human instead of a machine, you might prefer to visit our “journal page” for Evolution Letters:

https://datadryad.org/journal/2056-3744
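If you are scripting these lookups, a small helper can build both URLs from an ISSN. The sketch below follows the URL patterns shown above; the function names are hypothetical, the ISSN check-digit validation is an added convenience rather than something the API requires, and no network request is made.

```python
import re

API_BASE = "https://datadryad.org/api/v1"
SITE_BASE = "https://datadryad.org"


def is_valid_issn(issn):
    """Check the NNNN-NNNC format and the ISSN check digit."""
    if not re.fullmatch(r"\d{4}-\d{3}[\dXx]", issn):
        return False
    digits = issn.replace("-", "")
    # Weights 8 down to 2 are applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7].upper() == expected


def packages_url(issn):
    """URL listing the data packages associated with a journal."""
    if not is_valid_issn(issn):
        raise ValueError(f"not a valid ISSN: {issn!r}")
    return f"{API_BASE}/journals/{issn}/packages"


def journal_page_url(issn):
    """URL of the human-readable journal page."""
    if not is_valid_issn(issn):
        raise ValueError(f"not a valid ISSN: {issn!r}")
    return f"{SITE_BASE}/journal/{issn}"
```

Calling `packages_url("2056-3744")` reproduces the query shown above, and a mistyped ISSN raises `ValueError` before any request would be sent.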

————

Dryad is committed to values of openness, collaboration, standardization, seamless integration, reduction of duplication and effort, and increased visibility of research products (okay, data especially). The above examples are just some of the ways we’re working in this direction.

If you’re part of an organization that shares these values, please contact us to find out how you can be part of Dryad.