Technical update — Schema.org and Google Dataset Search


Image by Pete

A core part of Dryad’s mission is to make our data available as widely as possible. Although most users find Dryad content through our website or via links from journal articles, many users also find Dryad content through search aggregators and other third-party services. For our content to be available to these external services, we follow the FAIR principle of Interoperability and make metadata available through a number of machine-readable mechanisms, including OAI-PMH, the DataONE API, and RSS.

This year, we added support for a new machine-readable mechanism, the Schema.org metadata format. This format was originally developed by representatives of major search engines, including Google, Bing, and Yahoo. It has recently been endorsed by a number of data repositories, including Dryad. The Schema.org metadata format allows us to embed machine-readable descriptions of data directly into the same web pages that users use to view Dryad content.

For example, for this recently deposited data package, you can visit the web page to view information optimized for human users. But if you use your web browser’s option to “view source” on the page, you will find the following metadata embedded in the Schema.org format:

{
    "@context" : "http://schema.org/",
    "@type" : "Dataset",
    "@id" : "https://doi.org/10.5061/dryad.70d46",
    "name" : "Data from: Biodiverse cities: the nursery industry, homeowners, and neighborhood differences drive urban tree composition",
    "author" : [ {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-2649-9159",
        "givenName" : "Meghan",
        "familyName" : "Avolio"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0001-7209-514X",
        "givenName" : "Diane",
        "familyName" : "Pataki"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-5215-4947",
        "givenName" : "Tara",
        "familyName" : "Trammell"
    }, {
        "@type" : "Person",
        "givenName" : "Joanna",
        "familyName" : "Endter-Wada"
    } ],
    "datePublished" : "2017-12-18",
    "description" : "In arid and semi-arid regions, where few if any trees are native, city trees are largely human-planted. Societal factors such as resident preferences for tree traits, nursery offerings, and neighborhood characteristics are potentially key drivers of urban tree community composition and diversity....",
    "keywords" : [ "urban tree diversity" ],
    "citation" : {
        "@type" : "Article",
        "identifier" : "doi:10.1002/ecm.1290"},
    "publisher" : {
        "@type" : "Organization",
        "name" : "Dryad Digital Repository",
        "url" : "https://datadryad.org"}
}
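As a rough illustration of how a third party might collect this metadata, here is a minimal Python sketch. It assumes the JSON is embedded in a script element of type application/ld+json, which is the usual convention for Schema.org JSON-LD but is not something stated above. The sample page and the names JsonLdExtractor and extract_datasets are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.records.append(json.loads("".join(self._buffer)))


def extract_datasets(html):
    """Return every embedded JSON-LD object whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        record for record in parser.records
        if isinstance(record, dict) and record.get("@type") == "Dataset"
    ]


# Hypothetical page fragment; a real page would be fetched over HTTP first.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Dataset",
 "@id": "https://doi.org/10.5061/dryad.70d46",
 "keywords": ["urban tree diversity"]}
</script>
</head><body>...</body></html>
"""

datasets = extract_datasets(SAMPLE_PAGE)
print(datasets[0]["@id"])  # prints https://doi.org/10.5061/dryad.70d46
```

The sketch uses only the standard library, so an aggregator could run it without extra dependencies. A production harvester would also need to handle malformed JSON and pages with several embedded records.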

The Schema.org metadata is available for any search engines or other interested users to collect and use. Last week, we saw the first major use of this metadata, with the launch of the Google Dataset Search service. Although Google Dataset Search is still in beta, the initial version is promising. It is easy to search and find content from Dryad and other data repositories all within a single system.

We are proud to make Dryad content available through the Dataset Search, and we look forward to other organizations making use of our data in new and exciting ways!

Dryad welcomes Scheld as new Executive Director

Dryad is excited to announce the appointment of Melissanne Scheld as Executive Director.

Melissanne joins as Dryad embarks upon our 10th year of providing open, not-for-profit infrastructure for scholarly data, and as we begin a strategic partnership with California Digital Library (CDL) to address researcher needs by leading an open, community-supported initiative in research data curation and publishing.

We are pleased Melissanne is joining us at this auspicious point in Dryad’s trajectory. With over 25 years of experience working with the academic community, and with her knowledge of the scholarly communications industry, we are confident she will successfully lead Dryad into our second decade as a community-supported provider of open data services.

–Charles Fox, Dryad Board of Directors Chairperson and Professor, University of Kentucky

Melissanne most recently served as Managing Director of Publishers Communication Group, a scholarly publishing consultancy, and has previously held positions at the university presses of Cambridge, New York University, and Columbia.

To welcome our new ED and/or inquire about ways to get involved with Dryad, send an email to director@datadryad.org.

Introducing the Dryad BOD Class of 2021

We are thrilled to announce the latest additions to the Dryad Board of Directors.

Our 12-member Board is intended to be a diverse group, with a mix of backgrounds and skills useful to represent the various stakeholders in the Dryad community: publishers, researchers, technologists, funders, and libraries. BOD members are elected or re-elected each year by the membership to serve 3-year terms.

New members for 2018-2021

The following individuals have assumed their duties:

Wolfram Horstmann has been Director of Göttingen State and University Library since 2014. Prior to that, he was Associate Director at the Bodleian Libraries of the University of Oxford, UK and CIO at Bielefeld University, Germany. He is Professor at the Information School of the Humboldt University in Berlin, teaching Electronic Publishing, Open Access and Open Science. He is a biologist by training and worked on the epistemology of simulations for his doctoral thesis. Read more about Wolfram.

Paolo Mangiafico is the Scholarly Communications Strategist at Duke University and Director of the Scholarly Communication Institute. In his role at Duke, Paolo works with librarians, technologists, faculty, students and university leadership to plan and implement programs that promote greater reach and impact for scholarship in many forms, including open access to publications and data and emerging platforms for publishing digital scholarship.

Caroline Sutton is Director of Editorial Development with Taylor & Francis. Before joining the company in October 2016, she was co-founder of Co-Action Publishing, a full OA publisher. She helped to found and served as the first President of the Open Access Scholarly Publishers Association (OASPA) and is a member of the present board. At Taylor & Francis, Caroline has led efforts to roll out data sharing policies as well as initiatives related to open scholarship across subject areas.

Paul Uhlir, J.D. is a consultant in information policy and management. He was Scholar at the U.S. National Academy of Sciences (NAS) in Washington, DC in 2015-2016, and Director of its Board on Research Data and Information, 2008-2015. He was employed at the NAS in various capacities from 1985 to 2016. Paul has won several prizes from the NAS and the international CODATA in data policy, and is a Fellow of the American Association for the Advancement of Science (AAAS). Read more detailed information about his professional activities.

Officers for 2018-2019

Shout-out to our new slate of Board officers:

  • Charles Fox (Chair)
  • Johan Nilsson (Vice-chair)
  • Brian Hole (continuing as Treasurer)
  • Fiona Murphy (Secretary)

Ex Officio members

Filling out the Board roster are two members in Ex Officio (non-voting) status. We are thankful that Todd Vision, longtime BOD member and PI of grants supporting Dryad, will continue to serve. We also welcome Günter Waibel, Associate Vice Provost and Executive Director of California Digital Library, in this capacity to represent our recently-announced partnership with the CDL.

Finally, we wish to express our sincere appreciation to outgoing BOD members and officers for their work on behalf of Dryad and open data.

Open data tips from the Dryad curation team | Part 2: Endangered species

This is the second in a series of blog posts highlighting new guidance from the Dryad curation team. Part 1 covered human subjects data. Part 2, from curator Shavon Stewart, focuses on best practices for sharing data associated with endangered species.


Ensuring safe data sharing for species under threat

Tasmanian devils, mountain gorillas, and black rhinos all have one thing in common: they are listed as critically endangered on the IUCN Red List of threatened species. Data archived in Dryad are publicly available; therefore, potential risks to endangered and vulnerable species must be carefully assessed before submitting data.

It is imperative that threatened species remain safe in their natural habitat. Publishing location data and habitat descriptions can expose species to hunters, poachers, and wildlife enthusiasts, which can lead to their further decline and hinder conservation efforts. The key is to give those with the intention of doing harm fewer details of the species’ location, without overly compromising analyses or replication by other researchers.

Here at Dryad, we recommend simple actions such as reducing the precision of coordinates by a few decimal places or removing exact geo-coordinates from the dataset, which can limit illegal access to these vulnerable creatures.

Modified geo-coordinates for the breeding sites of Falco naumanni, from https://doi.org/10.5061/dryad.jq87d
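One way to reduce coordinate precision before deposit can be sketched in a few lines of Python. This is only an illustration: the field names lat and lon and the sample site are hypothetical, and the right number of decimal places depends on the species and the analysis. As a rule of thumb, one decimal place of latitude corresponds to roughly 11 km.

```python
def mask_coordinate(value, decimals=1):
    """Round a latitude or longitude to a fixed number of decimal places."""
    return round(value, decimals)


def mask_records(records, decimals=1):
    """Return copies of the records with coarsened coordinates.

    The input list is left untouched so the precise values can stay
    in a private working copy.
    """
    return [
        {
            **record,
            "lat": mask_coordinate(record["lat"], decimals),
            "lon": mask_coordinate(record["lon"], decimals),
        }
        for record in records
    ]


# Hypothetical breeding site; not taken from any real dataset.
sites = [{"site": "A", "lat": 37.123456, "lon": -5.987654}]
masked = mask_records(sites, decimals=1)
print(masked)
```

Any masking of this kind is worth noting in the dataset’s documentation so that other researchers know the coordinates are approximate.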

Researchers who work with vulnerable species are encouraged to consult the following resources prior to submitting data:

Dryad partnering with CDL to accelerate data publishing

Two cheetahs running

Image credit Cat Specialist Group, catsg.org

Dryad is thrilled to announce a strategic partnership with California Digital Library (CDL) to address researcher needs by leading an open, community-supported initiative in research data curation and publishing.

Dryad was founded 10 years ago with the mission of providing open, not-for-profit infrastructure for data underlying the scholarly literature, and the vision of promoting a world where research data is openly available and routinely re-used to create knowledge.

20,000 data publications later, that message has clearly resonated. The Dryad model of embedding data publication within journal workflows has proven highly effective, and combined with our data curation expertise, has made Dryad a name that is both known and trusted in the research community. But a lot has changed in the data publishing space since 2008, and Dryad needs to change with it.

Who/what is CDL?

CDL was founded by the University of California in 1997 to take advantage of emerging technologies that were transforming the way digital information was being published and accessed. Since then, in collaboration with the UC libraries and other partners, they have assembled one of the world’s leading digital research libraries and changed the ways that faculty, students, and researchers discover and access information.

CDL has long-standing interest and experience in research data management (RDM) and data publishing. CDL’s digital curation program, the University of California Curation Center (UC3), provides digital preservation, data curation, and data publishing services, and has a history of coordinating collaborative projects regionally, nationally, and internationally. It is baked into CDL’s strategic vision to build partnerships to better promote and make an impact in the library, open research, and data management spaces (e.g., DMPTool, HathiTrust).

Why a partnership?

CDL and Dryad have a shared mission of increasing the adoption and availability of open data. By joining forces, we can have a much bigger impact. This partnership is focused on combining CDL’s institutional relationships, expertise, and nimble technology with Dryad’s position in the researcher community, curation workflows, and publisher relationships. By working together, we plan to create global efficiencies and minimize needless duplication of effort across institutions, freeing up time and funds, and, in particular, allowing institutions with fewer resources to support research data publishing and ensure data remain open.

Our joint Dryad-CDL initiative will increase adoption of open data by meeting researchers where they already are. We will leverage the strengths of both organizations to offer new products and services and to build broad, sustainable, and productive approaches to data curation. We plan to move quickly to provide new value:

  • For researchers: We will launch a new, modern and easier-to-use platform. This will provide a higher level of service, and even more seamless integration into regular workflows than Dryad currently offers
  • For journals and publishers: We will offer new integration paths that will allow direct communication with manuscript processing systems, better reporting, and more comprehensive curation services
  • For academic institutions: We will work directly with institutions to craft right-sized offerings to meet your needs

We have many details to hammer out and a lot of work to do, but among our first steps will be to reach out to you, each of the groups above, to discuss your needs, wants, and preferred methods of supporting this effort. With your help, the partnership will help us grow Dryad as a globally-accessible, community-led, non-commercial, low-cost service that focuses on breaking down silos between publishing, libraries, and research.

As this partnership is taking shape, we ask for community input on how our collective efforts can best meet the needs of researchers, publishers, and institutions. Please stay tuned for further announcements and information over the coming months. We hope you share our excitement as we step into Dryad’s next chapter.

Dryad and the GDPR

The EU General Data Protection Regulation (GDPR) is a major piece of data privacy legislation coming into effect on May 25, 2018. GDPR will apply to all companies processing the personal data of all European Union residents, regardless of the company’s location.

We’d like to take this opportunity to emphasize that Dryad respects the privacy of our users and submitters and works to protect all personally identifiable information we collect, which is limited to names and contact information. We have an existing privacy policy to which submitters agree when they create a Dryad profile, and again when they submit data. Some steps we will be taking:

  • Reviewing our privacy policy to ensure it conforms to GDPR requirements;
  • Reviewing the parts of our system where submitters provide and maintain personal data, ensuring that there are links to the privacy policy, and that actions to control one’s personal data are clear; and
  • Reviewing the methods we use to communicate with submitters, users, and others to ensure that you do not receive unwanted emails.

If you are concerned about the accuracy of personally identifiable information maintained by Dryad, wish to review, access, or correct this information, or would like your information removed from Dryad’s records, you may contact us anytime at help@datadryad.org.

Dryad to join launch of the Data Curation Network

Alfred P. Sloan Foundation grant will fund implementation of shared staffing model across 7 academic libraries and Dryad

We’re thrilled to announce that Dryad will participate in a three-year, multi-institutional effort to launch the Data Curation Network. The implementation — led by the University of Minnesota Libraries and backed by a $526,438 grant from the Alfred P. Sloan Foundation — builds on previous work to better support researchers faced with a growing number of requirements to openly and ethically share their research data.

The result of many months of research and planning, the project brings together eight partners:

Currently, staff at each of these institutions provide their own data curation services. But because data curation requires a specialized skill set — spanning a wide variety of data types and discipline-specific data formats — institutions cannot reasonably expect to hire an expert in each area.

Curation workflow for the DCN

The intent of the Data Curation Network is to serve as a cross-institutional staffing model that seamlessly connects a network of expert data curators to local datasets and to supplement local curation expertise. The project aims to increase local capacity, strengthen cross-institutional collaboration, and ensure that researchers and institutions ethically and appropriately share data.

Lisa R. Johnston, Principal Investigator for the DCN and Director of the Data Repository for the University of Minnesota (DRUM), explains:

Functionally, the Data Curation Network will serve as the ‘human layer’ in a local data repository stack that provides expert services, incentives for collaboration, normalized curation practices, and professional development training for an emerging data curator community.

For our part, the Dryad curation team is excited to join a collegial network of professionals, to help develop shared procedures and understandings, and to learn from the partners’ experience and expertise (as they may learn from ours).

As an independent, non-profit repository, we are especially pleased to get to work more closely with the academic library community, and hope this project can provide a launchpad for future, international collaborations among organizations with similar missions but differing structures and funding models.

Watch this space for news as the project develops, and follow the DCN on Twitter: #DataCurationNetwork

Open data tips from the Dryad curation team | Part 1: Human subjects

Dryad is a general purpose repository for data underlying scholarly publications. Each new submission we receive is reviewed by our curation team before the data are archived. Our main priority is to ensure compliance with Dryad’s Terms of Service, but we also strongly believe that curation activities add value to your data publication, since curated data are more likely to be FAIR (findable, accessible, interoperable, and reusable).


Before we register a DOI, a member of our curation team will check each data package to ensure that the data files can be opened, that they appear to contain information associated with a scientific publication, and that metadata for the associated publication are technically correct. We prefer common, non-proprietary file types and thorough documentation, and we may reach out if we are unable to view files as provided.

Our curators are also on the lookout for sensitive information such as personally identifiable human subjects data or protected location information, and for files that contain copyright and license statements that are incompatible with our required CC0 waiver.

To make the data archiving process more straightforward for authors, our curation team has authored sets of guidelines that may be consulted when preparing a data submission for a public repository such as Dryad. We hope these guidelines will help you as you prepare your Dryad data package, and that they will lessen the amount of time from point of submission to registered data DOI!

A series of blog posts will highlight each of the guidelines we’ve created. First up are our best practices for sharing human subjects data in an open access repository, from former Dryad curator Rebecca Kameny.

— Erin Clary, Senior Curator – curator@datadryad.org

_______________

Preparing human subject data for open access

Collecting, cleaning, managing, and analyzing your data is one thing, but what happens when you are ready to share your data with other researchers and the public?

Because our researchers come from fields that run the gamut of academia, from biology, ecology, and medicine to engineering, agriculture, and sociology, and because almost any field can make use of data from human subjects, we’ve provided guidance for preparing such data for open access. We based our recommendations and requirements on well-respected national and international sources from government institutions, universities, and peer-reviewed publications.

Dryad curators will review data files for compliance with these recommendations and may make suggestions to authors; however, authors who submit data to Dryad are ultimately responsible for ensuring that their data are properly anonymized and can be shared in a public repository.

In a nutshell, Dryad does not allow any direct identifiers, but we do allow up to three indirect identifiers. Sound simple? It’s not. If the study involves a vulnerable population (such as children or indigenous people), if the number of participants is small, or if the data are sensitive (e.g., HIV status, drug use), three indirect identifiers may be too many. We evaluate each submission on a case-by-case basis.

If you have qualitative data, you’ll want to pay close attention to open-ended text, and may need to replace names with pseudonyms or redact identifiable text.

Quick tips for preparing human subjects data for sharing

  • Ensure that there are no direct identifiers.
  • Remove any nonessential identifying details.
  • Reduce the precision of a variable – e.g., remove day and month from date of birth; use county instead of city; add or subtract a randomly chosen number.
  • Aggregate variables that are potentially revealing, such as age.
  • Restrict the upper or lower ranges of a continuous variable to hide outliers by collapsing them into a single code.
  • Combine variables by merging data from two variables into a summary variable.
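Several of the tips above can be sketched in Python. This is a minimal illustration rather than a complete anonymization procedure: the field names, the ten-year age bands, and the household-size cap are all hypothetical choices, and real datasets still need case-by-case review.

```python
from datetime import date

# Hypothetical names of columns that directly identify a participant.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value, cap):
    """Collapse values above `cap` into the cap itself to hide outliers."""
    return min(value, cap)


def anonymize(record, household_cap=6):
    """Apply the tips to one record and return a new, safer record."""
    # Ensure that there are no direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Reduce precision: keep only the year of the date of birth.
    out["birth_year"] = out.pop("dob").year
    # Aggregate a potentially revealing variable.
    out["age_band"] = age_band(out.pop("age"))
    # Restrict the upper range of a continuous variable.
    out["household_size"] = top_code(out["household_size"], household_cap)
    return out


# Hypothetical participant; not real data.
participant = {
    "name": "Jane Doe",
    "dob": date(1985, 7, 14),
    "age": 33,
    "household_size": 9,
}
shared = anonymize(participant)
print(shared)
```

Whatever transformations are applied, listing them in the README lets users of the data know which variables were coarsened or capped.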

It’s also good research practice to provide clear documentation of your data in a README file. Your README should define your variables and allowable values, and can be used to alert users to any changes you made to the original dataset to protect participant identity.

Our guidelines expand upon the tips above, and link to some useful references that will provide further guidance to anyone who would like to share human subjects data safely.

Dryad Executive Director to pursue new opportunities

Over the past 3 ½ years, Dryad has become an independent organization with a committed team and organizational capacity. During this time, integrations and partnerships have expanded and sustainability plans have grown. From these efforts, we increased the amount of curated and openly published data available to the public. With great pride and bittersweet feelings, I will be moving on to pursue a new opportunity on Feb 23.

Working with the staff has meant collaborating with a group of committed, mission-driven professionals. Leading this group to become a collegial and very high-functioning team has been my absolute pleasure. I have also been honored to be accepted as an equal in the field of open data advocates and crafters of scholarly communication workflows, and to be able to share my vision of Dryad as a critical service. The support, encouragement, and concern of the Dryad board of directors were always behind me, and I’ve been energized by what we’ve accomplished in support of curated, open, and FAIR data.

A search for a new Executive Director has begun. This person will have the opportunity to develop mission-critical business strategies and to offer an innovative vision for promoting data openness in the scientific community and securing Dryad’s place as a key facilitator of data sharing. With the Dryad board’s support, Elizabeth Hull, Dryad Operations Manager, is filling in during the interim.

I want to thank our incredibly supportive community of submitters, members, partners, and collaborators for their dedication to open data and to Dryad’s mission. This next phase for the organization is now beginning. We invite you to join us and grow Dryad!

 

Improvements in data-article linking

Dryad is a curated, non-profit, general-purpose repository specifically for data underlying scientific and medical publications, mainly journal articles. As such, we place great importance on linking data packages to the articles with which they are associated, and we try our best to encourage authors and journals to link back to the Dryad data from the article, ideally in the form of a reference in the works cited section. (There’s still a long way to go in this latter effort; see this study from 2016 for evidence).

Submission integration provides closer coordination between Dryad and journals throughout the publishing workflow, and simplifies the data submission process for authors. We’ve already implemented this free service with 120 journals. If you’re interested in integrating your journal, please contact us.

We’re excited to share a few recent updates that are helping to make our data-article linkages more efficient, discoverable, and re-usable by other publishers/systems.

The Automated Publication Updater

One of the greatest housekeeping challenges for our curation team lies in finding out when the articles associated with Dryad data packages become available online. Once they do, we want to add the article citation and DOI link to our record as quickly as possible, and to release any data embargoes placed “until the article appears.” Historically, we’ve achieved this through a laborious patchwork of web searches, journal alert emails, and notifications from authors or editors themselves.

But over the past year or so, we’ve built and refined a webapp that we call the APU (or Automated Publication Updater). This super-handy tool essentially compares data packages in the Dryad workflow with publication metadata available at Crossref. When a good match is found, it automatically updates article-related fields in the Dryad record, and then sends our curation team an email alert so that they can validate the match and finalize the record. The webapp can be easily run by curators as often as needed (usually a few times a week).

While the APU doesn’t find everything, it has dramatically improved both the efficiency with which we add article information and links to Dryad records and our curators’ happiness levels. Big win. (If you’re interested in the technical details, you can find them on our wiki).
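To give a feel for the matching step, here is a small Python sketch. It is not the APU’s actual logic, which is documented on the wiki; it simply assumes that a package title can be compared with candidate article titles after stripping the “Data from: ” prefix, using a similarity threshold that is a hypothetical choice. The candidate list is hard-coded here, whereas a real run would draw it from Crossref metadata, and the second candidate is invented for illustration.

```python
from difflib import SequenceMatcher

DATA_PREFIX = "data from: "


def normalize(title):
    """Lowercase, drop the 'Data from: ' prefix, and collapse whitespace."""
    title = " ".join(title.lower().split())
    if title.startswith(DATA_PREFIX):
        title = title[len(DATA_PREFIX):]
    return title


def best_match(package_title, candidates, threshold=0.9):
    """Return the most similar candidate article, or None below the threshold."""
    target = normalize(package_title)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate["title"])).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hard-coded stand-ins for publication metadata.
candidates = [
    {
        "title": "Biodiverse cities: the nursery industry, homeowners, and "
                 "neighborhood differences drive urban tree composition",
        "doi": "10.1002/ecm.1290",
    },
    # Hypothetical article and placeholder DOI, for contrast only.
    {"title": "A survey of alpine beetle communities", "doi": "10.0000/example"},
]

package_title = (
    "Data from: Biodiverse cities: the nursery industry, homeowners, and "
    "neighborhood differences drive urban tree composition"
)
match = best_match(package_title, candidates)
print(match["doi"] if match else "no match")
```

Returning None below the threshold mirrors the idea that only a good match should update a record, with a curator still validating the result afterwards.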

Scholix

Dryad is also pleased to be a contributor to Scholix, or Scholarly Link Exchange, an initiative of the Research Data Alliance (RDA) and the World Data System (WDS). Scholix is a high-level interoperability framework for exchanging information about the links between scholarly literature and data.

  • The problem: Many disconnected sources of scholarly output, with different practices including various persistent identifier (PID) systems, ways of referencing data, and timing of citing data.
  • The Scholix solution: A standard set of guidelines for exposing and consuming data-article links, using a system of hubs.

Here’s how it works:

  1. As a DataCite member repository, Dryad provides our data-publication links to DataCite, one of the Scholix Hubs. 
  2. Those links are made available via Scholix aggregators such as the DLI service.
  3. Publishers can then query the DLI to find datasets related to their journal articles, and generate/display a link back to Dryad, driving web traffic to us, increasing data re-use, and facilitating research discovery.

Crossref publishers, DataCite repositories/data centers, and institutional repositories can all participate — information on how is available on the Scholix website.

Programmatic data access by ISSN

Did you know that content in Dryad is available via a variety of APIs (Application Programming Interfaces)? Details are available at the “Data Access” page on our wiki.

The newest addition to this list is the ability to access Dryad data packages via journal ISSN. So, for example, if you wanted access to all Dryad content associated with the journal Evolution Letters, you would format your query as follows:

https://datadryad.org/api/v1/journals/2056-3744/packages

If you’re a human instead of a machine, you might prefer to visit our “journal page” for Evolution Letters:

https://datadryad.org/journal/2056-3744
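If you are scripting these requests, the two URL patterns above can be generated from an ISSN. The following Python sketch assumes only the patterns shown here. The function names are hypothetical, and the ISSN check-digit validation is an extra safeguard added for illustration, not something the API is known to require.

```python
import re

API_BASE = "https://datadryad.org/api/v1/journals"
JOURNAL_PAGE_BASE = "https://datadryad.org/journal"


def is_valid_issn(issn):
    """Check the NNNN-NNNC pattern and the standard mod-11 ISSN check digit."""
    if not re.fullmatch(r"\d{4}-\d{3}[\dX]", issn):
        return False
    digits = issn.replace("-", "")
    # The first seven digits are weighted 8 down to 2.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def packages_url(issn):
    """Build the API URL listing data packages for a journal."""
    if not is_valid_issn(issn):
        raise ValueError(f"not a valid ISSN: {issn!r}")
    return f"{API_BASE}/{issn}/packages"


def journal_page_url(issn):
    """Build the human-readable journal page URL."""
    if not is_valid_issn(issn):
        raise ValueError(f"not a valid ISSN: {issn!r}")
    return f"{JOURNAL_PAGE_BASE}/{issn}"


print(packages_url("2056-3744"))
print(journal_page_url("2056-3744"))
```

Validating the ISSN locally catches typos before a request is sent. The format of the API response is not covered here, so the sketch stops at building the URL.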

————

Dryad is committed to values of openness, collaboration, standardization, seamless integration, reduction of duplication and effort, and increased visibility of research products (okay, data especially). The above examples are just some of the ways we’re working in this direction.

If you’re part of an organization that shares these values, please contact us to find out how you can be part of Dryad.