A New Day for Dryad

There is so much new activity at Dryad! As we prepare to relaunch our platform later this summer, we’ve been hard at work on:

  1. Rolling out a new Institutional Membership plan with enhanced benefits
  2. Planning for an upcoming webinar
  3. Redesigning our logo!

A New Way to Partner with us: Institutional Memberships

Over the past few months you may have heard of or been involved in conversations about Dryad launching a new Institutional Membership – the ‘rumors’ are true and we are pleased to be rolling out this new program to institutions globally! During our first week, California State University – East Bay and Montana State University have officially joined the Dryad community! We are very excited to have these two institutions as members. And they’re excited to join as well:

“Dryad provides Cal State East Bay faculty and students with a tool that will not only preserve their research data but also make it available to the public at large. Because equity and access are core values of our university, we are excited to be one of the early adopters.”

— Jeffra Diane Bussmann, MLIS Associate Librarian

Our plan is to build a member-owned community of organizations who support data publishing, curation, and preservation on behalf of researchers. We need to band together to make this happen. We know that researchers depend on Dryad; even as more institutions build and promote local data repositories, the number of submissions to Dryad continues to grow year on year. Our goal is to make the Dryad community compatible with the efforts of all institutions, regardless of local data repository infrastructure.

(Our new Institutional Membership is the first step in this direction; towards the end of the year we will be launching a new Publisher Membership with enhanced integrations and customized reporting, but more on that in a few months.)

For now, we are rolling out the Institutional Membership program. We encourage all research institutions to join now as we prepare to relaunch the Dryad platform. The ‘new’ Dryad will offer features for institutional members including campus single sign-on, bespoke reporting, local curation capabilities, and campus co-branding.

Our new model allows for flexibility in how we partner with research organizations. Through our curation, reporting, and integration systems, Dryad can either serve as your primary repository or supplement the services you currently offer.

Forthcoming Webinar

If you’d like to hear more about how your institution can be part of the Dryad community, please join our Institutional Member Webinar on March 27th!

At this webinar Dryad will be joined by our colleagues John Chodacki and Daniella Lowenberg from the California Digital Library (CDL) to discuss the MANY reasons to join the Dryad community as we showcase some of the new functionality and outline the benefits a Membership brings your institution.

Of course, one of the key questions everyone will ask is: what will this cost? Dryad has crafted a tiered pricing structure based on an institution’s ability to pay. We don’t want the annual fee to keep any potential member from joining, so we hope our plans will work for every institution (and if not, I would be happy to discuss options with you directly).

Our New Look

As a capstone to these major changes, you may have noticed we have refreshed the Dryad logo! We think this new bright image conveys the spirit of connectivity that Dryad represents across our community. It also retains a thematic connection to our original design. After all, Dryad is still true to our roots (no logo-based pun intended); it is important to us that we never lose sight of Dryad’s core mission to support infrastructure that openly and freely shares and preserves research data for the long term.

If you’d like to JOIN TODAY or receive additional information regarding Dryad, please contact me at director@datadryad.org.

See you at the Institutional Member Webinar on March 27th.

Data Curation from Down Under: the 14th International Digital Curation Conference

It was a long journey from Chapel Hill, NC to Melbourne, Australia, but it was definitely worth it to attend the 14th International Digital Curation Conference (IDCC). The IDCC is always a great event for people involved in digital curation and preservation, especially when it is in a beautiful city like Melbourne. I was excited to attend this year and to take part in a 10-minute lightning talk on the Data Curation Network (DCN) entitled “The Data Curation Network: A Curator Perspective”. (More on this later in this post.) I’d like to take this opportunity to share some highlights from the conference.

The theme of this year’s IDCC, “Collaborations and Partnerships: addressing the big digital challenges together”, fits perfectly with what the Data Curation Network is all about. The Data Curation Network puts into place a cross-institutional staffing model connecting a network of expert data curators to increase local curation capacity, strengthen collaboration and support the sharing of research data. (To read more about the DCN and Dryad’s participation in the network, see Elizabeth Hull’s previous blog post announcing Dryad’s participation in the DCN launch.)

The main conference was kicked off with a “Welcome to Country Ceremony” conducted by a Wurundjeri Community Elder, along with a welcome to the University of Melbourne from Gwenda Thomas, Director, Scholarly Services and University Librarian. Kevin Ashley, Director, Digital Curation Centre, also gave a welcome to IDCC19 that included a challenge to conference participants: “listen, talk, interact and be inspired to do something”.

The opening keynote, which was presented by independent journalist Christine Kenneally and was entitled “Data, the creation of history and its impact on real lives”, related the compelling story of millions of orphans from around the world (including Australia and the US) searching for information about themselves. The orphans’ story highlighted the importance and direct impact of data on both a societal and an individual level, a theme that would emerge throughout the conference.

After the keynote, the various presentations in the form of parallel sessions, posters and lightning talks began. Throughout the conference, these presentations were organized into broad topics such as:

  • Grand curation challenges across disciplines
  • Metadata
  • Trust
  • Data quality
  • Digital humanities
  • Examples and models / Models and tools
  • Research disciplines & data services
  • Research data management / Research data services
  • Digital curation & preservation
  • Building diverse and inclusive communities
  • Curating indigenous data
  • Skills

As a representative of the DCN, I took part in a lightning talk session with a presentation put together by Erin Clary (Dryad Senior Curator), Lisa Johnston (Principal Investigator for the DCN and Director of the Data Repository for the University of Minnesota) and myself. The presentation focused on the experiences Erin and I have had so far as curators with the DCN pilot. After Lisa gave a brief overview of the DCN, I described the training and preparation all participating curators undertook and what it was like for Erin and me to actually begin curating DCN submissions.

John Chodacki (Director, University of California Curation Center) gave a great presentation about the “Community Led Open Data Infrastructure: CDL & Dryad Partnership” in which he shared how and why the partnership came about and what it means going forward. John followed up immediately with another presentation about “The Research Organization Registry”. As an added bonus after the conference, John led the workshop “Accelerating Data Publication: new models for research institutions”. (For a summary of the workshop, see the blog post from the perspective of workshop attendee Dr. Richard Ferrers.)

The thought-provoking final keynote was presented remotely (in light of the recent US Government shutdown) by Dr. Patricia Brennan, Director, US National Library of Medicine. Her presentation, “Jumping into the stream of data curation”, highlighted the enormous amount of data curated each day by the National Library of Medicine. Dr. Brennan spoke of an “information tsunami”, the challenges inherent in curating all that data and what those challenges may mean for the future of data curation. Her presentation highlighted how the focus of data curation professionals has shifted over the years: from encouraging data curation in the first place to working out how to move forward, with limited resources, now that those efforts are paying off in a torrent of data.

The conference came to an end all too soon with closing remarks by Kevin Ashley and Donna McRostie and an IDCC 2019 theme song that put a smile on everyone’s face. Next year, curators will do it all again at the 15th International Digital Curation Conference in (drum roll, please) … Dublin, Ireland!

Most popular data from 2018

As we begin a new year and celebrate the major milestone of more than 25,000 data packages published, it’s a great time to highlight the value for re-use of the scholarly resources that are openly available and licensed in Dryad. 

So, which data packages published in 2018 have received the most downloads? Here are some at the top of the list.

Whale songs

Stafford et al (2018) Extreme diversity in the songs of Spitsbergen’s bowhead whales 

Here’s a lovely example of “data” that can have uses well beyond research. We’d love to know what people might be doing with these audio files. Meditating to them? Incorporating them into musical compositions?

All about the data

It’s perhaps not surprising that Dryad data packages associated with Scientific Data get a lot of downloads, as they are a journal specifically for “descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data.” These three resources are proving especially popular:

  • Bennett et al (2018) GlobTherm, a global database on thermal tolerances for aquatic and terrestrial organisms
  • Faraut et al (2018) Dataset of human medial temporal lobe single neuron activity during declarative memory encoding and recognition 
  • Kummu et al (2018) Gridded global datasets for Gross Domestic Product and Human Development Index over 1990-2015 

Avian functional traits

Storchová L, Hořák D (2018) Life-history characteristics of European birds

This is an example of a dataset compiled specifically for re-use. According to the authors, “Recently, functional aspects of avian diversity have been used frequently in comparative analyses as well as in community ecology studies; thus, open access to complete datasets of traits will be valuable.” To make the data as useful as possible, they included a broad spectrum of traits and provided the file in an accessible format: ASCII text, tab delimited, not compressed. Given the large number of downloads, it has indeed proven valuable!
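Part of what makes a plain, tab-delimited ASCII file so accessible is that it can be read with nothing more than a standard library. The following is only a minimal sketch in Python: the column names and values are invented placeholders, not the actual contents of the published file.

```python
import csv
import io

# Placeholder content standing in for a tab-delimited ASCII file.
# The column names and values are hypothetical, not taken from the dataset.
SAMPLE = "species\ttrait_1\nspecies_a\t1\nspecies_b\t2\n"


def read_traits(handle):
    """Read tab-delimited rows into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_traits(io.StringIO(SAMPLE))
for row in rows:
    print(row["species"], row["trait_1"])
```

For a file on disk, the same function could be given `open(path, newline="", encoding="ascii")` in place of the in-memory sample.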

Improving clinical research transparency

Kilicoglu et al (2018) Automatic recognition of self-acknowledged limitations in clinical research literature

Here’s another dataset created for the purpose of improving research — in this case, reporting of limitations in clinical studies. The machine-learning techniques tested here can be incorporated into the workflows of other projects, to support efforts in increasing transparency.

———

Huge thanks are due to researchers who take the time and effort to publish their data, to the journals who support them in doing so (including those highlighted above), and to the Dryad member organizations who make it all possible. Here’s to the next 25,000, and the millions of downloads they will produce!

Five-ish Minutes With: Charles Fox

In our latest post, our Executive Director Melissanne Scheld sits down with Dryad’s Board of Directors Chair, Professor Charles Fox, to discuss challenges researchers face today, how Dryad is helping alleviate some of those pain points, why Dryad has had such staying power in a quickly changing industry . . . and then we move on to dessert.

Chuck Fox

Can you tell us a little about your professional background and how that intersects with Dryad’s mission?

I wear two hats in my professional life – I am an evolutionary ecologist who studies various aspects of insect biology at the University of Kentucky, and I am a journal editor (Executive Editor of Functional Ecology).

My involvement with open data and Dryad began fortuitously in 2006. The British Ecological Society was invited to send a representative to a Data Registry Workshop, organized by the Ecological Society of America, to be held that December in Santa Barbara, California. I am (and was at that time) an editor of one of the British Ecological Society’s journals, Functional Ecology, and I live in the U.S. So Lindsay Haddon, who was Publications Manager for the BES, asked me to attend the workshop as their representative. Before that meeting I don’t recall having thought much about open data or data archives, but I was excited to attend the meeting in part because the topic intrigued me and, selfishly, because my parents live in southern California and this was an opportunity to visit them. The discussions at that meeting, plus those at a couple follow-up meetings over the next couple years, including one at NESCent in Durham, North Carolina, and another in Vancouver, convinced me that data publishing, and open data more generally, should be a part of research publication. So I began lobbying the BES to adopt an open data policy and become a founding member of Dryad. I wrote a proposed data policy – just a revision of the Joint Data Archiving Policy, JDAP, that many ecology and evolution journals adopted – and submitted that proposal to the BES’ publication committee. It took a few years, but in 2011 the BES adopted that data policy across their suite of journals and became a member of Dryad. The BES has since been a strong supporter of open data and required data publication as a condition of publishing a manuscript in one of their journals. Probably because I was a vocal proponent of data policies at BES meetings (along with a few others, most notably Tim Coulson), I was nominated to be a Dryad board member, and was elected to the board in 2013.

As an educator, what are some of the biggest changes you’ve seen in the classroom during your career?

When I started teaching, first as a graduate student (teaching assistant) and then as a young university professor, we didn’t have PowerPoint and digital projectors. So I made heavy use of a chalkboard (or dry erase board) during lecture, and used an overhead projector for more complicated graphics. Students had to take detailed notes on the lecture, which required them to write furiously all throughout the class. Nowadays I produce detailed PowerPoint slides that include most of the material I cover, so I write very little on the chalkboard. And, because I can provide my slides to students before class – as a pdf that they can print and bring to class – the students are freed from scribbling furiously to capture every detail. Students still need to take some notes (my slides do not include every detail), but they are largely freed to listen to lecture and participate in class discussions. I am not convinced, though, that these changes have led to improved learning, at least not in all students. Having information too easily available, including downloadable class materials, seems to cause some students to actually disengage from class, and ultimately do poorly, possibly because they think they don’t need to attend class, or engage when they do attend, since they have all of the materials easily accessible to them outside the classroom.

What do you think the biggest challenges are for open science research today?

I have been amazed at how quickly open data has become accepted as the standard in the ecology and evolution research communities. When data policies were first proposed to journals there was substantial resistance to their adoption – journals were nervous about possibly driving away authors, and editors (who are also researchers) shared the views that were common in the community regarding ownership of their own data – but over just a few years the resistance largely disappeared among editors, societies and publishers, such that a large proportion of the top journals in the field have adopted policies requiring data to be published alongside research manuscripts. That said, some significant challenges remain, both on the researcher side and on the repository side. On the repository side, sustainable funding remains the largest hurdle. Data repositories cost money to run, with staff and infrastructure the main expenses. Dryad has been relying on a mix of data publication charges (DPCs) and grants to fund its mission. This has worked for us so far, but constantly chasing grants is a lot of work for those writing grants, and the cost to researchers paying DPCs, albeit small, is not trivial for those without grant support.

On the researcher side, though data publishing has mostly become an accepted part of research publication in the community, there remain many important cultural and practical challenges to making open data universally practiced.  These include the development of standards for data citation and reuse (not restrictions on data reuse, but community expectations for citation and collaboration), balancing views of data ownership with the needs of the community, balancing the concerns of researchers that produce long-term datasets with those of the community, and others. We also need to improve education about data, such as teaching our students how to organize and properly annotate their datasets so that they are useful for other researchers after publication. Even when data are made available by researchers, actually using those data can be challenging if they are not well organized and annotated.

When researchers are deciding in which repository to deposit their research data, what values and functions should they consider?

Researchers should choose a repository that best fits the type of data they have to deposit and the community that will likely be reusing it. There are many repositories that handle specialized data types, such as genetic sequence data or data to be used for phylogenetic analysis. If your data suits a specialized archive, choose that. But the overwhelming majority of data generated by ecologists don’t fit into specialized archives. It’s for these types of data that Dryad was developed.

So what does Dryad offer researchers? From the perspective of the dataset author, Dryad links your dataset directly to the manuscript you have published about the dataset. This provides users detailed metadata on the contents of your dataset, helping them understand the dataset and use it correctly for future research. Dryad also ensures that your dataset is discoverable, whether you start at the journal page, on Dryad’s site, or any of a large number of collaborator services. The value of Dryad to the dataset user is similar – easy discoverability of data and clear links to the data collection details (i.e., links to the associated manuscripts).

You’ve held several roles on Dryad’s Board of Directors – what about this organization compels you to volunteer your free time?

My experiences as a scientist, a journal editor, and participating in open data discussions have convinced me that data publication is an essential part of research publication. For decades, or even centuries, we’ve relied on a publishing model where researchers write manuscripts that describe the work they have done and summarize their results and conclusions for the broader community. That’s the typical journal paper, and was the limit of what could be done in an age where everything had to fit onto the printed page and be distributed on paper. Nowadays we have near infinite space in a digital medium to not just summarize our results, but also provide all of the details, including the actual data, as part of the research presentation. It will always be important to have an author summarize their findings and place their work into context – that intellectual contribution is an essential part of communicating your research – but there’s no reason that’s where we need to stop. I imagine a world where a reader can click on a figure, or table, or other part of a manuscript and be taken directly to the relevant details – the actual data presented in the figure, the statistical models underlying the analyses, more detailed descriptions of study sites or organisms, and possibly many other types of information about the experiment, data collection, equipment used, results, etc. We shouldn’t be constrained by historical limitations of the printed page. We’re not yet even close to where I think we can and should be  going, but making data an integral part of research publication is a huge step in the right direction. So I enthusiastically support journal mandates that require data to be published alongside each manuscript presenting research results. And facilitating this is a core part of Dryad’s mission, which leads me to enthusiastically support both Dryad’s mission and the organization itself!

Pumpkin or apple pie?  

Those are my two favorite pies, so it’s a tough question. If served a la mode, i.e., with ice cream, then I’d most often pick apple pie. But, without ice cream, I’d have to choose pumpkin pie.

Stay tuned for future conversations with industry thought leaders and other relevant blog posts here at Dryad News and Views.

Dryad celebrates international data

There’s been important discussion lately about how to make research more inclusive, equitable, diverse, and global. See the recent 2018 International Open Access Week, and International Data Week, happening now in Gaborone, Botswana, with the theme “Digital Frontiers of Global Science.”

Dryad is among the organizations seeking to provide sustainable, open scholarly infrastructure that is accessible to all. As such, we use the CC0 license exclusively, and offer fee waivers for researchers based in countries classified by the World Bank as low-income or lower-middle-income economies. Our burgeoning partnership with California Digital Library promises to make data publishing even easier for all researchers.

In celebration of a global perspective, the Dryad curation team has selected a few data packages that highlight both a wide geographic range and a collaborative approach to research projects.

Penguin imaging and classification in Antarctica

Data from: Time-lapse imagery and volunteer classifications from the Zooniverse Penguin Watch project / associated article in Scientific Data

Data from: A remote-controlled observatory for behavioural and ecological research: a case study on emperor penguins / associated article in Methods in Ecology and Evolution

Antarctica may be a fine spot for penguins, but the cold conditions make it an inhospitable location for human beings to spend long periods. It is especially challenging for scientists engaged in gathering data under the frigid conditions and for their equipment. Two recent Dryad data packages highlight how scientists have addressed this chilly challenge with the use of remote observation systems. One provides data from a remote‐controlled system designed for information gathering, and the other employs citizen science to process large numbers of time-lapse images gathered remotely from an automated system.

The images that comprise the data from the Zooniverse project Penguin Watch are much more than just cool photos of penguins. They are the result of automated time-lapse cameras used for reliably and consistently monitoring wild penguin populations. The data includes 73,802 photos captured by 15 different Penguin Watch cameras, and the authors expressed the hope that annotated time-lapse imagery can be used to train machine learning algorithms to extract data automatically and perhaps for computer vision development.

The video and images from Richter et al. were taken by a self-sufficient remote-controlled observatory designed to operate year-round in extreme cold-weather conditions. The observatory has been capturing high-resolution images of penguins, along with other data, since 2013 using “multiple overview cameras and a high-resolution steerable camera with a telephoto lens.” The resulting images and video provide information on the life cycle, demographics, and behavior of the animals. For example, the dataset shows how the movement of penguins as individuals and as a group might be associated with the speed and direction of the wind.

Both datasets show how remote observation systems can be used by human investigators in various locations to collect data on animal populations, even in areas of the world which provide challenges to scientists.

— Debra Fagan

Collaborating across disciplines in Indonesia

Data from: Competing for blood: the ecology of parasite resource competition in human malaria-helminth co-infections / associated article in Ecology Letters

An international team of researchers reveals new knowledge about “co-infections,” multiple infectious diseases that attack the immune system at once. Budischak et al. (2018) used principles of ecological theory to answer questions about helminth-malaria co-infection in human hosts. Rather than measuring prevalence of malaria after deworming, as previous studies had done with varied results, Budischak et al. measured the density of specific species within an individual over time.

The researchers hypothesized that competition for resources, in this case red blood cells, would have an effect on the density of those species within the host. Data and samples originally collected for a 2-year placebo-controlled deworming trial in Indonesia were analyzed, and they found that when bloodsucking helminth species were removed, the density of Plasmodium vivax, which relies specifically on young red blood cells, increased 2.75-fold. This increase is enough to adversely affect the health of an individual, and heighten the chances that mosquitoes will transmit the P. vivax from one individual to another.

The researchers suggest that where resources allow, health care providers should consider the specific species that are co-infecting an individual, and weigh the cost-benefits of deworming at that time. These findings lay the groundwork for novel treatments of malaria and worm infections.

— Erin Clary

Assessing the potential of environmental citizen science in East Africa

Data from: Developing the global potential of citizen science: Assessing opportunities that benefit people, society and the environment in East Africa / associated article in the Journal of Applied Ecology

Citizen science projects often suffer from limited visibility in developing countries. Recognizing this difficulty, these authors undertook a collaborative process with experts to assess the potential for environmental citizen science in East Africa. The .csv file published in Dryad contains scores given by workshop participants in relation to various opportunities, benefits and barriers, which serve as the basis for principles that are applicable more widely.

Importantly, the project emphasizes the benefits of citizen science not just to the natural environment, but for creating a more informed and empowered populace.

Fighting lupus in Latin America

Data from: First Latin American clinical practice guidelines for the treatment of systemic lupus erythematosus / associated article in Annals of the Rheumatic Diseases

Dryad recently published data underlying collaborative research by the Latin American Group for the Study of Lupus (GLADEL) and the Pan-American League of Associations of Rheumatology (PANLAR). Both groups consisted of experienced Latin American rheumatologists who gathered together in Panama City to discuss special problems faced by patients with systemic lupus erythematosus (SLE) in Latin America.

The group started the research process by putting together a list of questions addressing clinical issues most commonly seen in Latin American patients. The team used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system to answer these questions with the best available evidence. Summarized preliminary findings were used to develop a framework for therapies and treatments. The underlying dataset published by Dryad consists of tables describing the groups’ main findings of therapeutic interventions by organ/systems in SLE using the GRADE approach.

This dataset has potential for reuse and would be an excellent resource for the study of lupus in the hopes of improving outcomes in Latin America and worldwide.

— Shavon Stewart

The way forward at Dryad

Melissanne Scheld, Executive Director, takes time to reflect on the Dryad/CDL partnership and to share thoughts on the direction of this collaborative effort.

It’s been a fast two months since I joined Dryad at this pivotal and exciting juncture. As previously announced, this spring Dryad entered into a formal partnership with California Digital Library (CDL) to ensure long-term sustainability for Dryad and to reinforce two essential, shared goals:

  1. Create sustainability for open-source, community-owned, data curation & publication infrastructure
  2. Drive adoption of curated data publishing in the research community.

Where we are

For the past decade, Dryad has served as a highly regarded, non-profit, curated repository for research data across disciplines. None of that is changing!

Going forward we need to better meet researchers within their own workflows. We need to make the action of submitting research data even easier so that it becomes a seamless step within the publishing process.

We are currently working to migrate the Dryad system onto CDL’s Dash platform. Using an Agile framework, developers from both Dryad and CDL are collaborating to build an open-source, nimble service that will offer a higher level of administrative functionality, an improved curation layer, and various submission options.

Where we’re going

Researchers will find our new offering continues to meet funder requirements and sets the bar in best practices for data sharing. Using the FAIR data principles as a guide, the curation we perform on each dataset deposited eases findability and usability, while the new levels of enhanced integrations we plan to develop (more on this below) will further improve submitters’ workflows.

For institutions, we want to offer an infrastructure that supports local research data management through features including campus single sign-on, bespoke reporting, integration with local repositories, and campus co-branding. The global network of libraries, which CDL is part of, will help us reach a wider range of institutions that are also looking for data management solutions.

Dryad has always had strong publisher support; our new offering will improve these partnerships through enhanced API integrations. Going forward we will build upon our publishing partners while also working with platform providers to develop direct integrations. This will provide a more automated submission process around the transmission of metadata and DOIs.

We want to build modular infrastructure that is future-proof. We should be thinking about data publishing both as its own entity and in conjunction with article publishing. There are many avenues for circulating research, and data publishing should be a part of all of them. Publishing data should be as ‘easy’ and ‘standardized’ as article publishing.

Along with more robust infrastructure, we need to rethink how we build Dryad’s sustainability. As a small, lean non-profit, we need to build financial models that don’t overburden any single segment of our community, but still allow us to support the high level of curation and preservation infrastructure for which Dryad is known.

We are currently market testing new models within our community and have been talking with institutions and publishers to hear how we can best support their data publishing needs and what shared costs might look like. We know that there has been a lot of talk lately in our wider community about membership models; early feedback from our partners indicates this is still the most favorable method for investing in long-term sustainability.

What will success look like for us?  

The Dryad/CDL partnership aims to create a self-sustaining, curated, digital data repository for researchers across all fields of inquiry, based on the needs of and supported by institutional and publisher community members. We are building from a strong foundation, have created a thoughtful roadmap through community feedback, and are confident we are on a pathway to sustainability.

Personally, I’m very excited about all of these changes and know that, in partnership with CDL, we will be able to better serve our community. I look forward to updating you on future developments, but in the meantime, please don’t hesitate to reach out to me at director@datadryad.org with any questions or comments.

Technical update — Schema.org and Google Dataset Search


Image by Pete

A core part of Dryad’s mission is to make our data available as widely as possible. Although most users find Dryad content through our website or via links from journal articles, many users also find Dryad content through search aggregators and other third-party services. For our content to be available to these external services, we follow the FAIR principle of Interoperability and make metadata available through a number of machine-readable mechanisms, including OAI-PMH, the DataONE API, and RSS.
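As a rough illustration of what "machine-readable" means for one of these mechanisms, an OAI-PMH harvester asks a repository for records with a plain HTTP request whose query string names a protocol verb and a metadata format. The sketch below only builds such a request URL. The endpoint address and the function name are hypothetical placeholders, not Dryad's actual service details; `ListRecords` and `oai_dc` come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not Dryad's real address.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    "oai_dc" (simple Dublin Core) is the format every OAI-PMH
    repository is required to support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional: only records created or changed on or after this date.
        params["from"] = from_date
    if set_spec:
        # Optional: restrict the harvest to one named set of records.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2018-01-01")
print(url)
```

A harvester would fetch this URL and parse the XML response; that step is left out here because it depends on the repository being harvested.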

This year, we added support for a new machine-readable mechanism, the Schema.org metadata format. This format was originally developed by representatives of major search engines, including Google, Bing, and Yahoo. It has recently been endorsed by a number of data repositories, including Dryad. The Schema.org metadata format allows us to embed machine-readable descriptions of data directly into the same web pages that users use to view Dryad content.

For example, for this recently deposited data package, you can visit the web page to view information optimized for human users. But if you use your web browser’s option to “view source” on the page, you will find the following metadata embedded in the Schema.org format:

{
    "@context" : "http://schema.org/",
    "@type" : "Dataset",
    "@id" : "https://doi.org/10.5061/dryad.70d46",
    "name" : "Data from: Biodiverse cities: the nursery industry, homeowners, and neighborhood differences drive urban tree composition",
    "author" : [ {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-2649-9159",
        "givenName" : "Meghan",
        "familyName" : "Avolio"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0001-7209-514X",
        "givenName" : "Diane",
        "familyName" : "Pataki"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-5215-4947",
        "givenName" : "Tara",
        "familyName" : "Trammell"
    }, {
        "@type" : "Person",
        "givenName" : "Joanna",
        "familyName" : "Endter-Wada"
    } ],
    "datePublished" : "2017-12-18",
    "description" : "In arid and semi-arid regions, where few if any trees are native, city trees are largely human-planted. Societal factors such as resident preferences for tree traits, nursery offerings, and neighborhood characteristics are potentially key drivers of urban tree community composition and diversity....",
    "keywords" : [ "urban tree diversity" ],
    "citation" : {
        "@type" : "Article",
        "identifier" : "doi:10.1002/ecm.1290"},
    "publisher" : {
        "@type" : "Organization",
        "name" : "Dryad Digital Repository",
        "url" : "https://datadryad.org"}
}
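To give a sense of how a search engine or other harvester might collect this metadata, here is a minimal Python sketch that pulls Schema.org records out of a page's HTML. It assumes the metadata is embedded in a `<script type="application/ld+json">` element, which is the usual way to place JSON-LD in a web page. The class name, function name, and sample page are hypothetical and use a shortened version of the record above.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_datasets(html):
    """Return every embedded JSON-LD object whose @type is "Dataset"."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [r for r in parser.records
            if isinstance(r, dict) and r.get("@type") == "Dataset"]


# Hypothetical landing page with an abbreviated copy of the metadata.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "@id": "https://doi.org/10.5061/dryad.70d46",
  "datePublished": "2017-12-18",
  "keywords": ["urban tree diversity"]
}
</script>
</head><body>Human-readable landing page</body></html>
"""

datasets = extract_datasets(SAMPLE_PAGE)
print(datasets[0]["@id"])
```

Because the record is ordinary JSON, a harvester needs no Dryad-specific code: the same extraction works on any repository page that embeds Schema.org metadata this way.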

The Schema.org metadata is available for any search engines or other interested users to collect and use. Last week, we saw the first major use of this metadata, with the launch of the Google Dataset Search service. Although Google Dataset Search is still in beta, the initial version is promising. It is easy to search and find content from Dryad and other data repositories all within a single system.

We are proud to make Dryad content available through the Dataset Search, and we look forward to other organizations making use of our data in new and exciting ways!