Technical update — Schema.org and Google Dataset Search


Image by Pete

A core part of Dryad’s mission is to make our data available as widely as possible. Although most users find Dryad content through our website or via links from journal articles, many users also find Dryad content through search aggregators and other third-party services. For our content to be available to these external services, we follow the FAIR principle of Interoperability and make metadata available through a number of machine-readable mechanisms, including OAI-PMH, the DataONE API, and RSS.

This year, we added support for a new machine-readable mechanism, the Schema.org metadata format. This format was originally developed by representatives of major search engines, including Google, Bing, and Yahoo. It has recently been endorsed by a number of data repositories, including Dryad. The Schema.org metadata format allows us to embed machine-readable descriptions of data directly into the same web pages that people visit to view Dryad content.

For example, for this recently deposited data package, you can visit the web page to view information optimized for human users. But if you use your web browser’s option to “view source” on the page, you will find the following metadata embedded in the Schema.org format:

{
    "@context" : "http://schema.org/",
    "@type" : "Dataset",
    "@id" : "https://doi.org/10.5061/dryad.70d46",
    "name" : "Data from: Biodiverse cities: the nursery industry, homeowners, and neighborhood differences drive urban tree composition",
    "author" : [ {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-2649-9159",
        "givenName" : "Meghan",
        "familyName" : "Avolio"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0001-7209-514X",
        "givenName" : "Diane",
        "familyName" : "Pataki"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-5215-4947",
        "givenName" : "Tara",
        "familyName" : "Trammell"
    }, {
        "@type" : "Person",
        "givenName" : "Joanna",
        "familyName" : "Endter-Wada"
    } ],
    "datePublished" : "2017-12-18",
    "description" : "In arid and semi-arid regions, where few if any trees are native, city trees are largely human-planted. Societal factors such as resident preferences for tree traits, nursery offerings, and neighborhood characteristics are potentially key drivers of urban tree community composition and diversity....",
    "keywords" : [ "urban tree diversity" ],
    "citation" : {
        "@type" : "Article",
        "identifier" : "doi:10.1002/ecm.1290"},
    "publisher" : {
        "@type" : "Organization",
        "name" : "Dryad Digital Repository",
        "url" : "https://datadryad.org"}
}
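
For readers curious how a third party might collect this metadata, here is a minimal sketch in Python. It assumes the metadata sits inside a `<script type="application/ld+json">` element, which is the usual convention for this kind of Schema.org markup but is not something this post confirms about Dryad's pages. The names `JsonLdExtractor`, `extract_datasets`, and `SAMPLE_PAGE` are hypothetical, and the sample page is an abbreviated stand-in for a real landing page, not actual Dryad HTML.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements.

    Assumption: the page embeds its Schema.org metadata as JSON-LD in such
    a script element. Other embedding styles are not handled here.
    """

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.records.append(json.loads("".join(self._buffer)))


def extract_datasets(html):
    """Return every embedded JSON-LD object whose @type is "Dataset"."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        record
        for record in parser.records
        if isinstance(record, dict) and record.get("@type") == "Dataset"
    ]


# Hypothetical, abbreviated page that reuses a few fields from the example above.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Dataset",
 "@id": "https://doi.org/10.5061/dryad.70d46",
 "datePublished": "2017-12-18",
 "keywords": ["urban tree diversity"]}
</script>
</head><body>Human-readable landing page</body></html>"""

datasets = extract_datasets(SAMPLE_PAGE)
for dataset in datasets:
    print(dataset["@id"], dataset["datePublished"])
```

The sketch uses only the Python standard library. A production harvester would also need to fetch pages over HTTP and tolerate malformed JSON, both of which are left out here.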

The Schema.org metadata is available for any search engine or other interested user to collect and use. Last week, we saw the first major use of this metadata, with the launch of the Google Dataset Search service. Although Google Dataset Search is still in beta, the initial version is promising. It makes it easy to find content from Dryad and other data repositories within a single system.

We are proud to make Dryad content available through the Dataset Search, and we look forward to other organizations making use of our data in new and exciting ways!

Winter 2009 management board meeting

Gate at the British Library
(source: gaspa)

The Dryad Management Board recently held its Winter 2009 meeting at the British Library Conference Center in London. The meeting was attended by 13 journal representatives and 4 members of the Dryad development team. A few highlights from the meeting:

Dryad now includes 489 data files in 163 data packages, though a large proportion of this content has been imported from the Systematic Biology archives.

The rate of submissions to Dryad is slowly increasing. Dryad has been able to accept submissions from authors since early 2009. Two journals, The American Naturalist and Molecular Ecology, have completed initial integration with Dryad, allowing their authors to use a more streamlined submission process. The Journal of Heredity is making progress on integration, and several other journals expect to integrate in the near future.

We are currently improving the user interface for locating and obtaining data. We are developing more sophisticated tools for curation, and we are working with several partner repositories to replicate content and provide federated searching services. For more detail, see the Dryad Development Plan.

The board discussed the role of identifiers in Dryad and whether DOIs should be assigned to Dryad’s holdings. Representatives from CrossRef and DataCite led discussions on the advantages of DOIs. The board unanimously recommended that each Dryad data package be given a DOI (a data package is all data associated with a single article). The executive committee will determine whether DOIs should be used at more granular levels (e.g., the individual files within a data package).

The longest discussion of the meeting focused on plans for transitioning Dryad from the current grant funding to a model that is more sustainable for the long term. Todd Vision presented a cost model created by the Dryad development team and consultant Lorraine Eakin. Consultants from Charles Beagrie Limited presented an analysis of expected staffing needs and potential revenue streams. The board provided guidance on the schedule and methods for pursuing revenue from a variety of sources.

Community engagement emerged as a critical factor in ensuring long-term sustainability. Towards that end, the board discussed many ideas for increasing the visibility of the repository. Notable steps include increasing the frequency of posts on this blog, having a more visible presence at scientific meetings, and expanding use of social networking tools like Facebook and Twitter.

Once the Dryad development team compiles all notes from the meeting, we will release a more detailed report.