
Archive for the ‘Discoverability’ Category

In our latest post, our Executive Director Melissanne Scheld sits down with Dryad’s Board of Directors Chair, Professor Charles Fox, to discuss challenges researchers face today, how Dryad is helping alleviate some of those pain points, why Dryad has had such staying power in a quickly changing industry,  . . . and then we move on to dessert. 

Chuck Fox

Can you tell us a little about your professional background and how that intersects with Dryad’s mission?

I wear two hats in my professional life – I am an evolutionary ecologist who studies various aspects of insect biology at the University of Kentucky, and I am a journal editor (Executive Editor of Functional Ecology).

My involvement with open data and Dryad began fortuitously in 2006. The British Ecological Society was invited to send a representative to a Data Registry Workshop, organized by the Ecological Society of America, to be held that December in Santa Barbara, California. I am (and was at that time) an editor of one of the British Ecological Society’s journals, Functional Ecology, and I live in the U.S. So Lindsay Haddon, who was Publications Manager for the BES, asked me to attend the workshop as their representative. Before that meeting I don’t recall having thought much about open data or data archives, but I was excited to attend the meeting in part because the topic intrigued me and, selfishly, because my parents live in southern California and this was an opportunity to visit them. The discussions at that meeting, plus those at a couple of follow-up meetings over the next couple of years, including one at NESCent in Durham, North Carolina, and another in Vancouver, convinced me that data publishing, and open data more generally, should be a part of research publication. So I began lobbying the BES to adopt an open data policy and become a founding member of Dryad. I wrote a proposed data policy (just a revision of the Joint Data Archiving Policy, JDAP, that many ecology and evolution journals adopted) and submitted that proposal to the BES’ publication committee. It took a few years, but in 2011 the BES adopted that data policy across their suite of journals and became a member of Dryad. The BES has since been a strong supporter of open data and has required data publication as a condition of publishing a manuscript in one of their journals. Probably because I was a vocal proponent of data policies at BES meetings (along with a few others, most notably Tim Coulson), I was nominated to be a Dryad board member, and was elected to the board in 2013.

As an educator, what are some of the biggest changes you’ve seen in the classroom during your career?

When I started teaching, first as a graduate student (teaching assistant) and then as a young university professor, we didn’t have PowerPoint and digital projectors. So I made heavy use of a chalkboard (or dry erase board) during lecture, and used an overhead projector for more complicated graphics. Students had to take detailed notes on the lecture, which required them to write furiously throughout the class. Nowadays I produce detailed PowerPoint slides that include most of the material I cover, so I write very little on the chalkboard. And, because I can provide my slides to students before class (as a PDF that they can print and bring to class), the students are freed from scribbling furiously to capture every detail. Students still need to take some notes (my slides do not include every detail), but they are largely freed to listen to lecture and participate in class discussions. I am not convinced, though, that these changes have led to improved learning, at least not in all students. Having information too easily available, including downloadable class materials, seems to cause some students to actually disengage from class, and ultimately do poorly, possibly because they think they don’t need to attend class, or engage when they do attend, since they have all of the materials easily accessible to them outside the classroom.

What do you think the biggest challenges are for open science research today?

I have been amazed at how quickly open data has become accepted as the standard in the ecology and evolution research communities. When data policies were first proposed to journals there was substantial resistance to their adoption: journals were nervous about possibly driving away authors, and editors (who are also researchers) shared the views that were common in the community regarding ownership of their own data. But over just a few years the resistance largely disappeared among editors, societies and publishers, such that a large proportion of the top journals in the field have adopted policies requiring data to be published alongside research manuscripts. That said, some significant challenges remain, both on the researcher side and on the repository side. On the repository side, sustainable funding remains the largest hurdle. Data repositories cost money to run, with staff and infrastructure among the main expenses. Dryad has been relying on a mix of data publication charges (DPCs) and grants to fund its mission. This has worked for us so far, but constantly chasing grants is a lot of work for those writing grants, and the cost to researchers paying DPCs, albeit small, is not trivial for those without grant support.

On the researcher side, though data publishing has mostly become an accepted part of research publication in the community, there remain many important cultural and practical challenges to making open data universally practiced. These include the development of standards for data citation and reuse (not restrictions on data reuse, but community expectations for citation and collaboration), balancing views of data ownership with the needs of the community, balancing the concerns of researchers who produce long-term datasets with those of the community, and others. We also need to improve education about data, such as teaching our students how to organize and properly annotate their datasets so that they are useful for other researchers after publication. Even when data are made available by researchers, actually using those data can be challenging if they are not well organized and annotated.

When researchers are deciding in which repository to deposit their research data, what values and functions should they consider?

Researchers should choose a repository that best fits the type of data they have to deposit and the community that will likely be reusing it. There are many repositories that handle specialized data types, such as genetic sequence data or data to be used for phylogenetic analysis. If your data suits a specialized archive, choose that. But the overwhelming majority of data generated by ecologists don’t fit into specialized archives. It’s for these types of data that Dryad was developed.

So what does Dryad offer researchers? From the perspective of the dataset author, Dryad links your dataset directly to the manuscript you have published about the dataset. This provides users detailed metadata on the contents of your dataset, helping them understand the dataset and use it correctly for future research. Dryad also ensures that your dataset is discoverable, whether you start at the journal page, on Dryad’s site, or any of a large number of collaborator services. The value of Dryad to the dataset user is similar: easy discoverability of data and clear links to the data collection details (i.e., links to the associated manuscripts).

You’ve held several roles on Dryad’s Board of Directors – what about this organization compels you to volunteer your free time?

My experiences as a scientist, as a journal editor, and as a participant in open data discussions have convinced me that data publication is an essential part of research publication. For decades, or even centuries, we’ve relied on a publishing model where researchers write manuscripts that describe the work they have done and summarize their results and conclusions for the broader community. That’s the typical journal paper, and was the limit of what could be done in an age where everything had to fit onto the printed page and be distributed on paper. Nowadays we have near infinite space in a digital medium to not just summarize our results, but also provide all of the details, including the actual data, as part of the research presentation. It will always be important to have an author summarize their findings and place their work into context – that intellectual contribution is an essential part of communicating your research – but there’s no reason that’s where we need to stop. I imagine a world where a reader can click on a figure, or table, or other part of a manuscript and be taken directly to the relevant details – the actual data presented in the figure, the statistical models underlying the analyses, more detailed descriptions of study sites or organisms, and possibly many other types of information about the experiment, data collection, equipment used, results, etc. We shouldn’t be constrained by historical limitations of the printed page. We’re not yet even close to where I think we can and should be going, but making data an integral part of research publication is a huge step in the right direction. So I enthusiastically support journal mandates that require data to be published alongside each manuscript presenting research results. And facilitating this is a core part of Dryad’s mission, which leads me to enthusiastically support both Dryad’s mission and the organization itself!

Pumpkin or apple pie?  

Those are my two favorite pies, so it’s a tough question. If served a la mode, i.e., with ice cream, then I’d most often pick apple pie. But, without ice cream, I’d have to choose pumpkin pie.

Stay tuned for future conversations with industry thought leaders and other relevant blog posts here at Dryad News and Views.

 



Image by Pete

A core part of Dryad’s mission is to make our data available as widely as possible. Although most users find Dryad content through our website or via links from journal articles, many users also find Dryad content through search aggregators and other third-party services. For our content to be available to these external services, we follow the FAIR principle of Interoperability and make metadata available through a number of machine-readable mechanisms, including OAI-PMH, the DataONE API, and RSS.

This year, we added support for a new machine-readable mechanism, the Schema.org metadata format. This format was originally developed by representatives of major search engines, including Google, Bing, and Yahoo. It has recently been endorsed by a number of data repositories, including Dryad. The Schema.org metadata format allows us to embed machine-readable descriptions of data directly into the same web pages that users use to view Dryad content.

For example, for this recently deposited data package, you can visit the web page to view information optimized for human users. But if you use your web browser’s option to “view source” on the page, you will find the following metadata embedded in the Schema.org format:

{
    "@context" : "http://schema.org/",
    "@type" : "Dataset",
    "@id" : "https://doi.org/10.5061/dryad.70d46",
    "name" : "Data from: Biodiverse cities: the nursery industry, homeowners, and neighborhood differences drive urban tree composition",
    "author" : [ {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-2649-9159",
        "givenName" : "Meghan",
        "familyName" : "Avolio"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0001-7209-514X",
        "givenName" : "Diane",
        "familyName" : "Pataki"
    }, {
        "@type" : "Person",
        "@id" : "http://orcid.org/0000-0002-5215-4947",
        "givenName" : "Tara",
        "familyName" : "Trammell"
    }, {
        "@type" : "Person",
        "givenName" : "Joanna",
        "familyName" : "Endter-Wada"
    } ],
    "datePublished" : "2017-12-18",
    "description" : "In arid and semi-arid regions, where few if any trees are native, city trees are largely human-planted. Societal factors such as resident preferences for tree traits, nursery offerings, and neighborhood characteristics are potentially key drivers of urban tree community composition and diversity....",
    "keywords" : [ "urban tree diversity" ],
    "citation" : {
        "@type" : "Article",
        "identifier" : "doi:10.1002/ecm.1290"},
    "publisher" : {
        "@type" : "Organization",
        "name" : "Dryad Digital Repository",
        "url" : "https://datadryad.org"}
}
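
For readers who want to collect this metadata programmatically, here is a minimal Python sketch. It assumes the JSON is embedded in a `<script type="application/ld+json">` element, which is the common way to embed Schema.org JSON-LD but is not something this post confirms for Dryad pages. The names `JsonLdExtractor`, `extract_datasets`, and `author_names` are made up for illustration, and `SAMPLE_PAGE` is a trimmed stand-in, not real Dryad HTML.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the metadata lives in a JSON-LD script element.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._chunks)))


def extract_datasets(html):
    """Return every embedded JSON-LD object whose @type is "Dataset"."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        doc for doc in parser.documents
        if isinstance(doc, dict) and doc.get("@type") == "Dataset"
    ]


def author_names(dataset):
    """Join givenName and familyName for each author entry."""
    return [
        f'{a.get("givenName", "")} {a.get("familyName", "")}'.strip()
        for a in dataset.get("author", [])
    ]


# Trimmed stand-in for a page; only a few fields from the example above.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Dataset",
 "@id": "https://doi.org/10.5061/dryad.70d46",
 "author": [
   {"@type": "Person", "givenName": "Meghan", "familyName": "Avolio"},
   {"@type": "Person", "givenName": "Diane", "familyName": "Pataki"}]}
</script></head><body></body></html>"""

if __name__ == "__main__":
    for dataset in extract_datasets(SAMPLE_PAGE):
        print(dataset["@id"], author_names(dataset))
```

The sketch uses only the standard library; fetching a live page is left out on purpose, since how a given page is served may change.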

The Schema.org metadata is available for any search engines or other interested users to collect and use. Last week, we saw the first major use of this metadata, with the launch of the Google Dataset Search service. Although Google Dataset Search is still in beta, the initial version is promising. It is easy to search and find content from Dryad and other data repositories all within a single system.

We are proud to make Dryad content available through the Dataset Search, and we look forward to other organizations making use of our data in new and exciting ways!


Alfred P. Sloan Foundation grant will fund implementation of shared staffing model across 7 academic libraries and Dryad

We’re thrilled to announce that Dryad will participate in a three-year, multi-institutional effort to launch the Data Curation Network. The implementation — led by the University of Minnesota Libraries and backed by a $526,438 grant from the Alfred P. Sloan Foundation — builds on previous work to better support researchers faced with a growing number of requirements to openly and ethically share their research data.

The result of many months of research and planning, the project brings together eight partners:

Currently, staff at each of these institutions provide their own data curation services. But because data curation requires a specialized skill set — spanning a wide variety of data types and discipline-specific data formats — institutions cannot reasonably expect to hire an expert in each area.

Curation workflow for the DCN

The intent of the Data Curation Network is to serve as a cross-institutional staffing model that seamlessly connects a network of expert data curators to local datasets and to supplement local curation expertise. The project aims to increase local capacity, strengthen cross-institutional collaboration, and ensure that researchers and institutions ethically and appropriately share data.

Lisa R. Johnston, Principal Investigator for the DCN and Director of the Data Repository for the University of Minnesota (DRUM), explains:

Functionally, the Data Curation Network will serve as the ‘human layer’ in a local data repository stack that provides expert services, incentives for collaboration, normalized curation practices, and professional development training for an emerging data curator community.

For our part, the Dryad curation team is excited to join a collegial network of professionals, to help develop shared procedures and understandings, and to learn from the partners’ experience and expertise (as they may learn from ours).

As an independent, non-profit repository, we are especially pleased to get to work more closely with the academic library community, and hope this project can provide a launchpad for future, international collaborations among organizations with similar missions but differing structures and funding models.

Watch this space for news as the project develops, and follow the DCN on Twitter: #DataCurationNetwork


Dryad is a curated, non-profit, general-purpose repository specifically for data underlying scientific and medical publications, mainly journal articles. As such, we place great importance on linking data packages to the articles with which they are associated, and we try our best to encourage authors and journals to link back to the Dryad data from the article, ideally in the form of a reference in the works cited section. (There’s still a long way to go in this latter effort; see this study from 2016 for evidence.)

Submission integration provides closer coordination between Dryad and journals throughout the publishing workflow, and simplifies the data submission process for authors. We’ve already implemented this free service with 120 journals. If you’re interested in integrating your journal, please contact us.

We’re excited to share a few recent updates that are helping to make our data-article linkages more efficient, discoverable, and re-usable by other publishers/systems.

The Automated Publication Updater

One of the greatest housekeeping challenges for our curation team lies in finding out when the articles associated with Dryad data packages become available online. Once they do, we want to add the article citation and DOI link to our record as quickly as possible, and to release any data embargoes placed “until the article appears.” Historically, we’ve achieved this through a laborious patchwork of web searches, journal alert emails, and notifications from authors or editors themselves.

But over the past year or so, we’ve built and refined a webapp that we call the APU (or Automated Publication Updater). This super-handy tool essentially compares data packages in the Dryad workflow with publication metadata available at Crossref. When a good match is found, it automatically updates article-related fields in the Dryad record, and then sends our curation team an email alert so that they can validate the match and finalize the record. The webapp can be easily run by curators as often as needed (usually a few times a week).

While the APU doesn’t find everything, it has dramatically improved both the efficiency with which we add article information and links to Dryad records and our curators’ happiness levels. Big win. (If you’re interested in the technical details, you can find them on our wiki.)
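
The APU's actual matching logic is not described in this post, so the following is only a rough Python sketch of the general idea: compare a data package title against candidate article records and accept the best one above a similarity threshold. It assumes candidates are dictionaries shaped like Crossref work records, with a `DOI` string and a `title` list. The function names, the 0.9 threshold, and the use of `difflib` are all illustrative choices, not Dryad's implementation, and the second candidate record is invented.

```python
from difflib import SequenceMatcher

# Data packages are titled "Data from: <article title>", so strip that prefix.
DATA_PREFIX = "data from: "


def normalize(title):
    """Lower-case, collapse whitespace, and drop the 'Data from:' prefix."""
    text = " ".join(title.lower().split())
    if text.startswith(DATA_PREFIX):
        text = text[len(DATA_PREFIX):]
    return text.rstrip(".")


def best_match(package_title, crossref_items, threshold=0.9):
    """Return (doi, score) for the most similar candidate, or None.

    The threshold is an arbitrary illustrative value; a real tool would
    likely also compare authors and journal before trusting a match.
    """
    target = normalize(package_title)
    best_item, best_score = None, 0.0
    for item in crossref_items:
        titles = item.get("title") or []
        if not titles:
            continue
        score = SequenceMatcher(None, target, normalize(titles[0])).ratio()
        if score > best_score:
            best_item, best_score = item, score
    if best_item is not None and best_score >= threshold:
        return best_item["DOI"], best_score
    return None


if __name__ == "__main__":
    candidates = [
        # Hypothetical record, invented for the example.
        {"DOI": "10.0000/hypothetical", "title": ["An unrelated article"]},
        # Title assumed from the 'Data from:' naming convention.
        {"DOI": "10.1002/ecm.1290",
         "title": ["Biodiverse cities: the nursery industry, homeowners, "
                   "and neighborhood differences drive urban tree composition"]},
    ]
    package = ("Data from: Biodiverse cities: the nursery industry, homeowners, "
               "and neighborhood differences drive urban tree composition")
    print(best_match(package, candidates))
```

Returning the score alongside the DOI mirrors the workflow described above, in which a curator still validates each automated match before the record is finalized.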

Scholix

Dryad is also pleased to be a contributor to Scholix, or Scholarly Link Exchange, an initiative of the Research Data Alliance (RDA) and the World Data System (WDS). Scholix is a high-level interoperability framework for exchanging information about the links between scholarly literature and data.

  • The problem: Many disconnected sources of scholarly output, with different practices including various persistent identifier (PID) systems, ways of referencing data, and timing of citing data.
  • The Scholix solution: A standard set of guidelines for exposing and consuming data-article links, using a system of hubs.

Here’s how it works:

  1. As a DataCite member repository, Dryad provides our data-publication links to DataCite, one of the Scholix Hubs. 
  2. Those links are made available via Scholix aggregators such as the DLI service.
  3. Publishers can then query the DLI to find datasets related to their journal articles, and generate/display a link back to Dryad, driving web traffic to us, increasing data re-use, and facilitating research discovery.

Crossref publishers, DataCite repositories/data centers, and institutional repositories can all participate — information on how is available on the Scholix website.

Programmatic data access by ISSN

Did you know that content in Dryad is available via a variety of APIs (Application Programming Interfaces)? Details are available at the “Data Access” page on our wiki.

The newest addition to this list is the ability to access Dryad data packages via journal ISSN. So, for example, if you wanted access to all Dryad content associated with the journal Evolution Letters, you would format your query as follows:

https://datadryad.org/api/v1/journals/2056-3744/packages

If you’re a human instead of a machine, you might prefer to visit our “journal page” for Evolution Letters:

https://datadryad.org/journal/2056-3744
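
As a small illustration, the two URL patterns above can be built from an ISSN in a few lines of Python. The sketch also validates the ISSN using the standard ISSN check-digit rule, which is an addition of ours and not something the Dryad API is stated to require. The function names are hypothetical, and no request is sent, since the response format is not described here.

```python
import re

# URL patterns taken from the two examples above.
API_TEMPLATE = "https://datadryad.org/api/v1/journals/{issn}/packages"
PAGE_TEMPLATE = "https://datadryad.org/journal/{issn}"


def is_valid_issn(issn):
    """Check the NNNN-NNNC shape and the ISSN check digit."""
    if not re.fullmatch(r"\d{4}-\d{3}[\dX]", issn):
        return False
    digits = issn.replace("-", "")
    # Weights 8 down to 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def journal_urls(issn):
    """Return the API and human-readable URLs for a journal ISSN."""
    if not is_valid_issn(issn):
        raise ValueError(f"not a valid ISSN: {issn!r}")
    return {
        "packages_api": API_TEMPLATE.format(issn=issn),
        "journal_page": PAGE_TEMPLATE.format(issn=issn),
    }


if __name__ == "__main__":
    for name, url in journal_urls("2056-3744").items():
        print(name, url)
```

Validating before building the URL catches typos in an ISSN early, before any request is made.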

————

Dryad is committed to values of openness, collaboration, standardization, seamless integration, reduction of duplication and effort, and increased visibility of research products (okay, data especially). The above examples are just some of the ways we’re working in this direction.

If you’re part of an organization that shares these values, please contact us to find out how you can be part of Dryad.


We present a guest post from researcher Falk Lüsebrink highlighting the benefits of data sharing. Falk is currently working on his PhD in the Department of Biomedical Magnetic Resonance at the Otto-von-Guericke University in Magdeburg, Germany. Here, he talks about his experience of sharing early MRI data and the unexpected impact that it is having on the research community.

Early release of data

The first time I faced a decision about publishing my own data was while writing a grant proposal. One of our proposed objectives was to acquire ultrahigh resolution brain images in vivo, making use of an innovative development: a combination of an MR scanner with ultrahigh field strength and a motion correction setup to remediate subject motion during data acquisition. While waiting for the funding decision, I simply could not resist acquiring a first dataset. We scanned a highly experienced subject for several hours, allowing us to acquire in vivo images of the brain with a resolution far beyond anything achieved thus far.


MRI data showing the cerebellum in vivo at (a) neuroscientific standard resolution of 1 mm, (b) our highest achieved resolution of 250 µm, and (c) state-of-the-art 500 µm resolution.

When our colleagues saw the initial results, they encouraged us to share the data as soon as possible. Through Scientific Data and Dryad, we were able to do just that. The combination of a peer-reviewed open access journal and an open access digital repository for the data was perfect for presenting our initial results.

17,000 downloads and more

‘Sharing the wealth’ seems to have been the right decision; in the three months since we published our data, there has been an enormous amount of activity:

A distinct need for data re-use

MRI studies are highly interdisciplinary, opening up numerous opportunities for sharing and re-using data. For example, our data might be used to build MR brain atlases and illustrate brain structures in much greater detail, or even for the first time. This could advance our understanding of brain functions. Algorithms used to quantify brain structures needed in the research of neurodegenerative disorders could be enhanced, increasing accuracy and reproducibility. Furthermore, by making available raw signals measured by the MR scanner, image reconstruction methods could be used to refine image quality or reduce the time it takes to collect the data.

There are also opportunities beyond those that our particular dataset offers. A recently emerging trend in MRI comes from the field of machine learning. Neural networks are being built to perform and potentially improve all kinds of tasks, from image reconstruction, to image processing, and even diagnostics. To train such networks, huge amounts of data are necessary; these data could come from repositories open to the public. Such re-use of MRI data by researchers in other disciplines is having a strong impact on the advancement of science. By publicly sharing our data, we are allowing others to pursue new and exciting directions.

Download the data for yourself and see what you can do with it. In the meantime, I am still eagerly awaiting the acceptance of the grant application . . . but that’s a different story.

The data: http://dx.doi.org/10.5061/dryad.38s74

The article: http://dx.doi.org/10.1038/sdata.2017.32

— Falk Lüsebrink


Did you ever wonder what goes on behind the scenes when Dryad curators review data files submitted by authors?  There are no wizards behind our curtains, just real live information specialists and trained data curators.

by Kaptain Kobold via Flickr

Dryad’s curation process is intentionally lightweight, so it doesn’t delay the availability of the data. Curators don’t review the scientific merit of the files – that is left to peer reviewers and the scientific community. Instead, we rely on our curators’ expertise in library and information science to ensure the integrity and preservation of the data.

Curators perform basic checks on each submission (can the files be opened? are they free of copyright restrictions? do they appear to be free of sensitive data?). The completeness and correctness of the metadata are checked, and the DOI is officially registered. During their work, Dryad curators encounter thousands of data files in any number of file formats. Our team examines all of these data files to ensure they do, in fact, include data, and not manuscripts, or pictures of kittens.

Curators may communicate directly with submitters to address issues and/or to make suggestions about enhancing the description and reusability of the data package. They can also create new versions of data packages should corrections or additions be needed after archiving. Ultimately, the responsibility for the content of the files rests with the submitters, but Dryad’s curators can help to catch and fix many common problems – and some rare ones, too.

Since Dryad’s inception, curation operations have been led by the Metadata Research Center (or MRC) directed by Dr. Jane Greenberg, initially at the University of North Carolina at Chapel Hill, and now at Drexel University. The team is supervised by Senior Curator Erin Clary, and currently consists entirely of students in, or graduates of, Library and Information Science (LIS) or Informatics Master’s programs.

So, (wizard) hats off to all our behind-the-curtains data curators, whose vital contributions ensure that the data in the repository is findable and usable. If you have a question about Dryad curation or need advice on preparing your data for archiving, don’t hesitate to email us at curator@datadryad.org.


The Data Citation Synthesis Group has released a draft Declaration of Data Citation Principles and invites comment.

This has been a very interesting and positive collaborative process and has involved a number of groups and committed individuals. Encouraging the practice of data citation, it seems to me, is one of the key steps towards giving research data its proper place in the literature.

As the preamble to the draft principles states:

Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice.

In support of this assertion, and to encourage good practice, we offer a set of guiding principles for data citation.

Please do comment on these principles. We hope that with community feedback and support, a finalised set of principles can be widely endorsed and adopted.

Discussion on a variety of lists is welcome, of course. However, if you want the Synthesis Group to take full account of your views, please be sure to post your comments on the discussion forum.

Some notes and observations on the background to these principles

I would like to add here some notes and observations on the genesis of these principles. As has been widely observed, a number of groups and interested parties have been exploring the principles of data citation for a number of years. Mentioning only some of the sources and events that affected my own thinking on the matter, there was the 2007 Micah Altman and Gary King article in D-Lib Magazine, which offered ‘A Proposed Standard for the Scholarly Citation of Quantitative Data’, and Toby Green’s OECD White Paper ‘We need publishing standards for datasets and data tables’ in 2009. Micah Altman and Mercè Crosas organised a workshop at Harvard in May 2011 on Data Citation Principles. Later the same year, the UK Digital Curation Centre published a guide to citing data.

The CODATA-ICSTI Task Group on Data Citation Standards and Practices (co-chaired by Christine Borgman, Jan Brase and Sarah Callaghan) has been in existence since 2010. In collaboration with the US National CODATA Committee and the Board on Research Data and Information, a major workshop was organised in August 2011, which was reported in ‘For Attribution: Developing Data Attribution and Citation Practices and Standards’.

The CODATA-ICSTI Task Group then started work on a report covering data citation principles, eventually entitled ‘Out of Cite, Out of Mind’ – drafts were circulated for comment in April 2013 and the final report was released in September 2013.

Following the first ‘Beyond the PDF’ meeting in Jan 2011 participants produced the Force11 Manifesto ‘Improving Future Research Communication and e-Scholarship’ which places considerable weight on the availability of research data and the citation of those data in the literature. At ‘Beyond the PDF II’ in Amsterdam, March 2013, a group comprising Mercè Crosas, Todd Carpenter, David Shotton and Christine Borgman produced ‘The Amsterdam Manifesto on Data Citation Principles’. In the very same week, in Gothenburg, an RDA Birds of a Feather group was discussing the more specific problem of how to support, technologically, the reliable and efficient citation of dynamically changing or growing datasets and subsets thereof. And the broader issues of the place of data and research publication were being considered in the ICSU World Data Service Working Group on Data Publication. This group has, in turn, formed the basis for an RDA Interest Group.  Oooffff!

How great a thing is collaboration?

From June 2013, as the Force11 Group was preparing its website and activities to take forward the work on the Amsterdam Manifesto, calls came in from a number of sources for these various groups and initiatives to coordinate and collaborate. This was admirably well-received and from July the ‘Data Citation Synthesis Group’ had come into being with an agreed mission statement:

The data citation synthesis group is a cross-team committee leveraging the perspectives from the various existing initiatives working on data citation to produce a consolidated set of data citation principles (based on the Amsterdam Manifesto, the CODATA and other sets of principles provided by others) in order to encourage broad adoption of a consistent policy for data citation across disciplines and venues. The synthesis group will review existing efforts and make a set of recommendations that will be put up for endorsement by the organizations represented by this synthesis group.

The synthesis group will produce a set of principles, illustrated with working examples, and a plan for dissemination and distribution. This group will not be producing detailed specifications for implementation, nor focus on technologies or tools.

As has been noted elsewhere, the group comprised 40 individuals and brought together a large number of organisations and initiatives. What followed over the summer was a set of weekly calls to discuss and align the principles. I must say, I thought these were admirably organised and benefitted considerably from participants’ efforts to prepare documents comparing the various groups’ statements. The face-to-face meeting of the group, in which a lot of detailed discussion to finalise the draft was undertaken, was hosted (with a funding contribution from CODATA) at the US National Academies of Science between the 2nd RDA Plenary and the DataCite Summer Meeting (which CODATA also co-sponsored). It has been intellectually stimulating and a real pleasure to contribute to these discussions and to witness so many informed and engaged people bashing out these issues.

The principles developed by the Synthesis Group are now open for comment and I urge as many people, researchers, editors and publishers as possible who believe that data has a place in scholarly communications to comment on them and, in due course, to endorse them and put them into practice.

Are we finally at the cusp of real change in practice? Will we now start seeing the practice of citing data sources become more and more widespread? It’s too soon to say for sure, but I hope these principles, and the work on which they build, have got us to a stage where we can start really believing the change is well underway.

Simon Hodson is Executive Director of CODATA and a member of the Dryad Board of Directors.  This post was originally published on the CODATA blog.
