Archive for the ‘Data availability’ Category

Two cheetahs running (image credit: Cat Specialist Group, catsg.org)

Dryad is thrilled to announce a strategic partnership with California Digital Library (CDL) to address researcher needs by leading an open, community-supported initiative in research data curation and publishing.

Dryad was founded 10 years ago with the mission of providing open, not-for-profit infrastructure for data underlying the scholarly literature, and the vision of promoting a world where research data is openly available and routinely re-used to create knowledge.

20,000 data publications later, that message has clearly resonated. The Dryad model of embedding data publication within journal workflows has proven highly effective and, combined with our data curation expertise, has made Dryad a name that is both known and trusted in the research community. But a lot has changed in the data publishing space since 2008, and Dryad needs to change with it.

Who/what is CDL?

CDL was founded by the University of California in 1997 to take advantage of emerging technologies that were transforming the way digital information was being published and accessed. Since then, in collaboration with the UC libraries and other partners, CDL has assembled one of the world’s leading digital research libraries and changed the ways that faculty, students, and researchers discover and access information.

CDL has long-standing interest and experience in research data management (RDM) and data publishing. CDL’s digital curation program, the University of California Curation Center (UC3), provides digital preservation, data curation, and data publishing services, and has a history of coordinating collaborative projects regionally, nationally, and internationally. Building partnerships to promote and make an impact in the library, open research, and data management spaces (e.g., DMPTool, HathiTrust) is baked into CDL’s strategic vision.

Why a partnership?

CDL and Dryad have a shared mission of increasing the adoption and availability of open data. By joining forces, we can have a much bigger impact. This partnership is focused on combining CDL’s institutional relationships, expertise, and nimble technology with Dryad’s position in the researcher community, curation workflows, and publisher relationships. By working together, we plan to create global efficiencies and minimize needless duplication of effort across institutions, freeing up time and funds, and, in particular, allowing institutions with fewer resources to support research data publishing and ensure data remain open.

Our joint Dryad-CDL initiative will increase adoption of open data by meeting researchers where they already are. We will leverage the strengths of both organizations to offer new products and services and to build broad, sustainable, and productive approaches to data curation. We plan to move quickly to provide new value:

  • For researchers: We will launch a new, modern, and easier-to-use platform that provides a higher level of service and even more seamless integration into regular workflows than Dryad currently offers.
  • For journals and publishers: We will offer new integration paths that allow direct communication with manuscript processing systems, better reporting, and more comprehensive curation services.
  • For academic institutions: We will work directly with institutions to craft right-sized offerings that meet your needs.

We have many details to hammer out and a lot of work to do, but among our first steps will be to reach out to you — each of the groups above — to discuss your needs, wants, and preferred methods of supporting this effort. With your help, this partnership will grow Dryad into a globally accessible, community-led, non-commercial, low-cost service that focuses on breaking down silos between publishing, libraries, and research.

As this partnership is taking shape, we ask for community input on how our collective efforts can best meet the needs of researchers, publishers, and institutions. Please stay tuned for further announcements and information over the coming months. We hope you share our excitement as we step into Dryad’s next chapter.

Read Full Post »

Dryad is a general purpose repository for data underlying scholarly publications. Each new submission we receive is reviewed by our curation team before the data are archived. Our main priority is to ensure compliance with Dryad’s Terms of Service, but we also strongly believe that curation activities add value to your data publication, since curated data are more likely to be FAIR (findable, accessible, interoperable, and reusable).


Before we register a DOI, a member of our curation team will check each data package to ensure that the data files can be opened, that they appear to contain information associated with a scientific publication, and that metadata for the associated publication are technically correct. We prefer common, non-proprietary file types and thorough documentation, and we may reach out if we are unable to view files as provided.

Our curators are also on the lookout for sensitive information such as personally identifiable human subjects data or protected location information, and for files that contain copyright and license statements that are incompatible with our required CC0 waiver.

To make the data archiving process more straightforward for authors, our curation team has written sets of guidelines that you can consult when preparing a data submission for a public repository such as Dryad. We hope these guidelines will help you as you prepare your Dryad data package, and that they will shorten the time from submission to a registered data DOI!

A series of blog posts will highlight each of the guidelines we’ve created. First up is our best practices for sharing human subjects data in an open access repository, from former Dryad curator Rebecca Kameny.

— Erin Clary, Senior Curator – curator@datadryad.org

_______________

Preparing human subject data for open access

Collecting, cleaning, managing, and analyzing your data is one thing, but what happens when you are ready to share your data with other researchers and the public?

Because our researchers come from fields that run the gamut of academia — from biology, ecology, and medicine, to engineering, agriculture, and sociology — and because almost any field can make use of data from human subjects, we’ve provided guidance for preparing such data for open access. We based our recommendations and requirements on well-respected national and international sources from government institutions, universities, and peer-reviewed publications.

Dryad curators will review data files for compliance with these recommendations and may make suggestions to authors; however, authors who submit data to Dryad are ultimately responsible for ensuring that their data are properly anonymized and can be shared in a public repository.

In a nutshell, Dryad does not allow any direct identifiers, but we do allow up to three indirect identifiers. Sound simple? It’s not. If the study involves a vulnerable population (such as children or indigenous people), if the number of participants is small, or if the data are sensitive (e.g., HIV status, drug use), three indirect identifiers may be too many. We evaluate each submission on a case-by-case basis.

If you have qualitative data, you’ll want to pay close attention to open-ended text, and may need to replace names with pseudonyms or redact identifiable text.

Quick tips for preparing human subjects data for sharing

  • Ensure that there are no direct identifiers.
  • Remove any nonessential identifying details.
  • Reduce the precision of a variable – e.g., remove day and month from date of birth; use county instead of city; add or subtract a randomly chosen number.
  • Aggregate variables that are potentially revealing, such as age.
  • Restrict the upper or lower ranges of a continuous variable to hide outliers by collapsing them into a single code.
  • Combine variables by merging data from two variables into a summary variable (see the sketch after this list for how several of these steps might look in practice).
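If it helps to see these steps concretely, here is a minimal sketch in Python (using pandas). It is purely illustrative: the column names, age bins, and income cutoff are made up and are not Dryad requirements.

  import pandas as pd

  # Hypothetical participant table; all column names and values are illustrative.
  df = pd.DataFrame({
      "name":       ["A. Smith", "B. Jones", "C. Lee"],        # direct identifier
      "birth_date": ["1984-03-12", "1990-11-02", "1957-07-30"],
      "age":        [39, 33, 66],
      "income":     [42000, 61000, 250000],
  })

  # Remove direct identifiers entirely.
  df = df.drop(columns=["name"])

  # Reduce precision: keep only the year of birth, not the full date.
  df["birth_year"] = pd.to_datetime(df["birth_date"]).dt.year
  df = df.drop(columns=["birth_date"])

  # Aggregate a potentially revealing variable into coarse groups.
  df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 120], labels=["<30", "30-49", "50+"])
  df = df.drop(columns=["age"])

  # Restrict the upper range of a continuous variable by top-coding outliers.
  df.loc[df["income"] > 150000, "income"] = 150000  # document as "150,000 or more" in the README

  print(df)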

It’s also good research practice to provide clear documentation of your data in a README file. Your README should define your variables and allowable values, and can be used to alert users to any changes you made to the original dataset to protect participant identity.

Our guidelines expand upon the tips above, and link to some useful references that will provide further guidance to anyone who would like to share human subjects data safely.

Read Full Post »

Chain link fence with highway in background

Dryad is a curated, non-profit, general-purpose repository specifically for data underlying scientific and medical publications — mainly journal articles. As such, we place great importance on linking data packages to the articles with which they are associated, and we try our best to encourage authors and journals to link back to the Dryad data from the article, ideally in the form of a reference in the works cited section. (There’s still a long way to go in this latter effort; see this study from 2016 for evidence).

Submission integration provides closer coordination between Dryad and journals throughout the publishing workflow, and simplifies the data submission process for authors. We’ve already implemented this free service with 120 journals. If you’re interested in integrating your journal, please contact us.

We’re excited to share a few recent updates that are helping to make our data-article linkages more efficient, discoverable, and re-usable by other publishers/systems.

The Automated Publication Updater

One of the greatest housekeeping challenges for our curation team lies in finding out when the articles associated with Dryad data packages become available online. Once they do, we want to add the article citation and DOI link to our record as quickly as possible, and to release any data embargoes placed “until the article appears.” Historically, we’ve achieved this through a laborious patchwork of web searches, journal alert emails, and notifications from authors or editors themselves.

But over the past year or so, we’ve built and refined a webapp that we call the APU (or Automated Publication Updater). This super-handy tool essentially compares data packages in the Dryad workflow with publication metadata available at Crossref. When a good match is found, it automatically updates article-related fields in the Dryad record, and then sends our curation team an email alert so they can validate the match and finalize the record. The webapp can be easily run by curators as often as needed (usually a few times a week).
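For readers curious about the general approach, here is a rough sketch of this kind of matching against Crossref’s public REST API. The function name, parameters, and (absent) scoring logic are our own illustration, not the APU’s actual code.

  import requests

  def find_candidate_articles(title, author_surname):
      """Query Crossref's public REST API for works whose bibliographic metadata
      resembles a Dryad data package, and return candidate DOIs for a curator to
      verify. Purely illustrative; the real APU adds its own matching and scoring."""
      resp = requests.get(
          "https://api.crossref.org/works",
          params={"query.bibliographic": title,
                  "query.author": author_surname,
                  "rows": 5},
          timeout=30,
      )
      resp.raise_for_status()
      items = resp.json()["message"]["items"]
      return [(item["DOI"], (item.get("title") or [""])[0]) for item in items]

  # e.g., candidates = find_candidate_articles("Some article title", "Smith")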

While the APU doesn’t find everything, it has dramatically improved both the efficiency with which we add article information and links to Dryad records, and our curators’ happiness levels. Big win. (If you’re interested in the technical details, you can find them on our wiki).

Scholix

Dryad is also pleased to be a contributor to Scholix, or Scholarly Link Exchange, an initiative of the Research Data Alliance (RDA) and the World Data System (WDS). Scholix is a high-level interoperability framework for exchanging information about the links between scholarly literature and data.

  • The problem: Many disconnected sources of scholarly output, with different practices including various persistent identifier (PID) systems, ways of referencing data, and timing of citing data.
  • The Scholix solution: a standard set of guidelines for exposing and consuming data-article links, using a system of hubs.

Here’s how it works (a minimal sketch of a data-article link record follows the list):

  1. As a DataCite member repository, Dryad provides our data-publication links to DataCite, one of the Scholix Hubs. 
  2. Those links are made available via Scholix aggregators such as the DLI service.
  3. Publishers can then query the DLI to find datasets related to their journal articles, and generate/display a link back to Dryad, driving web traffic to us, increasing data re-use, and facilitating research discovery.
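To give a flavor of the information being exchanged, here is roughly what a single data-article link could look like, written out as a Python dict. The field names follow the general shape of the Scholix information model, but the values are invented and the official Scholix schema should be treated as authoritative.

  # Illustrative only: one Dryad data-article link, shaped roughly like a Scholix link record.
  example_link = {
      "LinkProvider": [{"Name": "DataCite"}],
      "RelationshipType": {"Name": "IsSupplementTo"},
      "LinkPublicationDate": "2018-05-01",
      "Source": {  # the Dryad data package (hypothetical DOI)
          "Identifier": {"ID": "10.5061/dryad.example", "IDScheme": "doi"},
          "Type": {"Name": "dataset"},
          "Publisher": {"Name": "Dryad Digital Repository"},
      },
      "Target": {  # the associated journal article (hypothetical DOI)
          "Identifier": {"ID": "10.1234/example.article", "IDScheme": "doi"},
          "Type": {"Name": "publication"},
      },
  }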

Crossref publishers, DataCite repositories/data centers, and institutional repositories can all participate — information on how is available on the Scholix website.

Programmatic data access by ISSN

Did you know that content in Dryad is available via a variety of APIs (Application Programming Interfaces)? Details are available on the “Data Access” page of our wiki.

The newest addition to this list is the ability to access Dryad data packages via journal ISSN. So, for example, if you wanted access to all Dryad content associated with the journal Evolution Letters, you would format your query as follows:

https://datadryad.org/api/v1/journals/2056-3744/packages
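A minimal Python sketch for fetching that endpoint might look like this (we make no assumption here about the response format, so the sketch simply inspects what comes back):

  import requests

  issn = "2056-3744"  # Evolution Letters
  resp = requests.get(f"https://datadryad.org/api/v1/journals/{issn}/packages", timeout=30)
  resp.raise_for_status()

  # Inspect the response and parse it according to the content type actually returned.
  print(resp.headers.get("Content-Type"))
  print(resp.text[:500])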

If you’re a human instead of a machine, you might prefer to visit our “journal page” for Evolution Letters:

https://datadryad.org/journal/2056-3744

————

Dryad is committed to the values of openness, collaboration, standardization, seamless integration, reduction of duplicated effort, and increased visibility of research products (okay, data especially). The above examples are just some of the ways we’re working in this direction.

If you’re part of an organization that shares these values, please contact us to find out how you can be part of Dryad.

Read Full Post »

Keeping research data open and accessible has always been our goal at Dryad. Now, we’ve partnered with Data Archiving and Networked Services (DANS) to ensure long-term preservation of curated data. We are proud to be taking this step to safeguard open data and ensure future discoverability.

Public content on Dryad servers, currently over 15,000 data packages and 50,000 files, will soon be backed up in the DANS archive regularly (with multiple copies in different locations), to add an extra layer of protection.

DANS will also serve as Dryad’s successor archive, to ensure that functionality of Dryad Digital Object Identifiers (DOIs) is maintained for the long term. Metadata will be available in open access format to all researchers using the DANS online archiving system, EASY.

This partnership ensures that data in Dryad will remain accessible and linked to the scholarly literature in the unlikely case of a disruption of Dryad services. DANS has proven to be a natural fit for us in this effort. Dryad and DANS share a deep commitment to the stewardship of global scientific data, on behalf of the more than 50,000 researchers who trust us with their data and the hundreds of publishing partners who work with Dryad.

Henk Harmsen, Deputy director of DANS, says:

Together with Dryad we are committed to making digital research data and related outputs Findable, Accessible, Interoperable, and Reusable (FAIR). This collaboration minimizes the risk of loss or corruption of data over time. We are pleased to extend our capacity and data archive by partnering with Dryad.

Read Full Post »

We present a guest post from researcher Falk Lüsebrink highlighting the benefits of data sharing. Falk is currently working on his PhD in the Department of Biomedical Magnetic Resonance at the Otto-von-Guericke University in Magdeburg, Germany. Here, he talks about his experience of sharing early MRI data and the unexpected impact that it is having on the research community.

Early release of data

The first time I faced a decision about publishing my own data was while writing a grant proposal. One of our proposed objectives was to acquire ultrahigh resolution brain images in vivo, making use of an innovative development: a combination of an MR scanner with ultrahigh field strength and a motion correction setup to mitigate subject motion during data acquisition. While waiting for the funding decision, I simply could not resist acquiring a first dataset. We scanned a highly experienced subject for several hours, allowing us to acquire in vivo images of the brain with a resolution far beyond anything achieved thus far.

MRI data showing the cerebellum in vivo at (a) neuroscientific standard resolution of 1 mm, (b) our highest achieved resolution of 250 µm, and (c) state-of-the-art 500 µm resolution.

When our colleagues saw the initial results, they encouraged us to share the data as soon as possible. Through Scientific Data and Dryad, we were able to do just that. The combination of a peer-reviewed open access journal and an open access digital repository for the data was perfect for presenting our initial results.

17,000 downloads and more

‘Sharing the wealth’ seems to have been the right decision; in the three months since we published our data, there has been an enormous amount of activity, including more than 17,000 downloads.

A distinct need for data re-use

MRI studies are highly interdisciplinary, opening up numerous opportunities for sharing and re-using data. For example, our data might be used to build MR brain atlases and illustrate brain structures in much greater detail, or even for the first time. This could advance our understanding of brain functions. Algorithms used to quantify brain structures in research on neurodegenerative disorders could be enhanced, increasing accuracy and reproducibility. Furthermore, because the raw signals measured by the MR scanner are also available, image reconstruction methods could be developed to refine image quality or reduce the time it takes to collect the data.

There are also opportunities beyond those that our particular dataset offers. A recent emerging trend in MRI comes from the field of machine learning. Neural networks are being built to perform and potentially improve all kinds of tasks, from image reconstruction, to image processing, and even diagnostics. To train such networks, huge amounts of data are necessary; these data could come from repositories open to the public. Such re-use of MRI data by researchers in other disciplines is having a strong impact on the advancement of science. By publicly sharing our data, we are allowing others to pursue new and exciting directions.

Download the data for yourself and see what you can do with it. In the meantime, I am still eagerly awaiting the acceptance of the grant application . . . but that’s a different story.

The data: http://dx.doi.org/10.5061/dryad.38s74

The article: http://dx.doi.org/10.1038/sdata.2017.32

— Falk Lüsebrink

Read Full Post »

As a non-profit repository dependent on support from members and users, Dryad is greatly concerned with the economics and sustainability of data services. Our business model is built around Data Publishing Charges (DPCs), designed to recover the basic costs of curating and preserving data. Dryad DPCs can be covered in three ways (summarized in a short sketch after the list):

  1. The DPC is waived if the submitter is based in a country classified by the World Bank as a low-income or lower-middle-income economy.
  2. For many journals, the society or publisher will sponsor the DPC on behalf of their authors (to see whether this applies, look up your journal).
  3. In the absence of a waiver or a sponsor, the DPC is US$120, payable by the submitter.
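To restate those three options in code form, here is a tiny, purely illustrative decision function; the country and journal sets are placeholders, not Dryad’s actual data.

  # Illustrative only; not Dryad's billing logic, and the lists are placeholders.
  WAIVER_COUNTRIES = {"Exampleland"}          # World Bank low- or lower-middle-income economies
  SPONSORED_JOURNALS = {"Example Journal"}    # journals whose society/publisher sponsors the DPC

  def data_publishing_charge(submitter_country, journal):
      if submitter_country in WAIVER_COUNTRIES:
          return 0      # DPC waived
      if journal in SPONSORED_JOURNALS:
          return 0      # DPC sponsored on behalf of the author
      return 120        # otherwise US$120, payable by the submitter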

Our long-term aim is to increase sponsorships and reduce the financial responsibility of individual researchers.

Last year, we launched a pilot study sponsored by the US National Science Foundation to test the feasibility of having a funding agency directly sponsor the DPC. We conducted a survey of Dryad submitters as part of the pilot, hoping to learn more about how researchers plan and pay for data archiving.

Initial survey results

We first want to say a hearty THANK YOU to our participants for giving us so much good information to work with! (10 participants were randomly selected to receive gift cards as a sign of our appreciation). Respondents were located around the world, with nearly all based at academic institutions.

Survey respondents' positions

A word about selection of survey participants. We know that approximately 1/3 of all Dryad data publications do not have a sponsor or waiver, meaning the researcher is responsible for covering the $120 charge. We wanted to learn more about payment methods and funding sources for these non-sponsored DPCs.

We specifically solicited researchers for our survey who had 1) submitted to Dryad in the previous year and 2) paid their Data Publishing Charge directly (via credit card or voucher code). The survey questions focused on a few topics:

  • Grant funding and Data Management Plans
  • Where the money for their Data Publishing Charges ultimately came from, and
  • Whether funding concerns affect their data archiving behavior.

A few highlights are presented below; we intend to dig deeper into the survey results (and other information gathered as part of the pilot study) and report on them publicly in the coming months.

Planning for data in grant proposals

Nearly 72% of respondents indicated that the research associated with their publication/data was supported by a grant. We wanted to know how (or whether) researchers planned ahead for archiving their data in their grant proposals, and the results were enlightening:

  • 43% did not include a Data Management Plan (DMP) as part of their proposal for funding.
  • Of those who did submit a DMP, only about 46% committed to archiving their data as part of that plan.
  • A whopping 96% said they did not specifically budget for data archiving in their proposal.
  • Only 41% were able to archive their data within the grant funding period, while 59% were unable to, or were unsure.

As these results indicate, data management/stewardship is still not a high priority at the grant proposal stage. Even when researchers plan for data deposition, they don’t consider the associated costs. And even if they do (hypothetically) have funding specifically for data, the timing may not allow them to use it before the grant expires.

These factors suggest that if funding agencies want to prioritize supporting data stewardship, they should make funds available for this purpose outside the traditional grant structure.

Show me the money

When submitters pay the Dryad Data Publishing Charge themselves, where does that money come from? Are submitters being reimbursed? If so, how/by whom?

Our results showed that, unfortunately, about a quarter of our participants paid their DPCs out-of-pocket and did not receive any reimbursement. Approximately the same number paid themselves but were reimbursed (by their institution, a grant, or some combination of these), and 37% of DPCs were paid directly by the institution (using an institutional credit card or voucher code).

How was the Dryad DPC paid?

 

Some respondents view self-funding of data publication as worthwhile:

My belief is that scientific data should be publicly available and I am willing to cover the costs myself if supervisors (grant holders) do not.

As long as the cost is reasonable, in the worse case scenario I pay from my pocket. Better the data are safe and easily accessible for years to come than stored in spurious formats and difficult-to-access servers.

But for many others, covering the payment can be a real pain point:

I paid the processing charge myself mainly because our University’s reimbursement process was so laborious, I felt it easier just to get it over and done with myself and absorb the relatively small cost personally.

I just have to beg and plead for funding support each time.

If I am publishing after the postdoc ends then I am no longer paid to work on the project. Since I have had four postdocs, each lasting less than two years, this has happened for all my publications.

Examples from the “other” payment category shown above illustrate the scrappiness of researchers in finding funding:

I paid this from flexible research funds that were recently awarded by my institution. Had that not occurred, I would have had to pay personally and not be reimbursed.

I used my RTF (research trust fund) since I didn’t have dedicated grant funding.

Scavenged money from other projects.

Key takeaways

Our preliminary results show that, at a time when open data policies are becoming more numerous and stronger, paying for data publication remains far from straightforward, with much of the burden passed along to individual researchers.

Concerns about funding for open data can have real impacts on research availability and publication choice. More than 15% of our participants indicated that they have collected data in the last few years that they have been unable to archive due to lack of funds. Meanwhile, over 40% say that when choosing which journal(s) to submit to, sponsorship of the Dryad DPC does, or at least may, influence their decision.

The good news is that during our 8-month pilot implementation period, the US National Science Foundation sponsored nearly 200 Data Publishing Charges for which researchers would otherwise have been responsible.

We at Dryad are committed to finding and implementing solutions, and very much appreciate the feedback and support we receive from the research and publishing community. Stay tuned for more lessons learned.

Read Full Post »

We’re beginning a series highlighting researchers who use Dryad to openly publish their research data. We ask them about their current projects, why they believe in open science, and why they choose Dryad.

photo of Zach Gompert

Zach Gompert

For our first researcher profile, we talked with Dr. Zach Gompert, assistant professor in the Department of Biology at Utah State University, about how his work ties in with open science:

Dryad: What is your area of research and what’s your current focus?

Gompert: The overarching goal in my lab is to advance understanding of the extent, organization, causes, and consequences of variation in nature. Some of the questions we are investigating are:

  • What are the evolutionary consequences of hybridization?
  • How does the evolution of novel ecological interactions affect biodiversity?
  • Is temporal variation in natural selection a key determinant of genetic diversity levels in natural populations?

We address these questions through population genomic analyses of natural and experimental populations, and through development of new theory and statistical methods. Our work on Lycaenid butterflies shows that hybridization can be a key creative force in animal evolution and that evolutionary histories are not always well represented by the ‘evolutionary tree’ metaphor. In other words, lineages don’t just split, they come back together.

We have quite a few datasets in Dryad now, including partial genome sequences from over a thousand butterflies.

butterfly in field

Lycaeides melissa

Dryad: What do you think about open science in general? What are advantages of open science? 

Gompert: Science has always been a communal endeavor. Large-scale collaboration is vital now for a number of reasons:

  • Diverse expertise. Many key questions require a diverse group of investigators. This results in big, multifaceted datasets and necessitates rapid sharing of data, methods, and findings.
  • Re-purposing data. It’s common now for data and methods to have applications beyond those that they were originally collected or developed for. Open science allows these to be used by other investigators, accelerating the rate of discovery.
  • Data integrity. Openness ensures a higher level of quality and integrity. When data and methods are available for scrutiny, possible errors are more likely to be identified and corrected. This is particularly relevant for large-scale, multi-investigator projects.
  • Public funding and access. Since much of science is funded by the public, I think scientists have an ethical duty to make the products of research available to everyone.

Dryad: In your opinion, what are disadvantages or concerns about open science?

Gompert: There are two common concerns:

  • Getting scooped. Researchers can be scooped if another group analyzes and publishes the data they generated. While this has some validity, sufficient safeguards and community standards are in place to minimize this problem, and it’s minor compared to the advantages of openness.
  • Poor documentation. I think data archiving is in better shape than it once was, but much of the archived data and code is not sufficiently documented to be truly useful to others. Enhancing documentation of data is a big area where we as a community need to do more.

Dryad: You have over 20 datasets archived in Dryad. What do you see as the benefits of data sharing in Dryad?

Gompert: The primary strength of Dryad is its flexibility, specifically the ability to archive diverse types of data (and computer code) in a single location and to link to other more specialized databases such as NCBI. With Dryad, researchers have a central location where they can find all of the data associated with a publication.

Read Full Post »
