NSF sustaining their support for open data

We are pleased to have received a Sustaining Award from the U.S. National Science Foundation.  Sustaining Awards are an innovative proposal track, developed within NSF’s Advances in Bioinformatics program, that provides “limited support for the cost of ongoing operations and maintenance of existing cyberinfrastructure that is critical for the continued advance of priority biological research.”

The award is made to the University of North Carolina at Chapel Hill, with Dryad as a subawardee. The grant provides approximately $762K in funding over three years, starting 1 September 2016.

From the abstract:

This award will enable Dryad to achieve the scale required for sustainability through continued growth and extension to new research communities. At the same time, it will enable the continued growth of the repository’s valuable collection of diverse and high-quality data for research and education.

The full project description is publicly available and more information about the award is at the NSF Funding Database.

We are grateful to NSF, who have generously supported the Dryad Digital Repository since its inception in 2008, including a recently funded small-scale pilot study to explore direct sponsorship of data publication charges.

A snapshot of life on the savannah

Our latest featured data package is from Alexandra Swanson and colleagues at the Snapshot Serengeti project, and accompanies their peer-reviewed article in Scientific Data.  It provides a unique resource for studying one of the world’s most extraordinary mammal assemblages and also for studies of computer vision and machine learning. In addition, data from Snapshot Serengeti is already being used in biology and computer science classrooms to enable students to work on solving real problems with authentic research data.

Lion (image from Snapshot Serengeti, CC BY-NC-SA 3.0)

The raw data (which are being made available through the University of Minnesota Supercomputing Institute) consist of 1.2 million sets of images collected between February 2011 and May 2013 from 225 heat- and motion-triggered cameras, operating day and night and distributed over 1,135 sq. km of Serengeti National Park in Tanzania. This staggering trove of images was classified by 28,040 registered and ~40,000 unregistered volunteers on Snapshot Serengeti (a Zooniverse project) according to the species present (if any), the number of individuals, the presence of young, and the behaviors being displayed, such as standing, resting, moving, eating, or interacting.

Remarkably, this vast army of citizen scientists was classifying the images faster than they were being produced, and each image set was classified on average by nine different volunteers.  This led to consensus classifications with high accuracy, 96.6% for species identifications relative to an expert-classified gold set.  Of the more than 300,000 image sets that contain animals, 48 different species were seen, including rare mammals such as the aardwolf and the zorilla.

Zorilla (image from Snapshot Serengeti, CC BY-NC-SA 3.0)

The Dryad data package includes the classifications from all the individual volunteers, the consensus classifications, information about when each camera was operational, and the expert classification of 4,149 image sets as a gold standard.
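Because the package contains both the individual volunteer classifications and the expert gold standard, readers can explore for themselves how many independent votes combine into a consensus and how that consensus is scored. The sketch below assumes a simple plurality vote (the most frequently chosen species wins, with ties broken alphabetically) and scores accuracy as the fraction of gold-standard image sets whose consensus label matches the expert label. The function names and the (image set, species) record layout are hypothetical and are not the actual column names in the Dryad files; the published consensus method described in Swanson et al. (2015) may handle ties, counts and behaviors differently.

```python
from collections import Counter, defaultdict


def plurality_consensus(votes):
    """Return the most frequently chosen species for each image set.

    votes: iterable of (image_set_id, species) pairs, one per volunteer vote.
    Ties are broken alphabetically so the result is deterministic (an
    assumption for this sketch, not necessarily the project's own rule).
    """
    by_set = defaultdict(Counter)
    for image_set, species in votes:
        by_set[image_set][species] += 1

    consensus = {}
    for image_set, counts in by_set.items():
        top = max(counts.values())
        # Among the species with the highest vote count, pick the alphabetically first.
        consensus[image_set] = min(s for s, c in counts.items() if c == top)
    return consensus


def accuracy(consensus, gold):
    """Fraction of gold-standard image sets whose consensus matches the expert label."""
    scored = [s for s in gold if s in consensus]
    if not scored:
        return 0.0
    hits = sum(consensus[s] == gold[s] for s in scored)
    return hits / len(scored)


if __name__ == "__main__":
    # Hypothetical toy votes: three image sets, a few volunteers each.
    votes = [
        ("set1", "zebra"), ("set1", "zebra"), ("set1", "wildebeest"),
        ("set2", "aardwolf"), ("set2", "hyena"), ("set2", "aardwolf"),
        ("set3", "lion"), ("set3", "lion"),
    ]
    # Hypothetical expert labels for the same sets.
    gold = {"set1": "zebra", "set2": "aardwolf", "set3": "leopard"}

    consensus = plurality_consensus(votes)
    print(consensus)  # {'set1': 'zebra', 'set2': 'aardwolf', 'set3': 'lion'}
    print(round(accuracy(consensus, gold), 3))  # 0.667
```

With roughly nine votes per image set, as reported above, an occasional misidentification by one volunteer is usually outvoted, which is why a plurality-style consensus can score well against an expert gold standard.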

References:

  • Swanson et al. (2015) Snapshot Serengeti, high frequency annotated camera trap images of 40 mammalian species in an African savannah. Scientific Data. doi:10.1038/sdata.2015.26
  • Swanson et al. (2015) Data from: Snapshot Serengeti, high frequency annotated camera trap images of 40 mammalian species in an African savannah. Dryad Digital Repository. doi:10.5061/dryad.5pt92

Introducing our new Executive Director Meredith Morovati

We are delighted to introduce Meredith Morovati to the Dryad community. Meredith assumed the role of Executive Director in late 2014, and now that she has had a few months to settle in, we thought this would be a good time to check in with her and hear about her plans for the organization. Before joining Dryad, Meredith was Vice President of Membership for the American Society of Echocardiography. Her earlier experience includes stints in the publishing world with Oxford University Press and Blackwell.

You’ve been on the job just over three months. What has been your impression of Dryad so far?
MM: Dryad is driven by a team of passionate and informed curators, developers, scientists and board members. I have been incredibly impressed by the staff’s commitment and how much they care about what they do. Everyone recognizes that Dryad is not only providing a service, but helping to shape the very landscape of data publishing, which is great to be a part of.

What excites you about this position and how does it build on your prior professional experiences?
MM: I am delighted to be able to apply my experience with academic boards and non-profit management to an organization that is positioned to grow dramatically in the near future. Data publication poses many challenges, yet has so much value to offer to researchers, publishers, librarians, and all of us who benefit from quality scientific and medical research. I am excited to be surrounded by informed and passionate individuals and to put my experience to work making data publication mainstream and sustainable.

What do you see as your top priorities for Dryad?
MM: I see an important role for myself in removing barriers to the natural growth of Dryad's service and in continuing to build relationships with its diverse stakeholders. I believe there is a lot more work to be done talking to research communities in different corners of science and medicine about the imperative for data publication and how Dryad can be part of the solution. Dryad is integrated with many well-known journals and has some very prestigious and committed members. But there are many more to whom we need to make the case that data publication is valuable, achievable, and sustainable, and that Dryad is a key piece of that puzzle. Another big part of my job in the coming year will be getting to know our members and hearing from them about how we can continue to improve the services we provide, both through the repository and through the other activities of the organization.

What can Dryad’s members and users expect to see in the coming year?
MM: First, I think members and users will be impressed with how much Dryad grows and diversifies this year. We are continually integrating manuscript and data submission with new journals, and the diversity of the data packages we are now publishing can be seen in those we feature almost daily on our social media channels. We are also pleased to see a trend toward a greater share of articles from many of our partner journals having data in Dryad.

Another trend that we hope will continue is more journals providing their reviewers with access to the draft Dryad data package. I believe that when reviewers pay attention to the data, it will naturally lead to higher quality, more reusable content.

As we grow, we are also working to increase the pool of sponsors, so that submission of data will be free to a greater share of those submitting data to the repository. There are a number of features in the works that will allow stakeholder organizations to see what has been published from the publications and researchers they care about, and how much attention and usage that data is getting, which we hope will make the benefits of sponsorship more apparent.

There’s a lot of work going on behind the scenes to streamline the curation process while continuing to provide personalized user support where needed. This work will allow us to continue scaling up the number of data packages we publish without compromising the attention each one receives.

We expect researchers will also appreciate the enhancements we are making to the data submission experience. We are particularly excited about the upcoming rollout of ORCID iDs, which, among other things, will make it easier for coauthors to collaborate on data packages.

Four newly sponsored journals from Nordic Society Oikos

We are delighted to announce the integration of four new journals: Ecography, Journal of Avian Biology, Nordic Journal of Botany, and Oikos.

The Nordic Society Oikos, which supports scientific research in ecology and related disciplines and aims to stimulate and enhance communication among stakeholders in ecological research in the Nordic countries and beyond, owns these journals and is generously sponsoring Data Publication Charges on behalf of its authors. The Oikos Editorial Office, based in the Department of Biology at Lund University, manages the publication of these journals in partnership with Wiley. For all four journals, authors should submit their data to Dryad after the manuscript has been accepted.

Please see here for more information about how your journal can integrate manuscript and data submission with Dryad.

Enhanced integration of manuscript and data submission with PLOS

Dryad has been proud to support integrated data and manuscript submission for PLOS Biology since 2012, and for PLOS Genetics since 2013. Yet there are over 400 data packages in Dryad from six different PLOS journals, in addition to two research areas of PLOS Currents. Today, we are pleased to announce that we have expanded submission integration to cover all seven PLOS journals: the two above plus PLOS Computational Biology, PLOS Medicine, PLOS Neglected Tropical Diseases, PLOS ONE, and PLOS Pathogens.

PLOS received a great deal of attention when it modified its Data Policy in March, providing more guidance to authors on how and where to make their data available and introducing Data Availability Statements. Dryad's integration process has been enhanced in a few ways to support this policy and also the needs of a megajournal like PLOS ONE. We believe these modifications provide an attractive model for integration that other journals may wish to follow. The key difference for authors who wish to deposit data in Dryad is that they are now asked to deposit their data before submitting their manuscript.

  1. PLOS authors are now asked to provide a Data Availability Statement during initial manuscript submission, as shown in the screenshot below. There is evidence that introducing a Data Availability Statement greatly reinforces the effectiveness of a mandatory data archiving policy, so we expect this change will substantially increase the availability of data for PLOS publications. PLOS authors using Dryad are encouraged to provide the provisional Dryad DOI as part of the Data Availability Statement.
  2. PLOS authors are now also asked to provide a Data Review URL where reviewers can access the data, as shown in the second screenshot. While Dryad has offered secure, anonymous reviewer access for some time, the difference now is that PLOS authors using Dryad will be able to enter the Data Review URL at the time of initial manuscript submission.
  3. In addition to these visible changes, we have also introduced an Application Programming Interface (API) to facilitate behind-the-scenes metadata exchange between the journal and the repository, making the process more reliable and scalable. This was critical for PLOS ONE, which published 31,500 articles in 2013. The API is now available to all journals as an integration option, as an alternative to the existing email-based process, which we will continue to support.

The manuscript submission interface for PLOS now includes fields for a Data Availability Statement and a Data Review URL.

If you are planning to submit a manuscript but are unsure about the Dryad integration options or process for your journal, just consult this page. For all PLOS journals, the data are released by Dryad upon publication of the article.  Should the manuscript be rejected, the data files return to the author’s private workspace and the provisional DOI is not registered.  Authors are responsible for paying Data Publication Charges only if and when their manuscript is accepted.

Jennifer Lin from PLOS and Carly Strasser from the California Digital Library recently offered a set of community recommendations for ways that publishers could promote better access to research data:

  • Establish and enforce a mandatory data availability policy.
  • Contribute to establishing community standards for data management and sharing.
  • Contribute to establishing community standards for data preservation in trusted repositories.
  • Provide formal channels to share data.
  • Work with repositories to streamline data submission.
  • Require appropriate citation to all data associated with a publication—both produced and used.
  • Develop and report indicators that will support data as a first-class scholarly output.
  • Incentivize data sharing by promoting the value of data sharing.

Today’s expanded and enhanced integration with Dryad, which inaugurates the new Data Repository Integration Partner Program at PLOS, is an excellent illustration of how to put these recommendations into action.

A grand milestone for Molecular Ecology

We are pleased to report that Molecular Ecology is now the first journal to surpass 1,000 data packages in Dryad! Our latest featured data package is the one that took Molecular Ecology past the goalposts:

  • Bolnick D, Snowberg L, Caporaso G, Lauber C, Knight R, Stutz W (2014) Major Histocompatibility Complex class IIb polymorphism influences gut microbiota composition and diversity. Molecular Ecology doi:10.1111/mec.12846
  • Bolnick D, Snowberg L, Stutz W, Caporaso G, Lauber C, Knight R (2014) Data from: Major Histocompatibility Complex class IIb polymorphism influences gut microbiota composition and diversity. Dryad Digital Repository doi:10.5061/dryad.2s07s

Why so many data packages from Molecular Ecology? It is likely due to a few factors. One, Molecular Ecology publishes a lot of papers (445 in 2012, according to Journal Citation Reports) and has had integrated data and manuscript submission with Dryad since 2010. Two, the field works with many data types for which no specialized repository exists. Three, Molecular Ecology not only began requiring data archiving in 2011, when it adopted the Joint Data Archiving Policy (JDAP), but actually goes beyond JDAP by requiring a completed data availability statement in each article, something that managing editor Tim Vines and his colleagues have shown to be associated with very high rates of data archiving. Four, since Dryad introduced Data Publication Charges, Molecular Ecology has been sponsoring those charges on behalf of its authors.

Other journals looking to support data archiving in their fields would do well to look at Molecular Ecology as a model.

Applications open for Dryad Executive Director

Dryad is seeking an energetic and enthusiastic Executive Director, ideally with experience in scientific or biomedical research, librarianship, or publishing, to oversee development and operation of the organisation during a period of rapid growth and transformation. The role reports to the Board of Directors. Externally, the postholder will be responsible for building relationships with stakeholders, customers and users of the Dryad Digital Repository. Internally, key responsibilities include organisational leadership and ensuring Dryad meets its objectives through sound financial management and oversight of day-to-day operations, with the support of a small but growing staff.  Review of applications will begin by September 1, 2014 and continue until the position is filled. For details please see the full position description and for inquiries please contact director@datadryad.org.

Tamiflu, Relenza, and influenza: what the data do (or don’t) tell us

The following is a guest post from Tom Jefferson of The Cochrane Collaboration, Peter Doshi of the University of Maryland and Carl Heneghan from the University of Oxford. We asked them to tell the story behind their recent Cochrane systematic review [1] and dataset in Dryad [2], which holds valuable lessons about the evidence base on which major public health recommendations are made. -TJV

In the late 2000s, half the world was busy buying and stockpiling the neuraminidase inhibitors oseltamivir (Tamiflu, Roche) and zanamivir (Relenza, GSK) in fear of an influenza pandemic.

The advice to stockpile for a pandemic, and also to use the drugs during seasonal (non-pandemic) influenza, came from such august bodies as the World Health Organization (WHO), the US Centers for Disease Control and Prevention (CDC) and its European counterpart, the ECDC. However, the stockpiling rested on an unclear rationale that mixed two distinct claims: the effect of the antiviral drugs on the complications of influenza (mainly pneumonia and hospitalizations), and their capacity to slow viral spread, buying time for vaccines to be crash-produced and deployed.

It has since become clear that none of these parties had seen all the clinical trial evidence for these drugs. They had based their recommendations on reviews of "the literature", which sounds impressive but in fact refers to short trial reports published in journal articles rather than the underlying detailed raw data. For example, key assumptions about antiviral performance in the US national pandemic plan trace back to a six-page journal article written by Roche that reported a pooled analysis of 10 randomized trials, of which only 2 have ever been published.

In contrast, each of the corresponding internal clinical study reports for these 10 trials runs to thousands of pages (for background on what clinical study reports are, see here). Despite the stockpiling, these reports have never been reviewed by CDC, ECDC, or WHO. The WHO and CDC both refused to answer our questions on the evidence base for their policies.

Our Cochrane systematic review of neuraminidase inhibitors, funded by the National Institute for Health Research in the UK, was based on analysis of the full clinical study reports for these drugs, not short journal publications. We obtained these reports from the European Medicines Agency, Roche, and GlaxoSmithKline.  It took us nearly four years to obtain the full set of reports. The story of how we got hold of the complete set of clinical trials with no access restrictions is told in our essay “Multisystem failure: the story of anti-influenza drugs”.

With the publication of our review, we are making all 107 full clinical study reports publicly available. If you disagree with our findings, if you want to carry out your own analysis or if you are just curious to see what around 150,000 pages of data look like, they are one click away. Now the discussion about how well these drugs work can happen with all parties able to independently analyze all the trial evidence. This is called open science.

Be aware that there are some minimal redactions carried out by GSK and Roche, which they made to protect investigator and participant identity. While protecting participant identity is understandable, the EMA takes a different view of protecting investigator identity: "names of experts or designated personnel with legally defined responsibilities and roles with respect to aspects of the Marketing Authorisation dossier (e.g. QP, QPPV, Clinical expert, Investigator) are included in the dossier because they have a legally defined role or responsibility and it is in the public interest to release this data".

References

  1. Jefferson T, Jones MA, Doshi P, Del Mar CB, Hama R, Thompson MJ, Spencer EA, Onakpoya I, Mahtani KR, Nunan D, Howick J, Heneghan CJ (2014) Neuraminidase inhibitors for preventing and treating influenza in healthy adults and children. Cochrane Database of Systematic Reviews, online in advance of print. doi:10.1002/14651858.CD008965.pub4
  2. Jefferson T, Jones MA, Doshi P, Del Mar CB, Hama R, Thompson MJ, Spencer EA, Onakpoya I, Mahtani KR, Nunan D, Howick J, Heneghan CJ (2014) Data from: Neuraminidase inhibitors for preventing and treating influenza in healthy adults and children. Dryad Digital Repository. doi:10.5061/dryad.77471

We had a busy week in Oxford this past May

We’re happy to announce that presentations are now available from Dryad’s Annual Membership Meeting, held at St. Anne’s College, Oxford this May.  Dryad personnel reported on the state of the repository and the organization’s sustainability and business strategy.  The meeting also included a very valuable “Emerging Issues Forum” that looked forward to new opportunities for the repository and its community of users. We heard from Marianne Bamkin on model journal policies, Jonathan Tedds on review of data associated with publications, Simon Hodson on funding for data archiving costs, Sarah Callaghan on recommendations for data citation policy, Martin Fenner on ways to track data usage and impact, Eefke Smit on the state of the art in repository certification, Susanna-Assunta Sansone on the relevance of the ISA and Biosharing initiatives, and Bill Michener on the opportunities provided by DataONE and other DataNets.

This was the first community meeting since Dryad incorporated as a nonprofit in July 2012, and it was an opportunity for the organization's Members to exercise their role in governance. By electronic vote, returning director Susanna-Assunta Sansone and new members Charles Fox, Martin Fenner and Carol Tenopir were elected to the 2016 class of the Board of Directors, and several minor amendments to the Bylaws were unanimously adopted.

The meeting capped several days of programming around data, publication and scholarly communication. The week kicked off with an exciting one-day symposium on The Now and Future of Data Publishing, cosponsored by Jisc, BioSharing, DataONE, Dryad, STM and Wiley-Blackwell (presentations available on Slideshare). The next day, Dryad and ORCID co-organized a Symposium on Research Attribution in conjunction with ORCID's Outreach Meeting and CodeFest, and presentations from the symposium are available on the ORCID website. The symposium featured keynote talks from Johanna McEntyre (Europe PubMed Central) and David DeRoure (Oxford eResearch Centre); panel discussions with Liz Allen (Wellcome Trust), John Kaye (British Library), Neil Chue Hong (Software Sustainability Institute), Christine Borgman (UCLA), Trish Groves (BMJ) and Martin Fenner (PLOS); and a wrap-up discussion with Cameron Neylon (PLOS).

Many thanks to those of you who contributed as both organizers and participants, and a special thanks to our hosts at the Oxford eResearch Centre. The next meeting will be in May 2014 in North America and will also be open to the community. Please let us know if you have ideas for what you'd like to see in the next Emerging Issues Forum.

Dryad’s Annual Membership Meeting, and much more, in Oxford this month

Photo by David Iliff; license: CC-BY-SA 3.0

Dryad invites current members, prospective members, and other interested parties to attend the Annual Membership Meeting in Oxford, UK on the 24th of May.  This is the first open meeting of the newly incorporated organization and will be the last membership meeting before the introduction of deposit fees in September.  Attendees will learn about recent developments, get a preview of upcoming features, have a say in the governance of the organization, and weigh in on topics of relevance to the future of Dryad, its members and partner journals.  Speakers scheduled to present emerging issues include:

  • Marianne Bamkin of JoRD – Model journal policies and implementation
  • Jonathan Tedds of PREPARDE – Review of data associated with publications
  • Simon Hodson of JISC – The use of grant funds for data archiving costs
  • Sarah Callaghan of the CODATA-ICSTI Task Group on Data Citation – Data citation principles
  • Martin Fenner of PLOS ALM – Tracking data usage and impact
  • Eefke Smit of STM – The how and why of repository certification
  • Susanna-Assunta Sansone of ISA and BioSharing – Helping researchers to collect, curate, analyse, share and publish data
  • Bill Michener of DataONE – Relevance of the DataNet program to Dryad

The Membership Meeting will cap off a series of exciting events spotlighting trends in scholarly communication and research data:

  • The Now and Future of Data Publishing on 22 May – A daylong program featuring new initiatives and current issues in data publishing. Organized by JISC together with a range of organizations including BioSharing, DataONE, STM and Wiley-Blackwell.
  • The ORCID Outreach meeting on the morning of 23 May and ORCID CodeFest from 23-24 May
  • A joint Dryad-ORCID Symposium on Research Attribution on the afternoon of 23 May. The symposium will address the changing culture and technology of how credit is assigned and tracked for data, software, and other research outputs. Keynote speakers Johanna McEntyre (Europe PubMed Central) and David DeRoure (Oxford eResearch Centre) will be joined by panelists Liz Allen (Wellcome Trust), Christine Borgman (UCLA), Martin Fenner (PLOS), Neil Chue Hong (Software Sustainability Institute), Trish Groves (BMJ), John Kaye (British Library) and moderator Cameron Neylon (PLOS) to address the many facets of the issue.

You may register for events separately here and here through May 13th.  A block of rooms has been set aside at the Malmaison Hotel; enter corporate code OXER900 to receive a discounted rate. Please consult the Dryad membership meeting website closer to the event if you are interested in viewing the webcast.

We hope to see you there!