Data Curation from Down Under: the 14th International Digital Curation Conference

It was a long journey from Chapel Hill, NC to Melbourne, Australia, but it was definitely worth it to attend the 14th International Digital Curation Conference (IDCC). The IDCC is always a great event for people involved in digital curation and preservation, especially when it is in a beautiful city like Melbourne. I was excited to attend this year and to take part in a 10-minute lightning talk on the Data Curation Network (DCN) entitled “The Data Curation Network: A Curator Perspective”. (More on this later in this post.) I’d like to take this opportunity to share some highlights from the conference.


The theme of this year’s IDCC, “Collaborations and Partnerships: addressing the big digital challenges together”, fits perfectly with what the Data Curation Network is all about. The Data Curation Network puts into place a cross-institutional staffing model that connects a network of expert data curators to increase local curation capacity, strengthen collaboration and support the sharing of research data. (To read more about the DCN and Dryad’s participation in the network, see Elizabeth Hull’s previous blog post announcing Dryad’s participation in the DCN launch.)

The main conference opened with a “Welcome to Country Ceremony” conducted by a Wurundjeri Community Elder, along with a welcome to the University of Melbourne from Gwenda Thomas, Director, Scholarly Services and University Librarian. Kevin Ashley, Director, Digital Curation Centre, also welcomed participants to IDCC19 with a challenge: “listen, talk, interact and be inspired to do something”.

The opening keynote, presented by independent journalist Christine Kenneally and entitled “Data, the creation of history and its impact on real lives”, related the compelling story of millions of orphans from around the world (including Australia and the US) searching for information about themselves. The orphans’ story highlighted the importance and direct impact of data on both a societal and an individual level, a theme that would recur throughout the conference.

After the keynote, the program of parallel sessions, posters and lightning talks began. Throughout the conference, these presentations were organized into broad topics such as:

  • Grand curation challenges across disciplines
  • Metadata
  • Trust
  • Data quality
  • Digital humanities
  • Examples and models / Models and tools
  • Research disciplines & data services
  • Research data management / Research data services
  • Digital curation & preservation
  • Building diverse and inclusive communities
  • Curating indigenous data
  • Skills

As a representative of the DCN, I took part in a lightning talk session with a presentation put together by Erin Clary (Dryad Senior Curator), Lisa Johnston (Principal Investigator for the DCN and Director of the Data Repository for the University of Minnesota) and myself. The presentation focused on the experiences Erin and I have had so far as curators with the DCN pilot. After Lisa gave a brief overview of the DCN, I described the training and preparation all participating curators undertook and what it was like for Erin and me to actually begin curating DCN submissions.


John Chodacki (Director, University of California Curation Center) gave a great presentation about the “Community Led Open Data Infrastructure: CDL & Dryad Partnership”, in which he shared how and why the partnership came about and what it means going forward. John followed up immediately with another presentation about “The Research Organization Registry”. As an added bonus after the conference, John led the workshop “Accelerating Data Publication: new models for research institutions”. (For a summary of the workshop, see the blog post by workshop attendee Dr. Richard Ferrers.)

The thought-provoking final keynote was presented remotely (in light of the recent US Government shutdown) by Dr. Patricia Brennan, Director, US National Library of Medicine. Her presentation, “Jumping into the stream of data curation”, highlighted the enormous amount of data curated each day by the National Library of Medicine. Dr. Brennan spoke of an “information tsunami”, the challenges inherent in curating all that data and what those challenges may mean for the future of data curation. She also traced how the focus of data curation professionals has shifted over the years: from promoting data curation in the first place to figuring out how to move forward, with limited resources, now that those efforts are paying off with a torrent of data.

The conference came to an end all too soon with closing remarks by Kevin Ashley and Donna McRostie and an IDCC 2019 theme song that put a smile on everyone’s face. Next year, curators will do it all again at the 15th International Digital Curation Conference in (drum roll, please) … Dublin, Ireland!


And Now, the Numbers . . .

As the new year begins, we take note of the increasing diversity of fields represented in data archived at Dryad and review the numbers for 2016.

Dryad Grows into a General Repository

We are excited to see Dryad’s role in the preservation of data expand into new areas and fields in 2016. Researchers submitted more data involving human subjects and data from social media. In addition, a quick look at our most popular data shows that two of the top five downloaded packages were from the fields of cardiology and science journalism. While Dryad’s origins are in the life sciences, it is increasingly being used as a general repository for data from a myriad of fields.

Let’s take a look at the numbers for 2016:

Increase in Number of Data Packages and Data Files

Our curators were busy! At the end of the year, the total number of published data packages (sets of data files associated with a publication) was a whopping 15,325. In 2016, our curators meticulously archived 4,307 packages, a 10% increase over 2015. The size of data packages also continued to grow: the average rose from 481 MB to 573 MB, an increase of about 20%.
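For readers who want to check the growth figure, here is a small, purely illustrative Python snippet that computes the relative change in average package size. The function name `percent_increase` is our own, not part of any Dryad tool; the only inputs are the two averages quoted above. The exact result, about 19.1%, is consistent with the rounded “about 20%”.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Average data package size in MB for 2015 and 2016 (figures from this post)
avg_2015_mb = 481
avg_2016_mb = 573

# Prints the growth rounded to one decimal place
print(f"{percent_increase(avg_2015_mb, avg_2016_mb):.1f}%")
```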

At the end of 2016, we were closing in on 50,000 archived data files; by January of this year, we passed that mark.

In a future blog post, we’ll talk about integrating new journals into the Dryad submission process, new members, and new partnerships. For now, we’ll just note that there was a 22% increase in the number of journals with data in Dryad that links back to the associated article.

New Fields

We’ve seen a significant uptick in human subjects data and social media data this year, which prompted us to develop an FAQ on cleaning and de-identifying human subjects data for public access. As the idea of what data should be preserved continues to broaden, we expect submissions of these kinds of data to keep increasing. We’ll keep you updated on this trend in future blog posts.

Top Downloads

Let’s take a look at the most popular data published in 2016, in terms of downloads. The top five downloads include data on plant genetics, the early history of ray-finned fishes, and, not surprisingly in this age, the effects of climate change on boreal forests.

Also of interest are data from an article in Science evaluating how people use Sci-Hub, a website that provides free access to paywalled scholarly papers. Our guest blog post on these data by science journalist John Bohannon generated a lot of interest this year and was one of our most popular blog posts ever.

Another significant development in 2016 came from the medical sciences. A comparison of coronary diagnostic techniques marked Dryad’s first submission from one of the top five cardiology journals, JACC: Cardiovascular Interventions.

The fact that two of the top five downloads come from fields outside the life sciences shows that data in Dryad now cover a broad range of fields.

Top 5 Downloads of Data Archived in 2016

  • Wagner MR et al. (2016) Host genotype and age shape the leaf and root microbiomes of a wild perennial plant. Nature Communications 7: 12151. Dryad DOI: http://doi.org/10.5061/dryad.g60r3 (3,123 downloads)
  • Bohannon J et al. (2016) Who’s downloading pirated papers? Everyone. Science 352(6285): 508–512. Dryad DOI: http://doi.org/10.5061/dryad.q447c (2,969 downloads)
  • D’Orangeville L et al. (2016) Northeastern North America as a potential refugium for boreal forests in a warming climate. Science 352(6292): 1452–1455. Dryad DOI: http://doi.org/10.5061/dryad.785cv (741 downloads)
  • Johnson NP et al. (2016) Continuum of vasodilator stress from rest to contrast medium to adenosine hyperemia for fractional flow reserve assessment. JACC: Cardiovascular Interventions 9(8): 757–767. Dryad DOI: http://doi.org/10.5061/dryad.f76nv (453 downloads)
  • Lu J et al. (2016) The oldest actinopterygian highlights the cryptic early history of the hyperdiverse ray-finned fishes. Current Biology 26(12): 1602–1608. Dryad DOI: http://doi.org/10.5061/dryad.t6j72 (423 downloads)

Overall, we’ve had a great year and are delighted to be seeing a broader range of data from an increasing number of journals and fields. Thanks to our Board of Directors, members, and of course our staff for providing their support to make 2016 a notable year for Dryad!

Introducing Dryad’s new board members

One of the most rewarding things about working for Dryad is collaborating with talented and passionate professionals from across the globe who are dedicated to increasing the availability of open data. This summer, two new people were officially elected to serve on Dryad’s Board of Directors, and we are excited to have them join our governance team.

Jennifer Lin, Director of Product Management at Crossref, brings extensive experience in product development and management, community outreach, scholarly communications, and more. Based in California, USA, Jennifer was instrumental in helping Dryad integrate our data submission system with PLOS journals during her tenure there. She is a data sharing evangelist and is passionate about tools that make data reusable and discoverable. We are thrilled to have her direct her energy and enthusiasm Dryad’s way.

Johan Nilsson is also new to the Dryad board and comes from the Oikos Editorial Office, a society-owned publishing foundation based at Lund University, Sweden. Johan previously worked as a research scientist in evolutionary ecology. He has a strong interest in scientific communication and social media engagement, and focuses particularly on how the benefits of open science (and open data in particular) can be better communicated to researchers. We value his expertise and his perspective on how Dryad can best serve its users.

We would be remiss if we didn’t also publicly welcome Ingrid Dillo, who was appointed to the board early in 2016. Ingrid is deputy director at DANS (Data Archiving and Networked Services). She holds a PhD in history and has a long record of policy development at DANS, the National Library of the Netherlands and the Dutch Ministry of Education, Culture and Science. She is especially interested in research data management and the certification of trustworthy digital repositories. We are already relying on Ingrid’s expertise and learning from her work with groups like the Research Data Alliance.

Candidates for Dryad’s 12-member Board of Directors are nominated by Member organizations, and four Directors are elected or re-elected every year. Once on the Board, Directors serve as individuals rather than as organizational representatives. The rotating Board aims for both diversity of perspective and depth of expertise, and we are delighted to have achieved both with our new Directors. We welcome them on board and extend heartfelt thanks to Directors past, present, and future for their contributions and dedication to Dryad’s mission.

Applications open for Dryad Executive Director

Dryad is seeking an energetic and enthusiastic Executive Director, ideally with experience in scientific or biomedical research, librarianship, or publishing, to oversee development and operation of the organisation during a period of rapid growth and transformation. The role reports to the Board of Directors. Externally, the postholder will be responsible for building relationships with stakeholders, customers and users of the Dryad Digital Repository. Internally, key responsibilities include organisational leadership and ensuring Dryad meets its objectives through sound financial management and oversight of day-to-day operations, with the support of a small but growing staff.  Review of applications will begin by September 1, 2014 and continue until the position is filled. For details please see the full position description and for inquiries please contact director@datadryad.org.

Dryad’s Annual Membership Meeting, and much more, in Oxford this month

Photo by David Iliff; license: CC-BY-SA 3.0

Dryad invites current members, prospective members, and other interested parties to attend the Annual Membership Meeting in Oxford, UK on the 24th of May.  This is the first open meeting of the newly incorporated organization and will be the last membership meeting before the introduction of deposit fees in September.  Attendees will learn about recent developments, get a preview of upcoming features, have a say in the governance of the organization, and weigh in on topics of relevance to the future of Dryad, its members and partner journals.  Speakers scheduled to present emerging issues include:

  • Marianne Bamkin of JoRD – Model journal policies and implementation
  • Jonathan Tedds of PREPARDE – Review of data associated with publications
  • Simon Hodson of JISC – The use of grant funds for data archiving costs
  • Sarah Callaghan of the CODATA-ICSTI Task Group on Data Citation – Data citation principles
  • Martin Fenner of PLOS ALM – Tracking data usage and impact
  • Eefke Smit of STM – The how and why of repository certification
  • Susanna-Assunta Sansone of ISA and BioSharing – Helping researchers to collect, curate, analyse, share and publish data
  • Bill Michener of DataONE – Relevance of the DataNet program to Dryad

The Membership Meeting will cap off a series of exciting events spotlighting trends in scholarly communication and research data:

  • The Now and Future of Data Publishing on 22 May – A daylong program featuring new initiatives and current issues in data publishing. Organized by JISC together with a range of organizations including BioSharing, DataONE, STM and Wiley-Blackwell.
  • The ORCID Outreach meeting on the morning of 23 May and ORCID CodeFest from 23-24 May
  • A joint Dryad-ORCID Symposium on Research Attribution on the afternoon of 23 May. The symposium will address the changing culture and technology of how credit is assigned and tracked for data, software, and other research outputs. Keynote speakers Johanna McEntyre (Europe PubMed Central) and David De Roure (Oxford eResearch Centre) will be joined by panelists Liz Allen (Wellcome Trust), Christine Borgman (UCLA), Martin Fenner (PLOS), Neil Chue Hong (Software Sustainability Institute), Trish Groves (BMJ), John Kaye (British Library) and moderator Cameron Neylon (PLOS) to explore the many facets of the issue.

You may register for events separately here and here through May 13th.  A block of rooms has been set aside at the Malmaison Hotel; enter corporate code OXER900 to receive a discounted rate. Please consult the Dryad membership meeting website closer to the event if you are interested in viewing the webcast.

We hope to see you there!

Submission fees to be introduced in September 2013


Dryad is a nonprofit organization fully committed to making scientific and medical research data permanently available to all researchers and educators free of charge and without barriers to reuse. For the past four years, we have engaged experts and consulted with our many stakeholders to develop a sustainability plan that will ensure Dryad’s content remains free to users indefinitely. The resulting plan allows Dryad to recover its operating costs fairly and in a way that scales. The plan includes revenue from submission fees, membership dues, grants and contributions.

A one-time submission fee will offset the actual costs of preserving data in Dryad. Most costs are incurred at the time of submission, when curators process new files, and long-term storage costs scale with each submission, so this transparent one-time charge ensures that resources scale with demand. Dryad offers a variety of pricing plans for journals and other organizations, such as societies, funders and libraries, to purchase discounted submission fees on behalf of their researchers. For data packages not covered by a pricing plan, the researcher pays upon submission. Waivers are provided to researchers from developing economies. See Pricing Plans for a complete list of fees and payment options. Submission fees will apply to all new submissions starting September 2013.
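The payment rules above boil down to a simple decision: who covers the fee for a given data package? The Python sketch below is a hypothetical illustration, not Dryad’s actual billing logic. It assumes that a pricing plan, when one applies, takes precedence, and that a developing-economy waiver only matters when the researcher would otherwise pay. The names `Payer` and `fee_payer` are invented for this example, and no fee amounts are modeled.

```python
from enum import Enum


class Payer(Enum):
    """Who covers the one-time submission fee for a data package."""
    SPONSOR = "pricing-plan sponsor"  # e.g. a journal, society, funder or library
    WAIVED = "fee waived"
    RESEARCHER = "researcher"


def fee_payer(covered_by_plan: bool, from_developing_economy: bool) -> Payer:
    """Hypothetical sketch of the fee rule described in the post.

    Assumption: a pricing plan takes precedence; a waiver only applies
    when the researcher would otherwise pay upon submission.
    """
    if covered_by_plan:
        return Payer.SPONSOR
    if from_developing_economy:
        return Payer.WAIVED
    return Payer.RESEARCHER


# Walk through the three cases the post describes
for plan, developing in [(True, False), (False, True), (False, False)]:
    print(plan, developing, "->", fee_payer(plan, developing).value)
```

Ordering the checks this way keeps the waiver from ever overriding a sponsor who has already agreed to pay; a different precedence would be a one-line change if the real policy differs.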

Membership dues will supplement submission fees, allowing Dryad to maintain its strong ties to the research community through its volunteer Board of Directors, Annual Membership Meetings, and  other outreach activities to researchers, educators and stakeholder organizations.  See Membership Information.

Grants will fund research, development and innovation.

Donations will support all of the above efforts.  In addition, Dryad will occasionally appeal to donors to fund special projects or specific needs, such as preservation of valuable legacy datasets and deposit waivers for researchers from developing economies.

We are grateful for all the input we have received into our sustainability plan, and look forward to your continued support in carrying out our nonprofit mission for many long years to come.

Hope and change for research data in the US

On Friday, the Obama administration made a long-awaited announcement regarding public access to the results of federally funded research in the United States.

There has been considerable attention given to the implications for research publications (a concise analysis here).  Less discussed so far — but just as far reaching — the new policy also has quite a lot to say about research data, a topic on which the White House solicited, and received, an earful of input just over a year ago.

What does the directive actually require? All federal government agencies with at least $100M in R&D expenditures must develop, within the next six months, policies for digital data arising from non-classified research that address a host of objectives, including:

  • to “maximize access, by the general public and without charge, to digitally formatted scientific data created with federal funds” while recognizing that there are cases in which preservation and access may not be desirable or feasible
  • to promote greater use of data management plans for both intramural and extramural grants and contracts, including review of such plans and mechanisms for ensuring compliance
  • to allow inclusion of appropriate costs for data management and access in grants
  • to promote the deposit of data in publicly accessible databases
  • to address issues of attribution to scientific data sets
  • to support training in data management and stewardship
  • to “outline options for developing and sustaining repositories for scientific data in digital formats, taking into account the efforts of public and private sector entities”

Interestingly, the directive is silent on the issue of embargo periods for research data, neither explicitly allowing nor disallowing them.

In the words of White House Science Advisor John Holdren:

…the memorandum requires that agencies start to address the need to improve upon the management and sharing of scientific data produced with Federal funding. Strengthening these policies will promote entrepreneurship and jobs growth in addition to driving scientific progress. Access to pre-existing data sets can accelerate growth by allowing companies to focus resources and efforts on understanding and fully exploiting discoveries instead of repeating basic, pre-competitive work already documented elsewhere.

The breadth of research impacted by this directive is notable. Based on the White House’s proposed 2013 budget, the covered agencies would spend more than $60 billion on R&D. A partial list includes:

  • The National Institutes of Health (NIH)
  • The National Science Foundation (NSF)
  • The National Aeronautics and Space Administration (NASA)
  • The Department of Energy (DOE)
  • The Department of Agriculture (USDA)
  • The National Oceanic and Atmospheric Administration (NOAA)
  • The National Institute of Standards and Technology (NIST)
  • The Department of the Interior (which includes the Geological Survey)
  • The Environmental Protection Agency (EPA)
  • and even the Smithsonian Institution

We applaud OSTP for moving to dramatically improve the availability of research data collected in the public interest with federal funds.

You can read the full memo here: the data policies are covered in Section 4.