
Posts Tagged ‘data archiving’

1E+3

Fig 1. Helen of Troy, detail from an Attic red-figure krater, c. 450–440 BC

It is said that a picture is worth a thousand words and that Helen of Troy (Fig 1) had a face that launched a thousand ships. Why is the number 1000 significant to those of us at Dryad today, especially since its place in literature is ultimately an accident of our decimal number system [1]?

The reason is that Dryad released its 1000th data package.  The lucky submission is: Hager R, Cheverud JM, Wolf JB (2011) Data from: Genotype dependent responses to levels of sibling competition over maternal resources in mice. doi:10.5061/dryad.8qq3p0d8  [2]. This (arbitrary, but see [3]) milestone has put us in a reflective mood, and so here we take the opportunity to consider what it means.

First, it encourages us that Dryad’s multipronged approach to making data available for reuse (raising awareness of the issues, coordinating data archiving policy across journals, providing a user-friendly submission interface, paying attention to the incentives of researchers) is bearing fruit.  As a result of this strategy, the rate of submissions continues to grow; over 60% of submissions are from the past nine months alone.  Since a picture is worth a thousand words, see Fig 2.

Figure 2. Data packages submitted to Dryad through September 2011

We are mindful that it will take some time before we can measure the impact of making these data available for reuse, but there are encouraging signs in how frequently the data are being downloaded. We will discuss those results in a separate post.

What else can we learn from these first 1000 submissions? One lesson is the importance of making data submission integral to publication. While the corresponding articles appear in 88 different journals, about three quarters of the submissions come from the first nine journals that worked to integrate manuscript and data submission with Dryad [4]. Journal policy matters, and so does the enthusiasm with which journals implement it.

As far as disciplinary diversity goes, the first 1000 submissions are dominated by journals in evolutionary biology and ecology.  Dryad’s first biomedical journal partner, BMJ Open, was integrated within the past few months, and as a result of many other new journal partnerships being developed, we expect submissions to the repository to represent a much broader array of basic and applied biosciences in the near future.

Interestingly, most of the deposits are relatively small in size. Counting all files in a data package together, almost 80% of data packages are less than one megabyte.  Furthermore, the majority of data packages contain only one data file and the mean is a little less than two and a half.  As one might expect, many of the files are spreadsheets or in tabular text format.  Thus, the files are rich in information but not so difficult to transfer or store.

We are pleasantly surprised to report that most authors, most of the time, see the value in having their data released at the same time as the article is published. Authors are making their data available immediately upon publication, or earlier, for over 90% of data files. In nearly all cases where files are put under embargo, authors choose to release them one year after publication rather than requesting a longer embargo from the journal.

Thomson Reuters indexes more than half a million abstracts annually in BIOSIS.  A difficult-to-estimate, but undoubtedly substantial, fraction of this literature reports on data that cannot be, or is not, archived in a specialized public data repository.  This helps put Dryad’s 1000 data packages in perspective.   As a discipline, we still have a long way to go to preserve and make available for reuse all the “published” data that has no home.  But every data package that is submitted to Dryad is a little victory for the transparency and robustness of science.

So here’s to the first thousand.  May they have plenty of company in the coming years.

Footnotes:

  1. Things might have turned out very differently, judging by the presence of early vertebrate fossils with more than five digits (see http://en.wikipedia.org/wiki/Polydactyly_in_early_tetrapods)
  2. To celebrate, we are sending a Dryad-logo coffee mug to Dr. Reinmar Hager, who submitted the 1000th data package.
  3. Random cool fact about the number 1000.  It is “the smallest number that generates three primes in the fastest way possible by concatenation of decremented numbers (1000999, 1000999998997, and 1000999998997996995994993 are prime) … [excluding] the number itself” (see http://primes.utm.edu/curios/page.php/1000.html).
  4. This includes a collection of legacy data packages from the Systematic Biology archives that was submitted en masse to Dryad in mid-2009.
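The curio in footnote 3 is easy to check for yourself. The Python sketch below builds each number by concatenating 2, 4 and 8 consecutive decremented numbers starting from 1000 (which reproduces the three numbers quoted above) and tests each one with a deterministic Miller-Rabin primality test. The helper names are our own, and the test is only guaranteed to be exact for inputs below about 3.3 × 10^24, which covers all three numbers.

```python
# Miller-Rabin with the first 13 primes as bases is deterministic for every
# n below 3,317,044,064,679,887,385,961,981 (about 3.3e24).
BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)
LIMIT = 3_317_044_064_679_887_385_961_981


def concat_decrements(start: int, count: int) -> int:
    """Concatenate start, start-1, ..., start-count+1 as decimal digits."""
    return int("".join(str(start - i) for i in range(count)))


def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin test, valid only for n < LIMIT."""
    if n >= LIMIT:
        raise ValueError("n is too large for this deterministic base set")
    if n < 2:
        return False
    for p in BASES:  # small primes and trivial divisibility
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in BASES:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


for count in (2, 4, 8):
    n = concat_decrements(1000, count)
    print(count, n, is_prime(n))
```

The fixed base set trades generality for certainty: within the stated bound the answer is exact rather than probabilistic.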


Dryad is pleased to welcome BMJ Open as a new partner journal, reflecting the repository’s recently expanded scope, which now includes all of the basic and applied biosciences, medicine among them. BMJ Open is a new online-only, open access journal from the esteemed London-based BMJ Group. It is dedicated to publishing medical research from all disciplines and therapeutic areas, and uses fully open peer review and immediate online publication.

BMJ Open authors are now being strongly encouraged to deposit the data underlying their articles in Dryad or a more specialized repository, as appropriate.  Authors submitting articles to the journal will benefit from Dryad’s journal submission integration, the process by which data deposit is streamlined for authors through behind-the-scenes communication between the journal and the repository.

An extremely important issue with archiving medical data is, of course, the need to protect patient privacy. To assist its authors, BMJ Open is providing special guidance on data sharing.  Authors must be able to release data to the public domain as with all data in Dryad, and the repository will err on the side of caution by turning back any data that may compromise patient privacy.

To quote from the BMJ Group press release:

Data sharing aims to help scientists and doctors validate and scrutinise researchers’ findings in a bid to prevent fraud and eradicate the kind of selective reporting that has enabled some treatments to acquire regulatory approval, based on incomplete and biased data. In some cases this lack of transparency has prompted the subsequent restriction or withdrawal of certain treatments because of patient safety or effectiveness concerns, which were already evident in the unpublished data.  Data repositories also allow researchers to develop new methods of analysis and use the data to answer questions that the original researchers have not thought of. They also facilitate the acquisition of data for meta analysis (more in-depth comparative reviews).

Commenting on the move, Dr Trish Groves, editor in chief of BMJ Open, said: “Since launch, BMJ Open has championed transparency in medical research through open peer review, open access, and full reporting of studies’ methods and results, all exemplified by last week’s paper on the safety (or not) of medical devices (doi:10.5061/dryad.585t4)…”

This data package in Dryad, which illustrates the tremendous value of medical data for informing medical policy and practice without compromising patient privacy, is available at:

  • Heneghan C, Thompson M, Billingsley M, Cohen D (2011) Data from: Medical-device recalls in the UK and the device-regulation process: retrospective review of safety notices and alerts. Dryad Digital Repository. doi:10.5061/dryad.585t4

Groves goes on to say:

We strongly encourage authors to share their datasets, and now we’re delighted to be making that easier to do, with the help of DryadUK.

Kudos to the Dryad UK project team, based at the British Library, for facilitating this pioneering partnership.


Dryad is happy to announce a new initiative with Pensoft Publishers, the pioneering publisher behind ZooKeys and other rapid-publication open access journals, including BioRisk, Comparative Cytogenetics, International Journal of Myriapodology, Journal of Hymenoptera Research, NeoBiota, PhytoKeys, and Subterranean Biology.  Dryad is working with Pensoft to support publication of data papers in the area of biodiversity, together with the Global Biodiversity Information Facility and the Barcode of Life.  Through this effort, we aim to make the data publishing experience as smooth and rewarding as possible for authors, while at the same time making sure these important data are vetted through peer review and available for reuse in public repositories.  The full press release from Pensoft is below.

Data publishing policies and guidelines for biodiversity data published by Pensoft

Pensoft Publishers announced a data publishing project for biodiversity data in response to increasing demands from institutions and scientists to open scientific data to anyone interested in using them.

“An opinion survey amongst the authors, readers and editors of the Pensoft journal ZooKeys carried out in April convinced us that the majority of participants (84 %) are willing to publish their data, so that to make them available to anyone to use, share or integrate with other data” said Dr Lyubomir Penev, managing director of Pensoft Publishers. Among the most important incentives to publish data, the scientists mentioned  that  “open data increases transparency and the overall quality of science, the potential for collaborative research as well as an opportunity to increase academic credit in the form of citations. Therefore, providing a service to ensure a permanent publication record for published data is of key importance for the success of the project”, adds Dr Penev.

The core of the project is the concept of the “Data Paper”, developed in cooperation with the Global Biodiversity Information Facility (GBIF). Data Papers are peer-reviewed scholarly publications that describe published datasets and give data authors an opportunity to receive academic credit for their efforts. Currently, Pensoft offers the opportunity to publish Data Papers describing biodiversity data, Barcode of Life genome data, and biodiversity-related software tools such as interactive keys.

Pensoft reached an agreement with GBIF, the Dryad Data Repository and the Consortium for the Barcode of Life to cooperate in data hosting and in developing data publishing workflows.

“Data publishing becomes increasingly important and already affects the policies of the world’s leading science funding frameworks and organizations. Opening and integrating biodiversity data will be the future basis to increase efficiency of monitoring the processes of global change, conservation of nature and saving life on our planet” concluded Dr Vincent Smith, coordinator of the European Union FP7 project ViBRANT, in the framework of which a part of the work has been carried out.


If you have recently published data in Dryad, chances are it was in the course of publishing an article at a partner journal that steered you our way.

But you may be aware that Dryad accepts data from any peer-reviewed article in biology or biomedicine. That includes journals that are not (at least not yet) partners. In fact, as of the time of writing, Dryad has data associated with articles in 79 journals, approximately four times the number of partners.

Dryad even accepts data from articles that have already been published.  Now, why might you wish to go to the trouble of rummaging through those old files and putting your legacy data online?

Well, we noticed a while back that some individuals were beginning to do this systematically. For example, there was a sudden influx of data packages bearing Frédéric Delsuc’s name. Delsuc, of the French National Centre for Scientific Research (CNRS) and the Université Montpellier, is a member of an international team of collaborators (from France, Norway, Canada, Spain, Japan, Germany, Switzerland, and the United States) that has been using DNA sequence data to reconstruct the evolutionary history of a wide range of vertebrates and vertebrate relatives, from anteaters to sea squirts.


Giant Anteaters (Myrmecophaga tridactyla). The pup clinging to his mother is Cyrano, who was born at the Smithsonian’s National Zoo in 2009. Photo credit: Mehgan Murphy, CC-BY-NC-ND, http://creativecommons.org/licenses/by-nc-nd/2.0/

So far, Delsuc and his team [1] have deposited data from 20 articles in Dryad. The articles are in partner journals such as Molecular Biology and Evolution, Molecular Phylogenetics and Evolution, Systematic Biology, as well as more general science journals such as Nature, Science, and the Proceedings of the National Academy of Sciences USA.

The articles stretch back to 2002, a time when most new desktop computers were still being outfitted with floppy drives. (Remember those?)

We asked Delsuc what he saw as the advantages of archiving his team’s legacy data.

We [...] decided in our team to try to systematically submit our datasets to Dryad because we really think they are valuable. Dryad offers a very nice way of archiving the data ensuring their durability over time.

For Delsuc and his team, no more rummaging through old storage devices to find the files when they receive an email request. No more worrying about the data when lab or departmental websites move. They just need to point their colleagues to Dryad.

It has been reported that the number one reason scientists give for having denied their colleagues’ requests for data is the effort required to dig the data up [2]. Delsuc and his team intuitively understood that, and went back to archive their data before memories faded, storage devices failed, and graduate students moved on.

The downside to archiving legacy data in this way is that an article’s readers won’t immediately know that the Dryad data package exists, since the data DOI is not published in the text of the article. So, while archiving legacy data has its advantages, there is no substitute for depositing the data before the article is published, as happens with new articles appearing in Dryad’s partner journals.

To give Delsuc the final word:

It would be great if more and more journals in the field decide to include data deposit in their publication policies.

[1] “Equipe Phylogénie et Evolution Moléculaire” (Phylogeny and Molecular Evolution team) of the Institut des Sciences de l’Evolution (Institute of Evolutionary Sciences), part of the CNRS: Centre National de la Recherche Scientifique (French National Centre for Scientific Research) and the Université Montpellier 2 (University of Montpellier 2).

[2] Campbell EG et al. (2002) Data Withholding in Academic Genetics: Evidence From a National Survey. JAMA 287(4):473-480. doi:10.1001/jama.287.4.473


We encourage individuals and project teams seeking to comply with data management planning mandates to consider Dryad as the destination repository for published data from their research.  Dryad is not only a widely applicable, best-practice solution for research data management, it is also a quick and easy solution!

Research datasets associated with a publication in any biological or biomedical field are welcome in Dryad, regardless of file type. Archived data files may include spreadsheets or other tables, images or maps, alignments, character matrices, etc.

Data files deposited in Dryad are permanently preserved, publicly available with no legal restrictions on re-use, and uniquely identified for attribution.

Data submission is simple, quick, and easy. Data files may be uploaded to Dryad in any file format, with a short README and a few metadata terms.

Finally, using an established best-practice data repository like Dryad facilitates a simple description in a data management plan. For example, grant applicants can use language like this to describe their intention to archive data in Dryad:

We plan to use the Dryad public repository for the long-term preservation and dissemination of data underlying publications from this funded research project. Data submitted to Dryad is made publicly available upon online publication** of the associated article. All data in Dryad is released to the public domain without legal restrictions on reuse, through a Creative Commons Zero waiver. There is a (legally non-binding) expectation of attribution of the Dryad data record and associated article. A one-time data deposit charge is paid by the authors or the associated journals, which allows Dryad data to be available for download without cost to users.

**Researchers may instead choose to stipulate an embargo period of 1 year.

If your funding agency allows it, don’t forget to budget for data preservation (data submission to Dryad is free through 2011).

Data deposited in Dryad can help researchers meet these policies and expectations:

  • the (US) National Science Foundation requires that data management plans include provisions for data archiving and preservation, and access policies and provisions for secondary use
  • the Wellcome Trust “expects all of its funded researchers to maximise the availability of research data with as few restrictions as possible”
  • the (US) National Institutes of Health data sharing policies state that “Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.”
  • the (UK) Medical Research Council policy on data sharing and preservation states: “Where possible, published results should include links to the associated data. Investigators must show how data will be preserved and their strategies for sharing, e.g. by depositing it in a community database.”

Summaries of funding agencies’ data policies can be found here:

Resources on data management & sharing:

Questions about the role of the Dryad repository in data management planning can be directed to the Dryad team.

Sample data file, Gilbert J and Manica A (2010) Data from: Parental care trade-offs and life history relationships in insects. Dryad Digital Repository. doi:10.5061/dryad.1451


Credit: adamthelibrarian, from Flickr

This is an important month, because a host of our partner journals are implementing new policies on data archiving, and, in the U.S., the National Science Foundation is asking its new grantees to have explicit data management plans.  There are over 1000 data files from over 50 journals now in Dryad, and much of this content has been submitted only within the past year. Clearly, Dryad’s role in supporting the growing data archiving mandates from journals and funders continues to expand.

New Features
In the past few months, several new features have been added to Dryad.  Users can now save an incomplete submission and come back later to complete it.  They can see a listing of their completed and in progress submissions.  Users can download data citations to their favorite bibliography management programs and upload them to their favorite social bookmarking tools.  A new “faceted search” interface allows users to find data more easily, and also displays related content in other repositories, including ecological and environmental science data (from the Knowledge Network for Biocomplexity) and phylogenetic data (from TreeBASE). To provide an early indication of scientific impact, users can see how often data have been viewed and downloaded.

An important new feature is “handshaking”, which is what we call the process whereby authors upload some of their data to Dryad, and the information is conveyed behind the scenes to a specialized repository. The aim of handshaking is to reduce the time and effort needed to deposit data when different repositories manage different aspects of the data. Handshaking also enables persistent linkages among data in the different repositories. As a first foray into handshaking, we now offer users the option of initiating a deposit in TreeBASE, the primary repository for published phylogenetic data, whenever a NEXUS file is uploaded to Dryad. Alternatively, users can deposit in another repository first and report the identifiers to Dryad, to ensure that users can find all the data relevant to a given article. We will be working in the months ahead to handshake with other specialized repositories required by our partner journals.
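To make the handshaking idea concrete, here is a minimal Python sketch of the decision step described above. It is not Dryad’s code: the class and function names are hypothetical, and it assumes a NEXUS file can be recognized by the `#NEXUS` token at the start of the file. Files that look like NEXUS trigger an offer to start a TreeBASE deposit, and identifiers the author reports from other repositories are kept so they can be linked to the data package.

```python
from dataclasses import dataclass, field


@dataclass
class UploadedFile:
    """A file uploaded to a (hypothetical) data package."""
    name: str
    content: bytes


@dataclass
class HandshakePlan:
    """What a repository might do after an upload (illustrative only)."""
    treebase_offers: list = field(default_factory=list)     # files for which a TreeBASE deposit is offered
    linked_identifiers: list = field(default_factory=list)  # identifiers reported from other repositories


def looks_like_nexus(f: UploadedFile) -> bool:
    """Assumption: NEXUS files start with #NEXUS (case-insensitive), after optional whitespace."""
    return f.content[:1024].lstrip().upper().startswith(b"#NEXUS")


def plan_handshake(files, reported_identifiers=()) -> HandshakePlan:
    plan = HandshakePlan()
    for f in files:
        if looks_like_nexus(f):
            plan.treebase_offers.append(f.name)
    # Alternative route: data already deposited elsewhere. Record the identifiers
    # so all the data relevant to an article can be found from one place.
    plan.linked_identifiers.extend(reported_identifiers)
    return plan


files = [
    UploadedFile("trees.nex", b"#NEXUS\nBEGIN TREES;\n  TREE t1 = ((A,B),C);\nEND;\n"),
    UploadedFile("traits.csv", b"species,body_mass\nA,1.2\n"),
]
plan = plan_handshake(files, reported_identifiers=["example-accession-001"])
print(plan.treebase_offers)     # ['trees.nex']
print(plan.linked_identifiers)  # ['example-accession-001']
```

Detecting the format from the file content rather than the `.nex` extension is a design choice in this sketch; a real service would also need the metadata exchange with the specialized repository, which is not shown here.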

See our recent blog post about these features for more details.

Data Deposit in Three Easy Steps: The Movie
Are you looking for a way to show a colleague how straightforward data archiving can be? We’ve added a short (2-minute) video to the site that walks users through the deposit process in three easy steps. The video is also available on SciVee.

Journals Implement Joint Data Archiving Policy
Starting this month, a number of Dryad partner journals have implemented a Joint Data Archiving Policy that requires, as a condition of publication, that authors deposit the data underlying their article in a public repository.  Some of the journals implementing this policy include: The American Naturalist, Evolution, Evolutionary Applications, Heredity, Journal of Evolutionary Biology, and Molecular Ecology. A recent TREE article by Michael Whitlock suggests how “data generators, data re-users, and journals can maximize the fairness and scientific value of data archiving.”

A growing number of journals now integrate their submission process with Dryad, meaning that the repository and journal exchange information to facilitate the author’s data deposition process and to ensure persistent linkage between articles and data. The current list includes The American Naturalist, The Biological Journal of the Linnean Society, Evolution, Journal of Evolutionary Biology, Journal of Heredity, Molecular Ecology, and Molecular Ecology Resources. And more are on the way (stay tuned).

NSF Data Management Plan Mandate
Starting this month, the U.S. National Science Foundation is requiring grant applicants to provide a data management plan describing how data will be collected, preserved and made available, and these plans will be subject to peer review.  We encourage applicants to leverage Dryad in their data management plans as a solution for the long-term preservation and dissemination of the data associated with their publications.  There are some pointers to resources for data management planning on the Dryad website.

Dryad UK Project
The Joint Information Systems Committee (JISC) in the UK has made an award to Dryad, through Oxford University and the British Library, to expand the scope of the journals involved, including into the areas of infectious disease and epidemiology, and to create a UK mirror of Dryad. More information is here and at the Dryad UK site.

New Twitter Feed for Data Deposits
Interested in keeping up with new data available in Dryad?  Follow our Twitter feed (@datadryadnew) or subscribe to our RSS feed. We also Tweet general news about the repository and the world of data science as @datadryad.

Browse and search the repository at http://datadryad.org/
Follow Dryad on Twitter http://twitter.com/datadryad

This blog post is the first issue of the Dryad newsletter, summarizing recent achievements and milestones of the data repository.  If you’d like to receive future newsletters by email, please sign up for the Dryad Users mailing list.


Researchers working in data-intensive science, as well as science editors and publishers thinking about data policies, may want to take note of a new article by Michael Whitlock, “Data archiving in ecology and evolution: best practices”, in the current issue of Trends in Ecology & Evolution.

Whitlock has long been a leader in advocating for data archiving and is the current Chair of the Dryad Consortium Board.  In this article he presents concrete suggestions for the what, how and when of data archiving.

But archiving is only half the equation.  Whitlock attempts to articulate sensible guidelines for data reuse, as well. Under what circumstances should researchers contact the original creators of the data set they are re-using, and when is co-authorship appropriate? How should authors properly acknowledge the original creators of the data?

Journals, editors, and publishers have an important role in promoting both data archiving and responsible data reuse.  One problem that merits broader discussion is how journals can conduct peer review so as to prevent data misuse.  Should researchers be given a chance to review manuscripts that report on new results reusing data that they originally published?  Or is it better to avoid the potential for conflict of interest (e.g. “how dare they not replicate my findings!”) and instead recruit independent experts?

Although the article is especially timely for those working in evolutionary biology and ecology, due to the recent adoption of mandatory data archiving at many of the leading journals in the field, these best practice recommendations are relevant across the sciences.

Michael C. Whitlock (2011) Data archiving in ecology and evolution: best practices, Trends in Ecology & Evolution,  26 (2): 61-65.  doi:10.1016/j.tree.2010.11.006.


Are you curious about what’s involved in depositing data in Dryad? Or looking for a quick way to show colleagues how straightforward data archiving can be? Dryad’s new 2-minute video demonstrates the data deposit process from start to finish.

How to deposit data in Dryad

The video is embedded on the Dryad website and is also available on SciVee. Feel free to link to it and share it with colleagues.


It’s January 2011. Do you know where your data are?

It would be a good idea to know and be ready to deposit your files in a data repository, because this month marks the implementation of the Joint Data Archiving Policy.  The policy, endorsed by a consortium of prominent journals and societies, states that journals will require

as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive.

The policy can be customized by each journal, and enables both embargoes and editorial discretion to make special exceptions. Blanket exemptions apply to sensitive data such as identifiable human records and endangered species localities.

The journals (and corresponding societies) implementing the policy this month are:

  • The American Naturalist (American Society of Naturalists)
  • Evolution (Society for the Study of Evolution)
  • Evolutionary Applications
  • Heredity (The Genetics Society)
  • Journal of Evolutionary Biology (European Society for Evolutionary Biology)
  • Molecular Biology and Evolution (Society for Molecular Biology and Evolution)
  • Molecular Ecology
  • Systematic Biology (Society for Systematic Biology)

A sampling of the revised Instructions to Authors includes:

  • The American Naturalist: “The American Naturalist requires authors to deposit the data associated with accepted papers in a public archive. For gene sequence data and phylogenetic trees, deposition in GenBank or TreeBASE, respectively, is required. There are many possible archives that may suit a particular data set, including the Dryad repository for ecological and evolutionary biology data (http://datadryad.org). All accession numbers for GenBank, TreeBASE, and Dryad must be included in accepted manuscripts before they go to Production. Any impediments to data sharing should be brought to the attention of the editors at the time of submission.”
  • Journal of Evolutionary Biology: “The editors and publisher of this journal expect authors to make the data underlying published articles available. An investigator who feels that reasonable requests have not been met by the authors should correspond with the Editor-in-Chief. Authors must use the appropriate database to deposit detailed information supplementing submitted papers, and quote the accession number in their manuscripts.”
  • Molecular Ecology: “Data Accessibility: To enable readers to locate archived data from Molecular Ecology papers, as of January 2011 we will require that authors include a ‘Data Accessibility’ section after their references. This should list the data base and respective accession numbers for all data from the manuscript that has been made publicly available…. Please note that this section must be complete prior to the submission of the final version of your manuscript. Papers lacking this section will not be sent to Production.”

At Dryad, we have been working for some time now with editors and publishers at these and other partner journals to support the implementation of this policy. If you submit an article to a “JDAP journal,” you will be invited to simultaneously submit your data to Dryad. This may occur either prior to review or, depending on the journal, at the time your article is accepted. Dryad and the journal communicate behind the scenes to make it as easy as possible for you to deposit your data, and also ensure that a permanent, resolvable, and citable data identifier is published in the final article.  That way, in the future, no one need be frightened by the question “do you know where your data are?”



Statue of Asclepius, the Greek God of Medicine, from the Museum of Epidaurus Theatre. Image from: Wikimedia Commons, Licensed under: GFDL 1.3.

An international group of major health research funders have made a “joint statement of purpose” announcing, in strong and clear terms, their intent to promote greater sharing of research data.

As public and charitable funders of this research, we believe that making research data sets available to investigators beyond the original research team in a timely and responsible manner, subject to appropriate safeguards, will generate three key benefits: faster progress in improving health, better value for money, and higher quality science.

The 17 signatories (so far) include many major governmental funding agencies (e.g. US National Institutes of Health, the Wellcome Trust, The Centers for Disease Control, the UK Medical Research Council, Australia’s National Health and Medical Research Council, the Canadian Institutes of Health Research, France’s National Institute for Health and Medical Research, and the German Research Foundation), private foundations (e.g. the Bill & Melinda Gates Foundation and the Hewlett Foundation) and even international organizations such as the World Bank.  The group has invited additional funders to sign on to the statement.

Some of the long-term goals articulated in the document are near and dear to our hearts, in particular:

To the extent possible, datasets underpinning research papers in peer-reviewed journals are archived and made available to other researchers in a clear and transparent manner.

and

The human and technical resources and infrastructures needed to support data management, archiving and access are developed and supported for long-term sustainability.

An accompanying comment in The Lancet by Mark Walport of the Wellcome Trust and Paul Brest of the Hewlett Foundation (Sharing research data to improve public health, DOI:10.1016/S0140-6736(10)62234-9) raises some of the hard, but by now familiar, questions that will drive the approaches taken by the funding organizations: how to balance the rights and responsibilities of data generators and data users; how to safeguard and further the interests of the data subjects themselves; and how to ensure that the benefits of data sharing justify the expense and burden involved.

It will be very interesting to watch how the funding organizations work singly and in concert to overcome decades of cultural familiarity with data hoarding in the health sciences and, as Walport and Brest put it, “mend their ways.”
