
Posts Tagged ‘data archiving’

Dryad is delighted to join with PLOS today to announce our partnership with PLOS Biology, as described here on the official PLOS Biology blog, Biologue.  As the first Public Library of Science (PLOS) journal to partner with Dryad to integrate manuscript submission, “PLOS Biology can offer authors a seamless tying together of an article with its underlying data; [and] can also provide confidential access for editors and reviewers to data associated with articles under review.”

Here’s how it works: During manuscript evaluation, PLOS Biology invites authors to deposit the underlying data files in Dryad, sending them a link to Dryad that enables a streamlined upload process (no need to enter the article details).  Authors may deposit complex and varied data types in multiple formats, and these files are then available to editors and reviewers through anonymous, secure access during the manuscript review process.  Behind the scenes, the journal’s editorial system and the Dryad repository exchange metadata, ensuring that upon publication the article links to the associated data in Dryad, permanently connecting the published article with its securely archived, publicly available data.

Dr. Theodora Bloom, Chief Editor, PLOS Biology, mentions that journals “are uniquely well-placed to help researchers ensure that all data underlying a study are made available alongside any published articles.”

We welcome PLOS Biology authors and editors to Dryad, and look forward to extending this partnership to other PLOS journals.

Read Full Post »

Christopher Pirrone excavating an odontocete skull (photo by Robert Boessenecker)

Perhaps it’s understandable that paleontologists are committed to preserving the scientific record, since they spend a lot of time and energy finding and extracting shreds of evidence millions of years old.  Now, thanks to a partnership between Dryad and The Paleontological Society announced last year [1], coupled with strong data archiving policies adopted by two of its journals (Paleobiology and the Journal of Paleontology), a rich trove of data will be available for future researchers to unearth from Dryad.

For both journals, authors are being instructed to deposit the underlying data at the time their manuscript is submitted, so that editors and referees will be able to review it prior to acceptance.  Once published on Dryad, the data will be independently discoverable and citable, while at the same time prominently linked both to and from the original article.  Researchers are able to track the reuse impact of their data, independent of the citation impact of their article, by monitoring downloads from Dryad.

Preserved for ages.

Smilodon, by Charles Knight (1905), from a mural at the American Museum of Natural History.

Here’s an example from a recent issue of Paleobiology to sink your teeth into:

Article: Meachen-Samuels JA (2012) Morphological convergence of the prey-killing arsenal of sabertooth predators. Paleobiology 38(1): 1-14. doi:10.1666/10036.1

Data: Meachen-Samuels JA (2012) Data from: Morphological convergence of the prey-killing arsenal of sabertooth predators. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.h58q6

References:

[1]  Callaway E (2011) Fossil data enter the web period. Nature 472, 150. http://dx.doi.org/10.1038/472150a

Read Full Post »

Scaling up. Courtesy of Swamibu via flickr, CC-BY-NC

The US National Science Foundation, through its Advances in Biological Informatics program, has announced a new award of $2.4M over four years to Duke University (NESCent), the University of North Carolina Chapel Hill (Metadata Research Center), and North Carolina State University (Digital Library).

The award will enable Dryad to scale up its technical infrastructure to support the rapidly expanding user base of journals and researchers, ensure that the repository is meeting the needs of that user base, and complete the transition to a financially independent non-profit organization.

This is one of a new breed of Development Awards being made by ABI, in which the review criteria judge the ability of the project to produce “robust, broadly-adopted cyberinfrastructure” with an emphasis on “user engagement, design quality, engineering practices, management plan, and dissemination”.

Repositories such as Dryad enable researchers to comply with funding agency expectations for long-term data preservation and availability, and we are grateful to NSF for its continuing support of this mission.

Read Full Post »

1E+3

Fig 1. Helen of Troy, detail from an Attic red-figure krater, c. 450–440 BC

It is said that a picture is worth a thousand words and that Helen of Troy (Fig 1) had a face that launched a thousand ships.  Why is the number 1000 significant to those of us at Dryad today?  (Especially since its place in literature is ultimately an accident of our decimal number system [1].)

The reason is that Dryad released its 1000th data package.  The lucky submission is: Hager R, Cheverud JM, Wolf JB (2011) Data from: Genotype dependent responses to levels of sibling competition over maternal resources in mice. doi:10.5061/dryad.8qq3p0d8  [2]. This (arbitrary, but see [3]) milestone has put us in a reflective mood, and so here we take the opportunity to consider what it means.

First, it encourages us that Dryad’s multipronged approach to making data available for reuse (raising awareness of the issues, coordinating data archiving policy across journals, providing a user-friendly submission interface, paying attention to the incentives of researchers) is bearing fruit.  As a result of this strategy, the rate of submissions continues to grow; over 60% of submissions are from the past nine months alone.  Since a picture is worth a thousand words, see Fig 2.

Figure 2. Data packages submitted to Dryad through September 2011

We are mindful that it will take some time before we can measure the impact of the availability of these data for reuse, but there are encouraging signs from the frequency with which data are being downloaded.  We will discuss those results in a separate post.

What else can we learn from these first 1000 submissions?  One lesson is the importance of making data submission integral to publication. While there are 88 different journals in which the corresponding articles appear, about three quarters of the submissions come from the first nine journals that worked to integrate manuscript and data submission with Dryad [4].  Journal policy matters, and the enthusiasm with which journals implement policy matters.

As far as disciplinary diversity goes, the first 1000 submissions are dominated by journals in evolutionary biology and ecology.  Dryad’s first biomedical journal partner, BMJ Open, was integrated within the past few months, and as a result of many other new journal partnerships being developed, we expect submissions to the repository to represent a much broader array of basic and applied biosciences in the near future.

Interestingly, most of the deposits are relatively small in size. Counting all files in a data package together, almost 80% of data packages are less than one megabyte.  Furthermore, the majority of data packages contain only one data file and the mean is a little less than two and a half.  As one might expect, many of the files are spreadsheets or in tabular text format.  Thus, the files are rich in information but not so difficult to transfer or store.

We are pleasantly surprised to report that most authors, most of the time, see the value in having their data released at the same time as the article is published.  Authors are making their data available immediately upon publication, or earlier, for over 90% of data files.  In nearly all cases where files are put under embargo, authors choose to release them one year post-publication rather than requesting a longer embargo from the journal.

Thomson Reuters indexes more than half a million abstracts annually in BIOSIS.  A difficult-to-estimate, but undoubtedly substantial, fraction of this literature reports on data that cannot be, or is not, archived in a specialized public data repository.  This helps put Dryad’s 1000 data packages in perspective.   As a discipline, we still have a long way to go to preserve and make available for reuse all the “published” data that has no home.  But every data package that is submitted to Dryad is a little victory for the transparency and robustness of science.

So here’s to the first thousand.  May they have plenty of company in the coming years.

Footnotes:

  1. Things might have turned out very differently, judging by the presence of early vertebrate fossils with more than five digits (see http://en.wikipedia.org/wiki/Polydactyly_in_early_tetrapods).
  2. To celebrate, we are sending a Dryad-logo coffee mug to Dr. Reinmar Hager, who submitted the 1000th data package.
  3. Random cool fact about the number 1000.  It is “the smallest number that generates three primes in the fastest way possible by concatenation of decremented numbers (1000999, 1000999998997, and 1000999998997996995994993 are prime) … [excluding] the number itself” (see http://primes.utm.edu/curios/page.php/1000.html).
  4. This includes a collection of legacy data packages from the Systematic Biology archives that was submitted en masse to Dryad in mid-2009.

Read Full Post »

Dryad is pleased to welcome BMJ Open as a new partner journal, reflecting the recently expanded scope of the repository, which now includes all of the basic and applied biosciences, including medicine. BMJ Open is a new online-only, open access journal from the esteemed London-based BMJ Group.  It is dedicated to publishing medical research from all disciplines and therapeutic areas, utilizing fully open peer review and immediate online publication.

BMJ Open authors are now being strongly encouraged to deposit the data underlying their articles in Dryad or a more specialized repository, as appropriate.  Authors submitting articles to the journal will benefit from Dryad’s journal submission integration, the process by which data deposit is streamlined for authors through behind-the-scenes communication between the journal and the repository.

An extremely important issue with archiving medical data is, of course, the need to protect patient privacy. To assist its authors, BMJ Open is providing special guidance on data sharing.  Authors must be able to release data to the public domain as with all data in Dryad, and the repository will err on the side of caution by turning back any data that may compromise patient privacy.

To quote from the BMJ Group press release:

Data sharing aims to help scientists and doctors validate and scrutinise researchers’ findings in a bid to prevent fraud and eradicate the kind of selective reporting that has enabled some treatments to acquire regulatory approval, based on incomplete and biased data. In some cases this lack of transparency has prompted the subsequent restriction or withdrawal of certain treatments because of patient safety or effectiveness concerns, which were already evident in the unpublished data.  Data repositories also allow researchers to develop new methods of analysis and use the data to answer questions that the original researchers have not thought of. They also facilitate the acquisition of data for meta analysis (more in-depth comparative reviews).

Commenting on the move, Dr Trish Groves, editor in chief of BMJ Open, said: “Since launch, BMJ Open has championed transparency in medical research through open peer review, open access, and full reporting of studies’ methods and results, all exemplified by last week’s paper on the safety (or not) of medical devices (doi:10.5061/dryad.585t4)…”

This data package in Dryad, which illustrates the tremendous value of medical data for informing medical policy and practice without compromising patient privacy, is available at:

  • Heneghan C, Thompson M, Billingsley M, Cohen D (2011) Data from: Medical-device recalls in the UK and the device-regulation process: retrospective review of safety notices and alerts. Dryad Digital Repository. doi:10.5061/dryad.585t4

Groves goes on to say:

We strongly encourage authors to share their datasets, and now we’re delighted to be making that easier to do, with the help of DryadUK.

Kudos to the Dryad UK project team, based at the British Library, for facilitating this pioneering partnership.

Read Full Post »

Dryad is happy to announce a new initiative with Pensoft Publishers, the pioneering publisher behind ZooKeys and other rapid-publication open access journals, including BioRisk, Comparative Cytogenetics, International Journal of Myriapodology, Journal of Hymenoptera Research, NeoBiota, PhytoKeys, and Subterranean Biology.  Dryad is working with Pensoft to support publication of data papers in the area of biodiversity, together with the Global Biodiversity Information Facility and the Barcode of Life.  Through this effort, we aim to make the data publishing experience as smooth and rewarding as possible for authors, while at the same time making sure these important data are vetted through peer review and available for reuse in public repositories.  The full press release from Pensoft is below.

Data publishing policies and guidelines for biodiversity data published by Pensoft

Pensoft Publishers announced a data publishing project for biodiversity data in response to the increasing demands from institutions and scientists to open scientific data to anyone who would be interested to use them.

“An opinion survey amongst the authors, readers and editors of the Pensoft journal ZooKeys carried out in April convinced us that the majority of participants (84 %) are willing to publish their data, so that to make them available to anyone to use, share or integrate with other data” said Dr Lyubomir Penev, managing director of Pensoft Publishers. Among the most important incentives to publish data, the scientists mentioned  that  “open data increases transparency and the overall quality of science, the potential for collaborative research as well as an opportunity to increase academic credit in the form of citations. Therefore, providing a service to ensure a permanent publication record for published data is of key importance for the success of the project”, adds Dr Penev.

The core of the project is the concept of the “Data Paper”, developed in cooperation with the Global Biodiversity Information Facility (GBIF). Data Papers are peer-reviewed scholarly publications that describe the published datasets and give data authors an opportunity to receive academic credit for their efforts. Currently, Pensoft offers the opportunity to publish Data Papers describing biodiversity data, Barcode of Life genome data and biodiversity-related software tools, such as interactive keys and others.

Pensoft reached an agreement for cooperation in data hosting and developing of data publishing workflows with the GBIF, the Dryad Data Repository and the Consortium for Barcode of Life.

“Data publishing becomes increasingly important and already affects the policies of the world’s leading science funding frameworks and organizations. Opening and integrating biodiversity data will be the future basis to increase efficiency of monitoring the processes of global change, conservation of nature and saving life on our planet” concluded Dr Vincent Smith, coordinator of the European Union FP7 project ViBRANT, in the framework of which a part of the work has been carried out.

Read Full Post »

If you have recently published data in Dryad, chances are it was in the course of publishing an article at a partner journal that steered you our way.

But you may be aware that Dryad accepts data from any peer-reviewed article in biology or biomedicine.  That includes journals that are not (at least not yet) partners.  In fact, as of the time of writing, Dryad has data associated with articles in 79 journals, approximately four times the number of partners.

Dryad even accepts data from articles that have already been published.  Now, why might you wish to go to the trouble of rummaging through those old files and putting your legacy data online?

Well, we noticed a while back that some individuals were beginning to do this systematically.  For example, there was a sudden influx of data packages with Frédéric Delsuc’s name on them a little while back.  Delsuc, of the French National Centre for Scientific Research (CNRS) and the Université Montpellier, is a member of an international team of collaborators (from France, Norway, Canada, Spain, Japan, Germany, Switzerland, and the United States) that has been using DNA sequence data to reconstruct the evolutionary history of a wide range of vertebrates and vertebrate relatives, from anteaters to sea squirts.


Giant Anteaters (Myrmecophaga tridactyla). The pup clinging to his mother is Cyrano, who was born at the Smithsonian’s National Zoo in 2009. Photo credit: Mehgan Murphy, CC-BY-NC-ND, http://creativecommons.org/licenses/by-nc-nd/2.0/

So far, Delsuc and his team [1] have deposited data from 20 articles in Dryad. The articles are in partner journals such as Molecular Biology and Evolution, Molecular Phylogenetics and Evolution, Systematic Biology, as well as more general science journals such as Nature, Science, and the Proceedings of the National Academy of Sciences USA.

The articles stretch back to 2002, a time when most new desktop computers were still being outfitted with floppy drives. (Remember those?)

We asked Delsuc what he saw as the advantages of archiving his team’s legacy data.

We [...] decided in our team to try to systematically submit our datasets to Dryad because we really think they are valuable. Dryad offers a very nice way of archiving the data ensuring their durability over time.

For Delsuc and his team, no more rummaging through old storage devices to find the files when they receive an email request.  No more worrying about the data when lab or departmental websites move.  They just need to point their colleagues to Dryad.

It has been reported that the number one reason scientists cite for having denied their colleagues’ requests for data in the past is the amount of effort required to dig the data up [2].  Delsuc and his team intuitively understood that, and went back to archive their data before memories faded, storage devices failed, and graduate students moved on.

The downside to archiving legacy data in this way is that an article’s readers won’t immediately know about the existence of the Dryad data package, since the data DOI will not be published within the text. So, while archiving legacy data has its advantages, there is no substitute for depositing the data before the article is published, as Dryad does with the new articles appearing in its partner journals.

To give Delsuc the final word:

It would be great if more and more journals in the field decide to include data deposit in their publication policies.

[1] “Equipe Phylogénie et Evolution Moléculaire” (Phylogeny and Molecular Evolution team) of the Institut des Sciences de l’Evolution (Institute of Evolutionary Sciences), part of the CNRS: Centre National de la Recherche Scientifique (French National Centre for Scientific Research) and the Université Montpellier 2 (University of Montpellier 2).

[2] Campbell EG et al. (2002) Data Withholding in Academic Genetics: Evidence From a National Survey. JAMA 287(4):473-480. doi:10.1001/jama.287.4.473

Read Full Post »
