
Posts Tagged ‘data archiving’

Our guest post today is from Mohamed Noor of Duke University, president of the American Genetic Association. The AGA is a scholarly society dating back to 1903.  AGA, together with Oxford University Press, publishes the Journal of Heredity, which is a charter member of the Dryad organization and one of the first journals to integrate manuscript and data submission with the repository.  The society just held its annual symposium in Durham, North Carolina, not so far from Dryad’s NESCent headquarters, and has some excellent news to report from the Council meeting.

The American Genetic Association is pleased to announce that it has now fully adopted the Joint Data Archiving Policy (JDAP) for the Journal of Heredity.  The Journal of Heredity had previously required that newly reported nucleotide or amino acid sequences, and structural coordinates, be submitted to appropriate public databases. For other forms of data, the Journal “endorsed the principles of the Joint Data Archiving Policy (JDAP) in encouraging all authors to archive primary datasets in an appropriate public archive, such as Dryad, TreeBASE, or the Knowledge Network for Biocomplexity.”

This voluntary archiving policy was facilitated by the direct link between the Journal of Heredity and Dryad, in effect since February 2010.

To further support data-sharing and data access, in July 2012, the AGA Council voted unanimously to make data archiving a requirement for publication, under the terms specified in the JDAP.

The requirement will take effect by January 1, 2013. The American Genetic Association also recognizes the vast investment of individual researchers in generating and curating large datasets. Consequently, we recommend that this investment be respected in secondary analyses or meta-analyses in a gracious collaborative spirit.

Many other leading journals in ecology and evolutionary biology have adopted policies modeled on JDAP over the past two years, and other journals are invited to consider it as a policy that has attracted wide support among scientists.


Dryad is delighted to join with PLOS today to announce our partnership with PLOS Biology, as described here on the official PLOS Biology blog, Biologue.  As the first Public Library of Science (PLOS) journal to partner with Dryad to integrate manuscript submission, “PLOS Biology can offer authors a seamless tying together of an article with its underlying data; [and] can also provide confidential access for editors and reviewers to data associated with articles under review.”

Here’s how it works: During manuscript evaluation, PLOS Biology invites authors to deposit the underlying data files in Dryad, sending them a link to Dryad which enables a streamlined upload process (no need to enter the article details).  Authors may deposit complex and varied data types in multiple formats, and these files are then accessible to editors and reviewers by anonymous and secure access during the manuscript review process.  Behind the scenes, the journal’s editorial system and the Dryad repository exchange metadata, ensuring that upon publication, the article links to the associated data in Dryad, and permanently connecting the published article with its securely archived, publicly available data.
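The workflow above can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyRepository`, `DataDeposit`, the review token, and the placeholder identifiers are all hypothetical names invented here, and nothing below reflects the actual interface between the journal's editorial system and Dryad.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataDeposit:
    """Hypothetical record of the data files tied to one manuscript."""
    manuscript_id: str
    files: List[str]
    # Assumed mechanism: a random token stands in for confidential reviewer access.
    review_token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    article_doi: Optional[str] = None
    public: bool = False


class ToyRepository:
    """Hypothetical in-memory stand-in for a data repository."""

    def __init__(self) -> None:
        self._deposits: Dict[str, DataDeposit] = {}

    def deposit(self, manuscript_id: str, files: List[str]) -> DataDeposit:
        # The journal already supplied the manuscript ID, so the author
        # only uploads files and does not re-enter article details.
        dep = DataDeposit(manuscript_id=manuscript_id, files=list(files))
        self._deposits[manuscript_id] = dep
        return dep

    def files_for_reviewer(self, manuscript_id: str, token: str) -> List[str]:
        # Editors and reviewers see the files only with the right token.
        dep = self._deposits[manuscript_id]
        if not secrets.compare_digest(token, dep.review_token):
            raise PermissionError("invalid review token")
        return dep.files

    def publish(self, manuscript_id: str, article_doi: str) -> DataDeposit:
        # On publication the deposit becomes public and is linked to the article.
        dep = self._deposits[manuscript_id]
        dep.article_doi = article_doi
        dep.public = True
        return dep


repo = ToyRepository()
dep = repo.deposit("MS-001", ["measurements.csv"])  # placeholder manuscript ID
print(repo.files_for_reviewer("MS-001", dep.review_token))
print(repo.publish("MS-001", "10.0000/placeholder").public)
```

The point of the sketch is the ordering: deposit during review, restricted access while the manuscript is evaluated, and a single publication step that both opens the data and records the article link.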

Dr. Theodora Bloom, Chief Editor, PLOS Biology, mentions that journals “are uniquely well-placed to help researchers ensure that all data underlying a study are made available alongside any published articles.”

We welcome PLOS Biology authors and editors to Dryad, and look forward to extending this partnership to other PLOS journals.


Christopher Pirrone excavating an odontocete skull (photo by Robert Boessenecker)

Perhaps it’s understandable that paleontologists are committed to preserving the scientific record, since they spend a lot of time and energy finding and extracting shreds of evidence millions of years old.  Now, thanks to a partnership between Dryad and The Paleontological Society announced last year [1], coupled with strong data archiving policies adopted by two of its journals (Paleobiology and the Journal of Paleontology), a rich trove of data will be available for future researchers to unearth from Dryad.

For both journals, authors are being instructed to deposit the underlying data at the time their manuscript is submitted, so that editors and referees will be able to review it prior to acceptance.  Once published on Dryad, the data will be independently discoverable and citable, while at the same time prominently linked both to and from the original article.  Researchers are able to track the reuse impact of their data, independent of the citation impact of their article, by monitoring downloads from Dryad.

Preserved for ages.

Smilodon, by Charles Knight (1905), from a mural at the American Museum of Natural History.

Here’s an example from a recent issue of Paleobiology to sink your teeth into:

Article: Meachen-Samuels JA (2012) Morphological convergence of the prey-killing arsenal of sabertooth predators. Paleobiology 38(1): 1-14. doi:10.1666/10036.1

Data: Meachen-Samuels JA (2012) Data from: Morphological convergence of the prey-killing arsenal of sabertooth predators. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.h58q6

References:

[1]  Callaway E (2011) Fossil data enter the web period. Nature 472, 150. http://dx.doi.org/10.1038/472150a


Scaling up. Courtesy of Swamibu via flickr, CC-BY-NC

The US National Science Foundation, through its Advances in Biological Informatics program, has announced a new award of $2.4M over four years to Duke University (NESCent), the University of North Carolina Chapel Hill (Metadata Research Center), and North Carolina State University (Digital Library).

The award will enable Dryad to scale up its technical infrastructure to support the rapidly expanding user base of journals and researchers, ensure that the repository is meeting the needs of that user base, and complete the transition to a financially independent non-profit organization.

This is one of a new breed of Development Awards being made by ABI, in which the review criteria judge the ability of the project to produce “robust, broadly-adopted cyberinfrastructure” with an emphasis on “user engagement, design quality, engineering practices, management plan, and dissemination”.

Repositories such as Dryad enable researchers to comply with funding agency expectations for long-term data preservation and availability, and we are grateful to NSF for its continuing support of this mission.


1E+3

Fig 1. Helen of Troy, detail from an Attic red-figure krater, c. 450–440 BC

It is said that a picture is worth a thousand words and that Helen of Troy (Fig 1) had a face that launched a thousand ships.  Why is the number 1000 significant to those of us at Dryad today?  (Especially since its place in literature is ultimately an accident of our decimal number system [1].)

The reason is that Dryad released its 1000th data package.  The lucky submission is: Hager R, Cheverud JM, Wolf JB (2011) Data from: Genotype dependent responses to levels of sibling competition over maternal resources in mice. doi:10.5061/dryad.8qq3p0d8  [2]. This (arbitrary, but see [3]) milestone has put us in a reflective mood, and so here we take the opportunity to consider what it means.

First, it encourages us that Dryad’s multipronged approach to making data available for reuse (raising awareness of the issues, coordinating data archiving policy across journals, providing a user-friendly submission interface, paying attention to the incentives of researchers) is bearing fruit.  As a result of this strategy, the rate of submissions continues to grow; over 60% of submissions are from the past nine months alone.  Since a picture is worth a thousand words, see Fig 2.

Figure 2. Data packages submitted to Dryad through September 2011

We are mindful that it will take some time before we can measure the impact of the availability of these data for reuse, but there are encouraging signs from the frequency with which data are being downloaded.  We will discuss those results in a separate post.

What else can we learn from these first 1000 submissions?  One is the importance of making data submission integral to publication. While there are 88 different journals in which the corresponding articles appear, about three quarters of the submissions come from the first nine journals that worked to integrate manuscript and data submission with Dryad [4].  Journal policy matters, and the enthusiasm with which journals implement policy matters.

As far as disciplinary diversity goes, the first 1000 submissions are dominated by journals in evolutionary biology and ecology.  Dryad’s first biomedical journal partner, BMJ Open, was integrated within the past few months, and as a result of many other new journal partnerships being developed, we expect submissions to the repository to represent a much broader array of basic and applied biosciences in the near future.

Interestingly, most of the deposits are relatively small in size. Counting all files in a data package together, almost 80% of data packages are less than one megabyte.  Furthermore, the majority of data packages contain only one data file and the mean is a little less than two and a half.  As one might expect, many of the files are spreadsheets or in tabular text format.  Thus, the files are rich in information but not so difficult to transfer or store.
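A mean of about two and a half files alongside a majority of single-file packages implies a skewed distribution: a few packages with many files pull the mean up. The sketch below shows how such summary figures could be computed. The input values are invented toy numbers chosen to mimic that pattern, not Dryad's records, and treating a megabyte as 1,000,000 bytes is an assumption.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

MEGABYTE = 1_000_000  # assumption: decimal megabyte


def summarize_packages(packages: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize (total_bytes, file_count) pairs, one pair per data package."""
    sizes = [size for size, _ in packages]
    counts = [count for _, count in packages]
    return {
        # Fraction of packages whose files together total under one megabyte.
        "share_under_1mb": sum(1 for s in sizes if s < MEGABYTE) / len(packages),
        "median_files": median(counts),
        "mean_files": mean(counts),
    }


# Toy input only: five made-up packages, one of them with many files.
toy_packages = [
    (200_000, 1),
    (50_000, 1),
    (900_000, 1),
    (3_000_000, 2),
    (400_000, 7),
]
print(summarize_packages(toy_packages))
```

In the toy input, most packages have one file, yet the single seven-file package is enough to lift the mean well above the median.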

We are pleasantly surprised to report that most authors, most of the time, see the value in having their data released at the same time as the article is published.  Authors are making their data available immediately upon publication, or earlier, for over 90% of data files.  In nearly all cases where files are put under embargo, authors choose to release them one year post-publication rather than requesting a longer embargo from the journal.

Thomson Reuters indexes more than half a million abstracts annually in BIOSIS.  A difficult-to-estimate, but undoubtedly substantial, fraction of this literature reports on data that cannot be, or is not, archived in a specialized public data repository.  This helps put Dryad’s 1000 data packages in perspective.   As a discipline, we still have a long way to go to preserve and make available for reuse all the “published” data that has no home.  But every data package that is submitted to Dryad is a little victory for the transparency and robustness of science.

So here’s to the first thousand.  May they have plenty of company in the coming years.

Footnotes:

  1. Things might have turned out very differently, judging by the presence of early vertebrate fossils with more than five digits (see http://en.wikipedia.org/wiki/Polydactyly_in_early_tetrapods).
  2. To celebrate, we are sending a Dryad-logo coffee mug to Dr. Reinmar Hager, who submitted the 1000th data package.
  3. Random cool fact about the number 1000.  It is “the smallest number that generates three primes in the fastest way possible by concatenation of decremented numbers (1000999, 1000999998997, and 1000999998997996995994993 are prime) … [excluding] the number itself” (see http://primes.utm.edu/curios/page.php/1000.html).
  4. This includes a collection of legacy data packages from the Systematic Biology archives that was submitted en masse to Dryad in mid-2009.
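For readers who want to check the claim quoted in footnote 3, here is a minimal sketch. It builds the concatenations 1000, 999, 998, ... and tests each with a Miller-Rabin check over fixed bases. The helper names `concat_decrementing` and `is_prime` are made up for this example. With these thirteen bases the test is reported to be deterministic for numbers below roughly 3.3e24, which covers the 25-digit case; beyond that, treat it as a strong probable-prime check.

```python
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)


def concat_decrementing(start: int, terms: int) -> int:
    """Concatenate start, start-1, ... (terms numbers in total) into one integer."""
    return int("".join(str(start - i) for i in range(terms)))


def is_prime(n: int) -> bool:
    """Miller-Rabin primality check using a fixed set of small prime bases."""
    if n < 2:
        return False
    # Trial division by the bases handles small n and obvious composites.
    for p in _BASES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in _BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base a is a witness that n is composite
    return True


# Two to eight terms covers the three numbers named in the footnote.
for terms in range(2, 9):
    candidate = concat_decrementing(1000, terms)
    print(candidate, is_prime(candidate))
```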


Dryad is pleased to welcome BMJ Open as a new partner journal, reflecting the recently expanded scope of the repository to be inclusive of all of basic and applied biosciences, including medicine. BMJ Open is a new online-only, open access journal from the esteemed London-based BMJ Group.  It is dedicated to publishing medical research from all disciplines and therapeutic areas, utilizing fully open peer review and immediate online publication.

BMJ Open authors are now being strongly encouraged to deposit the data underlying their articles in Dryad or a more specialized repository, as appropriate.  Authors submitting articles to the journal will benefit from Dryad’s journal submission integration, the process by which data deposit is streamlined for authors through behind-the-scenes communication between the journal and the repository.

An extremely important issue with archiving medical data is, of course, the need to protect patient privacy. To assist its authors, BMJ Open is providing special guidance on data sharing.  Authors must be able to release data to the public domain, as with all data in Dryad, and the repository will err on the side of caution by turning back any data that may compromise patient privacy.

To quote from the BMJ Group press release:

Data sharing aims to help scientists and doctors validate and scrutinise researchers’ findings in a bid to prevent fraud and eradicate the kind of selective reporting that has enabled some treatments to acquire regulatory approval, based on incomplete and biased data. In some cases this lack of transparency has prompted the subsequent restriction or withdrawal of certain treatments because of patient safety or effectiveness concerns, which were already evident in the unpublished data.  Data repositories also allow researchers to develop new methods of analysis and use the data to answer questions that the original researchers have not thought of. They also facilitate the acquisition of data for meta analysis (more in-depth comparative reviews).

Commenting on the move, Dr Trish Groves, editor in chief of BMJ Open, said: “Since launch, BMJ Open has championed transparency in medical research through open peer review, open access, and full reporting of studies’ methods and results, all exemplified by last week’s paper on the safety (or not) of medical devices (doi:10.5061/dryad.585t4)…”

This data package in Dryad, which illustrates the tremendous value of medical data for informing medical policy and practice without compromising patient privacy, is available at:

  • Heneghan C, Thompson M, Billingsley M, Cohen D (2011) Data from: Medical-device recalls in the UK and the device-regulation process: retrospective review of safety notices and alerts. Dryad Digital Repository. doi:10.5061/dryad.585t4

Groves goes on to say:

We strongly encourage authors to share their datasets, and now we’re delighted to be making that easier to do, with the help of DryadUK.

Kudos to the Dryad UK project team, based at the British Library, for facilitating this pioneering partnership.


Dryad is happy to announce a new initiative with Pensoft Publishers, the pioneering publisher behind ZooKeys and other rapid-publication open access journals, including BioRisk, Comparative Cytogenetics, International Journal of Myriapodology, Journal of Hymenoptera Research, NeoBiota, PhytoKeys, and Subterranean Biology.  Dryad is working with Pensoft to support publication of data papers in the area of biodiversity, together with the Global Biodiversity Information Facility and the Barcode of Life.  Through this effort, we aim to make the data publishing experience as smooth and rewarding as possible for authors, while at the same time making sure these important data are vetted through peer review and available for reuse in public repositories.  The full press release from Pensoft is below.

Data publishing policies and guidelines for biodiversity data published by Pensoft

Pensoft Publishers announced a data publishing project for biodiversity data in response to the increasing demands from institutions and scientists to open scientific data to anyone who would be interested to use them.

“An opinion survey amongst the authors, readers and editors of the Pensoft journal ZooKeys carried out in April convinced us that the majority of participants (84 %) are willing to publish their data, so that to make them available to anyone to use, share or integrate with other data” said Dr Lyubomir Penev, managing director of Pensoft Publishers. Among the most important incentives to publish data, the scientists mentioned  that  “open data increases transparency and the overall quality of science, the potential for collaborative research as well as an opportunity to increase academic credit in the form of citations. Therefore, providing a service to ensure a permanent publication record for published data is of key importance for the success of the project”, adds Dr Penev.

The core of the project is the concept of the “Data Paper” developed in cooperation with the Global Biodiversity Information Facility (GBIF). Data Papers are peer-reviewed scholarly publications that describe the published datasets and provide an opportunity for data authors to receive academic credit for their efforts. Currently, Pensoft offers the opportunity to publish Data Papers describing biodiversity data, Barcode of Life genome data and biodiversity-related software tools, such as interactive keys and others.

Pensoft reached an agreement for cooperation in data hosting and developing of data publishing workflows with the GBIF, the Dryad Data Repository and the Consortium for Barcode of Life.

“Data publishing becomes increasingly important and already affects the policies of the world’s leading science funding frameworks and organizations. Opening and integrating biodiversity data will be the future basis to increase efficiency of monitoring the processes of global change, conservation of nature and saving life on our planet” concluded Dr Vincent Smith, coordinator of the European Union FP7 project ViBRANT, in the framework of which a part of the work has been carried out.

