It is said that a picture is worth a thousand words and that Helen of Troy (Fig 1) had a face that launched a thousand ships. Why is the number 1000 significant to those of us at Dryad today? (Especially since its place in literature is ultimately an accident of our decimal number system.)
First, it encourages us that Dryad’s multipronged approach to making data available for reuse (raising awareness of the issues, coordinating data archiving policy across journals, providing a user-friendly submission interface, paying attention to the incentives of researchers) is bearing fruit. As a result of this strategy, the rate of submissions continues to grow; over 60% of submissions are from the past nine months alone. Since a picture is worth a thousand words, see Fig 2.
We are mindful that it will take some time before we can measure the impact of the availability of these data for reuse, but there are encouraging signs from the frequency with which data are being downloaded. We will discuss those results in a separate post.
What else can we learn from these first 1000 submissions? One lesson is the importance of making data submission integral to publication. While there are 88 different journals in which the corresponding articles appear, about three quarters of the submissions come from the first nine journals that worked to integrate manuscript and data submission with Dryad. Journal policy matters, and the enthusiasm with which journals implement policy matters.
As far as disciplinary diversity goes, the first 1000 submissions are dominated by journals in evolutionary biology and ecology. Dryad’s first biomedical journal partner, BMJ Open, was integrated within the past few months, and as a result of many other new journal partnerships being developed, we expect submissions to the repository to represent a much broader array of basic and applied biosciences in the near future.
Interestingly, most of the deposits are relatively small in size. Counting all files in a data package together, almost 80% of data packages are less than one megabyte. Furthermore, the majority of data packages contain only one data file and the mean is a little less than two and a half. As one might expect, many of the files are spreadsheets or in tabular text format. Thus, the files are rich in information but not so difficult to transfer or store.
We are pleasantly surprised to report that most authors, most of the time, see the value in having their data released at the same time as the article is published. Authors are making their data available immediately upon publication, or earlier, for over 90% of data files. In nearly all cases where files are put under embargo, authors choose to release them one year after publication rather than requesting a longer embargo from the journal.
Thomson Reuters indexes more than half a million abstracts annually in BIOSIS. A difficult-to-estimate, but undoubtedly substantial, fraction of this literature reports on data that cannot be, or is not, archived in a specialized public data repository. This helps put Dryad’s 1000 data packages in perspective. As a discipline, we still have a long way to go to preserve and make available for reuse all the “published” data that has no home. But every data package that is submitted to Dryad is a little victory for the transparency and robustness of science.
So here’s to the first thousand. May they have plenty of company in the coming years.
- Things might have turned out very differently, judging by the presence of early vertebrate fossils with more than five digits (see http://en.wikipedia.org/wiki/Polydactyly_in_early_tetrapods).
- To celebrate, we are sending a Dryad-logo coffee mug to Dr. Reinmar Hager, who submitted the 1000th data package.
- Random cool fact about the number 1000. It is “the smallest number that generates three primes in the fastest way possible by concatenation of decremented numbers (1000999, 1000999998997, and 1000999998997996995994993 are prime) … [excluding] the number itself” (see http://primes.utm.edu/curios/page.php/1000.html).
- This includes a collection of legacy data packages from the Systematic Biology archives that was submitted en masse to Dryad in mid-2009.