
Archive for the ‘New features’ Category

We are pleased to share below the text of a press release from Elsevier, announcing use of the Dryad banner image to direct viewers of online articles to the underlying data in Dryad. See below for sample articles; the banner displays on the right sidebar, under Applications and Tools. This Dryad widget may be used by any publisher or journal to facilitate access to related publicly available data that authors have archived in Dryad.

Elsevier, a world-leading provider of scientific, technical and medical information products and services, and the Dryad Digital Repository, a leading archive for scientific and medical research data, today announced that they have implemented two-way linking between their respective content.

The Dryad Digital Repository provides facilities for archiving, discovery and accessibility of data files associated with any published article in the sciences or medicine, as well as software scripts and other files important to the article. Dryad is a nonprofit organization committed to its mission of making data publicly available for research and educational reuse. All data files stored in Dryad receive persistent, resolvable Digital Object Identifiers (DOIs) to ensure their proper citation.

Scientific and medical research datasets stored by Dryad for research articles published in 28 Elsevier journals can now be immediately accessed from the online articles on ScienceDirect and vice versa. This allows readers to easily find the background information they need in order to develop a deeper understanding of the article, and also helps to place the article in a larger context.

Dr. Todd Vision, Associate Director for Informatics at the National Evolutionary Synthesis Center and Principal Investigator on the primary NSF grant funding Dryad since 2008, said, “We are delighted to work with Elsevier in cementing the union between scientific articles and research data. Molecular Phylogenetics and Evolution was one of the first journals that joined the Dryad consortium and we would like to applaud them for recognizing that the integrity, rigor, and long-term impact of the science published by the journal is strengthened by archiving the associated data at the time of publication. We also believe that authors themselves will ultimately benefit, in the form of increased citations and other forms of professional credit, for making their data available for others to reuse.”

Dr. Derek Wildman, Editor in Chief of Molecular Phylogenetics and Evolution, a journal that has successfully incorporated the reciprocal linking option between Dryad and ScienceDirect, said, “DNA sequence data serves as the basis for the majority of the studies we publish. Dryad has done an excellent job in establishing a public archive for all types of data used in evolutionary biology. By making these data sets public and allowing direct linking between them and the published research papers, scientists can more efficiently build on the work of their predecessors, strengthening the scientific research enterprise. We see the incorporation of the two-way linking as a win for all parties.”

The first 28 journals hosted on ScienceDirect to feature the reciprocal linking option, displayed in the right hand sidebar of the online article page view, are:

  • Animal Behaviour
  • Applied Soil Ecology
  • Behavioural Processes
  • Biological Conservation
  • Comparative Biochemistry and Physiology. Part D, Genomics and Proteomics
  • Ecological Indicators
  • Environmental Pollution
  • Fisheries Research
  • Fungal Genetics and Biology
  • Gene
  • Hormones and Behavior
  • Infection, Genetics and Evolution
  • International Journal for Parasitology
  • Journal of Biomedical Informatics
  • Journal of Human Evolution
  • Journal of Informetrics
  • Marine Genomics
  • Molecular and Biochemical Parasitology
  • Molecular Phylogenetics and Evolution
  • Palaeogeography, Palaeoclimatology, Palaeoecology
  • Protist
  • Quaternary Science Reviews
  • Science of the Total Environment
  • Soil Biology and Biochemistry
  • Theoretical Population Biology
  • Toxicon
  • Trends in Ecology and Evolution
  • Virus Research

This type of linking between articles and data is one of the pillars of Article of the Future, Elsevier’s ongoing program to improve the format of the scientific article. Elsevier collaborates with more than thirty data repositories, and is continually looking to collaborate with other relevant organizations.

View article examples on ScienceDirect:

R. Alexander Pyron, John J. Wiens, A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians, Molecular Phylogenetics and Evolution, Volume 61, Issue 2, November 2011, pp. 543-583, http://dx.doi.org/10.1016/j.ympev.2011.06.012.

Peter J. Unmack, Gerald R. Allen, Jerald B. Johnson, Phylogeny and biogeography of rainbowfishes (Melanotaeniidae) from Australia and New Guinea, Molecular Phylogenetics and Evolution, Volume 67, Issue 1, April 2013, pp. 15-27, http://dx.doi.org/10.1016/j.ympev.2012.12.019.

James Starrett, Marshal Hedin, Nadia Ayoub, Cheryl Y. Hayashi, Hemocyanin gene family evolution in spiders (Araneae), with implications for phylogenetic relationships and divergence times in the infraorder Mygalomorphae, Gene, Volume 524, Issue 2, July 2013, pp. 175-186, http://dx.doi.org/10.1016/j.gene.2013.04.037.

Mercy Y. Akinyi, Jenny Tung, Maamun Jeneby, Nilesh B. Patel, Jeanne Altmann, Susan C. Alberts, Role of grooming in reducing tick load in wild baboons (Papio cynocephalus), Animal Behaviour, Volume 85, Issue 3, March 2013, pp. 559-568, http://dx.doi.org/10.1016/j.anbehav.2012.12.012.

# # #

About Dryad

The Dryad Digital Repository is a curated resource that makes the data underlying scientific and medical publications discoverable, freely reusable, and citable. By providing a general-purpose home for a wide diversity of data types, Dryad benefits individual researchers, educators and students as well as a diversity of stakeholder organizations. Dryad is a member-based nonprofit organization incorporated in North Carolina, USA with users around the world.

About Elsevier

Elsevier is a world-leading provider of scientific, technical and medical information products and services. The company works in partnership with the global science and health communities to publish more than 2,000 journals, including The Lancet and Cell, and close to 20,000 book titles, including major reference works from Mosby and Saunders. Elsevier’s online solutions include ScienceDirect, Scopus, Reaxys, ClinicalKey and Mosby’s Suite, which enhance the productivity of science and health professionals, and the SciVal suite and MEDai’s Pinpoint Review, which help research and health care institutions deliver better outcomes more cost-effectively.

A global business headquartered in Amsterdam, Elsevier employs 7,000 people worldwide. The company is part of Reed Elsevier Group plc, a world-leading provider of professional information solutions. The group employs more than 30,000 people, including more than 15,000 in North America. Reed Elsevier Group plc is owned equally by two parent companies, Reed Elsevier PLC and Reed Elsevier NV. Their shares are traded on the London, Amsterdam and New York Stock Exchanges using the following ticker symbols: London: REL; Amsterdam: REN; New York: RUK and ENL.

Media contact

Dale Seaton
Executive Publisher, Journals
Elsevier
+1 212 633 3862
d.seaton@elsevier.com
Update: In addition to DOIs, the banner widget also works with PubMed IDs.
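Because the banner accepts two kinds of identifier, a client has to tell them apart before looking up the linked data. The following minimal Python sketch shows one way to do that. The prefixes and patterns are simplified assumptions of ours (a bare DOI begins with "10." and a PubMed ID is a run of digits), not the widget's actual logic.

```python
from __future__ import annotations

import re

# Prefixes that commonly wrap a bare DOI in citations and links (assumed list).
_DOI_PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/")

# Simplified patterns: "10." + registrant code + "/" + suffix for a DOI,
# and a positive integer without leading zeros for a PubMed ID.
_DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
_PMID_RE = re.compile(r"^[1-9]\d*$")


def classify_identifier(raw: str) -> tuple[str, str]:
    """Return (kind, normalized), where kind is 'doi' or 'pmid'.

    Raises ValueError if the string matches neither simplified pattern.
    """
    s = raw.strip()
    lowered = s.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            s = s[len(prefix):]
            break
    if _DOI_RE.match(s):
        # DOIs are case-insensitive, so normalize to lower case.
        return "doi", s.lower()
    if _PMID_RE.match(s):
        return "pmid", s
    raise ValueError(f"not a recognizable DOI or PubMed ID: {raw!r}")
```

For example, `classify_identifier("http://dx.doi.org/10.1016/j.ympev.2011.06.012")` returns `("doi", "10.1016/j.ympev.2011.06.012")`, while a bare numeric string is treated as a PubMed ID.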



Dryad is a nonprofit organization fully committed to making scientific and medical research data permanently available to all researchers and educators free of charge without barriers to reuse. For the past four years, we have engaged experts and consulted with our many stakeholders in order to develop a sustainability plan that will ensure Dryad’s content remains free to users indefinitely. The resulting plan allows Dryad to recover its operating costs fairly and in a scalable manner. The plan includes revenue from submission fees, membership dues, grants and contributions.

A one-time submission fee will offset the actual costs of preserving data in Dryad. The majority of costs are incurred at the time of submission when curators process new files, and long-term storage costs scale with each submission, so this transparent one-time charge ensures that resources scale with demand. Dryad offers a variety of pricing plans for journals and other organizations, such as societies, funders and libraries, to purchase discounted submission fees on behalf of their researchers. For data packages not covered by a pricing plan, the researcher pays upon submission. Waivers are provided to researchers from developing economies. See Pricing Plans for a complete list of fees and payment options. Submission fees will apply to all new submissions starting September 2013.

Membership dues will supplement submission fees, allowing Dryad to maintain its strong ties to the research community through its volunteer Board of Directors, Annual Membership Meetings, and other outreach activities to researchers, educators and stakeholder organizations. See Membership Information.

Grants will fund research, development and innovation.

Donations will support all of the above efforts.  In addition, Dryad will occasionally appeal to donors to fund special projects or specific needs, such as preservation of valuable legacy datasets and deposit waivers for researchers from developing economies.

We are grateful for all the input we have received into our sustainability plan, and look forward to your continued support in carrying out our nonprofit mission for many long years to come.


We encourage you to visit the Dryad homepage today and check out our new look. We’ve made many changes, both large and small, and added lots of new content.

Highlights include:

  • A new Ideas Forum, where you can let us know what features you’d like us to work on next, upvote or comment on ideas submitted by others, and check back to see our responses.
  • New membership and pricing plans, which we will feature in upcoming posts.
  • Updates about our Annual Membership Meeting and related events from 22-24 May in Oxford, UK.
  • An Integrated Journals page that helps depositors see which journals are coordinating the submission process with Dryad, figure out which stage in the publication process to submit data for your chosen journal, and more.
  • Prominent positioning of Dryad’s Terms of Service, which we view as a two-way compact with our users. We wrote it in plain language and sincerely want it to be read!
  • Improved accessibility for persons with visual disabilities (following the guidelines in Section 508 of the U.S. Rehabilitation Act).
  • Improved navigation, including an integrated page of Frequently Asked Questions.
  • More intuitive search and browse of data packages and a revamped layout for the data package page.

There are lots more improvements underway.  Not all of these will be immediately obvious to website visitors, but you can expect to see more changes over the coming months.  Thanks to all who have provided feedback and helped with usability testing, and please let us know what you think!


PubMed and GenBank, from the National Center for Biotechnology Information (NCBI), are hugely popular resources for searching and retrieving article abstracts and nucleotide sequence data, respectively.  PubMed indexes the vast majority of the biomedical literature, and deposition of nucleotide sequences in GenBank or one of the other INSDC databases is a near universal requirement for publication in a scientific journal.

Thanks to NCBI’s “LinkOut” feature, it is now easy to find associated data in Dryad from either PubMed or GenBank. For example, this Dryad data package is linked from:

  • the article’s abstract in PubMed. “LinkOut” is at the bottom of the page; expand “+” to see the links to Dryad and other resources.
  • nucleotide data associated with the same publication in GenBank. “LinkOut” is in the right-hand navigation bar.

LinkOut allows the data from an article to be distributed among repositories without compromising its discoverability.

At Dryad, we intend to expand on this feature in a couple of ways. First, we plan to make Dryad content searchable via PubMed and GenBank identifiers, which, because of their wide use, will provide a convenient gateway for other biomedical databases to link out to Dryad. Second, we will be using open web standards to expose relationships between content in Dryad and other repositories, not just NCBI. For example, keen eyes may have noted the relationship of the Dryad data package in the example above to two records in TreeBASE.

To learn more about how Dryad implements NCBI’s LinkOut feature, please see our wiki.


A number of enhancements to the repository have been made in recent months, including the following; the first three were in high demand from users:

  • First, we have modified our submission process to enable the data to be deposited prior to editorial review of the manuscript. Journals that integrate manuscript and data submission at the review stage can now offer their editors and peer reviewers anonymous access to the data in Dryad while the manuscript is in review. This option is currently being used by several of our partner journals (BMJ Open, Molecular Ecology, and Systematic Biology), and is available to any existing or future integrated journal. Note: authors still begin their data deposit process at the journal.
  • Second, when authors submit data associated with previously published articles, they can pull up the article information using the article DOI or its PubMed ID, greatly simplifying the deposition process for legacy data.
  • Third, Dryad now supports versioning of data files. Authors can upload new versions of their files to correct or update the original file. Once authors are logged in to their Dryad account, the My Submissions option appears under My Account in the left side-menu. Prior unfinished and completed submissions are listed; selecting an archived submission allows the author to add a new file. Note that the earlier versions of the file will still be available to users, but the metadata may be modified to reflect the reason for the update. The DOIs will be appended with a number (e.g., “.1”, “.2”) so that each version can be uniquely referenced. By default, users will be shown the most current version of each data file, and they will be notified of the existence of any previous or subsequent versions.
  • Access and download statistics have been displayed for content in the repository since late 2010; Dryad now displays the statistics for an article’s data together on one page so you can see at a glance how many times the page has been viewed and how many times each component data file has been downloaded. Check out this example from Evolutionary Applications.
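The DOI versioning described in the third item can be illustrated with a short Python sketch. It assumes, hypothetically, that the unsuffixed DOI is the original deposit and that a version DOI is the original plus "." and a number; the exact convention Dryad uses may differ. Because some Dryad DOIs already end in digits (e.g. doi:10.5061/dryad.18), the sketch never guesses the base DOI from a string; it only compares candidates against a known base.

```python
from __future__ import annotations


def versioned_doi(base_doi: str, version: int) -> str:
    """Build a version DOI by appending ".N" (hypothetical convention)."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    return f"{base_doi}.{version}"


def version_of(doi: str, base_doi: str) -> int | None:
    """Return 0 for the original DOI, N for base_doi + ".N", else None."""
    if doi == base_doi:
        return 0
    prefix = base_doi + "."
    if doi.startswith(prefix):
        suffix = doi[len(prefix):]
        # Only plain ASCII digits count as a version number.
        if suffix.isascii() and suffix.isdigit():
            return int(suffix)
    return None


def latest_version(base_doi: str, dois: list[str]) -> str:
    """Return the newest version of base_doi found among dois."""
    best: tuple[int, str] | None = None
    for doi in dois:
        v = version_of(doi, base_doi)
        if v is not None and (best is None or v > best[0]):
            best = (v, doi)
    if best is None:
        raise LookupError(f"no versions of {base_doi} found")
    return best[1]
```

Comparing against a known base is what keeps "10.5061/dryad.180" from being mistaken for version 0 of "10.5061/dryad.18".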


Until recently, Mark Hahnel was a PhD student in stem cell biology. Frustrated by seeing how much of his own research output didn’t make it into publications, he set out to do something about it by developing a scientific file sharing platform called FigShare. Mark and FigShare have since been taken under the wing of Digital Science, a Nature Publishing Group spinoff, and a sleek new FigShare was relaunched in January 2012 with many more features and an ambitious scope.

FigShare allows researchers to publish all of their research outputs in seconds in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. By opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient.

Users do not have to pay for access to the content: public data is made available under the terms of a CC0 waiver and other content under CC-BY.  And FigShare is currently providing unlimited public space and 1GB of private storage space for free.

This is a promising solution for getting negative and otherwise unpublished results out into the world (figures, tables, data, etc.) in a way that is discoverable and citable.  Importantly, much of this content would not be appropriate for Dryad, since it is not associated with (and not documented by) an authoritative publication.

There are clearly some challenges to the FigShare model. A big one, shared with many other Open Science experiments that disseminate prior to peer review, is ensuring that there is adequate documentation for users to assess fitness for reuse. Another challenge that Dryad is greatly concerned about is guaranteeing that the content will still be usable, and that there will be the means to host it, ten or twenty years down the road. These are reflections of larger unanswered questions about how the research community can best take advantage of the web for scholarly communication, and how to optimize filtering, curating or preserving such communications. To answer these questions, the world of open data needs many more innovative projects like FigShare.

FigShare’s relaunch prompts us to consider a few strengths of the Dryad model:

  • Dryad works with journals to integrate article and data submission, streamlining the deposit process.
  • Dryad curators review files for technical problems before they are released, and ensure that their metadata enables optimal retrieval.
  • Dryad’s scope is focused on data files associated with published articles in the biosciences (plus software scripts and other files important to the article).
  • Dryad can make data securely available during peer review, at the request of the journal.
  • Dryad is community-led, with priorities and policies shaped by the members of the Dryad Consortium, including scientific societies, publishers, and other stakeholder organizations.
  • Dryad can be accessed programmatically through a sitemap or OAI-PMH interface.
  • Dryad content is searchable and replicated through the DataONE network, and it handshakes with other repositories to coordinate data submission.
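For the OAI-PMH route mentioned in the list, a harvester repeatedly issues `ListIdentifiers` requests and follows resumption tokens until the list is complete. The Python sketch below uses only the standard library. The base URL you pass in is up to you (we do not assume Dryad's actual endpoint), and `oai_dc` is simply the Dublin Core format that OAI-PMH repositories are required to support.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from collections.abc import Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_identifiers(xml_data: str | bytes) -> tuple[list[str], str | None]:
    """Parse one ListIdentifiers response page.

    Returns the identifiers on the page (skipping deleted records) and the
    resumption token, or None when the list is complete.
    """
    root = ET.fromstring(xml_data)
    error = root.find("oai:error", OAI_NS)
    # noRecordsMatch just means an empty result; other errors are failures.
    if error is not None and error.get("code") != "noRecordsMatch":
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    ids: list[str] = []
    for header in root.iterfind("oai:ListIdentifiers/oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue
        ident = header.find("oai:identifier", OAI_NS)
        if ident is not None and ident.text:
            ids.append(ident.text.strip())
    token_el = root.find("oai:ListIdentifiers/oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An empty resumptionToken element marks the last page.
    return ids, token or None


def harvest_identifiers(base_url: str, metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield every record identifier, following resumption tokens."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            ids, token = parse_list_identifiers(resp.read())
        yield from ids
        if token is None:
            break
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```

For example, `list(harvest_identifiers("https://example.org/oai"))` would collect every identifier from a hypothetical endpoint; the parsing step is kept separate so it can be checked without network access.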

For more about Dryad, browse the repository or see Why Should I Choose Dryad for My Data?

A file sharing platform and a data repository are different animals, to be sure; both have a place in a lively open data ecosystem. We wish success to the Digital Science team, and look forward to both working together, and challenging each other, to better meet the needs of the research community. To see what other options are out there for different disciplines and types of data, DataCite provides an updated list of research data repositories.


Our last post celebrated the 1000th data package in Dryad. This week, with the release of two data packages associated with articles in Ecological Monographs, we celebrate another important milestone, our 100th journal.

We believe this validates one of the premises on which Dryad was founded, that a non-specialist data repository can serve as shared infrastructure for a large and diverse set of journals. As a group, they have little in common, serving authors and readers from many different research communities, nationalities, types of institutional affiliation, etc., and working with many different kinds of data. Some are owned by societies, some by commercial publishers, some by not-for-profits. Some are Open Access, many are not. Some have specialized disciplinary or taxonomic scope (e.g. journals that publish on birds, herps, insects, mammals, plants, protists, viruses, etc.) while some publish findings from all corners of science (Nature, PNAS, Science).

Interestingly, this set of 100 is roughly five times the number of journals that have integrated manuscript submission with Dryad in order to facilitate authors’ data archiving.  While the integrated journals still account for the majority of new data submissions, we are pleased to continue receiving data volunteered by authors publishing in outlets new to Dryad.

The journals that have integrated their manuscript processing with Dryad to date are mostly, though not exclusively, from the fields of evolutionary biology and ecology:

  • The American Naturalist
  • Biological Journal of the Linnean Society
  • BMJ Open (an important first step in that it is our first integrated biomedical journal)
  • Ecological Monographs
  • Evolution
  • Evolutionary Applications
  • Heredity
  • Journal of Evolutionary Biology
  • Journal of Heredity
  • Molecular Ecology and Molecular Ecology Resources
  • Paleobiology
  • Pensoft Publishers – 8 different journals
  • Systematic Biology

But Dryad’s broadening disciplinary coverage is best illustrated by listing some of the journals with content in the repository that have not (at least not yet) implemented integrated submission:

  • Animal Behaviour
  • Bioinformatics
  • Biotropica
  • Conservation Genetics
  • Environmental Microbiology
  • Evolution and Development
  • Frontiers in Psychology
  • Genome Biology and Evolution
  • Human Genomics
  • Integrative and Comparative Biology
  • Journal of Biogeography
  • Journal of Fish and Wildlife Management
  • The Journal of Parasitology
  • Limnology and Oceanography
  • The Plant Cell
  • PLoS Pathogens
  • Symbiosis
  • Toxicon

And we are particularly pleased by the irony of hosting data from Genesis ;)

If you are an editor, publisher, or just a passionate reader of a journal that currently has content in Dryad (you can find out for yourself here), and you would like to talk about how manuscript submission integration could strengthen the service that Dryad provides to your journal, then please contact us.


1E+3

Fig 1. Helen of Troy, detail from an Attic red-figure krater, c. 450–440 BC

It is said that a picture is worth a thousand words and that Helen of Troy (Fig 1) had a face that launched a thousand ships. Why is the number 1000 significant to those of us at Dryad today, especially since its place in literature is ultimately an accident of our decimal number system [1]?

The reason is that Dryad released its 1000th data package.  The lucky submission is: Hager R, Cheverud JM, Wolf JB (2011) Data from: Genotype dependent responses to levels of sibling competition over maternal resources in mice. doi:10.5061/dryad.8qq3p0d8  [2]. This (arbitrary, but see [3]) milestone has put us in a reflective mood, and so here we take the opportunity to consider what it means.

First, it encourages us that Dryad’s multipronged approach to making data available for reuse (raising awareness of the issues, coordinating data archiving policy across journals, providing a user-friendly submission interface, paying attention to the incentives of researchers) is bearing fruit.  As a result of this strategy, the rate of submissions continues to grow; over 60% of submissions are from the past nine months alone.  Since a picture is worth a thousand words, see Fig 2.

Fig 2. Data packages submitted to Dryad through September 2011

We are mindful that it will take some time before we can measure the impact of making these data available for reuse, but the frequency with which they are being downloaded is an encouraging sign. We will discuss those results in a separate post.

What else can we learn from these first 1000 submissions? One lesson is the importance of making data submission integral to publication. While the corresponding articles appear in 88 different journals, about three quarters of the submissions come from the first nine journals that worked to integrate manuscript and data submission with Dryad [4]. Journal policy matters, and so does the enthusiasm with which journals implement it.

As far as disciplinary diversity goes, the first 1000 submissions are dominated by journals in evolutionary biology and ecology.  Dryad’s first biomedical journal partner, BMJ Open, was integrated within the past few months, and as a result of many other new journal partnerships being developed, we expect submissions to the repository to represent a much broader array of basic and applied biosciences in the near future.

Interestingly, most of the deposits are relatively small in size. Counting all files in a data package together, almost 80% of data packages are less than one megabyte.  Furthermore, the majority of data packages contain only one data file and the mean is a little less than two and a half.  As one might expect, many of the files are spreadsheets or in tabular text format.  Thus, the files are rich in information but not so difficult to transfer or store.
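The summary figures in this paragraph (share of packages under one megabyte, share with a single file, mean files per package) are simple to compute. Below is a minimal Python sketch in which each package is given as a list of file sizes in bytes. The input shape is hypothetical, and we take one megabyte to be 1,000,000 bytes, which may not match the definition behind the figures reported here.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

ONE_MB = 1_000_000  # assumption: decimal megabyte


@dataclass
class PackageStats:
    share_under_1mb: float
    share_single_file: float
    mean_files: float


def summarize(packages: list[list[int]]) -> PackageStats:
    """Summarize data packages given as lists of per-file sizes in bytes.

    A package's size is the sum of its files, i.e. all files in a
    package are counted together.
    """
    if not packages:
        raise ValueError("need at least one package")
    n = len(packages)
    under = sum(1 for files in packages if sum(files) < ONE_MB)
    single = sum(1 for files in packages if len(files) == 1)
    return PackageStats(
        share_under_1mb=under / n,
        share_single_file=single / n,
        mean_files=mean(len(files) for files in packages),
    )
```

On the toy input `[[120_000], [45_000, 300_000], [2_500_000], [80_000], [10_000, 20_000, 30_000]]` this gives 0.8, 0.6 and 1.6; those numbers are illustrative only and are not Dryad's data.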

We are pleasantly surprised to report that most authors, most of the time, see the value in having their data released at the same time as the article is published. Authors are making their data available immediately upon publication, or earlier, for over 90% of data files. In nearly all cases where files are put under embargo, authors choose to release them one year post-publication rather than requesting a longer embargo from the journal.

Thomson Reuters indexes more than half a million abstracts annually in BIOSIS.  A difficult-to-estimate, but undoubtedly substantial, fraction of this literature reports on data that cannot be, or is not, archived in a specialized public data repository.  This helps put Dryad’s 1000 data packages in perspective.   As a discipline, we still have a long way to go to preserve and make available for reuse all the “published” data that has no home.  But every data package that is submitted to Dryad is a little victory for the transparency and robustness of science.

So here’s to the first thousand.  May they have plenty of company in the coming years.

Footnotes:

  1. Things might have turned out very differently, judging by the presence of early vertebrate fossils with more than five digits (see http://en.wikipedia.org/wiki/Polydactyly_in_early_tetrapods).
  2. To celebrate, we are sending a Dryad-logo coffee mug to Dr. Reinmar Hager, who submitted the 1000th data package.
  3. Random cool fact about the number 1000.  It is “the smallest number that generates three primes in the fastest way possible by concatenation of decremented numbers (1000999, 1000999998997, and 1000999998997996995994993 are prime) … [excluding] the number itself” (see http://primes.utm.edu/curios/page.php/1000.html).
  4. This includes a collection of legacy data packages from the Systematic Biology archives that was submitted en masse to Dryad in mid-2009.
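The claim in footnote 3 is easy to check by computer. The minimal Python sketch below uses a Miller-Rabin test with the first thirteen prime bases, which is our choice of technique and not something taken from the Prime Curios entry; with those bases the test is known to be deterministic for numbers below about 3.3 × 10^24, which covers all three numbers in the footnote. Concatenations ending in an even number or in 995 are trivially composite, which is presumably why 2, 4 and 8 terms is the "fastest way possible".

```python
def is_prime(n: int) -> bool:
    """Miller-Rabin with the first 13 prime bases.

    Deterministic for n < 3,317,044,064,679,887,385,961,981;
    above that bound a True result means "probable prime".
    """
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            # a is a witness that n is composite.
            return False
    return True


def concat_down(start: int, count: int) -> int:
    """Concatenate start, start-1, ..., start-count+1 as decimal digits."""
    return int("".join(str(start - i) for i in range(count)))
```

Evaluating `[k for k in range(1, 9) if is_prime(concat_down(1000, k))]` should then return `[2, 4, 8]`, matching the three primes listed in the footnote.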


Why does Dryad use CC0?

Early in the process of depositing data to the Dryad repository,  authors are asked to consent to the explicit release of their data into the public domain under the terms of a Creative Commons Zero (CC0) waiver. We are frequently asked why Dryad uses CC0 rather than a license such as CC-BY, and it is important for all users to understand the rationale for this, as well as its implications.

Obviously, one of the primary purposes of archiving data in Dryad is to enable its reuse by others.  Having clear and open terms of reuse helps realize that goal.  (Along with having well-organized data, good documentation, persistent file-formats, etc.)

CC0 was crafted specifically to reduce any legal and technical impediments, be they intentional or unintentional, to the reuse of data. In most cases, CC0 does not actually affect the legal status of the data, since facts in and of themselves are not eligible for copyright in most countries (e.g. see this commentary from Bitlaw regarding U.S. copyright law). But where they are, CC0 waives copyright and related rights to the extent permitted by law.

Importantly, CC0 does not exempt those who reuse the data from following community norms for scholarly communication. It does not exempt researchers from the obligation to reuse the data in a way that is mindful of its limitations, nor from the obligation to cite the original data authors. However, like other scientific norms, these expectations are best articulated and enforced by the community itself through processes such as peer review.

In fact, by removing unenforceable legal barriers, CC0 facilitates the discovery, reuse, and citation of the data.

“Community norms can be a much more effective way of encouraging positive behaviour, such as citation, than applying licenses. A well functioning community supports its members in their application of norms, whereas licences can only be enforced through court action and thus invite people to ignore them when they are confident that this is unlikely.” (Panton Principles FAQ)

Dryad’s policy ultimately follows the recommendations of Science Commons, which discourage researchers from presuming copyright and using licenses that include “attribution” and “share-alike” conditions for scientific data.

Both of these conditions can put legitimate users in awkward positions.  First, specifying how “attribution” must be carried out may put a user at odds with accepted citation practice:

“… when you federate a query from 50,000 databases (not now, perhaps, but definitely within the 70-year duration of copyright!) will you be liable to a lawsuit if you don’t formally attribute all 50,000 owners?” (Science Commons Database Protocol FAQ)

While “share-alike” conditions create their own unnecessary legal tangle:

“ ‘share-alike’ licenses typically impose the condition that some or all derivative products be identically licensed. Such conditions have been known to create significant “license compatibility” problems under existing license schemes that employ them. In the context of data, license compatibility problems will likely create significant barriers for data integration and reuse for both providers and users of data.” (Science Commons Database Protocol FAQ)

Thus,

“… given the potential for significantly negative unintended consequences of using copyright, the size of the public domain, and the power of norms inside science, we believe that copyright licenses and contractual restrictions are simply the wrong tool [for data], even if those licenses and contracts are used with the best of intentions.” (Science Commons Database Protocol FAQ)

Furthermore, Dryad’s use of CC0 to make the terms of reuse explicit has some important advantages:

  • interoperability: Since CC0 is both human and machine-readable, other people and indexing services will automatically be able to determine the terms of use.
  • universality: CC0 is a single mechanism that is both global and universal, covering all data and all countries.  It is also widely recognized.
  • simplicity: there is no need for humans to make, and respond to, individual data requests, and no need for click-through agreements.  This allows more scientists to spend their time doing science.
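To illustrate the interoperability point, here is a minimal Python sketch of how an indexing service might recognize CC0 automatically from a record's rights statement. The record layout (a `rights` field holding one or more URIs) is a hypothetical assumption; the URI checked is the standard CC0 1.0 address, http://creativecommons.org/publicdomain/zero/1.0/.

```python
from __future__ import annotations

from urllib.parse import urlparse

CC0_PATH = "/publicdomain/zero/1.0"


def is_cc0(rights_uri: str) -> bool:
    """True if a rights URI points at the CC0 1.0 public domain dedication."""
    parsed = urlparse(rights_uri.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc.lower() not in ("creativecommons.org", "www.creativecommons.org"):
        return False
    path = parsed.path.rstrip("/")
    # Accept both the human-readable deed and the legal code page.
    return path in (CC0_PATH, CC0_PATH + "/legalcode")


def reuse_terms(record: dict) -> str:
    """Classify a (hypothetical) metadata record's terms of reuse."""
    rights = record.get("rights", [])
    if isinstance(rights, str):
        rights = [rights]
    if any(is_cc0(r) for r in rights):
        return "public domain (CC0)"
    return "unknown: check the record by hand"
```

Because the check needs only a URI comparison, no human has to read a click-through agreement before the terms are known.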

Note that if you have data that, due to pre-existing agreements, cannot be released under the terms of CC0, you should not deposit that data in Dryad. Journals that require data archiving in Dryad as a condition of publication can make exceptions for such special cases.

Footnote:  Interestingly, the repository had originally applied CC-BY to all its contents.  The very deliberate decision to use CC0 instead, made by Dryad’s Board in May of 2009, required us to obtain permission from all the early contributors to change the terms of reuse of their content.   And today, there are still a few items in Dryad under CC-BY for which permission was not granted.


Behind a scientific finding, in addition to unique data, there is often unique software. If Dryad archives data in part to allow others to validate the findings reported in the literature, then should we not also enable researchers to archive the software that was used to process and analyze those data (and, in the case of simulations, to create them)?

Some users have already deposited software source code alongside their data (e.g. doi:10.5061/dryad.8384, doi:10.5061/dryad.18) [1]. If users are willing and able to release their code under a CC-Zero waiver [2], then there is nothing stopping this practice. In fact, Creative Commons and the Free Software Foundation have recently stated that CC-Zero is appropriate for release of software to the public domain [3].

Yet a number of journal partners and users have requested that Dryad provide more, or different, options for software, and that authors not be required to waive their legal rights with CC-Zero. Since software is clearly a creative work, source code is unambiguously subject to copyright. Enabling a greater range of licensing options could open the door to more authors archiving software that is integral to their paper, and this would further Dryad's mission of enabling scientists to validate and build upon previous work. So, how should we do that?

One important consideration is that we aim to make the submission process as easy as possible for users. This would be compromised by presenting a confusing array of licensing options, and by having those options differ between types of files.

The principal desiderata of a license for deposited software are more or less the same as for data: freedom to reuse, modify (analogous to "recombine" for data), and redistribute (in original or modified form), with no more than attribution expected or required. It turns out that these are also the principles common to all licenses approved by the Open Source Initiative, or OSI [4].

So, could we just pick one of the minimally restrictive OSI-approved licenses (since we want to facilitate reuse rather than hamper it), and require release of software under those terms? We are currently of the opinion that the answer is “no”, for a couple of reasons:

(1) Some, though not all, software will already be licensed. Asking a user to choose a different one would clearly be a burden, since changing a license requires express consent from all copyright holders, including possibly the employer or funder.

(2) If the software includes third-party code to which a 'share-alike' license has been assigned (e.g. the GNU General Public License, or GPL [5]), then the user is required to release the code under equivalent licensing terms. Unlike with data, it would be highly unusual to combine software source code from many different sources, and so this does not pose an insurmountable barrier to archiving and reuse for scientific purposes.

Given the above, our current thinking is that Dryad should enable users to select any OSI-approved license they deem appropriate. However, when no prior license has been assigned to any part of their software, we also wish to strongly guide users to choose either a non-share-alike OSI license or a CC-Zero waiver. It is currently unclear whether dedicating software to the public domain with CC-Zero would be as valuable as it is for data [6]. We'd welcome your thoughts on that.
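To make the proposed guidance concrete, here is a hypothetical sketch of the decision a deposit form might apply. The license table, the SPDX-style identifiers, and the `license_options` function are illustrative assumptions, not part of Dryad's system, and a real form would draw on the full OSI list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    spdx_id: str
    osi_approved: bool
    share_alike: bool


# Hypothetical, abbreviated license table for illustration only.
LICENSES = {
    lic.spdx_id: lic
    for lic in (
        License("MIT", osi_approved=True, share_alike=False),
        License("BSD-3-Clause", osi_approved=True, share_alike=False),
        License("Apache-2.0", osi_approved=True, share_alike=False),
        License("GPL-3.0-only", osi_approved=True, share_alike=True),
    )
}
CC0 = "CC0-1.0"  # a public-domain waiver, not an OSI-approved license


def license_options(existing_ids):
    """Return (offered, recommended) sets of identifiers for a software deposit.

    existing_ids: identifiers of licenses already applied to any part of the code.
    """
    osi = {i for i, lic in LICENSES.items() if lic.osi_approved}
    if existing_ids:
        # Prior license present: offer any OSI-approved license, but recommend
        # keeping what is already there, since relicensing needs every
        # copyright holder's consent.
        return osi, set(existing_ids) & osi
    # No prior license: also offer CC0, and steer toward non-share-alike
    # OSI licenses or the CC0 waiver.
    permissive = {i for i, lic in LICENSES.items() if lic.osi_approved and not lic.share_alike}
    return osi | {CC0}, permissive | {CC0}


offered, recommended = license_options(set())
print(sorted(recommended))  # ['Apache-2.0', 'BSD-3-Clause', 'CC0-1.0', 'MIT']
```

Recommending only the existing license when one is present reflects point (1) above: changing a license requires express consent from all copyright holders.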

There are some other considerations on our plate, as well:

  • We want to be careful to avoid steering users away from using a public source code repository when that is more appropriate [7]. Is it better for Dryad to host code snapshots, or to direct users to specific versions of software in a public code repository?
  • Some users bundle software and data together in tarballs or zip archives. Since we cannot easily assign different terms to the data and the software within such a combined file, we might need to ask users to separate these components, which would add to their burden.
  • In addition to software, publishers host other content in Supplemental Materials that some of our partner journals would like Dryad to host instead. To the extent that some of this content is neither data nor software, should we be recognizing a third category of intellectual property, to which a license such as CC-BY [8] would be assigned?

If you have opinions or ideas, we would like to encourage you to share them with us as public comments on this blog. What’s the best way to accommodate software (and other non-data material) within Dryad?

Notes

[1] Some software source code in Dryad is already available under grandfathered license terms, such as in doi:10.5061/dryad.18.

[2] Dryad currently requires users to assign CC-Zero to all archived files. This waives all copyright and related rights in the data (to the extent legally possible in an author's jurisdiction), effectively dedicating the data to the public domain. The use of CC-Zero is predicated on most data being "facts", and facts in most jurisdictions cannot be copyrighted, although this is not universally true (e.g. photographs). Note that Dryad has a policy that the original article and the data package are to be cited when the data are reused, but we feel that this is most appropriately enforced through scholarly practice, not through a license.

[3] According to Creative Commons' FAQ, CC-Zero "is suitable for dedicating your copyright and related rights in computer software to the public domain, to the fullest extent possible under law. Unlike CC licenses, which should not be used for software, CC0 is compatible with many software licenses, including the GPL."

[4] http://www.opensource.org/

[5] http://www.gnu.org/licenses/gpl.html

[6] For the motivation behind the recommended use of CC-Zero for data, see the Science Commons Protocol for Implementing Open Access Data

[7] Public open source code repositories include generic ones, such as SourceForge, as well as those specific to particular types of code, such as R-Forge for R and CPAN for Perl. For more about best practices in scientific software development, see Baxter SM, Day SW, Fetrow JS, Reisinger SJ (2006) Scientific Software Development Is Not an Oxymoron. PLoS Comput Biol 2(9): e87. doi:10.1371/journal.pcbi.0020087

[8] http://creativecommons.org/licenses/by/3.0

[9] Many thanks to H. Lapp for starting this post. I (T. Vision) take responsibility for the opinions expressed here, as well as any sins of omission or commission.
