Request for input: archiving and licensing software

Behind a scientific finding, in addition to unique data, there is often unique software. If Dryad archives data in part to allow others to validate the findings reported in the literature, then should we not also enable researchers to archive the software that was used to process, analyze, and, in the case of simulations, create those data?

Some users have already deposited software source code alongside their data (e.g. doi:10.5061/dryad.8384, doi:10.5061/dryad.18) [1]. If users are willing and able to release their code under a CC-Zero waiver [2], then there is nothing stopping this practice. In fact, Creative Commons and the Free Software Foundation have recently stated that CC-Zero is appropriate for release of software to the public domain [3].

Yet a number of journal partners and users have requested that Dryad provide more, or different, options for software, and that authors not be required to waive legal rights with CC-Zero. Since software is clearly a creative work, source code unambiguously carries copyrightable intellectual property. Enabling a greater range of licensing options could open the door to more authors archiving software that is integral to their paper, and this would further Dryad’s mission of enabling scientists to validate and build upon previous work. So, how should we do that?

One important consideration is that we aim to make the submission process as easy as possible for users. This would be compromised by presenting a confusing array of licensing options, and having those differ between types of files.

The principal desiderata of a license for deposited software are more or less the same as for data: freedom to reuse, modify (analogous to “recombine” for data), and redistribute (in original or modified form), with no more than attribution expected or required. It turns out that these are also the principles common to all licenses approved by the Open Source Initiative, or OSI [4].

So, could we just pick one of the minimally restrictive OSI-approved licenses (since we want to facilitate reuse rather than hamper it), and require release of software under those terms? We are currently of the opinion that the answer is “no”, for a couple of reasons:

(1) Some, though not all, software will already be licensed. Asking a user to choose a different one would clearly be a burden, since changing a license requires express consent from all copyright holders, including possibly the employer or funder.

(2) If the software includes third-party code to which a ‘share-alike’ license has been assigned (e.g. the GNU General Public License, or GPL [5]), then the user is required to release the code under equivalent licensing terms. Unlike for data, it would be highly unusual to combine software source code from many different sources, and so this does not pose an insurmountable barrier to archiving and reuse for scientific purposes.

Given the above, our current thinking is that Dryad should enable users to select any OSI-approved license they deem appropriate. However, we also wish to strongly guide users, when there is no prior license assigned to any part of their software, to choose either a non-share-alike OSI license or a CC-Zero waiver. It is currently unclear whether dedicating software to the public domain with CC-Zero would be of as much value as it is for data [6]. We’d welcome your thoughts on that.
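To make the proposal concrete, here is a minimal sketch of how such guidance might work at submission time. It is only an illustration of the reasoning above, not a description of anything Dryad has implemented: the `Deposit` class, the `suggest_licenses` function, and the short license lists are all hypothetical, and the identifiers are SPDX-style names chosen for the example (the OSI maintains the authoritative list of approved licenses).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative subsets only (assumption): the OSI maintains the full approved list.
PERMISSIVE_OSI = {"MIT", "BSD-3-Clause", "Apache-2.0"}
SHARE_ALIKE_OSI = {"GPL-2.0-only", "GPL-3.0-only"}
OSI_APPROVED = PERMISSIVE_OSI | SHARE_ALIKE_OSI
CC_ZERO = "CC0-1.0"


@dataclass
class Deposit:
    """Hypothetical description of a software deposit."""
    existing_license: Optional[str] = None      # license the authors already chose, if any
    third_party_licenses: Tuple[str, ...] = ()  # licenses of any bundled third-party code


def suggest_licenses(deposit: Deposit) -> List[str]:
    """Return the license choices to present, most strongly suggested first."""
    # Reason (1): software that is already licensed keeps its license.
    if deposit.existing_license is not None:
        if deposit.existing_license in OSI_APPROVED or deposit.existing_license == CC_ZERO:
            return [deposit.existing_license]
        raise ValueError("existing license is neither OSI-approved nor CC-Zero")

    # Reason (2): share-alike third-party code requires equivalent terms.
    inherited = sorted(set(deposit.third_party_licenses) & SHARE_ALIKE_OSI)
    if inherited:
        return inherited

    # Some part carries a prior (non-share-alike) license: offer any
    # OSI-approved license without steering the user.
    if deposit.third_party_licenses:
        return sorted(OSI_APPROVED)

    # No prior license on any part: guide the user toward CC-Zero
    # or a non-share-alike OSI license.
    return [CC_ZERO] + sorted(PERMISSIVE_OSI)
```

For example, `suggest_licenses(Deposit())` would offer CC-Zero followed by the non-share-alike licenses, while a deposit that bundles GPL code would be offered only the GPL. Whether CC-Zero should come first in that list is exactly the open question raised above.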

There are some other considerations on our plate, as well:

  • We want to be careful to avoid steering users away from using a public source code repository when that is more appropriate [7]. Is it better for Dryad to host code snapshots, or to direct users to specific versions of software in a public code repository?
  • Some users bundle software and data together in tarballs or zip archives. Since we cannot easily assign different terms to the data and software within such a combined file, it could increase the burden on users to separate these components out.
  • In addition to software, there is other content that publishers host in Supplemental Materials that some of our partner journals would like Dryad to host, instead. To the extent that some of this content is neither data nor software, should we be recognizing a third category of intellectual property, to which a license such as CC-BY [8] would be assigned?

If you have opinions or ideas, we would like to encourage you to share them with us as public comments on this blog. What’s the best way to accommodate software (and other non-data material) within Dryad?

Notes

[1] Some software source code in Dryad is already available under grandfathered license terms, such as in doi:10.5061/dryad.18.

[2] Dryad currently requires users to assign CC-Zero to all archived files. This waives all copyright and related rights in the data (to the extent legally possible in an author’s jurisdiction), effectively dedicating the data to the public domain. The use of CC-Zero is predicated on most data being “facts”, and facts in most jurisdictions cannot be copyrighted, although this is not universally true (e.g. photographs). Note that Dryad has a policy that the original article and the data package are to be cited when the data are reused, but we feel that this is most appropriately enforced through scholarly practice, not through a license.

[3] According to the Creative Commons FAQ, CC-Zero “is suitable for dedicating your copyright and related rights in computer software to the public domain, to the fullest extent possible under law. Unlike CC licenses, which should not be used for software, CC0 is compatible with many software licenses, including the GPL”.

[4] http://www.opensource.org/

[5] http://www.gnu.org/licenses/gpl.html

[6] For the motivation behind the recommended use of CC-Zero for data, see the Science Commons Protocol for Implementing Open Access Data.

[7] Public open source code repositories include generic ones, such as Sourceforge, as well as those specific to particular types of code, such as R-forge for R, and CPAN for Perl. For more about best practices in scientific software development, see Baxter SM, Day SW, Fetrow JS, Reisinger SJ (2006) Scientific Software Development Is Not an Oxymoron. PLoS Comput Biol 2(9): e87. doi:10.1371/journal.pcbi.0020087

[8] http://creativecommons.org/licenses/by/3.0

[9] Many thanks to H. Lapp for starting this post. I (T. Vision) take responsibility for the opinions expressed here, as well as any sins of omission or commission.

9 thoughts on “Request for input: archiving and licensing software”

  1. I think it would be best for DRYAD to stick to CC0. Allowing a wider variation of licenses at DRYAD could lead to the kind of confusion that you are striving to avoid, e.g. of the licensing of data and code being mixed up.

    Software in general is probably better hosted in dedicated software repositories, provided they meet some basic long-term preservation criteria. These also typically allow for a wide variety of licenses. The DRYAD entry and/or the paper describing it could then link there.

    If the authors want a DOI for their code, they can choose between putting it under CC0 at DRYAD and submitting it to dedicated journals.

    As for supplementary materials, I think CC0 would be an appropriate default, but allowing CC-BY is probably necessary as long as journals have not clarified their policies in this regard.

    As a side note, I think it would help if you added a preview option and/or a list of allowed HTML tags.

  2. Is there some way for Dryad to connect with a site that hosts code, ideally Github? I ask because they have a great system for writing code collaboratively, and lots of people who are developing software use it. Would this only work if the code was for open source software, like R, Python, etc. (instead of e.g., SAS)?

    • I was also thinking about linking to software hosting sites. Github is certainly popular and usable, but it is not the only choice. I think the best solution is for Dryad to host a snapshot of the code in archive form, as is typical for software source code distribution (as opposed to developer access), along with links to the developer access. This avoids people having to use a particular version control system or a particular site. If I were to use git for my projects, I would use gitorious or Savannah.

  3. No answers here but more questions [I have certainly not read all documents, and therefore the below text may contain misconceptions]:

    CC0 is rather restrictive, and under this license I surely would not put any of my code into DRYAD. I would prefer a more liberal system that allows assigning any of the open source licenses that do not restrict the creator of the software so strongly [Note 3, from the CC0 FAQ, mentions that CC0 is compatible with the GPL; I fail to see how that could work].

    If I create software and put it under CC0, but keep developing and refining the code, are these extensions now under my own copyright or under CC0? How different does a derivative work have to be to no longer fall under the license? The GPL is rather clear on this, because it “poisons” all derivative work, but allows the original creator the right to assign different licenses, in contrast to CC0, which takes that right away.

    Daniel mentions dedicated software repositories that allow a variety of open source licenses. That sounds superficially very nice, but I always wondered about the long-term perspective. Will github/sourceforge and the like survive the next 20 years? For example, Sourceforge is owned by geeknet, a publicly traded company. If they go under, then what? Would we be OK if the Library of Congress were run by a private company?

    In addition, sourceforge/github represent active projects; it would be nice to have a repository, like DRYAD, that can hold a particular snapshot of a software project.

    Peter Beerli

    • Peter, thanks for bringing up the compatibility of CC0 and the GPL. I also am confused. Your code remains under your licensing terms. When you release under a particular license, only the version that carries the license is released under those terms. What this means is that when you release something into the public domain, someone can proprietarize *that* version, while you go on developing your code under Apache or GPL. The problem with this is that the proprietarizers can then develop modifications of your code, possibly patent them, and then sue you for making the same modifications in your GPL’d version. BIG BIG problem.

      This raises another point that every file must contain a copyright notice, and replacing copyright notices for the GPL with those for CC0 or another license would be rather laborious.

  4. I’m glad Dryad is paying attention to this critical aspect, as I was hoping to bring it up at Evolution 2011. The GPL is not “viral”, contrary to the beliefs of many. A single GPL’d component of a set of software does not necessitate the entire collection being made available under the GPL: the presence of the GPL cannot change the licensing terms of another piece of software. Most GNU/Linux distributions include both the Apache Web Server and the GNU Compiler Collection, and somehow Apache is still under the Apache license. You need to consult GPL experts on this issue, such as Bradley Kuhn of the Software Freedom Conservancy, the Free Software Foundation and the Software Freedom Law Center. I also don’t understand the statement that it’s “unusual to combine software from many different sources.” I do that all the time. It’s quite usual.

    My solution is that authors should absolutely be allowed to keep their own licensing terms upon putting code in Dryad. If they haven’t thought of licensing terms, they should be encouraged to use the GPLv3 or the Apache license (which are compatible with each other).

    • I’ll add that Dryad may be taking on too much work in trying to make things easy for users. That burden is really on the authors. If authors want to make things easy or hard for users to figure out (distinguishing parts once they unpack a tarball, for instance), why should Dryad intercede on behalf of users? Those users can always contact the authors. I don’t think that poorly understandable file names really constitute “non-transparency.”

  5. I agree that depositors should be required to divide content, and not have this done by Dryad. This would be a huge amount of work for Dryad, but very little for depositors.

  6. Most software I see in the evolution domain has a GPL license — probably due to inertia, but it’s there, nonetheless. Some people might be able to figure out whether GPL code can go under a CC0 license, but most would just say, “that’s not my license, I can’t do it.” So I think allowing code, and just code, to be deposited under an OSI license makes sense.

    As far as relying on software repositories, I think this is risky. What are the long term guarantees that they will stay up? Even if, say, sourceforge as a whole remains up, there’s a chance that individual projects can die. I know both sourceforge and Google Code allow project names to be taken over by new projects [I’ve actually done this, in one case]: they typically require evidence that the other project is dead and has little product, but that’s a decision up to staff at each site. A project that consists of a few scripts done by a grad student who moves on to other things might have important code for our community but might not be seen as critically important by a sourceforge admin.

    I also think it’s important to save the state of the code at the time of publication, rather than pointing to its current version. Software has bugs — it’s critical to be able to see if a paper’s conclusions come from a bug in the software, and this is harder to do if we’re just pointed to the current version (unless you start doing other add ons like requiring storing of commit numbers or other detailed info about the version of the software used in the paper).

    A final point: software is a key ingredient to many studies, but it is ephemeral (for example, Mark Pagel’s classic Continuous program is no longer available for download from his website, though its functions are in another program). It is thus important that software find a home in Dryad.
