Posts Tagged ‘open data’

We present a guest post from researcher Falk Lüsebrink highlighting the benefits of data sharing. Falk is currently working on his PhD in the Department of Biomedical Magnetic Resonance at the Otto-von-Guericke University in Magdeburg, Germany. Here, he talks about his experience of sharing early MRI data and the unexpected impact that it is having on the research community.

Early release of data

The first time I faced a decision about publishing my own data was while writing a grant proposal. One of our proposed objectives was to acquire ultrahigh resolution brain images in vivo, making use of an innovative development: a combination of an MR scanner with ultrahigh field strength and a motion correction setup to remediate subject motion during data acquisition. While waiting for the funding decision, I simply could not resist acquiring a first dataset. We scanned a highly experienced subject for several hours, allowing us to acquire in vivo images of the brain with a resolution far beyond anything achieved thus far.


MRI data showing the cerebellum in vivo at (a) neuroscientific standard resolution of 1 mm, (b) our highest achieved resolution of 250 µm, and (c) state-of-the-art 500 µm resolution.

When our colleagues saw the initial results, they encouraged us to share the data as soon as possible. Through Scientific Data and Dryad, we were able to do just that. The combination of a peer-reviewed open access journal and an open access digital repository for the data was perfect for presenting our initial results.

17,000 downloads and more

‘Sharing the wealth’ seems to have been the right decision; in the three months since we published our data, there has been an enormous amount of activity.

A distinct need for data re-use

MRI studies are highly interdisciplinary, opening up numerous opportunities for sharing and re-using data. For example, our data might be used to build MR brain atlases and illustrate brain structures in much greater detail, or even for the first time. This could advance our understanding of brain functions. Algorithms used to quantify brain structures needed in the research of neurodegenerative disorders could be enhanced, increasing accuracy and reproducibility. Furthermore, by making available raw signals measured by the MR scanner, image reconstruction methods could be used to refine image quality or reduce the time it takes to collect the data.

There are also opportunities beyond those that our particular dataset offers. A recently emerging trend in MRI comes from the field of machine learning. Neural networks are being built to perform and potentially improve all kinds of tasks, from image reconstruction to image processing and even diagnostics. To train such networks, huge amounts of data are necessary; these data could come from repositories open to the public. Such re-use of MRI data by researchers in other disciplines is having a strong impact on the advancement of science. By publicly sharing our data, we are allowing others to pursue new and exciting directions.

Download the data for yourself and see what you can do with it. In the meantime, I am still eagerly awaiting the acceptance of the grant application . . . but that’s a different story.

The data: http://dx.doi.org/10.5061/dryad.38s74

The article: http://dx.doi.org/10.1038/sdata.2017.32

— Falk Lüsebrink


Until recently, Mark Hahnel was a PhD student in stem cell biology. Frustrated by seeing how much of his own research output didn’t make it to publications, he endeavored to do something about it by developing a scientific file sharing platform called FigShare. Recently, Mark and FigShare were taken under the wing of Digital Science, a Nature Publishing Group spinoff, and a sleek new FigShare was relaunched in January 2012 with many more features and an ambitious scope.

FigShare allows researchers to publish all of their research outputs in seconds in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. By opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient.

Users do not have to pay for access to the content: public data is made available under the terms of a CC0 waiver and other content under CC-BY.  And FigShare is currently providing unlimited public space and 1GB of private storage space for free.

This is a promising solution for getting negative and otherwise unpublished results out into the world (figures, tables, data, etc.) in a way that is discoverable and citable.  Importantly, much of this content would not be appropriate for Dryad, since it is not associated with (and not documented by) an authoritative publication.

There are clearly some challenges to the FigShare model. A big one, shared with many other Open Science experiments that disseminate prior to peer review, is ensuring that there is adequate documentation for users to assess fitness for reuse. Another challenge, one that Dryad is greatly concerned about, is guaranteeing that the content will still be usable, and that there will be the means to host it, ten or twenty years down the road. These are reflections of larger unanswered questions about how the research community can best take advantage of the web for scholarly communication, and how to optimize filtering, curating or preserving such communications. To answer these questions, the world of open data needs many more innovative projects like FigShare.

FigShare’s relaunch is a good occasion to highlight a few strengths of the Dryad model:

  • Dryad works with journals to integrate article and data submission, streamlining the deposit process.
  • Dryad curators review files for technical problems before they are released, and ensure that their metadata enables optimal retrieval.
  • Dryad’s scope is focused on data files associated with published articles in the biosciences (plus software scripts and other files important to the article).
  • Dryad can make data securely available during peer review, at the request of the journal.
  • Dryad is community-led, with priorities and policies shaped by the members of the Dryad Consortium, including scientific societies, publishers, and other stakeholder organizations.
  • Dryad can be accessed programmatically through a sitemap or OAI-PMH interface.
  • Dryad content is searchable and replicated through the DataONE network, and it handshakes with other repositories to coordinate data submission.
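
As a rough illustration of the programmatic access mentioned in the list above, here is a minimal Python sketch of an OAI-PMH harvest. The `ListRecords` verb, the `metadataPrefix` and `from` arguments, and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the function names, and the sample response are hypothetical and only for illustration, since this post does not give Dryad's actual endpoint.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the real Dryad OAI-PMH base URL is not given in this post.
BASE_URL = "https://repository.example.org/oai/request"

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL using standard OAI-PMH arguments."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    return records


# Made-up response in the shape the protocol prescribes, so the parser
# can be tried without any network access.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:record-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2012-01-01"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL, then follow the `resumptionToken` that the protocol uses for paging through large result sets; that step is left out here to keep the sketch short.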

For more about Dryad, browse the repository or see Why Should I Choose Dryad for My Data?

A file sharing platform and a data repository are different animals, to be sure; both have a place in a lively open data ecosystem. We wish success to the Digital Science team, and look forward to both working together, and challenging each other, to better meet the needs of the research community. To see what other options are out there for different disciplines and types of data, DataCite provides an updated list of research data repositories.


Early in the process of depositing data to the Dryad repository,  authors are asked to consent to the explicit release of their data into the public domain under the terms of a Creative Commons Zero (CC0) waiver. We are frequently asked why Dryad uses CC0 rather than a license such as CC-BY, and it is important for all users to understand the rationale for this, as well as its implications.

Obviously, one of the primary purposes of archiving data in Dryad is to enable its reuse by others. Having clear and open terms of reuse helps realize that goal, along with having well-organized data, good documentation, persistent file formats, etc.

CC0 was crafted specifically to reduce any legal and technical impediments, be they intentional or unintentional, to the reuse of data. In most cases, CC0 does not actually affect the legal status of the data, since facts in and of themselves are not eligible for copyright in most countries (e.g. see this commentary from Bitlaw regarding U.S. copyright law). But where they are, CC0 waives copyright and related rights to the extent permitted by law.

Importantly, CC0 does not exempt those who reuse the data from following community norms for scholarly communication. It does not exempt researchers from the responsibility to reuse the data in a way that is mindful of its limitations. Nor does it exempt researchers from the obligation of citing the original data authors. However, like other scientific norms, these expectations are best articulated and enforced by the community itself through processes such as peer review.

In fact, by removing unenforceable legal barriers, CC0 facilitates the discovery, re-use, and citation of that data.

“Community norms can be a much more effective way of encouraging positive behaviour, such as citation, than applying licenses. A well functioning community supports its members in their application of norms, whereas licences can only be enforced through court action and thus invite people to ignore them when they are confident that this is unlikely.” (Panton Principles FAQ)

Dryad’s policy ultimately follows the recommendations of Science Commons, which discourage researchers from presuming copyright and using licenses that include “attribution” and “share-alike” conditions for scientific data.

Both of these conditions can put legitimate users in awkward positions.  First, specifying how “attribution” must be carried out may put a user at odds with accepted citation practice:

“when you federate a query from 50,000 databases (not now, perhaps, but definitely within the 70-year duration of copyright!) will you be liable to a lawsuit if you don’t formally attribute all 50,000 owners?” (Science Commons Database Protocol FAQ)

While “share-alike” conditions create their own unnecessary legal tangle:

“ ‘share-alike’ licenses typically impose the condition that some or all derivative products be identically licensed. Such conditions have been known to create significant “license compatibility” problems under existing license schemes that employ them. In the context of data, license compatibility problems will likely create significant barriers for data integration and reuse for both providers and users of data.” (Science Commons Database Protocol FAQ)


“… given the potential for significantly negative unintended consequences of using copyright, the size of the public domain, and the power of norms inside science, we believe that copyright licenses and contractual restrictions are simply the wrong tool [for data], even if those licenses and contracts are used with the best of intentions.” (Science Commons Database Protocol FAQ)

Furthermore, Dryad’s use of CC0 to make the terms of reuse explicit has some important advantages:

  • interoperability: Since CC0 is both human and machine-readable, other people and indexing services will automatically be able to determine the terms of use.
  • universality: CC0 is a single mechanism that is both global and universal, covering all data and all countries.  It is also widely recognized.
  • simplicity: there is no need for humans to make, and respond to, individual data requests, and no need for click-through agreements.  This allows more scientists to spend their time doing science.
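
To make the interoperability point above concrete, here is a small Python sketch of how an indexing service might test whether a record's stated terms of reuse are CC0. The URI `http://creativecommons.org/publicdomain/zero/1.0/` is the published address of the CC0 1.0 dedication; the record layout (a dictionary with a `rights` key) and the function names are assumptions made for illustration, not a description of Dryad's metadata.

```python
from urllib.parse import urlparse

# Path component of the published CC0 1.0 Universal dedication URI.
CC0_PATH = "/publicdomain/zero/1.0"


def is_cc0(rights_uri):
    """True if the URI is the CC0 1.0 dedication, ignoring scheme, 'www.' and a trailing slash."""
    parsed = urlparse(rights_uri.strip())
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host == "creativecommons.org" and parsed.path.rstrip("/") == CC0_PATH


def reusable_without_request(records):
    """Keep the hypothetical metadata records whose 'rights' field is CC0."""
    return [r for r in records if is_cc0(r.get("rights", ""))]


if __name__ == "__main__":
    sample = [
        {"id": "a", "rights": "http://creativecommons.org/publicdomain/zero/1.0/"},
        {"id": "b", "rights": "https://creativecommons.org/licenses/by/4.0/"},
        {"id": "c"},  # no rights statement at all
    ]
    print([r["id"] for r in reusable_without_request(sample)])
```

Because the whole check is a single URI comparison, no human needs to read a license text or answer a data request before the record can be indexed or reused.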

Please note that if you have data that, due to pre-existing agreements, cannot be released under the terms of CC0, you should not deposit that data to Dryad. Journals that require data archiving in Dryad as a condition of publication can make exceptions for such special cases.

Footnote:  Interestingly, the repository had originally applied CC-BY to all its contents.  The very deliberate decision to use CC0 instead, made by Dryad’s Board in May of 2009, required us to obtain permission from all the early contributors to change the terms of reuse of their content.   And today, there are still a few items in Dryad under CC-BY for which permission was not granted.


BioMed Central, publisher of over 200 peer-reviewed journals, has issued a draft statement on data sharing and open data, inviting comments from the scientific community.  BMC’s Iain Hrynaszkiewicz consulted with several Dryad team members in the formulation of the statement.  A related editorial in BMC Research Notes names Dryad as an example of a repository where data are assigned a unique identifier and “available in perpetuity with permanence guaranteed.”  BMC Research Notes is seeking to encourage greater data sharing by waiving the publication fee for all articles which use or link to open data that is prepared in line with a community-accepted standard.

The draft statement supports data deposition in repositories assigning permanent identifiers to data, such as the DOI used by Dryad. BMC endorses the publishers’ role of providing “clear and permanent links to data hosted in repositories” and is working on a list of the available repositories.

Furthermore, the statement says that “a way forward would be to require that from a specific date, any author submitting to a BioMed Central journal agrees to dedicate the data elements of their article and supplementary material to the public domain and apply the CC0 licence.” This proposed policy aligns closely with the Joint Data Archiving Policy (JDAP) already adopted by several Dryad partner journals.

Comments on the statement can be directed to the BMC blog.


Panton Principles

“For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.” The just-released Panton Principles propose that “data related to published science should be explicitly placed in the public domain.”

The creators recommend “adopting and acting on the following principles:”

  1. When publishing data make an explicit and robust statement of your wishes.
  2. Use a recognized waiver or license that is appropriate for data.
  3. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
  4. Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.

These principles were written by Peter Murray-Rust, Cameron Neylon, Rufus Pollock and John Wilbanks at the Panton Arms in Cambridge, UK, and then refined by the Open Knowledge Foundation Working Group on Open Data in Science. There are open data web buttons available, and individuals and organizations can endorse the principles here.
