Posts Tagged ‘data repositories’

Until recently, Mark Hahnel was a PhD student in stem cell biology. Frustrated by seeing how much of his own research output didn’t make it to publications, he endeavored to do something about it by developing a scientific file sharing platform called FigShare. Recently, Mark and FigShare were taken under the wing of Digital Science, a Nature Publishing Group spinoff, and a sleek new FigShare was relaunched in January 2012 with many more features and an ambitious scope.

FigShare allows researchers to publish all of their research outputs in seconds in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. By opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient.

Users do not have to pay for access to the content: public data is made available under the terms of a CC0 waiver and other content under CC-BY.  And FigShare is currently providing unlimited public space and 1GB of private storage space for free.

This is a promising solution for getting negative and otherwise unpublished results out into the world (figures, tables, data, etc.) in a way that is discoverable and citable.  Importantly, much of this content would not be appropriate for Dryad, since it is not associated with (and not documented by) an authoritative publication.

There are clearly some challenges to the FigShare model. A big one, shared with many other Open Science experiments that disseminate prior to peer review, is ensuring that there is adequate documentation for users to assess fitness for reuse. Another challenge, one that Dryad is greatly concerned about, is guaranteeing that the content will still be usable, and that there will be the means to host it, ten or twenty years down the road. These are reflections of larger unanswered questions about how the research community can best take advantage of the web for scholarly communication, and how to optimize filtering, curating or preserving such communications. To answer these questions, the world of open data needs many more innovative projects like FigShare.

Considering FigShare’s relaunch suggests a few strengths of the Dryad model:

  • Dryad works with journals to integrate article and data submission, streamlining the deposit process.
  • Dryad curators review files for technical problems before they are released, and ensure that their metadata enables optimal retrieval.
  • Dryad’s scope is focused on data files associated with published articles in the biosciences (plus software scripts and other files important to the article).
  • Dryad can make data securely available during peer review, at the request of the journal.
  • Dryad is community-led, with priorities and policies shaped by the members of the Dryad Consortium, including scientific societies, publishers, and other stakeholder organizations.
  • Dryad can be accessed programmatically through a sitemap or OAI-PMH interface.
  • Dryad content is searchable and replicated through the DataONE network, and it handshakes with other repositories to coordinate data submission.
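
As a rough illustration of the programmatic access mentioned in the list above, here is a minimal sketch of an OAI-PMH harvesting step in Python. The `ListRecords` verb, the `oai_dc` metadata prefix, resumption tokens, and the XML namespace come from the OAI-PMH 2.0 protocol itself; the base URL, the sample response, and the function names are hypothetical and stand in for whatever a given repository actually exposes. This is not Dryad's own client code.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, hand-written response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:record-1</identifier></header></record>
    <record><header><identifier>oai:example.org:record-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    Per the protocol, a follow-up request carries only the resumption
    token; the first request carries the metadata prefix instead.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of results.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


if __name__ == "__main__":
    # "https://repository.example.org/oai" is a placeholder endpoint.
    print(build_list_records_url("https://repository.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL (for example with `urllib.request.urlopen`), pass the response body to `parse_list_records`, and repeat with the returned token until it comes back as `None`.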

For more about Dryad, browse the repository or see Why Should I Choose Dryad for My Data?

A file sharing platform and a data repository are different animals, to be sure; both have a place in a lively open data ecosystem. We wish success to the Digital Science team, and look forward both to working together and to challenging each other to better meet the needs of the research community. To see what other options are out there for different disciplines and types of data, DataCite provides an updated list of research data repositories.

Several journals in the field of proteomics have decided to mandate data sharing at the time of publication. These journals are leading the way toward data sharing out of a conviction that “the provenance of data sets and their proper citation is central to the research process,” as described in a recent Bio-IT World commentary, “Share the Data: Making Large-Scale Proteomics Data Widely Available.”

Mass Spectrometer, photo from U.S. Department of Energy Genome Programs

Now, “authors who publish a manuscript containing mass spectrometry data in Molecular and Cellular Proteomics (MCP) must submit the raw data to a publicly accessible site.” The journal Proteomics also requires data deposit in a public archive.

There are several specialized data repositories in the field, and several are working together as ProteomeXchange “to provide a single point of submission to proteomics repositories, and encourage the data exchange and sharing of identifiers between the repositories so that the community may easily find datasets in the participating repositories.”

Nature journals now list Dryad among their suggested data repositories. Stating that “an inherent principle of publication is that others should be able to replicate and build upon the authors’ published claims,” the editorial policies mandate data sharing and archiving.

The policy on data sets reads:

A condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to others without preconditions.

Data sets must be made freely available to readers from the date of publication, and must be provided to editors and peer-reviewers at submission, for the purposes of evaluating the manuscript.

For the following types of data set, submission to a community-endorsed, public repository is mandatory. Accession numbers must be provided in the paper. Examples of appropriate public repositories are listed below.

PANGAEA (Publishing Network for Geoscientific & Environmental Data) is a repository for geoscience data with many features similar to Dryad, including use of DOIs for data files.  A recent press release reports that Elsevier and PANGAEA have implemented reciprocal linking between data in the repository and journal articles.   Research data sets deposited at PANGAEA are now automatically linked to the corresponding articles in Elsevier journals on its electronic platform ScienceDirect and vice versa.   The data are freely available from the publication’s page in ScienceDirect, without a login or subscription.

Try it out:

  1. From this PANGAEA record, follow the DOI to the article in ScienceDirect (citations and abstracts only, unless you or your institution have subscription access)
  2. The PANGAEA link is to the right of the article with Supplementary Data beside it

This valuable two-way connectivity between data and article is most easily achieved when the data are captured at the time of article submission.  See this previous post for more on Dryad’s approach to this problem, which is designed to work across multiple publishers.

Similar to the appearance of the PANGAEA logo in the online version of the article, we are toying with the idea of calling attention to the link in the opposite direction by placing  journal cover images next to article DOIs in the Dryad display.  We’d like to hear your thoughts on that.  Is it helpful signage?  Or distracting eye candy?
