Feeds:
Posts
Comments

Archive for the ‘Discoverability’ Category

We present a guest post from researcher Falk Lüsebrink highlighting the benefits of data sharing. Falk is currently working on his PhD in the Department of Biomedical Magnetic Resonance at the Otto-von-Guericke University in Magdeburg, Germany. Here, he talks about his experience of sharing early MRI data and the unexpected impact that it is having on the research community.

Early release of data

The first time I faced a decision about publishing my own data was while writing a grant proposal. One of our proposed objectives was to acquire ultrahigh resolution brain images in vivo, making use of an innovative development: a combination of an MR scanner with ultrahigh field strength and a motion correction setup to remediate subject motion during data acquisition. While waiting for the funding decision, I simply could not resist acquiring a first dataset. We scanned a highly experienced subject for several hours, allowing us to acquire in vivo images of the brain with a resolution far beyond anything achieved thus far.

 MRI data showing the cerebellum in vivo

MRI data showing the cerebellum in vivo at (a) neuroscientific standard resolution of 1 mm, (b) our highest achieved resolution of 250 µm, and (c) state-of-the-art 500 µm resolution.

When our colleagues saw the initial results, they encouraged us to share the data as soon as possible. Through Scientific Data and Dryad, we were able to do just that. The combination of a peer-reviewed open access journal and an open access digital repository for the data was perfect for presenting our initial results.

17,000 downloads and more

‘Sharing the wealth’ seems to have been the right decision; in the three months since we published our data, there has been an enormous amount of activity:

A distinct need for data re-use

MRI studies are highly interdisciplinary, opening up numerous opportunities for sharing and re-using data. For example, our data might be used to build MR brain atlases and illustrate brain structures in much greater detail, or even for the first time. This could advance our understanding of brain functions. Algorithms used to quantify brain structures needed in the research of neurodegenerative disorders could be enhanced, increasing accuracy and reproducibility. Furthermore, by making available raw signals measured by the MR scanner, image reconstruction methods could be used to refine image quality or reduce the time it takes to collect the data.

There are also opportunities beyond those that our particular dataset offers. A recent emerging trend in MRI comes from the field of machine learning. Neuronal networks are being built to perform and potentially improve all kinds of tasks, from image reconstruction, to image processing, and even diagnostics. To train such networks, huge amounts of data are necessary; these data could come from repositories open to the public. Such re-use of MRI data by researchers in other disciplines is having a strong impact on the advancement of science. By publicly sharing our data, we are allowing others to pursue new and exciting directions.

Download the data for yourself and see what you can do with it. In the meantime, I am still eagerly awaiting the acceptance of the grant application . . . but that’s a different story.

The data: http://dx.doi.org/10.5061/dryad.38s74

The article: http://dx.doi.org/10.1038/sdata.2017.32

— Falk Lüsebrink

Read Full Post »

Did you ever wonder what goes on behind the scenes when Dryad curators review data files submitted by authors?  There are no wizards behind our curtains, just real live information specialists and trained data curators.

by Kaptain Kobold via Flickr

by Kaptain Kobold via Flickr

Dryad’s curation process is intentionally lightweight, so it doesn’t delay the availability of the data. Curators don’t review the scientific merit of the files – that is left to peer reviewers and the scientific community. Instead, we rely on our curators’ expertise in library and information science to ensure the integrity and preservation of the data.

Curators perform basic checks on each submission (can the files be opened? are they free of copyright restrictions? do they appear to be free of sensitive data?). The completeness and correctness of the metadata is checked and the DOI is officially registered. During their work, Dryad curators encounter thousands of data files in any number of file formats. Our team examines all of these data files to ensure they do, in fact, include data, and not manuscripts, or pictures of kittens.

Curators may communicate directly with submitters to address issues and/or to make suggestions about enhancing the description and reusability of the data package. They can also create new versions of data packages should corrections or additions be needed after archiving. Ultimately, the responsibility for the content of the files rests with the submitters, but Dryad’s curators can help to catch and fix many common problems – and some rare ones, too.

fileTypes_wordleSince Dryad’s inception, curation operations have been led by the Metadata Research Center (or MRC) directed by Dr. Jane Greenberg, initially at the University of North Carolina at Chapel Hill, and now at Drexel University. The team is supervised by Senior Curator Erin Clary, and currently includes all students in, or graduates of, Library and Information Science (LIS) or Informatics Master’s programs.

So, (wizard) hats off to all our behind-the-curtains data curators, whose vital contributions ensure that the data in the repository is findable and usable. If you have a question about Dryad curation or need advice on preparing your data for archiving, don’t hesitate to email us at curator@datadryad.org.

Read Full Post »

The Data Citation Synthesis Group has released a draft Declaration of Data Citation Principles and invites comment.

This has been a very interesting and positive collaborative process and has involved a number of groups and committed individuals. Encouraging the practice of data citation, it seems to me, is one of the key steps towards giving research data its proper place in the literature.

As the preamble to the draft principles states:

Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice.

In support of this assertion, and to encourage good practice, we offer a set of guiding principles for data citation.

Please do comment on these principles. We hope that with community feedback and support, a finalised set of principles can be widely endorsed and adopted.

Discussion on a variety of lists is welcome, of course. However, if you want the Synthesis Group to take full account of your views, please be sure to post your comments on the discussion forum.

Some notes and observations on the background to these principles

I would like to add here some notes and observations on the genesis of these principles. As has been widely observed there have been a number of groups and interested parties involved in exploring the principles of data citation for a number of years. Mentioning only some of the sources and events that affected my own thinking on the matter, there was the 2007 Micah Altman and Gary King article, in DLib, which offered ‘A Proposed Standard for the Scholarly Citation of Quantitative Data’ and Toby Green’s OECD White Paper ‘We need publishing standards for datasets and data tables’ in 2009. Micah Altman and Mercè Crosas organised a workshop at Harvard in May 2011 on Data Citation Principles. Later the same year, the UK Digital Curation Centre published a guide to citing data in 2011.

The CODATA-ICSTI Task Group on Data Citation Standards and Practices (co-chaired by Christine Borgman, Jan Brase and Sara Callaghan) has been in existence since 2010. In collaboration with the US National CODATA Committee and the Board on Research Data and Information, a major workshop was organised in August 2011, which was reported in ‘For Attribution: Developing Data Attribution and Citation Practices and Standards’.

The CODATA-ICSTI Task Group then started work on a report covering data citation principles, eventually entitled ‘Out of Cite, Out of Mind’ – drafts were circulated for comment in April 2013 and the final report was released in September 2013.

Following the first ‘Beyond the PDF’ meeting in Jan 2011 participants produced the Force11 Manifesto ‘Improving Future Research Communication and e-Scholarship’ which places considerable weight on the availability of research data and the citation of those data in the literature. At ‘Beyond the PDF II’ in Amsterdam, March 2013, a group comprising Mercè Crosas, Todd Carpenter, David Shotton and Christine Borgman produced ‘The Amsterdam Manifesto on Data Citation Principles’. In the very same week, in Gothenburg, an RDA Birds of a Feather group was discussing the more specific problem of how to support, technologically, the reliable and efficient citation of dynamically changing or growing datasets and subsets thereof. And the broader issues of the place of data and research publication were being considered in the ICSU World Data Service Working Group on Data Publication. This group has, in turn, formed the basis for an RDA Interest Group.  Oooffff!

How great a thing is collaboration?

From June 2013, as the Force11 Group was preparing its website and activities to take forward the work on the Amsterdam Manifesto, calls came in from a number of sources for these various groups and initiatives to coordinate and collaborate. This was admirably well-received and from July the ‘Data Citation Synthesis Group’ had come into being with an agreed mission statement:

The data citation synthesis group is a cross-team committee leveraging the perspectives from the various existing initiatives working on data citation to produce a consolidated set of data citation principles (based on the Amsterdam Manifesto, the CODATA and other sets of principles provided by others) in order to encourage broad adoption of a consistent policy for data citation across disciplines and venues. The synthesis group will review existing efforts and make a set of recommendations that will be put up for endorsement by the organizations represented by this synthesis group.

The synthesis group will produce a set of principles, illustrated with working examples, and a plan for dissemination and distribution. This group will not be producing detailed specifications for implementation, nor focus on technologies or tools.

As has been noted elsewhere , the group comprised 40 individuals and brought together a large number of organisations and initiatives. What followed over the summer was a set of weekly calls to discuss and align the principles. I must say, I thought these were admirably organised and benefitted considerably from participants’ efforts to prepare documents comparing the various groups’ statements. The face-to-face meeting of the group, in which a lot of detailed discussion to finalise the draft was undertaken, was hosted (with a funding contribution from CODATA) at the US National Academies of Science between the 2nd RDA Plenary and the DataCite Summer Meeting (which CODATA also co-sponsored). It has been intellectually stimulating and a real pleasure to contribute to these discussions and to witness so many informed and engaged people bashing out these issues.

The principles developed by the Synthesis Group are now open for comment and I urge as many people, researchers, editors and publishers as possible who believe that data has a place in scholarly communications to comment on them and, in due course, to endorse them and put them into practice.

Are we finally at the cusp of real change in practice? Will we now start seeing the practice of citing data sources become more and more widespread? It’s soon to say for sure, but I hope these principles, and the work on which they build, have got us to a stage where we can start really believing the change is well underway.

Simon Hodson is Executive Director of CODATA and a member of the Dryad Board of Directors.  This post was originally published on the CODATA blog.

Read Full Post »