Posts Tagged ‘data archiving’

We present a guest post from researcher Falk Lüsebrink highlighting the benefits of data sharing. Falk is currently working on his PhD in the Department of Biomedical Magnetic Resonance at the Otto-von-Guericke University in Magdeburg, Germany. Here, he talks about his experience of sharing early MRI data and the unexpected impact that it is having on the research community.

Early release of data

The first time I faced a decision about publishing my own data was while writing a grant proposal. One of our proposed objectives was to acquire ultrahigh resolution brain images in vivo, making use of an innovative development: a combination of an MR scanner with ultrahigh field strength and a motion correction setup to remediate subject motion during data acquisition. While waiting for the funding decision, I simply could not resist acquiring a first dataset. We scanned a highly experienced subject for several hours, allowing us to acquire in vivo images of the brain with a resolution far beyond anything achieved thus far.

MRI data showing the cerebellum in vivo at (a) neuroscientific standard resolution of 1 mm, (b) our highest achieved resolution of 250 µm, and (c) state-of-the-art 500 µm resolution.

When our colleagues saw the initial results, they encouraged us to share the data as soon as possible. Through Scientific Data and Dryad, we were able to do just that. The combination of a peer-reviewed open access journal and an open access digital repository for the data was perfect for presenting our initial results.

17,000 downloads and more

‘Sharing the wealth’ seems to have been the right decision; in the three months since we published our data, there has been an enormous amount of activity:

A distinct need for data re-use

MRI studies are highly interdisciplinary, opening up numerous opportunities for sharing and re-using data. For example, our data might be used to build MR brain atlases and illustrate brain structures in much greater detail, or even for the first time. This could advance our understanding of brain functions. Algorithms used to quantify brain structures needed in the research of neurodegenerative disorders could be enhanced, increasing accuracy and reproducibility. Furthermore, by making available raw signals measured by the MR scanner, image reconstruction methods could be used to refine image quality or reduce the time it takes to collect the data.

There are also opportunities beyond those that our particular dataset offers. An emerging trend in MRI comes from the field of machine learning. Neural networks are being built to perform and potentially improve all kinds of tasks, from image reconstruction to image processing and even diagnostics. To train such networks, huge amounts of data are necessary; these data could come from repositories open to the public. Such re-use of MRI data by researchers in other disciplines is having a strong impact on the advancement of science. By publicly sharing our data, we are allowing others to pursue new and exciting directions.

Download the data for yourself and see what you can do with it. In the meantime, I am still eagerly awaiting the acceptance of the grant application . . . but that’s a different story.

The data: http://dx.doi.org/10.5061/dryad.38s74

The article: http://dx.doi.org/10.1038/sdata.2017.32

— Falk Lüsebrink

We’re beginning a series highlighting researchers who use Dryad to openly publish their research data. We ask them about their current projects, why they believe in open science, and why they choose Dryad.

Zach Gompert

For our first researcher profile, we talked with Dr. Zach Gompert, assistant professor in the Department of Biology at Utah State University, about how his work ties in with open science:

Dryad: What is your area of research and what’s your current focus?

Gompert: The overarching goal in my lab is to advance understanding of the extent, organization, causes, and consequences of variation in nature. Some of the issues we are investigating are:

  • What are the evolutionary consequences of hybridization?
  • How does the evolution of novel ecological interactions affect biodiversity?
  • Is temporal variation in natural selection a key determinant of genetic diversity levels in natural populations?

We address these questions through population genomic analyses of natural and experimental populations, and through development of new theory and statistical methods. Our work on Lycaenid butterflies shows that hybridization can be a key creative force in animal evolution and that evolutionary histories are not always well represented by the ‘evolutionary tree’ metaphor. In other words, lineages don’t just split, they come back together.

We have quite a few datasets in Dryad now, including partial genome sequences from over a thousand butterflies.

Lycaeides melissa

Dryad: What do you think about open science in general? What are advantages of open science? 

Gompert: Science has always been a communal endeavor. Large-scale collaboration is vital now for a number of reasons:

  • Diverse expertise. Many key questions require a diverse group of investigators. This results in big, multifaceted datasets and necessitates rapid sharing of data, methods, and findings.
  • Re-purposing data. It’s common now for data and methods to have applications beyond those that they were originally collected or developed for. Open science allows these to be used by other investigators, accelerating the rate of discovery.
  • Data integrity. Openness ensures a higher level of quality and integrity. When data and methods are available for scrutiny, possible errors are more likely to be identified and corrected. This is particularly relevant for large-scale, multi-investigator projects.
  • Public funding and access. Since much of science is funded by the public, I think scientists have an ethical duty to make the products of research available to everyone.

Dryad: In your opinion, what are disadvantages or concerns about open science?

Gompert: There are two common concerns:

  • Getting scooped. Researchers can be scooped if another group analyzes and publishes the data they generated. While this has some validity, sufficient safeguards and community standards are in place to minimize this problem, and it’s minor compared to the advantages of openness.
  • Poor documentation. I think data archiving is in better shape than it once was, but much of archived data or code are not sufficiently documented to truly be useful to others. Enhancing documentation of data is a big area where we as a community need to do more.

Dryad: You have over 20 datasets archived in Dryad. What do you see as the benefits of data sharing in Dryad?

Gompert: The primary strength of Dryad is its flexibility, specifically the ability to archive diverse types of data (and computer code) in a single location and to link to other more specialized databases such as NCBI. With Dryad, researchers have a central location where they can find all of the data associated with a publication.

We are pleased to announce that Elementa is the latest journal to integrate submission of manuscripts with data to Dryad. Elementa’s integration with Dryad means that all authors will be invited to archive the data supporting the conclusions in their article, and the process of depositing data files has been simplified by behind-the-scenes coordination between the journal and the repository. Authors will be invited to submit data to Dryad when their manuscript is accepted, and will have the option to set a one-year embargo on the availability of their data files.

The journal has a strong data policy, requiring “all major datasets associated with an article to be made freely and widely available.” The journal is also a Dryad member, and will be covering the charges for its authors when Dryad begins assessing Data Publishing Charges (DPC) on September 1.

Elementa: Science of the Anthropocene is a new open access scientific journal publishing original research reporting new knowledge of the Earth’s physical, chemical, and biological systems.

The journal is a nonprofit collaborative involving BioOne, Dartmouth, the Georgia Institute of Technology, the University of Colorado, the University of Michigan, and the University of Washington. Elementa comprises six inaugural knowledge domains: Atmospheric Science, Earth and Environmental Science, Ecology, Ocean Science, Sustainable Engineering, and Sustainability Transitions.

The journal is now welcoming article submissions, and the first articles will be published in September.

We are pleased to announce that Ecology Letters is the latest journal to integrate submission of manuscripts with data to Dryad.  In this process, the journal and repository communicate behind the scenes in order to streamline data submission for authors, and also to ensure that the article contains a permanent link to the data.

Ecology Letters is published by the French National Center for Scientific Research (CNRS), a public basic-research organization that defines its mission as producing knowledge and making it available to society. Marcel Holyoak, the journal’s Editor-in-Chief, has been actively involved with Dryad since 2009, serving on the Consortium Board from 2009 to 2011 and currently on the elected Board of Directors.

There are already a number of articles in Ecology Letters with associated data in Dryad, including the most frequently downloaded data file in Dryad, The Global Wood Density Database, which has been downloaded nearly 6000 times:

Zanne AE, Lopez-Gonzalez G, Coomes DA, Ilic J, Jansen S, Lewis SL, Miller RB, Swenson NG, Wiemann MC, Chave J (2009) Data from: Towards a worldwide wood economics spectrum. Dryad Digital Repository. doi:10.5061/dryad.234

Article:

Chave J, Coomes D, Jansen S, Lewis SL, Swenson NG, Zanne AE (2009) Towards a worldwide wood economics spectrum. Ecology Letters 12: 351-366. doi:10.1111/j.1461-0248.2009.01285.x

Dryad is delighted to welcome Ecology Letters to the growing group of journals that have taken this important step to support and facilitate their authors’ data archiving.

We are celebrating the recent publication in Dryad of the first data to accompany a book [1, 2]. Odd Couples: Extraordinary Differences Between the Sexes in the Animal Kingdom, from Princeton University Press, examines the occasionally surprising gender differences in animals, and what it means to be male or female in the animal kingdom. It is intended for both general and scientific readers.

The author, Daphne Fairbairn, a professor of biology at the University of California, Riverside, and Editor-in-Chief of Evolution, a Dryad partner journal, describes the data as:

…a survey of all recorded sexual dimorphisms in all of the animal classes that contain dioecious species (species with separate sexes).  It categorizes the prevalence of dioecy, the types of differences between the sexes (size, shape, color, etc.) and the magnitude of the differences.  I use this survey to construct frequency plots in the book, but there was no room to publish the full survey results.  This is the first time that such a survey has been done and I am hoping that it will prove useful to other biologists who might use the data for hypothesis testing.  I might even get around to this myself!

I think these archived data are one of the most significant contributions of the book to the scientific literature, even though they will not be important for non-specialist readers.

While most data in Dryad accompany journal articles, we are happy to see data archiving catching on with other types of publications such as books, theses, dissertations, and conference proceedings. Please contact us if you are interested in submitting data and have any questions about its suitability for Dryad.

[1] Fairbairn DJ (2013) Data from: Odd couples: extraordinary differences between the sexes in the animal kingdom. Dryad Digital Repository. doi:10.5061/dryad.n48cm

[2] Fairbairn DJ (2013) Odd Couples: Extraordinary Differences Between the Sexes in the Animal Kingdom, Princeton University Press, ISBN:9780691141961.

A study providing new insights into the citation boost from open data has been released in preprint form on PeerJ by Dryad researchers Heather Piwowar and Todd Vision. The researchers looked at thousands of papers reporting new microarray data and thousands of cited instances of data reuse. They found that the citation boost, while more modest than seen in earlier studies (overall, ~9%), was robust to confounding factors, distributed across many archived datasets, continued to grow for at least five years after publication, and was driven to a large extent by actual instances of data reuse. Furthermore, they found that the intensity of dataset reuse has been rising steadily since 2003.

Heather, a post-doc based in Vancouver, may be known to readers of this blog for her earlier work on data sharing, her blog, her role as cofounder of ImpactStory, or her work to promote access to the literature for text mining. Recently Tim Vines, managing editor of Molecular Ecology and a past member of Dryad’s Consortium Board, managed to pull Heather briefly away from her many projects to ask her about her background and latest passions:

TV: Your research focus over the last five years has been on data archiving and science publishing. How did your interest in this field develop?

HP: I wanted to reuse data.  My background is electrical engineering and digital signal processing: I worked for tech companies for 10 years. The most recent was a biotech developing predictive chemotherapy assays. Working there whetted my appetite for doing research, so I went back to school for my PhD to study personalized cancer therapy.

My plan was to use data that had already been collected, because I’d seen first-hand the time and expense that goes into collecting clinical trials data.  Before I began, though, I wanted to know whether the stuff in NCBI’s databases was good quality, because highly selective journals like Nature often require data archiving, or whether it was instead mostly the dregs of research, because that was all investigators were willing to part with.  I soon realized that no one knew… and that it was important, and we should find out.  Studying data archiving and reuse became my new PhD topic, and my research passion.

My first paper was rejected from a High Profile journal.  Next I submitted it to PLOS Biology. It was rejected from there too, but they mentioned they were starting this new thing called PLOS ONE.  I read up (it hadn’t published anything yet) and I liked the idea of reviewing only for scientific correctness.

I’ve become more and more of an advocate for all kinds of open science as I’ve run into barriers that prevented me from doing my best research.  The barriers kept surprising me. Really, other fields don’t have a PubMed? Really, there is no way to do text mining across all scientific literature?  Seriously, there is no way to query that citation data by DOI, or export it other than page by page in your webapp, and you won’t sell subscriptions to individuals?  For real, you won’t let me cite a URL?  In this day and age, you don’t value datasets as contributions in tenure decisions?  I’m working for change.

TV: You’ve been involved with a few of the key papers relating data archiving to subsequent citation rate. Could you give us a quick summary of what you’ve found?

HP: Our 2007 PLOS ONE paper was a small analysis related to one specific data type: human cancer gene expression microarray data.  About half of the 85 publications in my sample had made their data publicly available.  The papers with publicly available data received about 70% more citations than similar studies without available data.

I later discovered there had been an earlier study in the field of International Studies — it has the awesome title “Posting your data: will you be scooped or will you be famous?”  There have since been quite a few additional studies of this question, the vast majority finding a citation benefit for data archiving.  Have a look at (and contribute to!) this public Mendeley group initiated by Joss Winn.

There was a significant limitation to these early studies: they didn’t control for several important confounders of citation rate (number of authors, for example).  Thanks to Angus Whyte at the Digital Curation Centre (DCC) for conversations on this topic.  Todd Vision and I have been working on a larger study of data citation and data reuse to address this, and to understand deeper patterns of data reuse.  Our conclusions:

After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported.  We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data.  Other factors that may also contribute to the citation boost are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.

TV: Awareness of data archiving and its importance for the progress of science has increased massively over the past five years, but very few organizations have actually introduced mandatory archiving policies. What do you see as the remaining obstacles?

HP: Great question. I don’t know. Someone should do a study!  Several journals have told me it is simply not a high priority for them: it takes time to write and decide on a policy, and they don’t have time.  Perhaps wider awareness of the Joint Data Archiving Policy will help.

Some journals are afraid authors will choose a competitor journal if they impose additional requirements. I’m conducting a study to monitor the attitudes, experiences, and practices of authors in journals that have adopted the JDAP, and of similar authors who publish elsewhere.  The study will run for 3 years, so although I have more than 2500 responses there is still another whole year of data collection to go.  Stay tuned 🙂

Keep an eye on Journal Research Data Policy Bank (JoRD) to stay current on journal policies for data archiving.

Funders, though.  Why aren’t more funders introducing mandatory public data archiving policies (with appropriate exceptions)?  I don’t know.  They should.  Several are taking steps towards it, but golly it is slow.  Is anyone thinking of the opportunity cost of moving this slowly?  More specific thoughts in my National Science Foundation RFI response with coauthor Todd Vision.

TV: You’re a big advocate of ‘open notebook’ science. How did you first get interested in working in this way?

HP: I was a grad student, hungry for information.  I wanted to know if everyone’s science looked like my science.  Was it messy in the same ways?  What processes did they have that I could learn from?  What were they excited about *now*: findings and ideas that wouldn’t hit journal pages for months or years?

This was the same time that Jean-Claude Bradley was starting to talk about open notebook science in his chemistry lab.  I was part of the blogosphere conversations, and had a fun ISMB 2007 going around to all the publisher booths asking about their policies on publishing results that had previously appeared on blogs and wikis (my blog posts from the time; for a current resource see the list of journal responses maintained by F1000 Posters).

TV: It’s clearly a good way to work for people whose work is mainly analysis of data, but how can the open notebook approach be adapted to researchers who work at the bench or in the field?

HP: Jean-Claude Bradley has shown it can work very well in a chemistry lab.  I haven’t worked in the field, so I don’t want to presume to know what is possible or easy: I’m guessing in many cases it wouldn’t be easy.  That said, more often than not, where there is a will there is a way!

TV: Given the growing concerns over the validity of the results in scientific papers, do you think that external supervision of scientists (i.e. mandated open notebook science) would ever become a reality?

HP: I’m not sure.  Such a policy may well have disadvantages that outweigh its advantages.  It does sound like a good opportunity to do some research, doesn’t it?  A few grant programs could have a precondition that the awardees be randomized to different reporting requirements, then we monitor and see what happens. Granting agencies ought to be doing A LOT MORE EXPERIMENTING to learn the implications of their policies, followed by quick and open dissemination of the results of the experiments, and refinements in policies to reflect this growing evidence-base.

TV: You’re involved in a lot of initiatives at the moment. Which ones are most exciting for you? 

HP: ImpactStory.  The previous generation of tools for discovering the impact of research are simply not good enough.  We need ways to discover citations to datasets, in citation lists and elsewhere.  Ways to find blog posts written about research papers — and whether those blog posts, in turn, inspire conversation and new thinking.  We need ways to find out which research is being bookmarked, read, and thought about even if that background learning doesn’t lead to citations.  Research impact isn’t the one dimensional winners-and-losers situation we have now with our single-minded reliance on citation counts: it is multi-dimensional — research has an impact flavour, not an impact number.

Metrics data locked behind subscription paywalls might have made sense years ago, when gathering citation data required a team of people typing in citation lists.  That isn’t the world we live in any more: keeping our evaluation and discovery metrics locked behind subscription paywalls is simply neither necessary nor acceptable.  Tools need to be open, provide provenance and context, and support a broad range of research products.

We’re realizing this future through ImpactStory: a nonprofit organization dedicated to telling the story of our research impact.  Researchers can build a CV that includes citations and altmetrics for their papers, datasets, software, and slides: embedding altmetrics on a CV is a powerful agent of change for scholars and scholarship.  ImpactStory was co-founded by me and Jason Priem, is funded by the Alfred P. Sloan Foundation while we become self-sustaining, and is committed to building a future that is good for scholarship.  Check it out, and contact us if you want to learn more: team@impactstory.org

Thanks for the great questions, Tim!

We are pleased to announce that Biology Letters is the latest journal to integrate submission of manuscripts with data to Dryad.  In this process, the journal and repository communicate behind the scenes in order to streamline data submission for authors and ensure that the article contains a permanent link to the data.

It is particularly apt because Biology Letters is published by the Royal Society, which invented the idea of sharing knowledge through a scientific journal back in 1665.  Scientific communication has come a long way from those early letters among gentleman natural philosophers to the current conception of Science as an Open Enterprise conducted in the public interest.  Reflecting these changes in science and technology, the Royal Society recently strengthened its policy on the availability of research data:

To allow others to verify and build on the work published in Royal Society journals it is a condition of publication that authors make available the data and research materials supporting the results in the article.

Datasets should be deposited in an appropriate, recognized repository and the associated accession number, link or DOI to the datasets must be included in the methods section of the article. Reference(s) to datasets should also be included in the reference list of the article with DOIs (where available). Where no discipline-specific data repository exists authors should deposit their datasets in a general repository such as Dryad.

There are already a healthy number of articles in Biology Letters with associated data in Dryad, including one of last year’s hit data packages, Monsters are people too.  The first to be published via integrated submission is:

Article:

Jevanandam N, Goh AGR, Corlett RT (2013) Climate warming and the potential extinction of fig wasps, the obligate pollinators of figs. Biology Letters 9(3): 20130041. doi:10.1098/rsbl.2013.0041

Data:

Goh AGR, Corlett RT, Jevanandam N (2013) Data from: Climate warming and the potential extinction of fig wasps, the obligate pollinators of figs. Dryad Digital Repository. doi:10.5061/dryad.hj7h2
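
The citations in these posts give DOIs in the short `doi:` form, while the links in the guest post use a full resolver address. For readers who want to follow a citation to its data, here is a minimal Python sketch of the conversion. The function name `doi_to_url` is our own (hypothetical) choice, and the sketch only assumes that DOIs begin with `10.` and resolve through `https://doi.org/`.

```python
def doi_to_url(doi: str) -> str:
    """Turn 'doi:10.xxxx/...' (or a bare or already-linked DOI) into a resolver URL.

    Hypothetical helper for illustration; it is not part of any Dryad tool.
    """
    cleaned = doi.strip()

    # Drop a leading "doi:" label, as used in the citations above.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()

    # Drop an existing resolver prefix so already-linked DOIs pass through cleanly.
    for prefix in (
        "http://dx.doi.org/",
        "https://dx.doi.org/",
        "http://doi.org/",
        "https://doi.org/",
    ):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break

    # Every DOI starts with the directory indicator "10."
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")

    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    # The article and data DOIs cited above.
    for doi in ("doi:10.1098/rsbl.2013.0041", "doi:10.5061/dryad.hj7h2"):
        print(doi_to_url(doi))
```

Opening the resulting address in a browser should lead to the landing page for the article or data package.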
