
A study providing new insights into the citation boost from open data has been released in preprint form on PeerJ by Dryad researchers Heather Piwowar and Todd Vision. The researchers looked at thousands of papers reporting new microarray data and thousands of cited instances of data reuse. They found that the citation boost, while more modest than seen in earlier studies (overall, ~9%), was robust to confounding factors, distributed across many archived datasets, continued to grow for at least five years after publication, and was driven to a large extent by actual instances of data reuse. Furthermore, they found that the intensity of dataset reuse has been rising steadily since 2003.

Heather, a post-doc based in Vancouver, may be known to readers of this blog for her earlier work on data sharing, her blog, her role as cofounder of ImpactStory, or her work to promote access to the literature for text mining. Recently Tim Vines, managing editor of Molecular Ecology and a past member of Dryad’s Consortium Board, managed to pull Heather briefly away from her many projects to ask her about her background and latest passions:

TV: Your research focus over the last five years has been on data archiving and science publishing: how did your interest in this field develop?

HP: I wanted to reuse data.  My background is electrical engineering and digital signal processing: I worked for tech companies for 10 years. The most recent was a biotech developing predictive chemotherapy assays. Working there whetted my appetite for doing research, so I went back to school for my PhD to study personalized cancer therapy.

My plan was to use data that had already been collected, because I’d seen first-hand the time and expense that goes into collecting clinical trials data.  Before I began, though, I wanted to know whether the data in NCBI’s databases was good quality (deposited in part because highly selective journals like Nature often require data archiving), or whether it was instead mostly the dregs of research, because that was all investigators were willing to part with.  I soon realized that no one knew… and that it was important, and we should find out.  Studying data archiving and reuse became my new PhD topic, and my research passion.

My first paper was rejected from a High Profile journal.  Next I submitted it to PLOS Biology. It was rejected from there too, but they mentioned they were starting this new thing called PLOS ONE.  I read up (it hadn’t published anything yet) and I liked the idea of reviewing only for scientific correctness.

I’ve become more and more of an advocate for all kinds of open science as I’ve run into barriers that prevented me from doing my best research.  The barriers kept surprising me. Really, other fields don’t have a PubMed? Really, there is no way to do text mining across all scientific literature?  Seriously, there is no way to query that citation data by DOI, or export it other than page by page in your webapp, and you won’t sell subscriptions to individuals?  For real, you won’t let me cite a URL?  In this day and age, you don’t value datasets as contributions in tenure decisions?  I’m working for change.

TV: You’ve been involved with a few of the key papers relating data archiving to subsequent citation rate. Could you give us a quick summary of what you’ve found?

HP: Our 2007 PLOS ONE paper was a small analysis related to one specific data type: human cancer gene expression microarray data.  About half of the 85 publications in my sample had made their data publicly available.  The papers with publicly available data received about 70% more citations than similar studies without available data.

I later discovered there had been an earlier study in the field of International Studies — it has the awesome title “Posting your data: will you be scooped or will you be famous?”  There have since been quite a few additional studies of this question, the vast majority finding a citation benefit for data archiving.  Have a look at (and contribute to!) this public Mendeley group initiated by Joss Winn.

There was a significant limitation to these early studies: they didn’t control for several important confounders of citation rate (number of authors, for example).  Thanks to Angus Whyte at the Digital Curation Centre (DCC) for conversations on this topic.  Todd Vision and I have been working on a larger study of data citation and data reuse to address this, and to understand deeper patterns of data reuse.  Our conclusions:

After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported.  We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data.  Other factors that may also contribute to the citation boost are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.

TV: Awareness of data archiving and its importance for the progress of science has increased massively over the past five years, but very few organizations have actually introduced mandatory archiving policies. What do you see as the remaining obstacles?

HP: Great question. I don’t know. Someone should do a study!  Several journals have told me it is simply not a high priority for them: it takes time to write and decide on a policy, and they don’t have time.  Perhaps wider awareness of the Joint Data Archiving Policy will help.

Some journals are afraid authors will choose a competitor journal if they impose additional requirements. I’m conducting a study to monitor the attitudes, experiences, and practices of authors in journals that have adopted the JDAP policy, and of similar authors who publish elsewhere.  The study will run for 3 years, so although I have more than 2500 responses there is still another whole year of data collection to go.  Stay tuned :)

Keep an eye on Journal Research Data Policy Bank (JoRD) to stay current on journal policies for data archiving.

Funders, though.  Why aren’t more funders introducing mandatory public data archiving policies (with appropriate exceptions)?  I don’t know.  They should.  Several are taking steps towards it, but golly it is slow.  Is anyone thinking of the opportunity cost of moving this slowly?  More specific thoughts in my National Science Foundation RFI response with coauthor Todd Vision.

TV: You’re a big advocate of ‘open notebook’ science. How did you first get interested in working in this way?

HP: I was a grad student, hungry for information.  I wanted to know if everyone’s science looked like my science.  Was it messy in the same ways?  What processes did they have that I could learn from?  What were they excited about *now* — findings and ideas that wouldn’t hit journal pages for months or years?

This was the same time that Jean-Claude Bradley was starting to talk about open notebook science in his chemistry lab.  I was part of the blogosphere conversations, and had fun at ISMB 2007 going around to all the publisher booths asking about their policies on publishing results that had previously appeared on blogs and wikis (see my blog posts from the time; for a current resource, see the list of journal responses maintained by F1000 Posters).

TV: It’s clearly a good way to work for people whose work is mainly analysis of data, but how can the open notebook approach be adapted to researchers who work at the bench or in the field?

HP: Jean-Claude Bradley has shown it can work very well in a chemistry lab.  I haven’t worked in the field, so I don’t want to presume to know what is possible or easy; I’m guessing that in many cases it wouldn’t be easy.  That said, more often than not, where there is a will there is a way!

TV: Given the growing concerns over the validity of the results in scientific papers, do you think that external supervision of scientists (i.e. mandated open notebook science) would ever become a reality?

HP: I’m not sure.  Such a policy may well have disadvantages that outweigh its advantages.  It does sound like a good opportunity to do some research, doesn’t it?  A few grant programs could have a precondition that the awardees be randomized to different reporting requirements, then we monitor and see what happens. Granting agencies ought to be doing A LOT MORE EXPERIMENTING to learn the implications of their policies, followed by quick and open dissemination of the results of the experiments, and refinements in policies to reflect this growing evidence-base.

TV: You’re involved in a lot of initiatives at the moment. Which ones are most exciting for you? 

HP: ImpactStory.  The previous generation of tools for discovering the impact of research is simply not good enough.  We need ways to discover citations to datasets, in citation lists and elsewhere.  Ways to find blog posts written about research papers — and whether those blog posts, in turn, inspire conversation and new thinking.  We need ways to find out which research is being bookmarked, read, and thought about even if that background learning doesn’t lead to citations.  Research impact isn’t the one-dimensional winners-and-losers situation we have now with our single-minded reliance on citation counts: it is multi-dimensional — research has an impact flavour, not an impact number.

Metrics data locked behind subscription paywalls might have made sense years ago, when gathering citation data required a team of people typing in citation lists.  That isn’t the world we live in any more: keeping our evaluation and discovery metrics locked behind subscription paywalls is simply neither necessary nor acceptable.  Tools need to be open, provide provenance and context, and support a broad range of research products.

We’re realizing this future through ImpactStory: a nonprofit organization dedicated to telling the story of our research impact.  Researchers can build a CV that includes citations and altmetrics for their papers, datasets, software, and slides: embedding altmetrics on a CV is a powerful agent of change for scholars and scholarship.  ImpactStory was co-founded by me and Jason Priem, is funded by the Alfred P. Sloan Foundation while we become self-sustaining, and is committed to building a future that is good for scholarship.  Check it out, and contact us if you want to learn more: team@impactstory.org.

Thanks for the great questions, Tim!


If you have data packages in Dryad, consider adding a Dryad button next to each one on the publication list of your website or your electronic CV.

You can make a link between the button and the individual data package page on Dryad to enrich your publication list and make it easy to find your data.
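As a sketch, such a button can be a simple image link wrapping the data package’s DOI resolver URL. The DOI and image filename below are placeholders, not real Dryad assets; substitute your own data package DOI and the button graphic from our publicity page:

```html
<!-- Button linking a publication entry to its Dryad data package.
     The DOI and image path are placeholders: use your own data
     package DOI and the button graphic from the Dryad wiki. -->
<a href="http://dx.doi.org/10.5061/dryad.example">
  <img src="dryad-button.png" alt="Data in Dryad" />
</a>
```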

Props to our early adopters below.  Check out their pages for some examples.

For other ways to show your support, please visit our page of publicity material on the Dryad wiki.  Let us know if you come up with creative ways to promote your data in Dryad. And additional suggestions are always welcome at help@datadryad.org.

Have at it!


We are happy to have the opportunity to reproduce here, with permission, the full text of the recent editorial by Trish Groves and Fiona Godlee in BMJ entitled “Open Science and Reproducible Research” [BMJ 2012; 344:e4383], which also announces an expanded partnership between Dryad and BMJ, a leading publisher of biomedical research journals.

New reports call for scientists to share data and publishers to embrace open access

by Trish Groves, deputy editor, BMJ, and Fiona Godlee, editor in chief, BMJ. Published 26 June 2012

“Scientists should communicate the data they collect and the models they create, to allow free and open access, and in ways that are intelligible, assessable and usable for other specialists . . . Where data justify it, scientists should make them available in an appropriate data repository.” [1]

So said the Royal Society last week, in its report Science as an Open Enterprise: Open Data for Open Science. The report calls for more openness among scientists and with the public and media; greater recognition of the value of data gathering, analysis, and communication; common standards for sharing information to make it widely usable; mandatory publishing of data in a reusable form to support findings; more expertise in managing and supporting the use of digital data; and new software tools to analyse data. It is time for a big shift, says the report, from the status quo where “many scientists still pursue their research through the measured and predictable steps in which they communicate their thinking within relatively closed groups of colleagues; publish their findings, usually in peer reviewed journals; file their data and then move on.”

A few days earlier the UK government’s working group on expanding access to published research findings, chaired by Janet Finch, recommended a “clear policy direction to support publication in open access or hybrid journals, funded by article processing charges, as the main vehicle for the publication of research, especially when it is publicly funded.” [2, 3]  The Finch report urges funders to establish more effective and flexible arrangements to meet the costs of publishing in open access and hybrid journals; publishers to minimise restrictions on the rights of use and reuse of text and other content, especially for non-commercial purposes; funds to be found to extend and rationalise licences and subscription arrangements for research generated in the United Kingdom and published in pay-walled journals; and repositories to be developed to complement formal publishing. But the report warns that the transition to widespread open access publishing will take time and money, and meanwhile the effects of the transition on subscription based journals (which still provide the bulk of peer review and set standards for high quality publishing) must be carefully considered to minimise damage to the learned societies and publishers that run them.

As Finch explains in a podcast interview with BMJ editor Fiona Godlee, access to published articles and access to data are separate matters, but both can potentially benefit the public. Indeed, major funders—including the Wellcome Trust, US National Institutes of Health, and UK Medical Research Council—have jointly stated their belief that “making research datasets available to investigators beyond the original research team in a timely and responsible manner, subject to appropriate safeguards, will generate three key benefits: faster progress in improving health, better value for money, and higher quality science.” [4]

These funders do not yet, however, mandate data sharing. They should. The ability of doctors to make the right decisions with patients about the benefits, harms, and costs of treatments and tests depends increasingly on high quality learning and guidance, which, in turn, depend on a robust evidence base that is as complete and as transparent as possible. We cannot rely only on results in published research articles and trial registries because they are often incompletely and selectively reported [5].  Moreover, drug regulators often lack access to full data reported in confidence, let alone to publicly accessible data [6].

Data sharing can greatly increase dissemination, meta-analysis, and understanding of research results; it can also aid confirmation or refutation of research through replication [7],  allow better implementation of research findings [8], and increase transparency about the quality and integrity of research. It does bear some technical challenges and risks: these include potential invasion of participants’ privacy and breaking of patients’ confidentiality, inappropriate data manipulation, compromised academic or commercial primacy, and breach of intellectual property rights and journal copyright, but none of these should be insurmountable [9].

So let’s get on with it. Since 2009 the BMJ has asked authors to state at the end of their article whether they will allow their data to be accessed or even reanalysed by others [10]. Many authors have agreed to share their anonymised data. To make it easy for authors to do this, the BMJ is partnering with the Dryad online repository (http://datadryad.org/), something that our sister journal BMJ Open has been doing for some time. Fifteen datasets from BMJ Open articles are already posted, as well as one from the BMJ [11].

Meanwhile, we are stepping up the BMJ’s commitment to open access. After the success of last year’s pilot, we have introduced article processing fees for all published research articles. Fee waivers and discounts are available for authors who are unable to pay, and editors will be unaware of whether a fee has been paid when making their decision on publication.

With these latest high level UK reports, and the growing support of research funders around the world [4], the move towards open access has reached a tipping point. The BMJ was the first major general medical journal to make research articles freely available online and has maintained its commitment to open access ever since. We will continue to debate, test, implement, and promote new ways to support authors in the publication of their work, and to achieve worldwide access to research results and data.

References

  1. Royal Society. Science as an open enterprise: open data for open science. 2012. http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE.pdf
  2. Working Group on Expanding Access to Published Research Findings: the Finch group. 2012. Accessibility, sustainability, excellence: how to expand access to research publications. www.researchinfonet.org/wp-content/uploads/2012/06/Finch-Group-report-FINAL-VERSION.pdf
  3. Hawkes N. Open access to research findings will deliver huge benefits but will not be cost free, report says. BMJ 2012; 344:e4248.
  4. Wellcome Trust. Sharing research data to improve public health: full joint statement by funders of health research. www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/WTDV030690.htm.
  5. Lehman R, Loder E. Missing clinical trial data. BMJ 2012; 344:d8158.
  6. Hart B, Lundh A, Bero L. Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ 2012; 344:d7202.
  7. Peng RD, Domenici F, Zeger SL. Reproducible epidemiologic research. Am J Epidemiol 2006; 163:783-9.
  8. European Medical Research Councils. Implementation of medical research in clinical practice. Forward look. 2011. www.esf.org/publications.html.
  9. Groves T. BMJ Group online evidence to Royal Society call for evidence on science as an open enterprise 2011. http://royalsociety.org/policy/projects/science-public-enterprise/call-for-evidence/
  10. Groves T. BMJ policy on data sharing. BMJ 2010; 340:c564.
  11. Prayle AP, Hurley MN, Smyth AR. Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study. BMJ 2012; 343:d7373.

Cite this as: BMJ 2012;344:e4383

Competing interests: Both authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; both BMJ (where TG is deputy editor and FG is editor in chief) and BMJ Open (where TG is editor in chief) levy article processing fees to support open access to published research, and at both journals data sharing is strongly encouraged; no other relationships or activities that could appear to have influenced the submitted work.

Provenance and peer review: Commissioned; not externally peer reviewed.

[End BMJ editorial]

We invite you to take a look at some of the data packages in Dryad linked to articles published in BMJ journals, and look forward to seeing many more!


A number of enhancements to the repository have been made in recent months, including the following, which were in high demand from users:

  • First, we have modified our submission process to enable data to be deposited prior to editorial review of the manuscript. Journals that integrate manuscript and data submission at the review stage can now offer their editors and peer reviewers anonymous access to the data in Dryad while the manuscript is in review. This option is currently being used by several of our partner journals, including BMJ Open, Molecular Ecology, and Systematic Biology, and is available to any existing or future integrated journal. Note: authors still begin their data deposit process at the journal.
  • Second, when authors submit data associated with previously published articles, they can pull up the article information using the article DOI or its PubMed ID, greatly simplifying the deposition process for legacy data.
  • Third, Dryad now supports versioning of data files. Authors can upload new versions of their files to correct or update the original. Once logged in to a Dryad account, the My Submissions option appears under My Account in the left side-menu; prior unfinished and completed submissions are listed there, and selecting an archived submission allows the author to add a new file.  Earlier versions of the file remain available to users, and the metadata may be modified to reflect the reason for the update.  Each version’s DOI is appended with a number (e.g., “.1”, “.2”) so that it can be uniquely referenced.  By default, users are shown the most current version of each data file, and they are notified of the existence of any previous or subsequent versions.
  • Access and download statistics have been displayed for content in the repository since late 2010; Dryad now displays the statistics for an article’s data together on one page so you can see at a glance how many times the page has been viewed and how many times each component data file has been downloaded. Check out this example from Evolutionary Applications.
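The legacy-data lookup described above is handled inside Dryad, but the client-side idea can be sketched: normalize whatever identifier string an author pastes in, then query a bibliographic service for the article metadata. This is a hedged illustration, not Dryad’s actual implementation; the endpoint shown is CrossRef’s public REST API, and its use by Dryad is an assumption.

```python
from urllib.parse import quote

def normalize_doi(raw):
    """Strip common prefixes so 'doi:10.x/y' and 'https://doi.org/10.x/y'
    both reduce to the bare DOI '10.x/y'."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def crossref_lookup_url(raw_doi):
    """Build a CrossRef REST API URL that returns article metadata as JSON."""
    return "https://api.crossref.org/works/" + quote(normalize_doi(raw_doi), safe="/")
```

For example, `crossref_lookup_url("doi:10.1136/bmj.d7373")` yields a URL whose JSON response includes the article title, authors, and journal, which is enough to pre-fill a deposit form.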


In recent months, more journals have implemented submission integration with Dryad to make data archiving easier for authors.  Technically, the process entails setting up semi-automated communications between Dryad and the manuscript submission system of the journal.  Currently 24 journals have implemented submission integration. Journals that have been added in the past year include:

  • BMJ Open, published by the BMJ Group
  • Ecological Monographs, published by the Ecological Society of America
  • Evolutionary Applications, published by Wiley-Blackwell
  • Heredity, published by the Genetics Society with Nature Publishing Group
  • Journal of Fish and Wildlife Management, published by the US Fish and Wildlife Service
  • Journal of Paleontology and Paleobiology, both published by The Paleontological Society with Allen Press
  • PLoS Biology, published by the Public Library of Science
  • Systematic Biology, published by the Society of Systematic Biologists with Oxford University Press
  • ZooKeys, along with seven other journal titles from Pensoft Publishers.

Thanks to the growing number of integrated journals and growing awareness both of Dryad and of the importance of data archiving, the rate at which we receive deposits continues to grow steadily.  Dryad currently holds over 1700 data packages, associated with articles in well over 100 different journals.  About three quarters of submissions come from the minority of journals for which submission integration is in place.

Editors and publishers interested in implementing integration may review our documentation and contact Dryad or fill out our Pre-Integration Questionnaire to begin the integration process. There is no charge for implementing integration with Dryad.


A recent issue of BMJ highlighted the problem of missing clinical trial data from medical research, exploring both the causes and consequences of unpublished evidence.  One of the articles, from Andrew Prayle and colleagues [1], examined compliance with the US Food and Drug Administration’s ostensibly mandatory requirement that clinical trials report their results in ClinicalTrials.gov, as required by the FDA Amendments Act (FDAAA) of 2007. Alarmingly, they found that only 22% of trials that should have reported results had actually done so.  Interestingly, industry-funded trials reported results at a higher frequency than trials with other funders.  They conclude:

If the reporting rate does not increase, the laudable FDAAA legislation will not achieve its goal of improving the accessibility of trial results.

Fortunately for those interested in this research, the authors have ensured that their own data are available by depositing them in Dryad, where they have already been downloaded by over 100 users.

For more on the disturbing state of affairs in reporting of clinical trial data, we offer the irrepressible Ben Goldacre speaking at the Strata 2012 conference in February.

[1] Prayle AP, Hurley MN, Smyth AR (2012) Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study. BMJ 343: d7373. doi:10.1136/bmj.d7373


Until recently, Mark Hahnel was a PhD student in stem cell biology. Frustrated by seeing how much of his own research output never made it into publication, he endeavored to do something about it by developing a scientific file sharing platform called FigShare. Recently, Mark and FigShare were taken under the wing of Digital Science, a Nature Publishing Group spinoff, and a sleek new FigShare was relaunched in January 2012 with many more features and an ambitious scope.

FigShare allows researchers to publish all of their research outputs in seconds in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. By opening up publication beyond the traditional peer review process, FigShare lets researchers easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient.

Users do not have to pay for access to the content: public data is made available under the terms of a CC0 waiver and other content under CC-BY.  And FigShare is currently providing unlimited public space and 1GB of private storage space for free.

This is a promising solution for getting negative and otherwise unpublished results out into the world (figures, tables, data, etc.) in a way that is discoverable and citable.  Importantly, much of this content would not be appropriate for Dryad, since it is not associated with (and not documented by) an authoritative publication.

There are clearly some challenges to the FigShare model.  A big one, shared with many other Open Science experiments that disseminate prior to peer review, is ensuring that there is adequate documentation for users to assess fitness for reuse.  Another challenge, one that Dryad is greatly concerned about, is guaranteeing that the content will still be usable, and that there will be the means to host it, ten or twenty years down the road.  These are reflections of larger unanswered questions about how the research community can best take advantage of the web for scholarly communication, and how best to filter, curate, and preserve such communications. To answer these questions, the world of open data needs many more innovative projects like FigShare.

FigShare’s relaunch is a good occasion to highlight a few strengths of the Dryad model:

  • Dryad works with journals to integrate article and data submission, streamlining the deposit process.
  • Dryad curators review files for technical problems before they are released, and ensure that their metadata enables optimal retrieval.
  • Dryad’s scope is focused on data files associated with published articles in the biosciences (plus software scripts and other files important to the article).
  • Dryad can make data securely available during peer review, at the request of the journal.
  • Dryad is community-led, with priorities and policies shaped by the members of the Dryad Consortium, including scientific societies, publishers, and other stakeholder organizations.
  • Dryad can be accessed programmatically through a sitemap or OAI-PMH interface.
  • Dryad content is searchable and replicated through the DataONE network, and it handshakes with other repositories to coordinate data submission.
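The programmatic access mentioned above uses the standard OAI-PMH protocol, so any HTTP client can harvest Dryad metadata. The sketch below (Python, standard library only) builds a ListRecords request and pulls Dublin Core titles out of the response; the endpoint base URL is an assumption, so consult Dryad’s documentation for the actual address.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Fully qualified tag name for Dublin Core <dc:title> elements.
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

def listrecords_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL against a repository endpoint."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def titles_from_response(xml_text):
    """Extract Dublin Core titles from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC_TITLE)]

# In practice you would fetch the URL (e.g. with urllib.request.urlopen)
# and follow resumptionToken elements to page through the full record set.
```

The `oai_dc` metadata prefix is the minimal format every OAI-PMH repository must support; repositories typically offer richer formats as well.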

For more about Dryad, browse the repository or see Why Should I Choose Dryad for My Data?

A file sharing platform and a data repository are different animals, to be sure; both have a place in a lively open data ecosystem. We wish success to the Digital Science team, and look forward both to working together and to challenging each other to better meet the needs of the research community.  To see what other options are out there for different disciplines and types of data, DataCite provides an updated list of research data repositories.

