On Friday, the Obama administration made a long-awaited announcement regarding public access to the results of federally funded research in the United States.

There has been considerable attention given to the implications for research publications (a concise analysis here). Less discussed so far, but just as far-reaching: the new policy also has quite a lot to say about research data, a topic on which the White House solicited, and received, an earful of input just over a year ago.

What does the directive actually require?  All federal government agencies with at least $100M in R&D expenditures must develop, within the next six months, policies for digital data arising from unclassified research that address a host of objectives, including:

  • to “maximize access, by the general public and without charge, to digitally formatted scientific data created with federal funds” while recognizing that there are cases in which preservation and access may not be desirable or feasible.
  • to promote greater use of data management plans for both intramural and extramural grants and contracts, including review of such plans and mechanisms for ensuring compliance
  • to allow inclusion of appropriate costs for data management and access in grants
  • to promote the deposit of data in publicly accessible databases
  • to address issues of attribution to scientific data sets
  • to support training in data management and stewardship
  • to “outline options for developing and sustaining repositories for scientific data in digital formats, taking into account the efforts of public and private sector entities”

Interestingly, the directive is silent on the issue of embargo periods for research data, neither explicitly allowing nor disallowing them.

In the words of White House Science Advisor John Holdren:

…the memorandum requires that agencies start to address the need to improve upon the management and sharing of scientific data produced with Federal funding. Strengthening these policies will promote entrepreneurship and jobs growth in addition to driving scientific progress. Access to pre-existing data sets can accelerate growth by allowing companies to focus resources and efforts on understanding and fully exploiting discoveries instead of repeating basic, pre-competitive work already documented elsewhere.

The breadth of research impacted by this directive is notable.  Based on the White House’s proposed 2013 budget, the covered agencies would spend more than $60 billion on R&D.  A partial list includes:

  • The National Institutes of Health (NIH)
  • The National Science Foundation (NSF)
  • The National Aeronautics and Space Administration (NASA)
  • The Department of Energy (DOE)
  • The Department of Agriculture (USDA)
  • The National Oceanic and Atmospheric Administration (NOAA)
  • The National Institute of Standards and Technology (NIST)
  • The Department of the Interior (which includes the Geological Survey)
  • The Environmental Protection Agency (EPA)
  • and even the Smithsonian Institution

We applaud OSTP for moving to dramatically improve the availability of research data collected in the public interest with federal funds.

You can read the full memo here: the data policies are covered in Section 4.

Photo by DAVID ILIFF. License: CC-BY-SA 3.0

Mark Your Calendar!

The 2013 Dryad Membership Meeting

St Anne’s College, Oxford, UK

24 May 2013


The Dryad Membership Meeting will cap off a series of separate but related events spotlighting trends in scholarly communication and research data.  Highlights include:

  • A data publishing symposium on May 22, featuring new initiatives and current issues in data publishing (open to the public; a nominal registration fee may apply).
  • A Joint Dryad-ORCID Symposium on Research Attribution on May 23, on the changing culture and technology of how credit is assigned and tracked for data, software, and other research outputs (open to the public).
  • The Dryad Membership Meeting on May 24, to help chart the course for the organization’s future (Dryad members only).

More details to be announced soon.

The following guest post is from Tim Vines, Managing Editor of Molecular Ecology and Molecular Ecology Resources.  ME and MER have among the most effective data archiving policies of any Dryad partner journal, as measured by the availability of data for reuse [1].  In this post, which may be useful to other journals figuring out how to support data archiving, Tim explains how Molecular Ecology’s approach has been refined over time.


Ask almost anyone in the research community, and they’ll say that archiving the data associated with a paper at publication is really important. Making sure it actually happens is not quite so simple. One of the main obstacles is that it’s hard to decide which data from a study should be made public, and this is mainly because consistent data archiving standards have not yet been developed.

It’s impossible for anyone to write exhaustive journal policies laying out exactly what each kind of study should archive (I’ve tried), so the challenge is to identify for each paper which data should be made available.

Before I describe how we currently deal with this issue, I should give some history of data archiving at Molecular Ecology. In early 2010 we joined with the five other big evolution journals in adopting the ‘Joint Data Archiving Policy’, which mandates that “authors make all the data required to recreate the results in their paper available on a public archive”. This policy came into force in January 2011, and since all six journals brought it in at the same time, no one journal suffered the effects of introducing a (potentially) unpopular policy.

To help us see whether authors really had archived all the required datasets, we started requiring that authors include a ‘Data Accessibility’ (DA) section in the final version of their manuscript. This DA section lists where each dataset is stored, and normally appears after the references.  For example:

Data Accessibility:

  • DNA sequences: Genbank accessions F234391-F234402
  • Final DNA sequence assembly uploaded as online supplemental material
  • Climate data and MaxEnt input files: Dryad doi:10.5521/dryad.12311
  • Sampling locations, morphological data and microsatellite genotypes: Dryad doi:10.5521/dryad.12311

We began back in 2011 by including a few paragraphs about our data archiving policies in positive decision letters (i.e. ‘accept, minor revisions’ and ‘accept’), asking for a DA section to be added to the manuscript during the final revisions. I would also add a sticky note to the ScholarOne Manuscripts entry for the paper indicating which datasets I thought should be listed. Most authors added the DA section, but generally only included some of the data. I then switched to putting my list into the decision letter itself, just above the policy text. For example:

“Please don’t forget to add the Data Accessibility section- it looks like this needs a file giving sampling details, morphology and microsatellite genotypes for all adults and offspring. Please also consider providing the input files for your analyses.”

This was much more effective than expecting the authors to work out which data we wanted. However, it still meant that I was combing through the abstract and the methods trying to work out what data had been generated in that manuscript.

We use ScholarOne Manuscripts’ First Look system for handling accepted papers, and we don’t export anything to be typeset until we’re satisfied with the DA section. Being strict about this makes most authors deal with our DA requirements quickly (they don’t want their paper delayed), but a few take longer while we help authors work out what we want.

The downside of this whole approach is that it takes me quite a lot of effort to work out what should appear in the DA section, and it would be impossible at a journal where an academic does not see the final version of the paper. A more robust long-term strategy has to involve the researcher community in identifying which data should be archived.

I’ll flesh out the steps below, but, simply put, our new approach is to ask authors to include a draft Data Accessibility section at initial submission. This draft DA section should list each dataset and say where the authors expect to archive it. As long as the DA section is there (even if it’s empty) we send the paper on to an editor. If it makes it to reviewers, we ask them to check the DA section and point out which datasets are missing.

A paper close to acceptance can thus contain a complete or nearly complete DA section. Furthermore, any deficiencies should have been pointed out in review and corrected in revision. The editorial office now has the much easier task of checking over the final DA section and making sure that all the accession numbers etc. are added before the article is exported to be typeset.

The immediate benefit is that authors are encouraged to think about data archiving while they’re still writing the paper, so it becomes an integral part of manuscript preparation rather than an afterthought. We’ve also found that a growing proportion of papers (currently about 20%) are being submitted with a completed DA section that requires no further action on our part. I expect that this proportion will be more like 80% in two years, as this seems to be how long it takes to effect changes in author or reviewer behavior.

Since the fine grain of the details may be of interest, I’ve broken down the individual steps below:

1) The authors submit their paper with a draft ‘Data Accessibility’ (DA) statement in the manuscript; this lists where the authors plan to archive each of their datasets. We’ve included a required checkbox in the submission phase that states ‘A draft Data Accessibility statement is present in the manuscript’.

2) Research papers submitted without a DA section are held in the editorial office checklist and the authors are contacted to request one (a minimal sketch of this check appears after these steps). In the first few months of using this system we have found that c. 40% of submissions lack the statement initially, but after we request it the DA section is almost always emailed within 3-4 days. If we don’t hear back within five working days we unsubmit the paper; this has happened to only about 5% of papers.

3) If the paper makes it out to review, the reviewers are asked to check whether all the necessary datasets are listed, and if not, request additions in the main body of their review. Specifically, our ‘additional questions’ section of the review tab in S1M now contains the question: “Does the Data Accessibility section list all the datasets needed to recreate the results in the manuscript? If ‘No’, please specify which additional data are needed in your comments to the authors.”  Reviewers can choose ‘yes’, ‘no’ or ‘I didn’t check’; the latter is important because reviewers who haven’t looked at the DA section aren’t forced to arbitrarily click ‘yes’ or ‘no’.

4) The decision letter is sent to the authors with the question from (3) included. Since we’re still in the early days of this system and less than a quarter of our reviewers understand how to evaluate the DA section, I am still checking the data myself and requesting that any missing datasets be included in the revision. This is much easier than before as there is a draft DA section to work with and sometimes some feedback from the reviewers.

5) The editorial office then makes sure that any deficiencies identified by me or the reviewers are addressed by the time the paper goes to be typeset; this normally happens at the First Look stage.
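To make the check in step 2 concrete, here is a minimal, purely illustrative sketch in Python. It is not the journal’s actual tooling (that check is done by the editorial office within ScholarOne Manuscripts); it simply scans a manuscript text file for a ‘Data Accessibility’ heading and flags submissions that lack one.

    import re
    import sys

    # Illustrative only: flag a manuscript text file that appears to lack a
    # "Data Accessibility" (DA) section. The real check is performed by the
    # editorial office in ScholarOne Manuscripts, not by a script.
    DA_HEADING = re.compile(r"^\s*data\s+accessibility\b", re.IGNORECASE | re.MULTILINE)

    def has_da_section(manuscript_text):
        """Return True if a 'Data Accessibility' heading appears in the text."""
        return bool(DA_HEADING.search(manuscript_text))

    if __name__ == "__main__":
        with open(sys.argv[1], encoding="utf-8") as f:
            text = f.read()
        if has_da_section(text):
            print("Draft Data Accessibility section found.")
        else:
            print("No Data Accessibility section found; hold the paper and contact the authors.")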

I’d be very happy to help anyone that would like to know more about this system or its implementation – please contact me at managing.editor@molecol.com

[1] Vines TH, Andrew RL, Bock DG, Franklin MT, Gilbert KJ, Kane NC, Moore JS, Moyers BT, Renaut S, Rennison DJ, Veen T, Yeaman S. Mandated data archiving greatly improves access to research data. FASEB J. 2013 Jan 8. Epub ahead of print.  Update: Also available from arXiv.

If you have data packages in Dryad, consider adding a button like this next to each one on the publication list of your website or your electronic CV.

You can make a link between the button and the individual data package page on Dryad to enrich your publication list and make it easy to find your data.
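For example, here is a minimal sketch (in Python, for illustration) of how the button markup could be generated. The DOI of the data package resolves to its Dryad page via the standard DOI resolver; the badge image path and the DOI shown are placeholders, not official values.

    # Minimal sketch: build the HTML for a "Data in Dryad" button that links a
    # publication on your website to its data package via the package DOI.
    # The badge image path below is a placeholder, not an official asset.
    def dryad_button_html(doi, badge_src="images/dryad-button.png"):
        url = "https://doi.org/" + doi  # the DOI resolves to the data package page
        return ('<a href="%s"><img src="%s" alt="Data archived in Dryad" /></a>'
                % (url, badge_src))

    print(dryad_button_html("10.5061/dryad.xxxx"))  # illustrative DOI only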

Props to our early adopters below.  Check out their pages for some examples.

For other ways to show your support, please visit our page of publicity material on the Dryad wiki.  Let us know if you come up with creative ways to promote your data in Dryad. And additional suggestions are always welcome at help@datadryad.org.

Have at it!

Lee Dirks

We are profoundly saddened by the untimely and tragic death of our dear friend and colleague Lee Dirks, who was killed together with his wife Judy Lew in a road accident in the Peruvian Andes.

Lee had recently been elected to the Board of Directors of Dryad.  He also served on the Board of Visitors for the UNC School of Information and Library Science (of which he was a proud alumnus) and was a member of the Board of the SILS Metadata Research Center.  Lee made a name for himself in recent years as Director of Education and Scholarly Communication at Microsoft.

Lee was a visionary information scientist, a warm and generous personality, and a man who loved adventure.  The number of people whose lives he touched in his own short life was staggeringly large.

Lee and his wife are survived by their two young daughters, who were at home in Seattle at the time of the accident.  Our thoughts are with them.  And we will miss Lee greatly.

Our guest post today is from Mohamed Noor of Duke University, president of the American Genetic Association. The AGA is a scholarly society dating back to 1903.  AGA, together with Oxford University Press, publishes the Journal of Heredity, which is a charter member of the Dryad organization and one of the first journals to integrate manuscript and data submission with the repository.  The society just held its annual symposium in Durham, North Carolina, not far from Dryad’s NESCent headquarters, and has some excellent news to report from the Council meeting.

The American Genetic Association is pleased to announce that it has now fully adopted the Joint Data Archiving Policy (JDAP) for the Journal of Heredity.  The Journal of Heredity had previously required that newly reported nucleotide or amino acid sequences, and structural coordinates, be submitted to appropriate public databases. For other forms of data, the Journal “endorsed the principles of the Joint Data Archiving Policy (JDAP) in encouraging all authors to archive primary datasets in an appropriate public archive, such as Dryad, TreeBASE, or the Knowledge Network for Biocomplexity.”

This voluntary archiving policy was facilitated by the direct link between the Journal of Heredity and Dryad, in effect since February 2010.

To further support data-sharing and data access, in July 2012, the AGA Council voted unanimously to make data archiving a requirement for publication, under the terms specified in the JDAP.

The requirement will take effect by January 1, 2013. The American Genetic Association also recognizes the vast investment of individual researchers in generating and curating large datasets. Consequently, we recommend that this investment be respected in secondary analyses or meta-analyses in a gracious collaborative spirit.

Many other leading journals in ecology and evolutionary biology have adopted policies modeled on JDAP over the past two years, and other journals are invited to consider it as a policy that has attracted wide support among scientists.

Dryad is delighted to join with PLOS today to announce our partnership with PLOS Biology, as described here on the official PLOS Biology blog, Biologue.  As the first Public Library of Science (PLOS) journal to partner with Dryad to integrate manuscript submission, “PLOS Biology can offer authors a seamless tying together of an article with its underlying data; [and] can also provide confidential access for editors and reviewers to data associated with articles under review.”

Here’s how it works: during manuscript evaluation, PLOS Biology invites authors to deposit the underlying data files in Dryad, sending them a link that enables a streamlined upload process (no need to re-enter the article details).  Authors may deposit complex and varied data types in multiple formats, and these files are accessible to editors and reviewers through anonymous, secure access during the manuscript review process.  Behind the scenes, the journal’s editorial system and the Dryad repository exchange metadata, ensuring that upon publication the article links to the associated data in Dryad, permanently connecting the published article with its securely archived, publicly available data.
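For readers curious what that metadata exchange might involve, the following is a purely hypothetical sketch of the kind of article record an editorial system could pass along when inviting a data deposit; the field names are invented for illustration and do not reflect Dryad’s or PLOS’s actual schema.

    # Hypothetical illustration only: the sort of article metadata a journal's
    # editorial system might hand to a repository when inviting a data deposit,
    # so that authors need not re-enter the article details. Field names are
    # invented for this sketch and do not reflect Dryad's actual interface.
    deposit_invitation = {
        "journal": "PLOS Biology",
        "manuscript_id": "EXAMPLE-0001",            # placeholder identifier
        "article_title": "Example article title",   # placeholder title
        "authors": ["First Author", "Second Author"],
        "corresponding_email": "author@example.org",
        "under_review": True,                        # enables confidential reviewer access
    }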

Dr. Theodora Bloom, Chief Editor, PLOS Biology, mentions that journals “are uniquely well-placed to help researchers ensure that all data underlying a study are made available alongside any published articles.”

We welcome PLOS Biology authors and editors to Dryad, and look forward to extending this partnership to other PLOS journals.
