Archive for the ‘Policy’ Category

We’re pleased to announce that our 2014 Community Meeting will be held on May 28 at the Institute for Quantitative Social Science at Harvard University. This year’s meeting is being held jointly with the Dataverse Network Project, and the theme is Working Together on Data Discovery, Access and Reuse.

Many actors play a role in ensuring that research data is available for future knowledge discovery, including individual researchers, their institutions, publishers and funders. This joint community meeting will highlight existing solutions and emerging issues in the discovery, access and reuse of research data in the social and natural sciences.

Keynote speaker Dr. Phil Bourne is the newly appointed first Associate Director for Data Science at the National Institutes of Health and a pioneer in furthering the free dissemination of science through new models of publishing. Prior to his NIH appointment, he was a Professor and Associate Vice Chancellor at the University of California San Diego. He has over 300 papers and 5 books to his credit. Among his diverse contributions, he was the founding Editor-in-Chief of PLOS Computational Biology, has served as Associate Director of the RCSB Protein Data Bank, has launched four companies, most recently SciVee, and is a Past President of the International Society for Computational Biology. He is an elected fellow of the American Association for the Advancement of Science, the International Society for Computational Biology and the American Medical Informatics Association. Other honors he has received include the Benjamin Franklin Award in 2009 and the Jim Gray eScience Award in 2010.

The meeting will run from 9:00 am – 2:15 pm, including a catered lunch.  It will be followed by a Dryad Members Meeting, open to all attendees, from 2:30 – 3:30 pm.

There is no cost for registration, but space is limited. Onsite registration will be made available if space allows, and the proceedings will also be simulcast online.  Please see the meeting page for details.

This year’s Community Meeting has been scheduled for the convenience of those attending the Society for Scholarly Publishing Annual Meeting from May 28-30 in Boston.  SSP attendees may also wish to attend the session “The continuum from publishers to data repositories: models to support seamless scholarship”  May 29th from 10:45am-12:00pm.

For inquiries, please contact Laura Wendell (lwendell@datadryad.org) or Mercè Crosas (mcrosas@iq.harvard.edu).


The Data Citation Synthesis Group has released a draft Declaration of Data Citation Principles and invites comment.

This has been a very interesting and positive collaborative process and has involved a number of groups and committed individuals. Encouraging the practice of data citation, it seems to me, is one of the key steps towards giving research data its proper place in the literature.

As the preamble to the draft principles states:

Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice.

In support of this assertion, and to encourage good practice, we offer a set of guiding principles for data citation.

Please do comment on these principles. We hope that with community feedback and support, a finalised set of principles can be widely endorsed and adopted.

Discussion on a variety of lists is welcome, of course. However, if you want the Synthesis Group to take full account of your views, please be sure to post your comments on the discussion forum.

Some notes and observations on the background to these principles

I would like to add here some notes and observations on the genesis of these principles. As has been widely observed, a number of groups and interested parties have been exploring the principles of data citation for several years. Mentioning only some of the sources and events that affected my own thinking on the matter, there was Micah Altman and Gary King’s 2007 article in D-Lib Magazine, which offered ‘A Proposed Standard for the Scholarly Citation of Quantitative Data’, and Toby Green’s 2009 OECD White Paper ‘We need publishing standards for datasets and data tables’. Micah Altman and Mercè Crosas organised a workshop at Harvard in May 2011 on Data Citation Principles. Later the same year, the UK Digital Curation Centre published a guide to citing data.

The CODATA-ICSTI Task Group on Data Citation Standards and Practices (co-chaired by Christine Borgman, Jan Brase and Sarah Callaghan) has been in existence since 2010. In collaboration with the US National CODATA Committee and the Board on Research Data and Information, a major workshop was organised in August 2011, which was reported in ‘For Attribution: Developing Data Attribution and Citation Practices and Standards’.

The CODATA-ICSTI Task Group then started work on a report covering data citation principles, eventually entitled ‘Out of Cite, Out of Mind’ – drafts were circulated for comment in April 2013 and the final report was released in September 2013.

Following the first ‘Beyond the PDF’ meeting in Jan 2011 participants produced the Force11 Manifesto ‘Improving Future Research Communication and e-Scholarship’ which places considerable weight on the availability of research data and the citation of those data in the literature. At ‘Beyond the PDF II’ in Amsterdam, March 2013, a group comprising Mercè Crosas, Todd Carpenter, David Shotton and Christine Borgman produced ‘The Amsterdam Manifesto on Data Citation Principles’. In the very same week, in Gothenburg, an RDA Birds of a Feather group was discussing the more specific problem of how to support, technologically, the reliable and efficient citation of dynamically changing or growing datasets and subsets thereof. And the broader issues of the place of data and research publication were being considered in the ICSU World Data Service Working Group on Data Publication. This group has, in turn, formed the basis for an RDA Interest Group.  Oooffff!

How great a thing is collaboration?

From June 2013, as the Force11 Group was preparing its website and activities to take forward the work on the Amsterdam Manifesto, calls came in from a number of sources for these various groups and initiatives to coordinate and collaborate. This was admirably well-received and from July the ‘Data Citation Synthesis Group’ had come into being with an agreed mission statement:

The data citation synthesis group is a cross-team committee leveraging the perspectives from the various existing initiatives working on data citation to produce a consolidated set of data citation principles (based on the Amsterdam Manifesto, the CODATA and other sets of principles provided by others) in order to encourage broad adoption of a consistent policy for data citation across disciplines and venues. The synthesis group will review existing efforts and make a set of recommendations that will be put up for endorsement by the organizations represented by this synthesis group.

The synthesis group will produce a set of principles, illustrated with working examples, and a plan for dissemination and distribution. This group will not be producing detailed specifications for implementation, nor focus on technologies or tools.

As has been noted elsewhere, the group comprised 40 individuals and brought together a large number of organisations and initiatives. What followed over the summer was a set of weekly calls to discuss and align the principles. I must say, I thought these were admirably organised and benefitted considerably from participants’ efforts to prepare documents comparing the various groups’ statements. The face-to-face meeting of the group, in which a lot of detailed discussion to finalise the draft was undertaken, was hosted (with a funding contribution from CODATA) at the US National Academy of Sciences between the 2nd RDA Plenary and the DataCite Summer Meeting (which CODATA also co-sponsored). It has been intellectually stimulating and a real pleasure to contribute to these discussions and to witness so many informed and engaged people bashing out these issues.

The principles developed by the Synthesis Group are now open for comment and I urge as many people, researchers, editors and publishers as possible who believe that data has a place in scholarly communications to comment on them and, in due course, to endorse them and put them into practice.

Are we finally at the cusp of real change in practice? Will we now start seeing the practice of citing data sources become more and more widespread? It’s too soon to say for sure, but I hope these principles, and the work on which they build, have got us to a stage where we can really start believing the change is well underway.

Simon Hodson is Executive Director of CODATA and a member of the Dryad Board of Directors.  This post was originally published on the CODATA blog.



As we announced earlier, Dryad will be introducing data publishing fees at the beginning of September. Here’s why we are doing this, and what it will mean for you as a submitter.


The Data Publishing Charge (DPC) is a modest fee that recovers the basic costs of curation and preservation, and allows Dryad to make its contents freely available for researchers and educators at any institution anywhere in the world.  DPCs provide a broad and fair revenue stream that scales with the costs of maintaining the repository, and helps ensure that Dryad can keep its commitment to long-term accessibility.

Who pays?

There are three cases:

  1. The DPC is waived in its entirety if the submitter is based in a country classified by the World Bank as a low-income or lower-middle-income economy.
  2. For many journals, the society or publisher will sponsor the DPC on behalf of their authors; you can see whether this applies to your journal here (the list is growing quickly, so be sure to check back when you are ready to submit new data).
  3. In the absence of a waiver or a sponsor, the DPC is US$80, payable by the submitter.  Payment details are collected upon submission, but the fee will not be charged unless and until the data package is accepted for publication.

Two additional fees may apply. Submitters will be charged for data packages in excess of 10GB (US$15 for the first additional GB and US$10 for each GB thereafter), to cover additional storage costs.  If there is no sponsor, and the data package is associated with a journal lacking integrated data and manuscript submission, the submitter will be charged US$10 to cover the additional curation costs.
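As a back-of-the-envelope illustration (not an official Dryad calculator), the fee schedule above can be sketched as a small function. The function name and the rounding of partial gigabytes are assumptions of this sketch; per the text, the storage overage applies to the submitter regardless of sponsorship, while the base DPC and the non-integrated-journal surcharge are covered or avoided when a sponsor or waiver applies.

```python
import math

def data_publishing_charge(size_gb, waived=False, sponsored=False, integrated=True):
    """Estimate the total Data Publishing Charge in US$ for one data package.

    Hypothetical helper illustrating the fee schedule described above.
    Rounding any partial gigabyte of excess storage up to a whole GB
    is an assumption, not a stated policy.
    """
    if waived:
        # Submitter based in a World Bank low-income or
        # lower-middle-income economy: DPC waived entirely.
        return 0
    total = 0 if sponsored else 80  # base DPC, unless a sponsor covers it
    excess = max(0, math.ceil(size_gb) - 10)  # GB beyond the 10 GB allowance
    if excess > 0:
        # US$15 for the first additional GB, US$10 for each GB thereafter.
        total += 15 + 10 * (excess - 1)
    if not sponsored and not integrated:
        # Journal lacks integrated data and manuscript submission.
        total += 10
    return total
```

For example, an unsponsored 11 GB package from a non-integrated journal would come to US$80 + US$15 + US$10 = US$105 under this reading of the schedule.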

Submitters may use grant funds, institutional funds, or any other source, as long as payment can be made using a credit card or PayPal.  We regret that submitters cannot be invoiced for single submissions – but please do contact us if you are interested in purchasing a larger group of vouchers for future use.  We encourage researchers to inquire with librarians at their institution about available funding sources, and to budget data publication funds for future submissions into their grants, as part of their data management plan.

Note that there will be no charges for submissions made before the introduction of DPCs in September, regardless of when the data package is accepted for publication.

Help us spread the word

If your organization does not yet sponsor Data Publishing Charges, or is not yet a Member, you may wish to let them know that you feel data archiving deserves their financial support.  Dryad offers a variety of flexible payment plans that provide for volume discounts, and there are additional discounts for Member organizations.  Organizations need not be publishers. Universities, funders, libraries and even individual research groups can purchase bundles of single-use vouchers that will cover the DPCs for data packages associated with publications appearing in any journal, as well as other publication types such as monographs and theses.  Prospective sponsors and Members may contact director@datadryad.org to figure out what will work best for their circumstances.

We are grateful for all the input we have received into our sustainability planning, and look forward to the continued support of our community in carrying out our nonprofit mission for many long years to come.  If you have questions or suggestions, please leave a comment or contact us here.


The following guest post is from Tim Vines, Managing Editor of Molecular Ecology and Molecular Ecology Resources.  ME and MER have among the most effective data archiving policies of any Dryad partner journal, as measured by the availability of data for reuse [1].  In this post, which may be useful to other journals figuring out how to support data archiving, Tim explains how Molecular Ecology’s approach has been refined over time.


Ask almost anyone in the research community, and they’ll say that archiving the data associated with a paper at publication is really important. Making sure it actually happens is not quite so simple. One of the main obstacles is that it’s hard to decide which data from a study should be made public, and this is mainly because consistent data archiving standards have not yet been developed.

It’s impossible for anyone to write exhaustive journal policies laying out exactly what each kind of study should archive (I’ve tried), so the challenge is to identify for each paper which data should be made available.

Before I describe how we currently deal with this issue, I should give some history of data archiving at Molecular Ecology. In early 2010 we joined with the five other big evolution journals in adopting the ‘Joint Data Archiving Policy’, which mandates that “authors make all the data required to recreate the results in their paper available on a public archive”. This policy came into force in January 2011, and since all six journals brought it in at the same time, no one journal suffered the effects of bringing in a (potentially) unpopular policy.

To help us see whether authors really had archived all the required datasets, we started requiring that authors include a ‘Data Accessibility’ (DA) section in the final version of their manuscript. This DA section lists where each dataset is stored, and normally appears after the references.  For example:

Data Accessibility:

  • DNA sequences: Genbank accessions F234391-F234402
  • Final DNA sequence assembly uploaded as online supplemental material
  • Climate data and MaxEnt input files: Dryad doi:10.5521/dryad.12311
  • Sampling locations, morphological data and microsatellite genotypes: Dryad doi:10.5521/dryad.12311

We began back in 2011 by including a few paragraphs about our data archiving policies in positive decision letters (i.e. ‘accept, minor revisions’ and ‘accept’), which asked for a DA section to be added to the manuscript during the final revisions. I would also add a sticky note to the ScholarOne Manuscripts entry for the paper indicating which datasets I thought should be listed. Most authors added the DA section, but generally only included some of the data. I then switched to putting my list into the decision letter, just above the policy itself. For example:

“Please don’t forget to add the Data Accessibility section- it looks like this needs a file giving sampling details, morphology and microsatellite genotypes for all adults and offspring. Please also consider providing the input files for your analyses.”

This was much more effective than expecting the authors to work out which data we wanted. However, it still meant that I was combing through the abstract and the methods trying to work out what data had been generated in that manuscript.

We use ScholarOne Manuscripts’ First Look system for handling accepted papers, and we don’t export anything to be typeset until we’re satisfied with the DA section. Being strict about this makes most authors deal with our DA requirements quickly (they don’t want their paper delayed), but a few take longer while we help them work out what we want.

The downside of this whole approach is that it takes me quite a lot of effort to work out what should appear in the DA section, and this would be impossible in a journal where an academic does not see the final version of the paper. A more robust long-term strategy has to involve the researcher community in identifying which data should be archived.

I’ll flesh out the steps below, but simply put, our new approach is to ask authors to include a draft Data Accessibility section at initial submission. This draft DA section should list each dataset and say where the authors expect to archive it. As long as the DA section is there (even if it’s empty) we send the paper on to an editor. If it makes it to reviewers, we ask them to check the DA section and point out which datasets are missing.

A paper close to acceptance can thus contain a complete or nearly complete DA section. Furthermore, any deficiencies should have been pointed out in review and corrected in revision. The editorial office now has the much easier task of checking over the final DA section and making sure that all the accession numbers etc. are added before the article is exported to be typeset.

The immediate benefit is that authors are encouraged to think about data archiving while they’re still writing the paper – it’s thus much more an integral part of manuscript preparation than an afterthought. We’ve also found that a growing proportion of papers (currently about 20%) are being submitted with a completed DA section that requires no further action on our part. I expect that this proportion will be more like 80% in two years, as this seems to be how long it takes to effect changes in author or reviewer behavior.

Since the fine grain of the details may be of interest, I’ve broken down the individual steps below:

1) The authors submit their paper with a draft ‘Data Accessibility’ (DA) statement in the manuscript; this lists where the authors plan to archive each of their datasets. We’ve included a required checkbox in the submission phase that states ‘A draft Data Accessibility statement is present in the manuscript’.

2) Research papers submitted without a DA section are held at the editorial office checklist stage, and the authors are contacted to request one. In the first few months of using this system we have found that c. 40% of submissions don’t have the statement initially, but after we request it the DA is almost always emailed within 3-4 days. If we don’t hear for five working days we unsubmit the paper; this has happened to only about 5% of papers.

3) If the paper makes it out to review, the reviewers are asked to check whether all the necessary datasets are listed, and if not, request additions in the main body of their review. Specifically, our ‘additional questions’ section of the review tab in S1M now contains the question: “Does the Data Accessibility section list all the datasets needed to recreate the results in the manuscript? If ‘No’, please specify which additional data are needed in your comments to the authors.”  Reviewers can choose ‘yes’, ‘no’ or ‘I didn’t check’; the latter is important because reviewers who haven’t looked at the DA section aren’t forced to arbitrarily click ‘yes’ or ‘no’.

4) The decision letter is sent to the authors with the question from (3) included. Since we’re still in the early days of this system and less than a quarter of our reviewers understand how to evaluate the DA section, I am still checking the data myself and requesting that any missing datasets be included in the revision. This is much easier than before as there is a draft DA section to work with and sometimes some feedback from the reviewers.

5) The editorial office then makes sure that any deficiencies identified by myself or the reviewers are dealt with by the time the paper goes to be typeset; this is normally dealt with at the First Look stage.

I’d be very happy to help anyone that would like to know more about this system or its implementation – please contact me at managing.editor@molecol.com

[1] Vines TH, Andrew RL, Bock DG, Franklin MT, Gilbert KJ, Kane NC, Moore JS, Moyers BT, Renaut S, Rennison DJ, Veen T, Yeaman S. Mandated data archiving greatly improves access to research data. FASEB J. 2013 Jan 8. Epub ahead of print.  Update: Also available from arXiv.


Our guest post today is from Mohamed Noor of Duke University, president of the American Genetic Association. The AGA is a scholarly society dating back to 1903.  AGA, together with Oxford University Press, publishes the Journal of Heredity, which is a charter member in the Dryad organization and one of the first journals to integrate manuscript and data submission with the repository.  The society just held their annual symposium in Durham, North Carolina, not so far from Dryad’s NESCent headquarters, and has some excellent news to report from the Council meeting.

The American Genetic Association is pleased to announce that it has now fully adopted the Joint Data Archiving Policy (JDAP) for the Journal of Heredity.  The Journal of Heredity had previously required that newly reported nucleotide or amino acid sequences, and structural coordinates, be submitted to appropriate public databases. For other forms of data, the Journal “endorsed the principles of the Joint Data Archiving Policy (JDAP) in encouraging all authors to archive primary datasets in an appropriate public archive, such as Dryad, TreeBASE, or the Knowledge Network for Biocomplexity.”

This voluntary archiving policy was facilitated by the direct link between the Journal of Heredity and Dryad, in effect since February 2010.

To further support data-sharing and data access, in July 2012, the AGA Council voted unanimously to make data archiving a requirement for publication, under the terms specified in the JDAP.

The requirement will take effect by January 1, 2013. The American Genetic Association also recognizes the vast investment of individual researchers in generating and curating large datasets. Consequently, we recommend that this investment be respected in secondary analyses or meta-analyses in a gracious collaborative spirit.

Many other leading journals in ecology and evolutionary biology have adopted policies modeled on JDAP over the past two years, and other journals are invited to consider it as a policy that has attracted wide support among scientists.


An important milestone was reached when the Dryad organization recently adopted a cost recovery plan to ensure Dryad’s sustainability.  The plan was the result of several years of deliberation among Dryad’s Interim Partners, experts in sustainability, and many prospective Member organizations.

The plan identifies three primary funding sources. First are deposit fees, and there are several ways in which they may be paid:

  • A journal or publisher may agree to pay an annual fee based on the number of articles it publishes annually, in anticipation that a substantial fraction will have data deposited in Dryad.
  • An organization (typically, but not necessarily, a journal or publisher) may pay a fixed fee per data package deposited. Vouchers may be purchased in bulk in advance, or organizations may be billed regularly after deposits are received.
  • If the fee is not paid through the journal, society, publisher, or other organization, authors may pay the deposit fee at the time of deposit.

The deposit fee will vary among these options depending on transaction costs. It is expected that a Member-discounted prepaid voucher will cost approximately US$50.  Members will be entitled to a 10% discount.

The rationale for deposit fees is several-fold.  First, collecting revenue up front allows Dryad to make the data freely available to users and ensures that funds for preservation will not be lacking down the road.  With a repository of sufficient size, most non-fixed costs are due to new deposits, and are incurred at the time of deposit.  Charging deposit fees ensures that revenues will scale with expenses and that funds are available to the repository when they are needed.  Furthermore, there are many different parties making deposits, and the number of deposits from different journals, institutions, investigators, etc., varies widely. Deposit fees have the virtue of distributing the costs among the many parties so that the amount required by each party is relatively small and varies in proportion to usage.

Another source of revenue will be annual membership fees, expected to be US$1000 annually, which will confer voting rights, discounted deposit fees, participation in Annual Meetings, and other benefits.

Deposit fees and membership fees are intended to cover the operating costs of the repository. The third revenue source, funding from grants and charitable organizations, will be used for research, development, and new initiatives.  It is expected that this plan will be implemented in parallel with an endowment campaign, which may be used to reduce deposit fees, invest in new technologies, and help assure long-term sustainability. More details about the plan are available at http://wiki.datadryad.org/Business_Plan_and_Sustainability.


Dryad’s new governance structure and cost recovery plan emerged from a consultation process that culminated in a meeting of the Dryad Interim Board in Vancouver, Canada in July 2011.  This was the third and final meeting of this temporary governing body. Over 25 representatives from a diversity of journals, societies, publishers and other organizations met at the University of British Columbia to review progress and chart the next steps for Dryad.

Vancouver maple tree, courtesy of Marcel Holyoak, via Flickr

In addition to the governance and sustainability plans, participants also made progress on a number of important policy issues. Several of these bear on what content Dryad will accept:

  • Software: Dryad is intended to provide a repository for code only where it does not otherwise have a better home. It is expected that Dryad will be used primarily for snapshots or “one-off” scripts that would otherwise be lost, rather than the maintenance of ongoing software projects that would be better hosted by a public version control system.
  • Other integral and supplementary materials:  Dryad will accept the full range of content that is currently hosted by the journal/publisher as Supplemental Online Material, and not restrict the repository contents strictly to data. This option will be provided to those journals or publishers that wish to take advantage of it.  Whether it be software, data, or other material, authors will still be asked to release rights to the content under the terms of CCZero.
  • Qualifying publications:  All content in Dryad must be documented by a publication. The Interim Board expanded the definition of qualifying publications to include not just those that have undergone peer review, but any legitimate publication with expert vetting, such as a doctoral thesis.

The report of the meeting is available here.   We extend particular thanks for the success of the meeting to the members of the interim Executive Committee: Marcel Holyoak, William Michener, Allen Moore and Michael Whitlock (chair and host at UBC).


