Archive for the ‘Uncategorized’ Category

Photo by David Iliff; license: CC-BY-SA 3.0

Dryad invites current members, prospective members, and other interested parties to attend the Annual Membership Meeting in Oxford, UK on the 24th of May.  This is the first open meeting of the newly incorporated organization and will be the last membership meeting before the introduction of deposit fees in September.  Attendees will learn about recent developments, get a preview of upcoming features, have a say in the governance of the organization, and weigh in on topics of relevance to the future of Dryad, its members and partner journals.  Speakers scheduled to present emerging issues include:

  • Marianne Bamkin of JoRD – Model journal policies and implementation
  • Jonathan Tedds of PREPARDE – Review of data associated with publications
  • Simon Hodson of JISC – The use of grant funds for data archiving costs
  • Sarah Callaghan of the CODATA-ICSTI Task Group on Data Citation – Data citation principles
  • Martin Fenner of PLOS ALM – Tracking data usage and impact
  • Eefke Smit of STM – The how and why of repository certification
  • Susanna Assunta-Sansone of ISA and BioSharing – Helping researchers to collect, curate, analyse, share and publish data
  • Bill Michener of DataONE – Relevance of the DataNet program to Dryad

The Membership Meeting will cap off a series of exciting events spotlighting trends in scholarly communication and research data:

  • The Now and Future of Data Publishing on 22 May – A daylong program featuring new initiatives and current issues in data publishing. Organized by the JISC together with a range of organizations including BioSharing, DataONE, STM and Wiley-Blackwell.
  • The ORCID Outreach meeting on the morning of 23 May and ORCID CodeFest from 23-24 May
  • A joint Dryad-ORCID Symposium on Research Attribution on the afternoon of 23 May.  The symposium will address the changing culture and technology of how credit is assigned and tracked for data, software, and other research outputs.  Keynote speakers Johanna McEntyre (Europe PubMed Central) and David De Roure (Oxford eResearch Centre) will be joined by panelists Liz Allen (Wellcome Trust), Christine Borgman (UCLA), Martin Fenner (PLOS), Neil Chue Hong (Software Sustainability Institute), Trish Groves (BMJ), John Kaye (British Library) and moderator Cameron Neylon (PLOS) to address the many faces of the issue.

You may register for events separately here and here through May 13th.  A block of rooms has been set aside at the Malmaison Hotel; enter corporate code OXER900 to receive a discounted rate. Please consult the Dryad membership meeting website closer to the event if you are interested in viewing the webcast.

We hope to see you there!

Read Full Post »

Dryad is a nonprofit organization fully committed to making scientific and medical research data permanently available to all researchers and educators free-of-charge without barriers to reuse.  For the past four years, we have engaged experts and consulted with our many stakeholders in order to develop a sustainability plan that will ensure Dryad’s content remains free to users indefinitely.  The resulting plan allows Dryad to recover its operating costs fairly and in a scalable manner.  The plan includes revenue from submission fees, membership dues, grants and contributions.

A one-time submission fee will offset the actual costs of preserving data in Dryad.  The majority of costs are incurred at the time of submission when curators process new files, and long-term storage costs scale with each submission, so this transparent one-time charge ensures that resources scale with demand.  Dryad offers a variety of pricing plans for journals and other organizations such as societies, funders and libraries to purchase discounted submission fees on behalf of their researchers.  For data packages not covered by a pricing plan, the researcher pays upon submission.  Waivers are provided to researchers from developing economies.  See Pricing Plans for a complete list of fees and payment options.  Submission fees will apply to all new submissions starting September 2013.

Membership dues will supplement submission fees, allowing Dryad to maintain its strong ties to the research community through its volunteer Board of Directors, Annual Membership Meetings, and  other outreach activities to researchers, educators and stakeholder organizations.  See Membership Information.

Grants will fund research, development and innovation.

Donations will support all of the above efforts.  In addition, Dryad will occasionally appeal to donors to fund special projects or specific needs, such as preservation of valuable legacy datasets and deposit waivers for researchers from developing economies.

We are grateful for all the input we have received into our sustainability plan, and look forward to your continued support in carrying out our nonprofit mission for many long years to come.

Read Full Post »

On Friday, the Obama administration made a long-awaited announcement regarding public access to the results of federally funded research in the United States.

There has been considerable attention given to the implications for research publications (a concise analysis here).  Less discussed so far — but just as far reaching — the new policy also has quite a lot to say about research data, a topic on which the White House solicited, and received, an earful of input just over a year ago.

What does the directive actually require?  All federal government agencies with at least $100M in R&D expenditures must develop, in the next six months, policies for digital data arising from non-classified research that address a host of objectives, including:

  • to “maximize access, by the general public and without charge, to digitally formatted scientific data created with federal funds” while recognizing that there are cases in which preservation and access may not be desirable or feasible.
  • to promote greater use of data management plans for both intramural and extramural grants and contracts, including review of such plans and mechanisms for ensuring compliance
  • to allow inclusion of appropriate costs for data management and access in grants
  • to promote the deposit of data in publicly accessible databases
  • to address issues of attribution to scientific data sets
  • to support training in data management and stewardship
  • to “outline options for developing and sustaining repositories for scientific data in digital formats, taking into account the efforts of public and private sector entities”

Interestingly, the directive is silent on the issue of embargo periods for research data, neither explicitly allowing nor disallowing them.

In the words of White House Science Advisor John Holdren:

…the memorandum requires that agencies start to address the need to improve upon the management and sharing of scientific data produced with Federal funding. Strengthening these policies will promote entrepreneurship and jobs growth in addition to driving scientific progress. Access to pre-existing data sets can accelerate growth by allowing companies to focus resources and efforts on understanding and fully exploiting discoveries instead of repeating basic, pre-competitive work already documented elsewhere.

The breadth of research impacted by this directive is notable.  Based on the White House’s proposed 2013 budget, the covered agencies would spend more than $60 billion on R&D.  A partial list includes:

  • The National Institutes of Health (NIH)
  • The National Science Foundation (NSF)
  • The National Aeronautics and Space Administration (NASA)
  • The Department of Energy (DOE)
  • The Department of Agriculture (USDA)
  • The National Oceanic and Atmospheric Administration (NOAA)
  • The National Institute of Standards and Technology (NIST)
  • The Department of the Interior (which includes the Geological Survey)
  • The Environmental Protection Agency (EPA)
  • and even the Smithsonian Institution

We applaud OSTP for moving to dramatically improve the availability of research data collected in the public interest with federal funds.

You can read the full memo here: the data policies are covered in Section 4.

Read Full Post »

Lee Dirks

We are profoundly saddened by the untimely and tragic death of our dear friend and colleague Lee Dirks, who was killed together with his wife Judy Lew in a road accident in the Peruvian Andes.

Lee had recently been elected to the Board of Directors for Dryad.  He also served on the Board of Visitors for the UNC School of Information Sciences (of which he was a proud alumnus) and was a member of the Board of the SILS Metadata Research Center.  Lee made a name for himself in recent years as Director of Education and Scholarly Communication at Microsoft.

Lee was a visionary information scientist, a warm and generous personality, and a man who loved adventure.  The number of people whose lives he touched in his own short life was staggeringly large.

Lee and his wife are survived by their two young daughters, who were at home in Seattle at the time of the accident.  Our thoughts are with them.  And we will miss Lee greatly.

Read Full Post »

We are experimenting with a nimble new format for our newsletter, in which each item consists of an individual blog post.  All the news items are also available in one PDF document if you’d prefer.

  1. Stakeholder governance.  “The scientific, educational, and charitable mission of Dryad is to promote the availability of data underlying findings in the scientific literature for research and educational reuse. The vision of Dryad is a scholarly communication system in which learned societies, publishers, institutions of research and education, funding bodies and other stakeholders collaboratively sustain and promote the preservation and reuse of data underlying the scholarly literature.”  This Mission Statement is from Dryad’s new Bylaws, which were approved this month by a vote of its Interim Partners. Since its inception, Dryad has been guided by the idea that an enduring community resource requires stakeholder governance…
  2. Sustainability planning.  Another important milestone was reached when the organization officially adopted a cost recovery plan to ensure Dryad’s sustainability.  The plan was the result of several years of deliberation among Dryad’s Interim Partners, experts in sustainability, and many prospective Member organizations…
  3. Summer 2011 Interim Board meeting. The governance and cost recovery plan emerged from a consultation process that culminated in a meeting of the Dryad Interim Board in Vancouver, Canada in July 2011. In addition to the governance and sustainability plans, participants also made progress on a number of important policy issues. Several of these bear on what content Dryad will accept…
  4. New funding from the US National Science Foundation. Earlier this year, the NSF, through its Advances in Biological Informatics program, announced a new award of $2.4M over four years to enable Dryad to scale up its technical infrastructure to support the rapidly expanding user base of journals and researchers, ensure that the repository is meeting the needs of that user base…
  5. New integrated journals.  In recent months, more journals have implemented submission integration with Dryad to make data archiving easier for authors.  Technically, the process entails setting up semi-automated communications between Dryad and the manuscript submission system of the journal.  Currently 24 journals have implemented submission integration…
  6. New features. A number of enhancements to Dryad have been made in recent months, including these three that were in high demand from users…

If you do not yet receive our newsletters by email and would like to, please sign up for our low-traffic Dryad-announcements mailing list.

Read Full Post »

What matters to you when looking for research data in a repository? The UK-based Digital Curation Centre (DCC) is looking for Dryad users to complete a 10-minute questionnaire on this. Results will contribute to an assessment framework for Dryad, and the questionnaire includes entry to a competition for $80 / £50 Amazon tokens. The DCC is carrying this out as part of the Dryad UK project, which also involves the British Library and Oxford University’s Image Bioinformatics Lab.

Read Full Post »

A new study in PLoS ONE by Heather Piwowar, a postdoctoral associate affiliated with DataONE, Dryad, and NESCent, reveals interesting trends in the archiving of data underlying published microarray results.  From the press release:

By querying the full text of the scientific literature through websites like Google Scholar and PubMed Central, Piwowar identified eleven thousand studies that collected a particular type of data about cellular activity, called gene expression microarray data. Only 45% of recent gene expression studies were found to have deposited their data in the public databases developed for this purpose. The rate of data publication has increased only slightly from 2007 to 2009. Data is shared least often from studies on cancer and human subjects: cancer studies make their data available for wide reuse half as often as similar studies outside cancer.

“It was disheartening to discover that studies on cancer and human subjects were least likely to make their data available. These data are surely some of the most valuable for reuse, to confirm, refute, inform and advance bench-to-bedside translational research,” Piwowar said.

“We want as much scientific progress as we can get from our tax and charity dollars. This requires increased access to data resources. Data can be shared while maintaining patient privacy,” Piwowar added, noting that patient re-identification is rarely an issue for gene expression microarray studies.

Reference:  Piwowar, H. (2011). “Who shares? Who doesn’t? Factors associated with openly archiving raw research data.” PLoS ONE 6(7): e18657. doi:10.1371/journal.pone.0018657

“In the spirit of the topic”, the data behind the study are publicly available in Dryad at doi:10.5061/dryad.mf1sd

Read Full Post »

Behind a scientific finding, in addition to unique data, there is often unique software. If Dryad archives data in part to allow others to validate the findings reported in the literature, then should we not also enable researchers to archive the software that was used to process, analyze and, in the case of simulations, create those data?

Some users have already deposited software source code alongside their data (e.g. doi:10.5061/dryad.8384, doi:10.5061/dryad.18) [1]. If users are willing and able to release their code under a CC-Zero waiver [2], then there is nothing stopping this practice. In fact, Creative Commons and the Free Software Foundation have recently stated that CC-Zero is appropriate for release of software to the public domain [3].

Yet, a number of journal partners and users have requested that Dryad provide more, or different, options for software, and that authors should not be required to waive legal rights with CC-Zero. Since software is clearly a creative work, source code unambiguously carries copyrightable intellectual property. Enabling a greater range of licensing options could open the door to more authors archiving software that is integral to their paper, and this would further Dryad’s mission of enabling scientists to validate and build upon previous work. So, how should we do that?

One important consideration is that we aim to make the submission process as easy as possible for users. This would be compromised by presenting a confusing array of licensing options, and having those differ between types of files.

The principal desiderata of a license for deposited software are more or less the same as for data: freedom to reuse, modify (analogous to “recombine” for data), and redistribute (in original or modified form), with no more than attribution expected or required. It turns out that these are also the principles common to all licenses approved by the Open Source Initiative, or OSI [4].

So, could we just pick one of the minimally restrictive OSI-approved licenses (since we want to facilitate reuse rather than hamper it), and require release of software under those terms? We are currently of the opinion that the answer is “no”, for a couple of reasons:

(1) Some, though not all, software will already be licensed. Asking a user to choose a different one would clearly be a burden, since changing a license requires express consent from all copyright holders, including possibly the employer or funder.

(2) If the software includes third-party code to which a ‘share-alike’ license has been assigned (e.g. the GNU General Public License, or GPL [5]), then the user is required to release the code under equivalent licensing terms. Unlike for data, it would be highly unusual to combine software source code from many different sources, and so this does not pose an insurmountable barrier to archiving and reuse for scientific purposes.

Given the above, our current thinking is that Dryad should enable users to select any OSI-approved license they deem appropriate. However, we also wish to strongly guide users, when there is no prior license assigned to any part of their software, to choose either a non-share-alike OSI license or a CC-Zero waiver. It is currently unclear whether dedicating software to the public domain with CC-Zero would be of as much value as it is for data [6]. We’d welcome your thoughts on that.
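
To make the guidance above a little more concrete, here is a rough Python sketch of the decision. It only illustrates our current thinking and is not something Dryad has implemented: the function name, the license identifiers, and the short example lists of licenses are all assumptions made for this post, and the OSI itself maintains the authoritative list of approved licenses.

```python
from typing import List

# Illustrative subsets only (assumption); see the OSI for the full list.
PERMISSIVE_OSI = ["MIT", "BSD-3-Clause", "Apache-2.0"]
SHARE_ALIKE_OSI = ["GPL-3.0", "LGPL-3.0"]
CC_ZERO = "CC0-1.0"


def suggest_licenses(prior_licenses: List[str]) -> List[str]:
    """Hypothetical helper: which options would a depositor be steered toward?

    prior_licenses lists the licenses already attached to any part of the
    software, including third-party code bundled with it.
    """
    if not prior_licenses:
        # Nothing is licensed yet: guide the user to a non-share-alike
        # OSI license or a CC-Zero waiver.
        return PERMISSIVE_OSI + [CC_ZERO]

    share_alike = [name for name in prior_licenses if name in SHARE_ALIKE_OSI]
    if share_alike:
        # Share-alike terms on included code carry over to the deposit.
        return share_alike

    # Otherwise keep the license the software already has.
    return list(prior_licenses)


print(suggest_licenses([]))
print(suggest_licenses(["MIT", "GPL-3.0"]))
```

The point of the sketch is that the depositor only sees a short list of choices when nothing has been licensed yet; in every other case the existing terms are simply carried forward.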

There are some other considerations on our plate, as well:

  • We want to be careful to avoid steering users away from using a public source code repository when that is more appropriate [7]. Is it better for Dryad to host code snapshots, or to direct users to specific versions of software in a public code repository?
  • Some users bundle software and data together in tarballs or zip archives. Since we cannot easily assign different terms to the data and software within such a combined file, it could increase the burden on users to separate these components out.
  • In addition to software, there is other content that publishers host in Supplemental Materials that some of our partner journals would like Dryad to host, instead. To the extent that some of this content is neither data nor software, should we be recognizing a third category of intellectual property, to which a license such as CC-BY [8] would be assigned?

If you have opinions or ideas, we would like to encourage you to share them with us as public comments on this blog. What’s the best way to accommodate software (and other non-data material) within Dryad?

Notes

[1] Some software source code in Dryad is already available under grandfathered license terms, such as in doi:10.5061/dryad.18.

[2] Dryad currently requires users to assign CC-Zero to all archived files. This waives all copyright and related rights in the data (to the extent legally possible in an author’s jurisdiction), effectively dedicating the data to the public domain. The use of CC-Zero is predicated on most data being “facts”, and facts in most jurisdictions cannot be copyrighted, although this is not universally true (e.g. photographs). Note that Dryad has a policy that the original article and the data package are to be cited when the data are reused, but we feel that this is most appropriately enforced through scholarly practice, not through a license.

[3] According to the Creative Commons FAQ, CC-Zero “is suitable for dedicating your copyright and related rights in computer software to the public domain, to the fullest extent possible under law. Unlike CC licenses, which should not be used for software, CC0 is compatible with many software licenses, including the GPL”.

[4] http://www.opensource.org/

[5] http://www.gnu.org/licenses/gpl.html

[6] For the motivation behind the recommended use of CC-Zero for data, see the Science Commons Protocol for Implementing Open Access Data

[7] Public open source code repositories include generic ones, such as Sourceforge, as well as those specific to particular types of code, such as R-forge for R, and CPAN for Perl. For more about best practices in scientific software development, see Baxter SM, Day SW, Fetrow JS, Reisinger SJ (2006) Scientific Software Development Is Not an Oxymoron. PLoS Comput Biol 2(9): e87. doi:10.1371/journal.pcbi.0020087

[8] http://creativecommons.org/licenses/by/3.0

[9] Many thanks to H. Lapp for starting this post. I (T. Vision) take responsibility for the opinions expressed here, as well as any sins of omission or commission.

Read Full Post »

It’s January 2011. Do you know where your data are?

It would be a good idea to know and be ready to deposit your files in a data repository, because this month marks the implementation of the Joint Data Archiving Policy.  The policy, endorsed by a consortium of prominent journals and societies, states that journals will require

as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive.

The policy can be customized by each journal, and enables both embargoes and editorial discretion to make special exceptions. Blanket exemptions apply to sensitive data such as identifiable human records and endangered species localities.

The journals (and corresponding societies) implementing the policy this month are:

  • The American Naturalist (American Society of Naturalists)
  • Evolution (Society for the Study of Evolution)
  • Evolutionary Applications
  • Heredity (The Genetics Society)
  • Journal of Evolutionary Biology (European Society for Evolutionary Biology)
  • Molecular Biology and Evolution (Society for Molecular Biology and Evolution)
  • Molecular Ecology
  • Systematic Biology (Society for Systematic Biology)

A sampling of the revised Instructions to Authors includes:

  • The American Naturalist: “The American Naturalist requires authors to deposit the data associated with accepted papers in a public archive. For gene sequence data and phylogenetic trees, deposition in GenBank or TreeBASE, respectively, is required. There are many possible archives that may suit a particular data set, including the Dryad repository for ecological and evolutionary biology data (http://datadryad.org). All accession numbers for GenBank, TreeBASE, and Dryad must be included in accepted manuscripts before they go to Production. Any impediments to data sharing should be brought to the attention of the editors at the time of submission.”
  • Journal of Evolutionary Biology: “The editors and publisher of this journal expect authors to make the data underlying published articles available. An investigator who feels that reasonable requests have not been met by the authors should correspond with the Editor-in-Chief. Authors must use the appropriate database to deposit detailed information supplementing submitted papers, and quote the accession number in their manuscripts.”
  • Molecular Ecology: “Data Accessibility: To enable readers to locate archived data from Molecular Ecology papers, as of January 2011 we will require that authors include a ‘Data Accessibility’ section after their references. This should list the data base and respective accession numbers for all data from the manuscript that has been made publicly available…. Please note that this section must be complete prior to the submission of the final version of your manuscript. Papers lacking this section will not be sent to Production.”

At Dryad, we have been working for some time now with editors and publishers at these and other partner journals to support the implementation of this policy. If you submit an article to a “JDAP journal,” you will be invited to simultaneously submit your data to Dryad. This may occur either prior to review or, depending on the journal, at the time your article is accepted. Dryad and the journal communicate behind the scenes to make it as easy as possible for you to deposit your data, and also ensure that a permanent, resolvable, and citable data identifier is published in the final article.  That way, in the future, no one need be frightened by the question “do you know where your data are?”
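
For readers who would like to turn a data identifier into a clickable link, here is a minimal Python sketch. It assumes the public DOI resolver at https://doi.org/; the function name `doi_to_url` is invented for this illustration and is not part of any Dryad service.

```python
def doi_to_url(doi: str) -> str:
    """Turn an identifier such as 'doi:10.5061/dryad.mf1sd' into a resolver link.

    Hypothetical helper for illustration; it does only a light sanity check
    (a DOI starts with '10.' and contains a '/'), not full validation.
    """
    identifier = doi.strip()
    if identifier.lower().startswith("doi:"):
        identifier = identifier[4:]
    if not identifier.startswith("10.") or "/" not in identifier:
        raise ValueError(f"this does not look like a DOI: {doi!r}")
    return "https://doi.org/" + identifier


print(doi_to_url("doi:10.5061/dryad.mf1sd"))
```

Because the identifier is resolved at lookup time, the link in the published article keeps working even if the repository's own web addresses change.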

Read Full Post »

Ever wonder what happens to your Dryad data behind the scenes? Here’s a quick overview.

Once a depositor has uploaded their data files and finalized their submission, the Dryad curator is notified of the new content. The curator looks at the uploaded files to make sure they really do contain data (and not, say, the article manuscript or pictures of kittens). The curator then performs some quality control on the metadata, that is, the description of the article and data files. She corrects errors, such as typos or formatting tags that are displaying incorrectly, and may enrich the metadata, by adding taxon name keywords, for example. Advanced metadata enrichment issues include the tricky realm of name authority control, which ensures that all works by a given author are gathered together despite the varying forms of their name.

Once the curator approves the submission, the metadata description of the data goes live in the repository. The status of the data files themselves depends upon the embargo options selected by the depositor. Dryad DOIs (Digital Object Identifiers) are sent to the depositor and, in the case of our integrated partner journals, to the journal editors, so that they can be included in all forms of the final published article, and allow readers of the article to find the supporting data.

After the article is published, the curator adds complete article citation information, including a hyperlinked article DOI, to the Dryad record, and updates any data file embargoes, if needed.

The outcome is data files, which

  • are securely deposited in the repository, and linked to the journal article,
  • have a unique, permanent identifier that can be cited, and
  • can be discovered independently of the article, as well as through the article.

Additionally, authors can now track the views and downloads of their data files.   Dryad displays the number of times the data package has been viewed, and the number of times each component data file has been both viewed and downloaded.
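
For the programmers among our readers, the curation steps described above can be pictured as a small state machine. The Python below is only a sketch of that picture, not Dryad's actual software: the class, the status names, the function names, and the example DOI are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Optional


class Status(Enum):
    SUBMITTED = "submitted"   # depositor has finalized the upload
    APPROVED = "approved"     # curator has checked it; metadata is live
    PUBLISHED = "published"   # article citation has been added


@dataclass
class DataPackage:
    title: str
    files: List[str]
    keywords: List[str] = field(default_factory=list)
    status: Status = Status.SUBMITTED
    article_doi: Optional[str] = None


def approve(package: DataPackage, extra_keywords: Iterable[str] = ()) -> DataPackage:
    """Curator step: check the files, tidy and enrich the metadata, go live."""
    if package.status is not Status.SUBMITTED:
        raise ValueError("only newly submitted packages can be approved")
    if not package.files:
        raise ValueError("a data package must contain at least one data file")
    package.title = package.title.strip()  # stand-in for fixing typos and tags
    for keyword in extra_keywords:         # e.g. taxon name keywords
        if keyword not in package.keywords:
            package.keywords.append(keyword)
    package.status = Status.APPROVED
    return package


def record_publication(package: DataPackage, article_doi: str) -> DataPackage:
    """After the article appears, add its citation to the data record."""
    if package.status is not Status.APPROVED:
        raise ValueError("the package must be approved before publication")
    package.article_doi = article_doi
    package.status = Status.PUBLISHED
    return package


# Example run with made-up values.
pkg = DataPackage(title="  Example data package ", files=["measurements.csv"])
approve(pkg, extra_keywords=["Drosophila melanogaster"])
record_publication(pkg, "10.1234/example.article")  # hypothetical article DOI
print(pkg.status.value, pkg.keywords)
```

Embargo handling and the notifications sent to depositors and journal editors are left out to keep the sketch short.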

Read Full Post »
