Researcher Profile: Zach Gompert

We’re beginning a series highlighting researchers who use Dryad to openly publish their research data. We ask them about their current projects, why they believe in open science, and why they choose Dryad.

photo of Zach Gompert

Zach Gompert

For our first researcher profile, we talked with Dr. Zach Gompert, assistant professor in the Department of Biology at Utah State University, about how his work ties in with open science:

Dryad: What is your area of research and what’s your current focus?

Gompert: The overarching goal in my lab is to advance understanding of the extent, organization, causes, and consequences of variation in nature. Some of the issues we are investigating are:

  • What are the evolutionary consequences of hybridization?
  • How does the evolution of novel ecological interactions affect biodiversity?
  • Is temporal variation in natural selection a key determinant of genetic diversity levels in natural populations?

We address these questions through population genomic analyses of natural and experimental populations, and through development of new theory and statistical methods. Our work on Lycaenid butterflies shows that hybridization can be a key creative force in animal evolution and that evolutionary histories are not always well represented by the ‘evolutionary tree’ metaphor. In other words, lineages don’t just split, they come back together.

We have quite a few datasets in Dryad now, including partial genome sequences from over a thousand butterflies.

butterfly in field

Lycaeides melissa

Dryad: What do you think about open science in general? What are advantages of open science? 

Gompert: Science has always been a communal endeavor. Large-scale collaboration is vital now for a number of reasons:

  • Diverse expertise. Many key questions require a diverse group of investigators. This results in big, multifaceted datasets and necessitates rapid sharing of data, methods, and findings.
  • Re-purposing data. It’s common now for data and methods to have applications beyond those that they were originally collected or developed for. Open science allows these to be used by other investigators, accelerating the rate of discovery.
  • Data integrity. Openness ensures a higher level of quality and integrity. When data and methods are available for scrutiny, possible errors are more likely to be identified and corrected. This is particularly relevant for large-scale, multi-investigator projects.
  • Public funding and access. Since much of science is funded by the public, I think scientists have an ethical duty to make the products of research available to everyone.

Dryad: In your opinion, what are disadvantages or concerns about open science?

Gompert: There are two common concerns:

  • Getting scooped. Researchers can be scooped if another group analyzes and publishes the data they generated. While this has some validity, sufficient safeguards and community standards are in place to minimize this problem, and it’s minor compared to the advantages of openness.
  • Poor documentation. I think data archiving is in better shape than it once was, but much of the archived data and code is not documented well enough to be truly useful to others. Improving the documentation of data is a big area where we as a community need to do more.

Dryad: You have over 20 datasets archived in Dryad. What do you see as the benefits of data sharing in Dryad?

Gompert: The primary strength of Dryad is its flexibility, specifically the ability to archive diverse types of data (and computer code) in a single location and to link to other more specialized databases such as NCBI. With Dryad, researchers have a central location where they can find all of the data associated with a publication.

A month of open

We’re coming off of a big month which included a two-day Dryad board meeting, International Data Week in Denver, and the Open Access Publishers meeting (COASP) in Arlington, VA. Combined with Open Access Week, we’ve been basking in all things #openscience at Dryad.

International Data Week 2016

International Data Week was a collection of three different events: SciDataCon 2016, the International Data Forum, and the 8th Research Data Alliance (RDA) Plenary Meeting. While it was my first time attending RDA and SciDataCon, it wasn’t the first time for the many Dryad board members who have been actively participating in these forums for years.

Dryad staff had the pleasure of participating in a few panels over the week. As part of SciDataCon, Elizabeth Hull discussed protecting human subjects in an open data repository. In another session, as part of the RDA 8th Plenary, I participated in a discussion of the challenges surrounding the sustainability of data infrastructure. (The talk is available on the RDA website; the panel starts at minute 30.)

Participating in IDW reminded me how important our diverse community of stakeholders and members is to furthering the adoption of open data. Dryad members create a community and support our mission. Our members benefit by receiving discounts on data publication fees and by relying on a repository that stays current with the evolving needs and mandates that surround open data. We work together to help make open data easy and affordable for authors.

Asking OA publishers to be more open

Following International Data Week, I had the opportunity to participate for the first time in the Open Access Scholarly Publishers Association meeting, COASP 2016. Heather Joseph, Executive Director of SPARC, kicked off the meeting with a keynote that urged attendees to consider how they would complete the phrase “Open in order to . . .” as a way to ensure that we all keep our sights on working toward something more than just ‘open for the sake of open’. Some of the other memorable talks addressed the challenges of mapping connections from articles to other related outputs, and discussed the growing interest in alternative revenue models to article processing charges (APCs). I had the privilege of delivering a keynote entitled “Be More Open”, which highlighted the connections between the Open Access and Open Data movements, and I encouraged OASPA to add open data policies to its membership requirements.

I’d like to thank the organizers and sponsors of International Data Week and COASP 2016 for making these important conversations possible. In addition, I would also like to encourage any interested stakeholders to join Dryad and support open data.

Making open data useful: A drug safety case study

We’re pleased to present a guest post from data scientist Juan M. Banda, the lead author of an important, newly-available resource for drug safety research. Here, Juan shares some of the context behind the data descriptor in Scientific Data and associated data package in Dryad. – EH

_____

As I sit in a room full of over one hundred bio-hackers at the 2016 Biohackathon in Tsuruoka, Yamagata, Japan, the need to have publicly available and accessible data for research use is acutely evident. Organized by Japan’s National Bioscience Database Center (NBDC) and Database Center for Life Science (DBCLS), this yearly hackathon gathers people from organizations and universities all over the world, including the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI), with the purpose of extending and interlinking resources like PubChem, PhenomeCentral, Bio2RDF, and PubAnnotation.

The end goal: finding better ways to access data that will allow researchers to focus on analysis of the data rather than preparation.

In the same spirit, our publication “A curated and standardized adverse drug event resource to accelerate drug safety research” (doi:10.1038/sdata.2016.26; data in Dryad at http://doi.org/10.5061/dryad.8q0s4) helps researchers in the drug safety domain with the standardization and curation of the freely available data from the U.S. Food and Drug Administration (FDA) adverse event reporting system (FAERS).

FAERS collects information on adverse events and medication errors reported to the FDA, and comprises over 10 million records collected from 1969 to the present. As one of the most important resources for drug safety efforts, the FAERS database has been used in at least 750 publications indexed by PubMed, and was probably manipulated, mapped, and cleaned independently by the vast majority of those publications’ authors. This cleaning and mapping process takes a considerable amount of time, hours that could have been spent analyzing the data further.

Our publication hopes to eliminate this needless work and allow researchers to focus their efforts in developing methods to analyze this information.

As part of Observational Health Data Sciences and Informatics (OHDSI), whose mission is to “Improve health, by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care,” we decided to tackle the task of cleaning and curating the FAERS database for our community, and the wider drug safety community. By providing a general common data model (CDM) and a general vocabulary to standardize how electronic patient data is stored, OHDSI allows its participants to join a research network with over 655 million patients.

With a significant fraction of the community’s research being focused on drug safety, it was a natural decision to standardize the FAERS database with the OMOP vocabulary, to allow all researchers on our network access to FAERS. Since the OMOP vocabulary incorporates general vocabularies such as SNOMED, MeSH, and RxNORM, among others, the usability of this resource is not limited to participants of this community.

In order to curate this dataset, we took the source FAERS data in CSV format and de-duplicated case reports. We then performed value imputation for certain fields that were missing. Drug names were standardized to RxNorm ingredients and standard clinical names (for multi-ingredient drugs). This mapping is tricky because some drug names contain spelling errors, and others are non-prescription drugs or international brand names. We achieved coverage of 93% of the drug names, which in turn cover 95% of the case reports in FAERS.

For the first time, the indications and reactions have been mapped to SNOMED-CT from their original MedDRA format. Coverage for indications and reactions is around 64% and 80%, respectively. The OMOP vocabulary allows RxNorm drug codes as well as SNOMED-CT codes to reside in the same unified vocabulary space, simplifying use of this resource. We also provide the complete source code we developed, allowing researchers to refresh the dataset with the new quarterly FAERS data releases and improve the mappings if needed. We encourage users to contribute the results of their efforts back to the OHDSI community.
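As a rough sketch of the kind of processing involved (not the published pipeline; the column names, toy records, and the tiny ingredient lookup below are all invented for illustration), the de-duplication and drug-name standardization steps might look like this in pandas:

```python
import pandas as pd

# Hypothetical FAERS-like case reports; columns and values are invented.
reports = pd.DataFrame({
    "case_id":   [101, 101, 102, 103],
    "drug_name": ["ASPRIN", "ASPRIN", "aspirin", "Tylenol"],
    "reaction":  ["nausea", "nausea", "headache", "rash"],
})

# De-duplicate case reports: FAERS contains follow-up versions of the
# same case, so keep only the latest record per case ID.
deduped = reports.drop_duplicates(subset="case_id", keep="last").copy()

# Standardize free-text drug names to ingredient names. The real mapping
# to RxNorm must handle misspellings and international brand names; this
# toy lookup merely stands in for that curation work.
ingredient_map = {"asprin": "aspirin", "aspirin": "aspirin",
                  "tylenol": "acetaminophen"}
deduped["ingredient"] = deduped["drug_name"].str.lower().map(ingredient_map)

# Fraction of de-duplicated reports whose drug name was mapped.
coverage = deduped["ingredient"].notna().mean()
```

On the full dataset, the analogous coverage figure is the 93% of drug names reported above.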

With a firm commitment to making open data easier to use, this resource allows researchers to utilize a professionally curated (and refreshable) version of the FAERS data, enabling them to focus on improving drug safety analyses and finding more potentially harmful drugs, as a part of OHDSI’s core mission.


Still from OHDSI video

The data:

http://doi.org/10.5061/dryad.8q0s4

A full description of the dataset in Scientific Data:

http://www.nature.com/articles/sdata201626

 

— Juan M. Banda

Dryad’s first virtual community meeting: members share their expertise

On May 24, we held the first virtual Dryad Community Meeting, which allowed us to connect both with our membership and with the larger open data community, far and wide. The theme was “Leadership in data publishing: Dryad and learned societies.”

Following an introduction and update about Dryad from yours truly, we heard about the experiences from representatives of three of Dryad’s member societies.

All three societies require that data be archived in an appropriate repository as a condition of publication in their journals. Yet, they have each taken considerable time and effort to develop policies that address the needs and concerns of their different communities.

Bruna spoke about working with an audience that routinely gathers data for very long-term studies. For many Biotropica authors, embargoes are seen as an important prerequisite for data publishing. Their data policy “includes a generous embargo period of up to three years to ensure authors have ample time to publish multiple papers from more complex or long-term data sets”. Biotropica’s policy also recommends those “who re-use archived data sets to include as fully engaged collaborators the scientists who originally collected them”. To address initial resistance to data archiving, and to build understanding and consensus, Biotropica “enlisted its critics” to contribute to a paper discussing the pros and cons of data publication. Out of this process emerged an innovative policy that went into effect at the start of 2016.

Meaden, by contrast, noted that only 8% of Proceedings B authors elect to embargo data in Dryad, and the standard embargo is for only one year after publication. She credited clearer author instructions and a data availability statement in the manuscript submission system as key elements that have increased the availability of data associated with Royal Society publications.

Newton discussed BES’ move from “encouraging data publication” in 2012 to requiring it in 2014. As shown below, this resulted in an impressive increase in the availability of data. Next, the society is looking to develop guidance on data reuse etiquette. Newton noted that this effort would “need to be community-led.”


Slide from Erika Newton’s presentation, illustrating the rise in data deposits for BES journals as associated with changing data policy.

While each speaker reported on unique challenges, all shared commonalities, such as:

  • involving the specific community in policy decisions
  • incrementally increasing efforts to make data available
  • the importance of clear author instructions 

We greatly appreciate the excellent contributions from the panelists, as well as all the members and other attendees who contributed to the lively Q&A.

We are also pleased that the virtual format was well received. In our follow-up survey, many of the attendees said they found it easy to ask questions and appreciated the ability to join remotely.

Our aim is that these meetings continue to be a valued forum for our diverse community of stakeholders to share knowledge and discuss emerging issues. If you have suggestions on topics for future meetings, or an interest in becoming a member, please reach out to me at director@datadryad.org.


 

Sci-Hub stories: Digging into the downloads

The following is a guest post from science journalist John Bohannon. We asked him to give us some background on his recent dataset in Dryad and the analysis of that data in Science. What stories will you find in the data? – EH

_______


Sci-Hub is the world’s largest repository of pirated journal articles. We will probably look back and see it as inevitable. Soon after it became possible for people to share copyrighted music and movies on a massive scale, technologies like Napster and BitTorrent arrived to make the sharing as close to frictionless as possible. That hasn’t made the media industry collapse, as many people predicted, but it certainly brought transformation.

Unlike the media industry, journal publishers do not share their profits with the authors. So where will Sci-Hub push them? Will it be a platform like iTunes, with journals selling research papers for $0.99 each? Or will Sci-Hub finally propel the industry into the arms of the Open Access movement? Will nonprofit scientific societies and university publishers go extinct along the way, leaving just a few giant, for-profit corporations as the caretakers of scientific knowledge?

There are as many theories and predictions about the impact of Sci-Hub as there are commentators on the Internet. What is lacking is basic information about the site. Who is downloading all these Sci-Hub papers? Where in the world are they? What are they reading?

48 hours of Sci-Hub downloads. Each event is color-coded by the local time: orange for working hours (8am-6pm) and blue for the night owls working outside those hours.

Sometimes all you need to do is ask. So I reached out directly to Alexandra Elbakyan, who created Sci-Hub in 2011 as a 22-year-old neuroscience graduate student in Kazakhstan and has run it ever since. For someone denounced as a criminal by powerful corporations and scholarly societies, she was quite open and collaborative. I explained my goal: to let the world see how Sci-Hub is being used, mapping the global distribution of its users at the highest resolution possible while protecting their privacy. She agreed, not realizing how much data-wrangling it would ultimately take us.

Two months later, Science and Dryad are publicly releasing a data set of 28 million download request records from 1 September 2015 through 29 February 2016, timestamped down to the second. Each record includes the DOI of the paper, allowing as rich a bibliographic exploration as you have CPU cycles to burn. The 3 million IP addresses have been converted into arbitrary codes. Elbakyan converted the IP addresses into geolocations using a database I purchased from the company Maxmind. She then clustered each geolocation to the coordinates of the nearest city using the Google Maps API. Sci-Hub users cluster to 24,000 unique locations.
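The nearest-city clustering step amounts to a great-circle nearest-neighbour search. A minimal sketch follows; the reference cities are invented for illustration, and the real pipeline relied on the Maxmind database and the Google Maps API rather than a hand-rolled lookup like this:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Invented reference cities as (name, lat, lon) tuples.
cities = [
    ("Tehran",  35.69, 51.39),
    ("Mumbai",  19.08, 72.88),
    ("Beijing", 39.90, 116.40),
]

def nearest_city(lat, lon):
    """Snap a geolocated download request to the closest known city."""
    return min(cities, key=lambda c: haversine_km(lat, lon, c[1], c[2]))[0]
```

With this toy list, a request geolocated near Delhi (28.61 N, 77.21 E) would cluster to Mumbai, its nearest listed city.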

The big take-home? Sci-Hub is everywhere. Most papers are being downloaded from the developing world: The top 3 countries are India, China, and Iran. But the rich industrialized countries use Sci-Hub, too. A quarter of the downloads came from OECD nations, and some of the most intense download hotspots correspond to the campuses of universities in the US and Europe, which supposedly have the most comprehensive journal access.

But these data have many more stories to tell. How do the reading habits of researchers differ by city? What are the hottest research topics in Indonesia, Italy, Brazil? Do the research topics shift when the Sci-Hub night owls take over? My analysis indicates a bimodal distribution over the course of the day, with most locations surging around lunchtime, and the rest peaking at 1am local time. The animated map above shows just 2 days of the data.
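The day/night split behind that bimodal pattern amounts to bucketing each request by its local hour, which anyone can replicate on the released data. A minimal sketch, with invented timestamps assumed already converted to local time:

```python
from collections import Counter
from datetime import datetime

# Invented download timestamps (local time) for illustration only.
timestamps = [
    "2015-09-01 12:15:00", "2015-09-01 12:40:00",  # lunchtime surge
    "2015-09-02 01:05:00",                          # night-owl activity
]

# Count requests per local hour of day.
by_hour = Counter(
    datetime.strptime(t, "%Y-%m-%d %H:%M:%S").hour for t in timestamps
)

# Same split as the animated map: working hours are 8am-6pm local time.
working = sum(n for h, n in by_hour.items() if 8 <= h < 18)
night_owls = sum(by_hour.values()) - working
```

Plotting `by_hour` over the full six months of records is what surfaces the lunchtime and 1am peaks described above.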

Something everyone would like to know: What proportion of downloaded articles are actually unavailable from nearby university libraries? Put another way: What is the size of the knowledge gap that Sci-Hub is bridging?

Download the data yourself and let the world know what you find.

The data:

http://dx.doi.org/10.5061/dryad.q447c

My analysis of the data in Science:

http://www.sciencemag.org/news/2016/04/whos-downloading-pirated-papers-everyone

 

 — John Bohannon

2015 stats roundup

While gearing up for the Dryad member meeting (to be held virtually on 24 May – save the date!) and publication of our annual report, we’re taking a look at last year’s numbers.

2015 was a “big” year for Dryad in many respects. We added staff, and integrated several new journals and publishing partners. But perhaps most notably, the Dryad repository itself is growing very rapidly. We published 3,926 data packages this past year — a 44% increase over 2014 — and blew past the 10,000 mark for total data packages in the repository.

Data package size

Perhaps the “biggest” Dryad story from last year is the increase in the mean size of data packages published. In 2014, that figure was 212MB. In 2015, it more than doubled to 481MB, an increase of a whopping 127%.

This striking statistic is part of the reason we opted at the beginning of 2016 to double the maximum package size before overage fees kick in (to 20GB), and simplified and reduced our overage fees. We want researchers to continue to archive more (and larger) data files, and to do so sustainably. Meanwhile, we do continue to welcome many submissions on the smaller end of the scale.


Distribution of Dryad data package size by year. Boxplot shows median, 1st and 3rd quartiles, and 95% confidence interval of median. Note the log scale of the y-axis.

In 2015, the mean number of files in a data package was about 3.4, with 104 as the largest number of files in any data package. To see how times have changed, compare this to a post from 2011 (celebrating our 1,000th submission), where we noted:

Interestingly, most of the deposits are relatively small in size. Counting all files in a data package together, almost 80% of data packages are less than one megabyte. Furthermore, the majority of data packages contain only one data file and the mean is a little less than two and a half. As one might expect, many of the files are spreadsheets or in tabular text format. Thus, the files are rich in information but not so difficult to transfer or store.

We have yet to do a full analysis of file formats deposited in 2015, but we see among the largest files many images and videos, as would be expected, but also a notable increase in the diversity of DNA sequencing-related file formats.

So not only are there now more and bigger files in Dryad, there’s also greater complexity and variety. We think this shows that more people are learning about the benefits of archiving and reusing multiple file types, and that researchers (and publishers) are broadening their view of what qualifies as “data.”

Download counts

So who had the biggest download numbers in 2015? Interestingly, nearly all of last year’s most-downloaded data packages are from genetics/genomics. 3 of the top 5 are studies of specific wild populations and how they adapt to changing circumstances — sailfin mollies (fish), blue tits (birds), and bighorn sheep, specifically.

Another top package presents a model for dealing with an epidemic that had a deadly impact on humans in 2015. And rounding out the top 5 is an open source framework for reconstructing the relationships that unite all lineages — a “tree of life.”

In 5th place, with 367 downloads:

In 4th place, with 601 downloads:

In 3rd place, with 1,324 downloads:

In 2nd place, with 1,868 downloads:

And this year’s WINNER, with 2,678 downloads:

The above numbers are presented with the usual caveats about bots, which we aim to filter out, but cannot do with perfect accuracy. (Look for a blog post on this topic in the near future).

As always, we owe a huge debt to our submitters, partners, members and users for supporting Dryad and open data in 2015!

New partnership with The Company of Biologists

We are delighted to announce the launch of a new partnership with The Company of Biologists to support their authors in making the data underlying their research available to the community.

The Company of Biologists is a not-for-profit publishing organization dedicated to supporting and inspiring the biological community. The Company publishes five specialist peer-reviewed journals.

The Company of Biologists offers further support to the biological community by facilitating scientific meetings, providing travel grants for researchers and supporting research societies.

Manuscript submission for all COB journals is now integrated with data submission to Dryad, meaning COB authors can conveniently submit their data packages and manuscripts at the same time. Dryad then makes the data securely available to journal reviewers, and releases them to the public if/when the paper is published.

We congratulate The Company of Biologists on taking this important step to help facilitate open data. To learn more about how your organization or journal can partner with Dryad, please contact us.

A snapshot of life on the savannah

Our latest featured data package is from Alexandra Swanson and colleagues at the Snapshot Serengeti project, and accompanies their peer-reviewed article in Scientific Data.  It provides a unique resource for studying one of the world’s most extraordinary mammal assemblages and also for studies of computer vision and machine learning. In addition, data from Snapshot Serengeti is already being used in biology and computer science classrooms to enable students to work on solving real problems with authentic research data.

 lion

Snapshot Serengeti, CC BY-NC-SA 3.0

The raw data (which are being made available from the University of Minnesota Supercomputing Institute) consist of 1.2 million sets of images collected between February 2011 and May 2013 from 225 heat- and motion-triggered cameras, operating day and night, distributed over 1,135 sq. km. in Serengeti National Park in Tanzania. This staggering trove of images was classified by 28,040 registered and ~40,000 unregistered volunteers on Snapshot Serengeti (a Zooniverse project) according to the species present (if any), the number of individuals, the presence of young, and what behaviors were being displayed, such as standing, resting, moving, eating, or interacting.

Remarkably, this vast army of citizen scientists was classifying the images faster than they were being produced, and each image set was classified on average by nine different volunteers.  This led to consensus classifications with high accuracy, 96.6% for species identifications relative to an expert-classified gold set.  Of the more than 300,000 image sets that contain animals, 48 different species were seen, including rare mammals such as the aardwolf and the zorilla.
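Aggregating nine independent volunteer labels into a consensus is, at its core, a plurality vote. Here is a simplified sketch (not the project’s actual aggregation algorithm, which also had to handle blank images and sets containing multiple species; the labels are invented):

```python
from collections import Counter

def consensus(labels):
    """Plurality vote over volunteer labels for one image set; returns
    the winning species and the fraction of volunteers who chose it."""
    counts = Counter(labels)
    species, n_votes = counts.most_common(1)[0]
    return species, n_votes / len(labels)

# Nine invented volunteer classifications of a single image set.
votes = ["zebra", "zebra", "wildebeest", "zebra", "zebra",
         "zebra", "zebra", "wildebeest", "zebra"]
species, agreement = consensus(votes)  # "zebra", with 7/9 agreement
```

Comparing such consensus labels against the expert-classified gold set is how the 96.6% species-identification accuracy figure above was obtained.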

zorilla

zorilla (image from Snapshot Serengeti CC BY-NC-SA 3.0)

The Dryad data package includes the classifications from all the individual volunteers, the consensus classifications, information about when each camera was operational, and the expert classification of 4,149 image sets as a gold standard.

References:

  • Swanson et al. (2015) Snapshot Serengeti, high frequency annotated camera trap images of 40 mammalian species in an African savannah. Scientific Data.  http://dx.doi.org/10.1038/sdata.2015.26
  • Swanson et al. (2015) Data from: Snapshot Serengeti, high frequency annotated camera trap images of 40 mammalian species in an African savannah. Dryad Digital Repository http://doi.org/10.5061/dryad.5pt92

What were the most downloaded data packages in 2014?

The reason why Dryad is in the business of archiving, preserving, and providing access to research data is so that it will be reused, whether for deeper reading of the publication, for post-publication review, for education, or for future research. While it’s not yet as easy as we would like to track data reuse, one metric that is straightforward to collect is the number of times a dataset has been downloaded, and this is one of two data reuse statistics reported by our friends at ImpactStory and Plum Analytics.

2014 with fireworks

The numbers are very encouraging. There are already over a quarter million downloads for the 8,897 data files released in 2014 (from 2,714 data packages). That’s over 28 downloads per data file. While there is always the caveat that some downloads may be due to activity from newly emerged bots that we have yet to recognize and filter out, we think it is safe to say that most of these downloads are from people.

To celebrate, we would like to pay special tribute to the top five data packages from 2014, as measured by the maximum number of downloads for any single file (since many data packages have more than one) at the time of writing. They cover a diversity of topics from livestock farming in the Neolithic to phylogenetic relationships among insects. That said, we are struck by the impressively strong showing for plant science: 3 of the top 5 data packages.

In 5th place, with 453 downloads

In 4th place, with 581 downloads

In 3rd place, with 626 downloads

In 2nd place, with 4,672 downloads

And in 1st place, with a staggering 34,879 downloads

Remarkably, given the number of downloads, this last data package was only released in November.

We’d like to thank all of our users, whether you contribute data or reuse it (or both), for helping make science just a little more transparent, efficient, and robust this past year. And we are looking forward to finding out some more of what you did with all those downloads in 2015!


Enhanced integration of manuscript and data submission with PLOS

Dryad has been proud to support integrated data and manuscript submission for PLOS Biology since 2012, and for PLOS Genetics since 2013. Yet there are over 400 data packages in Dryad from six different PLOS journals, in addition to two research areas of PLOS Currents. Today, we are pleased to announce that we have expanded submission integration to cover all seven PLOS journals: the two above plus PLOS Computational Biology, PLOS Medicine, PLOS Neglected Tropical Diseases, PLOS ONE, and PLOS Pathogens.

PLOS received a great deal of attention when they modified their Data Policy in March, providing more guidance to authors on how and where to make their data available and introducing Data Availability Statements. Dryad’s integration process has been enhanced in a few ways to support this policy, as well as the needs of a megajournal like PLOS ONE. We believe these modifications provide an attractive model for integration that other journals may wish to follow. The key difference for authors who wish to deposit data in Dryad is that you are now asked to deposit your data before submitting your manuscript.

  1. PLOS authors are now asked to provide a Data Availability Statement during initial manuscript submission, as shown in the screenshot below. There is evidence that introducing a Data Availability Statement greatly reinforces the effectiveness of a mandatory data archiving policy, and so we expect this change will substantially increase the availability of data for PLOS publications.  PLOS authors using Dryad are encouraged to provide the provisional Dryad DOI as part of the Data Availability Statement.
  2. PLOS authors are now also asked to provide a Data Review URL where reviewers can access the data, as shown in the second screenshot. While Dryad has offered secure, anonymous reviewer access for some time, the difference now is that PLOS authors using Dryad will be able to enter the Data Review URL  at the time of initial manuscript submission.
  3. In addition to these visible changes, we have also introduced an Application Programming Interface (API) to facilitate behind-the-scenes metadata exchange between the journal and the repository, making the process more reliable and scalable. This was critical for PLOS ONE, which published 31,500 articles in 2013.  Use of this API is now available as an integration option to all journals as an alternative to the existing email-based process, which we will continue to support.

PLOS Data Availability Statement interface

PLOS Data Review URL interface

The manuscript submission interface for PLOS now includes fields for a Data Availability Statement and a Data Review URL.

If you are planning to submit a manuscript but are unsure about the Dryad integration options or process for your journal, just consult this page. For all PLOS journals, the data are released by Dryad upon publication of the article.  Should the manuscript be rejected, the data files return to the author’s private workspace and the provisional DOI is not registered.  Authors are responsible for paying Data Publication Charges only if and when their manuscript is accepted.

Jennifer Lin from PLOS and Carly Strasser from the California Digital Library recently offered a set of community recommendations for ways that publishers could promote better access to research data:

  • Establish and enforce a mandatory data availability policy.
  • Contribute to establishing community standards for data management and sharing.
  • Contribute to establishing community standards for data preservation in trusted repositories.
  • Provide formal channels to share data.
  • Work with repositories to streamline data submission.
  • Require appropriate citation to all data associated with a publication—both produced and used.
  • Develop and report indicators that will support data as a first-class scholarly output.
  • Incentivize data sharing by promoting the value of data sharing.

Today’s expanded and enhanced integration with Dryad, which inaugurates the new Data Repository Integration Partner Program at PLOS, is an excellent illustration of how to put these recommendations into action.