Our latest featured data package is from Alexandra Swanson and colleagues at the Snapshot Serengeti project and accompanies their peer-reviewed article in Scientific Data. It provides a unique resource both for studying one of the world’s most extraordinary mammal assemblages and for research in computer vision and machine learning. Data from Snapshot Serengeti are also already being used in biology and computer science classrooms, where students work on real problems with authentic research data.
The raw data (which are being made available through the University of Minnesota Supercomputing Institute) consist of 1.2 million sets of images collected between February 2011 and May 2013 by 225 heat- and motion-triggered cameras that operated day and night across 1,135 km² of Serengeti National Park, Tanzania. This staggering trove of images was classified on Snapshot Serengeti (a Zooniverse project) by 28,040 registered and roughly 40,000 unregistered volunteers, who recorded the species present (if any), the number of individuals, the presence of young, and the behaviors displayed, such as standing, resting, moving, eating, or interacting.
Remarkably, this vast army of citizen scientists classified the images faster than they were being produced, and each image set was classified by nine different volunteers on average. Combining these independent classifications yielded highly accurate consensus classifications: species identifications agreed with an expert-classified gold-standard set 96.6% of the time. Across the more than 300,000 image sets that contain animals, 48 different species were seen, including rare mammals such as the aardwolf and the zorilla.
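To make the consensus idea concrete, here is a minimal Python sketch that reduces several volunteers' species votes for each image set to a single label by simple plurality vote, then scores those labels against an expert set. It is an illustration under stated assumptions, not the project's pipeline: the `(image-set ID, species)` tuples, the function names, and the first-seen tie-breaking rule are all hypothetical, and the authors' actual aggregation may differ in detail (for example, in how it handles multiple species, counts, or behaviors). The votes below are toy values, not records from the Dryad package.

```python
from collections import Counter


def plurality_species(votes):
    """Return the species chosen by the most volunteers for one image set.

    Assumption for this sketch: ties go to whichever tied label appeared
    first, because Counter.most_common keeps first-encountered order.
    """
    return Counter(votes).most_common(1)[0][0]


def consensus(classifications):
    """Map each (hypothetical) image-set ID to its plurality species label."""
    by_set = {}
    for set_id, species in classifications:
        by_set.setdefault(set_id, []).append(species)
    return {set_id: plurality_species(v) for set_id, v in by_set.items()}


def accuracy(consensus_labels, gold_labels):
    """Fraction of expert-labelled sets whose consensus label matches."""
    scored = [sid for sid in gold_labels if sid in consensus_labels]
    if not scored:
        raise ValueError("no overlap between consensus and gold-standard sets")
    correct = sum(consensus_labels[sid] == gold_labels[sid] for sid in scored)
    return correct / len(scored)


# Toy data (hypothetical): three volunteers per image set.
volunteer_votes = [
    ("S1", "zebra"), ("S1", "zebra"), ("S1", "wildebeest"),
    ("S2", "aardwolf"), ("S2", "striped hyena"), ("S2", "aardwolf"),
    ("S3", "zorilla"), ("S3", "zorilla"), ("S3", "honey badger"),
]
expert = {"S1": "zebra", "S2": "aardwolf", "S3": "honey badger"}

labels = consensus(volunteer_votes)
print(labels)                              # {'S1': 'zebra', 'S2': 'aardwolf', 'S3': 'zorilla'}
print(round(accuracy(labels, expert), 3))  # 0.667
```

Plurality voting is about the simplest aggregation that exploits redundancy: with around nine independent votes per image set, an occasional misidentification by one volunteer is usually outvoted by the others, which is why collecting many classifications per image can push consensus accuracy well above that of any single volunteer.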
The Dryad data package includes the classifications from every individual volunteer, the consensus classifications, information about when each camera was operational, and expert classifications of 4,149 image sets that serve as a gold standard.
- Swanson et al. (2015) Snapshot Serengeti, high frequency annotated camera trap images of 40 mammalian species in an African savannah. Scientific Data. http://dx.doi.org/10.1038/sdata.2015.26
- Swanson et al. (2015) Data from: Snapshot Serengeti, high frequency annotated camera trap images of 40 mammalian species in an African savannah. Dryad Digital Repository http://doi.org/10.5061/dryad.5pt92