The Best Free Datasets for Human Pose Estimation
Human Pose Estimation (HPE) is a powerful way to use computer vision models to track, annotate, and estimate movement patterns for humans, animals, and vehicles.
It takes enormous computational power and sophisticated algorithms, such as trained computer vision models, plus highly accurate and detailed annotations and labels, for a computer to understand and track what the human eye and mind can do in a fraction of a second.
And yet, to accurately track, extrapolate, and interpret the movement of people, animals, or vehicles, we need computer vision models trained to estimate and interpret pose.
An essential part of training models and making them production-ready is getting the right datasets. One way is to use open-source human pose estimation and detection image and video datasets for machine learning (ML) and computer vision (CV) models.
This article covers the importance of open-source datasets for human pose estimation and detection and gives you more information on 8 of the top free, open-source pose estimation datasets.
The Importance of Open-Source Pose Detection Data in Machine Learning
Human pose estimation (HPE) and detection are challenging for machine learning and computer vision models. Humans and animals move dynamically.
Other factors and objects in images and videos can add layers of complexity to this movement, such as clothing, fabric, lighting, arbitrary occlusions, the viewing angle, background, and whether there’s more than one person or animal being tracked in a video.
Computer vision models that use human pose estimation datasets are most commonly applied in healthcare, sports, retail, security, intelligence, and military settings. In any scenario, it’s crucial to train an algorithmically-generated model on the broadest possible range of data, accounting for every possible use and edge case required to achieve the project's goals.
One way to do this, especially when timelines are short or budgets are tight, is to use free, open-source image or video datasets. Open-source datasets usually come with annotations and labels already applied, which brings them close to training-ready.
Plus, if you need to source more data, you could use several open-source datasets or blend these images and videos with proprietary ones. As with any computer vision model, the quality, accuracy, and diversity of images and videos within a dataset can significantly impact a project's outcomes.
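Blending datasets usually means reconciling different keypoint layouts first. The following is a minimal sketch of that step, not any dataset's official tooling: the joint names, their orderings, and the `to_common_schema` helper are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Optional[Tuple[float, float]]

# Hypothetical joint orderings for two source datasets (illustrative only).
DATASET_A_JOINTS = ["head", "left_shoulder", "right_shoulder", "left_hip", "right_hip"]
DATASET_B_JOINTS = ["right_hip", "left_hip", "right_shoulder", "left_shoulder", "head", "neck"]

# Shared schema used for training; joints a source lacks become None.
COMMON_JOINTS = ["head", "neck", "left_shoulder", "right_shoulder", "left_hip", "right_hip"]


def to_common_schema(keypoints: List[Point], source_joints: List[str]) -> List[Point]:
    """Reorder one pose into COMMON_JOINTS, filling absent joints with None."""
    if len(keypoints) != len(source_joints):
        raise ValueError("keypoints and source_joints must have the same length")
    by_name: Dict[str, Point] = dict(zip(source_joints, keypoints))
    return [by_name.get(name) for name in COMMON_JOINTS]


# One made-up pose from each hypothetical dataset.
pose_a = [(50.0, 10.0), (40.0, 30.0), (60.0, 30.0), (42.0, 70.0), (58.0, 70.0)]
pose_b = [(58.0, 71.0), (42.0, 71.0), (60.0, 31.0), (40.0, 31.0), (50.0, 11.0), (50.0, 20.0)]

merged = [
    to_common_schema(pose_a, DATASET_A_JOINTS),
    to_common_schema(pose_b, DATASET_B_JOINTS),
]
```

Keeping missing joints as `None` rather than dropping them preserves a fixed-length layout, so a training loop can mask those joints out of the loss instead of handling variable-length poses.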
To find out more, read our Complete Guide to Pose Estimation for Computer Vision.
Let's dive into the top 8 free, open-source human pose estimation datasets.
Top 8 Free Pose Estimation Datasets
MPII Human Pose Dataset
The MPII Human Pose dataset is “a state of the art benchmark for evaluation of articulated human pose estimation.” It includes over 25,000 images of more than 40,000 people, with annotations covering 410 human movements and activities.
Images were extracted from YouTube videos, and the dataset includes unannotated frames preceding and following the movement within the annotated frames. The test set data includes richer annotation data, body movement occlusions, and 3D head and torso orientations.
A team of data scientists at the Max Planck Institute for Informatics in Germany and Stanford University in the United States created this dataset.
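Results on MPII are commonly reported as PCKh, where a predicted joint counts as correct if it lies within a fraction (often 0.5) of the head segment length from the ground truth. Here is a minimal sketch of that idea, not the official evaluation script: the `pckh` function, its signature, and the sample coordinates are assumptions made for illustration, and the head size is simply passed in as a number.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def pckh(
    predicted: List[Point],
    ground_truth: List[Optional[Point]],
    head_size: float,
    alpha: float = 0.5,
) -> float:
    """Fraction of annotated joints predicted within alpha * head_size."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    threshold = alpha * head_size
    correct = 0
    evaluated = 0
    for pred, truth in zip(predicted, ground_truth):
        if truth is None:  # joint not annotated, so it is skipped
            continue
        evaluated += 1
        if math.dist(pred, truth) <= threshold:
            correct += 1
    return correct / evaluated if evaluated else 0.0


# Made-up example: four joints, one of them unannotated.
truth = [(10.0, 10.0), (20.0, 20.0), None, (40.0, 40.0)]
preds = [(11.0, 10.0), (20.0, 30.0), (0.0, 0.0), (40.0, 43.0)]
score = pckh(preds, truth, head_size=8.0)
```

With a head size of 8 pixels the threshold is 4 pixels, so two of the three annotated joints count as correct in this example.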
3DPW - 3D Poses in the Wild
The 3D Poses in the Wild dataset (3DPW) used IMUs and video footage from a moving phone camera to accurately capture human movements in public.
This dataset includes 60 video sequences, 3D and 2D poses, 3D body scans, and people models that are all re-poseable and re-shapeable.
3DPW was also created at the Max Planck Institute for Informatics, with other data scientists supporting it from the Max Planck Institute for Intelligent Systems, and TNT, Leibniz University of Hannover.
LSPe - Leeds Sports Pose Extended
The Leeds Sports Pose extended dataset (LSPe) contains 10,000 sports-related images from Flickr, with each image containing up to 14 joint location annotations.
Accuracy is not guaranteed, as the annotations were produced using Amazon Mechanical Turk as part of a Ph.D. project in 2016, and AI-based (artificial intelligence) annotation tools are now more advanced. However, it’s a useful potential starting point for anyone training a sports movement-based computer vision model.
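Because crowd-sourced labels can contain errors, a quick sanity check before training is often worthwhile. The sketch below flags joints that fall outside the image bounds. The `(x, y, visible)` record layout and the `suspicious_joints` helper are hypothetical and do not reflect the dataset's actual file format.

```python
from typing import List, Tuple

# Hypothetical record layout: x, y, and a visibility flag per joint.
Joint = Tuple[float, float, bool]


def suspicious_joints(
    joints: List[Joint], width: int, height: int, max_joints: int = 14
) -> List[int]:
    """Return indices of joints whose coordinates lie outside the image."""
    if len(joints) > max_joints:
        raise ValueError(f"expected at most {max_joints} joints, got {len(joints)}")
    flagged = []
    for index, (x, y, _visible) in enumerate(joints):
        if not (0 <= x < width and 0 <= y < height):
            flagged.append(index)
    return flagged


# Made-up annotation for a 120 x 200 pixel image.
sample = [(10.0, 20.0, True), (-5.0, 30.0, True), (100.0, 250.0, False)]
flagged = suspicious_joints(sample, width=120, height=200)
```

Flagged joints are candidates for manual review rather than automatic deletion, since an out-of-frame coordinate can also be a deliberate estimate of an occluded or cropped limb.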
DensePose-COCO
DensePose-COCO is a “Dense Human Pose Estimation In The Wild” dataset containing 50,000 manually annotated images from COCO (Common Objects in Context).
Annotations in this dataset align dense “correspondences from 2D images to surface-based representations of the human body” using the SMPL model and SURREAL textures for annotations and manual labeling.
DensePose-COCO is a part of the COCO and Mapillary Joint Recognition Challenge Workshop at ICCV 2019.
AMASS: Archive of Motion Capture as Surface Shapes
AMASS is “a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization.”
It includes 3D and 4D body scans, 40 hours of motion data, 300 human subjects, and over 11,000 human motions and movements, all carefully annotated.
AMASS is another open-source dataset that started as an ICCV 2019 challenge, created by data scientists from the Max Planck Institute for Intelligent Systems.
VGG Human Pose Estimation Datasets
The VGG Human Pose Estimation Datasets include a range of annotated videos and frames from YouTube, the BBC, and ChaLearn.
It includes over 70 videos containing millions of frames, with manually annotated labels covering an extensive range of human upper-body poses.
This vast HPE dataset was created by a team of researchers and data scientists, with grants provided by EPSRC EP/I012001/1 and EP/I01229X/1.
SURREAL
The Synthetic hUmans foR REAL tasks (SURREAL) dataset contains 6 million video frames of synthetic humans.
SURREAL is a “large-scale person dataset to generate depth, body parts, optical flow, 2D/3D pose, surface normals ground truth for RGB video input.” It now contains optical flow data.
CMU Panoptic Studio Dataset
The Panoptic Studio dataset is a “Massively Multiview System for Social Motion Capture.”
It’s a dataset designed to capture and accurately annotate human social interactions. Annotations and labels were applied to over 480 synchronized video streams, capturing and labeling the 3D structure of movement of people engaged in social interactions.
This dataset was created by researchers at Carnegie Mellon University following an ICCV 2015 paper, with funding from the National Science Foundation.
Rapidly Develop and Deploy AI for Computer Vision
One of the most cost- and time-effective ways to develop and deploy human pose estimation models for computer vision is to use an AI-assisted active learning platform, such as Encord.
At Encord, our active learning platform for computer vision is used by many sectors - including healthcare, manufacturing, utilities, and smart cities - to annotate human pose estimation videos and accelerate their computer vision model development.
Encord is a comprehensive AI-assisted platform for collaboratively annotating data, orchestrating active learning pipelines, fixing dataset errors, and diagnosing model errors & biases. Try it for free today.