
The World’s Largest Multimodal Dataset | Episode 3
Post-training 1 million AI data annotations
Frederik, Head of Machine Learning at Encord, explains the second stage of building the world's largest multimodal AI dataset. In this episode, Fred details how the team raised data quality with hierarchical clustering and used more than 6,000 hours of human annotation to add 1 million labels, increasing both the diversity and the overall quality of the dataset.
Speakers

Frederik Hvilshøj
ML Lead @ Encord
The World's Largest Multimodal AI Dataset
The open-source E-MM1 dataset has 100+ million groups of images, videos, text, audio and 3D point clouds, giving AI teams more training data for their AI models.