The World’s Largest Multimodal Dataset | Episode 2

Pre-training 107 million multimodal data groups

Frederik, Head of Machine Learning at Encord, explains the first stage of building the world's largest multimodal AI dataset. In this episode, Frederik details how the team fully automated the process: they took 6 million captions and matched each one to its 16 nearest neighbors (images, videos, audio clips, and 3D point clouds) based on their embeddings, creating 107 million multimodal pairs.
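The matching step described above amounts to a top-k nearest-neighbor search in embedding space. Below is a minimal sketch of that idea, assuming precomputed caption and item embeddings as NumPy arrays; the function name, array shapes, and use of cosine similarity are illustrative assumptions, not Encord's actual pipeline (which at this scale would use an approximate-nearest-neighbor index rather than a brute-force matrix product).

```python
import numpy as np

def nearest_neighbors(caption_embs, item_embs, k=16):
    """For each caption embedding, return the indices of the k most
    similar item embeddings by cosine similarity. (Hypothetical helper.)"""
    # Normalize rows so a plain dot product equals cosine similarity.
    c = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    i = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    sims = c @ i.T  # shape: (n_captions, n_items)
    # argpartition selects the top-k per row without a full sort,
    # then we sort just those k by descending similarity.
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    rows = np.arange(sims.shape[0])[:, None]
    order = np.argsort(-sims[rows, top], axis=1)
    return np.take_along_axis(top, order, axis=1)

# Toy usage: 5 captions, 20 candidate items, 8-dim embeddings.
rng = np.random.default_rng(0)
captions = rng.normal(size=(5, 8))
items = rng.normal(size=(20, 8))
pairs = nearest_neighbors(captions, items, k=16)
print(pairs.shape)  # one row of 16 neighbor indices per caption
```

Scaled up, each caption paired with its 16 neighbors across roughly 6 million captions yields on the order of the 107 million pairs mentioned above.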

Speakers

Frederik Hvilshøj

ML Lead @ Encord

The World's Largest Multimodal AI Dataset

The open-source E-MM1 dataset contains 100+ million groups of images, videos, text, audio, and 3D point clouds, giving AI teams more training data for their models.