How to Explore E-MM1: The World’s Largest Multimodal AI Dataset

In October, Encord released E-MM1, the world’s largest open-source multimodal AI dataset. Built for AI teams working at the frontier of multimodal AI, E-MM1 brings together 107+ million images, videos, audio clips, text, and 3D point clouds into a single, unified dataset.
As models increasingly move beyond single-modality inputs, the need for large, well-aligned multimodal data has become one of the biggest bottlenecks in AI development. E-MM1 was created to address that challenge head-on.
In the demo video below, Felix, ML Scientist at Encord, walks through how to explore and curate the dataset directly on the Encord platform. In this post, we’ll unpack what makes E-MM1 different, how it was designed, and how to explore it on the platform.
Exploring E-MM1 on the Encord Platform
The demo video shows how E-MM1 can be explored using Encord’s data curation tooling, making it easier to navigate a dataset of this scale without losing context.
When you open a single data group, you can immediately see how each modality represents the same underlying concept. An image, a video clip, a short audio segment, a textual description, and a 3D point cloud are presented together, giving a complete multimodal view of the object or scene.
This side-by-side inspection is particularly useful when validating data quality or developing intuition for how different modalities reinforce each other.
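To make the structure concrete, here is a minimal sketch of what a multimodal data group could look like as a plain record. The field names and URIs below are hypothetical placeholders, not the actual E-MM1 schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataGroup:
    """One data group: several modalities describing the same concept."""
    group_id: str
    image_uri: Optional[str] = None       # hypothetical field names,
    video_uri: Optional[str] = None       # not the real E-MM1 schema
    audio_uri: Optional[str] = None
    caption: Optional[str] = None
    point_cloud_uri: Optional[str] = None

# Inspect every modality of a single group side by side.
group = DataGroup(
    group_id="example-001",
    image_uri="s3://example-bucket/example-001.jpg",
    caption="A red tractor parked in a field",
    point_cloud_uri="s3://example-bucket/example-001.ply",
)
for field_name, value in vars(group).items():
    print(f"{field_name}: {value}")
```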
Finding Similar Concepts with Multimodal Similarity Search
Beyond manual exploration, the platform enables multimodal similarity search across the entire dataset. Starting from a single data group, you can retrieve others that are semantically similar.
This allows teams to surface related examples captured under different conditions or in different formats, which is critical for tasks such as retrieval benchmarking, dataset expansion, or identifying edge cases.
Rather than searching within one modality at a time, similarity is evaluated across the multimodal representation of each data group.
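Conceptually, this kind of retrieval boils down to nearest-neighbour search over one joint embedding per data group. The sketch below illustrates the idea with cosine similarity over randomly generated placeholder embeddings; it is not the platform’s implementation, and the embedding model is assumed:

```python
import numpy as np

def cosine_similarity(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query embedding and a matrix of corpus embeddings."""
    query = query / np.linalg.norm(query)
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return corpus @ query

# Placeholder: assume each data group has already been encoded into a single
# joint embedding (e.g. pooled per-modality embeddings from a shared space).
rng = np.random.default_rng(0)
group_embeddings = rng.normal(size=(10_000, 512))
query_embedding = group_embeddings[42]  # start from one data group

scores = cosine_similarity(query_embedding, group_embeddings)
top_k = np.argsort(-scores)[:5]
print("Most similar data groups:", top_k)
```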
Refining Data with Cross-Modality Metadata
E-MM1 also includes metadata that captures relationships between modalities. In the demo, Felix shows how this can be used to filter data groups based on how strongly different modalities correspond: for example, how closely an image aligns with its associated 3D point cloud.
By filtering on these scores, teams can focus on subsets of the dataset that meet specific quality or alignment thresholds. Curated selections can then be saved into collections, making it easier to iterate on experiments or reuse datasets across projects.
This kind of fine-grained control is essential when working with multimodal systems at scale.
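As a rough illustration of how such filtering might look outside the platform, here is a small sketch over a hypothetical metadata table; the column names and threshold are illustrative only, not part of the released dataset:

```python
import pandas as pd

# Hypothetical metadata: one row per data group, with pairwise
# cross-modality alignment scores (column names are illustrative).
metadata = pd.DataFrame({
    "group_id": ["g1", "g2", "g3", "g4"],
    "image_pointcloud_alignment": [0.92, 0.41, 0.78, 0.66],
    "image_text_alignment": [0.88, 0.73, 0.35, 0.81],
})

# Keep only groups whose image and point cloud agree strongly,
# then save the selection so it can be reused like a collection.
threshold = 0.75
curated = metadata[metadata["image_pointcloud_alignment"] >= threshold]
curated.to_csv("high_alignment_selection.csv", index=False)
print(curated)
```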
Understanding the Dataset with UMAP Visualizations
To help make sense of a dataset this large, E-MM1 can be explored through a UMAP-based dimensionality reduction plot. In this 2D visualization, distance between points reflects semantic similarity across modalities.
As shown in the video, natural clusters emerge across the dataset. One region may group together vehicles such as cars and tractors, while another captures animals and natural environments. These visualizations provide an intuitive way to understand how concepts are distributed and related, without needing to inspect individual samples one by one.
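If you want to reproduce a similar view over your own embeddings, a minimal sketch using the open-source umap-learn package might look like the following; the embeddings here are random placeholders standing in for per-data-group embeddings:

```python
import numpy as np
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

# Placeholder embeddings standing in for per-data-group joint embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5_000, 512))

# Project to 2D so that nearby points correspond to semantically similar groups.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, metric="cosine", random_state=0)
coords = reducer.fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], s=2, alpha=0.5)
plt.title("UMAP projection of data-group embeddings")
plt.show()
```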
Built for Research and Real-World Systems
E-MM1 was built to support both academic research and production-scale AI development. Whether you’re training multimodal foundation models, evaluating embedding alignment, or curating high-quality datasets for downstream applications, E-MM1 provides the scale, structure, and openness needed to move faster.
By making the dataset open-source, we aim to lower the barrier to experimentation and enable the community to push multimodal AI forward together.
Get Started with E-MM1
You can explore the dataset directly on the Encord platform, dive into the technical release for full details on dataset construction and retrieval models, or watch the full video series on how E-MM1 was built.
If you’re interested in real-world applications, watch the live session with Frederik Hvilshøj, Machine Learning Lead at Encord, for a deeper discussion on how teams are using E-MM1 today.
👉 Explore E-MM1 and start building with the world’s largest multimodal AI dataset.