GET READY FOR ICLR
What do you need to know about ICLR?
Now in its 14th year, ICLR has evolved from a niche gathering of neural network pioneers into the world’s most influential forum for representation learning.
• Dates: April 24 - April 28, 2026
• Location: Rio de Janeiro, Brazil (Riocentro Convention Center)
ICLR is unique for its OpenReview process, which provides full transparency into the peer-review debate. For engineers, it is the best place to find high-signal research before it hits the mainstream.

Data is the key to success at ICLR
Early review data shows average scores have dipped to 4.2, as reviewers struggle to find the signal amidst a sea of incremental improvements. Simply "scaling up" is no longer enough to earn a spotlight at ICLR.
The conversation at the International Conference on Learning Representations 2026 looks set to be dominated by the governance and application of these models. The goal is no longer just "how to build a model," but how to make it useful, safe, and robust.
At the heart of this challenge is a return to basics: the recognition that data is the key to steerable intelligence. Whether it is aligning a model to human values through RLHF or training an agent to navigate complex environments, the bottleneck is now the curation, annotation, and labeling of the high-fidelity training distributions that define modern AI.
3 mega-trends reshaping ICLR 2026
1) Agentic AI as middleware
The most pervasive keyword in this year's submission pool is "Agent." We have officially moved past the era of the chatbot: research now focuses on autonomous agents that can plan, use tools, and reason through multi-step tasks.
2) Aligning multimodal data to the physical world
While text may seem solved to many, the new frontier at ICLR is grounding that intelligence in the physical world. The buzz ahead of ICLR 2026 centers on unified "omni-modal" models like SAM 3 (Segment Anything Model) and Mamba-3. These architectures aim to align vision, audio, and video into a single, cohesive latent space, allowing models to perceive time and space as humans do.
3) Data-Centric Governance and SLMs
Perhaps the most surprising shift is the rise of Small Language Models (SLMs) and the corresponding explosion in datasets and benchmarks as a primary research category.
In conversation with an ML data expert: James Clough, VP of Engineering at Encord
To get a better sense of how teams attending ICLR 2026 are navigating these hurdles on the ground, we sat down with James Clough, Encord’s VP of Engineering, to discuss the growing prevalence of data in the research papers at ICLR.
1. At ICLR 2026, the focus has shifted from passive chatbots to agentic AI. From a systems perspective, why is annotating active trajectories and reasoning chains so much more difficult than standard supervised learning, and how does this change the way we build data engines for ICLR 2026-level agents?
It's fair to say that a couple of years ago people were mainly thinking about passive chatbots, meaning things like ChatGPT. You ask a question, you get an answer. You ask another question, you get another answer. Nowadays, there's a lot more focus on agents that can be given a task and go and work for a longer period of time without needing to be in a constant conversation. Claude Code is a very good example: you can give the AI a task and it might go and work away for 10 or 20 minutes on its own without needing any further input. It's agentic because it's just going and doing stuff on its own.
I think the reason the data you need to create those systems is more complicated to acquire is that, when you're getting human feedback in a chat and that is your dataset, it's relatively simple to acquire because you have human intervention very frequently. You ask a question, you get an answer, and then you can give that answer a thumbs up or a thumbs down, and then you say something else. You've always got a human steering the conversation and keeping it on the topic the human wants the conversation to be on.
Whereas if I ask Claude Code to go and do something, it goes off for half an hour to work on it: it'll have lots of ideas, it'll make a plan, it will start implementing something, that won't work, and it'll do something else. It could be doing loads of stuff in that half an hour and it's very possible that I don't like the result at the end. Well, how do we know where it went wrong? Did it go wrong in the first minute? Did it do everything right and go wrong at minute 29? Was the plan bad? Was the plan good and the execution bad? There are so many different things that could have gone wrong. To annotate that, you have this complex set of thoughts, actions, and inner pieces of reasoning from the model that a person has to annotate. That's a lot more complicated than annotating a chat with a series of responses, because there's a lot more data there and there's less human intervention in the middle.
The second reason why it's more complicated is that, with an agentic system, you don't have a person looking at all of the output of the model. In ChatGPT, you read everything ChatGPT says in the conversation and course-correct it when needed. But an agent thinks a lot internally before you see what it has done. And, as a human, you're not looking at that and giving it the natural annotation of responding to its errors.
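To make the annotation problem concrete, here is a minimal sketch (in Python, with entirely hypothetical names and schema) of what a trajectory-level annotation record might look like: each step of an agent run is stored with its kind, and an annotator's verdict lets you localize the earliest failure in a long autonomous run.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: one record per step in an agent run.
@dataclass
class TrajectoryStep:
    index: int
    kind: str                      # e.g. "plan", "tool_call", "reasoning", "output"
    content: str
    verdict: Optional[str] = None  # annotator fills in: "ok" / "error"
    note: str = ""

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)

    def first_error(self) -> Optional[int]:
        """Return the index of the earliest step an annotator flagged as an error."""
        for step in self.steps:
            if step.verdict == "error":
                return step.index
        return None

# Example: a run where the plan was fine but step 2 misused a tool.
run = Trajectory(task="refactor the auth module")
run.steps = [
    TrajectoryStep(0, "plan", "1. read code 2. write tests 3. refactor", "ok"),
    TrajectoryStep(1, "reasoning", "tests should cover token expiry", "ok"),
    TrajectoryStep(2, "tool_call", "rm -rf src/", "error", "destructive call"),
    TrajectoryStep(3, "output", "done!", "error"),
]
print(run.first_error())  # → 2
```

The point of the sketch is structural: unlike a chat log, where each turn carries its own thumbs-up signal, a trajectory needs per-step verdicts before you can even say where a half-hour run went wrong.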
2. Papers on unified omni-modal models like SAM 3 and Mamba-3 are dominating the ICLR 2026 pool. What are the fundamental curation challenges in aligning vision, audio, and video into a single latent space? Why is high-fidelity curation the only way to ground these models in the physical world?
Why is it hard to train these models to align different modalities into a single latent space? The reason is that you have to have data that is itself aligned across different modalities, and that kind of data is scarce. It's not like machine learning researchers are going out there with video cameras taking pictures of things in the outside world; they're scraping this data from the internet, from videos and pictures and audio clips that already exist. But those pieces of data on the internet aren't typically multimodal. You don't have an audio clip associated with every photograph on the internet. You might have some text associated with it, but most photographs won't have an audio clip, and they won't have a 3D point cloud.
There's a lot more work that has to go into creating a multimodal dataset where there are links between data of different modalities, because those links don't come for free. Text annotations, on the other hand, do come for free for lots of datapoints, because lots of videos and images already have a description in text.
A lot of that work you just have to do from scratch and that takes a long time. And then there are also curation challenges in the sense that there's a lot of judgment that has to go into whether an audio clip corresponds to an image, for example. What does that actually mean? There are lots of different ways that you could do that alignment which will lead to different results in your model.
So, there's more judgment that has to be taken in the curation and the annotation steps. But, you need to do that well to actually make the different modalities of the model aligned into that latent space properly. And that's important if you want to do things like have someone drop in an image and then get all of the audio clips which are most similar to that image, which is a really useful bit of functionality.
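The image-to-audio retrieval functionality described above can be sketched in a few lines, assuming you already have embeddings projected into one shared latent space (the hard curation part the interview describes). The function name and the toy random embeddings below are illustrative, not any particular library's API.

```python
import numpy as np

def top_k_cross_modal(query: np.ndarray, bank: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k bank embeddings most similar to the query.

    Assumes both modalities were projected into one shared latent space,
    so cosine similarity across modalities is meaningful.
    """
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q                        # cosine similarity per audio clip
    return np.argsort(sims)[::-1][:k]   # indices, most similar first

# Toy example with made-up 4-dimensional embeddings.
rng = np.random.default_rng(0)
image_embedding = rng.normal(size=4)
audio_bank = rng.normal(size=(100, 4))  # 100 audio-clip embeddings
top = top_k_cross_modal(image_embedding, audio_bank)
print(top)
```

If the curation and annotation were done poorly, the shared space never forms, and the cosine similarities this relies on are meaningless; that is the sense in which high-fidelity curation is load-bearing for the feature.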
That’s actually one of the reasons Encord created our E-MM1 dataset. We released it at the end of 2025 and it is the world's largest open-source multimodal dataset of its kind - it’s designed to make it easier to train models on multimodal data by removing the challenge of acquiring the data in the first place.
Check out Encord's open-source dataset of images, video, text, audio, and point cloud embeddings for AI teams to use (more than 10x the size of previous multimodal datasets) here.
3. We’re seeing a major theme at ICLR 2026 around Small Language Models (SLMs) and their ability to rival giants when trained on 'textbook-quality' data. Why has data governance become the priority, and what's the strategy for ensuring training data actually teaches the model something new instead of just recycling AI-generated noise?
The traditional way that large language models were trained was with very large amounts of data scraped from the internet, which was typically not textbook-quality data. It was things like Reddit comments, which are of varied quality; most are not written by world-leading professors who are experts in their field.
They’re written by random people on the internet. It turned out that if you train your model on enough data, this all averages out and the outputs are okay. But if you only have a small dataset, the problem of untrustworthy data remains.
So, if you want to train a smaller language model, which can be useful for cost reasons or for doing inference on small devices at the edge, then one good way of doing that is to make sure the data you are using is all extremely high quality rather than random Reddit comments. In that sense, having a very solid understanding of what's in your small, textbook-quality dataset is important because you're not going to be able to hope that everything averages out. Additionally, any bias, or over- and under-representation of certain items in your dataset, will be amplified by the fact that there's not much data in there. One problem in a thousand items is a much bigger problem than one problem in a billion items, or even two problems in a billion items. The smaller the overall dataset, the more potential risk you have from bad data poisoning the model.
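A toy curation pass illustrates why every retained document matters at small scale. The heuristics below (exact dedup by hash, a length floor, a banned-word list) are deliberately crude stand-ins for the learned quality classifiers real pipelines use; the thresholds and word list are made up.

```python
import hashlib

def curate(docs, min_len=200, banned=("lol", "upvote")):
    """Toy curation pass for a small corpus: exact dedup plus a crude quality gate.

    Real pipelines use learned quality classifiers and fuzzy dedup; this only
    illustrates the shape of the filtering, not production practice.
    """
    seen, kept = set(), []
    for text in docs:
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:
            continue                # drop exact duplicates
        seen.add(digest)
        if len(text) < min_len:
            continue                # drop fragments too short to teach anything
        if any(word in text.lower() for word in banned):
            continue                # drop low-signal chatter
        kept.append(text)
    return kept
```

With a billion documents, a leaky filter averages out; with a few thousand "textbook" documents, each document the gate wrongly keeps (or drops) shifts the training distribution, which is why governance of a small corpus has to be much stricter.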
I think the other reason is that data governance is becoming more and more important for the kinds of companies that train small language models, which often do so for cost reasons. For example, Anthropic paid out a legal settlement back in September 2025 to a group of authors for copyright infringement: they’d used half a million books to train their model without the authors' permission. Anthropic can afford to do that but, if you're a small startup training small language models because they're cheaper, you can't afford to pay over a billion dollars. So the kinds of teams that train small language models for cost reasons are often exactly the ones for whom governance matters most.
4. The conversation at ICLR is no longer just "how to build a model that works" and scale it, but how to make it useful, safe, and robust. What do you predict the biggest (data-focused) challenges will be for the AI community?
It's only a few years ago that the scaling hypothesis became a key part of everybody's thinking here. This is the idea that you don't need to continually make fancier model architectures and find new tricks for training your models: you can just use bigger models and more data, train them for longer, and performance gets better in a very predictable way. The predictability of these scaling laws became clear around five or six years ago, and that thinking led to GPT-3 and the ever-larger models since. People are spending crazy amounts of money training models now, or crazy compared to what they would have spent five years ago. Five years ago, the idea of spending thousands of dollars training a single model might have seemed a bit expensive; now people will spend billions of dollars doing just that. So the scientific fact that scaling improves performance is now pretty well established, and there's less scope to write a conference paper that just demonstrates that fact. Everybody already knows it.
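The predictability being described can be illustrated with a quick power-law fit: loss that follows a Chinchilla-style form is a straight line in log-log space once the irreducible term is subtracted. All constants below are made up for illustration; they are not published scaling-law coefficients.

```python
import numpy as np

# Synthetic points following a power law L(N) = E + A * N**(-alpha),
# with illustrative (not published) constants.
E, A, alpha = 1.7, 400.0, 0.34
N = np.array([1e8, 1e9, 1e10, 1e11])   # parameter counts
L = E + A * N ** (-alpha)              # loss at each scale

# Recover alpha by linear regression on the excess loss in log-log space:
# log(L - E) = log(A) - alpha * log(N), so the slope is -alpha.
slope, intercept = np.polyfit(np.log(N), np.log(L - E), 1)
print(round(-slope, 2))  # → 0.34
```

That a four-point log-log fit recovers the exponent exactly is the whole appeal of the hypothesis: once the curve is pinned down at small scale, the return on "just train bigger" is forecastable, which is why the interesting open questions have moved elsewhere.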
But, the kind of scale that is necessary to reach state-of-the-art performance via scaling is now extremely large. There are only a handful of companies in the world that can do that. So, if you want to, from a scientific perspective, write an interesting paper or give an interesting talk then just having more scale is not a very feasible way of doing that. Now you need to have something more interesting.
One of the most interesting questions is: how do we use higher-quality data, higher-quality human feedback, or new kinds of data (like the trajectories and reasoning chains I discussed earlier) to demonstrate better model performance and show you can have more control over the kind of model you've created at the end?
Making models capable by scaling is something we already know how to do, but making them safe, reliable, and robust to adversarial conditions is a new avenue where researchers can show a lot of impact and do a lot of cool things.
This is also where data becomes very important. It's data you have to actively work to generate. So, again, it's no longer the case that you can just mine the internet and not care about what's on there. If you want to build and curate specific datasets to make your model safer, you need to really think about what that means and make sure you're building the right dataset to achieve it. I think there are lots of researchers now, whether working in industry or in academia, thinking about the kind of work they want to do in that regard, and a lot of them will not be thinking about just scaling things up, but about this other stuff as well.
What does Encord predict for the future of AI at ICLR?
The research emerging from ICLR 2026 makes one thing clear: the competitive advantage in AI has shifted from architectural novelty to data orchestration. As foundation models become increasingly commoditized, the ability to build a high-fidelity, recursive data loop is the only way to move from a research paper to a production-grade system.
At Encord, our philosophy is built on the reality that for frontier AI (like agentic systems, multimodal foundation models, and SLMs) data cannot be a passive asset. It must be an active, governed engine.
The International Conference on Learning Representations is more than just a gathering of minds in Rio: it is a preview of an AI industry that is maturing beyond the "scaling laws" hype.
The shift toward agentic AI, multimodal alignment, and data-centric governance signals a new era where the ML Engineer is the orchestrator of data, not just the builder of models.
To succeed in this landscape, you need more than a cloud bucket: you need a Data Engine. By applying the principles of active intelligence, teams can ensure that their research doesn't just result in a high-scoring paper, but in a robust, steerable system that performs in the wild.
Preparing for ICLR 2026? The most valuable research begins with the best data. Explore the Encord Data Engine and see how we help AI teams curate, annotate, and govern the datasets that are defining the future of representation learning.
ICLR 2026 FAQs
When and where is ICLR 2026 taking place?
The Fourteenth International Conference on Learning Representations (ICLR 2026) will be held from April 24 to April 28, 2026, in Rio de Janeiro, Brazil. The primary venue is the Riocentro Convention and Event Center. For those unable to travel, the conference maintains a robust virtual attendance option through its official portal.
Why have average review scores dropped for ICLR 2026?
Initial statistics for ICLR 2026 show a significant drop in average scores, falling from 5.12 in 2025 to 4.21 in 2026. This is largely attributed to the record-breaking 19,797 submissions, which have strained the peer-review system. Reviewers have reported a high volume of "low-signal" papers and a surge in AI-generated content, leading to more conservative scoring across the board.
What happened with the OpenReview security incident?
In late 2025, a security flaw in the OpenReview platform briefly exposed the identities of authors and reviewers for over 10,000 submissions. The ICLR Program Chairs responded by freezing discussions, reassigning Area Chairs (ACs), and implementing strict disciplinary actions for anyone attempting to use the leaked data for collusion or harassment. This incident has sparked a massive debate about the future of double-blind peer review in the age of AI.
Is SAM 3 a major topic at ICLR 2026?
Yes. SAM 3 is one of the most cited and discussed submissions this year. Unlike its predecessors, SAM 3 moves beyond simple segmentation toward "Conceptual Comprehension," integrating reasoning capabilities that allow it to segment objects based on complex, multimodal prompts. It represents a significant leap in Multimodal Alignment, a core theme of this year's conference.
What is the dominant research trend at ICLR 2026?
Agentic AI has overtaken LLMs as the dominant research category. The shift is from "Passive Representations" to "Active Trajectories." Researchers are focusing on how agents use tool-calling, long-term memory, and reasoning chains to solve multi-step problems. For engineers, this has shifted the focus from static dataset curation to the collection and annotation of complex agentic workflows and trajectory-based data.