The Eval Stack Top AI Teams Are Building Right Now

Thu, Jul 16, 04:00 PM - 04:30 PM UTC

Speakers

Martin Fisch, Head of Machine Learning, Encord

Wei-Yin Ko, Member of Technical Staff, Adaption

Jesse Willman, Engineering Program Management, ML, Cohere

Live Panel in collaboration with AI Circle & Cohere

As models get more capable, automated evals tell you less. The remaining signal on whether a model is actually improving comes from structured human judgment at scale.

Most teams don't have the infrastructure to produce it; this session is about how those that do have built it.

What we'll cover:

Where automated evals fall short and what that tells us about what human feedback still needs to do
What separates a rigorous human eval pipeline from ad-hoc annotation
The failure modes teams keep hitting when they try to scale human feedback, and how to avoid them
Where this is all heading as models get more capable

Get the data right.

300+ of the best AI teams in the world use Encord.
