Trajectory Annotation
Encord Computer Vision Glossary
When a robot learns to move through the world, it doesn't learn from instructions; it learns from examples. Trajectory annotation is the process of turning those examples into structured training data: labeling the path a robot took, when it took each action, and why. It's one of the most labour-intensive parts of building a physical AI system, and one of the most consequential.
What Is Trajectory Annotation?
A trajectory is the sequence of actions a robot moves through to complete a task, from the moment it starts reaching for an object to the moment it places it down. Trajectory annotation means labeling that sequence: marking key waypoints, assigning action labels at each timestep, and capturing the intent behind each movement.
Without this structure, raw sensor recordings are just video and data streams. Annotation is what makes them usable for training.
What Trajectory Annotation Actually Captures
Trajectory annotation typically involves several layers of information captured simultaneously:
- Waypoints: the key positions a robot passes through during a movement
- Action labels: what the robot is doing at each timestep (reaching, grasping, releasing, repositioning)
- Phase boundaries: where one distinct phase of a task ends and another begins
- Object interactions: contact points, grasp types, and object state changes
- Natural language captions: timestamped descriptions of actions, required for VLA training
The more precisely these layers are labeled, the more signal a model has to learn from. Poor annotations produce poor results.
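The layers above can be sketched as a simple annotation record. This is an illustrative schema only, not an Encord data format; all field and class names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Waypoint:
    t: float                 # timestamp in seconds
    xyz: tuple               # key end-effector position at that time

@dataclass
class Phase:
    label: str               # action label, e.g. "reach", "grasp", "place"
    start_t: float           # phase boundary: where this phase begins
    end_t: float             # phase boundary: where it ends
    caption: str = ""        # timestamped natural-language description

@dataclass
class TrajectoryAnnotation:
    task: str
    waypoints: list = field(default_factory=list)
    phases: list = field(default_factory=list)
    contacts: list = field(default_factory=list)  # (t, object, grasp_type)

# A fragment of an annotated pick-and-place demonstration.
demo = TrajectoryAnnotation(task="pick and place mug")
demo.phases.append(Phase("reach", 0.0, 1.2, "move toward the mug"))
demo.phases.append(Phase("grasp", 1.2, 1.8, "close gripper on the handle"))
demo.waypoints.append(Waypoint(0.0, (0.10, 0.00, 0.30)))
demo.contacts.append((1.3, "mug", "handle_grasp"))
```

In practice each of these layers would be attached to synchronized video frames rather than stored in isolation, but the structure, per-phase labels with explicit boundaries plus timestamped captions and contact events, is what a VLA training pipeline consumes.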
That precision is hard to achieve. Robot movements happen fast and across multiple camera feeds simultaneously; a single 30-second demonstration might require labeling across hundreds of frames, three or four camera angles, and a depth stream, all kept in sync. Annotators also need to understand what the robot is trying to do, not just what it's doing. Two trajectories that look nearly identical on video might represent very different strategies. Capturing that distinction is what separates training data that teaches generalisation from data that teaches imitation of surface patterns.
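Keeping multiple streams in sync usually comes down to matching frames by timestamp, since cameras rarely tick at exactly the same rate. A minimal sketch of nearest-timestamp alignment (the function name and sample timestamps are illustrative):

```python
import bisect

def nearest_frame(timestamps, t):
    """Return the index of the frame whose timestamp is closest to t.
    `timestamps` must be sorted ascending."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Compare the neighbours on either side of the insertion point.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

# Two cameras recording at slightly different rates (timestamps in seconds).
cam_a = [0.000, 0.033, 0.066, 0.100]
cam_b = [0.010, 0.045, 0.080, 0.115]

# Propagate an annotation made at t = 0.07 to the closest frame in each feed.
idx_a = nearest_frame(cam_a, 0.07)  # frame at 0.066
idx_b = nearest_frame(cam_b, 0.07)  # frame at 0.080
```

Real pipelines layer hardware triggers or interpolation on top of this, but the core problem is the same: every label must resolve to a consistent moment across all streams.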
Trajectory Annotation and Model Quality
Models trained on trajectory data learn to predict the next action given what the robot currently sees. The quality of the trajectory labels entirely bounds the quality of that prediction. Mislabeled phase boundaries cause the model to mistime transitions. Imprecise action captions produce imprecise outputs. Missing contact annotations leave gaps that the model has to guess at.
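To make the dependence on label quality concrete, here is a sketch of how phase labels become the supervision for a next-action predictor. The helper names and tuple layout are illustrative, not from any particular framework:

```python
def action_at(phases, t):
    """Look up the action label active at time t.
    `phases` is a list of (label, start_t, end_t) tuples."""
    for label, start, end in phases:
        if start <= t < end:
            return label
    return None  # a gap in the annotation leaves the model to guess

def to_training_pairs(frames, phases):
    """Turn per-frame observations into (observation, action-label) pairs,
    the supervision a next-action predictor is trained on."""
    return [(obs, action_at(phases, t)) for t, obs in frames]

phases = [("reach", 0.0, 1.2), ("grasp", 1.2, 1.8)]
frames = [(0.5, "img_0"), (1.3, "img_1")]

pairs = to_training_pairs(frames, phases)
# If the reach/grasp boundary were mislabeled at 1.5 instead of 1.2,
# img_1 would be supervised as "reach", teaching the model to
# mistime the transition into the grasp.
```

Every label error flows directly into these pairs, which is why a mislabeled boundary does not just add noise: it systematically teaches the wrong transition timing.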
This is why trajectory annotation isn't just a data preprocessing step; it's a core part of model development. Teams that invest in annotation quality ship better policies, faster.
Encord for Trajectory Annotation
Encord supports trajectory annotation natively across multi-view video, depth frames, and sensor fusion inputs in a single workspace. Annotators can label action sequences with frame-level precision, add timestamped captions, mark phase boundaries, and track objects across hundreds of frames without switching tools. Automated pre-labelling and interpolation reduce the manual workload on repetitive motions, while quality review workflows keep consistency high across large annotation teams.
→ Explore Encord for Physical AI
→ Explore Annotation & Labeling
Related Terms
See also: Phase Annotation · Action Segmentation · Behaviour Cloning · Teleoperation · Vision-Language-Action Model (VLA) · Imitation Learning
Related Resources
Informational Guides:
- Accelerating Robotics VLA Segmentation with SAM 3
- Gemini Robotics — Advancing Physical AI with VLA Models