Autonomous Vehicle (AV)
Encord Computer Vision Glossary
What Is an Autonomous Vehicle?
An autonomous vehicle (AV) is a ground vehicle capable of navigating from point A to point B without human input. It does this by fusing data from multiple sensors (cameras, LiDAR, radar, and ultrasonic sensors) into a real-time model of the world, then planning and executing a path through it.
The SAE autonomy scale runs from Level 0 (no automation) to Level 5 (fully driverless in all conditions). Most production systems today operate at Level 2–3, with limited self-driving capability in defined environments. True Level 4 and 5 systems, fully autonomous in open-world conditions, remain an active engineering and data challenge.
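The SAE scale can be summarized as a simple lookup. This is an illustrative sketch, not an official encoding of the J3016 standard; the level descriptions are paraphrased and the helper function is hypothetical.

```python
# Illustrative summary of the SAE J3016 levels (paraphrased, not official text).
SAE_LEVELS = {
    0: "No automation: human does all driving",
    1: "Driver assistance: steering OR speed support",
    2: "Partial automation: steering AND speed, driver supervises",
    3: "Conditional automation: system drives, driver must take over on request",
    4: "High automation: fully driverless within a defined operational domain",
    5: "Full automation: driverless in all conditions",
}

def requires_human_fallback(level: int) -> bool:
    """Levels 0-3 keep a human in the loop as fallback; 4-5 do not."""
    return level <= 3

print(requires_human_fallback(2))  # True  (most production systems today)
print(requires_human_fallback(4))  # False
```

The `requires_human_fallback` boundary between Level 3 and Level 4 is the practical dividing line the article describes: below it, a human remains responsible; above it, the system is.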
How AV Perception Works
AV perception is a multi-stage pipeline. Raw sensor data comes in from cameras, LiDAR, and radar simultaneously. Sensor fusion combines these streams into a unified environmental model. Object detection and tracking identify vehicles, pedestrians, cyclists, and obstacles. Prediction models anticipate how detected objects will move. Planning and control then compute a trajectory and issue commands to the vehicle.
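The stages above can be sketched as a chain of functions. This is a minimal illustration of the pipeline's shape, not a real AV stack: every function name is hypothetical and the detection output is stubbed.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "pedestrian", "vehicle"
    position: tuple     # (x, y) in the ego frame, metres
    confidence: float

def sensor_fusion(camera, lidar, radar):
    """Merge raw streams into one environment snapshot (placeholder logic)."""
    return {"camera": camera, "lidar": lidar, "radar": radar}

def detect_and_track(world):
    """Identify objects in the fused snapshot (stubbed with a fixed output)."""
    return [Detection("pedestrian", (12.0, -1.5), 0.93)]

def predict(detections):
    """Anticipate each object's motion over a short horizon."""
    return [(d, "crossing_left") for d in detections]

def plan_and_control(predictions):
    """Compute a command from the predicted motions."""
    if any(motion == "crossing_left" for _, motion in predictions):
        return "brake"
    return "maintain_speed"

# One tick of the pipeline: fuse -> detect -> predict -> plan.
world = sensor_fusion(camera=None, lidar=None, radar=None)
command = plan_and_control(predict(detect_and_track(world)))
print(command)  # brake
```

Note how an error introduced at any stage flows through every later call: a wrong label in `detect_and_track` changes `predict`, which changes the final command. That chaining is exactly why label quality compounds.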
Each stage depends on models trained on annotated data, and each stage compounds errors from the previous one. A mislabeled pedestrian in training data doesn't just affect detection; it affects prediction, planning, and ultimately whether the vehicle makes a safe decision.
What Makes AV Data Unique
AV datasets are among the largest and most complex in physical AI. A single vehicle generates gigabytes of multimodal data per hour of driving. That data needs to be annotated across multiple sensor modalities (camera images, LiDAR point clouds, radar returns) with consistent labels across all of them.
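Cross-modality consistency typically means one physical object carries a single instance identifier in every modality. A minimal sketch of such a label record, with hypothetical field names and example values:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MultimodalLabel:
    """One physical object, labeled consistently across sensor modalities.

    All field names are illustrative, not a real annotation schema.
    """
    instance_id: str                      # same id in every modality
    category: str                         # e.g. "cyclist"
    camera_bbox: Optional[tuple] = None   # (x, y, w, h) in pixels
    lidar_cuboid: Optional[tuple] = None  # (x, y, z, l, w, h, yaw), metres/rad
    radar_track: Optional[int] = None     # id of the associated radar return

label = MultimodalLabel(
    instance_id="obj_0042",
    category="cyclist",
    camera_bbox=(640, 220, 80, 160),
    lidar_cuboid=(14.2, -2.1, 0.9, 1.8, 0.6, 1.7, 0.12),
)
print(label.instance_id)  # obj_0042
```

The shared `instance_id` is what lets a downstream model learn that the 2D box in the image and the 3D cuboid in the point cloud describe the same cyclist.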
The edge cases are what make it hard. A system that performs well in clear daylight on a well-marked road can fail on a rainy night, at an unusual intersection, or when a pedestrian behaves unexpectedly. Building robustness means systematically finding and labeling those edge cases — not just collecting more of the same data.
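Systematically surfacing edge cases often reduces to ranking unlabeled frames by model uncertainty and spending the labeling budget on the hardest ones first. A minimal sketch, assuming per-frame confidence scores are already available:

```python
def mine_edge_cases(frames, threshold=0.5, budget=2):
    """Rank frames for re-annotation by model uncertainty.

    frames: list of (frame_id, model_confidence) pairs.
    Returns up to `budget` frame ids below the confidence threshold,
    least confident first. Names and thresholds are illustrative.
    """
    uncertain = [f for f in frames if f[1] < threshold]
    uncertain.sort(key=lambda f: f[1])  # hardest (least confident) first
    return [frame_id for frame_id, _ in uncertain[:budget]]

frames = [
    ("clear_day_001", 0.97),
    ("rain_night_014", 0.31),
    ("odd_intersection_007", 0.42),
    ("clear_day_002", 0.95),
]
print(mine_edge_cases(frames))  # ['rain_night_014', 'odd_intersection_007']
```

The easy daylight frames never enter the queue; the rainy night and unusual intersection, where the model is least sure, get labeled first. This is the selection logic behind active-learning-driven annotation, independent of any particular tool.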
Encord for AV Data Pipelines
Encord supports the full AV annotation stack — camera image labeling, 3D LiDAR point cloud annotation, radar overlays, and sensor fusion views in a single workspace. Teams use it to annotate at scale, run active learning to surface underrepresented edge cases, and maintain label consistency across large distributed annotation teams. The data flywheel routes low-confidence model predictions back to annotation queues automatically, so datasets improve continuously with each deployment cycle.
→ Explore Encord for Physical AI
→ Explore Annotation & Labeling
Related Terms
See also: ADAS · Sensor Fusion · LiDAR · 3D Bounding Box / Cuboid · Bird's Eye View (BEV) · Object Detection · Point Cloud
Frequently Asked Questions:
Q1: What's the difference between an AV and a car with driver assistance?
Driver assistance systems (ADAS) support a human driver: they warn, assist, or intervene in specific situations. An autonomous vehicle is designed to operate without a human driver at the wheel. The distinction is about who (or what) is ultimately responsible for the driving task.
Q2: Why is AV development taking longer than expected?
The long tail of edge cases. A system that handles 99% of scenarios reliably still fails on the 1%, and that 1% includes rare but high-consequence situations like unusual road conditions, unexpected pedestrian behavior, or sensor degradation. Collecting, labeling, and training on those scenarios is a fundamentally slow process.
Q3: How much data does it take to train an AV perception system?
Orders of magnitude more than most other ML applications. Production AV systems have been trained on hundreds of millions of miles of driving data, with annotation pipelines processing millions of labeled frames. Data quality and edge case coverage matter as much as volume.