3D Bounding Box annotation
Encord Computer Vision Glossary
What Is a 3D Bounding Box?
A 3D bounding box (also called a cuboid) is a rectangular prism placed around an object in 3D space. It's defined by the object's center position (x, y, z), its dimensions (length, width, height), and its orientation (yaw, pitch, roll). Together, these nine parameters (six degrees of freedom of pose, plus three dimensions) describe not just where the object is but how it's oriented: which direction a car is facing, for instance, or whether a pedestrian is walking toward the vehicle or away from it.
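To make the parameterization concrete, here is a minimal sketch of turning cuboid parameters into the box's eight corner points. It assumes yaw-only rotation about the vertical axis (pitch and roll are often zero in AV annotations); the function name and signature are illustrative, not from any particular toolkit.

```python
import math

def cuboid_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a cuboid as (x, y, z) tuples.

    The cuboid is centred at (cx, cy, cz) with the given dimensions
    and rotated by `yaw` radians about the vertical (z) axis.
    Pitch and roll are omitted for brevity; many AV benchmarks
    annotate yaw only.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (width / 2, -width / 2):
            for dz in (height / 2, -height / 2):
                # rotate the local offset about z, then translate to the centre
                x = cx + c * dx - s * dy
                y = cy + s * dx + c * dy
                corners.append((x, y, cz + dz))
    return corners
```

Changing only `yaw` moves all eight corners, which is why orientation errors are immediately visible when a cuboid is overlaid on the point cloud.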
In AV datasets, cuboids are primarily annotated on LiDAR point clouds, where the 3D structure of the scene is directly observable. They're then projected back into camera images for cross-sensor consistency.
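Projecting back into camera images can be sketched with a simple pinhole model. This assumes the point is already expressed in the camera frame and ignores lens distortion; the parameter names (`fx`, `fy`, `cx`, `cy` for focal lengths and principal point) follow the common intrinsics convention but the function itself is illustrative.

```python
def project_to_image(x, y, z, fx, fy, cx, cy):
    """Project a 3D point in the camera frame onto the image plane
    using a pinhole model (no lens distortion).

    Returns (u, v) pixel coordinates, or None if the point is
    behind the camera.
    """
    if z <= 0:
        return None
    u = fx * x / z + cx  # divide by depth, scale by focal length
    v = fy * y / z + cy
    return (u, v)
```

In practice each cuboid corner is projected this way, and the resulting 2D polygon is checked against the camera image for cross-sensor consistency.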
Why 3D Orientation Matters
A 2D bounding box around a car tells you it exists. A 3D cuboid with a heading angle tells you which direction it's facing: critical information for predicting its future trajectory and planning a safe path around it. The difference between a parked car and one about to pull out is captured almost entirely by orientation and velocity, not position alone.
This is why cuboid annotation is significantly harder than 2D box annotation. Getting the heading angle right, especially for objects that are partially occluded or at long range, requires annotators to reason carefully about 3D geometry from sparse point cloud data.
Challenges with Cuboid Annotation
Point clouds are sparse compared to images; a distant pedestrian might be represented by only a handful of points. Annotating a precise cuboid around a sparse cluster requires understanding the object's likely shape and orientation from limited evidence. Occlusion compounds this: two cars partially overlapping in a point cloud must still be annotated as separate objects with individual cuboids.
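One way to see how sparse the evidence can be is to count the points that fall inside a candidate cuboid. The sketch below rotates each point's offset into the box's local frame (by negative yaw) so the inside test becomes an axis-aligned comparison; the function is illustrative, not from a specific library.

```python
import math

def points_in_cuboid(points, cx, cy, cz, length, width, height, yaw):
    """Count LiDAR points inside a yaw-rotated cuboid.

    Each point's offset from the cuboid centre is rotated by -yaw
    into the box's local frame, where the inside test is a simple
    axis-aligned comparison against the half-dimensions.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    inside = 0
    for (px, py, pz) in points:
        dx, dy, dz = px - cx, py - cy, pz - cz
        lx = c * dx + s * dy   # rotate offset by -yaw
        ly = -s * dx + c * dy
        if (abs(lx) <= length / 2 and abs(ly) <= width / 2
                and abs(dz) <= height / 2):
            inside += 1
    return inside
```

A distant object may yield only a single-digit count here, which is exactly the situation where annotators must reason about likely shape rather than visible geometry.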
Consistency across frames is another challenge. Objects need to be tracked with consistent cuboid sizes and orientations as they move through the scene; annotation quality that drifts across a sequence introduces noise that models then learn from.
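A common post-processing convention for rigid objects (cars, trucks) is to replace per-frame size jitter with a single per-track value, since the object's true dimensions don't change. This is a sketch of that idea using the per-track median; the dict-based track format is a simplifying assumption, not a standard schema.

```python
from statistics import median

def fix_track_dimensions(track):
    """Enforce consistent dimensions across a tracked object's cuboids.

    `track` is a list of per-frame dicts with 'length', 'width',
    and 'height' keys. Rigid objects have one true size, so each
    dimension is replaced with its median over the whole track.
    """
    for key in ("length", "width", "height"):
        m = median(frame[key] for frame in track)
        for frame in track:
            frame[key] = m
    return track
```

The median is preferred over the mean here because a few badly fitted frames (e.g. heavy occlusion) would otherwise skew the shared size.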
Encord for Cuboid Annotation
Encord's 3D annotation workspace supports cuboid labeling on LiDAR point clouds with tools designed for the specific challenges of sparse, high-volume 3D data, including multi-view camera projection for cross-sensor verification, object tracking across frames, and automated pre-labeling to seed cuboid positions. Quality review workflows flag inconsistent annotations across sequences before they reach training.
→ Explore Encord for Physical AI
→ Explore Annotation & Labeling
Related Terms
See also: LiDAR · Point Cloud · Sensor Fusion · Bird's Eye View (BEV) · Object Detection (AV context) · Autonomous Vehicle (AV)
Frequently Asked Questions
Q1: What's the difference between a 2D bounding box and a 3D bounding box?
A 2D bounding box localises an object in image space; it gives you pixel coordinates. A 3D bounding box localises an object in world space; it gives you real-world position, dimensions, and orientation. The 3D version requires understanding the geometry of the scene, not just the image.
Q2: Why are 3D bounding boxes primarily used on LiDAR data?
Because LiDAR directly measures 3D geometry: each point has an (x, y, z) coordinate in real-world space. Camera images are 2D projections; recovering 3D structure from them requires additional depth estimation. LiDAR makes 3D annotation directly tractable.
Q3: How are cuboids used in model training?
Cuboids provide the ground truth that 3D object detection models train against. The model learns to predict the cuboid parameters (position, size, and orientation) for each detected object. Downstream tasks like trajectory prediction and motion planning then use these cuboid detections as input.