
Vision-based Localization: A Guide to VBL Techniques for GPS-denied Environments

June 14, 2024

One significant use case for computer vision is building intelligent navigation frameworks that use vision-based techniques to localize objects in GPS-denied environments. These Vision-Based Localization (VBL) methods are becoming increasingly critical with the surge in demand for Unmanned Aerial Vehicles (UAVs).

The global UAV market is projected to grow by 18.2% annually between 2024 and 2033. UAVs are gaining popularity for various applications, including surveillance, agriculture, mapping, and aerial photography. However, their dependence on satellite-based GPS poses challenges in areas with signal obstruction or interference.

VBL solutions using onboard cameras are emerging as a promising alternative, enabling robust localization and navigation of UAVs in complex environments.

In this article, we will discuss the role of VBL in modern UAV navigation systems, including their types, key techniques, applications, benefits, and open challenges. Understanding these aspects is crucial as vision-based methods become the cornerstone of autonomous UAV operations.


What is Vision-based Localization (VBL)?

Vision-based localization (VBL) is a navigation technique that uses cameras and computer vision algorithms to estimate a UAV's position and orientation based on visual data. Unlike GPS-based methods, VBL systems can operate in environments where satellite signals are unavailable or unreliable, such as indoors or in urban canyons.

VBL systems typically consist of one or more cameras mounted on the UAV, which capture images of the surrounding environment. 

Onboard computers process these images to extract features and landmarks, which they then compare to a pre-built map or database to determine the UAV's position and orientation in real time.

This real-time operation allows efficient data collection in dynamic environments, enabling optimal control and collision avoidance.

Types of Cameras Used in VBL

Several types of cameras are used in VBL systems, each with its strengths and weaknesses:

  • Monocular Cameras: They are suitable for small-scale applications due to their compact size and low weight but lack depth information.
  • Stereo Cameras: Provide depth maps without requiring infrared sensors, enabling 3D localization.
  • RGB-D Cameras: Can estimate depth maps and create visible images, offering both color and depth information.
  • Fisheye Cameras: Have a wide viewing angle, making them ideal for long-range applications and reducing the number of cameras needed.
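As a rough illustration of how a stereo pair yields depth, the sketch below applies the standard pinhole relation Z = f * B / d for a rectified pair. The function name and the focal length, baseline, and disparity values are illustrative assumptions, not the specification of any particular camera.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, 12 cm baseline.
near = depth_from_disparity(700.0, 0.12, 42.0)  # large disparity: close object
far = depth_from_disparity(700.0, 0.12, 4.2)    # small disparity: distant object
print(near, far)
```

Because depth is inversely proportional to disparity, a fixed one-pixel matching error costs far more accuracy on distant objects than on nearby ones.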

Camera types

Sensor Fusion in VBL

In addition to cameras, VBL systems often incorporate data from other sensors like Inertial Measurement Units (IMUs) and LiDAR. This sensor fusion approach combines the strengths of different sensor modalities for more robust and accurate localization estimates.
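One simple way to picture sensor fusion is inverse-variance weighting: two independent estimates of the same quantity are averaged, with the more certain sensor given more weight. The sketch below is a minimal illustration under that assumption. The sensor readings and variances are made up for the example, and real VBL stacks typically use filters or optimization rather than a single weighted average.

```python
def fuse(est_a, var_a, est_b, var_b):
    """Inverse-variance weighted fusion of two independent estimates."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var


# Hypothetical altitude readings in meters:
# a vision-based estimate (noisier) and a LiDAR range (more precise).
altitude, variance = fuse(10.4, 0.09, 10.0, 0.01)
print(altitude, variance)
```

The fused variance is smaller than either input variance, which is the basic reason combining modalities can give a more accurate estimate than any single sensor.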

VBL versus GPS

Another popular localization method involves using a Global Positioning System (GPS) to navigate UAVs. GPS-based methods offer advantages such as:

  • Cheaper equipment
  • Better coverage and accuracy in clear outdoor environments
  • Cost-effectiveness for large-scale deployments

However, GPS-based methods have several limitations:

  • Reliance on satellite signals makes them unsuitable for indoor environments
  • Poor performance in urban environments with signal obstruction from buildings, trees, and mountains
  • Vulnerability to signal interference and jamming

In contrast, vision-based systems are ideal for indoor and GPS-denied environments. They can capture high-resolution images through low-cost cameras, enabling precise localization in complex spaces. VBL systems are also robust to signal disruption, making them suitable for operation in cluttered and confined areas.

However, VBL systems face challenges such as varying lighting conditions, feature-poor environments, and high computational requirements. Despite these challenges, VBL remains a promising solution for UAV localization in GPS-denied scenarios where traditional methods fail.

VBL’s Role in a Navigation System

Localization is a crucial component of a complete navigation system, but it works with mapping and path planning to enable autonomous UAV operation.

The following list discusses these steps in more detail to show how vision-aided methods help build autonomous navigation systems:

  • Pose Estimation (Localization): Pose estimation involves computing the UAV's position and orientation using data from sensors such as GPS, inertial measurement units (IMUs), cameras, and lidars. Vision-based methods play a significant role in pose estimation, especially in GPS-denied environments, by analyzing visual features to determine the UAV's location relative to its surroundings.
  • Mapping: The visual system creates a map of the environment by processing image data. Depending on the specific technique used (e.g., SLAM), this map can be a sparse collection of landmarks or a dense 3D reconstruction. The map serves as a reference for localization and path planning.
  • Obstacle Detection and Avoidance: The visual system processes the image data cameras collect to detect and segment environmental obstacles. This information is used to construct and update a map of the surroundings, which is essential for safe path planning and collision avoidance.
  • Path Planning and Visual Servoing: Path planning algorithms use the map generated in the previous step to identify the optimal route to the target location while avoiding obstacles. Visual servoing techniques then use visual feedback to control the UAV's motion along the planned path, ensuring it stays on course and adapts to any environmental changes.


Types of Vision-based Localization

Researchers categorize VBL techniques into relative visual localization (RVL) and absolute visual localization (AVL).

RVL uses a dynamic method, analyzing a previous frame to estimate an object's location in the current frame. In contrast, AVL uses matching techniques to search for the object’s current location using a static reference map.

The sections below discuss and compare the two methods to help you choose the most suitable approach for your use case.

Relative Visual Localization (RVL)

RVL approaches involve estimating an object’s location using a frame-by-frame method. The techniques analyze information present in previous frames to estimate the location in the current frame.

Popular RVL methods include visual odometry (VO) and visual simultaneous localization and mapping (VSLAM).

  • Visual Odometry (VO): VO compares the current frame with the previous one, using optical flow analysis to estimate the change in position between the two frames. It adds the estimated change to the previous pose to obtain the current location. Nistér et al. first proposed this method, estimating the motion of ground vehicles with a 5-point algorithm to find the essential matrix and random sample consensus (RANSAC) for outlier rejection.
  • Visual Simultaneous Localization and Mapping (VSLAM): VSLAM involves localizing a vehicle while constructing a map of its surroundings. The method estimates landmark positions by analyzing multiple previous locations, allowing for dynamic adjustment of both the map and the vehicle's estimated pose.

Absolute Visual Localization (AVL)

AVL approaches involve reference-based localization, which uses precise geo-referenced data to localize a UAV. The reference data can be aerial satellite images or open-source cartographic systems such as Google Earth.

AVL performed by matching UAV with a reference satellite map

The methods use the images to construct offline maps and match current images with the information in these maps for localization. Matching techniques involve templates, feature points, deep learning, and visual odometry matching.

  • Template Matching: The technique uses the current UAV image as a template and matches it against a reference map, using a least-squares criterion to assess the similarity between the two and locate the UAV.
  • Feature-points Matching: The methodology involves feature point detection and feature extraction. Feature detection searches for critical environment-invariant features that do not change with different illumination, scale, and rotation scenarios. Feature extraction gets relevant feature vectors around the detected feature points using methods like gradient histograms from scale-invariant feature transform (SIFT).
  • Deep Learning: Deep learning techniques use convolutional neural networks (CNNs) to match current locations with reference maps. The method involves training a neural net on image data containing visual information of multiple locations. Once trained, the model can infer the location of a moving UAV in a particular area.
  • Visual Odometry: VO in AVL differs slightly from VO in RVL. In AVL, VO uses a reference database with multiple images to compare with the current image and localize a UAV.
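To make the template-matching idea concrete, here is a minimal sketch that slides a small UAV view over a toy reference map and scores each position with the sum of squared differences (a least-squares similarity measure). The function name, the tiny intensity grids, and the noise values are assumptions made for illustration. A production system would work on real imagery and usually rely on an optimized library routine.

```python
def match_template(reference, template):
    """Return the (row, col) of the best-fitting top-left corner and its score.

    The score is the sum of squared differences, so lower is better.
    """
    ref_h, ref_w = len(reference), len(reference[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_score, best_pos = float("inf"), None
    for r in range(ref_h - tpl_h + 1):
        for c in range(ref_w - tpl_w + 1):
            score = sum(
                (reference[r + i][c + j] - template[i][j]) ** 2
                for i in range(tpl_h)
                for j in range(tpl_w)
            )
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


# Toy 5x6 "reference map" of pixel intensities.
reference_map = [
    [10, 10, 12, 11, 10, 10],
    [10, 50, 80, 52, 10, 11],
    [11, 60, 90, 61, 12, 10],
    [10, 10, 11, 95, 70, 10],
    [10, 12, 10, 75, 99, 10],
]
# Simulated UAV view: the patch at rows 2-3, cols 2-3 with slight sensor noise.
uav_view = [
    [91, 60],
    [12, 94],
]
position, score = match_template(reference_map, uav_view)
print(position, score)
```

The exhaustive search shown here scales with the map size, which is one reason large reference maps are expensive to match against.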

The Issue of Drift in RVL

RVL methods involve estimating the current location using location estimates from previous frames. The process results in considerable error accumulation and causes drift.
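The following sketch illustrates why chaining frame-to-frame estimates drifts. It integrates relative motions into absolute poses twice: once with the true motion and once with an estimate that carries a tiny constant heading bias. The bias size, step length, and function names are assumptions chosen for the example, not measurements from a real VO system.

```python
import math


def integrate(increments, start=(0.0, 0.0, 0.0)):
    """Chain relative motions (forward distance, heading change) into absolute poses."""
    x, y, theta = start
    poses = [(x, y, theta)]
    for forward, turn in increments:
        theta += turn
        x += forward * math.cos(theta)
        y += forward * math.sin(theta)
        poses.append((x, y, theta))
    return poses


STEPS = 200
true_motion = [(1.0, 0.0)] * STEPS  # straight line, 1 m per frame
# Assumed estimator error: a constant 0.1 degree heading bias per frame.
estimated_motion = [(1.0, math.radians(0.1))] * STEPS

truth = integrate(true_motion)
estimate = integrate(estimated_motion)


def position_error(k):
    """Distance between estimated and true position after k frames."""
    return math.hypot(estimate[k][0] - truth[k][0], estimate[k][1] - truth[k][1])


for k in (10, 50, 200):
    print(k, round(position_error(k), 2))
```

Each per-frame error is negligible on its own, yet the position error keeps growing because every new pose inherits all earlier errors.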

Combining inertial data from inertial navigation systems (INS) with visual odometry, an approach known as visual-inertial odometry (VIO), helps reduce drift.

For example, Mourikis and Roumeliotis introduced the Multi-State Constraint Kalman Filter (MSCKF), a VIO algorithm that uses an Extended Kalman Filter (EKF) to estimate visual features and filter states. The EKF recursively estimates the system state by combining predictions from a motion model with updates from visual measurements, helping to reduce drift.
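The predict-and-update cycle described above can be sketched with a scalar, linear Kalman filter. This is a deliberately simplified stand-in and not the MSCKF or a full EKF: the state is a single position, the motion model is a plain displacement, and the noise variances and bias are assumed values chosen for the example.

```python
def kalman_step(x, p, u, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: prior state estimate and its variance
    u:    displacement predicted by the motion model (e.g. from odometry)
    z:    direct measurement of the state (e.g. from a visual landmark)
    q, r: process and measurement noise variances
    """
    # Predict: apply the motion model; uncertainty grows.
    x_pred = x + u
    p_pred = p + q
    # Update: blend in the measurement; uncertainty shrinks.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


x, p = 0.0, 1.0
for true_pos in range(1, 6):
    biased_odometry = 1.1         # motion model overestimates each 1 m step
    visual_fix = float(true_pos)  # idealized, noise-free measurement
    x, p = kalman_step(x, p, biased_odometry, visual_fix, q=0.05, r=0.2)

dead_reckoning = 1.1 * 5  # what pure integration of the biased odometry gives
print(x, p, dead_reckoning)
```

In this toy setup, pure integration of the biased odometry keeps drifting, while the filtered estimate stays close to the true position because each measurement pulls it back.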

Zhang et al. further enhanced the MSCKF by modeling the covariance as an inverse Wishart distribution with a harmonic mean, developing a variational Bayesian adaptive algorithm that improves the filter's robustness.

Despite these advancements, RVL algorithms remain vulnerable to error accumulation and may not be suitable for applications demanding strict error bounds, such as security and defense.

AVL as an Alternative

AVL techniques generally outperform RVL methods in terms of localization accuracy, as they rely on fixed reference maps to compute the current location. This approach mitigates drift in state estimation and provides a deterministic error range.

However, AVL faces challenges in storing and managing the extensive datasets required for offline map registration and dealing with image variations caused by seasonal and environmental changes.

Researchers have proposed methods to address these challenges. Techniques like hierarchical localization and map compression can be employed to reduce the search space and manage large datasets efficiently.

For image variations, extracting features that are invariant to seasonal and environmental shifts has proven effective. Fragoso et al. proposed using deep learning to extract invariant features, which are then fed to traditional localization algorithms.

Evaluation Pipeline for Feature-based Image Registration

Experimental results show the approach significantly reduces errors and performs well in locations with seasonal variations.

Vision-Based Localization Techniques

VBL techniques usually fall into three categories: map-dependent, map-independent, and map-building methods. Moreover, as discussed earlier, obstacle avoidance and path planning are crucial components of a robust navigation system.

Vision-based UAV Navigation Methods and Systems

The sections below discuss the three categories of VBL techniques and briefly overview the methods involved in obstacle detection and path planning.

Map-independent Navigation

Map-independent navigation systems work without a reference map and include two sub-methods: optical flow and feature tracking.

Optical Flow

Optical flow approaches can be further divided into global and local techniques. Global techniques require the smooth movement of neighboring image pixels, meaning the pixels in a local neighborhood should move coherently. In contrast, local optical flow methods assume the image flow is constant for all pixels within a small window.
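The local assumption (constant flow inside a small window) is the basis of the Lucas-Kanade method. The sketch below is a minimal, single-window version in plain Python: it builds two synthetic frames from an assumed smooth texture, then solves the 2x2 least-squares system for one flow vector. The texture function, window size, and shift are all assumptions made for the example, and the approach only holds for small, sub-pixel motions like the one used here.

```python
import math


def brightness(x, y):
    """Smooth synthetic texture (an assumption) with gradients in both x and y."""
    return math.sin(0.2 * x) + math.sin(0.25 * y)


def render(shift_x=0.0, shift_y=0.0, size=40):
    """Image of the texture after the scene has moved by (shift_x, shift_y) pixels."""
    return [
        [brightness(x - shift_x, y - shift_y) for x in range(size)]
        for y in range(size)
    ]


def lucas_kanade(frame1, frame2, cx, cy, half=10):
    """Estimate a single (u, v) flow vector for the window centred on (cx, cy)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # spatial gradient in y
            it = frame2[y][x] - frame1[y][x]                  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to solve for flow")
    # Solve [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


frame_a = render()
frame_b = render(shift_x=0.4, shift_y=-0.3)
u, v = lucas_kanade(frame_a, frame_b, cx=20, cy=20)
print(round(u, 2), round(v, 2))
```

The determinant check reflects a limitation of the local approach: in a window without texture in both directions, the system cannot be solved reliably.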

Santos-Victor et al. described an early optical-flow navigation method in a 1993 paper in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. In this study, a robot mimicked a bee's flying pattern to estimate an object's position.


Feature Tracking

As discussed earlier, feature-tracking methods are also used in AVL approaches. They match invariant features in reference images with features in the current image to localize objects.

Cho et al. used feature-based tracking to navigate a robotic spacecraft. They combined a probabilistic graphical model and two feature detectors with a feature-matching algorithm to estimate the spacecraft's six-degree-of-freedom (6-DOF) state. This state describes where the spacecraft is and how it is oriented in three-dimensional space.

Map-dependent Navigation

Map-dependent methods rely on spatial maps of the surrounding environment to navigate UAVs and direct their movements. Octree and occupancy grid maps are two ways to build 3D models with all the interconnections between entities in an environment.

For indoor environments such as halls, office rooms, and other plain spaces, 3D data is often projected onto 2D maps. 3D occupancy models are more helpful in complex and unstructured environments, where they use dynamically updated probability distributions to track height information.
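A common way to keep per-cell probabilities up to date is a log-odds occupancy grid. The sketch below is a minimal sparse 3D version, assuming a simple sensor model in which a hit means 70% occupied and a miss means 30% occupied. The class name, resolution, and sensor probabilities are illustrative assumptions, and the code is not taken from any of the cited papers.

```python
import math

L_OCC = math.log(0.7 / 0.3)   # assumed sensor model: a hit means 70% occupied
L_FREE = math.log(0.3 / 0.7)  # assumed sensor model: a miss means 30% occupied


class OccupancyGrid3D:
    """Sparse 3D occupancy grid storing log-odds per voxel (0.0 means unknown)."""

    def __init__(self, resolution=0.5):
        self.resolution = resolution
        self.log_odds = {}

    def _key(self, x, y, z):
        r = self.resolution
        return (math.floor(x / r), math.floor(y / r), math.floor(z / r))

    def update(self, x, y, z, hit):
        """Add evidence for the voxel containing the point (x, y, z)."""
        key = self._key(x, y, z)
        self.log_odds[key] = self.log_odds.get(key, 0.0) + (L_OCC if hit else L_FREE)

    def probability(self, x, y, z):
        """Occupancy probability of the voxel containing (x, y, z)."""
        value = self.log_odds.get(self._key(x, y, z), 0.0)
        return 1.0 - 1.0 / (1.0 + math.exp(value))


grid = OccupancyGrid3D()
for _ in range(3):
    grid.update(2.1, 0.3, 1.2, hit=True)  # three returns from the same barrier
grid.update(1.0, 0.3, 1.2, hit=False)     # one beam passed through open space

print(grid.probability(2.1, 0.3, 1.2))
print(grid.probability(1.0, 0.3, 1.2))
print(grid.probability(9.0, 9.0, 9.0))
```

Storing log-odds turns each Bayesian update into a single addition, and the dictionary keeps memory proportional to the observed space rather than the full volume.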

For instance, a paper published by Dryanovski et al. in Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems used a 3D occupancy grid to model information regarding barriers and open spaces. They used the model to navigate autonomous micro-air vehicles (MAVs).

Map-building Navigation

While building a map offline is a viable way to navigate UAVs, the method fails when the environment changes significantly, for example after a natural disaster.

Solutions that build maps while navigating in real time are better in such scenarios. Additionally, advances in VSLAM algorithms, which simultaneously estimate camera poses and reconstruct the structure of the environment, make map-building methods more accessible and efficient.

Map-building solutions fall into four types: indirect, direct, hybrid, and multi-sensor approaches.

Indirect Approach

Indirect approaches use feature detectors to extract rotation and perspective-invariant features for localization.

One example of an indirect approach is Han and Chen’s multi-camera method. It estimates position and builds a map in parallel, using ego-motion estimates (i.e., the camera's motion relative to the environment) from multiple cameras. The research also introduced a parametric calibration technique for cameras whose fields of view do not overlap.

Similarly, Davison solved the problem of real-time feature initialization with a single camera by using a top-down Bayesian framework to develop a sparse map in real time. The method allowed for robust localization when only a few features were available.

Direct Approach

Indirect techniques do not perform well in environments with little or no texture. In such cases, direct approaches are more helpful, as they can construct dense maps by capturing more interconnections between environmental elements than indirect approaches.

Additionally, direct methods use intensity information to optimize geometric parameters. The technique allows them to navigate UAVs in environments with different geometry and lighting conditions.

Hybrid Approach

As the name implies, the hybrid approach combines direct and indirect methods for localization. First, it initializes feature-based maps using indirect techniques. Second, it uses direct methods to refine the map for more optimal results.

For instance, Scaramuzza and Siegwart proposed a hybrid VO method that uses feature-based and appearance-based VO to estimate a ground vehicle's translation (position) and rotation (orientation).

Feature-based VO extracts distinctive features such as lines, edges, and curves from image frames. In contrast, appearance-based VO uses the intensity of every pixel in the image.

Multi-sensor Approach

The multi-sensor approach involves using many sensors, such as cameras, Light Detection and Ranging (LiDAR), and GPS sensors, to improve localization.

A lidar map of Lynnhaven Inlet, Virginia

The method is inspired by ground-based mobile robots using laser scanners to generate 3D point-cloud data. As UAVs decrease in size, experts can equip them with different sensors to generate more accurate data.

Siegwart et al. used EKF with multi-sensor fusion to build a navigation system that can work despite sensor input delays. The system also provides accurate altitude measurements for precise UAV control.

Obstacle Detection

Obstacle detection involves identifying obstacles in the UAV's path so that it can change course and avoid collisions. The approach uses two types of techniques: optical-flow and SLAM-based methods.

Optical-flow Methods

Optical-flow approaches use a single camera to process images and detect potential obstacles. Research by Lin and Peng uses the optical flow method to construct depth information from the captured images. Their approach reduces the payload of a quadcopter UAV and increases its flight endurance.

Bionic camera

Many methods also draw inspiration from bionic insect vision, as insects can quickly recognize nearby objects based on light intensity. Bionic insect vision builds upon an insect's ability to generate a visual flow signal using the image motion on the retina to capture spatial information.

SLAM-based Methods

Optical-flow methods often compute distances with low precision. In contrast, SLAM-based methods can generate accurate metric maps, allowing for a better understanding of the environment for navigation.

Esrafilian and Taghirad proposed a monocular vision-based autonomous flight and collision avoidance system for a quadrotor UAV, built on Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM). The method computes a sparse 3D point-cloud map of the UAV's surroundings and generates a collision-free path using the potential field approach and the rapidly exploring random trees (RRT) algorithm.

The potential field approach represents obstacles as repulsive forces and the target as an attractive force, guiding the UAV toward the goal while avoiding obstacles. The RRT algorithm efficiently explores the environment and finds a feasible path by incrementally building a tree of possible trajectories.
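The potential field idea can be sketched in a few lines: the goal contributes an attractive force, each nearby obstacle contributes a repulsive one, and the vehicle takes small steps along the net force. The code below is a minimal 2D illustration with assumed gains, influence radius, and step size. It is not the planner from the cited work, and it does not include the RRT component.

```python
import math


def potential_field_path(start, goal, obstacles, step=0.1, influence=2.0,
                         k_att=1.0, k_rep=5.0, max_steps=500, tol=0.2):
    """Follow the net force direction in fixed-size steps until near the goal."""
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        if math.hypot(goal[0] - x, goal[1] - y) < tol:
            break
        # Attractive force pulls straight toward the goal.
        fx = k_att * (goal[0] - x)
        fy = k_att * (goal[1] - y)
        # Each obstacle inside its influence radius pushes the vehicle away.
        for ox, oy in obstacles:
            d = math.hypot(x - ox, y - oy)
            if 0.0 < d < influence:
                mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
                fx += mag * (x - ox) / d
                fy += mag * (y - oy) / d
        norm = math.hypot(fx, fy)
        if norm == 0.0:
            break  # forces cancel: stuck in a local minimum
        x += step * fx / norm
        y += step * fy / norm
        path.append((x, y))
    return path


goal = (10.0, 0.0)
obstacles = [(5.0, 1.0)]
path = potential_field_path((0.0, 0.0), goal, obstacles)
closest = min(math.hypot(px - 5.0, py - 1.0) for px, py in path)
print(len(path), path[-1], closest)
```

A known weakness of pure potential fields is that attractive and repulsive forces can cancel and trap the vehicle in a local minimum, which is one motivation for pairing them with a sampling-based planner such as RRT.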

Path Planning

The last component in a navigation system is path planning, which involves computing the optimal path from the starting point to the target location. The optimality criterion can combine multiple measures, such as flight duration, cost of work, and route length.

We can categorize path planning methods as global and local.

Global Path Planning

Global path planning algorithms create a static global map and generate a path from the starting point to the target location. Two methods within global path planning involve heuristic and intelligence-based approaches.

  • Heuristic Approach: This approach often uses the A-star algorithm, which computes a cost function to determine the optimal path. Kim et al. proposed a hybrid hardware and software framework that selects a suitable path using a modified A-star algorithm.
  • Intelligence-based Approach: This method uses smart search algorithms to find an optimal path. Genetic and simulated annealing algorithms are two popular techniques for global path planning.
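As an illustration of the heuristic approach, here is a minimal A-star search on a small static grid map. It uses unit step costs, 4-connected moves, and a Manhattan-distance heuristic. The grid layout and function name are assumptions for the example, and this is the textbook algorithm rather than the modified variant proposed by Kim et al.

```python
import heapq


def a_star(grid, start, goal):
    """Shortest 4-connected path on a grid of '.' (free) and '#' (blocked) cells."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost_so_far = {start: 0}
    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if cost > cost_so_far[current]:
            continue  # stale queue entry
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < cost_so_far.get((nr, nc), float("inf")):
                    cost_so_far[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    priority = new_cost + heuristic((nr, nc))
                    heapq.heappush(open_heap, (priority, new_cost, (nr, nc)))
    return None  # goal unreachable


static_map = [
    "..#.",
    ".##.",
    "....",
]
route = a_star(static_map, (0, 0), (2, 3))
blocked = a_star(["..#.", ".##.", "..#."], (0, 0), (2, 3))
print(route, blocked)
```

The cost function here is simply path length plus the heuristic. In a UAV setting, the same structure could weigh flight duration or energy instead.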

Local Path Planning

Local path planning techniques estimate a local path for collision avoidance. The method is suitable for dynamic environments where the elements change frequently, and computing a static global map is infeasible.

Local path-planning approaches update multiple environmental properties through sensors, such as unknown objects' shape, size, and height. Search methods include artificial potential fields, fuzzy logic, and neural networks.

For instance, Liu and Xu used a Hopfield neural network for route planning. Similarly, Theile et al. published a paper in Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) that used reinforcement learning (RL) to create a control policy for UAVs.

The policy plans UAV paths within a limited battery budget and supports multiple start and landing locations.

Applications of Vision-based Localization

The advancements in VBL techniques have increased adoption across various industries, enabling autonomous navigation and enhancing user experiences. 

As the technology continues to evolve, the potential for new and innovative applications of VBL grows. The following subsections discuss some key domains that significantly benefit from VBL.

Autonomous Vehicles

VBL plays a crucial role in localizing autonomous vehicles by estimating their position and orientation in real time. Waymo, a subsidiary of Alphabet Inc., is a prime example of a VBL application in self-driving cars. 

Waymo's autonomous vehicles use a combination of LiDARs (Light Detection and Ranging sensors), GPS, and information from Google Maps to construct accurate environmental maps for localization.

LiDARs use laser beams to measure distances and create 3D representations of their surroundings. Additionally, Waymo's vehicles employ recurrent neural networks to predict human behavior and avoid accidents on the road.

Augmented Reality

Augmented Reality (AR) applications heavily rely on VBL to provide users with a seamless and immersive experience by overlaying digital content in the real world. 

Google's Visual Positioning System (VPS) is an excellent example of how VBL enhances real-time navigation in AR. The app uses images the user's camera captures to determine their location and orientation, comparing features of buildings, bridges, and other elements with a pre-built VPS index. 

This enables the app to provide accurate directions and location-specific information. Other popular AR applications that use VBL include Pokemon Go, which overlays virtual creatures in real-world environments, and Snapchat filters, which track facial features to apply digital masks and effects.

Robotics Navigation in GPS-denied Environments

Navigating robots in environments where GPS signals cannot reach requires robust VBL systems for efficient localization and collision avoidance.

For instance, Amazon warehouses employ industrial and autonomous mobile robots to navigate around the warehouse floor using VBL techniques.

Vision-based Target Localization from Drones

Drones equipped with VBL capabilities enable users to accurately localize specific targets in remote or hard-to-reach locations, making them valuable tools for surveillance, security, and reconnaissance missions. 

RQ-11B Raven

The U.S. military, for instance, uses the RQ-11B Raven UAV for surveillance and target acquisition in conflict zones. The UAV's onboard cameras and image processing algorithms allow military forces to accurately identify and track desired targets.


Beyond military applications, VBL-enabled drones have significant potential in civilian domains, such as search and rescue operations, where they can help locate missing persons in challenging terrains, or wildlife conservation, where they can monitor and track animal populations in vast habitats.

VBL can also assist surgeons and other medical professionals by precisely localizing anatomical structures during image-guided procedures.


Advantages of Vision-Based Localization

The sections above demonstrate the significant role of VBL in modern technology. The following list summarizes the key benefits of VBL, highlighting its main contributions and advantages over traditional localization methods:

  • GPS-Independent: VBL systems can provide accurate localization even in environments with weak or no GPS signals, such as indoor spaces, rough terrains, and urban areas with tall buildings, bridges, and subways. This makes VBL a reliable alternative to GPS-based navigation in challenging environments.
  • Rich Data Source: VBL systems use multiple sensors and cameras to capture high-resolution images, providing a wealth of visual information. The rich features and context extracted from these images enable advanced computer vision techniques, such as object detection and semantic segmentation, which can greatly improve the navigation capabilities of autonomous vehicles.
  • Scalability: The low cost of cameras makes VBL systems highly scalable and accessible to a wide range of applications and users. This allows for the rapid deployment of VBL frameworks on small UAVs and mobile robots, enabling their use in various scenarios, from industrial inspections to agricultural monitoring.
  • Adaptability: VBL techniques are designed to handle dynamic environments, adapting to changes in lighting conditions, weather, and scene structure. This adaptability makes VBL systems more robust and reliable than other localization methods that may struggle in varying environmental conditions.

Challenges of Vision-Based Localization

Although VBL is a promising technology, users still face challenges when implementing a VBL navigation system to operate an autonomous vehicle.

The sections below highlight a few prominent issues and mitigation strategies to help you develop an effective VBL solution.

Lighting Variations

Changing lighting conditions due to seasonal variations, glare, shadows, and occlusions can significantly impact VBL performance. Under such conditions, sensors may fail to capture image details and features accurately, leading to poor localization output.

Using sensors with high dynamic ranges and employing advanced algorithms with adaptive histogram equalization can help mitigate issues arising from unfavorable lighting conditions.
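To show what histogram equalization does to a poorly lit image, the sketch below implements the global version in plain Python. It is a simplified stand-in for the adaptive variant mentioned above, which applies the same idea per image tile. The function name and the tiny sample patch are assumptions for the example.

```python
def equalize(image, levels=256):
    """Global histogram equalization for a 2D list of integer intensities."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of intensities.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:
        return [row[:] for row in image]  # single-valued image: nothing to stretch
    # Lookup table mapping each input level to a stretched output level.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Dim, low-contrast patch: all values squeezed into 50..53.
dark = [
    [50, 50, 51, 51],
    [51, 52, 52, 53],
]
bright = equalize(dark)
print(bright)
```

After equalization, the four nearly identical gray levels are spread across the full intensity range, which makes edges and corners easier for a feature detector to pick up.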

Dynamic Environments

VBL may perform poorly in environments with moving objects and people. Localization in these dynamic scenarios becomes challenging as the scene changes frequently, making estimating a UAV’s position difficult.

Optical flow approaches modeled on a bee’s visual flow work well in these situations as they can quickly detect moving objects. In addition, optical flow methods can help with real-time analysis in situations where static ground truth design information is missing.

Multi-sensor fusion techniques also help improve localization in dynamic scenes as multiple sensors capture rich information regarding static and moving entities.

Computational Cost

Efficient VBL systems must quickly process high-quality images with complex visual information. However, processing power becomes an issue with small drones and mobile robots that cannot carry large payloads.

A mitigation strategy can involve using GPUs to accelerate processing. Edge computing techniques with multiple robots can also help, as connected devices can share workloads for faster processing.

Vision-based Localization: Key Takeaways

VBL is still a developing field, with researchers trying to build powerful frameworks and algorithms for better navigational flexibility and robustness. Below are a few key points to remember regarding VBL.

  1. Vision-based Localization (VBL) vs Global Positioning Systems (GPS): GPS-based localization performs well only in open outdoor environments with clear satellite visibility. VBL techniques are better suited to indoor, cluttered, and unstructured environments where GPS signals are weak or unavailable.
  2. Relative Visual Localization (RVL): RVL approaches use frame-by-frame analysis to localize objects. They include visual odometry (VO) and visual simultaneous localization and mapping (VSLAM) techniques. RVL’s most significant limitation is drift, or error accumulation.
  3. Absolute Visual Localization (AVL): AVL methods use a fixed reference map to localize objects. They involve template and feature matching, deep learning, and VO approaches. AVL techniques do not produce drift. However, processing large reference maps and matching images captured at different times against a static reference image is challenging.
  4. VBL Techniques: Besides RVL and AVL, researchers further categorize VBL techniques as map-independent, map-dependent, and map-building methods.
  5. VBL Challenges: Implementing VBL systems involves challenges such as lighting variations, dynamic environments, and computational costs. However, high-dynamic-range sensors, multi-sensor fusion, and GPUs can mitigate these issues.

Written by

Haziqa Sajid

Frequently asked questions
  • VBL is a technique for estimating an object’s position and orientation using visual data captured by cameras and sensors.

  • VBL works by analyzing images from cameras and sensors to extract features or process image pixels to compute an object’s position.

  • RVL methods analyze previous frames to calculate an object’s location in the current frame. Dynamic environments make estimation challenging and result in considerable error accumulation.

  • AVL methods match current images to a reference map. Variations in weather and lighting can significantly alter the current image from the one in the reference map, making it challenging to match features between the two.

  • Yes. Warren et al. introduced a method that uses visual odometry in an AVL system to estimate accurate locations at very low altitudes.

  • VBL can estimate locations in closed GPS-denied environments. It collects rich visual data for computing position and orientation, and VBL systems are more scalable due to low-cost cameras and sensors.

  • Convolutional neural networks (CNNs) are the most common framework used for VBL. Scale-invariant feature transform (SIFT), Features from Accelerated Segment Test (FAST), and Binary Robust Independent Elementary Features (BRIEF) are a few algorithms used for feature extraction.

  • Research is currently underway to integrate deep learning frameworks, edge computing, and advanced sensor fusion techniques into vision-based localization.