Ulrik Stig Hansen
Published April 14, 2023 · Edited May 18, 2023 · 4 min read

Object Detection: Models, Use Cases, Examples


Object detection uses a range of ML-based models to identify where objects or instances of objects are within images or videos. It’s a vital part of computer vision and is useful for anything from healthcare to surveillance. 

In this article, we are going to cover: 

  • What is object detection? 
  • How does object detection work? 
  • Object detection models, use cases, examples, and more!

Let’s dive in…

What is Object Detection?

Object detection is a valuable component of computer vision, making it possible to use algorithms and machine learning-based models to identify objects and instances of objects in images and videos. 

Objects can include anything from people, faces, buildings, animals, cars, and millions of other individual physical entities. 

The aim of object detection is simple: Identify what objects are in an image or video and where they are. 

Before we go in-depth on everything you need to know about object detection, it’s important to clarify what it isn’t. We need to compare it with image classification and image segmentation. 

Object Detection vs. Image Classification

Unlike object detection, image classification puts an entire image (or video frame) through a machine learning classifier, such as a deep neural network. This process generates tags or labels, and then human annotators or AI-based automated labeling software must apply those labels to the correct object in the images. 

Object detection is more advanced, giving annotation teams the advantage of placing bounding boxes around classified objects. 
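The difference is easiest to see in the shape of the output. The sketch below is purely illustrative: `classify` and `detect` are hypothetical stand-ins for real models, with hard-coded results showing that a classifier returns one label per image while a detector returns a label and a bounding box per object.

```python
# Illustrative only: hypothetical stand-ins for real models, with
# hard-coded outputs that show the shape of each task's result.

def classify(image):
    """An image classifier returns a single label for the whole image."""
    return {"label": "dog", "score": 0.91}

def detect(image):
    """An object detector returns a label AND a location per object."""
    return [
        {"label": "dog",    "score": 0.88, "box": (34, 50, 210, 190)},   # (x1, y1, x2, y2)
        {"label": "person", "score": 0.95, "box": (220, 10, 400, 300)},
    ]

image = "frame_0001.jpg"  # placeholder path
print(classify(image)["label"])           # one tag for the whole frame
print([d["box"] for d in detect(image)])  # a bounding box per detected object
```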

Object detection vs. Image classification vs. Image segmentation

Object Detection vs. Image Segmentation

Segmentation is a way of defining the pixels of an object class within images or video frames in computer vision datasets. 

With semantic image segmentation, every pixel belonging to a tag or label will be identified. However, this approach won’t define the boundaries of the objects in an image. 

Object detection, sometimes known as object recognition, won’t segment objects based on pixels. But it will pinpoint the location of objects or object instances within boxes. 

You can combine semantic image segmentation with object detection, creating instance segmentation. This way, object instances are detected, and each is then segmented within bounding boxes or other annotation approaches. 


Why is Object Detection Important in Computer Vision?

Object detection plays a valuable role in computer vision and artificial intelligence. In CV projects and applications, object detection is used to detect objects or instances of objects in image or video-based datasets. 

Pros of Object Detection 

Object detection models and algorithms can detect any object that occupies between 2% and 60% of the space within an image or video frame. 

You can use it to detect objects with clear boundaries, identify objects within clusters, and localize objects at up to 15 fps (frames per second) within videos. Object detection is an incredibly useful and versatile way to detect objects in images and videos. 
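The 2%-to-60% guideline above can be expressed as a simple area-fraction check. This is a hedged sketch: the thresholds come from the article, and the helper names are my own, not part of any detection library.

```python
def box_area_fraction(box, frame_w, frame_h):
    """Fraction of the frame occupied by an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (frame_w * frame_h)

def in_detectable_range(box, frame_w, frame_h, lo=0.02, hi=0.60):
    """Rule of thumb from the article: objects covering ~2-60% of the frame."""
    return lo <= box_area_fraction(box, frame_w, frame_h) <= hi

# A 128x96 box in a 640x480 frame covers 4% of its area.
print(box_area_fraction((0, 0, 128, 96), 640, 480))    # 0.04
print(in_detectable_range((0, 0, 128, 96), 640, 480))  # True
```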

Cons of Object Detection

However, there are certain use cases whereby a different method is required, such as: 

  • Elongated objects: use instance segmentation instead
  • Objects that lack a physical presence but appear in an image (e.g., a dark day): use classification
  • Objects that lack clear boundaries or are at difficult angles: semantic segmentation would be better

The reason other methods are required is that object detection does have its limitations. It’s more effective with objects that are definable, solid, and take up anything from 2% to 60% of an image or frame in a video. 

Apart from these examples, object detection is a valuable component of computer vision for thousands of real-world use cases and applications. 

Deep Learning and Object Detection 

Deep learning has made modern object detection models, algorithms, and most real-world applications possible. 

Before deep learning, object detection was far less advanced. Now, with the influence of deep learning algorithms and models (such as YOLO, SSD, and R-CNN), we have one-stage and two-stage object detection algorithms, making the use cases and applications for object detection much broader and deeper, including countless examples in computer vision. 

Deep learning and other advances in machine learning (ML) and AI have accelerated and improved object detectors. In computer vision, this has shaped how object detection can be used in supervised and unsupervised annotation and data labeling. 

Object detection in action, YOLOv8 used in Encord

(Source: Encord and the Airbus Aircraft Detection dataset)

Person Detection

Person detection is an important use case, or variant, of object recognition in action. It’s also a real-time object detection use case, especially if deployed in a travel or security environment. 

Identifying a “person” as a primary object class in video surveillance, satellite imagery (e.g., SAR), and facial recognition is a valuable example of an object detection use case. 

Most person detection models are trained to identify people based on front-facing and asymmetric images or video frames. 

Latest Object Detection Advances in Computer Vision

As with any technological innovation, advances in object detection for computer vision continue to accelerate based on hardware, software, and algorithmic model developments. 

Now, object detection (or object recognition) is more widespread than ever, mainly thanks to continuous advancements in AI imaging technology, platforms, software, and open-source tools. Computing power keeps increasing, with multi-core processors, AI accelerators, Tensor Processing Units (TPUs), and Graphics Processing Units (GPUs) supporting advances in computer vision. 

A combination of hardware, software, and algorithmic updates has taken object detection to the edge: Edge AI, also known as the Intelligent Edge or Distributed Edge. In other words, it’s easier to incorporate object detection with on-device machine learning (also known as AIoT) and computer vision models. 

For example, a model within a camera correctly identifies someone or something (an object) against a database and sends an automatic alert based on pre-programmed processes. Edge AI is making real-time object detection more affordable as a commercial application. 
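The match-and-alert step described above can be sketched in a few lines. Everything here is hypothetical: the watchlist IDs, the detection dictionaries, and the score threshold are illustrative placeholders; on a real edge device the detections would come from an on-device model, not a hard-coded list.

```python
# Minimal sketch of the alerting step. All names and values are
# hypothetical placeholders, not a real device's API.

WATCHLIST = {"person_4821", "vehicle_0007"}  # hypothetical database of matched IDs

def process_detections(detections):
    """Return alerts for any detection whose matched ID is on the watchlist."""
    alerts = []
    for det in detections:
        if det["match_id"] in WATCHLIST and det["score"] >= 0.8:
            alerts.append(f"ALERT: {det['match_id']} seen by {det['camera']}")
    return alerts

frames = [
    {"match_id": "person_1234", "score": 0.93, "camera": "gate_a"},
    {"match_id": "person_4821", "score": 0.97, "camera": "gate_b"},
]
print(process_detections(frames))  # ["ALERT: person_4821 seen by gate_b"]
```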

Object detection, as deployed for facial recognition 

How Does Object Detection Work?

Traditional object detection methods use image processing, so they don’t need historical data or supervision. OpenCV, an open-source computer vision library, supports this approach and is used for facial recognition and other projects. 

In most cases, one-stage or two-stage deep learning methods are used to implement object detection (more about one-stage vs. two-stage below). 

When image or video datasets are involved, this can make object detection an integral part of supervised or semi-supervised, automated, or manual data labeling and annotation work. 

Deep learning object detection works in numerous ways, depending on the particular algorithms and models being deployed (e.g., YOLO, SSD, R-CNN, etc.). 

In most cases, these models are integrated into other systems and are only one part of the overall process of detecting, labeling, and annotating objects in images or videos. And this includes multi-object tracking for computer vision projects. 
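One step these pipelines commonly share is non-maximum suppression (NMS): a detector produces many overlapping candidate boxes, and NMS keeps the highest-scoring one while discarding near-duplicates measured by intersection-over-union (IoU). The sketch below is a simplified, self-contained version of that standard step, not any particular library's implementation.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    remaining = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best["box"], d["box"]) < iou_threshold]
    return kept

candidates = [
    {"box": (10, 10, 110, 110), "score": 0.9},
    {"box": (12, 12, 112, 112), "score": 0.8},  # near-duplicate of the first
    {"box": (200, 200, 300, 300), "score": 0.7},
]
print(len(nms(candidates)))  # 2: the duplicate is suppressed
```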

One-stage vs. Two-stage Deep Learning Object Detectors

There are two main approaches and ways to implement object detection: using one-stage or two-stage object detectors. 

Both approaches find the number of objects in an image or video frame and classify those objects or object instances while estimating size and positions using bounding boxes. 

To put it simply, one-stage models group those tasks into a single pass, taking one attempt to produce the desired outcomes. Popular one-stage detectors include YOLO (including v8), RetinaNet, and SSD. 

Two-stage object detection models take those two steps separately, hence the two-stage approach. Popular two-stage detectors include R-CNN, Faster R-CNN, Mask R-CNN, and the latest model, G-RCNN. 

One-stage vs. two-stage object detection 

Object Detection Use Cases and Applications

Object detection has numerous real-world applications and use cases. 

It’s an integral part of computer vision (CV), ML, and AI projects and software worldwide in dozens of sectors, including healthcare, automotive (self-driving cars), retail, eCommerce, art, ecology, agriculture, zoology, travel, satellite imagery, and surveillance. 

Some of the most common real-world object detection use cases include: 

  • Scanning and verifying faces against passports at airports (using a one-stage, one-shot learning model)
  • Detecting objects to guide the autopilot software of self-driving cars
  • Animal monitoring in agriculture, zoos, and in the wild
  • Ensuring people on the “No Fly” list can’t get through security gates at airports
  • Detecting objects and how customers spend their time in retail stores
  • Detecting branded products mentioned on social media, an AI-based approach known as “Visual Listening”
  • Powering art gallery apps that let visitors scan a picture and learn everything about it, from its history to its most recent valuation

With modern advances, such as Edge AI, real-time object detection is possible, including for commercial applications such as security and travel. 

Object Detection Development Milestones 

Object detection isn’t as new as most people imagine. It’s an AI discipline that’s evolved over the last 20 years. 

Object detection changed in 2014, once the evolution of Deep Learning Detection started to shape the models that make object detection possible. 

Before 2014, traditional approaches dominated, starting in 2001 with the Viola-Jones Detector, a pioneering machine learning algorithm for object detection. The HOG Detector was launched in 2006, and then DPM in 2008, introducing bounding box regression. 

Once deep learning-based detection took hold in 2014, a series of two-stage object detection algorithms and models were developed over the years, including R-CNN and its iterations (Fast R-CNN, Faster R-CNN, Mask R-CNN, and G-RCNN). We cover those in more detail below. 

One-stage object detection algorithms followed: YOLO (with subsequent iterations, up to version 8), SSD in 2016, and RetinaNet in 2017. 

Most Popular Object Detection Algorithms 

In this section, we will review some of the most popular object detection models, including YOLO, SSD, and R-CNN.

YOLO: You Only Look Once

YOLO (You Only Look Once) is a popular set of real-time object detection models used for computer vision. It was developed by Joseph Redmon, Ali Farhadi, and Santosh Divvala, aiming to achieve highly accurate object detection results faster than other models. 

YOLO uses a one-stage detection approach, processing images in a single pass, unlike approaches such as R-CNN that take two steps. Hence the name: you only look once. YOLO is part of a family of one-stage object detection models built on convolutional neural networks (CNNs). 
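A core idea behind that single pass, as described in the original YOLO paper, is dividing the image into an S×S grid, where the cell containing an object's center is responsible for predicting it. The sketch below illustrates only that cell-assignment idea, not the network itself.

```python
def responsible_cell(box, img_w, img_h, S=7):
    """Return the (row, col) of the S x S grid cell containing the box center.

    In the original YOLO, the image is split into an S x S grid, and the
    cell containing an object's center is responsible for predicting it.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# A box centered at (320, 240) in a 640x480 image lands in the middle cell.
print(responsible_cell((280, 200, 360, 280), 640, 480))  # (3, 3)
```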

The most recent version of YOLO is YOLOv8, developed by Ultralytics. It makes sense to use this model if you prefer YOLO over other object detection models. Its performance (measured in mean average precision (mAP)), speed (in fps), and accuracy are better, while its computing cost is lower.

Different YOLO Object Detection Models

(Source: YouTube)

SSD: Single-shot Detector

Single-shot Detector (SSD) is another one-stage detector model that can identify and predict different classes and objects. 

SSD does this using a deep neural network (DNN), adapting the output spaces of bounding boxes in images and video frames and then generating scores for object categories in default boxes. 
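The "default boxes" mentioned above are a fixed set of boxes tiled over a feature map at several aspect ratios, which the network then scores per class. The sketch below generates such a grid of default boxes in the spirit of the SSD paper's scheme; it is a simplified illustration, and the function name and parameters are my own, not SSD's actual API.

```python
def default_boxes(feature_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Tile default (anchor) boxes over a feature_size x feature_size grid.

    Returns (cx, cy, w, h) boxes in relative [0, 1] coordinates: one box
    per aspect ratio, centered on each cell of the feature map.
    """
    boxes = []
    for i in range(feature_size):
        for j in range(feature_size):
            cx = (j + 0.5) / feature_size  # cell center, relative coordinates
            cy = (i + 0.5) / feature_size
            for ar in aspect_ratios:
                w = scale * ar ** 0.5      # wider boxes for larger ratios
                h = scale / ar ** 0.5
                boxes.append((cx, cy, w, h))
    return boxes

boxes = default_boxes(feature_size=4, scale=0.2)
print(len(boxes))  # 4 x 4 cells x 3 aspect ratios = 48
```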

SSD has a high accuracy score, is easy to train, and can integrate with software and platforms that need an object detection feature. It was first developed and released in 2015 by a team of academics and data scientists. 

R-CNN: Region-based Convolutional Neural Networks

Region-based convolutional neural networks, or regions/models that use CNN features, known as R-CNNs, are innovative ways to use deep learning models for object detection. 

An R-CNN works by selecting several regions from an image, such as an anchor box. A pre-defined class of labels is given to the model first. It uses these to label the categories of objects in the images and other offsets, such as bounding boxes. 

R-CNN models can also divide images into almost 2,000 region proposals and then apply a convolutional neural network to every region in the image. However, it isn’t a one-stage process, so it might take slightly more time. 
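The two-stage flow just described can be sketched as: propose regions, then classify each one. In the sketch below, `propose_regions` and `classify_region` are hypothetical stand-ins (real R-CNN uses selective search and a per-region CNN); only the overall control flow reflects the approach.

```python
# Conceptual sketch of the R-CNN flow: propose regions, then classify each.
# `propose_regions` and `classify_region` are hypothetical stand-ins for
# selective search and the per-region CNN, respectively.

def propose_regions(image, n=2000):
    """Stand-in for selective search: ~2,000 candidate (x1, y1, x2, y2) boxes."""
    return [(i % 50 * 10, i // 50 * 10, i % 50 * 10 + 40, i // 50 * 10 + 40)
            for i in range(n)]

def classify_region(image, box):
    """Stand-in for the per-region CNN; returns (label, score)."""
    return ("background", 0.99)  # a real CNN would score every object class

def rcnn_detect(image):
    detections = []
    for box in propose_regions(image):              # stage 1: region proposals
        label, score = classify_region(image, box)  # stage 2: classify each crop
        if label != "background" and score > 0.5:
            detections.append({"box": box, "label": label, "score": score})
    return detections

print(len(propose_regions("img.jpg")))  # 2000 candidate regions per image
```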

The development of R-CNN models started in 2013, and one application of this approach is to enable object detection in Google Lens. 


How to Implement More Effective Object Detection With Encord 

With Encord and Encord Active, object detection comes as standard within our toolkit. Encord’s AI-powered automated annotation and labeling tools are trusted by world-leading AI teams, making it faster and easier to develop a production-ready CV, AI, or ML model. 

Encord was created to improve the efficiency of image and video data labeling for computer vision projects, including one-stage and two-stage object detection. Our solution makes managing a team of annotators easier, saving time and money while reducing errors, bugs, and bias. 

Make data labeling more collaborative and easier to manage with an interactive dashboard and customizable annotation toolkits. Improve the quality of your computer vision datasets, and enhance model performance

Object Detection Key Takeaways 

Object detection is a fundamental cornerstone of computer vision. It’s become even more valuable with advances in deep learning in recent years. 

Detecting objects continues to be a challenge that data science, AI, and ML are getting better at. As new approaches, models, and algorithms evolve, so will the applications and real-world use cases in which object detection can be deployed. 

Ready to accelerate and automate your data annotation and labeling? 

Sign up for an Encord Free Trial: the Active Learning Platform for Computer Vision, used by the world’s leading computer vision teams. 

AI-assisted labeling, model training and diagnostics, and finding and fixing dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster. Try Encord for free today.

Want to stay updated?

  • Follow us on Twitter and LinkedIn for more content on computer vision, training data, and active learning.
  • Join the Slack community to chat and connect.

Get the latest machine learning news and insights