Software To Help You Turn Your Data Into AI
Forget fragmented workflows, annotation tools, and Notebooks for building AI applications. Encord Data Engine accelerates every step of taking your model into production.
Contents
Object detection is a fascinating field in computer vision. It is tasked with locating and classifying objects within an image or video frame. The challenge lies in the model's ability to identify objects of varying shapes, sizes, and appearances, especially when they are partially occluded or set against cluttered backgrounds.
Deep learning has proven highly effective in object detection. Through training, deep learning models extract features like shape, size, and texture from images to facilitate object detection. They can also learn to classify objects based on the extracted features.
One widely used deep learning model for object detection is YOLO (You Only Look Once). YOLO is a single-shot object detection algorithm, meaning it detects objects in a single pass through the image. This makes YOLO very fast, but it can be less accurate than two-stage object detection algorithms.
Another renowned deep learning model for object detection is SSD (Single Shot MultiBox Detector). SSD is similar to YOLO but uses a distinct approach to detecting objects. SSD partitions the image into a grid of cells, and each cell predicts potential bounding boxes for objects that may be present in the cell. This makes SSD more accurate than YOLO, but it is also slower.
Object detection typically involves two primary components:
Assessing the performance, effectiveness, and limitations of object detection models is pivotal. You can employ several evaluation metrics to assess the accuracy and robustness of these models:
The choice of evaluation metric must align with the goals and nuances of the specific application. For instance, in traffic monitoring applications, mAP and IoU might be prioritized. Conversely, in medical imaging, where false alarms and missed detections can have serious implications, metrics such as FPR and FNR become highly significant.
The evaluation of object detection models is critically important for a myriad of reasons:
Mean average precision (mAP) is a metric used to evaluate the performance of object detection models. It is calculated by averaging the precision-recall curves for each object class. Precision quantifies the fraction of true positives out of all detected objects, while recall measures the fraction of true positives out of all actual objects in the image.
The AUC is a measure of the model's overall performance for that class, and it considers both precision and recall. By averaging these areas across all classes, we obtain mAP. The AUC score can be used to calculate the area under the precision-recall curve to get one number that describes model performance.
mAP is a popular metric for evaluating object detection models because it is easy to understand and interpret. It is also relatively insensitive to the number of objects in the image. A high mAP score indicates that the model can detect objects with both high precision and recall, which is critical in applications like autonomous driving where reliable object detection is pivotal to avoiding collisions.
A perfect mAP score of 1.0 suggests that the model has achieved flawless detection across all classes and recall thresholds. Conversely, a lower mAP score signifies potential areas of improvement in the model's precision and/or recall.
1. Generate the prediction scores using the model.
2. Convert the prediction scores to class labels.
3. Calculate the confusion matrix.
4. Calculate the precision and recall metrics.
5. Calculate the area under the precision-recall curve (AUC) for each class.
6. Average the AUCs to get the mAP score.
mAP is a widely used metric for evaluating object detection models in a variety of applications, such as:
Self-driving cars are one of the most promising applications of object detection technology. To safely navigate the road, self-driving cars need to be able to detect and track various objects, including pedestrians, cyclists, other vehicles, and traffic signs. mAP is a valuable metric for evaluating the performance of object detection models for self-driving cars because it takes into account both precision and recall.
Precision is the fraction of detected objects that are actually present in the image or video, i.e., correct detections. Recall, on the other hand, measures how many of the actual objects in the image were successfully detected by the model..
High precision indicates fewer false positives, ensuring that the model isn't mistakenly identifying objects that aren't there. Conversely, high recall ensures the model detects most of the real objects in the scene.For self-driving cars, a high mAP is essential for ensuring safety. If the model is not able to detect objects accurately, it could lead to accidents.
Visual search is a type of information retrieval that allows users to find images or videos that contain specific objects or scenes. It is a practical application of mean average precision (mAP) because mAP can be used to evaluate the performance and reliability of visual search algorithms.
In visual search, the primary objective is to retrieve images or videos that are relevant to the user's query. This can be a challenging task, as there may be millions or even billions of images or sequences of videos available. To address this challenge, visual search algorithms use object detection models to identify the objects in the query image or video.
Object detection models play a pivotal role by identifying potential matches, and generating a list of candidate images or videos that seem to contain the queried objects. The mAP metric can be used to evaluate the performance of the object detection models by measuring the accuracy and completeness of the candidate lists.
mAP is used to evaluate the performance of object detection models in medical image analysis. It is calculated by taking the average of the precision-recall curves for all classes. The higher the mAP, the better the performance of the model.
The following code shows how to calculate mAP in Python:
import numpy as np import pandas as pd import matplotlib.pyplot as plt import sklearn.metrics
This code above imports essential libraries for our machine learning tasks and data visualization. The imported libraries are used for numerical operations (numpy), data manipulation (pandas), model evaluation (`precision_score` and `recall_score` from `sklearn.metrics`), and creating plots (`matplotlib.pyplot`).
Create two different datasets containing binary data. The code below defines two sets of data for binary classification model evaluations. Each set consists of ground truth labels (`y_true_01`) and predicted scores (`pred_scores_01`).
y_true_01 = ["positive", "negative", "positive", "negative", "positive", "positive", "positive", "negative", "positive", "negative"] pred_scores_01 = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.75, 0.2, 0.8, 0.3]
`y_true_02` is a list of ground truth labels for a set of instances. In this case, the labels are either "positive" or "negative," representing the two classes in a binary classification problem.
`pred_scores_02` is a list of predicted scores or probabilities assigned by a classification model to the instances in `y_true_01`. These scores represent the model's confidence in its predictions.
y_true_02 = ["negative", "positive", "positive", "negative", "negative", "positive", "positive", "positive", "negative", "positive"] pred_scores_02 = [0.32, 0.9, 0.5, 0.1, 0.25, 0.9, 0.55, 0.3, 0.35, 0.85]
`y_true_02` is another list of ground truth labels for a different set of instances.
`pred_scores_02` is a list of predicted scores or probabilities assigned by a classification model to the instances in `y_true_02`.
Set a threshold value with a range of 0.2 to 0.9 and a 0.05 step. Setting a threshold value with a range of 0.2 to 0.9 and a 0.05 step is a good practice for calculating mean average precision (mAP) because it allows you to see how the model performs at different levels of confidence.
thresholds = np.arange(start=0.2, stop=0.9, step=0.05)
`precision_recall_curve()` function computes precision and recall ratings for various binary classification thresholds. The function accepts as inputs threshold values, projected scores, and ground truth labels (`y_true`, `pred_scores`, and `thresholds`). The thresholds are iterated through, predicted labels are generated, precision and recall scores are calculated, and the
results are then reported.
Finally, lists of recall and precision values are returned.
def precision_recall_curve(y_true, pred_scores, thresholds): precisions = [] recalls = [] for threshold in thresholds: y_pred = ["positive" if score >= threshold else "negative" for score in pred_scores] precision = sklearn.metrics.precision_score(y_true=y_true, y_pred=y_pred, pos_label="positive") recall = sklearn.metrics.recall_score(y_true=y_true, y_pred=y_pred, pos_label="positive") precisions.append(precision) recalls.append(recall) return precisions, recalls
Calculate the average precision scores for the first dataset (`y_true_01`) and plot out the result.
precisions, recalls = precision_recall_curve(y_true=y_true_01, pred_scores=pred_scores_01, thresholds=thresholds) plt.plot(recalls, precisions, linewidth=4, color="red", zorder=0) #Set the label and the title for the precision-recall curve plot plt.xlabel("Recall", fontsize=12, fontweight='bold') plt.ylabel("Precision", fontsize=12, fontweight='bold') plt.title("Precision-Recall Curve", fontsize=15, fontweight="bold") plt.show() # Append values to calculate area under the curve (AUC) precisions.append(1)recalls.append(0) precisions = np.array(precisions) recalls = np.array(recalls) precisions = np.array(precisions) recalls = np.array(recalls) # Calculate the AP avg_precision_class01= np.sum((recalls[:-1] - recalls[1:]) * precisions[:-1]) print('============================================') print('Average precision score:',np.round(avg_precision_class01,2))
Output:
The AP score of 0.95 is a good score; it indicates that the model performs relatively well in terms of precision when varying the classification threshold and measuring the trade-off between precision and recall.
Now, let’s calculate the average precision scores for the second dataset (`y_true_02`) and plot out the result.
# Calculate precision and recall values for different threshold precisions, recalls = precision_recall_curve(y_true=y_true_02, pred_scores=pred_scores_02, thresholds=thresholds) # Plot the precision-recall curve plt.plot(recalls, precisions, linewidth=4, color="blue", zorder=0) #Set the label and the title for the precision-recall curve plot plt.xlabel("Recall", fontsize=12, fontweight='bold') plt.ylabel("Precision", fontsize=12, fontweight='bold') plt.title("Precision-Recall Curve", fontsize=15, fontweight="bold") plt.show() # Append values to calculate area under the curve (AUC) precisions.append(1) recalls.append(0) #Convert precision and recall lists to Numpy arrays for computation precisions = np.array(precisions) recalls = np.array(recalls) # Calculate the AP avg_precision_class02 = np.sum((recalls[:-1] - recalls[1:]) * precisions[:-1]) print('============================================') print('Average precision score:',np.round(avg_precision_class02,2))
Output:
For the second dataset, the AP score was 0.96 which is also a good score. It indicates that the model is able to identify positive samples with high precision and high recall.
The mean Average Precision or mAP score is calculated by taking the mean AP over all classes and/or overall IoU thresholds, depending on the different detection challenges that exist. The formula for MAP:
# Number of classes or labels (in this case, 2 classes) num_labels = 2 # Calculate the Mean Average Precision (mAP) by averaging the AP scores for both classes mAP = (avg_precision_class2 + avg_precision_class1) / num_labels # Print the Mean Average Precision score print('Mean average Precision score:', np.round(mAP, 3))
Output:
You calculate the mAP score by averaging these AP scores for all classes. In this specific scenario, you averaged the AP scores for classes 1 and 2.
mAP is a widely used metric for evaluating the performance of object detection and instance segmentation algorithms. However, it has its own set of challenges and limitations that should be considered when interpreting its results:
Throughout this article, we've explored the inner workings of mAP, uncovering its mathematical underpinnings and its significance in assessing object detection performance.
Armed with this knowledge, you are better equipped to navigate the complex landscape of object detection, armed with the ability to make informed decisions when designing, training, and selecting models for specific applications.
Join the Encord Developers community to discuss the latest in computer vision, machine learning, and data-centric AI
Join the communityForget fragmented workflows, annotation tools, and Notebooks for building AI applications. Encord Data Engine accelerates every step of taking your model into production.