Comparative Analysis of YOLOv9 and YOLOv8 Using Custom Dataset on Encord Active

Akruti Acharya
March 1, 2024

Even as foundation models gain popularity, advancements in object detection models remain significant. The YOLO family has consistently been a popular choice for object detection in machine learning.

Let’s train the latest iterations of the YOLO series, YOLOv9 and YOLOv8, on a custom dataset and compare their performance.

In this blog, we will train YOLOv9 and YOLOv8 on the xView3 dataset. The xView3 dataset contains satellite synthetic aperture radar (SAR) imagery with annotations for maritime object detection, making it a good choice for evaluating the robustness and generalization capabilities of object detection models.

If you wish to curate and annotate your own dataset for a direct comparison between the two models, you can create it using Encord Annotate. Once it is annotated, you can follow the provided code to train and evaluate both YOLOv9 and YOLOv8 on your custom dataset.

Read the Encord Annotate Documentation to get started with your annotation project.


We are going to run our experiment on Google Colab. If you are working on your local system, please bear in mind that the instructions and the code were written to run in a Colab notebook.

Make sure you have access to a GPU. You can either check with a command in a notebook cell or navigate to Edit → Notebook settings → Hardware accelerator, set it to GPU, and then click Save.
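One way to check for a GPU from a notebook cell is sketched below. This is a minimal sketch, not the command from the original notebook: it assumes the NVIDIA driver utility `nvidia-smi` is on the path (as it normally is on Colab GPU runtimes), and the helper name `gpu_available` is made up for illustration.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    # Without the driver utility on PATH we assume no NVIDIA GPU is usable.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ...".
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_available())
```

If this prints `False` on Colab, switch the hardware accelerator to GPU in the notebook settings and rerun the cell.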


To make it easier to manage datasets, images, and models, we create a HOME constant.

import os
HOME = os.getcwd()

Train YOLOv9 on Encord Dataset

Install YOLOv9

!git clone 
%cd yolov9
!pip install -r requirements.txt -q
!pip install -q roboflow encord av

# This is a convenience class that holds the info about Encord projects and makes everything easier.

# The class supports bounding boxes and polygons across images, image groups, and videos.

!wget '' -O


from typing import Literal
from pathlib import Path

from IPython.display import Image

import roboflow

from encord import EncordUserClient

from encord_to_roboflow_v1 import ProjectConverter

Data Preparation

Set up access to the Encord platform by creating and using an SSH key.

# Create ssh-key-path
key_path = Path("../")
if not key_path.is_file():
    !ssh-keygen -t ed25519 -f ../colab_key -N "" -q

key_content = key_path.read_text()

We will now retrieve the data from Encord, convert it to the format required by YOLO, and store it on disk. Note that for larger projects, this process may run into disk space limits.

The converter will automatically split your dataset into training, validation, and testing sets based on the specified sizes. 

# Directory for images
data_path = Path("../data")

client = EncordUserClient.create_with_ssh_private_key(
project_hash = "9ca5fc34-d26f-450f-b657-89ccb4fe2027" # xView3 tiny
encord_project = client.get_project(project_hash)
converter = ProjectConverter(
dataset_yaml_file = converter.do_it(batch_size=500, splits={"train": 0.5, "val": 0.1, "test": 0.4})
encord_project_title = converter.title
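To make the splitting step more concrete, here is a rough sketch of how fractional split sizes like the ones passed above could partition a list of items. This is not the converter's actual implementation, which is not shown in this post. The function name `split_items` and the fixed seed are assumptions made for illustration.

```python
import random


def split_items(items, splits, seed=42):
    """Shuffle items and partition them according to fractional split sizes."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(items)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    result, start = {}, 0
    names = list(splits)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last split takes the remainder so rounding never drops an item.
            end = len(shuffled)
        else:
            end = start + round(splits[name] * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


parts = split_items(range(100), {"train": 0.5, "val": 0.1, "test": 0.4})
print({name: len(part) for name, part in parts.items()})
```

Using the same seed for both models matters in a comparison like this one, because it guarantees that YOLOv9 and YOLOv8 see identical training, validation, and test sets.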

Download Model Weight

We will download the YOLOv9-e and the gelan-c weights. Although the YOLOv9 paper mentions the versions yolov9-s and yolov9-m, weights for these models were not available in the YOLOv9 repository at the time of writing.

!mkdir -p {HOME}/weights
!wget -q -O {HOME}/weights/
!wget -P {HOME}/weights -q

You can run inference and evaluate object detection results with the YOLOv9 weights pre-trained on the COCO dataset.

Check out the blog YOLOv9 Explained and How to Run it if you want to run object detection on pre-trained YOLOv9 weights.

Train Custom YOLOv9 Model for Object Detection

We train a custom YOLOv9 model from a pre-trained gelan-c model.

!python \
--batch 8 --epochs 20 --img 640 --device 0 --min-items 0 --close-mosaic 15 \
--data $dataset_yaml_file \
--weights {HOME}/weights/ \
--cfg models/detect/gelan-c.yaml \
--hyp hyp.scratch-high.yaml

You can examine and validate your training results. The code for validation and inference with the custom model is available in the Colab notebook. Here we will focus on comparing the performance of the two models.

Converting Custom YOLOv9 Model Predictions to Encord Active Format

pth = converter.create_encord_json_predictions(get_latest_exp("detect") / "labels", Path.cwd().parent)

print(f"Predictions exported to {pth}")
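The `get_latest_exp` helper called above is not defined in this post. A minimal sketch of what such a helper might look like is given below. It is a hypothetical reconstruction that assumes run outputs are stored in directories such as `runs/detect/exp`, `runs/detect/exp2`, and so on, and that "latest" means most recently modified. The `root` parameter and the default value of `ext` are assumptions.

```python
from pathlib import Path


def get_latest_exp(task: str, ext: str = "exp", root: Path = Path("runs")) -> Path:
    """Hypothetical helper: newest run directory under root/task named ext*."""
    # Collect run directories such as runs/detect/exp, runs/detect/exp2, ...
    candidates = [p for p in (root / task).glob(f"{ext}*") if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no '{ext}*' run directories under {root / task}")
    # Pick the directory with the most recent modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Selecting by modification time rather than by the numeric suffix avoids parsing directory names, at the cost of picking the wrong run if an older directory is touched later.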

Download the predictions to your local computer and upload them via the UI to Encord Active to analyze your results.

Moving on to training YOLOv8!

Train YOLOv8 on Encord Dataset

Install YOLOv8

!pip install ultralytics==8.0.196

from IPython import display


import ultralytics


Dataset Preparation

As we are doing a comparative analysis of two models, we will use the same dataset to train YOLOv8.

Train Custom YOLOv8 Model for Object Detection

from ultralytics import YOLO

model = YOLO('')  # load a pretrained YOLOv8n detection model

model.train(data=dataset_yaml_file.as_posix(), epochs=20)  # train the model


The code for running inference on the test dataset is available on the Colab Notebook shared below.

Converting Custom YOLOv8 Model Predictions to Encord Active Format

pth = converter.create_encord_json_predictions(get_latest_exp("detect", ext="predict") / "labels", Path.cwd().parent)

print(f"Predictions exported to {pth}")

Download this JSON file and upload it to Encord Active via the UI.

Uploading JSON file using Encord UI

Comparative Analysis on Encord Active

In Encord Active, under the Model Evaluation tab, you can compare both models' predictions.

Model evaluation of YOLOv8 and YOLOv9

You can conveniently navigate to the Model Summary tab to view the Mean Average Precision (mAP), Mean Average Recall (mAR), and F1 score for both models. Additionally, you can compare the differences in predictions between YOLOv8 and YOLOv9.


YOLOv9 has a higher false negative count, while YOLOv8 has a high false positive count

YOLOv8 may excel in correctly identifying objects (high true positive count) but at the risk of also detecting objects that aren't present (high false positive count). On the other hand, YOLOv9 may be more conservative in its detections (lower false positive count) but could potentially miss some instances of objects (higher false negative count). 


YOLOv8, with a higher true positive count, exhibits superior performance

In terms of recall, YOLOv8 exhibits superior performance with a higher true positive count (101) compared to YOLOv9 (43), indicating its ability to correctly identify more instances of objects present in the dataset.

Both models, however, show an equal count of false positives (643), suggesting similar levels of incorrect identifications of non-existent objects. YOLOv8 demonstrates a lower false negative count (1261) compared to YOLOv9 (1315), implying that YOLOv8 misses fewer instances of actual objects, highlighting its advantage in recall performance.
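To see what these counts imply, precision, recall, and F1 can be derived directly from the true positive, false positive, and false negative counts quoted above. The sketch below uses the standard definitions. The function name `detection_scores` is made up for illustration, and the values Encord Active reports may differ depending on the confidence and IoU thresholds selected in the UI.

```python
def detection_scores(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# (TP, FP, FN) counts as quoted in the text above.
counts = {"YOLOv8": (101, 643, 1261), "YOLOv9": (43, 643, 1315)}
scores = {name: detection_scores(*c) for name, c in counts.items()}

for name, s in scores.items():
    print(f"{name}: precision={s['precision']:.3f} "
          f"recall={s['recall']:.3f} f1={s['f1']:.3f}")
```

With these counts, YOLOv8's recall works out to roughly 7.4% against roughly 3.2% for YOLOv9. Because the false positive counts are equal, YOLOv8's precision is also higher (about 13.6% versus 6.3%).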

Precision-Recall Curve

YOLOv8 has a higher Area Under the Curve (AUC-PR) value than YOLOv9

Based on the observed precision-recall curves, it appears that YOLOv8 achieves a higher Area Under the Curve (AUC-PR) value compared to YOLOv9. This indicates that YOLOv8 generally performs better in terms of both precision and recall across different threshold values, capturing a higher proportion of true positives while minimizing false positives more effectively than YOLOv9.

The precision-recall curve is not the only way to evaluate model performance. There are other metrics, such as the F1 score and the IOU distribution.
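As a rough illustration of what the AUC-PR value measures, the area under a precision-recall curve can be approximated with the trapezoidal rule over sampled (recall, precision) points. This is a sketch under stated assumptions: the sample points are toy values invented for the example, not results from either model, and Encord Active may use a different interpolation scheme.

```python
def auc_pr(recalls, precisions):
    """Approximate the area under a precision-recall curve (trapezoidal rule)."""
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")
    # Sort the points by recall so neighbouring points form valid trapezoids.
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# Toy curve invented for illustration only.
toy_recalls = [0.0, 0.5, 1.0]
toy_precisions = [1.0, 0.8, 0.6]
print(f"AUC-PR: {auc_pr(toy_recalls, toy_precisions):.2f}")
```

A curve that stays closer to the top-right corner encloses more area, which is why a higher AUC-PR reflects better precision across the whole range of recall values.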

For more information on different quality metrics, read the blog Data, Label, & Model Quality Metrics in Encord.

Metric Correlation

Metric correlation in Encord shows how specific metrics influence your model's performance and which of them have the most influence. It tells us whether an increase in a metric is associated with an increase (positive correlation) or a decrease (negative correlation) in model performance.

Metric correlation feature in Encord

The dimensions of the labeled objects significantly influence the performance of both models. This underscores the importance of the size of objects in the dataset. It's possible that the YOLOv9 model's performance is adversely affected by the presence of smaller objects in the dataset, leading to its comparatively poorer performance.
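The idea of correlating a metric with model performance can be sketched with a plain Pearson correlation between a per-object metric (here, object area) and whether the object was detected. How Encord computes its correlation scores is not described in this post, so treat this as an assumption-laden illustration. The data below are toy values invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values invented for illustration: object area in pixels, and
# 1 if the object was detected (true positive) or 0 if it was missed.
toy_areas = [20, 35, 50, 80, 120, 200]
toy_detected = [0, 0, 1, 0, 1, 1]
print(f"correlation: {pearson(toy_areas, toy_detected):.2f}")
```

A positive value in this toy setup would mean that larger objects tend to be detected more often, which is the kind of relationship the paragraph above describes for object dimensions.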

Metric Performance

The Metric Performance view in Encord's model evaluation shows in detail how model performance varies with a specific metric, which helps you understand the relationship between the two.

YOLOv8 identifies objects with a higher true positive count, while YOLOv9 has a lower false positive count

In conclusion, the comparison between YOLOv8 and YOLOv9 on Encord Active highlights distinct performance characteristics in terms of precision and recall.

While YOLOv8 excels in correctly identifying objects with a higher true positive count, it also exhibits a higher false positive count, indicating a potential for over-detection. On the other hand, YOLOv9 demonstrates a lower false positive count but may miss some instances of objects due to its higher false negative count.

If you want to improve your object detection model, read the blog How to Analyze Failure Modes of Object Detection Models for Debugging for more information.

The precision-recall curve analysis suggests that YOLOv8 generally outperforms YOLOv9, capturing a higher proportion of true positives while minimizing false positives more effectively.

However, it's important to consider other metrics like F1 score and IOU distribution for a comprehensive evaluation of model performance. Moreover, understanding the impact of labeled object dimensions and specific metric correlations can provide valuable insights into improving model performance on Encord Active.

Written by Akruti Acharya
Akruti is a data scientist and technical content writer with a M.Sc. in Machine Learning & Artificial Intelligence from the University of Birmingham. She enjoys exploring new things and applying her technical and analytical skills to solve challenging problems and sharing her knowledge and...