3 Ways To Add More Classes To Computer Vision Models
Adding new classes to a production computer vision model may be necessary for a number of reasons, which we’ve explored in more detail below:
- improved accuracy
- increased versatility
- increased robustness
When adding new classes, it is important to have enough high-quality data, use robust evaluation methods, and monitor the performance of the model over time to ensure its continued effectiveness.
Adding new classes to a computer vision model can lead to improved accuracy, increased versatility, and the ability to handle a wider range of inputs, but only when it is done well.
How do I know if I need to add new classes to my computer vision model?
When developing a computer vision model and putting it into production, it is essential to continually benchmark its performance and consider adding new classes where performance is lacking. Several signs may indicate the need for adding new classes, including:
- Decreased accuracy on new data
- Changes in business requirements
- Changes in the environment
- Insufficient data for existing classes
Decreased accuracy on new data
If accuracy drops when you apply your model to new data, the model may not have encountered examples of the new classes present in that data, or your dataset may contain errors that need fixing. To improve accuracy, you can add these classes to your training set and retrain your model.
Changes in business requirements
As business needs evolve, it may be necessary to add new classes to your model to account for the new objects or scenes that are now relevant to your application. For example, if you previously developed a model to recognize objects in a warehouse, but now need to extend it to recognize objects in a retail store, you may need to add new classes to account for the different types of products and displays.
Changes in the environment
Changes in the environment in which your model is being used can also impact its performance. For example, if the lighting conditions have changed or if there is a new background in the real-world images the detection model is analyzing, it may be necessary to add new classes to account for these changes.
Insufficient data for existing classes
If the data you have collected for your existing classes is not sufficient to train a high-quality model, adding new classes can help to improve overall performance. This is because the model will have access to more data to learn from.
Overfitting occurs when a model memorizes specific examples in the training data instead of learning general patterns. If you suspect that your model is overfitting, it may be because you have not provided it with enough variability in the training data. In this case, adding new classes can help to reduce overfitting by providing more diverse examples for the model to learn from.
Quality and Quantity of Data
It is important to consider the quality and quantity of data when adding new classes. A good rule of thumb is to have at least 100-1000 examples per class, but the number may vary depending on the complexity of the classes and the size of your model. The data should also be diverse and representative of the real-world scenarios in which the model will be used.
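As a rough illustration of the rule of thumb above, the following sketch counts examples per class and flags any class that falls below a minimum. The class names, the label list, and the threshold of 100 are assumptions made for the example, not values from a real dataset.

```python
from collections import Counter

def find_underrepresented_classes(labels, min_examples=100):
    """Return {class_name: count} for classes with fewer than min_examples."""
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n < min_examples}

# Hypothetical label list: one entry per annotated example.
labels = ["forklift"] * 450 + ["pallet"] * 1200 + ["shopping_cart"] * 35
print(find_underrepresented_classes(labels))  # {'shopping_cart': 35}
```

A check like this only covers quantity. Whether the examples are diverse and representative still has to be judged against the real-world conditions the model will see.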
To evaluate the effectiveness of the model with the added classes, it is important to use robust evaluation methods such as cross-validation. This will provide a reliable estimate of the model's performance on unseen data and help to ensure that it is not overfitting to the new data.
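One possible way to set up cross-validation so that a small new class appears in every fold is a stratified split. The sketch below is a minimal standard-library version written for illustration; the function name and the example labels are assumptions, and in practice a library implementation would usually be preferable.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's examples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val

# Hypothetical labels, including a small newly added class ("truck").
labels = ["cat"] * 10 + ["dog"] * 10 + ["truck"] * 5
for train_idx, val_idx in stratified_k_fold(labels, k=5):
    # Train on train_idx and evaluate on val_idx here.
    print(len(train_idx), len(val_idx))
```

Averaging the metric across folds gives a steadier estimate than a single train/validation split, especially when the new class has few examples.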
Additionally, it is important to monitor the performance of your model over time and to be proactive in adding new classes if needed. Regular evaluation and monitoring can help you quickly identify when new classes are needed and ensure that your model remains up-to-date and effective.
What are the Benefits of Adding New Classes to a Computer Vision Model?
Adding new classes to a computer vision model can have several benefits, including:
- improved accuracy
- increased versatility
- the ability to handle a wider range of inputs.
One of the main benefits of adding new classes to a computer vision model is improved accuracy. By adding new classes, the model can learn to recognize a wider range of objects, scenes, and patterns, leading to better performance in real-world applications, such as facial recognition or self-driving cars.
This can be particularly important for tasks like image classification, object detection, and semantic segmentation, where the goal is to accurately identify and classify the elements in an image or video. With a larger number of classes, the model can learn to distinguish between similar objects, such as different breeds of dogs or species of flowers, and better generalize to unseen examples.
Results from the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" show that training on more data (the JFT-300M dataset has roughly 300 million images with about 375 million labels) can significantly improve the model's performance.
Another benefit of adding new classes is increased versatility. By expanding the range of objects and scenes the model can recognize, it can be applied to a wider range of use cases and problems. For example, a model trained on a large image dataset of natural images can be adapted to a specific domain, such as medical imaging, by adding classes relevant to that domain. This can help the model to perform well in more specialized applications, such as disease diagnosis or surgical planning.
Adding new classes can also help the model handle a wider range of inputs. For example, a model trained on a diverse set of images can be more robust to variations in lighting, viewpoint, and other factors that can affect image quality. This can be especially important for real-world applications, where the images used to test the artificial intelligence model may be different from those used during training.
How To Add New Classes To Your Computer Vision Model
Adding new classes to a computer vision model is a crucial step in improving its accuracy and functionality.
There are several steps involved in this process, including data collection, model training, debugging, and deployment. To make this task easier, various tools and software libraries have been developed.
There are three main ways to prepare a new class for your computer vision model:
- Manual collection and annotation
- Generating synthetic data
- Active learning
Let’s take a look at them one by one.
Manual Dataset Collection and Annotation
Annotation refers to the process of identifying and labeling specific regions in video or image data. This technique is mainly used for image classification through supervised learning. The annotated data serves as the input for training machine learning or deep learning models. With a large number of annotations and diverse image variations, the model can identify the unique characteristics of the images and videos and learn to perform tasks such as object detection or object tracking, image classification, and others, depending on the type of model being trained.
There are various types of annotations, including 2D bounding boxes, instance segmentation, 3D cuboid boxes, and keypoint annotations.
- Instance segmentation involves identifying and outlining multiple objects in an image.
- Bounding boxes or 3D cuboid boxes can be drawn around objects and assigned labels for object detection.
- Polygonal outlines can be traced around objects for semantic and instance segmentation.
- Keypoints and landmarks can also be identified and labeled for object landmark detection, and straight lines can be marked for lane detection.
- For image classification, images are grouped based on labels.
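To make the annotation types listed above more concrete, here is a minimal sketch of how they might be represented in code. The class and field names are hypothetical and do not correspond to the export format of any particular annotation tool.

```python
from dataclasses import dataclass, asdict

@dataclass
class BoundingBox:
    label: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float

@dataclass
class Polygon:
    label: str
    points: list   # [(x, y), ...] vertices tracing the object outline

@dataclass
class Keypoint:
    label: str
    x: float
    y: float

def polygon_to_bounding_box(polygon):
    """Derive the tightest axis-aligned box around a polygon annotation."""
    xs = [p[0] for p in polygon.points]
    ys = [p[1] for p in polygon.points]
    return BoundingBox(polygon.label, min(xs), min(ys),
                       max(xs) - min(xs), max(ys) - min(ys))

outline = Polygon("forklift", [(10, 20), (110, 25), (100, 90), (15, 80)])
box = polygon_to_bounding_box(outline)
print(asdict(box))
```

The conversion only works in one direction: a polygon outline can always be reduced to a bounding box, but a box cannot be turned back into an outline, which is one reason to choose the annotation type with the target task in mind.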
To prepare and annotate your own dataset, you can record videos, take photos, or search for freely available open-source datasets online. If your company has already collected a dataset, you can connect it to a platform via a cloud bucket integration (S3, Azure, GCP, etc.).
Unlike data from an already annotated dataset, newly collected images need to be annotated before you can use them for training.
When collecting data, make sure to keep it as close to your intended inference environment as possible, considering all aspects of the input images, including lighting, angle, objects, etc. For example, if you want to build machine learning models that detect license plates you must take into account different light and weather conditions.
There are many tools available for annotating images for computer vision datasets. Each tool has its own set of features and is designed for a specific type of project.
- Encord Annotate: An annotation platform for AI-assisted image and video annotation and dataset management. It's the best option for teams that are looking to use AI automation to make the video and image annotation process more efficient.
- CVAT (Computer Vision Annotation Tool): A free, open-source, web-based annotation toolkit built by Intel. CVAT supports four types of annotations (points, polygons, bounding boxes, and polylines).
- Labelbox: A US-based data annotation platform.
- Appen: A data labeling tool founded in 1996, making it one of the oldest solutions in the market.
These are just a few of the tools available for adding new classes to a computer vision dataset.
The best tool for you will depend on the specific needs of your project, such as:
- The size of your dataset.
- The types of annotations you need to make.
- The platform you are using.
In an ideal scenario, the annotation tool should seamlessly integrate into your machine learning workflow. It should be efficient, user-friendly, and allow for quick and accurate annotation, enabling you to focus on training your models and improving their performance. The tool should also have the necessary functionalities and features to meet your specific annotation requirements, making the overall process of annotating data smoother and more efficient.
Generating Synthetic Data
Another way to create a dataset is to generate synthetic data. This method can be especially useful for training in unusual circumstances, as it allows you to create a much larger dataset than you could otherwise obtain from real-world sources. As a result, your model is likely to perform better. However, it is not recommended to train on synthetic data alone or to include synthetic data in your validation or test sets.
There are several tools available for generating synthetic computer vision datasets:
- Unity3D/Unreal Engine: Popular game engines that can be used to generate synthetic computer vision datasets by creating virtual environments and simulating camera movements.
- Blender: A free and open-source 3D creation software that can be used to generate synthetic computer vision datasets by creating 3D models and rendering them as images.
- AirSim: an open-source, cross-platform simulation environment for autonomous systems and robotics, developed by Microsoft. It uses Unreal Engine for physically and visually realistic simulations and allows for testing and developing autonomous systems such as drones, ground vehicles, and robotic systems.
- CARLA: an open-source, autonomous driving simulator. It provides a platform for researchers and developers to test and validate their autonomous vehicle algorithms in a simulated environment. CARLA simulates a variety of real-world conditions, such as weather, traffic, and road layouts, allowing users to test their algorithms in a range of scenarios. It also provides a number of pre-built maps, vehicles, and sensors, as well as an API for creating custom components.
- Generative adversarial networks (GANs) allow you to generate synthetic data by setting two neural networks to compete against each other. One generates the data and the other identifies whether it's real or synthetic. Through a process of iteration, the models adjust their parameters to improve their performance, with the discriminator becoming better at distinguishing real from synthetic data and the generator becoming more effective at creating accurate synthetic data. GANs can be used to supplement training datasets that lack sufficient real-world data, but there are also challenges to using synthetic data that need to be considered.
These tools can be used to generate synthetic data for various computer vision tasks, such as object detection, segmentation, and scene understanding. The choice of tool will depend on the specific requirements of your project, such as the type of data you need to generate, the complexity of the scene, and the resources available.
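Whichever tool you use, the earlier advice about keeping synthetic data out of validation and test sets can be enforced when the dataset is split. The sketch below assumes each sample is a dictionary with a boolean "synthetic" flag; that schema and the file names are hypothetical.

```python
import random

def split_dataset(samples, val_fraction=0.2, seed=0):
    """Split samples into train/val so that val contains real data only."""
    rng = random.Random(seed)
    real = [s for s in samples if not s["synthetic"]]
    synthetic = [s for s in samples if s["synthetic"]]

    rng.shuffle(real)
    n_val = int(len(real) * val_fraction)
    val = real[:n_val]
    # Synthetic samples are only ever added to the training set.
    train = real[n_val:] + synthetic
    rng.shuffle(train)
    return train, val

samples = (
    [{"path": f"real_{i}.jpg", "synthetic": False} for i in range(50)]
    + [{"path": f"render_{i}.png", "synthetic": True} for i in range(200)]
)
train, val = split_dataset(samples)
print(len(train), len(val))
```

Because the validation fraction is taken from the real samples only, the validation set stays small when little real data exists, which is a trade-off worth keeping in mind when reading the resulting metrics.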
Annotation with Active Learning
Active learning is a machine learning technique that trains models by allowing them to actively query annotators for information that will help improve their performance. The process starts with a small initial subset of labeled data from a large dataset. The model uses this labeled data to make predictions on the remaining unlabeled data. ML engineers and data scientists then evaluate the model's predictions to determine its level of certainty.
A common method for determining uncertainty is to look at the entropy of the predicted probability distribution. For example, in image classification, the model reports a probability for each class it considers on every prediction. If the model is highly confident in its prediction, such as a 99 percent probability that an image is a motorcycle, then it has a high level of certainty. If the model has low certainty in its prediction, such as a 55 percent probability that an image is a truck, then the model needs to be trained on more labeled images of trucks.
Another example is the classification of images of animals. After the model is initially trained on a subset of labeled data, it can identify cats with high certainty but is uncertain about how to identify a dog, reporting a 51 percent probability that it is not a cat and a 49 percent probability that it is a cat. In this case, the model needs to be fed more labeled images of dogs so that the ML engineers can retrain the model and improve its performance.
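The entropy measure mentioned above can be computed directly from the predicted class probabilities. The following is a minimal sketch using the two-class examples from the text; the variable names are chosen for illustration.

```python
import math

def prediction_entropy(probabilities):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

confident = [0.99, 0.01]   # e.g. 99 percent motorcycle
uncertain = [0.51, 0.49]   # e.g. 51 percent "not a cat", 49 percent "cat"

print(round(prediction_entropy(confident), 3))
print(round(prediction_entropy(uncertain), 3))
```

For two classes the entropy ranges from 0 bits (complete certainty) to 1 bit (a 50/50 split), so the near-even cat prediction scores close to the maximum and would be a strong candidate for labeling.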
The samples with high uncertainty are sent back to the annotators, who label them and provide the newly labeled data to the ML engineers. The engineers then use this data to retrain the model, and the process continues until the model reaches an acceptable performance threshold. This loop of training, testing, identifying uncertainty, annotating, and retraining allows the model to continually improve its performance.
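The loop described above might be sketched as follows. The callables `train_fn`, `predict_fn`, `annotate_fn`, and `evaluate_fn` are hypothetical placeholders for your own model and annotation tooling, and entropy-based selection is just one of several possible query strategies.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a class probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_most_uncertain(predictions, budget):
    """Return the ids of the `budget` samples with the highest prediction entropy.

    predictions maps sample id -> list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]

def active_learning_loop(labeled, unlabeled, train_fn, predict_fn, annotate_fn,
                         evaluate_fn, target_score, budget=10, max_rounds=20):
    """Train, score, pick uncertain samples, annotate them, and repeat."""
    labeled = dict(labeled)
    unlabeled = list(unlabeled)
    model = train_fn(labeled)
    for _ in range(max_rounds):
        # Stop once performance is acceptable or nothing is left to label.
        if evaluate_fn(model) >= target_score or not unlabeled:
            break
        predictions = {sid: predict_fn(model, sid) for sid in unlabeled}
        chosen = select_most_uncertain(predictions, budget)
        labeled.update(annotate_fn(chosen))   # annotators label the chosen samples
        unlabeled = [sid for sid in unlabeled if sid not in chosen]
        model = train_fn(labeled)
    return model, labeled

example = {"a": [0.99, 0.01], "b": [0.51, 0.49], "c": [0.8, 0.2]}
print(select_most_uncertain(example, budget=2))  # ['b', 'c']
```

The `budget` parameter controls how many samples go to annotators per round; smaller batches mean more retraining rounds but less labeling effort spent on samples the model would have learned anyway.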
Active learning pipelines also help ML engineers identify failure modes such as edge cases, where the model makes a prediction with high uncertainty, indicating that the data does not fit into one of the categories that the model has been designed to detect. The model flags these outliers, and the ML engineers can retrain the model with the labeled sample to help the model learn to identify these edge cases.
Using active learning in machine learning can make model training faster and cheaper while reducing the burden of data labeling for annotators. Instead of labeling all the data in a massive dataset, organizations can intelligently select and label a portion of the data to increase model performance and reduce costs. With an AL pipeline, ML teams can prioritize labeling the data that is most useful for training the model and continuously adjust their training as new data is annotated and used for training.
Surprisingly, active learning is also useful even when ML engineers have a large amount of already labeled data. Training the model on every piece of labeled data in a dataset can be a poor allocation of resources, and active learning can help select a subset of data that is most useful for training the model, reducing computational costs.
Active learning is a powerful ML technique that allows models to actively seek information that will help improve their performance. By reducing the burden of data labeling and optimizing the use of computational resources, active learning can help organizations achieve better results more efficiently and cost-effectively. However, an active learning pipeline can be hard to implement.
Encord Active is an open-source active learning tool that includes visualizations, workflows, data and label quality metrics, and model performance analysis based on the model's predictions. It allows you to add the model's predictions, filter them by the model's confidence, and export them to your annotation tool (for example, Encord Annotate).
What Do You Do Once You’ve Added New Classes?
Once you’ve added new classes to your computer vision model, there are several steps you can take to optimize its performance:
- Evaluate the model
- Fine-tune the model
- Data augmentation
- Monitor performance
Evaluate The Model
The first step after adding new classes to your model is to evaluate its performance. This involves using a dataset of images or videos to test the model and see how well it can recognize the new classes. You can use metrics like accuracy, precision, recall, and F1 score to quantify the model's performance and compare it with baseline models. You can also visualize the results and check model performance using confusion matrices, precision-recall curves, and ROC curves. These evaluations will help you identify areas where the model is performing well and where it needs improvement.
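As an illustration of the metrics named above, the sketch below derives per-class precision, recall, and F1 score from confusion counts. The labels and predictions are made up for the example; in practice an established metrics library would normally be used.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))

def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class seen in either list."""
    counts = confusion_counts(y_true, y_pred)
    classes = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for c in classes:
        tp = counts[(c, c)]
        fp = sum(n for (t, p), n in counts.items() if p == c and t != c)
        fn = sum(n for (t, p), n in counts.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics

# Hypothetical ground truth and predictions; "truck" is the newly added class.
y_true = ["cat", "cat", "dog", "dog", "truck", "truck"]
y_pred = ["cat", "dog", "dog", "dog", "truck", "cat"]
for name, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Looking at the metrics per class rather than as a single average makes it easier to see whether a newly added class is underperforming or whether it has started to pull predictions away from an existing class.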
Fine-Tune The Model
Based on the evaluation results, you may need to fine-tune the model to optimize its performance for the new classes. Fine-tuning can involve adjusting the model's hyperparameters, such as learning rate or weight decay, or adjusting the architecture of the model itself. You can also use techniques like transfer learning to leverage pre-trained models and fine-tune them for your specific task.
Data Augmentation
Another approach to improving the model's performance is to use data augmentation. This involves transforming the existing training data to create new, synthetic examples. For example, you can use techniques like random cropping, flipping, or rotation to create new training samples. By increasing the size of the training dataset, data augmentation can help to prevent overfitting and improve the model's generalization ability.
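The transformations mentioned above can be sketched on a tiny image stored as a list of pixel rows. This is a toy, standard-library illustration of flipping, rotation, and random cropping; real pipelines typically rely on an image or augmentation library instead.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def random_crop(image, crop_height, crop_width, rng=random):
    """Cut a random crop_height x crop_width window out of the image."""
    top = rng.randint(0, len(image) - crop_height)
    left = rng.randint(0, len(image[0]) - crop_width)
    return [row[left:left + crop_width] for row in image[top:top + crop_height]]

image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(horizontal_flip(image))  # [[3, 2, 1], [6, 5, 4]]
print(rotate_90(image))        # [[4, 1], [5, 2], [6, 3]]
print(random_crop(image, 1, 2))
```

For object detection or segmentation, the same transformation also has to be applied to the annotations, otherwise the boxes or outlines no longer line up with the transformed image.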
Monitor Performance
Once you’ve fine-tuned the model, it’s important to monitor its performance over time. This can involve tracking the model's behavior on a test set or in a real-world deployment and adjusting the model as needed to keep it up-to-date. Monitoring performance can help to ensure that the model continues to function well when new classes are added and as the underlying data distribution changes.
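One simple way to track behavior in deployment is a rolling accuracy check over recently verified predictions. The sketch below is a hypothetical helper, and the window size and alert threshold are assumptions that would need tuning for a real application.

```python
from collections import deque

class AccuracyMonitor:
    """Track rolling accuracy over the most recent predictions and flag drops."""

    def __init__(self, window_size=500, alert_threshold=0.9):
        self.results = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, predicted_label, true_label):
        """Store whether a single verified prediction was correct."""
        self.results.append(predicted_label == true_label)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        return sum(self.results) / len(self.results) if self.results else None

    def needs_attention(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.results) == self.results.maxlen
        return full and self.accuracy() < self.alert_threshold

monitor = AccuracyMonitor(window_size=4, alert_threshold=0.75)
for predicted, actual in [("cat", "cat"), ("dog", "dog"), ("cat", "truck"), ("dog", "truck")]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_attention())
```

This approach depends on receiving ground-truth labels for at least a sample of production predictions, for example through periodic manual review.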
Adding new classes to a computer vision model is just the first step in optimizing its performance. By evaluating the model, fine-tuning its parameters, using data augmentation and regularization, and monitoring its performance, you can make the model more accurate, versatile, and robust to new classes.
These steps are crucial for ensuring that your model remains effective and up-to-date over time and for achieving the best possible performance in real-world applications.