What is a Quality Metric?
When you’re working with datasets or developing a machine learning model, you often find yourself looking for or hypothesizing about subsets of data, labels, or model predictions with certain properties.
Quality metrics form the foundation for finding such data and testing the hypotheses.
The core concept is to use quality metrics to index, slice, and analyze the subject in question in a structured way, so you can take informed actions as you continuously crank the active learning cycle.
Concrete example: You hypothesize that object "redness" influences the mAP score of your object detection model. To test this hypothesis, you define a quality metric that captures the redness of each object in the dataset. From the quality metric, you slice the data to compare your model performance on red vs. not red objects.
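For instance, a minimal sketch of such a redness metric might look like the following (assuming RGB images and bounding-box annotations; `annotations`, `images`, and the threshold are illustrative stand-ins, not Encord Active's API):

```python
import numpy as np

def object_redness(image: np.ndarray, bbox: tuple) -> float:
    """Score how red the pixels inside a bounding box are (0 = not red, 1 = fully red).

    image: H x W x 3 RGB array with values in [0, 255].
    bbox:  (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    x_min, y_min, x_max, y_max = bbox
    crop = image[y_min:y_max, x_min:x_max].astype(np.float32)
    r, g, b = crop[..., 0], crop[..., 1], crop[..., 2]
    # Redness: how much the red channel dominates the other two, averaged over the crop.
    dominance = np.clip(r - (g + b) / 2.0, 0.0, 255.0)
    return float(dominance.mean() / 255.0)

# Slice objects into "red" vs. "not red" and compare model performance per slice.
# `annotations` and `images` are placeholders for your own dataset structures.
RED_THRESHOLD = 0.4  # illustrative cut-off
red, not_red = [], []
for obj in annotations:
    score = object_redness(images[obj["image_id"]], obj["bbox"])
    (red if score > RED_THRESHOLD else not_red).append(obj)
```

With the two slices in hand, you can compute mAP separately on each and see whether "redness" actually moves the needle.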
The best way to think of a quality metric in computer vision is as any function that assigns a value to individual data points, labels, or model predictions in a dataset.
By design, quality metrics are a very abstract class of functions because the accompanying methodologies are agnostic to the specific properties that the quality metrics express. No matter the specific quality metric, you can:
- sort your data according to the metric
- slice your data to inspect specific subsets
- find outliers
- compare training data to production data to detect data drift
- evaluate your model performance as a function of the metric
- define model test cases
- and much more

All of these are possible with Encord Active.
Data quality metrics are those that require only information about the data itself. Within the computer vision domain, this means the raw images or video frames, without any labels. This subset of quality metrics is typically used at the beginning of a machine learning project, where labels are scarce or perhaps nonexistent.
Below are some examples of data quality metrics ranging from simple to more complex:
Image Brightness as a data quality metric on MS COCO validation dataset on Encord.
Image Singularity as a data quality metric on MS COCO validation dataset on Encord.
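To illustrate the first example, image brightness, a per-image brightness score can be computed in a few lines (a simplified sketch, not the exact Encord Active implementation; `image_paths` is a placeholder for your own file list):

```python
import numpy as np
from PIL import Image

def image_brightness(path: str) -> float:
    """Mean perceived brightness of an image, normalized to [0, 1]."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    # Standard luma weights approximate human brightness perception.
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return float(luma.mean())

# Rank a dataset by brightness to surface unusually dark or bright images.
ranked = sorted(image_paths, key=image_brightness)
```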
Label quality metrics apply to labels. Some metrics use image content, while others rely only on the label information. Label quality metrics serve many purposes; some of the more frequent ones are surfacing label errors, identifying model failure modes, and assessing annotator performance.
Here are some concrete examples of label quality metrics ranging from simple to more complex:
Object count as a label quality metric on MS COCO validation dataset on Encord.
Annotation Duplicate as a label quality metric on MS COCO validation dataset on Encord.
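As a rough illustration of the idea behind an annotation-duplicate metric, the sketch below scores each image by the highest IoU between any two of its bounding boxes (a simplified stand-in for the actual Encord Active metric, which may also consider classes and other label types):

```python
def iou(a, b) -> float:
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def duplicate_annotation_score(boxes) -> float:
    """Highest pairwise IoU among the boxes in one image; values near 1.0 suggest duplicated labels."""
    best = 0.0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            best = max(best, iou(boxes[i], boxes[j]))
    return best
```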
Model quality metrics also take the model predictions into account. The most obvious use case for these metrics is acquisition functions, answering the question, "What should I label next?" There are many intelligent ways to leverage model predictions to answer this question; two examples are shown below.
Using Model Confidence as a model quality metric on MS COCO validation dataset on Encord. It shows the predictions where the confidence is between 50% and 80%.
Using Polygon Shape Similarity as a model quality metric on MS COCO validation dataset on Encord. It ranks objects by how similar they are to their instances in previous frames based on Hu moments. The more an object’s shape changes, the lower its score will be.
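As a rough sketch of how model confidence can drive an acquisition function like the one shown above, the snippet below surfaces borderline predictions in an assumed 50%-80% confidence band (`predictions` and its fields are illustrative stand-ins, not Encord Active's data model):

```python
def borderline_predictions(predictions, low=0.5, high=0.8):
    """Return predictions whose confidence falls in a 'borderline' band.

    Each prediction is assumed to be a dict with a 'confidence' field in [0, 1];
    the 50%-80% band mirrors the example above.
    """
    return [p for p in predictions if low <= p["confidence"] <= high]

def acquisition_score(predictions_for_image) -> int:
    # Images with many borderline predictions are good candidates to label next.
    return len(borderline_predictions(predictions_for_image))
```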
We have now reviewed some examples of common quality metrics already in Encord Active.
However, every machine learning project is different, and most likely you already have an idea of what to compute to surface the data that you want to evaluate or analyze.
With Encord Active, you only need to define the per-data-point computation. The tool will handle everything from executing the computation to visualizing your data based on your new metric.
You may want to know when your skeleton predictions are occluded, or in which frames of a video specific annotations are missing.
You could also get even smarter and compare your labels with results from foundation models like SAM.
These different use cases are situations where you would build your own custom metrics.
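As an illustration, a custom per-prediction metric for the occlusion example above might boil down to a small function like the following (the field names are assumptions about your own schema, and the function would still need to be wrapped in Encord Active's custom metric interface described in the documentation):

```python
def occluded_keypoint_fraction(skeleton: dict) -> float:
    """Per-prediction custom metric: fraction of skeleton keypoints flagged as occluded.

    Assumes each keypoint is a dict with an 'occluded' boolean; adapt the field
    names to your own label/prediction schema.
    """
    keypoints = skeleton.get("keypoints", [])
    if not keypoints:
        return 0.0
    occluded = sum(1 for kp in keypoints if kp.get("occluded", False))
    return occluded / len(keypoints)
```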
You can find the documentation for writing custom metrics here or you can follow any of the links provided above to specific quality metrics and find their implementation on GitHub.
Quality Metrics constitute the foundation of systematically exploring, evaluating, and iterating on machine learning datasets and models.
With Encord Active, it’s easy to define, execute, and utilize quality metrics to get the most out of your data, models, and annotators. We use them for slicing data, comparing data, tagging data, finding label errors, and much more. The true power of these metrics is that they can be arbitrarily specific to a problem at hand.
Ready to improve the performance and quality metrics of your CV models?
Sign up for an Encord Free Trial: The Active Learning Platform for Computer Vision, used by the world’s leading computer vision teams.
AI-assisted labeling, model training & diagnostics, find & fix dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster. Try Encord for Free Today.
Related Blogs
As machine learning models become increasingly complex and ubiquitous, it's crucial to have a practical and methodical approach to evaluating their performance. But what's the best way to evaluate your models?

Traditionally, aggregate accuracy scores like Mean Average Precision (mAP), computed over the entire dataset, have been used. While these scores are useful during the proof-of-concept phase, they often fall short when models are deployed to production on real-world data. In those cases, you need to know how your models perform under specific scenarios, not just overall.

At Encord, we approach model evaluation with a data-centric approach using model test cases. Think of them as the "unit tests" of the machine learning world. By running your models through a set of predefined test cases prior to deployment, you can identify issues or weaknesses and improve your model's accuracy. Even after deployment, model test cases can be used to continuously monitor and optimize your model's performance, ensuring it meets your expectations.

In this article, we will explore the importance of model test cases and how you can define them using quality metrics. We will use a practical example to put this framework into context.

Imagine you’re building a model for a car parking management system that identifies car throughput, measures capacity at different times of the day, and analyzes the distribution of different car types. You've successfully trained a model that works well on Parking Lot A in Boston with the cameras you've set up to track the parking lot. Your proof of concept is complete, investors are happy, and they ask you to scale it out to different parking lots.

Car parking photos are taken under various weather and daytime conditions.

However, when you deploy the same model in a new parking house in Boston and in another town (e.g., in Minnesota), you find that there are a lot of new scenarios you haven't accounted for:
- In the new parking lot in Boston, the cameras produce slightly blurrier images with different contrast levels, and the cars are closer to the cameras.
- In Minnesota, there is snow on the ground, different types of lines painted on the parking lot, and new types of cars that weren't in your training data.

This is where a practical and methodical approach to testing these scenarios is important. Let's explore the concept of defining model test cases in detail through five steps:
1. Identify Failure Mode Scenarios
2. Define Model Test Cases
3. Evaluate Granular Performance
4. Mitigate Failure Modes
5. Automate Model Test Cases

Identify Failure Mode Scenarios

Thoroughly testing a machine learning model requires considering potential failure modes, such as edge cases and outliers, that may impact its performance in real-world scenarios. Identifying these scenarios is a critical first step in the testing process of any model. Failure mode scenarios may include a wide range of factors that could impact the model's performance, such as changing lighting conditions, unique perspectives, or variations in the environment. Let's consider our car parking management system.
In this case, some of the potential edge cases and outliers could include:
- Snow on the parking lot
- Different types of lines painted on the parking lot
- New types of cars that weren't in your training data
- Different lighting conditions at different times of day
- Different camera angles, perspectives, or distances to cars
- Different weather conditions, such as rain or fog

By identifying scenarios where your model might fail, you can begin to develop model test cases that evaluate the model's ability to handle these scenarios effectively.

It's important to note that identifying model failure modes is not a one-time process and should be revisited throughout the development and deployment of your model. As new scenarios arise, it may be necessary to add new test cases to ensure that your model continues to perform effectively in all possible scenarios. Furthermore, some scenarios might require specialized attention, such as the addition of new classes to the model's training data or the implementation of more sophisticated algorithms to handle complex scenarios. For example, in the case of adding new types of cars to the model's training data, it may be necessary to gather additional data to train the model effectively on these new classes.

Define Model Test Cases

Defining model test cases is an important step in the machine learning development process, as it enables the evaluation of model performance and the identification of areas for improvement. As mentioned earlier, this involves specifying classes of new inputs beyond those in the original dataset for which the model is supposed to work well, and defining the expected model behavior on these new inputs.

Defining test cases begins by building hypotheses based on the different scenarios the model is likely to encounter in the real world. This can involve considering different environmental conditions, lighting conditions, camera angles, or any other factors that could affect the model's performance. Hereafter, you define the expected model behavior under the scenario:

My model should achieve X in the scenario where Y.

It is crucial that the test case is quantifiable. That is, you need to be able to measure whether the test case passes or not. In the next section, we’ll get back to how to do this in practice. For the car parking management system, you could define your model test cases as follows:
- The model should achieve an mAP of 0.75 for car detection when cars are partially covered in or surrounded by snow.
- The model should have an accuracy of 98% on parking spaces when the parking lines are partially covered in snow.
- The model should achieve an mAP of 0.75 for car detection in parking houses under poor light conditions.

Evaluate Granular Performance

Once the model test cases have been defined, the performance can be evaluated using appropriate performance metrics for each model test case. This might involve measuring the model's mAP, precision, and recall on data slices related to the specified test cases. To find the specific data slices relevant to your model test case, we recommend using quality metrics. Quality metrics are useful for evaluating your model's performance based on specific criteria, such as object size, blurry images, or time of day. In practice, they are additional parametrizations added on top of your data, labels, and model predictions, and they allow you to index your data, labels, and model predictions in semantically relevant ways. Read more here.
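To make such test cases easy to automate later, you might write them down as plain data. Below is a hypothetical sketch of the three test cases above (field names and slice expressions are illustrative, not a library API):

```python
# Declarative model test cases: each entry names a metric-defined data slice,
# the performance metric to compute on it, and the threshold that must be met.
MODEL_TEST_CASES = [
    {"name": "cars_in_snow",            "slice": "snow_coverage > 0.5",   "metric": "mAP",      "min_value": 0.75},
    {"name": "snow_covered_lines",      "slice": "line_visibility < 0.5", "metric": "accuracy", "min_value": 0.98},
    {"name": "parking_house_low_light", "slice": "brightness < 0.3",      "metric": "mAP",      "min_value": 0.75},
]
```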
Quality metrics can then be used to identify data slices related to your model test cases. To evaluate a specific model test case, you identify a slice of data that has the properties the test case defines and evaluate your model performance on that slice of data.

Mitigate Failure Modes

If a model test case fails and the model is not performing according to your expectations in the defined scenario, you need to take action to improve performance. This is where targeted data quality improvements come in. These improvements can take various shapes and forms, including:
- Data collection campaigns: Collect new data samples that cover the identified scenarios. Remember to ensure data diversity by obtaining samples from different locations and parking lot types. You should also regularly update the dataset to account for new scenarios and maintain model performance.
- Relabeling campaigns: If your failure modes are due to label errors in the existing dataset, it is often better to correct any inaccuracies or inconsistencies in the labels before collecting new data. If your use case is complex, we recommend collaborating with domain experts to ensure high-quality annotations.
- Data augmentation: By applying methods such as rotation, color adjustment, and cropping, you can increase the diversity of your dataset. Additionally, you can utilize techniques to simulate various lighting conditions, camera angles, or environmental factors that the model might encounter in real-world scenarios. Implementing domain-specific augmentation techniques, such as adding snow or rain to images, can further enhance the model's ability to generalize to various situations.
- Synthetic data generation: Creating artificial data samples can help expand the dataset, but it is essential to ensure that the generated data closely resembles real-world scenarios to maintain model performance. Combining synthetic data with real data can increase the dataset size and diversity, potentially leading to more robust models.

Automated Model Test Cases

Once you've defined your model test cases, you need a way to select data slices and test them in practice. This is where quality metrics and Encord Active come in. Encord Active is an open-source data-centric toolkit that allows you to investigate and analyze your data distribution and model performance against these quality metrics in an easy and convenient way.

The chart above is automatically generated by Encord Active using uploaded model predictions. It shows the dependency between model performance and each metric, i.e., how much model performance is affected by each metric. With quality metrics, you can identify areas where the model is underperforming, even if it is still achieving high overall accuracy. This makes them perfect for testing your model test cases in practice. For example, the quality metric that specifically measures the model's performance in low-light conditions (see “Brightness” among the quality metrics in the figure above) will help you understand whether your car parking management system model will struggle to detect cars in low-light conditions. You could also use the “Object Area” quality metric to create a model test case that checks if your model has issues with different sizes of objects (different distances to cars result in different object areas). One of the benefits of Encord Active is that it is open source, and it enables you to write your own custom quality metrics to test your hypotheses around different scenarios.
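To make this concrete, here is a minimal sketch of one automated test from the declarative list above: it restricts predictions to a low-brightness slice and checks detection recall against the test case's threshold (recall stands in for mAP here to keep the example short; all names and data structures are illustrative):

```python
def low_light_recall(predictions, gt_counts, brightness, max_brightness=0.3):
    """Car-detection recall restricted to images darker than `max_brightness`.

    `predictions` is a list of dicts with 'image_id' and 'is_true_positive',
    `gt_counts` maps image_id -> number of ground-truth cars, and `brightness`
    maps image_id -> a brightness quality metric in [0, 1]. All of these are
    placeholders for your own data structures.
    """
    dark = {i for i, b in brightness.items() if b < max_brightness}
    tp = sum(1 for p in predictions if p["image_id"] in dark and p["is_true_positive"])
    total = sum(n for i, n in gt_counts.items() if i in dark)
    return tp / total if total else float("nan")

# The test case passes only if performance on the low-light slice clears the bar.
assert low_light_recall(predictions, gt_counts, brightness) >= 0.75
```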
Tip: If you have any specific things you’d like to test, please get in touch with us and we will gladly help you get started.

This means that you can define quality metrics that are specific to your use case and evaluate your model's performance against them. For example, you might define a quality metric that measures the model's performance in heavy rain conditions (a combination of low Brightness and high Blur). Finally, if you would like to visually inspect the slices that your model is struggling with, you can visualize model predictions (TPs, FPs, and FNs).

Tip: You can use Encord Annotate to directly correct labels if you spot any outright label errors.

Back to the car parking management system example: once you have defined your model test cases and evaluated your model's performance against them using the quality metrics, you can find low-performing "slices" of data. If you've defined a model test case for the scenario where there is snow on the ground in Minnesota, you can:
1. Compute the quality metric that measures performance in snowy conditions.
2. Investigate how much this metric affects the overall performance.
3. Filter the slice of images where your model performance is low.
4. Set in motion a data collection campaign for images in similar conditions.
5. Set up an automated model test that always checks performance on snowy images for your future models.

Tip: If you already have a database of unlabeled data, you can leverage similarity search to find images of interest for your data collection campaigns.

Benefits of the Model Test Case Framework

As machine learning models continue to evolve, evaluating them is becoming more important than ever. By using a model test case framework, you can gain a more comprehensive understanding of your model's performance and identify areas for improvement. This approach is far more effective and safe than relying solely on high-level accuracy metrics, which can be insufficient for evaluating your model's performance in real-world scenarios. To summarize, the benefits of using model test cases instead of only high-level accuracy metrics are:
- Enhanced understanding of your model: You gain a thorough understanding of your model by evaluating it in detail (rather than depending on one overall metric). Systematically analyzing its performance will improve your (and your team's) confidence in its effectiveness during deployment and augments the model's credibility.
- Concentrate on addressing model failure modes: Armed with an in-depth evaluation from Encord Active, efforts to improve a model can be directed toward its weak areas. Focusing on the weaker aspects of your model accelerates its development, optimizes engineering time, and minimizes data collection and labeling expenses.
- Fully customizable to your specific case: One of the benefits of using open-source tools like Encord Active is that you can write your own custom quality metrics and set up automated triggers without having to rely on proprietary software.

If you're interested in incorporating model test cases into your data annotation and model development workflow, don't hesitate to reach out.

Conclusion

In this article, we started off by understanding why defining model test cases and using quality metrics to evaluate model performance against them is essential. It is a practical and methodical approach for identifying data-centric failure modes in machine learning models.
By defining model test cases, evaluating model performance against quality metrics, and setting up automated triggers to test them, you can identify areas where the model needs improvement, prioritize data labeling efforts accordingly, and improve the model's credibility with your team. Furthermore, it changes the development cycle from reactive to proactive: you can find and fix potential issues before they occur, instead of deploying your model in a new scenario, discovering poor performance, and scrambling to fix it. Open-source tools like Encord Active enable users to write their own quality metrics and set up automated triggers without having to rely on proprietary software. This can lead to more collaboration and knowledge sharing across the machine learning community, ultimately leading to more robust and effective machine learning models in the long run.
Adding new classes to a production computer vision model may be necessary for a number of reasons, which we’ve explored in more detail below:
- improved accuracy
- increased versatility
- increased robustness

When adding new classes, it is important to have enough high-quality data, use robust evaluation methods, and monitor the performance of the model over time to ensure its continued effectiveness. Adding new classes to a computer vision model can lead to improved accuracy, increased versatility, and the ability to handle a wider range of inputs, but only when it is done well.

How do I know if I need to add new classes to my computer vision model?

When developing a computer vision model and putting it into production, it is essential to continually benchmark its performance and consider adding new classes where performance is lacking. Several signs may indicate the need for adding new classes, including:
- Decreased accuracy on new data
- Changes in business requirements
- Changes in the environment
- Insufficient data for existing classes
- Overfitting

Decreased accuracy on new data

If you are observing a drop in accuracy when applying your model to new data, it may be due to the fact that the model has not encountered examples of the new classes present in the data or there are some errors in your dataset that you need to fix. To improve accuracy, you can add these classes to your training set and retrain your model.

Changes in business requirements

As business needs evolve, it may be necessary to add new classes to your model to account for the new objects or scenes that are now relevant to your application. For example, if you previously developed a model to recognize objects in a warehouse, but now need to extend it to recognize objects in a retail store, you may need to add new classes to account for the different types of products and displays.

Changes in the environment

Changes in the environment in which your model is being used can also impact its performance. For example, if the lighting conditions have changed or if there is a new background in the real-world images the detection model is analyzing, it may be necessary to add new classes to account for these changes.

Insufficient data for existing classes

If the data you have collected for your existing classes is not sufficient to train a high-quality model, adding new classes can help to improve overall performance. This is because the model will have access to more data to learn from.

Overfitting

Overfitting occurs when a model memorizes specific examples in the training data instead of learning general patterns. If you suspect that your model is overfitting, it may be because you have not provided it with enough variability in the training data. In this case, adding new classes can help to reduce overfitting by providing more diverse examples for the model to learn from.

Quality and Quantity of Data

It is important to consider the quality and quantity of data when adding new classes. A good rule of thumb is to have at least 100-1,000 examples per class, but the number may vary depending on the complexity of the classes and the size of your model. The data should also be diverse and representative of the real-world scenarios in which the model will be used. To evaluate the effectiveness of the model with the added classes, it is important to use robust evaluation methods such as cross-validation. This will provide a reliable estimate of the model's performance on unseen data and help to ensure that it is not overfitting to the new data.
Additionally, it is important to monitor the performance of your model over time and to be proactive in adding new classes if needed. Regular evaluation and monitoring can help you quickly identify when new classes are needed and ensure that your model remains up-to-date and effective.

What are the Benefits of Adding New Classes to a Computer Vision Model?

Adding new classes to a computer vision model can have several benefits, including:
- improved accuracy
- increased versatility
- the ability to handle a wider range of inputs

Improved Accuracy

One of the main benefits of adding new classes to a computer vision model is improved accuracy. By adding new classes, the model can learn to recognize a wider range of objects, scenes, and patterns, leading to better performance in real-world applications, such as facial recognition or self-driving cars. This can be particularly important for tasks like image classification, object detection, and semantic segmentation, where the goal is to accurately identify and classify the elements in an image or video. With a larger number of classes, the model can learn to distinguish between similar objects, such as different breeds of dogs or species of flowers, and better generalize to unseen examples. Results from the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" show that using more data (the JFT-300M dataset contains roughly 300 million images with around 375 million labels) can significantly improve model performance.

Increased Versatility

Another benefit of adding new classes is increased versatility. By expanding the range of objects and scenes the model can recognize, it can be applied to a wider range of use cases and problems. For example, a model trained on a large dataset of natural images can be adapted to a specific domain, such as medical imaging, by adding classes relevant to that domain. This can help the model perform well in more specialized applications, such as disease diagnosis or surgical planning.

Increased Robustness

Adding new classes can also help the model handle a wider range of inputs. For example, a model trained on a diverse set of images can be more robust to variations in lighting, viewpoint, and other factors that affect image quality. This can be especially important for real-world applications, where the images used at inference time may differ from those used during training.

How To Add New Classes To Your Computer Vision Model

Adding new classes to a computer vision model is a crucial step in improving its accuracy and functionality. There are several steps involved in this process, including data collection, model training, debugging, and deployment. To make this task easier, various tools and software libraries have been developed. There are three main ways to prepare a new class for your computer vision model:
- Manual collection and annotation
- Generating synthetic data
- Active learning

Let’s take a look at them one by one.

Manual Dataset Collection and Annotation

Annotation refers to the process of identifying and labeling specific regions in video or image data. This technique is mainly used for image classification through supervised learning. The annotated data serves as the input for training machine learning or deep learning models.
With a large number of annotations and diverse image variations, the model can identify the unique characteristics of the images and videos and learn to perform tasks such as object detection, object tracking, image classification, and others, depending on the type of model being trained.

There are various types of annotations, including 2D bounding boxes, instance segmentation, 3D cuboid boxes, and keypoint annotations:
- Bounding boxes or 3D cuboid boxes can be drawn around objects and assigned labels for object detection.
- Polygonal outlines can be traced around objects for semantic and instance segmentation, which involves identifying and outlining multiple objects in an image.
- Keypoints and landmarks can be identified and labeled for object landmark detection, and straight lines can be marked for lane detection.
- For image classification, images are grouped based on labels.

To prepare and annotate your own dataset, you can either record videos, take photos, or search for freely available open-source datasets online. If your company has already collected a dataset, you can connect it to a platform via a cloud bucket integration (S3, Azure, GCP, etc.). However, before you can use these images for training, you need to annotate them, as opposed to using data from an already annotated dataset. When collecting data, make sure to keep it as close to your intended inference environment as possible, considering all aspects of the input images, including lighting, angle, objects, etc. For example, if you want to build machine learning models that detect license plates, you must take into account different light and weather conditions.

There are many tools available for annotating images for computer vision datasets. Each tool has its own set of features and is designed for a specific type of project.
- Encord Annotate: An annotation platform for AI-assisted image and video annotation and dataset management. It's the best option for teams that are looking to use AI automation to make the video and image annotation process more efficient.
- CVAT (Computer Vision Annotation Tool): A free, open-source, web-based annotation toolkit built by Intel. CVAT supports four types of annotations (points, polygons, bounding boxes, and polylines).
- Labelbox: A US-based data annotation platform.
- Appen: A data labeling tool founded in 1996, making it one of the first and oldest solutions in the market.

These are just a few of the tools available for adding new classes to a computer vision dataset. The best tool for you will depend on the specific needs of your project, such as:
- The size of your dataset.
- The types of annotations you need to make.
- The platform you are using.

In an ideal scenario, the annotation tool should seamlessly integrate into your machine learning workflow. It should be efficient, user-friendly, and allow for quick and accurate annotation, enabling you to focus on training your models and improving their performance. The tool should also have the necessary functionalities and features to meet your specific annotation requirements, making the overall process of annotating data smoother and more efficient.

Synthetic Datasets

Another way to create a dataset is to generate synthetic data. This method can be especially useful for training in unusual circumstances, as it allows you to create a much larger dataset than you could otherwise obtain from real-world sources. As a result, your model is likely to perform better and achieve better results.
However, it is not recommended to use only synthetic data or to put synthetic data into the validation/test sets. There are several tools available for generating synthetic computer vision datasets:
- Unity3D/Unreal Engine: Popular game engines that can be used to generate synthetic computer vision datasets by creating virtual environments and simulating camera movements.
- Blender: A free and open-source 3D creation suite that can be used to generate synthetic computer vision datasets by creating 3D models and rendering them as images.
- AirSim: An open-source, cross-platform simulation environment for autonomous systems and robotics, developed by Microsoft. It uses Unreal Engine for physically and visually realistic simulations and allows for testing and developing autonomous systems such as drones, ground vehicles, and robotic systems.
- CARLA: An open-source autonomous driving simulator. It provides a platform for researchers and developers to test and validate their autonomous vehicle algorithms in a simulated environment. CARLA simulates a variety of real-world conditions, such as weather, traffic, and road layouts, allowing users to test their algorithms in a range of scenarios. It also provides a number of pre-built maps, vehicles, and sensors, as well as an API for creating custom components.

Generative adversarial networks (GANs) allow you to generate synthetic data by setting two neural networks to compete against each other: one generates the data and the other identifies whether it's real or synthetic. Through a process of iteration, the models adjust their parameters to improve their performance, with the discriminator becoming better at distinguishing real from synthetic data and the generator becoming more effective at creating accurate synthetic data. GANs can be used to supplement training datasets that lack sufficient real-world data, but there are also challenges to using synthetic data that need to be considered.

These tools can be used to generate synthetic data for various computer vision tasks, such as object detection, segmentation, and scene understanding. The choice of tool will depend on the specific requirements of your project, such as the type of data you need to generate, the complexity of the scene, and the resources available.

Annotation with Active Learning

Active learning is a machine learning technique that trains models by allowing them to actively query annotators for information that will help improve their performance. The process starts with a small initial subset of labeled data from a large dataset. The model uses this labeled data to make predictions on the remaining unlabeled data. ML engineers and data scientists then evaluate the model's predictions to determine its level of certainty. A common method for determining uncertainty is to look at the entropy of the probability distribution of the prediction.

For example, in image classification, the model reports a confidence probability for each class considered for each prediction made. If the model is highly confident in its prediction, such as a 99 percent probability that an image is a motorcycle, then it has a high level of certainty. If the model has low certainty in its prediction, such as a 55 percent probability that an image is a truck, then the model needs to be trained on more labeled images of trucks.
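As a rough sketch of entropy-based uncertainty sampling (illustrative names, not a specific library's API):

```python
import numpy as np

def prediction_entropy(class_probs) -> float:
    """Entropy of a class-probability vector; higher means the model is less certain."""
    p = np.clip(np.asarray(class_probs, dtype=np.float64), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def select_for_annotation(unlabeled_probs, budget=100):
    """Pick the `budget` most uncertain unlabeled samples to send to annotators.

    `unlabeled_probs` maps a sample id to its predicted class probabilities
    (a placeholder for your own prediction store).
    """
    ranked = sorted(unlabeled_probs, key=lambda sid: prediction_entropy(unlabeled_probs[sid]), reverse=True)
    return ranked[:budget]
```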
Another example is the classification of images of animals. After the model is initially trained on a subset of labeled data, it can identify cats with high certainty but is uncertain about how to identify a dog, reporting a 51 percent probability that an image is not a cat and a 49 percent probability that it is a cat. In this case, the model needs to be fed more labeled images of dogs so that the ML engineers can retrain the model and improve its performance.

The samples with high uncertainty are sent back to the annotators, who label them and provide the newly labeled data to the ML engineers. The engineers then use this data to retrain the model, and the process continues until the model reaches an acceptable performance threshold. This loop of training, testing, identifying uncertainty, annotating, and retraining allows the model to continually improve its performance.

Active learning pipelines also help ML engineers identify failure modes such as edge cases, where the model makes a prediction with high uncertainty, indicating that the data does not fit into one of the categories that the model has been designed to detect. The model flags these outliers, and the ML engineers can retrain the model with the labeled samples to help it learn to identify these edge cases.

Using active learning can make model training faster and cheaper while reducing the burden of data labeling for annotators. Instead of labeling all the data in a massive dataset, organizations can intelligently select and label a portion of the data to increase model performance and reduce costs. With an active learning pipeline, ML teams can prioritize labeling the data that is most useful for training the model and continuously adjust their training as new data is annotated and used for training. Surprisingly, active learning is also useful even when ML engineers have a large amount of already labeled data. Training the model on every piece of labeled data in a dataset can be a poor allocation of resources, and active learning can help select the subset of data that is most useful for training the model, reducing computational costs.

Active learning is a powerful ML technique that allows models to actively seek information that will help improve their performance. By reducing the burden of data labeling and optimizing the use of computational resources, active learning can help organizations achieve better results more efficiently and cost-effectively. However, an active learning pipeline can be hard to implement. Encord Active is an open-source active learning tool that includes visualizations, workflows, a set of data and label quality metrics, and model performance analysis based on the model's predictions. It allows you to add your model's predictions, filter them by model confidence, and export them to your annotation tool (for example, Encord Annotate).

What Do You Do Once You’ve Added New Classes?

Once you’ve added new classes to your computer vision model, there are several steps you can take to optimize its performance:
- Evaluate the model
- Fine-tune the model
- Data augmentation
- Monitor performance

Evaluate The Model

The first step after adding new classes to your model is to evaluate its performance. This involves using a dataset of images or videos to test the model and see how well it can recognize the new classes. You can use metrics like accuracy, precision, recall, and F1 score to quantify the model's performance and compare it with baseline models.
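For example, with scikit-learn you can get per-class precision, recall, and F1 in one call (`y_true` and `y_pred` are stand-ins for the ground-truth and predicted labels on your own held-out test set):

```python
from sklearn.metrics import classification_report

# The report breaks precision, recall, and F1 down per class, so the newly
# added classes can be checked individually instead of relying on overall accuracy.
print(classification_report(y_true, y_pred, digits=3))
```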
You can also visualize the results and check model performance using confusion matrices, precision-recall curves, and ROC curves. These evaluations will help you identify areas where the model is performing well and where it needs improvement.

Fine-Tune The Model

Based on the evaluation results, you may need to fine-tune the model to optimize its performance for the new classes. Fine-tuning can involve adjusting the model's hyperparameters, such as the learning rate or weight decay, or adjusting the architecture of the model itself. You can also use techniques like transfer learning to leverage pre-trained models and fine-tune them for your specific task.

Data Augmentation

Another approach to improving the model's performance is to use data augmentation. This involves transforming the existing training data to create new, synthetic examples. For example, you can use techniques like random cropping, flipping, or rotation to create new training samples. By increasing the size of the training dataset, data augmentation can help to prevent overfitting and improve the model's generalization ability.

Monitor Performance

Once you’ve fine-tuned the model, it’s important to monitor its performance over time. This can involve tracking the model's behavior on a test set or in a real-world deployment and adjusting the model as needed to keep it up-to-date. Monitoring performance can help to ensure that the model continues to function well when new classes are added and as the underlying data distribution changes.

Adding new classes to a computer vision model is just the first step in optimizing its performance. By evaluating the model, fine-tuning its parameters, using data augmentation and regularization, and monitoring its performance, you can make the model more accurate, versatile, and robust to new classes. These steps are crucial for ensuring that your model remains effective and up-to-date over time and for achieving the best possible performance in real-world applications.

Want To Start Adding More Classes To Your Model?

“I want to start annotating” - Get a free trial of Encord here.
"I want to get started right away" - You can find Encord Active on GitHub here or try the quickstart Python command from our documentation.
"Can you show me an example first?" - Check out this Colab Notebook.

If you want to support the project, you can help us out by giving it a star on GitHub ⭐

Want to stay updated? Follow us on Twitter and LinkedIn for more content on computer vision, training data, and active learning. Join the Slack community to chat and connect.