The Complete Guide to Image Annotation for Computer Vision
Image annotation is a crucial part of training AI-based computer vision models. Almost every computer vision model needs structured data created by human annotators.
Images are annotated to create training data for computer vision models. Training data is fed into a computer vision model that has a specific task to accomplish (for example, identifying black Ford cars of a specific age and design across a dataset). That model can then be connected to an active learning pipeline to improve how it learns on its way to becoming a production model.
In this post, we cover the goals of image annotation, the difference between classification and annotation, what to look for in image annotation tools, and best practices to improve image annotation for your computer vision projects.
What is Image Annotation?
Inputs make a huge difference to project outputs. In machine learning (ML) teams, the data-centric AI approach recognizes the importance of the data a model is trained on, even more so than the model or sets of models that are used.
So, if you’re an annotator working on an image or video annotation project, creating the most accurately labeled inputs can mean the difference between success and failure. Annotating images and objects within images correctly will save you a lot of time and effort later on.
Computer vision models and tools aren’t yet smart enough to correct human errors at the project's manual annotation and validation stage. Training datasets are more valuable when the data they contain has been correctly labeled.
As every annotator team manager knows, image annotation is more nuanced and challenging than many realize. It takes time, skill, a reasonable budget, and the right tools to make these projects run smoothly and produce the outputs data operations and ML teams and leaders need.
Image annotation is crucial to the success of computer vision models. Image annotation is the process of manually labeling and annotating images in a dataset to train artificial intelligence and machine learning computer vision models.
What is the Goal of Image Annotation?
Image annotation aims to accurately label and annotate images that are used to train a computer vision model.
Labeled images create a dataset, and the model learns from that dataset. At the start of a project, once the first group of annotated images or videos is fed into it, the model might be only 70% accurate. ML or data ops teams then ask for more data to train it and make it more accurate.
Active learning pipelines are a way for the model to tell you what additional data it needs to become more accurate (for example, more images taken in low light or containing shadows).
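One common way to put this idea into practice is uncertainty sampling: rank the unlabeled images by how unsure the model is, and send the least confident ones to annotators first. The sketch below is illustrative only. The function name `select_for_annotation`, the file names, and the confidence scores are all hypothetical and do not come from any particular platform's API.

```python
def select_for_annotation(confidences, budget):
    """Return the IDs of the `budget` images the model is least sure about.

    `confidences` maps an image ID to the model's top-class probability
    (hypothetical values; a real pipeline would read them from model output).
    """
    # Sort image IDs by confidence, lowest first, so uncertain images lead.
    ranked = sorted(confidences, key=lambda image_id: confidences[image_id])
    return ranked[:budget]


# Hypothetical confidences for four unlabeled images.
scores = {
    "day_clear.jpg": 0.97,
    "night_low_light.jpg": 0.41,
    "heavy_shadow.jpg": 0.55,
    "day_overcast.jpg": 0.90,
}

to_label = select_for_annotation(scores, budget=2)
print(to_label)  # ['night_low_light.jpg', 'heavy_shadow.jpg']
```

Here the low-light and shadowed images are picked first, which matches the kind of request an active learning pipeline would surface to the annotation team.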
Image annotation can either be done completely manually or with help from automation to speed up the labeling process.
Wholly manual annotation takes far longer than automation-assisted annotation and also increases the chance of labeling errors.
Doing this with automated data annotation tools reduces the time and resources a project requires at the annotation and training stage. Automation tools and features are algorithms that accelerate some of the most time-consuming tasks, such as creating segment masks. Once the manual input is complete, computer vision models take over, learning from the training data and iterating every time new data is introduced to improve the accuracy and outputs.
What is Image Annotation in Machine Learning?
Machine learning computer vision models require huge datasets involving thousands of images. Every single image needs to be labeled accurately to create the training dataset, in line with the project's outcomes, goals, and objectives.
In the healthcare sector, this could involve annotating and labeling medical images (such as X-rays or scans stored as DICOM or NIfTI files) to distinguish malignant tumors from benign ones. In manufacturing, this could involve labeling images that show faults in machinery or products in a factory. In sports analytics, computer vision models and image labeling are useful for highlighting injuries.
There are thousands of use cases for image annotation and deploying machine learning tools in computer vision models.
What is the Difference Between Classification and Annotation in Computer Vision?
Although classification and annotation are both used to organize and label images to create high-quality image data, the processes and applications involved are somewhat different.
Image classification is usually an automatic task performed by image labeling tools.
Image classification comes in two flavors: “supervised” and “unsupervised”. When this task is unsupervised, algorithms examine large numbers of unknown pixels and attempt to classify them based on natural groupings represented in the images being classified.
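To make "natural groupings" concrete, here is a minimal sketch of unsupervised grouping using a tiny one-dimensional k-means over grayscale pixel intensities. It is an assumption-laden toy: real tools work on full images and richer features, and the pixel values, starting centers, and function name `kmeans_1d` are made up for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster scalar pixel intensities around `centers` (a tiny 1-D k-means)."""
    centers = list(centers)

    def nearest(value):
        # Index of the center closest to this pixel intensity.
        return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest center.
        groups = [[] for _ in centers]
        for value in values:
            groups[nearest(value)].append(value)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]

    labels = [nearest(value) for value in values]
    return centers, labels


# Hypothetical grayscale intensities: a few dark pixels and a few bright ones.
pixels = [12, 15, 10, 200, 210, 205, 14, 198]
centers, labels = kmeans_1d(pixels, centers=[0, 255])
print(centers)  # [12.75, 203.25]
print(labels)   # [0, 0, 0, 1, 1, 1, 0, 1]
```

No one told the algorithm which pixels are "dark" or "bright". The two groups emerge from the data, and a person still has to decide what each group means.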
Supervised image classification involves an analyst trained in datasets and image classification to support, monitor, and provide input to the program working on the images.
Annotation in computer vision, on the other hand, always involves human annotators, at least at the annotation and training stage of any image-based computer vision model. Even when automation tools support a human annotator or analyst, creating bounding boxes or polygons and labeling objects within images requires human input, insight, and expertise.
What Does an Image Annotation Tool Need to Offer?
Before we get into the features annotation tools need, annotators and project leaders need to remember that the outcomes of computer vision models are only as good as the human inputs. Depending on the level of skill required, this means making the right investment in human resources before investing in image annotation tools.
When it comes to picking image editors and annotation tools, you need one that can:
- Create labels for any image annotation use case
- Create frame-level and object classifications
- Offer a wide range of powerful automation features
While there are some fantastic open-source image annotation tools out there (like CVAT), they don’t have this breadth of features, which can cause problems for your image labeling workflows further down the line. Now, let’s take a closer look at what this means in practice.
Image labeling for computer vision in Encord
#1: Labels For Any Image Annotation Use Case
An easy-to-use annotation interface, with the tools and labels for any image annotation type, is crucial to ensure annotation teams are productive and accurate. It's best to avoid any image annotation tool that comes with limitations on the types of annotations you can apply to images.
Ideally, annotators and project leaders need a tool that can give them the freedom to use the five most common types of annotations, including bounding boxes, polygons, polylines, keypoints, and primitives (more about these below). Annotators also need the ability to add detailed and descriptive labels and metadata.
During the setup phase, detailed and accurate annotations and labels produce more accurate and faster results when computer vision AI models process the data and images.
#2: Classification, Object Detection, Segmentation
- Classification is a way of applying nested and higher-order classes and classifications to individual images and to an entire series of images. It’s a useful feature for self-driving cars, traffic surveillance images, and visual content moderation.
- Object detection is a tool for recognizing and localizing objects in images with vector labeling features. Once an object is labeled a few times during the data training stage, automated tools should label the same object over and over again when processing a large volume of images. It’s an especially useful feature in gastroenterology and other medical fields, in the retail sector, and in analyzing drone surveillance images.
- Segmentation is a way of assigning a class to each pixel (or group of pixels) within images using segmentation masks. Segmentation is especially useful in numerous medical fields, such as stroke detection, pathology in microscopy, and the retail sector (e.g. virtual fitting rooms).
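The three label types above describe the same image at different levels of detail. As a rough sketch, the toy example below starts from a pixel-level segmentation mask and derives an object-level box and an image-level class from it. The class IDs, the "polyp" and "normal" labels, and the helper `mask_to_box` are hypothetical choices made for illustration.

```python
# A tiny 4x5 segmentation mask: each entry is the class ID of one pixel.
# Class IDs are hypothetical: 0 = background, 1 = polyp.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def mask_to_box(mask, class_id):
    """Return (x_min, y_min, x_max, y_max) enclosing all pixels of `class_id`.

    Returns None when the class does not appear in the mask.
    """
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value == class_id
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


segmentation_pixels = sum(row.count(1) for row in mask)  # pixel-level label
detection_box = mask_to_box(mask, class_id=1)            # object-level label
image_label = "polyp" if detection_box else "normal"     # image-level label

print(segmentation_pixels)  # 5
print(detection_box)        # (1, 1, 3, 2)
print(image_label)          # polyp
```

Going in this direction is easy because each step throws information away. Going the other way (from a class label to a box, or from a box to a mask) is where annotation effort is spent.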
#3: Increase Outputs With Automation Features
When using a powerful image annotation tool, annotators can make massive gains from automation features. With the right tool, you can import model predictions programmatically.
Using manually labeled and annotated image datasets, you can use automation features to pre-annotate a series of images that will help you reduce annotation costs and accelerate project implementation and successful outcomes.
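A simple way to picture pre-annotation is as a confidence filter over model predictions: confident predictions become pre-labels that an annotator only needs to confirm or adjust, while the rest are queued for manual work. The sketch below assumes a hypothetical prediction record (`image`, `label`, `box`, `score`) and an arbitrary threshold of 0.8. Neither reflects a specific tool's import format.

```python
def split_predictions(predictions, threshold=0.8):
    """Split model predictions into pre-labels and items needing manual work."""
    pre_labels, needs_review = [], []
    for prediction in predictions:
        # Confident predictions become pre-labels; the rest go to annotators.
        if prediction["score"] >= threshold:
            pre_labels.append(prediction)
        else:
            needs_review.append(prediction)
    return pre_labels, needs_review


# Hypothetical model output for three images.
predictions = [
    {"image": "img_001.jpg", "label": "car", "box": (34, 50, 120, 96), "score": 0.93},
    {"image": "img_002.jpg", "label": "car", "box": (10, 12, 64, 40), "score": 0.52},
    {"image": "img_003.jpg", "label": "truck", "box": (80, 20, 200, 140), "score": 0.88},
]

pre_labels, needs_review = split_predictions(predictions)
print([p["image"] for p in pre_labels])    # ['img_001.jpg', 'img_003.jpg']
print([p["image"] for p in needs_review])  # ['img_002.jpg']
```

The threshold is a project decision: a higher value means fewer wrong pre-labels to correct, but more images to annotate from scratch.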
What are the Most Common Types of Image Annotation?
There are five most commonly used types of image annotations — bounding boxes, polygons, polylines, keypoints, and primitives (also known as skeleton templates) — and we cover each of them in more detail here:
- Bounding Box: Drawing a bounding box around an object in an image, such as an apple or tennis ball, is one of several ways to annotate and label objects. With bounding boxes, you draw a rectangle that tightly encloses an object and then apply a label to that object. When this training data is fed into machine learning computer vision models, the annotations and labels applied at the manual annotation stage are what the model uses to annotate the rest of the images in a dataset automatically.
- Polygon: A polygon is another annotation type that can be drawn freehand. On images, these annotation lines can be used to outline static objects, such as a tumor in medical image files.
- Polyline: A polyline is a way of annotating and labeling something static that continues throughout a whole series of images, such as a road or railway line. Often, a polyline is applied in the form of two static and parallel lines. Once this training data is uploaded to a computer vision model, the AI-based labeling will continue where the lines and pixels correspond from one image to another.
- Keypoints: Keypoints are useful for outlining and pinpointing the features of specific objects, such as the human face. Keypoint annotation is a valuable way of labeling and identifying countless shapes and objects. Once an object is highlighted and labeled, computer vision models take this training data and apply it from one image to the next, dramatically accelerating the annotation process.
- Primitives: Primitives (also known as skeleton templates) are used for specialized annotations to templatize specific shapes, such as 3D cuboids, pose estimation skeletons, and rotated bounding boxes.
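Under the hood, most of these annotation types reduce to a label plus a handful of coordinates. The sketch below shows one possible way to represent them in Python. The class and field names are our own assumptions, not a standard or a specific tool's export format. It also shows how a polygon can be reduced to a bounding box and measured with the shoelace formula.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    label: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float


@dataclass
class Polyline:
    label: str
    points: list   # [(x, y), ...] forming an open path


@dataclass
class Keypoint:
    label: str
    x: float
    y: float


@dataclass
class Polygon:
    label: str
    points: list   # [(x, y), ...] vertices in order; the shape closes implicitly

    def area(self):
        """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
        total = 0.0
        count = len(self.points)
        for i in range(count):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % count]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2

    def to_bounding_box(self):
        """Smallest axis-aligned box that contains every vertex."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return BoundingBox(self.label, min(xs), min(ys),
                           max(xs) - min(xs), max(ys) - min(ys))


# Hypothetical polygon annotation outlining a tumor.
tumor = Polygon("tumor", [(10, 10), (30, 10), (30, 20), (10, 20)])
print(tumor.area())             # 200.0
print(tumor.to_bounding_box())
```

A polygon can always be collapsed into a box, but a box cannot be expanded back into a polygon. That is one reason to pick the most detailed annotation type the project can afford.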
Now let’s take a look at some best practices annotators can use for image annotation to create training datasets for computer vision models.
Best Practices for Image Annotation for Computer Vision
Ensure raw data (images) are ready to annotate
At the start of any image-based computer vision project, you need to ensure the raw data (images) are ready to annotate. Data cleansing is an important part of any project. Low-quality and duplicate images are usually removed before annotation work can start.
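As a minimal sketch of the duplicate-removal step, the example below drops byte-identical images by comparing SHA-256 hashes of their contents. The file names and byte strings are stand-ins for real image data. Note the limitation: this only catches exact copies, so resized or re-encoded near-duplicates would need a different technique, such as perceptual hashing.

```python
import hashlib


def drop_exact_duplicates(images):
    """Return filenames to keep, preserving the first copy of each distinct image.

    `images` maps a filename to the raw bytes of that file.
    """
    seen = set()
    unique = []
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique


# Stand-in byte strings; in practice these would be read from image files.
raw = {
    "a.jpg": b"\x01\x02\x03",
    "b.jpg": b"\x09\x09",
    "a_copy.jpg": b"\x01\x02\x03",
}

kept = drop_exact_duplicates(raw)
print(kept)  # ['a.jpg', 'b.jpg']
```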
Understand and apply the right label types
Next, annotators need to understand and apply the right types of labels, depending on what the model is being trained to achieve. If a model is being trained to classify images, class labels need to be applied. However, if the model is being trained to segment images or detect objects, then the coordinates of bounding boxes, polygons, polylines, or other annotation shapes are crucial.
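Coordinates also come in more than one convention, and mixing them up is an easy way to corrupt a dataset. As an illustration, the sketch below converts a box given as pixel corners into a normalized center format (center x, center y, width, height, each divided by the image size), which is a layout used by some object detection training formats. The function name and sample numbers are hypothetical.

```python
def corners_to_normalized_center(box, image_width, image_height):
    """Convert (x_min, y_min, x_max, y_max) in pixels to normalized
    (x_center, y_center, width, height), each in the range 0 to 1."""
    x_min, y_min, x_max, y_max = box
    return (
        (x_min + x_max) / 2 / image_width,
        (y_min + y_max) / 2 / image_height,
        (x_max - x_min) / image_width,
        (y_max - y_min) / image_height,
    )


# A hypothetical box on a 400x200 pixel image.
normalized = corners_to_normalized_center((100, 40, 300, 120), 400, 200)
print(normalized)  # (0.5, 0.4, 0.5, 0.4)
```

Whichever convention a project uses, it is worth writing it down in the labeling guidelines so that annotators, tools, and training code all agree.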
Create a class for every object being labeled
AI/ML or deep learning algorithms usually need data that comes with a fixed number of classes. This is why it is important to use custom label structures and to input the correct labels and metadata, so that objects are not classified incorrectly after the manual annotation work is complete.
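A fixed class list is easy to enforce with a small check before training. The sketch below assumes a hypothetical three-class list and a simple annotation record with a `label` field. It flags labels that fall outside the list (typos such as "Car" or "pedestrain" are typical culprits) and maps valid labels to the integer IDs a model expects.

```python
# Hypothetical fixed class list for a street-scene project.
CLASSES = ("car", "pedestrian", "cyclist")


def find_unknown_labels(annotations, allowed=CLASSES):
    """Return the annotations whose label is not in the fixed class list."""
    allowed_set = set(allowed)
    return [a for a in annotations if a["label"] not in allowed_set]


def encode_labels(annotations, allowed=CLASSES):
    """Map each label to its integer class ID (its position in the class list)."""
    index = {name: i for i, name in enumerate(allowed)}
    return [index[a["label"]] for a in annotations]


# Hypothetical annotations, one of which contains a typo.
annotations = [
    {"image": "img_001.jpg", "label": "car"},
    {"image": "img_002.jpg", "label": "pedestrain"},
    {"image": "img_003.jpg", "label": "cyclist"},
]

unknown = find_unknown_labels(annotations)
print([a["image"] for a in unknown])  # ['img_002.jpg']

valid = [a for a in annotations if a not in unknown]
print(encode_labels(valid))  # [0, 2]
```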
Annotate with a powerful user-friendly data labeling tool
Once the manual labeling is complete, annotators need a powerful, user-friendly tool to implement accurate annotations that will be used to train the AI-powered computer vision model. With the right tool, this process becomes much simpler and more cost- and time-effective.
Annotators can get more done in less time, make fewer mistakes, and have to manually annotate far fewer images before feeding this data into computer vision models.
And there we go, the features and best practices annotators and project leaders need for a robust image annotation process in computer vision projects!
Image annotation in the Encord platform
At Encord, our active learning platform for computer vision is used by a wide range of sectors - including healthcare, manufacturing, utilities, and smart cities - to annotate thousands of images and accelerate their computer vision model development.
Experience Encord in action. Dramatically reduce manual image annotation tasks, generating massive savings and efficiencies.
Sign up for an Encord Free Trial: The Active Learning Platform for Computer Vision, used by the world’s leading computer vision teams.
AI-assisted labeling, model training & diagnostics, find & fix dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster. Try Encord for Free Today.