
Denis Gavrielov September 14, 2022

How to Automate Video Annotation For Machine Learning


Automated video labeling saves companies significant time and money by improving the speed and quality of manual video labeling, and eventually taking over the bulk of video annotation work.

Machine learning and AI-based algorithms for video annotation rely on large amounts of labeled video, and ensuring those videos are accurately labeled is crucial to the success of the project. Generating labels manually is highly laborious, time-consuming, and expensive, and it requires a whole team of people.

Businesses and organizations often outsource this work to save costs. However, this rarely makes the task any quicker and can often cause problems with quality. Automated video annotation can solve most of these problems, reducing manual inputs, saving time and money, and ensuring you can annotate and label much larger datasets while maintaining consistent quality. 

In this post, we look at four ways to automate video annotation while ensuring the quality and consistency of your labels.

#1: Multi-Object Tracking (MOT) to Ensure Continuity from Frame to Frame 

Tracking objects automatically is a powerful automated video annotation feature. Once you’ve labeled an object, you want to ensure it’s tracked correctly and consistently from one frame to the next, especially if it’s moving and changing direction or speed, or if the background and light levels change, such as a shift from day to night.

Not only that, but if you’ve labeled multiple objects, you need an AI-based video annotation tool capable of tracking every single one of them. The most powerful automated video labeling tool tracks pixels within an annotation from one frame to the next. This shouldn't be a problem even if you are tracking multiple objects with automatic annotation. 

Multi-object tracking is especially useful when processing videos through a machine learning automation tool and an asset when analyzing drone footage, surveillance videos, and in the healthcare and manufacturing sectors. Healthcare companies often need to annotate and analyze surgical or gastroenterology videos, whereas manufacturers need clearer, annotated videos of assembly lines. 
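To make the frame-to-frame idea concrete, here is a minimal sketch of multi-object tracking by greedy IoU (intersection-over-union) matching between consecutive frames. This is not Encord's algorithm; production trackers add motion models and appearance features, and the box format and function names here are illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def match_tracks(prev, curr, min_iou=0.3):
    """Carry track IDs forward one frame.

    prev: {track_id: box} from the last frame; curr: list of boxes
    detected in the current frame. Returns {track_id: box}.
    """
    assigned, used = {}, set()
    for tid, pbox in prev.items():
        # Greedily pick the unused current box with the highest overlap.
        scores = [(iou(pbox, c), i) for i, c in enumerate(curr) if i not in used]
        if scores:
            best, i = max(scores)
            if best >= min_iou:
                assigned[tid] = curr[i]
                used.add(i)
    return assigned
```

Running this over every pair of consecutive frames keeps each label attached to the same physical object, which is the continuity property described above.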


Automated object tracking for video annotation in Encord

#2: Use Interpolation to Fill in the Gaps 

In automated video annotation or labeling, interpolation is the act of propagating labels between two keyframes. Say an annotation team has already manually labeled objects, using bounding boxes or polygons, in hundreds of keyframes at the start and end of a video. Interpolation accelerates the annotation process by filling in the labels for the unannotated frames in between.

However, you must use interpolation carefully, at least when starting out with a video annotation project. There’s always a trade-off between speed and quality, which depends on the quality of the labels applied and the complexity of the objects being labeled during the model training stage.

For example, a polygon applied to a complex multi-faceted object that’s moving from one frame to the next might not interpolate as easily as a simple object with a bounding box around it that’s moving slowly. As annotators know, this entirely depends on how much is changing in the video from one frame to the next. 

When polygons are drawn on an object in a video, a proprietary algorithm that runs without a representational model can tighten the perimeter of the polygon, interpolate, and track the various segments (in this case, clothes) within a moving object, such as a person.


Interpolation to support video annotation in Encord

#3: Use Micro-Models to Accelerate AI-assisted Video Annotation 

In most cases, machine learning (ML) models and AI-based algorithms need vast amounts of data before they can produce meaningful results. Not only that, but the data going in should be clean and consistent. Otherwise, you risk the whole project taking much longer than anticipated, or having to start over again. 

Automated video labeling and annotation, also known as model-assisted labeling (MAL) or AI-assisted labeling (AAL), is complicated. This type of labeling is far more complex than annotating static images or applying ML to vast Excel spreadsheets and other data sources.

Micro-models take a different approach: they are powerful, tightly scoped models that deliberately over-fit to bootstrap your video annotation tasks. Training machine learning algorithms using micro-models is an iterative process that requires manual annotation and labeling at the start. However, you don’t need nearly as much manual work or time spent training the model as you would with other video annotation platforms.

In some cases, you can train micro-models on as few as five labeled frames. As we outline in another post, “micro-models are annotation specific models that are overtrained to a particular task or particular piece of data.”

Micro-models are best applied to a narrow domain, e.g., automatically annotating particular objects throughout a long video, and the training data required is minimal. It can take minutes to train a micro-model and only minutes or hours to run through the development cycle. Micro-models save vast amounts of time and money for organizations in the healthcare, manufacturing, or research sectors, especially when annotating complex moving objects. 
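As a toy illustration of the "deliberately overfit to a narrow task" idea, the sketch below memorizes the mean color of each labeled object class from a handful of frames and then labels pixels in new frames by nearest prototype color. This is an assumption-laden stand-in for a real micro-model (which would typically be a small neural network), but it shows how little data a tightly scoped model can need.

```python
import numpy as np

def train_micro_model(frames, masks):
    """Memorize one mean-color prototype per labeled class.

    frames: list of HxWx3 float arrays; masks: list of HxW int arrays
    where 0 means background. Only a few frames are expected.
    """
    classes = {}
    for frame, mask in zip(frames, masks):
        for c in np.unique(mask):
            if c == 0:
                continue
            classes.setdefault(int(c), []).append(frame[mask == c].mean(axis=0))
    # Average the per-frame prototypes into one color per class.
    return {c: np.mean(protos, axis=0) for c, protos in classes.items()}

def predict(model, frame, threshold=60.0):
    """Label each pixel with the nearest class prototype, else background (0)."""
    frame = frame.astype(float)
    h, w, _ = frame.shape
    out = np.zeros((h, w), dtype=int)
    best = np.full((h, w), threshold)
    for c, proto in model.items():
        dist = np.linalg.norm(frame - proto, axis=-1)
        hit = dist < best
        out[hit] = c
        best[hit] = dist[hit]
    return out
```

In practice the iterative loop is: label a few frames, train, predict on the rest, correct the worst predictions, and retrain.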

#4: Auto Object Segmentation to Improve the Quality of Object Segments 

Auto-segmentation means drawing a rough outline around an object and then using an algorithm to automatically “snap” it to the object’s contours, making the outline tighter and more accurately aligned with the object and label being tracked from one frame to the next.

Annotators can do this using polygons. You might, for example, need to segment clothes a person is wearing in a surveillance video, so that you can see when a suspect takes off an item of clothing to put something else on. 

With the right video annotation tool, auto object segmentation is applicable for almost any use case across dozens of sectors. It works on arbitrary shapes, and interpolation can track object segments across thousands of frames. In most cases, the outcome is a massive time and cost saving throughout a video annotation project, resulting in much faster and higher quality segmentations. 
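A heavily simplified sketch of the "snapping" idea: given a loose box and a foreground mask, tighten the box to the smallest rectangle containing the foreground pixels inside it. Real auto-segmentation works on polygons with learned contour models; this NumPy version only illustrates the tightening step, and the mask and box conventions are assumptions.

```python
import numpy as np

def snap_box(mask, box):
    """Tighten a loose annotation to the object it contains.

    mask: HxW boolean array, True where the object is.
    box: (x0, y0, x1, y1) loose annotation in pixel coordinates.
    """
    x0, y0, x1, y1 = box
    region = mask[y0:y1, x0:x1]
    ys, xs = np.nonzero(region)
    if len(xs) == 0:
        return box  # nothing inside the box to snap to
    # Smallest box covering all foreground pixels, back in image coordinates.
    return (int(x0 + xs.min()), int(y0 + ys.min()),
            int(x0 + xs.max()) + 1, int(y0 + ys.max()) + 1)
```

Applied per frame, this kind of tightening keeps the annotation aligned with the object even as the annotator only supplies rough outlines.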


Automated object segmentation in Encord

The power of automated video annotation 

In our experience, there are very few video annotation projects where automation can’t play a useful role. Automation empowers annotators to work faster and more effectively, and to deliver higher-quality project outputs.

Experience Encord in action. Try out our automated video annotation features (including our proprietary micro-model approach) for free today.