
Dominic Tarn September 9, 2022

5 Features You Need in a Video Annotation Tool


Labeling and annotating images is easy. Video annotation is not. Too many platforms focus on image annotation, throwing in video as an additional suite of features, rather than implementing video-native tools for annotators. 

In this post, we outline the 5 features you need to maximize video annotation ROI and efficiencies so you can choose the right video annotation tool for your needs. 

Contents 

What is video annotation, and how is it different from image annotation?

5 Essential Video Annotation Features

  1. Advanced Video Handling
  2. Easy-to-use Annotation Interface 
  3. Dynamic and Event-Based Classifications
  4. Automated Object Tracking, Interpolation & AI Assisted Labeling
  5. Team and Project Management

What Is Video Annotation, And How Is It Different From Image Annotation?

Video annotation is not the same as image annotation. You need a completely different — specialist, video-centric — suite of tools and features to handle videos. 

Otherwise, data and video analyst teams are juggling multiple annotation platforms (which is something we see more often than you’d imagine) to achieve their objectives. 

As a leader or manager within an organization that needs a video annotation and labeling solution, you need to ensure that the platform can effectively handle the specificities of video and image annotation. 

For example, in a video with a long runtime, the coordinates of an object that moves from one frame to the next must stay aligned with the correct frame and timestamp, starting from the frame in which the object first appeared. 

For several reasons, this doesn’t always happen with other tools, forcing companies to discard months’ worth of incorrectly labeled data. Let’s review the five most important features you need when considering which video annotation tool or platform to use. 

5 Essential Video Annotation Features 

Advanced Video Handling

Video annotation comes with dozens of challenges, such as variable frame rates, ghost frames, frame synchronization issues, and numerous others. To avoid these issues and ensure you don’t lose days of labeling activity, there are two things your video annotation platform needs:

  • No limit to video length:  Most video annotation software limits the length of videos, forcing you to cut them into shorter videos before annotation can start. With the best video annotation tools, you won’t have this problem - they should be able to handle arbitrarily long videos. 
  • Video pre-processing: Frame synchronization issues are a massive headache for video annotation teams, and there are numerous causes, such as the types of browsers being used for annotation work, or variable frame rates at different points in a video. 

Effective pre-processing solves these challenges, ensuring a video is displayed properly and is ready for annotation. Pre-processing means you avoid re-labeling everything if there’s an issue with the video (e.g., frame sync issues, the video not displaying properly, or annotations not matching the proper frames), saving your annotation team countless hours and a lot of budget at the start of a project. 
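To see why variable frame rates cause annotations to land on the wrong frame, it can help to treat frame lookup as a search over per-frame timestamps. The sketch below is illustrative only and is not how any particular tool pre-processes video. It assumes the presentation timestamp of every frame has already been extracted during pre-processing; the function names and the sample timestamps are hypothetical.

```python
from bisect import bisect_right


def frame_index_at(timestamps, t):
    """Return the index of the frame on screen at time t (in seconds).

    timestamps: sorted presentation times, one per frame (assumed to
    have been extracted from the video during pre-processing).
    """
    if not timestamps:
        raise ValueError("video has no frames")
    # bisect_right counts the frames that start at or before t;
    # the frame being displayed is the last of those.
    i = bisect_right(timestamps, t) - 1
    return max(i, 0)


def naive_frame_index(t, fps):
    """Constant-frame-rate assumption: index = floor(t * fps)."""
    return int(t * fps)


# Hypothetical clip that starts at 10 fps and switches to 20 fps.
timestamps = [0.0, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45]

exact = frame_index_at(timestamps, 0.42)   # uses the real timestamps
guess = naive_frame_index(0.42, fps=10)    # assumes a constant 10 fps
print(exact, guess)
```

In this made-up clip the two lookups disagree once the frame rate changes, which is the kind of mismatch that leaves a label attached to the wrong frame.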

Easy-to-use Annotation Interface

An easy-to-use video annotation and labeling interface is crucial to ensure annotators are productive. Video labeling and annotation shouldn’t take months, especially when annotating long videos. With this in mind, here are the key features you need to look out for to ensure your chosen annotation tool is easy to use: 

  • Navigation: When annotating long videos, a simple navigation tool is really important. Annotators need to be able to quickly find individual objects, move back and forth, and use labels to track specific objects as they move from frame to frame. 
  • Efficient manual annotation work: With an intuitive interface, annotators aren’t spending weeks getting to know the software. It should be easy to use by default. Hotkeys and other features make manual annotation work easier. Organizations can benefit from massive time, resource, and budget savings when annotators aren’t spending months on manual video labeling. 
  • Powerful annotation tooling: Annotation becomes a lot easier if you’ve got the right annotation types available to you. The main ones a video labeling tool should have are:
  • Bounding Boxes: Drawing a bounding box is one way to label, or classify, an object in a video. It’s integral to the process of video annotation. With the best annotation tools, you should have the ability to draw a box around the object you want to label. For example, city planners designing a smart city could label moving cars and vehicles in videos when analyzing traffic movement around urban areas. A powerful and effective annotation tool should make it easy to maintain the same bounding box from frame to frame, tracking multiple objects in motion.
  • Polygons are another annotation type, one you can draw free-hand. Add the relevant label and make polygons static or dynamic, depending on the annotated object. Static polygon annotations are useful when labeling cells or tumors in medical images.
  • Polylines are equally useful, especially if you’re labeling something that is itself static but whose position in the frame shifts from one frame to the next, such as a road, railway line, or waterway.
  • Keypoints outline or pinpoint the landmarks of specific shapes, such as a human face. Keypoint annotation is versatile and useful across countless shapes. Once you’ve highlighted the outline of a specific object, it can be tracked from frame to frame, making it easier to annotate the same object, whether with AI-based systems or manually, throughout the rest of a video or series of images.
  • Primitives, also known as skeleton templates, are highly useful for specialized annotations that templatize shapes (e.g., 3D cuboids, pose estimation skeletons, rotated bounding boxes, etc). Annotation teams can use primitives or skeleton templates to outline an object, empowering them to track the object from one frame to the next. Primitives are especially useful in medical video annotation. 
  • Object tracking is a simple and powerful way of labeling a specific object, giving it a unique ID that you can use to track it throughout a video. Pixels from the object that’s been labeled are matched to pixels in the frames that come next, allowing a moving object — such as a car or person running — to be automatically tracked. 
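As a rough illustration of carrying a unique ID from one frame to the next, here is a minimal sketch. Note that it does not match pixels as described above. It swaps in a simpler stand-in, greedy matching by bounding-box overlap (intersection over union). The function names, the (x1, y1, x2, y2) box format, and the 0.3 threshold are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def carry_ids(previous, current, threshold=0.3):
    """Assign the IDs from the previous frame to boxes in the current frame.

    previous: dict mapping object ID -> box in the previous frame.
    current:  list of boxes found in the current frame.
    Returns (assigned, unmatched): ID -> box for matched objects, plus
    the boxes that matched no existing ID (candidates for new IDs).
    """
    assigned = {}
    unmatched = list(current)
    for obj_id, box in previous.items():
        if not unmatched:
            break
        # Greedy choice: the remaining box that overlaps this object most.
        best = max(unmatched, key=lambda candidate: iou(box, candidate))
        if iou(box, best) >= threshold:
            assigned[obj_id] = best
            unmatched.remove(best)
    return assigned, unmatched


previous = {"car-1": (0, 0, 10, 10), "car-2": (50, 50, 60, 60)}
current = [(51, 50, 61, 60), (1, 0, 11, 10)]
assigned, unmatched = carry_ids(previous, current)
print(assigned, unmatched)
```

Greedy matching is easy to follow but can mis-assign IDs when objects overlap or cross paths, so treat this as a way to reason about track IDs and not as a production tracker.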


Navigation features in the video annotation section of Encord

Dynamic and Event-Based Classifications

Another important feature in a great video annotation tool is the ability to classify frames and events. This gives you additional data for your model to work from - whether it was night time in the video, or what the labeled object was doing at the time. 

Dynamic classifications are often called action or “event-based” classifications. The clue is in the name: they tell you what the object is doing, such as whether the car you’re tracking is turning from left to right over a specific number of frames, which is why these classifications are dynamic. It depends on what’s going on in the video and the granular level of detail you need to label. Dynamic or event-based classifications are a powerful feature that the best video annotation platforms come with, and you can use them regardless of the annotation type used to originally label the object in motion.

Frame Classifications are different from specific object classifications. Instead of labeling or classifying an object, you use an annotation tool to classify a specific frame, or range of frames, within a video. Hotkeys and video labeling menus can make it simple to select the start and end frames of a range and then give that range a label while annotating. A frame classification is used to highlight something happening in the frame itself - whether it is day or night, or raining or sunny, for example.
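One way to picture the difference between the two kinds of classification is as labels attached to a range of frames, with or without an object ID. The sketch below is a hypothetical data structure for illustration and is not any platform’s export format; `RangeLabel`, `labels_at`, and the sample label names are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeLabel:
    name: str                        # e.g. "night" or "turning_left"
    start: int                       # first frame of the range, inclusive
    end: int                         # last frame of the range, inclusive
    object_id: Optional[str] = None  # None means a frame classification


def labels_at(labels, frame, object_id=None):
    """Names of the labels active on `frame`.

    With object_id=None this returns frame classifications; with an ID
    it returns the dynamic classifications for that tracked object.
    """
    return [
        label.name
        for label in labels
        if label.start <= frame <= label.end and label.object_id == object_id
    ]


labels = [
    RangeLabel("night", start=0, end=299),                           # frame-level
    RangeLabel("turning_left", start=120, end=180, object_id="car-1"),  # dynamic
]

print(labels_at(labels, 150))            # frame classifications on frame 150
print(labels_at(labels, 150, "car-1"))   # what car-1 is doing on frame 150
```

Storing a start and end frame instead of one label per frame keeps long events compact and makes it cheap to adjust where an event begins or ends.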

Automated Object Tracking, Interpolation & AI Assisted Labeling

Annotation is a time-consuming, manual, data-intensive task, especially when videos are long or complicated, or when there are hundreds of videos to annotate. A solution is to automate video annotations. 

Automation leverages the skills of your annotation teams. It saves time and money, while increasing efficiency and the quality of the annotation work. 

  • Micro-Models are “annotation specific models that are overtrained to a particular task or particular piece of data.” Encord’s video annotation tool is the only one that uses the micro-model approach, and it is ideal for bootstrapping automated video annotation projects. What’s special about micro-models is that they don’t need huge amounts of data. Quite the opposite; you can train micro-models within a few minutes. Once you’ve labeled the object, person, or action within a video that you want to track, powerful AI-based algorithms do the rest. Active learning is often the best approach with micro-models, as it may take a few iterations for an algorithm to get it right. Organizations with large video annotation projects have found that micro-models give them a massive advantage. 
  • Automated Object Tracking is an evolution of the ability to label specific objects while doing video annotation. This might be challenging when using older or less powerful software. However, when you use software that comes with a proprietary algorithm that runs without using a representational model, you will save time when implementing automated object tracking. 
  • Interpolation can be implemented automatically when the right software comes with a linear interpolation algorithm, designed with practical use cases in mind. Simply draw object vertices in arbitrary directions (e.g., clockwise, counterclockwise, and otherwise), and the algorithm will still track the same object as it moves from one frame to the next. 
  • Auto Object Segmentation is when you divide an object into multiple regions, or a series of pixels, without any constraints on the shape of those regions/pixels. 

For example, if an annotator has drawn a label boundary around a specific object (e.g., a cellular cluster being analyzed), the goal of auto-object segmentation is to tighten the edges so the boundary fits more closely around the object in question. Algorithms can also track this object throughout the video automatically. 
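The linear interpolation mentioned above can be sketched in a few lines for the simplest case, a bounding box labeled on two keyframes. This is a minimal illustration under stated assumptions, not a description of any vendor’s algorithm: boxes are (x1, y1, x2, y2) tuples, and `interpolate_box` and the sample keyframes are hypothetical.

```python
def interpolate_box(keyframe_a, keyframe_b, frame):
    """Linearly interpolate a box between two labeled keyframes.

    Each keyframe is (frame_index, (x1, y1, x2, y2)).
    `frame` must lie between the two keyframe indices.
    """
    frame_a, box_a = keyframe_a
    frame_b, box_b = keyframe_b
    if frame_b <= frame_a:
        raise ValueError("keyframe_b must come after keyframe_a")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame is outside the keyframe range")
    # t runs from 0.0 at the first keyframe to 1.0 at the second.
    t = (frame - frame_a) / (frame_b - frame_a)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


# The annotator labels frames 0 and 10; frames 1 to 9 are filled in.
first = (0, (0, 0, 10, 10))
last = (10, (100, 50, 120, 60))

halfway = interpolate_box(first, last, 5)
print(halfway)
```

Straight-line interpolation works well when an object moves steadily between keyframes. When it accelerates or changes direction, an annotator typically adds another keyframe at the turning point.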


Example of automated labeling using interpolation in Encord

Team and Project Management

Large annotation teams are difficult to manage. Whether you’re a Head of Machine Learning or a Data Operations leader, you’ve got to juggle team management, budgets, operational timelines, and project outputs. 

Project leaders need visibility into what is going on: what is being processed and what is being analyzed. You need a clear understanding of the state of the project in real-time, giving you the ability to react fast if anything changes. 

When big-budget and long-timescale annotation projects are underway, it’s often useful to leverage external annotation teams to implement labor-intensive aspects of the project. 

But working with external providers creates the need for advanced team and project management features, such as: 

  • Access control is essential when video data is confidential, such as medical video annotations. As a project leader, you need to set clear rules and restrictions on who has access to specific data assets, especially when this could breach GDPR in Europe, or healthcare data security legislation in the US (e.g. HIPAA). 
  • Performance dashboards give project leaders real-time visibility on video annotation project progress. They need to be granular, giving you an overview for each annotator, reviewer, and annotation object (e.g., time spent, quality of annotation or rejection rate, and as much detail as you need to manage the process and project outputs effectively). At a higher level, you need to know the total number of annotations done (compared to the project total, so you can track progress) and which kinds of annotations, alongside dozens of other details. 
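To make the dashboard metrics above concrete, here is a small sketch of how per-annotator figures such as time spent and rejection rate could be derived from a log of annotation records. The record fields, function names, and sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def annotator_summary(records):
    """Aggregate annotation records into per-annotator metrics.

    Each record is a dict with keys "annotator", "seconds" (time spent
    on the annotation), and "rejected" (True if a reviewer rejected it).
    """
    totals = defaultdict(lambda: {"count": 0, "seconds": 0.0, "rejected": 0})
    for record in records:
        stats = totals[record["annotator"]]
        stats["count"] += 1
        stats["seconds"] += record["seconds"]
        stats["rejected"] += int(record["rejected"])
    return {
        name: {
            "count": stats["count"],
            "avg_seconds": stats["seconds"] / stats["count"],
            "rejection_rate": stats["rejected"] / stats["count"],
        }
        for name, stats in totals.items()
    }


def progress(done, project_total):
    """Fraction of the project's planned annotations completed so far."""
    return done / project_total


records = [
    {"annotator": "alice", "seconds": 10, "rejected": False},
    {"annotator": "alice", "seconds": 20, "rejected": True},
    {"annotator": "bob", "seconds": 30, "rejected": False},
]

summary = annotator_summary(records)
print(summary)
print(progress(done=len(records), project_total=12))
```

The same grouping could be keyed by reviewer or by annotation type to give the other breakdowns a project leader might want.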


User management in Encord

And there we go, the 5 features every video annotation tool needs.

At Encord, our active learning platform for computer vision is used by a wide range of sectors - including healthcare, manufacturing, utilities, and smart cities - to annotate videos and accelerate their computer vision model development. 

Experience Encord in action. Dramatically reduce manual video annotation tasks, generating massive savings and efficiencies. Try it for free today