Encord Blog

How to Automate Video Annotation for Machine Learning

November 11, 2022

6 mins

Written by

Denis Gavrielov

Automated video labeling saves companies time and money by speeding up manual video labeling, improving its quality, and eventually taking over the bulk of video annotation work.

Training machine learning and AI-based algorithms for video annotation takes large amounts of labeled video, and ensuring those videos are accurately labeled is crucial to the success of the project. Generating labels manually is laborious, time-consuming, and expensive, and it requires a whole team of people.

Businesses and organizations often outsource this work to save costs. However, this rarely makes the task any quicker and can often cause problems with quality. Automated video annotation can solve most of these problems, reducing manual inputs, saving time and money, and ensuring you can annotate and label much larger datasets while maintaining consistent quality.

In this post, we look at four ways to automate video annotation while ensuring the quality and consistency of your labels.

#1: Multi-Object Tracking (MOT) to Ensure Continuity from Frame to Frame

Tracking objects automatically is a powerful automated video annotation feature. Once you’ve labeled an object, you want to ensure it’s tracked correctly and consistently from one frame to the next, especially if it’s moving and changing direction or speed, or if the background and light levels change, such as a shift from day to night.

Not only that, but if you’ve labeled multiple objects, you need an AI-based video annotation tool capable of tracking every single one of them. The most powerful automated video labeling tools track the pixels within an annotation from one frame to the next, so tracking multiple objects at once shouldn't be a problem.
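To make the idea of carrying several labels from one frame to the next concrete, here is a minimal sketch of one common approach: matching each tracked box to the most-overlapping box in the next frame by intersection-over-union (IoU). This is not how Encord's tracker works (the post describes pixel-level tracking); the function names `iou` and `associate` and the `min_iou` threshold of 0.3 are our own assumptions for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box], detections: List[Box],
              min_iou: float = 0.3) -> Dict[int, Box]:
    """Carry track IDs from the previous frame over to the current frame's boxes."""
    # Score every (track, detection) pair, then accept the best pairs first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    updated: Dict[int, Box] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in updated or j in used:
            continue
        updated[tid] = detections[j]
        used.add(j)
    # Detections that matched nothing start new tracks with fresh IDs.
    # Tracks that matched nothing are simply dropped in this sketch.
    next_id = max(tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated
```

Calling `associate` once per frame, feeding each result back in as `tracks`, keeps every object's ID stable as long as its box overlaps enough between consecutive frames. Fast motion or occlusion breaks that assumption, which is where more capable trackers earn their keep.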

Multi-object tracking is especially useful when processing videos through a machine learning automation tool. It is an asset when analyzing drone footage and surveillance videos, and in the healthcare and manufacturing sectors. Healthcare companies often need to annotate and analyze surgical or gastroenterology videos, whereas manufacturers need clearer, annotated videos of assembly lines.

Automated object tracking for video annotation in Encord

#2: Use Interpolation to Fill in the Gaps

In automated video annotation or labeling, interpolation is the act of propagating labels between two keyframes. Say an annotation team has already manually labeled objects within hundreds of keyframes, including the start and end of a video, using bounding boxes or polygons. Interpolation accelerates the annotation process by filling in the labels for the unannotated frames in between.
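As a rough illustration of propagating labels between keyframes, the sketch below linearly interpolates bounding-box coordinates for every frame that sits between two labeled keyframes. Linear interpolation is the simplest possible choice and an assumption on our part, not a description of Encord's algorithm; `interpolate_boxes` is a hypothetical helper name.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Fill every frame between consecutive keyframes by linear interpolation."""
    frames = sorted(keyframes)
    labels: Dict[int, Box] = dict(keyframes)
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        for frame in range(start + 1, end):
            t = (frame - start) / (end - start)  # 0 at start, 1 at end
            labels[frame] = tuple(p + t * (q - p) for p, q in zip(a, b))
    return labels


# Two manual keyframes (frames 0 and 4); frames 1-3 are filled in automatically.
labels = interpolate_boxes({0: (10, 10, 50, 50), 4: (30, 10, 70, 50)})
print(labels[2])  # (20.0, 10.0, 60.0, 50.0)
```

The annotator labels two frames and gets five, which is where the time saving comes from. The result is only as good as the assumption that the object moves at a steady rate between keyframes.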

However, you must use interpolation carefully, at least when starting out with a video annotation project. There’s always a trade-off between speed and quality, and where it falls depends on the quality of the labels applied and the complexity of the labeling agents used during the model training stage.

For example, a polygon applied to a complex multi-faceted object that’s moving from one frame to the next might not interpolate as easily as a simple object with a bounding box around it that’s moving slowly. As annotators know, this entirely depends on how much is changing in the video from one frame to the next.
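One hedged way to act on this is to place keyframes more densely where the video changes more. The sketch below measures frame-to-frame change as the mean absolute pixel difference and suggests a new keyframe whenever the accumulated change passes a budget. Both the metric and the `budget` value are illustrative assumptions, and `suggest_keyframes` is a hypothetical helper rather than a feature of any particular tool.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel intensity change between two same-sized frames."""
    total = sum(abs(p - q)
                for row_a, row_b in zip(a, b)
                for p, q in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def suggest_keyframes(frames: List[Frame], budget: float = 20.0) -> List[int]:
    """Pick keyframe indices so accumulated change between them stays under `budget`.

    Assumes `frames` contains at least one frame.
    """
    keyframes = [0]
    accumulated = 0.0
    for i in range(1, len(frames)):
        accumulated += mean_abs_diff(frames[i - 1], frames[i])
        if accumulated >= budget:
            keyframes.append(i)
            accumulated = 0.0
    if keyframes[-1] != len(frames) - 1:
        keyframes.append(len(frames) - 1)  # always label the final frame
    return keyframes
```

A static scene yields only the first and last frames as keyframes, while a sudden change in the middle of the clip adds one at the point of change, so manual effort goes where interpolation is least trustworthy.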

When polygons are drawn on an object in a video, a proprietary algorithm that runs without a representational model can tighten the perimeter of the polygon, interpolate, and track the various segments (in this case, clothes) within a moving object, e.g., a person.

Interpolation to support video annotation in Encord

#3: Use Micro-Models to Accelerate AI-assisted Video Annotation

In most cases, machine learning (ML) models and AI-based algorithms need vast amounts of data before they can produce meaningful results. Not only that, but the data going in should be clean and consistent. Otherwise, you risk the whole project taking much longer than anticipated or having to start over again.

Automated video labeling and annotation, also known as model-assisted labeling (MAL) or AI-assisted labeling (AAL), is complicated. This type of labeling is far more complex than annotating static images or applying ML to vast Excel spreadsheets and other data sources.

Conversely, micro-models are powerful, tightly scoped models that are deliberately overfit to bootstrap your video annotation tasks. Training machine learning algorithms using micro-models is an iterative process that requires manual annotation and labeling at the start. However, you don’t need nearly as much manual work or time spent training the model as you would with other video annotation platforms.

In some cases, you can train micro-models on as few as five labeled frames. As we outline in another post, “micro-models are annotation-specific models that are overtrained to a particular task or particular piece of data.”

Micro-models are best applied to a narrow domain, e.g., automatically annotating particular objects throughout a long video, and the training data required is minimal. It can take minutes to train a micro-model and only minutes or hours to run through the development cycle. Micro-models save vast amounts of time and money for organizations in the healthcare, manufacturing, or research sectors, especially when annotating complex moving objects.
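To show the workflow in miniature (label a handful of frames, overfit a narrow model, run it over the rest of the same video), here is a toy stand-in for a micro-model. It is a per-pixel nearest-mean classifier on grayscale intensity, which is far simpler than the models a real platform would train and is not Encord's proprietary approach. The names `train_micro_model` and `predict_mask` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale intensities
Mask = List[List[int]]     # 1 = object, 0 = background


def train_micro_model(labeled: List[Tuple[Frame, Mask]]) -> Tuple[float, float]:
    """Fit mean background and object intensity from a few labeled frames."""
    sums = [0.0, 0.0]
    counts = [0, 0]
    for frame, mask in labeled:
        for row, mask_row in zip(frame, mask):
            for value, label in zip(row, mask_row):
                sums[label] += value
                counts[label] += 1
    # Assumes both classes appear at least once in the labeled frames.
    return sums[0] / counts[0], sums[1] / counts[1]


def predict_mask(model: Tuple[float, float], frame: Frame) -> Mask:
    """Label each pixel with the class whose mean intensity is closer."""
    background, obj = model
    return [[1 if abs(v - obj) < abs(v - background) else 0 for v in row]
            for row in frame]
```

Because the model only ever sees one video's lighting and one kind of object, it would generalize poorly elsewhere. That is the intended trade-off: it only has to be right for the narrow task it was overfit to.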

#4: Auto Object Segmentation to Improve the Quality of Object Segments

Auto-segmentation means drawing an outline around an object and then using an algorithm to automatically “snap” it to the contours of the object, making the outline tighter and more accurately aligned with the object and label being tracked from one frame to the next.

Annotators can do this using polygons. You might, for example, need to segment clothes a person is wearing in a surveillance video so that you can see when a suspect takes off an item of clothing to put something else on.
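The “snap” step can be sketched in a deliberately simplified form: given a loose box drawn around an object, shrink it to the tight extent of the pixels that look like the object. We use a rectangle instead of a polygon and a plain intensity threshold instead of a real contour-finding algorithm, so treat this as an illustration of the idea only; `snap_box` and the default `threshold` of 128 are our assumptions.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image as rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels


def snap_box(frame: Frame, rough: Box, threshold: int = 128) -> Optional[Box]:
    """Shrink a rough box to the tight extent of above-threshold pixels inside it."""
    x0, y0, x1, y1 = rough
    xs, ys = [], []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if frame[y][x] >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing bright enough inside the rough outline
    return min(xs), min(ys), max(xs), max(ys)


# A 5x5 frame with a bright 2x2 object; the annotator's rough box covers everything.
frame = [[0] * 5 for _ in range(5)]
for y in (1, 2):
    for x in (2, 3):
        frame[y][x] = 255
print(snap_box(frame, (0, 0, 4, 4)))  # (2, 1, 3, 2)
```

The annotator only needs to be roughly right, and the algorithm does the precise alignment. A production tool would follow the object's actual contour and output polygon vertices rather than a box.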

With the right video annotation tool, auto object segmentation is applicable for almost any use case across dozens of sectors. It works on arbitrary shapes, and interpolation can track object segments across thousands of frames. In most cases, the outcome is a massive time and cost saving throughout a video annotation project, resulting in much faster and higher quality segmentations.

Automated object segmentation in Encord

The power of automated video annotation

In our experience, there are very few cases where automatic video annotation can’t play a useful role during video annotation projects. Automation empowers annotators to work faster and more effectively, and to deliver higher-quality project outputs.

Experience Encord in action. Try out our automated video annotation features (including our proprietary micro-model approach).

Sign up for an Encord Free Trial: The Active Learning Platform for Computer Vision, used by the world’s leading computer vision teams.

AI-assisted labeling, model training & diagnostics, find & fix dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster. Try Encord for Free Today.
