Contents

What are Action Classifications in Video Annotation?
Action Classification vs. Static Classification: What’s the Difference?
Best Practices for Action Classifications in Video
What Are The Use Cases for Action Classification in Video Annotation?

Encord Blog

How to use Action Classifications In Video Annotation

November 11, 2022

8 mins

Written by

Ulrik Stig Hansen

In almost every video, some objects move. A car may move from frame to frame, but static annotations cannot describe that motion, which limits the amount of data machine learning teams can train a model on. This is why video annotation projects need action classifications.

With action, dynamic or events-based classification, video annotation teams can add a richer layer of data for computer vision machine learning models.

Annotators can label whether a car is accelerating or decelerating, turning, stopping, starting, or reversing, and apply numerous other labels to a dynamic object.

In this post, we explain action classifications (also known as dynamic or event-based classifications) in video annotation in more detail: why they are difficult to implement, how they work, best practices, and use cases.

What are Action Classifications in Video Annotation?

Action, dynamic or event-based classification (also known as activity recognition) in video annotation is a time-dependent approach to annotation.

Annotators apply action classifications to record what an object is doing and over what timescale those actions take place. With the right video annotation tool, you can apply these labels so that a machine learning model has more data to learn from. This improves the overall quality of the dataset and, therefore, the outputs the model generates.

For example, a car could be accelerating in frames 100 to 150, then decelerate in frames 300 to 350, and then turn left in frames 351 to 420. Dynamic classifiers contribute to the ground truth of a video annotation, and the video data a machine learning model learns from.
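To make the car example concrete, here is a minimal sketch of how frame-ranged action labels could be stored and queried. The `ActionSegment` class and `actions_at` helper are hypothetical names chosen for illustration; they are not taken from Encord or any other annotation tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionSegment:
    """One action label applied to an inclusive range of frames (hypothetical structure)."""
    label: str
    start_frame: int
    end_frame: int

    def covers(self, frame: int) -> bool:
        return self.start_frame <= frame <= self.end_frame


# The car example from the text, expressed as frame ranges.
car_actions = [
    ActionSegment("accelerating", 100, 150),
    ActionSegment("decelerating", 300, 350),
    ActionSegment("turning_left", 351, 420),
]


def actions_at(segments: List[ActionSegment], frame: int) -> List[str]:
    """Return every action label that is active at the given frame."""
    return [s.label for s in segments if s.covers(frame)]


print(actions_at(car_actions, 120))  # ['accelerating']
print(actions_at(car_actions, 200))  # []
```

Frames that fall outside every range, such as frame 200, simply carry no action label.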

Action or dynamic classifications are incredibly useful annotation and labeling tools, and an integral part of the annotation process. However, dynamic classifications and labels are difficult to implement successfully, and very few video annotation platforms come with this feature. Encord does, and that’s why we’re going into more detail on why dynamic or event classifications matter, how they work, best practices, and use cases.

Action Classification vs. Static Classification: What’s the Difference?

Before we do, let’s compare action with static classifications.

With static classifications, annotators use an annotation tool to define and label the global properties of an object (e.g. the car is blue, has four wheels, and has slight damage to the driver’s-side door). These labels form part of the ground truth of the video data an ML model is trained on. You can apply as much or as little detail as you need to train your computer vision model using static classifications and labels.

On the other hand, action, or dynamic, classifications describe what an object is doing and when those actions take place. Action classification labels are inherently time- and action-oriented. The object needs to be in motion, whether that’s a person, car, plane, train, or anything else that moves from frame to frame.

An object’s behavior — whether that’s a person running, jumping, walking; a vehicle in motion, or anything else — defines and informs the labels and annotations applied during video annotation work and the object detection process. When annotated training datasets are fed into a computer vision or machine learning model, those dynamic labels and classifications influence the model’s outputs.
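One way to picture the difference is a single object record that carries both kinds of labels. The sketch below is an assumption made for illustration: `AnnotatedObject` and its fields are hypothetical and do not come from any particular tool. Static attributes hold for every frame, while dynamic labels only apply inside their frame range.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AnnotatedObject:
    """Hypothetical record combining static and dynamic classifications."""
    object_id: str
    static: Dict[str, str]  # global properties, true in every frame
    dynamic: List[Tuple[str, int, int]] = field(default_factory=list)  # (label, start, end)

    def frame_labels(self, frame: int) -> Dict[str, object]:
        """Labels a model would see for this object at one frame."""
        active = [name for name, start, end in self.dynamic if start <= frame <= end]
        return {**self.static, "actions": active}


car = AnnotatedObject(
    object_id="car-1",
    static={"color": "blue", "wheels": "4"},
    dynamic=[("accelerating", 100, 150), ("turning_left", 351, 420)],
)

print(car.frame_labels(120))
print(car.frame_labels(200))
```

At frame 200 the static attributes are unchanged, but the list of actions is empty.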

Why are Action Classifications in Video Datasets Difficult to Implement?

Action classifications are a truly innovative engineering achievement.

Despite decades of work, academic research, and countless millions in funding for computer vision, machine learning, artificial intelligence (AI), and video annotation companies, most platforms don’t offer dynamic classification in an easy-to-implement format.

Static classifications and labels are easier to do, and every video annotation tool and platform comes with static labeling features. Dynamic classification features are less common, so there is an advantage in finding an annotation tool that does both static and dynamic classification, such as Encord.

Action classifications require special features: data structures that attach time-dependent descriptions to an object, so that a computer vision model interprets this data accurately and a car labeled as moving in one frame is still tracked as the same object hundreds of frames later in the same video.

How Does Action Classification for Video Data Work?

Annotating and labeling movement isn’t easy. When an object is static, annotators give objects descriptive labels. Object detection is fairly simple for annotation tools. Static labels can be as simple as “red car”, or as complicated as describing the particular features of cancerous cells.

On the other hand, dynamic labels and classifications can cover everything from simple movement descriptors to extremely detailed and granular descriptions. When we think about how people move, many parts of the body are in motion at any one time. This is the advantage of using keypoints and primitives (skeleton templates) when implementing human pose estimation (HPE) annotations, which are another form of dynamic classification.

Annotations of human movement might therefore need an even higher level of granular detail. In a video of tennis players, notice the number of joints and muscles in action as a player hits a serve. In this one example, the player’s feet, legs, arms, neck, and head are all in motion. Every limb moves, and depending on what you’re training a computer vision model to understand, annotations need to cover as much of that detail as possible.

Annotations of human movements.
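The keypoint idea can be sketched in a few lines. Everything here is an assumption made for illustration: the joint names, the `SKELETON_EDGES` template, the pixel coordinates, and the 2-pixel threshold are hypothetical and do not follow any specific pose format. The sketch checks that a pose matches the template and reports which joints moved between two frames of a serve.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical skeleton template: pairs of joints connected by a "bone".
SKELETON_EDGES = [
    ("head", "neck"),
    ("neck", "right_shoulder"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
]


def is_complete(pose: Dict[str, Point]) -> bool:
    """True if the pose contains every joint named in the template."""
    return all(joint in pose for edge in SKELETON_EDGES for joint in edge)


def moving_joints(prev: Dict[str, Point], curr: Dict[str, Point], min_shift: float = 2.0) -> List[str]:
    """Names of joints that moved at least min_shift pixels between two frames."""
    moved = []
    for name, (x0, y0) in prev.items():
        x1, y1 = curr[name]
        if hypot(x1 - x0, y1 - y0) >= min_shift:
            moved.append(name)
    return moved


frame_a = {"head": (50.0, 20.0), "neck": (50.0, 35.0), "right_shoulder": (60.0, 40.0),
           "right_elbow": (70.0, 55.0), "right_wrist": (75.0, 70.0)}
frame_b = {"head": (50.0, 20.5), "neck": (50.0, 35.0), "right_shoulder": (61.0, 39.0),
           "right_elbow": (74.0, 45.0), "right_wrist": (82.0, 30.0)}

print(is_complete(frame_a))             # True
print(moving_joints(frame_a, frame_b))  # ['right_elbow', 'right_wrist']
```

A real project would tune the threshold to the video resolution and would likely track many more joints.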

How to Train Computer Vision Models on Action Classification Annotations?

Answering this question comes down to understanding how much data a computer vision model needs, and whether any AI/ML-based model needs more data when the video annotations are dynamic.

Unfortunately, there’s no clear answer to that question. It depends on a number of factors, such as the model’s objectives and project outcomes, the interpolation applied, the volume and quality of the training datasets, and the granularity of the dynamic labels and annotations applied.

Any model is only as accurate as the data provided. The quality, detail, number of segmentations, and granularity of labels and annotations applied during the annotation stage influence how well and how fast computer vision models learn and, crucially, how accurate a model is before more data and further iterations are needed.

As with any computer vision model, the more data you feed it, the more accurate it generally becomes. Variety matters too: providing a model with different versions of similar data (e.g. a red car moving fast in shadows and a red car moving slowly in evening or morning light) improves the quality of the training data.

With the right video annotation tool, you can apply any object annotation type and label to an object that’s in motion — bounding boxes, polygons, polylines, keypoints, and primitives.
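For a moving object, annotators typically draw a shape on a few keyframes and let the tool fill in the frames between them. The sketch below shows one common way to do this, linear interpolation of bounding boxes. It is an illustrative assumption, not a description of how Encord or any other tool implements interpolation, and the `interpolate_boxes` helper and box format are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height


def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Fill every frame between consecutive keyframes by linear interpolation."""
    frames = sorted(keyframes)
    if not frames:
        return {}
    filled: Dict[int, Box] = {}
    for start, end in zip(frames, frames[1:]):
        box_a, box_b = keyframes[start], keyframes[end]
        for frame in range(start, end):
            t = (frame - start) / (end - start)  # 0.0 at start, approaching 1.0 at end
            filled[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    filled[frames[-1]] = keyframes[frames[-1]]  # last keyframe is kept as drawn
    return filled


# A car annotated at frames 100 and 110, moving 50 pixels to the right.
keyframes = {100: (10.0, 50.0, 40.0, 20.0), 110: (60.0, 50.0, 40.0, 20.0)}
boxes = interpolate_boxes(keyframes)
print(boxes[105])  # (35.0, 50.0, 40.0, 20.0)
```

Linear interpolation assumes roughly constant motion between keyframes, so fast or erratic movement needs keyframes placed closer together.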

Using Encord, you can annotate the localized version of any object — static and dynamic — regardless of the annotation type you deploy. Everything is conveniently accessible in one easy-to-use interface for annotators, and Encord tools can also be used through APIs and SDKs.

Now let’s take a look at the best practices and use cases for action classifications in video annotation projects.

Best Practices for Action Classifications in Video

Use clean (raw) data

Before starting any video-based annotation project, you need to ensure you’ve got a large enough quantity and quality of raw data (videos). Data cleansing is integral and essential to this process. Ensure low-quality or duplicate frames, such as ghost frames, are removed.
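As a rough illustration of the duplicate-removal step, the sketch below drops frames that are byte-for-byte identical to one already seen. It assumes frames are available as raw bytes, and `drop_duplicate_frames` is a hypothetical helper. Near-duplicates and ghost frames would need a perceptual comparison, which is not shown here.

```python
import hashlib
from typing import Iterable, List


def drop_duplicate_frames(frames: Iterable[bytes]) -> List[int]:
    """Return the indices of frames to keep, skipping exact byte-for-byte repeats."""
    seen = set()
    keep = []
    for index, frame in enumerate(frames):
        digest = hashlib.sha256(frame).hexdigest()
        if digest not in seen:
            seen.add(digest)
            keep.append(index)
    return keep


# Stand-in byte strings; real frames would be decoded image data.
frames = [b"frame-a", b"frame-b", b"frame-b", b"frame-c", b"frame-a"]
print(drop_duplicate_frames(frames))  # [0, 1, 3]
```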

Understand the dynamic properties video dataset annotations are trying to explain

Once the videos are ready, annotation and ML teams need to be clear on what dynamic classification annotations are trying to explain. What are the outcomes you want to train a computer vision model for? How much detail should you include?

Answering these questions will influence the granular level of detail annotators should apply to the training data, and subsequent requests ML teams make when more data is needed. Annotators might need to apply more segmentation to the videos or classify the pixels more accurately, especially when comparing against benchmark datasets.

Align labels and annotations with the problem the project is trying to solve

Next, you need to ensure the labels and annotations being used align with the problem the project is trying to solve. Remember, the quality of the data — from the localized version of any object to the static or dynamic classifications applied — has a massive impact on the quality of the computer vision model outcomes.

Projects often involve comparing model outcomes with benchmark video classification datasets. This way, machine learning team leaders can compare semantic metrics against benchmark models and machine learning algorithm outcomes.
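One metric that can be used when comparing action labels against a benchmark is temporal intersection over union (IoU), which measures how much a predicted action segment overlaps a reference segment in time. The text does not name a specific metric, so this choice is an assumption, and `temporal_iou` is a hypothetical helper that treats frame ranges as inclusive.

```python
from typing import Tuple

Segment = Tuple[int, int]  # inclusive (start_frame, end_frame)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Overlap of two frame ranges divided by the span they cover together."""
    a_start, a_end = a
    b_start, b_end = b
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    union = (a_end - a_start + 1) + (b_end - b_start + 1) - overlap
    return overlap / union


reference = (100, 150)  # e.g. "accelerating" in the benchmark labels
predicted = (110, 160)  # the same action as predicted by a model
print(round(temporal_iou(reference, predicted), 3))
```

A value of 1.0 means the two segments match exactly, and 0.0 means they do not overlap at all.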

Go granular with annotation details, especially with interpolation, object detection, and segmentation

Detail and context are crucial. Start with the simplest labels, and then go as granular as you need with the labels, annotations, specifications, segmentations, protocols, and metadata, right down to classifying individual pixels. This could involve as much detail as saying a car went from 25 km/h to 30 km/h in the space of 10 seconds.
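The speed example can be turned into a label with a little arithmetic. The sketch below converts the speed change to metres per second squared and maps it to an action label. The function names and the 0.05 m/s² tolerance are assumptions chosen for illustration.

```python
def mean_acceleration(v_start_kmh: float, v_end_kmh: float, seconds: float) -> float:
    """Average acceleration in m/s^2 (1 km/h = 1/3.6 m/s)."""
    return (v_end_kmh - v_start_kmh) / 3.6 / seconds


def label_speed_change(v_start_kmh: float, v_end_kmh: float, seconds: float,
                       tolerance: float = 0.05) -> str:
    """Map a speed change to an action label; small changes count as constant speed."""
    a = mean_acceleration(v_start_kmh, v_end_kmh, seconds)
    if a > tolerance:
        return "accelerating"
    if a < -tolerance:
        return "decelerating"
    return "constant_speed"


# The example from the text: 25 km/h to 30 km/h in 10 seconds.
print(round(mean_acceleration(25, 30, 10), 3))  # 0.139
print(label_speed_change(25, 30, 10))           # accelerating
```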

What Are The Use Cases for Action Classification in Video Annotation?

Action classification in video annotation is useful across dozens of sectors, with countless practical applications already in use. In our experience, some of the most common right now include computational models for autonomous driving, sports analytics, manufacturing, and smart cities.

Key Takeaways for Using Action Classification in Video Annotation

Any sector where movement is integral to video annotation and computer vision model projects can benefit from dynamic or events-based classifications.

Action classifications give annotators and ML teams a valuable tool for classifying moving objects and time-based events. Movement is one of the most difficult things to annotate and label, so annotators need a powerful video annotation tool with dynamic classification features when event- or time-based actions have to be labeled accurately.

At Encord, our active learning platform for computer vision is used across a wide range of sectors, including healthcare, manufacturing, utilities, and smart cities, to annotate thousands of videos and accelerate computer vision model development. Speak to sales to request a trial of Encord.
