Complete Guide to Open Source Data Annotation

Nikolaj Buhl
December 7, 2022

Open-source annotation tools and software are widely used in the computer vision and machine learning sectors across hundreds of projects. 

In some cases, it can be an advantage to use an open-source tool, especially if a project or company is in the startup phase and there’s a limited budget for annotation work. Academic projects often find open-source tools useful, alongside the hundreds of open-source datasets (such as COCO). 

However, for commercial projects and use cases, there are downsides too. Open-source doesn’t always come with the tools and features machine learning and data operations teams need to manage projects effectively, efficiently, or at scale. 

In this article, we look at what open-source annotation tools are used for, provide more details on 5 of the most popular open-source tools, and then weigh up the pros and cons of using open-source tools, before comparing this to the option of using something more advanced. 

What is an Open-source Data Annotation Tool?

An open-source data annotation tool is a piece of software that’s specifically designed for image labeling and data annotation for image and video datasets. Annotation is an essential part of training computer vision models, as labels and data annotations are required to train models to produce the results/outcomes that organizations need. 

Open-source tools are free to use. Anyone can download and use them, so there’s no license fee or monthly subscription to pay, unlike Software as a Service (SaaS) products. Open-source tools are usually maintained by a foundation, similar to a charity, through community donations, or with sponsorship from tech companies. 

What Would You Use an Open-source Labeling Tool For?

Finding the right open-source tool isn’t always easy. It depends on what you need it for: image labeling, video labeling, or both. It also depends on whether you need specific functionality for certain use cases, such as annotating medical imaging datasets.

As we’ve covered that topic in previous articles, in this post we focus on more widespread computer-vision-based image and video use cases, such as smart cities, manufacturing, security, and sports analytics. 

Open-source labeling tools are used for everything from image segmentation to drawing bounding boxes, polylines, object detection, and numerous other annotations and labels on images and videos. You can use open-source tools for human pose estimation (HPE), and dozens of other computer vision (CV) project use cases. 
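To make the idea of a label concrete, here is a minimal sketch of what a single bounding-box annotation can look like when stored in the COCO dataset layout, where a box is written as [x, y, width, height] in pixels. The helper name `make_bbox_annotation` and the example IDs are hypothetical and only for illustration; individual tools export their own variations of this structure.

```python
import json


def make_bbox_annotation(ann_id, image_id, category_id, x, y, width, height):
    """Build one COCO-style object-detection annotation (hypothetical helper).

    COCO stores a box as [x, y, width, height], where (x, y) is the
    top-left corner of the box in pixel coordinates.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {
        "id": ann_id,                  # unique ID of this annotation
        "image_id": image_id,          # which image the box belongs to
        "category_id": category_id,    # which class the box is labeled as
        "bbox": [x, y, width, height],
        "area": width * height,        # box area in square pixels
        "iscrowd": 0,                  # 0 = a single object instance
    }


# Illustrative values only; they do not come from a real dataset.
annotation = make_bbox_annotation(
    ann_id=1, image_id=42, category_id=3, x=10, y=20, width=100, height=50
)
print(json.dumps(annotation, indent=2))
```

Whatever tool you pick, it is worth checking early which export formats it supports, as that determines how easily labels flow into your training pipeline.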

Image segmentation in Encord annotate.

Image annotation in Encord

Now let’s take a look at 5 of the most popular open-source data annotation tools for computer vision projects. 

What Are The Main Open-source Data Annotation Tools?

CVAT

The Computer Vision Annotation Tool (CVAT) started as an internal Intel project in 2017. Now it’s an independent company and foundation, with over 1 million downloads of its open-source image and video annotation software, and a passionate community of supporters and contributors. 

With CVAT, you can annotate images and videos by creating classifications, segmentations, 3D cuboids, and skeleton templates. CVAT is used across the healthcare, retail, manufacturing, sports, automotive, and aerial observation (drones) sectors. 

CVAT is an open-source project supported by Intel, under the OpenCV umbrella, and is free to use commercially, thanks to the permissive MIT license. CVAT’s core team will work with the OpenCV team to support the project, and OpenCV will support those migrating from the original CVAT.org to its new home, at CVAT.ai. 

MONAI Label

MONAI Label is an open-source image annotation tool that uses AI to automate annotation work. Although it’s primarily used in the medical and healthcare sectors, MONAI Label can be used for any kind of image annotation project. It’s an ecosystem that’s easy to install and can run locally on a machine with single or multiple GPUs. The server and the client can run on the same machine or on different machines, depending on what you need. 

LabelMe

LabelMe is an open-source “online annotation tool to build image databases for computer vision research” that emerged from the MIT Computer Science and Artificial Intelligence Laboratory.

LabelMe comes with the downloadable source code, a toolbox, an open-source version for 3D images, image datasets you can use for computer vision training projects, and the ability to outsource data labeling through Amazon Mechanical Turk.

RIL-Contour

RIL-Contour is another open-source annotation tool that accelerates annotation projects using iterative deep learning (IDL). It was primarily designed for medical imaging datasets but can be used for any kind of image-based dataset for computer vision and machine learning projects. 

RIL-Contour was developed at the Radiology Informatics Laboratory (RIL) at Mayo Clinic.

Sefexa

Sefexa is an open-source image segmentation tool. Sefexa was created by Ales Fexa, a software engineer in Prague with a passion for computer vision and mathematics. With Sefexa, you can semi-automate image segmentation in image-based datasets, analyze images and export the findings into Excel, and create ground truth data from the images in a dataset. 

Now let’s look at the pros and cons of using open-source annotation tools. 

What Are The Pros and Cons of Using Open-source Annotation Tools?

Open-source tools have clear advantages, but they also come with several downsides. Let’s start with the downsides. 

Cons of Open-Source Annotation Tools

Buying vs. Building: Sunk cost fallacy turned upside down 

As most founders know, there’s an advantage to buying instead of building as it ensures your engineering team is devoted to developing your product. Otherwise, your developers could spend far too much time building non-core in-house tech solutions when there are hundreds of options on the market. 

In the video and image annotation space, open-source solutions represent a potential answer to the challenge of annotating and labeling thousands of images and video datasets. 

However, this is one area where the ‘sunk cost fallacy’ gets turned on its head. Some companies use these open-source tools ‘straight out of the box’, or as the basis for building an in-house version. Unfortunately, as we outline below, open-source tools come with many downsides compared to buying off-the-shelf, customizable premium annotation solutions that aren’t weighed down by those disadvantages. 

Difficult to scale annotation projects 

One of the foremost challenges is scaling annotation projects. 

Image and video annotation projects usually involve annotating thousands of images and videos. Every single one needs labels and suitable annotations, such as bounding boxes, polygons, polylines, object detection, HPE, and anything else required. Annotation tools automate this process as much as possible. 

Open-source tools often come with technical limitations. They can run more slowly, making projects take longer, and even when open-source tools come with automation features, the equivalent features from commercial vendors are often faster, more efficient, and more effective. 

However, automation is only possible once human annotators have given annotation software something to work with. With commercial and feature-packed annotation tools, scaling these projects is much easier and less time-consuming. Everyone can see the whole team’s work and more importantly, project leaders can monitor annotators and scale up and down accordingly. 

With open-source software installed locally, annotation teams often end up sharing image and video datasets via cloud-storage solutions such as Dropbox. This makes it more difficult to scale annotation projects, and adds headaches you don’t need when managing an annotation project. 

Weak or limited data security, no audit trails 

Data security and audit trails are integral to computer vision and machine learning projects. 

With open-source tools, audit trails are often missing, and data security is frequently weak or left for you to build yourself. Ensuring your project stays compliant with relevant data protection and regulatory requirements, such as GDPR in Europe, CE certification, or CCPA in California, is difficult without the ability to track and monitor a basic audit trail and timestamps on images and videos. 
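To illustrate what a basic audit trail with timestamps involves, here is a minimal sketch in Python. It is not how any particular tool implements auditing, and it is not a compliance solution on its own. The function names, the event fields, and the hash-chaining approach are all assumptions chosen for illustration: each event records who did what to which image and when, and links to the previous event so that later edits to the log can be detected.

```python
import hashlib
import json
from datetime import datetime, timezone


def append_audit_event(log, annotator, image_id, action):
    """Append a timestamped, hash-chained event to an in-memory audit log."""
    prev_hash = log[-1]["hash"] if log else ""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "annotator": annotator,
        "image_id": image_id,
        "action": action,
        "prev_hash": prev_hash,  # links this event to the one before it
    }
    # Hash the event body (without the hash field itself).
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    event["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(event)
    return event


def verify_audit_log(log):
    """Return True if no event has been altered, removed, or reordered."""
    prev_hash = ""
    for event in log:
        body = {key: value for key, value in event.items() if key != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if event["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != event["hash"]:
            return False
        prev_hash = event["hash"]
    return True


# Hypothetical annotators and image IDs, for illustration only.
audit_log = []
append_audit_event(audit_log, "annotator_a", "img_001", "created bounding box")
append_audit_event(audit_log, "annotator_b", "img_001", "approved label")
print(verify_audit_log(audit_log))
```

In a real project the log would live in durable, access-controlled storage rather than in memory, which is exactly the kind of infrastructure you have to build and maintain yourself when a tool doesn’t provide it.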

Project leaders can’t monitor annotation teams 

Open-source tools don’t give project leaders the ability to monitor the work of annotation teams as cost-effectively as premium software. When an open-source tool runs locally rather than in the cloud, project leaders can’t monitor the progress of annotators in real time. Often there are no dashboards, so you can’t see who’s done what, who is performing well, and who isn’t. 

Benchmarking performance takes a lot more time and effort. Collaboration is reliant on annotators sending completed batches of images and videos through cloud-based shared folders, such as Dropbox and Box. 
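When a tool has no built-in benchmarking, teams typically end up scripting it themselves. The sketch below shows one common way to do that for bounding boxes: score each annotator’s boxes against a trusted reference set using intersection over union (IoU). The choice of IoU, the function names, and the data layout are assumptions for illustration, with boxes given as [x, y, width, height].

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou_per_annotator(submissions, reference):
    """Average IoU against the reference boxes for each annotator.

    submissions: {annotator: {image_id: box}}
    reference:   {image_id: box}
    """
    scores = {}
    for annotator, boxes in submissions.items():
        ious = [
            iou(box, reference[image_id])
            for image_id, box in boxes.items()
            if image_id in reference
        ]
        scores[annotator] = sum(ious) / len(ious) if ious else None
    return scores


# Hypothetical annotators and boxes, for illustration only.
reference = {"img_001": [0, 0, 10, 10]}
submissions = {
    "annotator_a": {"img_001": [0, 0, 10, 10]},
    "annotator_b": {"img_001": [5, 0, 10, 10]},
}
print(mean_iou_per_annotator(submissions, reference))
```

A script like this works, but someone has to collect the batches, run it, and share the results, which is the extra time and effort described above.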

Annotation projects often take more time, especially if re-annotation and re-labeling are required, or accuracy is low. When projects are on a tight deadline and accurate training data is needed quickly, using an open-source tool could cost your team time you can’t afford to waste. 

Pros of Open-Source Annotation Tools 

Free to download and use! 

One of the main reasons to use open-source annotation tools is the price: they’re free! 

Annotation work is time-consuming. Getting your hands on any kind of tool that accelerates this work is a bonus, even more so if you don’t have to pay for it. 

For startups and academic projects, an open-source tool could be the right solution, especially when the budget also has to cover a team of annotators and machine learning, computer vision, or data ops engineers. When annotation budgets are tight, every penny helps. 

Adaptable and editable software 

Another advantage of open-source tools is they’re adaptable and editable. Open-source tools usually publish their source code and documentation, so if the tool doesn’t align with exactly what you need there are ways to adapt and modify it accordingly. Plus, you can use plugins, APIs, and other technical adaptations and workarounds to modify open-source software to your exact requirements. 

Community support 

Unlike proprietary and premium annotation software, where the support comes from the company, open-source projects are often surrounded by large and active communities. These are people who are either software users or have contributed to the development of the software. You can often count on these communities to answer questions you might have, as others are likely to have encountered similar challenges during annotation and labeling projects. 

However, many would argue that dedicated support from a commercial vendor usually beats the answers a community can provide, especially when you’re on a deadline and need a solution to a problem fast. 

When Should You Look at Using Commercial Annotation Tools?

When we factor in the challenges of using open-source tools effectively, efficiently, collaboratively, and at scale, with the workflow oversight required, there’s a good reason many project leaders prefer commercial software solutions. 

With solutions such as Encord, you benefit from an easy-to-use, collaborative interface. You need to be able to manage annotators in different countries and work with other teams as required. You can’t do this as easily when annotators have their own local version of the software and are sharing files using services such as Dropbox. 

Automation features are equally important, as they can save annotation teams a massive amount of time. For example, interpolation, which can match pixel data from one image to the next and lets annotators draw interpolation labels in any direction, is a huge time saver. Let’s face it, anything that can save annotation teams time is worth doing! 
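To show why interpolation saves so much time, here is a minimal sketch of its simplest form: an annotator labels a box on two keyframes and the frames in between are filled in automatically. Note that this is plain linear interpolation between keyframes, a simpler technique than the pixel-matching approach described above, and it is not the implementation used by any particular tool. The function name and the [x, y, width, height] box layout are assumptions for illustration.

```python
def interpolate_boxes(start_frame, start_box, end_frame, end_box):
    """Linearly interpolate [x, y, width, height] boxes between two keyframes.

    Returns a dict mapping every frame number from start_frame to end_frame
    (both keyframes included) to its box.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must come after start_frame")
    span = end_frame - start_frame
    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[frame] = [a + t * (b - a) for a, b in zip(start_box, end_box)]
    return boxes


# Two hand-drawn keyframes (illustrative values); frames 1 to 3 are generated.
boxes = interpolate_boxes(0, [0, 0, 10, 10], 4, [40, 20, 10, 10])
for frame, box in boxes.items():
    print(frame, box)
```

Here two manual labels produce five labeled frames. On longer clips the ratio is far higher, with annotators only adding a new keyframe where the object changes speed or direction.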

A project dashboard with built-in quality control processes and features is equally useful. It’s essential for the smooth running of any annotation project. For project managers, this can make the difference between the success or failure of an annotation project. 

Audit trails and data compliance are equally valuable, especially in sectors with stringent levels of regulatory compliance to align with, such as healthcare and anything to do with defense. 

Wrapping up 

There are numerous advantages to using open-source tools, especially for startups and academic projects. In a commercial scenario, an open-source tool could be a good starting point for developing your own in-house proprietary annotation solution or deciding what you need when buying an off-the-shelf solution. That said, if you want to save time, buying is usually a quicker route than building!

Despite certain downsides, open-source annotation tools will continue to be popular and evolve to adapt to the changing needs of the market, businesses, and organizations that require annotation software for video and imaging datasets. 

Written by Nikolaj Buhl
Nikolaj is a Product Manager at Encord and a computer vision enthusiast. At Encord he oversees the development of Encord Active. Nikolaj holds an M.Sc. in Management from London Business School and Copenhagen Business School. In a previous life, he lived in China working at the Danish Embas...