Build vs. Buy Data Annotation Tooling: Lessons from Practitioners
Until recently, any organization that wanted to scale data annotation for machine learning (ML), computer vision (CV), and other artificial intelligence (AI) projects had to build its own data annotation and labeling tools, or, failing that, stitch together in-house tools and open-source annotation software.
Now technical leaders have a wide range of off-the-shelf data labeling, annotation, and active learning platforms to choose from. Whether you’re a CTO at an early-stage or growth-stage startup, or a Head of AI, Head of Computer Vision, or Data Operations leader at a larger organization, there’s a lot of choice in this market.
And yet technical and ML leaders still ask the same question: “Should we build or buy an annotation tool?”
This article aims to answer this question with insights from data annotation team leaders and practitioners.
Why do data annotation teams need a labeling tool?
Even now, with every technical advantage we have, annotating and labeling images or video-based datasets is a massively time-consuming part of any computer vision project.
The quality and accuracy of data annotation labels are crucial. Poor-quality labeled data can cause huge problems for machine-learning teams.
One of the best and fastest ways to improve the quality and accuracy of your labeled data is to use AI-assisted labeling tools, which save time and money by automating part of the labeling workload.
Now comes the question, “Can we build our own or get an out-of-the-box solution?”
Let’s see what data annotation leaders and practitioners have to say…
Does your software engineering team have the time/resources to build a data annotation solution?
Building an in-house solution is time-consuming and expensive. It can take anything from 9 to 18 months, cost six to seven figures in in-house resources, and occupy several engineers’ working schedules.
As one sports analytics Encord customer found (before they came to us), “An in-house tool and interface for data annotation had limitations: it took months to build and refine, and the result was a single-purpose tool.”
“When they needed new functionality, it took the in-house engineers months to redesign and reconfigure the tool.” On the other hand, “Encord can build out a new ontology in a matter of minutes. Spending months building an in-house tool for each specific annotation task wasn’t a feasible, sustainable, or scalable strategy.”
That client confirms that in-house resources were better spent elsewhere: “Before using Encord, the ML team had to take the safe route because of the high cost of pursuing new ideas that failed. With a multi-purpose, cost-effective annotation tool, they can now iterate on ideas and be more adventurous in developing new products and features.”
How long would it take to build a data annotation tool for your project(s)?
Building an in-house annotation tool can take months. It all depends on:
- The volume of image or video data you need to annotate;
- The functionality the platform needs;
- The number of annotators who are going to use the platform;
- The time you’ve got, as an ML or data ops leader, to get this solution to market, so you can start annotating images and videos (before beginning to train a model);
- How scalable the tool needs to be: what other projects will it be needed for in the future?
With that in mind, an engineering team can start estimating project build time. Or, if you’ve got the budget, the outsourcing costs of having a third-party software development company complete the project.
Either way, we are talking about months of work, a large capital budget, and a project leader to oversee it all. Once the tool is complete, you’ll need in-house developers familiar with the annotation software to fix bugs, maintain it, and implement any upgrades and new features it needs.
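To make the trade-off concrete, the factors above can be folded into a rough back-of-envelope comparison. All figures below are hypothetical placeholders; your engineering rates, timelines, and license costs will differ:

```python
# Back-of-envelope build-vs-buy comparison. Every number here is a
# hypothetical placeholder -- substitute your own rates and timelines.

def build_cost(engineers: int, months: int, monthly_rate: float,
               maintenance_fraction: float = 0.2, years: int = 2) -> float:
    """Estimate the total cost of building in-house: the initial build
    plus ongoing maintenance (a fraction of build effort per year)."""
    initial = engineers * months * monthly_rate
    maintenance = initial * maintenance_fraction * years
    return initial + maintenance

def buy_cost(annual_license: float, years: int = 2) -> float:
    """Estimate the total cost of an off-the-shelf platform over the same period."""
    return annual_license * years

# Example: 3 engineers for 12 months at $15k/month, vs. a $60k/year license.
build = build_cost(engineers=3, months=12, monthly_rate=15_000)
buy = buy_cost(annual_license=60_000)
print(f"build: ${build:,.0f}, buy: ${buy:,.0f}")
```

Even this toy model makes the shape of the decision clear: the build option front-loads a large capital cost and then keeps charging maintenance, while the buy option is a predictable recurring expense.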
Would it make more sense to outsource development to a third party?
In some cases, outsourcing development to lower-cost regions, such as Central & Eastern Europe (CEE), can cost less than building in-house, especially when you compare engineer and data scientist rates in those regions with those of US or Western European professionals with the same skills.
However, the challenges are similar to building in-house. The project still needs managing, and once the tool is ready, an in-house team must debug and maintain it and implement new features and functionality.
Advantages of Buying a Data Annotation Tool
Instead of going the in-house or outsourced build route, many organizations are making the financially sensible, time-saving decision to buy an out-of-the-box solution, such as Encord.
Dr. Hamza Guzel, a Turkish Ministry of Health radiologist, explains the advantages of using Encord for medical image data annotation.
Dr. Guzel also works with Floy, a medical AI company developing technology that assists radiologists in detecting lesions, helping them prepare the medical imaging data used to train their machine learning models.
Floy had numerous problems with other commercial off-the-shelf solutions and didn’t consider building one because of the time and cost involved. So, the solution was to switch to Encord for CT & MRI annotation and labeling.
“The organizational issue was not a problem in Encord, and with Encord’s freehand annotation tool, we can label data however we want. We can decrease the distance between the dots on boundaries to work at the millimeter scale that we need to label lesions and other objects precisely. Labeling is also a smooth experience; it’s very easy to draw on the image and move from one image slice to another.”
“It’s also fast. I didn’t realize how slow the other platforms were, or how fast labeling could be until we switched to Encord.”
“Using Encord, we reduced labeling time by 50 percent for CT series and 25 percent for MRI series.”
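The “distance between the dots” Dr. Guzel describes is the vertex spacing along an annotation boundary. As a rough illustration of the underlying idea (a hypothetical sketch, not Encord’s implementation), a boundary polyline can be resampled at a fixed spacing, where a smaller spacing gives denser, more precise boundary points:

```python
import math

def resample_polyline(points, spacing):
    """Resample a polyline at (approximately) fixed spacing between vertices.
    Smaller spacing -> denser boundary points -> finer-grained labels."""
    if len(points) < 2:
        return list(points)
    # Cumulative arc length at each original vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    resampled = []
    d = 0.0
    seg = 0
    while d < total:
        # Advance to the segment containing arc length d.
        while cum[seg + 1] < d:
            seg += 1
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        seg_len = cum[seg + 1] - cum[seg]
        f = 0.0 if seg_len == 0 else (d - cum[seg]) / seg_len
        resampled.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
        d += spacing
    resampled.append(points[-1])  # always keep the final endpoint
    return resampled

# Halving the spacing roughly doubles the boundary density:
coarse = resample_polyline([(0, 0), (10, 0)], spacing=5)  # 3 points
fine = resample_polyline([(0, 0), (10, 0)], spacing=1)    # 11 points
```

In a medical imaging context, the spacing would be chosen in physical units (e.g., millimeters, using the image’s pixel spacing) so that lesion boundaries can be traced at the precision the task demands.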
In Conclusion: Should I Build or Buy a Data Annotation Tool?
Depending on your data annotation needs, consider the features that the best out-of-the-box solutions, such as Encord, provide.
If those features sound like what you need (and we’ve introduced more since this was first written, such as Encord Active and the Annotator Training Module), you have to ask yourself: do we have the in-house time and resources to build something similar ourselves?
Or would it be easier to avoid capital outlays and project management headaches and simply buy an off-the-shelf data annotation solution?
In every way, buying a data annotation tool is:
- Far less expensive than building
- Less time-consuming (you can be set up in minutes instead of months)
- Significantly faster for getting machine learning and computer vision models production-ready
- More flexible (better functionality, including APIs and SDKs)
As one G2 review says: “Encord has helped us streamline our data pipelines and get our training data into one place. We've managed to build fairly seamless integrations using the flexible API.”
“We've also been using some of the customizable dashboards & reports in our reporting, which has been a plus. The user interface is easy to navigate, and the object detection annotation tools (bounding box, etc.) are very expansive in functionality as we can define rich ontologies supported in the platform.” Benjamin, Data Scientist at a Mid-market company using Encord.
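The “rich ontologies” Benjamin mentions are, at heart, nested label schemas: the classes, annotation shapes, and attributes a tool must support. As a toy illustration (a hypothetical schema, not Encord’s actual ontology format), an ontology can be expressed as plain data:

```python
import json

# Toy illustration of a label ontology as plain data: object classes,
# their annotation shapes, and nested attributes. This is a hypothetical
# schema for illustration only, not Encord's actual ontology format.
ontology = {
    "name": "demo-ontology",
    "objects": [
        {
            "label": "car",
            "shape": "bounding_box",
            "attributes": [
                {"name": "damage", "type": "radio",
                 "options": ["none", "minor", "severe"]},
            ],
        },
        {"label": "lesion", "shape": "polygon", "attributes": []},
    ],
}

def labels_in(ontology: dict) -> list:
    """Collect every object label defined in the ontology."""
    return [obj["label"] for obj in ontology["objects"]]

print(labels_in(ontology))  # ['car', 'lesion']
print(json.dumps(ontology, indent=2))
```

The point of the illustration: when a tool lets you define a structure like this in minutes rather than hard-coding it into a single-purpose UI, each new annotation task becomes a configuration change instead of an engineering project.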
Another review says: “Encord's DICOM annotation solution is solving the problem of inefficient and time-consuming image annotation and workflow management for building training datasets for medical AI. By streamlining these processes, it is saving our team a lot of time and increasing our overall productivity.”
“Additionally, the quality control features ensure that all images are of the highest quality, providing peace of mind for both radiologists and our ML research team which has helped with going through FDA clearance. Overall, this product is greatly benefiting our team by making our annotation work more efficient and organized.” Thomas, Clinical Machine Learning Engineer.
At Encord, our active learning platform for computer vision is used across a wide range of sectors, including healthcare, manufacturing, utilities, and smart cities, to annotate data such as human pose estimation videos and accelerate computer vision model development.
Encord is a comprehensive AI-assisted platform for collaboratively annotating data, orchestrating active learning pipelines, fixing dataset errors, and diagnosing model errors & biases. Try it for free today.