Dr. Andreas Heindl August 30, 2022

Introduction to Medical Image Labeling for Machine Learning

One of the most radical improvements in modern-day medicine has been the application of machine learning to evaluate medical images for diagnostic purposes. This exciting new landscape has a lot of potential, but it relies on labeling and annotating medical images to create the high-quality training data these medical machine learning models need.

In this guide, we’ll explore the important considerations for robust and effective medical image annotation – and what it means for the future of the medical industry.

What is Medical Image Annotation?


Encord DICOM annotation tool

A computer vision model learns from a large training dataset, and the quality of that dataset determines how close the model gets to the promise of high-precision medical diagnostics. To build that dataset, you need annotated examples of the medical images so that the algorithm knows what to look for. These annotations are provided by experts in the relevant field and form the basis for everything that follows from applying artificial intelligence.

A typical example is a radiologist who uses an annotation platform to record their reading of a scan, which in turn becomes training data for the neural network. Companies can build their own labeling platform or take advantage of third-party medical imaging labeling tools (take a look at this blog for the pros and cons of each approach). Whatever you decide to do, the better your approach to labeling your DICOM or NIfTI images, the better your model will perform.

Common Machine Learning Use Cases for Medical Imagery

To give you a taste of where this becomes useful, here are some of the common use cases where medical image labeling can help:

  • Pathology – Many diagnoses depend on scans and images produced by highly specialized medical equipment. By labeling these images accurately, we can train machine learning models to flag signs of disease themselves, reducing the manual workload on specialists.
  • Cancer Detection – Cancers are notoriously difficult to diagnose, so medical image annotation can help us train models that aim to spot them earlier and more consistently, which can make a huge difference to patient outcomes. Even automated screening for the most common cancers could meaningfully improve early detection.
  • Ultrasound – By annotating ultrasound images, we can train models to pick up finer detail for findings such as gallbladder stones and fetal abnormalities. The quicker we understand what we’re dealing with, the better the care will be.
  • Microscopy – In medical research, we rely heavily on what we can examine under a microscope to understand what’s happening at the smallest scale. By labeling these images and using them as a training dataset, we can push medical research forward and scale our impact as a result.

The beauty of machine learning is that there are many more use cases to discover as we start to work with this data and let the algorithms do their thing. This really is at the forefront of the future of medicine and the quality of the annotations is going to be a major factor in how things evolve.


Windowing preset feature on the Encord medical image annotation tool

Important Considerations When Preparing Medical Imaging Data for Machine Learning

Medical image annotation demands a high level of precision, because the images are complex and the resulting models are used in high-stakes settings. To pass FDA guidelines and actually make it to production, the data must be of the highest quality possible, which both alleviates regulatory concerns and creates a stronger and more effective machine learning model.

In order to do this at scale, companies need to make it as easy and intuitive as possible for annotators to capture the required information. The time of these experts is very expensive and so the more efficient and frictionless the annotation process, the better quality data you’ll get and the more you can control costs.

There are four key considerations that should be prioritized when you’re tasked with annotating medical imaging data:

  • Volume of Data. As a general rule in machine learning, the more data you have to train with, the better the model tends to perform. This assumes a certain level of data quality, of course, but wherever you can, you should try to increase the size of the training set.
  • Data Distribution. In the medical field, there is tremendous diversity in terms of human bodies and that needs to be reflected in your data. You should be proactive in ensuring that you have sufficient distribution across demographic factors like age, gender, geography, hospitals, previously diagnosed conditions, and so on. This diversity is crucial if you want your model to be effective in the real world.
  • Data Formats. Medical images can come in a variety of different formats including DICOM, NIfTI, ultra-high resolution video, and others. Your annotation process needs to handle these formats natively so that you don’t lose any detail or information along the way. This ensures that you’re getting the most out of your workflow and that you know it can fit into the existing medical system that you want to innovate in.
  • Data Visualization. Wherever possible you should strive to view the data in 3D so that you can get the full picture of whatever medical image is being annotated. You want to provide the annotator with everything that they need to provide an accurate evaluation and that means that you need to be thoughtful and intentional about how you present the images to them.

This list is obviously not exhaustive, but it should give you a sense of the sorts of things to think about when building a medical image annotation workflow.
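
The data distribution point can be made concrete with a small audit run before annotation starts. The following is a minimal Python sketch, not part of any particular labeling tool: it assumes each scan comes with a simple metadata record, and the field names (`sex`, `age_band`, `hospital`), the sample records, and the 10% threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical per-scan metadata; the field names and values are assumptions.
scans = (
    [{"sex": "F", "age_band": "40-59", "hospital": "A"}] * 6
    + [{"sex": "M", "age_band": "60+", "hospital": "A"}] * 5
    + [{"sex": "M", "age_band": "18-39", "hospital": "B"}] * 1
)


def underrepresented(records, field, expected=(), min_share=0.10):
    """Return {value: share} for values of `field` below `min_share`.

    `expected` lists values that should appear in the dataset, so groups
    that are missing entirely are reported with a share of 0.0.
    """
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    values = set(counts) | set(expected)
    shares = {value: counts.get(value, 0) / total for value in values}
    return {value: share for value, share in shares.items() if share < min_share}


if __name__ == "__main__":
    # Report thin or missing groups for each demographic field.
    for field in ("sex", "age_band", "hospital"):
        print(field, underrepresented(scans, field))
    # Hospital "C" is expected but absent, so it is reported as well.
    print(underrepresented(scans, "hospital", expected=("A", "B", "C")))
```

A check like this only tells you where the dataset is thin. Deciding what counts as sufficient coverage for a given clinical use case is still a judgment for the domain experts.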


Brush selection tool for annotating DICOM and NIfTI images in Encord

The Importance of Data Security for Medical Image Labeling

Alongside all the considerations about the quality of the data being captured and the efficiency of the process, we also need to think carefully about the security of the images that you’re annotating. The labeling tool that you’re using needs to conform to today’s security best practices to reassure your stakeholders and customers that you’re taking data security seriously.

There are two key compliance frameworks that you should be aware of here if you’re looking for an external medical image annotation tool:

  • SOC 2 is an auditing standard developed by the AICPA that defines criteria for managing customer data. Compliance is assessed through an external audit that evaluates data security practices across the board.
  • HIPAA (the Health Insurance Portability and Accountability Act) is a US federal law that deals with the protection of sensitive patient health information. Compliance is non-negotiable for whoever is providing your data labeling tool.

As well as the data security credentials of your annotation tool provider, you also need to carefully control the permissions available to your annotators. You want to have very granular access controls so that they only see the absolute minimum that they need in order to do their job. All of this should be tied up in a product where you retain the rights to your data and models – which simultaneously protects your IP and makes it easier to ensure high-quality data protection from source all the way to final outputs.
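
As a rough illustration of what granular access controls can look like, here is a minimal Python sketch of a least-privilege permission check. It does not reflect any specific product's API: the permission names, the role names, and the `is_allowed` helper are all hypothetical.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical fine-grained permissions for a labeling project."""
    VIEW_IMAGE = auto()
    EDIT_LABELS = auto()
    REVIEW_LABELS = auto()
    VIEW_PATIENT_METADATA = auto()
    EXPORT_DATA = auto()


# Hypothetical roles: each one gets only what the job requires.
ROLES = {
    "annotator": Permission.VIEW_IMAGE | Permission.EDIT_LABELS,
    "reviewer": Permission.VIEW_IMAGE | Permission.REVIEW_LABELS,
    "admin": (
        Permission.VIEW_IMAGE
        | Permission.EDIT_LABELS
        | Permission.REVIEW_LABELS
        | Permission.VIEW_PATIENT_METADATA
        | Permission.EXPORT_DATA
    ),
}


def is_allowed(role, needed):
    """Return True only if `role` holds every permission in `needed`."""
    # Unknown roles get no permissions at all (deny by default).
    granted = ROLES.get(role, Permission(0))
    return (granted & needed) == needed


if __name__ == "__main__":
    print(is_allowed("annotator", Permission.EDIT_LABELS))
    print(is_allowed("annotator", Permission.VIEW_PATIENT_METADATA))
```

The design choice worth noting is deny by default: a role that is not listed, or a permission that was never granted, results in a refusal rather than an error or an implicit allow.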

If you're looking for a medical imaging labeling tool, get in touch for a demo of Encord.