Guide to Experiments for Medical Imaging in Machine Learning

Dr. Andreas Heindl
November 25, 2022
5 min read

In the scientific community, and especially in data science, an “experiment” means testing a hypothesis until the empirical data confirms or refutes it. Machine learning experiments in medical imaging need to be rigorous.

In medical imaging, this involves testing machine learning models against dozens of datasets, iterating toward higher levels of accuracy until the artificial intelligence model can be put into production.

Running medical imaging dataset experiments is an essential part of building a stable, robust, and reliable computer vision model (such as a tool for use in oncology). The outcomes of these experiments are even more important when building models for healthcare: you have to be even more confident in the accuracy of the results, as they could influence a life-or-death decision for patients.

However, running multiple experiments can quickly become a massive challenge. Managing the models, datasets, annotators, and experiment results is a full-time job. An inefficient workflow for managing these experiments can make these problems much worse. 

In this article, we will look at how to increase the efficiency and effectiveness of your medical imaging dataset experiments to create state-of-the-art models. 

Why Do You Need to Run Experiments For Deep Learning Models?

Running experiments for machine learning and computer vision models is crucial to the process of creating a viable and accurate production model. At the experimental stage, you need to figure out which approaches will work and which won’t.

Once you’ve got a working model and a source of ground truth (dataset), then you can scale and replicate this approach during the production stage to achieve the project outcomes and objectives. 

Reaching this goal means going through dozens of experiments, which is a time-consuming, full-time job. You need a team of annotators, a large volume of high-quality data (medical imaging datasets of tumors or lesions, for example), and the right tools to make this work easier. At every stage, the results should gradually improve, until you’ve got a viable and accurate model and process.

Before starting any experiment cycle, it’s important to know the key parameters you want to track: hyperparameters, model architectures, accuracy scores, loss values, weights, biases, gradients, dependencies, and other model metrics. Once the experiment outcomes, goals, and metrics are clear, you can start running machine learning imaging dataset experiments.
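
As a concrete illustration, here is a minimal sketch of how you might record those parameters and metrics so runs can be compared later. It uses only the Python standard library rather than any particular tracking tool, and every value in run_config is an illustrative placeholder, not a recommendation.

```python
import json
import time
from pathlib import Path

# Hypothetical run configuration: every value is an illustrative placeholder.
run_config = {
    "run_id": "unet-baseline-001",
    "model_architecture": "U-Net",
    "learning_rate": 1e-4,
    "batch_size": 8,
    "loss": "dice",
    "dataset_version": "ct-lung-v2",
}

def log_metrics(run_id: str, epoch: int, metrics: dict, log_dir: str = "runs") -> None:
    """Append one JSON line per epoch so runs can be compared afterwards."""
    path = Path(log_dir) / f"{run_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"run_id": run_id, "epoch": epoch, "time": time.time(), **metrics}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage after each training epoch:
log_metrics(run_config["run_id"], epoch=1, metrics={"dice": 0.81, "loss": 0.42})
```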


Why is it More Important to Run Experiments For Medical Imaging Datasets?

In the healthcare sector, medical imaging machine learning and computer vision models play an integral role in patient diagnosis, our understanding of diseases, and numerous other areas of medicine.

Medical images come from numerous sources (including Magnetic Resonance Imaging (MRI), X-rays, and Computed Tomography (CT)) for a range of conditions, such as Alzheimer’s disease, lung cancer, or breast cancer. Unlike datasets in other sectors, medical images come in more complex formats, including DICOM and NIfTI. These widely used medical image file formats carry several layers of data, such as patient information, connections to other databases, and appointment details.

Even when patient training data is anonymized, these layers and formats make medical imaging datasets more detailed and involved than those you will find in other sectors.
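
To make those layers tangible, here is a minimal sketch that inspects a DICOM slice and a NIfTI volume using the open-source pydicom and nibabel libraries. The file paths are placeholders for your own (anonymized) data.

```python
import pydicom          # pip install pydicom
import nibabel as nib   # pip install nibabel

# Placeholder paths; substitute your own anonymized files.
ds = pydicom.dcmread("scan_slice.dcm")
print(ds.Modality)           # e.g. "CT" or "MR"
print(ds.PatientID)          # should already be de-identified
print(ds.pixel_array.shape)  # the image itself, as a NumPy array

nifti = nib.load("volume.nii.gz")
print(nifti.shape)               # 3D volume dimensions
print(nifti.header.get_zooms())  # voxel spacing in mm
volume = nifti.get_fdata()       # the image data as a NumPy array
```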

An example of DICOM annotation in Encord

Alongside these complications, project leaders have to factor in regulatory approval for working models, clear audit trails, and enhanced data security. Remember, the ultimate outcome of any medical machine learning model could directly impact patient treatment outcomes. Accuracy, and keeping bias as low as possible, are essential.

A slight inaccuracy when analyzing music preference data isn’t going to hurt anyone, whereas with medical imaging datasets, inaccurate results can have serious, life-changing consequences for patients worldwide. Hence the need to test as much data as possible: not only to ensure a robust model for primary use cases, but also to assess datasets and models against a wider range of edge and corner cases.

What Does The Ideal Experiment Workflow Look Like?

Machine learning medical imaging dataset experiments have tried-and-tested workflows that improve efficiency. Before starting ML-based experiments, you need to ensure you’ve got the right components in place.

Components of a machine learning experiment workflow include the following (sketched in code after the list):

  • Dataset(s): in this case, medical imaging datasets from the right medical fields, specialisms (such as radiology), image sources, and file formats; 
  • A hypothesis with a range of hyperparameters and variables to test; 
  • Project outcomes and goals, including the relevant benchmarking and accuracy targets; 
  • Experiment iteration cycle frameworks, e.g. the number of experiments you’ve got the resources and time to run; 
  • Other relevant experiment components, such as the metadata needed and model architecture. 
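
Here is a minimal sketch of what such an experiment specification could look like in code, assuming a simple dataclass; the class, its fields, and all the values shown are hypothetical placeholders that mirror the checklist above.

```python
from dataclasses import dataclass, field

# Hypothetical experiment specification; fields mirror the checklist above,
# and every value is an illustrative placeholder.
@dataclass
class ExperimentSpec:
    dataset: str                        # dataset name/version and source
    hypothesis: str                     # what this experiment is meant to test
    hyperparameters: dict = field(default_factory=dict)
    accuracy_target: float = 0.90       # benchmark the run is judged against
    max_iterations: int = 20            # iteration budget (time and resources)
    model_architecture: str = "U-Net"   # other relevant components, e.g. metadata

spec = ExperimentSpec(
    dataset="mammography-screening-v3",
    hypothesis="Higher input resolution improves lesion recall",
    hyperparameters={"learning_rate": 1e-4, "batch_size": 8},
)
print(spec)
```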

Once these components are ready, you can start running machine-learning experiments on medical imaging datasets. An ideal workflow should involve the following: 

  • Outline the experiment hypothesis, parameters, and variables; 
  • Source the data (either open-source datasets or in-house data); 
  • Ensure the right annotations and labels are applied to a series of segments within these datasets, not the entire dataset, because at this stage you simply need enough images to run small-scale experiments. You can use automated image annotation tools and software, such as Encord, to accelerate this phase of the project. 
  • Once the annotated datasets are ready and the machine learning or computer vision algorithms are in place, the experiments can begin. 
  • Each experiment could take one or two weeks. Running a whole series of experiments and iterating on the results, reducing bias, and increasing accuracy could take anything from one to six months before the experiment outcomes and datasets are ready to go into production. 
  • Experiment results determine when it’s possible to put a machine-learning model into production. 
  • Ongoing monitoring of these experiments, their outcomes, and audit trails is equally crucial, especially in the healthcare sector. Project leaders need a 360-degree overview (within a few clicks of a mouse) of the entire experiment lifecycle and every iteration, right down to the granular level, including detailed oversight of the annotation teams’ work. 
  • Once the ideal outcome has been achieved, you need to ensure the configuration of the machine learning model that produced that outcome is the one used for the production model. Make sure the annotations and labels used in the most successful iteration of the experiment are carried over and replicated across the entire medical imaging dataset (a minimal sketch of this select-the-best-run step follows this list). 
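
Here is a minimal sketch of that select-the-best-run pattern, using scikit-learn on synthetic data as a stand-in for a real medical imaging model; the hyperparameter grid is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for an annotated medical imaging dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Each dict is one experiment: a hypothesis about which settings work best.
experiments = [{"C": 0.01}, {"C": 0.1}, {"C": 1.0}, {"C": 10.0}]

results = []
for params in experiments:
    model = LogisticRegression(max_iter=1000, **params).fit(X_train, y_train)
    score = accuracy_score(y_val, model.predict(X_val))
    results.append({"params": params, "val_accuracy": score})

# Carry the best-performing configuration forward to the production model.
best = max(results, key=lambda r: r["val_accuracy"])
print("Best configuration:", best)
```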

How Does Collecting More Data Improve Experiment Outcomes?

With machine learning experiments, or any computer vision or AI-based experiments, the more data you have, the better. That is especially true for medical imaging ML model experiments in most use cases.

However, it’s important to remember that quality and diversity are as important as the volume of data. Medical imaging data should include the most relevant clinical practice data possible for the experiments: enough images with positive and negative cases, representation of different ethnic groups, and the relevant edge cases included or excluded (e.g., patients who have or haven’t received treatment).
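
A simple, library-agnostic way to check this before running experiments is to audit the dataset’s composition. In the sketch below, the record fields (label, treated) are hypothetical metadata, not fields from any particular platform.

```python
from collections import Counter

# Hypothetical per-image metadata records; field names are illustrative.
records = [
    {"label": "positive", "treated": True},
    {"label": "negative", "treated": False},
    {"label": "positive", "treated": False},
    # ... one record per image in the dataset
]

# Audit the composition before running experiments: skewed counts here
# usually mean skewed model performance later.
by_label = Counter(r["label"] for r in records)
by_slice = Counter((r["label"], r["treated"]) for r in records)
print(by_label)   # positive vs. negative balance
print(by_slice)   # balance within each treated/untreated slice
```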

Example of a DICOM image ontology in Encord

Getting a high volume of data is crucial, but the quality and diversity of the datasets you’ve got available matter too, as do the quality and accuracy of the annotations and labels applied and reviewed by skilled radiologists and clinicians.

What Happens If You Get The Wrong Machine Learning Experiment Outcomes?

Most machine learning experiments go wrong or fail in some way; as most data scientists and clinical ops managers know, that’s normal. You might have 100 experiments running and only 10 to 15 produce outcomes close to what you need. A failure isn’t a setback.

In fact, following the scientific method, failures simply get you closer to successful outcomes that validate a hypothesis. Even if a hypothesis is invalidated, that’s a positive too, as it will help you refocus efforts on the right parameters and variables to test a new theory. Or, in some cases, a negative outcome could be the goal of an ML-based experiment.

So never see failure as a negative: learn from the experiments that fail, and move forward with the learnings from those that achieve the desired outcomes.

Only this way can you successfully put a machine learning model into production. 

How Can The Right Experiment Workflow Improve Experiment Efficiency? 

With the right tools, processes, and systems, project and clinical ops managers can create efficient medical imaging machine learning project workflows. 

Open-source tools can be a great starting point, but they can make it harder to expand the scope of your projects: they can reduce efficiency, make scaling difficult, weaken data security, and make monitoring or auditing annotators’ work almost impossible.

Instead, teams working on medical imaging datasets, annotation, and machine learning benefit from using proprietary automated image annotation tools to improve experiment efficiency.

Encord has developed our medical imaging dataset annotation software in close collaboration with medical professionals and healthcare data scientists, giving you a powerful automated image annotation suite, fully auditable data, and robust labeling protocols.

Ready to automate and improve the quality of your medical data annotations? 

Sign up for an Encord Free Trial: The Active Learning Platform for Computer Vision, used by the world’s leading computer vision teams.

AI-assisted labeling, model training and diagnostics, and tools to find and fix dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster. Try Encord for Free Today.

Want to stay updated?

Follow us on Twitter and LinkedIn for more content on computer vision, training data, and active learning.
