How Harvard Medical School and MGH Cut Down Annotation Time and Model Errors with Encord

Ulrik Stig Hansen
January 18, 2024

A new paper published by MDPI (Multidisciplinary Digital Publishing Institute) shows how researchers at Harvard Medical School, Massachusetts General Hospital, and Brigham and Women’s Hospital used the Encord platform to reduce vascular ultrasound annotation time from days to minutes and run automated analyses of their datasets.

Using Encord, the team was able to:

  • Create their first segmentation models by labeling only a handful of images
  • Cut annotation time through segmentation models by an order of magnitude
  • Visually explore their dataset and identify problematic areas - in their case, the impact of blur on their dataset
  • Evaluate the performance of their segmentation models in the Encord platform 

Problem: DUS Image Annotation is Resource-Intensive and Prone to Human Error

Medical imaging, particularly Arterial Duplex Ultrasound (DUS), plays a crucial role in diagnosing and managing vascular diseases like Popliteal Artery Aneurysms (PAAs). The traditional method of analyzing DUS images relies heavily on manual annotation by skilled medical professionals.

This process is fraught with challenges:

  • Time-consuming, especially with the growing volume of medical imaging data.
  • Prone to human error.
  • Heavily dependent on expertise and experience, which makes the process even more resource-intensive.

The subjective nature of manual annotations can lead to inconsistent measurements and interpretations due to inter- and intra-observer variability during annotation. This raises concerns about the reliability and reproducibility of the results and could impact the accuracy of diagnoses and treatment plans for patients.

The primary challenge addressed in the paper is precisely annotating the inner and outer lumens of the artery in each image, a critical step for accurate measurement and subsequent treatment planning.

Solution: Encord Annotate to Auto-Label DUS Images and Encord Active to Validate Model Performance

The study tested the feasibility of using the Encord platform to create an automated model that segments the inner and outer lumen in PAA DUS images. Using image segmentation to find the largest diameter and thrombus area within PAAs helps standardize the DUS measurements that inform decisions about surgery.
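To make the measurement step concrete, here is a minimal Python sketch of how a largest diameter and a thrombus area could be read off a pair of segmentation masks. It is an illustration, not the method used in the paper: the function names, the toy masks, and the `pixel_spacing_mm` parameter are hypothetical, and it assumes the thrombus area is the region inside the outer lumen but outside the inner lumen.

```python
from itertools import combinations
from math import hypot


def mask_pixels(mask):
    """Return (row, col) coordinates of all foreground (truthy) pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def largest_diameter(mask, pixel_spacing_mm=1.0):
    """Largest distance between any two foreground pixel centres.

    Brute-force O(n^2) search: fine for a toy mask, too slow for full images.
    """
    pts = mask_pixels(mask)
    if len(pts) < 2:
        return 0.0
    longest = max(
        hypot(r1 - r2, c1 - c2) for (r1, c1), (r2, c2) in combinations(pts, 2)
    )
    return longest * pixel_spacing_mm


def thrombus_area(outer_mask, inner_mask, pixel_spacing_mm=1.0):
    """Assumed definition: pixels inside the outer lumen but outside the inner lumen."""
    count = sum(
        1
        for o_row, i_row in zip(outer_mask, inner_mask)
        for o, i in zip(o_row, i_row)
        if o and not i
    )
    return count * pixel_spacing_mm ** 2


# Toy masks standing in for the outer and inner lumen segmentations.
outer = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
]
inner = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

diameter = largest_diameter(outer)  # in pixel units here (spacing = 1.0)
area = thrombus_area(outer, inner)  # in squared pixel units here
print(f"largest diameter: {diameter:.2f}, thrombus area: {area}")
```

In practice the pixel spacing would come from the ultrasound image metadata, so that both values can be reported in millimetres rather than pixels.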

Using Encord Annotate for Automated Annotation

The researchers collected a dataset of PAA DUS images, prepared it for upload to Encord (de-identification and extraction), and then annotated a few images in Encord Annotate to serve as ground truth for the annotation models.

Using Encord Annotate’s automated labeling feature, they could generate segmentation masks for unlabeled images. This reduced the time and effort required for DUS image analysis while minimizing the potential for human error. 

Using Encord Active to Select the Best-Performing Model

They trained three models on sets of 20, 60, and 80 annotated images and validated them with Encord Active. The performance metrics reported by Encord Active helped the researchers select the best model for segmenting the inner and outer lumens of the popliteal artery with high precision.

After training models on image subsets, we tested them within the Encord platform. We selected the desired tests in the analysis tab of the project, and after a runtime period, the platform presented calculations of true positives, false negatives, mAP, IoU, and blur.

The report referenced Encord’s ability to seamlessly integrate into clinical processes with a user-friendly interface, simple onboarding, and rapid annotation workflows as crucial to the study's success. For healthcare practitioners who use the platform, this improves their diagnostic process without disrupting established procedures.


Encord Reduced Annotation Time from Days to Minutes

Where manual annotation could take several minutes per image, the researchers accomplished the task in a fraction of the time using Encord. Their workflow went from RPVI-certified physicians manually annotating DUS images over several days to using Encord to annotate a few images, train models, and auto-label the remaining images in minutes.

This efficiency proves crucial in clinical settings, where timely diagnosis and treatment decisions can significantly impact patient outcomes.


Figure 1. AI segmentation classifications on duplex ultrasound images. (A) Outer polygon true-positive classification, where the color green indicates a correct segmentation. (B) Outer polygon false-positive classification, where red indicates an incorrect segmentation. (C) Inner polygon true-positive classification, where the color green indicates a correct segmentation. (D) Inner polygon false-positive classification, where red indicates an incorrect segmentation.

Better Evaluation and Observability of Model Performance with Encord Active

The researchers quantitatively assessed the performance of the three models with Encord Active providing analytics on the following metrics: 

  • Mean Average Precision (mAP)
  • Intersection over Union (IoU)
  • True Positive Rate (TPR)
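For readers less familiar with these metrics, here is a minimal Python sketch of IoU and TPR on binary masks. It is an illustration only: the function names, the toy masks, and the 0.5 IoU threshold are assumptions, and this post does not describe how Encord Active computes these values internally. mAP is left out because it additionally requires ranking predictions by confidence.

```python
def iou(pred, truth):
    """Intersection over Union of two equally sized binary masks."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += bool(p and t)  # pixel is foreground in both masks
            union += bool(p or t)   # pixel is foreground in at least one mask
    return inter / union if union else 0.0


def true_positive_rate(ious, threshold=0.5):
    """TP / (TP + FN): share of ground-truth objects matched at the IoU threshold.

    `ious` holds one value per ground-truth object: the IoU of its best
    matching prediction, or 0.0 if nothing was predicted for it.
    """
    if not ious:
        return 0.0
    return sum(v >= threshold for v in ious) / len(ious)


# Toy example: the prediction is shifted one column to the right of the truth.
truth = [
    [1, 1, 0],
    [1, 1, 0],
]
pred = [
    [0, 1, 1],
    [0, 1, 1],
]

overlap = iou(pred, truth)                        # 2 shared pixels / 6 in the union
tpr = true_positive_rate([0.9, 0.6, 0.2, 0.0])    # 2 of 4 objects reach 0.5
print(f"IoU: {overlap:.2f}, TPR: {tpr:.2f}")
```

The threshold matters: a stricter IoU cut-off turns borderline segmentations into false negatives and lowers the TPR.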

Encord Active calculated the outer polygon's mAP to be 0.85 for the 20-image model, 0.06 for the 60-image model, and 0 for the 80-image model. The mAP of the inner polygon was 0.23 for the 20-image, 60-image, and 80-image models. The true-positive rate (TPR) for the inner polygon remained at 0.23. See the full results in the table below:


“With regard to the models for outer and inner polygons, the outer polygon model outperformed the inner polygon model on every metric. The outer polygon demonstrated almost equal precision and recall at 0.85. The mAP for the outer polygon model was 0.85 with a true-positive rate of 0.86, which is comparable to other clinically used high-performing models for US segmentation.”

With Encord Active automatically providing model evaluation analytics, the team instantly discovered the model's strengths and weaknesses. For every model they trained, Active provided breakdowns and graphs on its performance, including the ability to visually explore the regions the model incorrectly segmented vs. the ground truth.

Encord Active Uncovered Blurry DUS Images that Could Degrade Annotation Model Performance

The researchers used Encord Active to explore the model's performance depending on the blur level, allowing them to visually interact with varying levels of blur in their dataset to understand how this impacted model performance.

The paper states, “Intuitively, our analysis found that as the images became blurrier, the model precision declined, and false-negative rates increased... Removing blur from—or augmenting—blur in images can be important for training accurate AI models.”
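As a rough illustration of how blur can be quantified, the sketch below scores an image by the variance of its Laplacian, a common sharpness heuristic. This is an assumed stand-in: the post does not say which blur measure Encord Active uses, and the function names and the synthetic checkerboard image are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Lower values mean fewer sharp edges, i.e. a blurrier image.
    """
    responses = [
        img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1] - 4 * img[r][c]
        for r in range(1, len(img) - 1)
        for c in range(1, len(img[0]) - 1)
    ]
    return pvariance(responses)


def box_blur(img):
    """3x3 mean filter with clamped edges, used here only to simulate blur."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            row.append(sum(window) / 9)
        out.append(row)
    return out


# Synthetic checkerboard standing in for a sharp grayscale image.
sharp = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]
blurred = box_blur(sharp)

sharp_score = laplacian_variance(sharp)
blurred_score = laplacian_variance(blurred)
print(f"sharp: {sharp_score:.0f}, blurred: {blurred_score:.0f}")
```

With a score like this, images can be grouped into blur buckets and a model's precision and false-negative rate compared per bucket, which is the kind of breakdown the researchers describe.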


In summary, the platform’s intuitive navigation, complemented by tutorials for both model training and analysis, allowed for straightforward operationalization of the model training system among members of the research team. The results were displayed in an understandable format and interpreted within the following discussion.

The findings have far-reaching consequences for medical imaging and diagnosis. By auto-annotating images with Encord Annotate and validating annotation models with Encord Active, the researchers improved the accuracy, reliability, and efficiency of DUS image analysis. This could lead to better patient care, treatment planning, and diagnostic procedures.

At Encord, we are committed to continually providing healthcare practitioners and physicians with the data-centric AI platform they need to improve their medical imaging and analysis workflows. 

We’re proud of the work the researchers were able to accomplish and how Encord is paving the way for broader applications of AI in various aspects of medical diagnostics.  

📑 Read the full paper on MDPI (Multidisciplinary Digital Publishing Institute).
