How Surgical Data Science Collective (SDSC) Conducted Video Annotation 10x Faster

Ulrik Stig Hansen
September 5, 2023

Surgical Data Science Collective (SDSC) is a data platform that provides surgeons with access to data and quantitative insights about procedures to expedite the training process and democratize access to safe surgery. Working with Encord, SDSC has increased the speed of annotations by 10x while simultaneously improving precision and accuracy.

Customer

Meet SDSC

SDSC is a non-profit organization dedicated to transforming surgery from an art to a science. With five core products, SDSC provides essential metrics on various procedures once videos are uploaded to the platform. For instance, the Kinematics model captures the movement of specific tools during a surgical procedure.

As Director of Machine Learning (ML), Margaux Masson-Forsythe is responsible for leading the ML roadmap at SDSC, defining the strategy of generating high-quality training data, managing the data pipeline, and overseeing the ML team.

Problem

A vast amount of video data that required technical knowledge to annotate, and a need to connect the training data platform to a wider pipeline.

Before switching to Encord, the SDSC team faced three common problems: quantity of data, poor quality of annotations, and a lack of customizability and integrations.

Firstly, they had to deal with the vast amount of video data that required annotation. With each procedure split into 20 clips of approximately 15 minutes each (about 5 hours of video per procedure), the team had several terabytes of data to annotate. Their previous tool suffered from significant latency when rendering videos, which hindered the labelers’ ability to annotate effectively at speed.

Secondly, the team discovered that around 20% of their previous annotations were incorrect, most of them because the same objects had been labeled with inconsistent naming conventions. The team identified these errors using Encord Active’s automated label error detection feature and attributed them to: i) the absence of a robust annotation toolkit and ii) the high level of technical knowledge and expertise required to conduct and review annotations.
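To make the "inconsistent naming conventions" problem concrete, here is a minimal Python sketch of one simple way to surface label names that differ only in case or separators. It is an illustration only, not Encord Active's label error detection method, and the label names and helper functions (normalize, find_inconsistent_names) are hypothetical.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase the name and collapse spaces, hyphens and underscores into one underscore."""
    return re.sub(r"[\s_\-]+", "_", name.strip().lower())


def find_inconsistent_names(labels):
    """Group label names by normalized form; return groups with more than one spelling."""
    groups = defaultdict(set)
    for name in labels:
        groups[normalize(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical label names, for illustration only.
labels = ["Suction Tool", "suction_tool", "suction-tool", "Drill", "Drill", "Forceps"]
inconsistent = find_inconsistent_names(labels)

for canonical, variants in inconsistent.items():
    print(f"{canonical}: {variants}")
```

A check like this only catches spelling variants of the same name; it would not catch genuinely mislabeled objects, which need model-based or human review.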

Lastly, the team had difficulty programmatically interacting with their training data platform and integrating it into their wider model production pipeline. They needed a working and usable Python SDK to create automated training data pipelines.

Solution

Leveraging Encord’s comprehensive Training Data Platform to conduct video annotations with state-of-the-art tools and unparalleled support.

After reviewing several solutions, the SDSC team chose to integrate Encord into their data pipelines. On the onboarding process, Margaux noted “Getting started with Encord and integrating it into our workflow was really fast. The thing that I find the most valuable is the flexibility of how we can integrate the Encord pipeline into our own pipeline, we use the Python SDK a lot”.

By natively rendering videos in the Encord platform, SDSC’s team was able to speed up annotation while increasing precision. Margaux praised the platform's support for video annotation, noting “How smooth [Encord Annotate] was and all of the different tools that come with labeling videos” and that “[Encord] definitely was the best platform we’ve seen and we were looking around different platforms”.

To incorporate expert review into their annotation workflows, the team used Encord Annotate's workflows to automate review by their labeling manager. Margaux explained that with Encord “We have a better reviewing system [...] that is the key component to having better quality datasets that we were missing before”. This allowed the annotators to get up to speed with complex annotations much more quickly, without requiring experts to conduct annotations themselves.

Margaux praised Encord's analytics capabilities, noting that “Now I have this whole system where I get analytics from Encord and we’re going to populate that into a dashboard so we can see how the annotation is going up”. She also appreciated how quick Encord was to incorporate Meta’s Segment Anything Model (SAM) into the platform, stating “One feature that made me go with Encord was the integration of SAM in the [Encord Annotate] platform which was done really quickly after the model was released so I knew when there was a new computer vision model released it will be integrated into the platform quite fast - which is something that was also a really good point”.

Results

10x faster annotation whilst moving towards 0% annotation errors (previously 20%)

After integrating Encord into their wider data pipelines, SDSC was able to produce high-quality training data with quick annotations. With the help of Encord Active, the team identified that approximately 20% of the annotations completed on the previous tool were incorrect. The team is now “aiming to have 0% bad annotations” with the use of Encord’s platform.

Margaux discussed an upcoming project in which SDSC would annotate 100 hours of procedures (20 procedures at 5 hours per procedure) in four months, and she expressed confidence in the team's ability to complete the task with Encord, in conjunction with their wider Active Learning pipeline. According to Margaux, “... we know we can do that now with Encord because of the whole process that we have, which compared to what we had before, it would be maybe one procedure every two months even, much slower”. This represents a 10x increase in efficiency, as SDSC would previously have been able to annotate only around 10 hours (2 procedures) in the same time frame.
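The 10x figure follows directly from the numbers quoted in this post. The short Python sketch below reproduces the arithmetic; the variable names are ours, and the previous rate of one procedure every two months is taken from Margaux's estimate above.

```python
# Figures quoted in the post.
CLIPS_PER_PROCEDURE = 20
MINUTES_PER_CLIP = 15
WINDOW_MONTHS = 4

# 20 clips x 15 minutes = 300 minutes = 5 hours of video per procedure.
hours_per_procedure = CLIPS_PER_PROCEDURE * MINUTES_PER_CLIP / 60

# Planned with Encord: 20 procedures in the four-month window.
new_procedures = 20
new_hours = new_procedures * hours_per_procedure

# Previous estimate: roughly one procedure every two months.
old_procedures = WINDOW_MONTHS / 2
old_hours = old_procedures * hours_per_procedure

speedup = new_hours / old_hours

print(f"Hours per procedure: {hours_per_procedure}")
print(f"With Encord: {new_hours} hours in {WINDOW_MONTHS} months")
print(f"Previously:  {old_hours} hours in {WINDOW_MONTHS} months")
print(f"Speedup:     {speedup}x")
```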

As SDSC continues to grow and increase model production, they will further scale their use of Encord Annotate in addition to building out more mature Active Learning pipelines using Encord Active, given their initial success with the automated label error detection feature.

Written by Ulrik Stig Hansen
Ulrik is the President & Co-Founder of Encord. Ulrik started his career in the Emerging Markets team at J.P. Morgan. Ulrik holds an M.S. in Computer Science from Imperial College London. In his spare time, Ulrik enjoys writing ultra-low latency software applications in C++ and enjoys exper...