Software To Help You Turn Your Data Into AI
Forget fragmented workflows, annotation tools, and Notebooks for building AI applications. Encord Data Engine accelerates every step of taking your model into production.
In machine learning, precise image annotation is crucial for training accurate and reliable models. Encord's Bitmask brush tool streamlines the annotation process by allowing interactive, fine-grained selection of regions of interest within images. This guide walks you through the ins and outs of the Bitmask brush tool so you can create precise, highly accurate annotations within the Encord platform.
A bitmask brush lets you interactively define regions of interest within an image by "brushing" over them. As you paint over the image, the brush assigns specific bits, or values, to the pixels you select. These bits represent the labels or categories associated with the selected areas.
For example, if you are labeling the outlines of blood vessels in an image, you can use a bitmask brush to paint over the pixels corresponding to the vessel's boundaries. The brush assigns a specific value or bit pattern to those pixels, indicating that they belong to the vessel class or category.
Similarly, if you are labeling topologically separate regions belonging to the same frame classification, you can use a bitmask brush to assign different bit patterns or values to the regions you select. This allows you to differentiate between regions or segments within the same frame category.
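The brushing mechanic described above can be sketched in a few lines of NumPy. This is a toy illustration of the concept, not Encord's implementation: a boolean bitmask starts empty, and each brush stroke stamps a circular region of True values onto it.

```python
import numpy as np

def paint(mask, cx, cy, radius):
    """Stamp a circular brush onto a boolean bitmask at pixel (cx, cy)."""
    h, w = mask.shape
    ys, xs = np.ogrid[:h, :w]
    # Pixels inside the brush circle are marked as selected (in place).
    mask |= (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    return mask

# A 16x16 image starts with an empty mask; two overlapping strokes select a region.
mask = np.zeros((16, 16), dtype=bool)
paint(mask, 5, 5, 2)
paint(mask, 7, 5, 2)
print(int(mask.sum()))  # number of selected pixels
```

The union of overlapping strokes is what makes a brush feel natural: re-painting an already-selected pixel is a no-op, so strokes can overlap freely.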
The Bitmask brush is a powerful tool for creating annotations or labels by selecting specific regions within an image, providing flexibility and control over the labeling process. Let’s explore its key functionalities:
When the Bitmask annotation type is selected, the brush tool is chosen by default. You can access it by clicking the brush icon or pressing the 'f' key, and you can adjust the brush size using a slider. This lets you tailor the brush size to the level of detail your annotations need.
Once you have adjusted the brush size, you can begin annotating your image by selecting the desired areas. As you brush over the regions, the Bitmask brush assigns specific bit patterns or values to the corresponding pixels, indicating their association with the selected labels or categories.
Once your annotation is complete, you can apply the label by clicking the "Apply label" button or pressing the Enter key, finalizing the annotation and incorporating it into the labeling or annotation process.
The Eraser tool provides the ability to erase parts or the entirety of your bitmask selection. This can be useful if you need to refine or modify your annotations before applying the final label. You can access the Eraser tool by clicking the eraser icon or pressing the 'h' key on your keyboard while the popup window is open.
The Threshold brush, specific to DICOM images, offers additional functionality by enabling you to set an intensity value threshold for your labels. The preview toggle allows you to visualize which parts of the image correspond to your set threshold, helping you determine the areas that will be labeled when covered by the Threshold brush.
To access the Threshold brush, click the corresponding icon or press the 'g' key while the popup window is open. Adjust the brush size and the range of intensity values using the sliders in the popup.
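The Threshold brush's behavior can be sketched conceptually with NumPy (a minimal illustration, not Encord's implementation): a pixel is labeled only if it is both under the brush circle and within the chosen intensity range, which is what the preview toggle visualizes.

```python
import numpy as np

def threshold_brush(image, mask, cx, cy, radius, lo, hi):
    """Label pixels under a circular brush only if their intensity is in [lo, hi]."""
    h, w = image.shape
    ys, xs = np.ogrid[:h, :w]
    under_brush = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    in_range = (image >= lo) & (image <= hi)   # what the "preview" overlay shows
    mask |= under_brush & in_range
    return mask

# Synthetic 8x8 "scan": a bright square on a dark background.
image = np.zeros((8, 8), dtype=np.int16)
image[2:6, 2:6] = 900                      # bright, e.g. bone-like intensities
mask = np.zeros_like(image, dtype=bool)
threshold_brush(image, mask, 4, 4, 4, 500, 1200)
# Only the bright pixels under the brush are selected, not the dark background.
print(int(mask.sum()))
```

Because the intensity test gates the brush, you can sweep broadly over a region of a DICOM slice and still label only the tissue whose intensity falls in the configured range.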
The Encord Bitmask SDK lets you programmatically generate, modify, and analyze bitmask annotations within the Encord platform, drawing on Python's ecosystem of libraries and tools.
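As a flavor of the kind of programmatic analysis this enables, here is a plain-NumPy sketch over the boolean array that a bitmask annotation represents. The exact Encord SDK calls are not shown here; `mask_stats` is an illustrative helper, not an SDK function.

```python
import numpy as np

def mask_stats(mask):
    """Compute area and tight bounding box (x_min, y_min, x_max, y_max) of a bitmask."""
    ys, xs = np.nonzero(mask)          # row (y) and column (x) indices of set pixels
    if len(xs) == 0:
        return {"area": 0, "bbox": None}
    return {
        "area": int(mask.sum()),
        "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
    }

mask = np.zeros((10, 10), dtype=bool)
mask[3:7, 2:5] = True                  # a 4x3 labeled region
print(mask_stats(mask))                # {'area': 12, 'bbox': (2, 3, 4, 6)}
```

Statistics like these are useful for sanity-checking annotations in bulk, for example flagging suspiciously small masks before they reach model training.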
To conclude, Encord’s Bitmask brush tool, equipped with its diverse range of features, offers an intuitive and flexible solution for creating annotations within the Encord platform. Harnessing the power of the Bitmask brush and the Encord Bitmask SDK, you can elevate your annotation workflow to achieve precise and reliable results.
Join the Encord Developers community to discuss the latest in computer vision, machine learning, and data-centric AI.

Related Blogs
In the current AI boom, one thing is certain: data is king. Data is at the heart of the production and development of new models, and yet the processing and structuring required to get data into a form consumable by modern AI are often overlooked. One of the most fundamental capabilities that can be leveraged to facilitate this is search. Search is crucial to understanding data: the more ways you can search and group data, the more insights you can extract; and the greater the insights, the more structured the data becomes.

Historically, search capabilities have been limited to uni-modal approaches: models used for images or videos in vision use cases have been distinct from those used for textual data in natural language processing. With GPT-4's ability to process both images and text, we are only now starting to see the potential impact of performant multi-modal models that span various forms of data.

Embracing the future of multi-modal data, we propose the Search Anything Model: a unified framework that combines natural language, visual property, similarity, and metadata search in a single package. Leveraging computer vision processing, multi-modal embeddings, LLMs, and traditional search techniques, Search Anything allows multiple forms of structured data querying using natural language. If you want to find all bright images with multiple cats that look similar to a particular reference image, Search Anything will match over multiple index types to retrieve data of the requisite form and conditions.

What is Natural Language Search?

Natural Language Search (NLS) uses human-like language to query and retrieve information from databases, datasets, or documents. Unlike traditional keyword-based searches, NLS algorithms employ Natural Language Processing (NLP) techniques to understand the context, semantics, and intent behind user queries.
By interpreting the query's meaning, NLS systems provide more accurate and relevant search results, mimicking how humans communicate. The computer vision domain requires a similar general understanding of data content, without requiring metadata for visuals.

💡 Encord is a data-centric computer vision company. With Encord Active, you can use the Search Anything Model to explore, curate, and debug your datasets.

What Can You Use the Search Anything Model for?

Let's dive into some examples of computer vision uses for the Search Anything Model.

Data Exploration

Search Anything simplifies data exploration by allowing users to ask questions in plain language and receive valuable insights. Instead of manually formulating complex queries and algorithms that may require pre-existing metadata, you can pose questions such as: "Which images are blurry?" or "How is my model performing on images with multiple labels?" Search Anything interprets these queries and quickly provides visualizations or summaries of the data.

Data Curation

Search Anything streamlines data curation, making the process efficient and user-friendly. Filter, sort, or aggregate data using only natural language commands. For example, you can request: "Remove all the very bright images from my dataset" or "Add an 'unannotated' tag to all the data that has not been annotated yet." Search Anything processes these commands, automatically performs the requested actions, and presents the curated data, all without complex coding or SQL queries.

Using Encord Active to filter out bright images in the COCO dataset.

Use the bulk tagging feature to tag all the data.

Data Debugging

Search Anything expedites the process of identifying and resolving data issues.
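Under the hood, a curation command like "Remove all the very bright images" has to compile down to a concrete quality metric and a filter. The sketch below shows one plausible implementation in plain NumPy; the metric (mean intensity) and the 0.8 threshold are assumptions for illustration, not the model's actual internals.

```python
import numpy as np

def brightness(image):
    """Mean pixel intensity, normalized to [0, 1], as a simple brightness metric."""
    return float(image.mean()) / 255.0

def remove_very_bright(images, threshold=0.8):
    """Keep only images whose brightness metric is below the threshold."""
    return [img for img in images if brightness(img) < threshold]

rng = np.random.default_rng(0)
dark = rng.integers(0, 60, size=(32, 32), dtype=np.uint8)   # dim image
bright = np.full((32, 32), 250, dtype=np.uint8)             # near-white image
kept = remove_very_bright([dark, bright])
print(len(kept))  # 1: the near-white image was filtered out
```

The point of natural-language search is that the user never writes this code; the system maps the phrase "very bright" onto a precomputed metric and threshold like these.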
To investigate anomalies and inconsistencies, ask questions or issue commands such as: "Are there any missing values for the image difficulty quality metric?" or "Find records that are labeled 'cat' but don't look like a typical cat." Once again, Search Anything analyzes the data, detects discrepancies, and provides actionable insights to help you identify and rectify data problems efficiently.

💡 Read to find out how to find and fix label errors with Encord Active.

Cataloging Data for E-commerce

Search Anything can also enhance the cataloging process for e-commerce platforms. By understanding product photos and descriptions, Search Anything enables users to search and categorize products efficiently. Users can ask: "Locate the green and sparkly shoes." Search Anything interprets this query, matches the desired criteria against the product images and descriptions, and displays the relevant products, facilitating improved product discovery and customer experience.

How to Use the Search Anything Model with Encord

At Encord, we are building an end-to-end visual data engine for computer vision. Our latest release, Encord Active, empowers users to interact with visual data using only natural language. Let's dive into a few use cases:

Use Case 1: Data Exploration

User query: "red dress," "denim jeans," and "black shirts"

Encord Active identifies the images in the dataset that most accurately correspond to each query.

Use Case 2: Data Curation

User query: "Display the very bright images"

Encord Active displays filtered results from the dataset based on the specified criterion.

💡 Read to find out how to choose the right data for your computer vision project.

Use Case 3: Data Debugging

User query: "Find all the non-singular images"

Encord Active detects duplicated images in the dataset and displays images that are not unique within it.

Can I Use My Own Model?
Yes, Encord Active allows you to leverage your own models. Through fine-tuning or integrating custom embedding models, you can tailor the search capabilities to your specific needs, ensuring optimal performance and relevance.

💡 At Encord, we are actively researching how to fine-tune LLMs to search Encord Active projects efficiently. Get in touch if you would like to get involved.

Conclusion

Natural Language Search is revolutionizing the way we interact with data, enabling intuitive and efficient exploration, curation, and debugging. By harnessing the power of NLP and computer vision models, our Search Anything Model allows you to pose queries, issue commands, and obtain actionable insights using human-like language. Whether you are an ML engineer, a data scientist, or an e-commerce professional, incorporating NLS into your workflow can significantly enhance productivity and unlock the full potential of your data.
If you feed an AI model junk, it's bound to return the favor. The quality of the data consumed by an AI algorithm has a direct correlation with its success in generalizing to new instances; this is why data professionals spend 80% of their time during model development ensuring the data is appropriately prepared and representative of the real world.

Data labeling is an essential task in supervised learning, as it enables AI algorithms to create accurate input-to-output mappings and build a comprehensive understanding of their environment. Data labeling can consume up to 80% of data preparation time, and at least 25% of an entire ML project is spent labeling. Efficient data labeling strategies are therefore critical for improving the speed and quality of machine learning model development.

💡 Read the blog to learn how to automate your data labeling process.

Manual data labeling can be a challenging and error-prone process, as it relies on human judgment and subjective interpretation. Labelers may have different levels of expertise, leading to inconsistency in the labeling process and reduced accuracy. Moreover, manual data labeling can be time-consuming and expensive, especially for large datasets, which hinders the scalability and efficiency of AI model development.

Integrating automated data labeling into your machine learning projects can be an effective strategy for mitigating these challenges. By leveraging AI technology to perform data labeling tasks, businesses can reduce the risk of human error, increase the speed and efficiency of model development, and minimize the costs associated with manual labeling. Additionally, automated data labeling can help improve the accuracy and consistency of labeled data, resulting in more reliable and robust AI models.
Let's take a closer look at automated data labeling, including how it works, its advantages, and how Encord can assist you in automating your data labeling process.

Using Annotation Tools for Automated Data Labeling

Automated data labeling is the use of software tools and algorithms to automatically annotate or tag data with labels that help identify and classify it. This process is used in machine learning and data science to create training datasets for machine learning models.

"Automated data annotation is a way to harness the power of AI-assisted tools and software to accelerate and improve the quality of creating and applying labels to images and videos for computer vision models." – Frederik H., The Full Guide to Automated Data Annotation

Annotation tools can be used for automated data labeling by providing a user interface for creating and managing annotations or labels for a dataset. These tools help automate the labeling process through features such as:

Auto-labeling: using pre-built machine learning models or algorithms to generate labels for data automatically.
Active learning: using machine learning algorithms to suggest labels for data based on patterns and correlations in the existing labeled data.
Human-in-the-loop: providing a user interface for human annotators to review and correct the labels generated by the automation process.
Quality control: ensuring the quality of automatically generated labels with tools for validation and verification.
Data management: managing and organizing large datasets, including tools for filtering, searching, and exporting data.
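The human-in-the-loop feature mentioned above usually hinges on one routing decision: auto-accept confident model labels and queue the rest for human review. A minimal sketch of that pattern, with an assumed 0.9 confidence threshold:

```python
def route_predictions(predictions, threshold=0.9):
    """Split model predictions into auto-accepted labels and a human review queue.

    `predictions` is a list of (item_id, label, confidence) tuples."""
    accepted, review = [], []
    for item_id, label, conf in predictions:
        (accepted if conf >= threshold else review).append((item_id, label))
    return accepted, review

preds = [("img_1", "cat", 0.97), ("img_2", "dog", 0.55), ("img_3", "cat", 0.91)]
accepted, review = route_predictions(preds)
print(len(accepted), len(review))  # 2 1
```

Tuning the threshold trades annotator workload against the risk of accepting wrong labels, which is why quality-control review of the auto-accepted set still matters.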
Organizations can reduce the time and cost required to create high-quality training datasets for machine learning models by using annotation tools for automated data labeling. However, it is important to ensure that the tools used are appropriate for the specific task and that the labeled data is carefully validated and verified to ensure its quality.

AI Annotation Tools

💡 Check out our curated list of the 9 Best Image Annotation Tools for Computer Vision to discover what other options are on the market.

Encord Annotate

Encord Annotate is an automated annotation platform that performs AI-assisted image annotation, video annotation, and dataset management; it is part of the Encord product, alongside Encord Active. Key features of Encord Annotate include:

Support for all annotation types, such as bounding boxes, polygons, polylines, image segmentation, and more.
Auto-annotation tools such as Meta's Segment Anything Model and other AI-assisted labeling techniques.
An integrated MLOps workflow for computer vision and machine learning teams.
Use-case-centric annotations, from native DICOM & NIfTI annotations for medical imaging to SAR-specific features for geospatial data.
Easy collaboration, annotator management, and QA workflows to track annotator performance and increase label quality.
Robust security functionality: label audit trails, encryption, FDA and CE compliance, and HIPAA compliance.

Benefits of Automated Data Labeling with AI Annotation Tools

The most straightforward way to label data is manually: a human user is presented with raw unlabeled data and applies a set of rules to label it. However, this approach is time-consuming and costly and carries a higher probability of human error.
An alternative approach is to use AI annotation tools to automate the labeling process, which helps address the issues associated with manual labeling by:

Increasing accuracy and efficiency: Speed is just as important as accuracy. An automatic AI annotation tool can process large volumes of images much faster than a human can, and what makes it so effective is its ability to remain accurate, ensuring labels are precise and reliable.
Improving productivity and workflow: It's normal for humans to make mistakes, especially when performing the same task for eight or more hours straight. With an AI-assisted labeling tool, the workload is significantly reduced, so annotating teams can focus on ensuring things are labeled correctly the first time around.
Reducing labeling costs and resources: Manually annotating data means paying a person or group of people to carry out the task, so every hour that passes has a cost that can quickly become extremely high. An AI-assisted labeling tool can take on some of that load: a human annotation team manually labels a percentage of the data, and an AI tool does the rest.

How to Automate Data Labeling with Encord

A step-by-step guide to automating data labeling with Encord:

Micro-models

Micro-models are models that are deliberately overtrained for a specific task or piece of data, making them effective at automating one aspect of a data annotation workflow. They are not meant to solve general problems and are typically used for a specific purpose.

💡 Read the blog to find out more about micro-models.

The main difference between a traditional model and a micro-model is not in their architecture or parameters but in their application domain, the data science practices used to create them, and their ultimate end use.
Auto-segmentation

Auto-segmentation involves using algorithms or annotation tools to automatically segment an image or video into different regions or objects of interest. The technique is used across industries, including medical imaging, object detection, and scene segmentation. In medical imaging, for example, auto-segmentation can identify and segment different anatomical structures, such as tumors, organs, and blood vessels, helping medical professionals make more accurate diagnoses and treatment plans.

Auto-segmentation can speed up the image analysis process and reduce the likelihood of human error. However, the accuracy of auto-segmentation algorithms depends on the quality of the input data and the complexity of the segmentation task. In some cases, manual review and correction may still be necessary to ensure accurate results.

💡 Read the explainer blog on the Segment Anything Model to understand how foundation models are used for auto-segmentation.

Interpolation

Interpolation is typically used to fill in missing values or smooth noise in a dataset. It is the process of estimating the value of a function at points that lie between known data points. Several interpolation methods can be used in ML, such as linear, polynomial, and spline interpolation. The choice of method depends on the data's characteristics and the project's goals.

Object Tracking

Object tracking plays a vital role in applications such as security and surveillance, autonomous vehicles, and video analysis. It is a crucial component of computer vision that enables machines to track and follow objects in motion. Using object tracking, you can predict the position and other relevant information of moving objects in a video or image sequence.
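In annotation tooling, the interpolation described above most often means filling in label geometry between manually labeled keyframes. A minimal linear-interpolation sketch for (x, y, w, h) bounding boxes (a toy illustration, not Encord's implementation):

```python
def interpolate_box(box_a, box_b, frame_a, frame_b, frame):
    """Linearly interpolate an (x, y, w, h) box between two labeled keyframes."""
    t = (frame - frame_a) / (frame_b - frame_a)   # fraction of the way from a to b
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# Keyframes labeled at frames 0 and 10; frames in between are filled automatically.
start, end = (10, 20, 50, 40), (30, 40, 50, 40)
print(interpolate_box(start, end, 0, 10, 5))  # (20.0, 30.0, 50.0, 40.0)
```

An annotator who labels every tenth frame and lets interpolation fill the rest cuts manual work by roughly 90%, at the cost of occasional corrections where motion is non-linear.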
💡 Check out the Complete Guide to Object Tracking Tutorial for more insight.

Conclusion

Supervised machine learning algorithms depend on labeled data to learn how to generalize to unseen instances. The quality of data provided to the model has a significant impact on its final performance, so it is vital that the data is accurately labeled and representative of real-world scenarios; this is why AI teams often spend a large portion of their time preparing and labeling data before the model training phase.

Manually labeling data is slow, tedious, expensive, and prone to human error. One way to mitigate this is with automated data labeling and annotation solutions. Such tools can serve as a cost-effective way to speed up the process without sacrificing accuracy, which in turn improves the team's productivity and workflow.

Ready to accelerate the automation of your data annotation and labeling? Sign up for an Encord Free Trial: The Active Learning Platform for Computer Vision, used by the world's leading computer vision teams. AI-assisted labeling, model training & diagnostics, find & fix dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster. Try Encord for Free Today.

Want to stay updated? Follow us on Twitter and LinkedIn for more content on computer vision, training data, and active learning. Join our Discord channel to chat and connect.

Automated Data Labeling FAQs

What are the benefits of automated data labeling?

Automated data labeling increases the accuracy and efficiency of the labeling process compared with manual labeling. It also reduces labeling costs and resources, as you are not required to pay labelers to perform the tasks.

How is automated data labeling different from manual labeling?

Manual data labeling is the process of using individual annotators to assign labels to raw data.
Conversely, automated labeling passes that responsibility to machines instead of humans to speed up the process and reduce costs.

What is AI data labeling?

AI data labeling refers to techniques that leverage machine learning to apply one or more meaningful labels to raw data (e.g., images, videos), with the intent of giving a machine learning model the context to learn input-output mappings from the data and make inferences on new, unseen data.
Getting AI models through FDA approval takes time, effort, robust infrastructure, data security, medical expert oversight, and the right AI-based tools to manage data pipelines, quality assurance, and model training. In this article, we review the US Food & Drug Administration's (FDA's) latest thinking and guidelines around AI models, from new software to devices to broader healthcare applications. This step-by-step guide aims to equip you with the information you need to approach FDA clearance. We will cover the following key steps for getting your AI model through FDA scrutiny:

1. Create or source FDA-compliant medical imaging or video-based datasets
2. Annotate and label the data (high-quality data and labels are essential)
3. Medical expert review of labels in medical image/video-based datasets
4. A clear and robust FDA-level audit trail
5. Quality control and validation studies
6. Test your models on the data, and figure out what data you need more of or less of to improve them

State of FDA approval for AI algorithms

The number of AI and ML algorithms being approved by the US Food & Drug Administration (FDA) has accelerated dramatically in recent years. As of January 2023, the FDA has approved over 520 AI and ML algorithms for medical use. Most of these relate to medical imaging, healthcare image and video analysis, and diagnosis, so in the majority of use cases these are computer vision (CV) models.

The FDA first approved the use of AI for medical purposes in 1995. Only 50 other algorithms were approved over the following 18 years. Then, between 2019 and 2022, over 300 were approved, with a further 178 granted FDA approval in 2023. Given the accelerated development of AI, ML, CV, Foundation Models, and Visual Foundation Models (VFMs), the FDA is bracing itself for hundreds of new medical-related models and algorithms seeking approval in the next few years. See the complete list of FDA-cleared algorithms here.
Source: How many AI algorithms are FDA approved?

Can the FDA handle all of these new approval submissions?

Considering the number of AI projects seeking FDA approval, there are naturally concerns about capacity. Fortunately, just over two years ago, the FDA created its Digital Health Center of Excellence, led by Bakul Patel. Patel has since left the FDA; however, his work modernized the FDA's approval processes for AI models, ensuring they are equipped for hundreds of new applications. As Nicholson Price, a University of Michigan law professor specializing in life science innovation, said: "There have been questions about capacity constraints on FDA, whether they have the staff and relevant expertise. They had a plan to increase hiring in this space, and they have in fact hired a bunch more people in the digital health space."

💡 Around 75% of the AI/ML models the FDA has approved so far are in radiology, with only 11% in cardiology. Of the 521 approved up until January 2023, that's 392 in radiology AI.

One of the reasons for this is the vast amount of image-based data that data scientists and ML engineers can use when training models, mainly from imaging and electrocardiograms.

Unfortunately, it is difficult to assess the number of submitted applications and their outcomes. We know how many are approved; what's unclear is the number that are rejected or need to be re-submitted. Here's where FDA approval for AI gets interesting: "FDA-authorized devices likely are just a fraction of the AI- and machine-learning-enabled tools that exist in healthcare as most applications of automated learning tools don't require regulatory review." For example, predictive tools (such as artificial intelligence, machine learning, and computer vision models) that use medical records and images don't require FDA approval. But that might change under new guidance.
Professor Price says, "My strong impression is that somewhere between the majority and vast majority of ML and AI systems being used in healthcare today have not seen FDA review." So, for ML engineers, data science teams, and AI businesses working on AI models for the healthcare sector, the question you need to answer first is: do we need FDA approval?

How do you know if your AI healthcare model needs FDA approval?

Whether your AI healthcare model, or an AI model with healthcare or medical imaging applications, needs FDA approval is an important question. Provided approval isn't needed, you will save hours of time and work. We've spent time investigating this, and here's what we found: under the 21st Century Cures Act, most software and AI tools are exempt from FDA regulatory approval "as long as the healthcare provider can independently review the basis of the recommendations and doesn't rely on it to make a diagnostic or treatment decision." For regulatory purposes, AI tools and software fall into the FDA category known as Clinical Decision Support Software (CDS).

➡️ Here are the criteria the FDA uses. If your AI, CV, or ML model/software meets all four, then your software function may be a non-device CDS and therefore won't need FDA approval:

1. Your software function does NOT acquire, process, or analyze medical images, signals, or patterns.
2. Your software function displays, analyzes, or prints medical information normally communicated between health care professionals (HCPs).
3. Your software function provides recommendations (information/options) to an HCP rather than providing a specific output or directive.
4. Your software function provides the basis of the recommendations so that the HCP does not rely primarily on any recommendations to make a decision.

If you aren't clear whether your AI model falls within FDA regulatory requirements, it's worth checking the Digital Health Policy Navigator.
In most cases, AI models themselves don't need FDA approval. However, if your company is working with a healthcare, medical imaging, medical device, or any other organization that is going through FDA approval, then any algorithmic models, datasets, and labels used to train a model need to comply with FDA guidelines. Let's dive into how you can do that.

How to get your AI model through FDA approval: Step-by-step guide

Here are the steps to take when working on an AI, ML, or CV model for healthcare organizations, including MedTech companies, that are using the model in devices or new forms of diagnosing or treating patients that require FDA approval:

1. Create or source FDA-compliant medical imaging or video-based datasets
2. Annotate and label the data (high-quality data and labels are essential)
3. Medical expert review of labels in medical image/video-based datasets
4. A clear and robust FDA-level audit trail
5. Quality control and validation studies
6. Test your models on the data, and figure out what data you need more of or less of to improve them

Here's how to ensure your AI model will meet FDA approval:

1. Create or source FDA-compliant medical imaging or video-based datasets

Every AI model starts with the data. When working with any company or organization going through the FDA approval process, it is crucial that the image or video datasets are FDA-compliant. In practice, this means sourcing (whether open-source or proprietary) high-quality datasets that don't contain identifiable patient tags and metadata. If files contain specific patient identifiers, it's vital that annotators and providers cleanse them of anything that could impact the project's development and regulatory approval. Other factors to consider include: do we have enough data to train a model?
Quantity is as important as quality for model training, especially if the project is focused on medical edge cases and outliers, or on addressing ethnic or gender-based bias. How are we storing and transferring this data? Security is crucial, especially if you're outsourcing the annotation process. Can we outsource annotation work? For data security purposes, you need to ensure that transfers, annotation, and labeling are FDA-compliant and adhere to other regulations, such as HIPAA and other relevant data protection laws (e.g., European CE regulations for EU-based projects).

When working with organizations that are obtaining regulatory approval, the company will have to run a clinical study, and this will require untouched data that has not been seen by the model or anyone working on it. Before annotation work can start, you need to split and partition the dataset, ideally keeping the holdout in a separate physical location to make it easier to demonstrate compliance during the regulatory approval process.

Open-source CT scan image dataset on Kaggle

Once the datasets are ready to use, it's time to start the annotation and labeling work.

2. Annotate and label the data (high-quality data and labels are essential)

Medical image annotation for machine learning models requires accuracy, efficiency, high quality, and security. As part of this process, it can be worth having medical teams pre-populate labels for greater accuracy before a team of annotators gets started. Highly skilled medical professionals don't have much time to spare, so getting medical input at the right stages of the project, such as pre-populating labels and during quality assurance, is crucial. Medical imaging annotation projects run more smoothly when annotators have access to the right tools. For example, you'll probably need an annotation tool that supports native medical imaging formats such as DICOM and NIfTI (recent DICOM updates from Encord).
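The clinical-study holdout split described in step 1 is easier to defend in an audit when it is deterministic. One common pattern, sketched below with the standard library (the 20% fraction is an assumption for illustration), hashes each file identifier so that partition membership never depends on file ordering or a stored random seed:

```python
import hashlib

def is_holdout(file_id, holdout_fraction=0.2):
    """Deterministically assign a file to the untouched holdout partition.

    Hashing the identifier keeps the split stable across runs and machines."""
    digest = hashlib.sha256(file_id.encode()).hexdigest()
    return int(digest, 16) % 100 < holdout_fraction * 100

files = [f"scan_{i:04d}.dcm" for i in range(1000)]
holdout = [f for f in files if is_holdout(f)]
train = [f for f in files if not is_holdout(f)]
print(len(holdout) + len(train))  # 1000: every file lands in exactly one split
```

Because the assignment is a pure function of the identifier, you can re-derive and verify the partition at any point during the approval process, with no risk of holdout files silently migrating into the training set.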
DICOM annotation

Ensure the datasets and labels being used for model development include a wide statistical range of image quality when establishing the ground truth for medical datasets. Once enough images or videos have been labeled (whether you're using a self-supervised, semi-supervised, automated, or human-in-the-loop approach), it's time for a medical expert review, especially if you're working with a company that's going to seek FDA approval for a device or other medical application in which this model will be used.

{{gray_callout_start}}
💡 For more information on annotating and labeling datasets, check out our articles:

- What is Data Labeling: The Full Guide
- 5 Strategies To Build Successful Data Labeling Operations
- The Full Guide to Automated Data Annotation
- 7 Ways to Improve Your Medical Imaging Datasets for Your ML Model
{{gray_callout_end}}

### 3. Medical expert review of labels in medical image/video-based datasets

Now that the first batch of images or videos has been labeled, you need to loop medical experts back into the process. Keep in mind that medical professionals and the FDA take different approaches to determining consensus. Having a variety of consensus methods built into the platform is especially useful for regulatory approval, because different localities will want companies to use different methods to determine consensus. Make sure this is built into the process, and ensure the medical experts you're working with have approved the labels annotators have applied before releasing the next batch of data for annotation.

### 4. A clear and robust FDA-level audit trail

Regulatory processes for releasing a model into a clinical setting expect data about intra-rater reliability as well as inter-rater reliability, so it's important to have this test built into the process and budget from the start. Alongside this, a robust audit trail covering every label created and applied, the ontological structure, and a record of who accessed the data is crucial.
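Inter-rater reliability is commonly quantified with a chance-corrected agreement statistic such as Cohen's kappa. Here is a minimal pure-Python sketch for two raters assigning categorical labels; the radiologist labels below are made-up illustration data, not from any real study:

```python
# Cohen's kappa: chance-corrected agreement between two raters.
# kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
# p_e is the agreement expected by chance from each rater's label frequencies.
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement: product of marginal frequencies per class.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a if c in freq_b)
    return (p_o - p_e) / (1 - p_e)

# Illustrative labels from two radiologists for ten images.
a = ["lesion", "normal", "lesion", "normal", "normal",
     "lesion", "normal", "lesion", "normal", "normal"]
b = ["lesion", "normal", "lesion", "lesion", "normal",
     "lesion", "normal", "normal", "normal", "normal"]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

The same function applied to one rater's labels at two points in time gives an intra-rater reliability figure, which regulators also expect to see.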
When seeking FDA approval, you can't leave anything to chance. That's why medical organizations and companies creating solutions for that sector are turning to Encord for the tools they need for healthcare imaging annotation, labeling, and active learning. As one customer explained when describing why they signed up with Encord: "We went through the process of trying out each platform, uploading a test case and labeling a particular pathology," says Dr. Ryan Mason, a neuroradiologist overseeing annotations at RapidAI.

MRI Mismatch analysis using RapidAI

### 5. Quality control and validation studies

Next comes the rigor of quality control and validation studies: in other words, making sure that the labels that have been applied meet the standards the project needs, especially with FDA approval in mind. Loop in medical experts as needed while being mindful of the project timeline, and use this data to train the model. Start accelerating the training cycles using iterative learning or human-in-the-loop strategies, whichever method is the most effective at achieving the required results.

### 6. Test your models on the data, and figure out what data you need more or less of to improve your models

Ensure an active data pipeline is established with robust quality assurance built in, and get the model production-ready once it can accurately analyze and detect the relevant objects in images in a real-world medical setting. At this stage, you can accelerate the training and testing cycles. Once the model is production-ready, it can be deployed in the medical device or other healthcare application it's being built for, and the organization you're working with can then submit it, along with their solution, for FDA approval.
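For segmentation labels, one standard quality-control check is the overlap between an annotator's mask and an expert reference mask, for example the Dice coefficient. The sketch below uses toy binary masks, and the 0.9 review threshold is an assumed project standard, not an FDA figure:

```python
# Dice coefficient between two binary segmentation masks:
# dice = 2 * |A ∩ B| / (|A| + |B|); 1.0 means perfect overlap.

def dice(mask_a: list[list[int]], mask_b: list[list[int]]) -> float:
    intersection = sum(
        a * b
        for row_a, row_b in zip(mask_a, mask_b)
        for a, b in zip(row_a, row_b)
    )
    total = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    return 2 * intersection / total if total else 1.0

# Toy 4x4 masks: annotator's label vs. an expert reference.
annotator = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
reference = [[0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]

score = dice(annotator, reference)
print(round(score, 3))  # → 0.857
# Flag labels below the assumed project QC threshold for expert re-review.
needs_review = score < 0.9
```

Running a check like this over every label batch gives the validation study a concrete, auditable pass/fail record instead of an ad-hoc visual review.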
## Bonus: Obtaining and maintaining FDA approval with open-source or in-house tools

Although there are numerous open-source tools on the market that support medical image datasets, including 3DSlicer, ITK-Snap, MITK Workbench, RIL-Contour, Sefexa, and several others, organizations seeking FDA approval should be cautious about using them. The same goes for in-house tools. There are three main arguments against using in-house or open-source software for annotation and labeling when going through the FDA approval process:

1. You can't effectively scale your annotation activity.
2. Weak data security makes FDA certification harder.
3. You can't effectively monitor your annotators or establish the kind of data audit trails that the FDA will need to see.

For more information, here's why open-source tools could negatively impact medical data annotation projects.

## FDA AI approval: Conclusion and key takeaways

Going through the FDA approval process, as several of our clients have (including Viz AI and RapidAI), is time-consuming and requires higher levels of data security, quality assurance, and traceability of how medical datasets move through the annotation and model training pipeline. When building and training a model, you need to take the following steps:

1. Create or source FDA-compliant medical imaging or video-based datasets
2. Annotate and label the data (high-quality data and labels are essential)
3. Medical expert review of labels in medical image/video-based datasets
4. A clear and robust FDA-level audit trail
5. Quality control and validation studies
6. Test your models on the data, and figure out what data you need more or less of to improve your models

Encord has developed our medical imaging dataset annotation software in close collaboration with medical professionals and healthcare data scientists, giving you a powerful automated image annotation suite, fully auditable data, and powerful labeling protocols.
## AI FDA Regulatory Approval FAQs

For more information, here are a couple of FAQs on FDA approval for AI models and software or devices that use artificial intelligence.

### What's the FDA's current thinking on approving AI?

For product owners, AI software developers, and anyone wondering whether they need FDA approval, it's worth referring to the following published guideline documents and reports:

- Policy for Device Software Functions and Mobile Medical Applications
- General Wellness: Policy for Low Risk Devices
- Changes to Existing Medical Software Policies Resulting from Section 3060 of the 21st Century Cures Act
- Medical Device Data Systems, Medical Image Storage Devices, and Medical Image Communications Devices
- Clinical Decision Support Software

### What's the FDA's role in regulating AI algorithms?

The FDA does play a role in regulating AI algorithms, but only if your algorithm requires regulatory approval. In the majority of cases, provided it falls under the category of a non-device CDS and is within the framework of the 21st Century Cures Act, FDA approval isn't needed. Make sure to check the FDA's Digital Health Policy Navigator, or contact them for clarification:

- Division of Industry and Consumer Education (DICE) at 1-800-638-2041 or DICE@fda.hhs.gov
- The Digital Health Center of Excellence at DigitalHealth@fda.hhs.gov

Ready to improve the performance of your computer vision models for medical imaging? Sign up for an Encord Free Trial: The Active Learning Platform for Computer Vision, used by the world's leading computer vision teams, including dozens of healthcare organizations and AI companies in the medical sector.

AI-assisted labeling, model training and diagnostics, and finding and fixing dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster. Try Encord for Free Today.

Want to stay updated?
Follow us on Twitter and LinkedIn for more content on computer vision, training data, and active learning. Join our Discord Channel to chat and connect.