
Reducing Model False Positive Rate from 6% to 1% with Vida ID

November 22, 2022 | 8 min read

Problems

Vida needed a platform to manage a 20-person labeling team and annotate tens of thousands of images while ensuring data privacy and access control for sensitive Personally Identifiable Information (PII).


Key Results

With Encord, Vida could manage its labeling team, annotate 70,000 images in its first month, and reduce its models' false acceptance rate from 6% to 1%. Encord's platform provided strong access control and let Vida keep customer data in its own Amazon S3 buckets. Because Encord incorporated Vida's feedback into its user interface, the team could iterate quickly and run large-scale image projects without compromising label quality or data privacy.

Vida, a full-service verified digital identity platform, serves customers throughout Southeast Asia. While facial recognition is a mature technology, most open-source facial recognition datasets aren’t reflective of the region’s populations. Models trained on these datasets perform poorly, so Vida needs to build and manage its own new datasets. Using Encord’s platform, Vida can oversee a large labeling team and annotate tens of thousands of images quickly.

Customer: Meet Vida

Vida uses optical character recognition and computer vision technology to provide a full-service verified digital identity platform. Digital verification empowers people to participate in the economy. For instance, financial institutions require identity verification to reduce fraud and ensure that assets arrive in the correct accounts, and ride-hailing services require drivers to verify their identity. 

Operating mainly in Indonesia, Vida’s services enable banks, fintechs, online trading platforms, ride-hailing companies, and other companies to verify the identity of users online. 

Indonesia is the biggest archipelago in the world, and traveling long distances can be challenging. With digital verification, customers no longer have to spend time waiting or traveling to get their identities verified. Vida users take a photo of themselves and a photo of their identification document. Vida’s technology then confirms the authenticity of the document, and compares the document to the photo and an authoritative source to verify the user's identity.

Throughout Indonesia, a digital signature must be accompanied by identity verification. Being able to sign documents and open bank accounts online is incredibly beneficial, especially for micro-entrepreneurs and SMEs in rural areas. Vida’s platform reduces barriers for these populations when accessing financial products like loans and savings accounts.

Problem: Large Datasets, Large Labeling Teams 

Vida trains its computer vision models to predict the liveness of an image. The models learn to determine whether the image contains a physically present person or whether the image contains a fake representation of a person, such as a pre-taken photo or a 2D mask. 

Although facial recognition is a mature technology, most of the open-source facial verification datasets contain faces from the Western Hemisphere or East Asia. When models train on these datasets, they don’t perform well on Southeast Asian demographics. Indonesia is also a majority Muslim country, so many women wear a hijab, an attribute rarely encountered by models that train on these open-source datasets. 

To improve model performance, Vida began collecting and annotating new datasets that reflect Southeast Asian populations. The company needed a platform that could help it label and manage the tens of thousands of newly collected images.

Vida’s team tried some open-source tools, but none of them supported managing a labeling team. Furthermore, facial verification data contains sensitive Personally Identifiable Information (PII), and Vida struggled to find a tool that offered strong access control and let it keep customer data on its own servers.

Solution: Flexible Platform, Iterative Process

With Encord’s platform, Vida could easily set up a system for managing its 20-person labeling team. Key managers oversee both the annotators and the reviewers. When a new annotator comes on board, Vida uses Encord’s tools to evaluate that person’s annotations and ensure that all labels are high quality.

Reviewing annotators and label quality in Encord

Vida’s work involves large projects of about 60,000 images each. At first, Encord’s interface displayed all 60,000 images at once, which slowed it down. After Vida shared this feedback, Encord’s team quickly changed the UI so that Vida could scale up the number of images in each project.

“I’ve been very impressed with how Encord iterates on the SDK, listens to feedback, and constantly improves the product,” says Jeffrey Siaw, VP of Data Science at Vida.

With Encord, Vida can keep its data in its own Amazon S3 buckets, alleviating privacy concerns about data access and storage. Rather than requiring that data be stored on Encord’s servers, the platform uses signed URLs to access and retrieve data from the customer’s preferred storage without storing it locally.
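To illustrate the signed-URL idea, here is a minimal sketch using only Python’s standard library. It assumes a simple hypothetical scheme in which the storage owner signs an object path and an expiry time with an HMAC-SHA256 key. This is not Amazon S3’s actual signing algorithm (S3 uses AWS Signature Version 4) and not Encord’s implementation; the key, host, and bucket names are placeholders. Whoever holds the URL can fetch that one object until the URL expires, without ever receiving the bucket credentials.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret held only by the storage owner (e.g. the bucket's account).
SECRET_KEY = b"replace-with-a-real-secret"


def _signature(object_path: str, expires: int) -> str:
    # Sign the object path together with its expiry so neither can be altered.
    message = f"{object_path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(host: str, object_path: str, expires_in: int, now: Optional[float] = None) -> str:
    """Return a URL that grants temporary read access to a single object."""
    issued = int(time.time() if now is None else now)
    expires = issued + expires_in
    query = urlencode({"expires": expires, "signature": _signature(object_path, expires)})
    return f"{host}/{object_path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Check, on the storage side, that a URL is unexpired and untampered."""
    parsed = urlparse(url)
    object_path = parsed.path.lstrip("/")
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed signing parameters
    current = time.time() if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(_signature(object_path, expires), signature)


url = sign_url("https://storage.example.com", "example-bucket/images/0001.jpg", expires_in=3600, now=1_000)
print(verify_url(url, now=2_000))  # True: still valid
print(verify_url(url, now=5_000))  # False: expired
print(verify_url(url.replace("0001.jpg", "0002.jpg"), now=2_000))  # False: path altered
```

Binding the expiry into the signature is the key design choice: a client cannot extend its own access window or reuse the signature for a different object.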

Results: Increased Labeling Speed, Decreased False Acceptance Rate 

In the first month of using Encord, Vida’s team labeled 70,000 images, far faster than it had expected.

When trained on the old datasets, Vida’s previous models had a false acceptance rate of six percent. A false acceptance occurs when a model predicts that an image shows a physically present person when it does not.
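The false acceptance rate can be made concrete with a short calculation. The sketch below assumes the denominator is the number of spoof (non-live) images, the usual convention for this kind of metric; the case study does not say exactly how Vida computes it, and the function name and example counts are hypothetical, not Vida’s data.

```python
from typing import Sequence


def false_acceptance_rate(is_live: Sequence[bool], predicted_live: Sequence[bool]) -> float:
    """Share of spoof images (ground truth: not live) that the model accepted as live."""
    if len(is_live) != len(predicted_live):
        raise ValueError("labels and predictions must have the same length")
    # Keep only the predictions made on spoof images; those are the only
    # images on which a false acceptance can occur.
    spoof_predictions = [pred for truth, pred in zip(is_live, predicted_live) if not truth]
    if not spoof_predictions:
        raise ValueError("at least one spoof image is needed to compute the rate")
    # True counts as 1, so the sum is the number of spoofs accepted as live.
    return sum(spoof_predictions) / len(spoof_predictions)


# Illustrative numbers only: 100 spoof images, 6 of them wrongly accepted as live,
# plus 50 genuine live images that do not affect the rate.
labels = [False] * 100 + [True] * 50
predictions = [True] * 6 + [False] * 94 + [True] * 50
print(false_acceptance_rate(labels, predictions))  # 0.06
```

Under this convention, a drop from 6% to 1% means that out of every 100 spoof attempts, roughly 1 instead of 6 would be accepted as a live person.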

False acceptances can have serious implications for Vida’s customers. For banks, a false verification could result in a fraudulent account being opened. For ride-hailing companies, it could let a driver with a criminal record onboard under a false identity, increasing the risk of robbery.

With Encord, Vida improved the quality of their datasets, and the new models had a false acceptance rate of only one percent.

“Using Encord’s platform, we were able to train our new models on much better datasets with much higher quality labels, reducing our false acceptance rate to only one percent,” says Jeffrey Siaw, VP of Data Science.

As Vida continues to grow, the data science team will take a more granular approach to data management and labeling. They plan to experiment with different ways of labeling faces and to label attributes such as religious headdresses, so they can better track how their models perform across specific demographic features.
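Labeling an attribute such as a religious headdress makes it possible to break an evaluation metric down by group. A minimal sketch, assuming each evaluation image carries one attribute value plus ground-truth and predicted liveness labels, and using the same spoof-image denominator as above; the attribute values and records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def far_by_attribute(records: Iterable[Tuple[str, bool, bool]]) -> Dict[str, float]:
    """False acceptance rate per attribute value.

    Each record is (attribute_value, is_live, predicted_live).
    """
    spoofs: Dict[str, int] = defaultdict(int)
    accepted: Dict[str, int] = defaultdict(int)
    for group, is_live, predicted_live in records:
        if is_live:
            continue  # live images cannot produce a false acceptance
        spoofs[group] += 1
        if predicted_live:
            accepted[group] += 1
    # Only groups with at least one spoof image appear in the result.
    return {group: accepted[group] / spoofs[group] for group in spoofs}


# Hypothetical labeled evaluation records, not Vida's data.
records = [
    ("headdress", False, True),
    ("headdress", False, False),
    ("no_headdress", False, False),
    ("no_headdress", False, False),
    ("no_headdress", True, True),
]
print(far_by_attribute(records))  # {'headdress': 0.5, 'no_headdress': 0.0}
```

A noticeably higher rate for one group would point to where more representative training data or labels are needed.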

Using Encord, they can label these datasets at speed while managing multiple projects with different types of labels and ontologies, all in one platform.

Ready to automate and improve the quality of your data labeling? 

Sign up for an Encord Free Trial: The Active Learning Platform for Computer Vision, used by the world’s leading computer vision teams.

AI-assisted labeling, model training and diagnostics, and tools to find and fix dataset errors and biases, all in one collaborative active learning platform, to get to production AI faster.


Frequently asked questions
  • Yes. In addition to being able to train models & run inference using our platform, you can either import model predictions via our APIs & Python SDK, integrate your model in the Encord annotation interface if it is deployed via API, or upload your own model weights.

  • At Encord, we take our security commitments very seriously. When working with us and using our services, you can be sure that your data and your customers' data are safe and secure. You always own your labels, data, and models, and Encord never shares any of your data with any third party. Encord is hosted securely on the Google Cloud Platform (GCP). Encord integrates natively with private cloud buckets, ensuring that data never has to leave your own storage facility.

    Any data passing through the Encord platform is encrypted both in-transit using TLS and at rest.

    Encord is HIPAA and GDPR compliant and maintains SOC 2 Type II certification. Learn more about data security at Encord here.

  • Yes. If you believe you’ve discovered a bug in Encord’s security, please get in touch at security@encord.com. Our security team promptly investigates all reported issues. Learn more about data security at Encord here.

  • Yes. We offer managed, on-demand, premium labeling-as-a-service designed to meet your specific business objectives, along with expert support to help you meet your goals. Our active learning platform and suite of tools are designed to automate the annotation process and maximise the ROI of each human input. The purpose of our software is to help you label less data.

  • The best way to spend less on labeling is using purpose-built annotation software, automation features, and active learning techniques. Encord's platform provides several automation techniques, including model-assisted labeling & auto-segmentation. High-complexity use cases have seen 60-80% reduction in labeling costs.

  • Encord offers three different support plans: standard, premium, and enterprise support. Note that custom service agreements and uptime SLAs require an enterprise support plan. Learn more about our support plans here.
