Class Imbalance

Encord Computer Vision Glossary

Class imbalance occurs when the number of observations in one class—the majority class—is much higher than the number of observations in another class—the minority class. This situation arises in many machine learning applications, where it can significantly affect the effectiveness of predictive models.

One of the key problems with class imbalance is the possibility of a bias in the model in favor of the majority class. Because the model sees far more examples of the majority class during training, it becomes more familiar with that class and is less likely to correctly predict the minority class, so it may perform poorly on exactly the examples that are often of greatest interest.


How do you fix the class imbalance in computer vision datasets?

Imbalanced training datasets in computer vision can be addressed using a variety of strategies. One strategy is to oversample the minority class, either by drawing samples from it with replacement or by generating synthetic minority-class examples. Another strategy is to under-sample the majority class by selecting a random subset of its examples.
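As a minimal sketch of both resampling strategies, the snippet below balances a hypothetical label array (90 majority examples, 10 minority examples) using plain NumPy; the array names and the 90/10 split are illustrative assumptions, not part of any particular dataset:

```python
import numpy as np

# Hypothetical imbalanced dataset: 90 majority (class 0), 10 minority (class 1)
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

rng = np.random.default_rng(seed=0)
minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Oversampling: draw minority examples with replacement until the classes match
extra = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx), replace=True)
X_over = np.vstack([X, X[extra]])
y_over = np.concatenate([y, y[extra]])

# Under-sampling: keep a random majority subset the same size as the minority class
keep = rng.choice(majority_idx, size=len(minority_idx), replace=False)
X_under = np.vstack([X[keep], X[minority_idx]])
y_under = np.concatenate([y[keep], y[minority_idx]])
```

Oversampling preserves all the data but risks overfitting to repeated minority examples, while under-sampling discards majority-class information; which trade-off is acceptable depends on the dataset size.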

Another approach is to use weighted loss functions, where the loss for each example is scaled by a weight that is inversely proportional to the class frequency. This can help the model to learn more from the minority class, as it will be penalized more heavily for misclassifying examples from the minority class.
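A minimal sketch of such a weighted loss, assuming a two-class problem and inverse-frequency weighting (the function names and the 90/10 class split are illustrative):

```python
import numpy as np

def class_weights(y, n_classes):
    """Inverse-frequency weights: rarer classes receive larger weights."""
    counts = np.bincount(y, minlength=n_classes)
    return len(y) / (n_classes * counts)

def weighted_nll(probs, y, weights):
    """Negative log-likelihood, with each example scaled by its class weight."""
    per_example = -np.log(probs[np.arange(len(y)), y])
    return np.mean(weights[y] * per_example)

y = np.array([0] * 90 + [1] * 10)
w = class_weights(y, 2)  # majority weight ~0.56, minority weight 5.0
```

Because the minority class's weight is roughly nine times larger here, each misclassified minority example contributes correspondingly more to the loss, pushing the model to pay attention to the rare class. Deep learning frameworks expose the same idea directly, e.g. the `weight` argument of PyTorch's `CrossEntropyLoss`.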

A third option is to use algorithms that are specifically designed to handle class imbalance, such as cost-sensitive learning algorithms. These assign different costs to misclassifying examples from different classes, based on the relative importance of each class.
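One simple way to apply a cost matrix at prediction time is to pick the class with the lowest expected cost rather than the highest probability. The sketch below assumes a hypothetical 2x2 cost matrix in which missing the minority class is ten times more costly:

```python
import numpy as np

# Hypothetical cost matrix: rows = true class, columns = predicted class.
# Misclassifying the minority class (row 1) costs 10x more than the reverse.
cost = np.array([[0.0, 1.0],
                 [10.0, 0.0]])

def cost_sensitive_predict(probs, cost):
    """Choose, per example, the class with the lowest expected misclassification cost."""
    expected_cost = probs @ cost  # shape: (n_examples, n_classes)
    return np.argmin(expected_cost, axis=1)

probs = np.array([[0.8, 0.2]])  # the model slightly favors the majority class
pred = cost_sensitive_predict(probs, cost)
# Expected cost of predicting 0 is 0.2 * 10 = 2.0; of predicting 1 is 0.8 * 1 = 0.8,
# so the minority class is chosen even though it has the lower probability.
```

The same weighting can instead be applied during training, e.g. via the `class_weight` parameter available on many scikit-learn estimators.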

In conclusion, class imbalance can be a significant problem in machine learning, since it may cause the model to be biased in favor of the majority class. Several strategies, including oversampling, under-sampling, weighted loss functions, and cost-sensitive learning algorithms, can be used to overcome this problem.
