Pool-based Sampling

Encord Computer Vision Glossary

Pool-based sampling is a popular active learning method for choosing which examples to label. The model scores every example in a large pool of unlabeled data and selects the most informative ones for human annotation. The newly labeled examples are added to the training set, the model is retrained, and the process repeats, typically until the labeling budget is spent or performance stops improving. Common strategies for deciding which examples are most informative include uncertainty sampling, query-by-committee, and density-weighted sampling.
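The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any particular tool: `NearestCentroid` is a hypothetical toy model standing in for a real classifier, `oracle` simulates the human annotator, and least-confidence uncertainty sampling is used as the query strategy. Only `budget` examples from the whole pool are ever sent for labeling, which is where the cost saving comes from.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


class NearestCentroid:
    """Hypothetical toy model: class probabilities from distances to per-class centroids."""

    def fit(self, xs: Sequence[Point], ys: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        for (a, b), y in zip(xs, ys):
            s = sums.setdefault(y, [0.0, 0.0, 0.0])  # [sum_x, sum_y, count]
            s[0] += a
            s[1] += b
            s[2] += 1
        self.centroids = {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}
        return self

    def predict_proba(self, x: Point) -> Dict[int, float]:
        # Softmax over negative Euclidean distances to each class centroid.
        logits = {y: -math.dist(x, c) for y, c in self.centroids.items()}
        top = max(logits.values())
        exps = {y: math.exp(v - top) for y, v in logits.items()}
        total = sum(exps.values())
        return {y: e / total for y, e in exps.items()}


def pool_based_active_learning(
    seed_x: Sequence[Point],
    seed_y: Sequence[int],
    pool: Sequence[Point],
    oracle: Callable[[Point], int],
    budget: int,
    batch_size: int = 1,
) -> Tuple[NearestCentroid, int]:
    labeled_x, labeled_y = list(seed_x), list(seed_y)
    unlabeled = list(pool)
    queries = 0
    model = NearestCentroid().fit(labeled_x, labeled_y)
    while queries < budget and unlabeled:
        # 1. Score every unlabeled example (least-confidence uncertainty sampling).
        scores = [1.0 - max(model.predict_proba(x).values()) for x in unlabeled]
        # 2. Select the most uncertain examples without exceeding the budget.
        k = min(batch_size, budget - queries, len(unlabeled))
        chosen = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)[:k]
        # 3. Send them to the annotator; pop in descending index order so indices stay valid.
        for i in sorted(chosen, reverse=True):
            x = unlabeled.pop(i)
            labeled_x.append(x)
            labeled_y.append(oracle(x))
            queries += 1
        # 4. Retrain on the enlarged labeled set, then repeat.
        model = NearestCentroid().fit(labeled_x, labeled_y)
    return model, queries


random.seed(0)
pool = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(500)]
oracle = lambda p: int(p[0] + p[1] > 0)  # simulated annotator with a known labeling rule
model, used = pool_based_active_learning(
    [(-2.0, -2.0), (2.0, 2.0)], [0, 1], pool, oracle, budget=20, batch_size=5
)
print(used)  # 20
```

Here only 20 of the 500 pool examples are labeled. The `batch_size` parameter trades retraining cost against selection quality: larger batches mean fewer retraining rounds, but one round may pick several near-duplicate examples.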

Advantages

  • Reduced labeling cost: Traditional supervised learning labels the entire dataset, whereas pool-based sampling sends only the most informative samples for annotation. This can lead to significant cost savings, especially for large datasets.
  • Efficient use of expert time: Because experts label only the most informative samples, their time is spent where it has the most impact on the model.
  • Improves model accuracy: Because the selected samples are chosen to be informative (and, with density-weighted strategies, representative of the data), pool-based sampling can reach higher accuracy for a given labeling budget than labeling randomly chosen samples.

Disadvantages

  • Selection of the pool of unlabeled data: The model can only choose from what is in the pool, so the quality and coverage of the pool directly affect the model's performance. Building a suitable pool can be challenging, especially for large and complex datasets.
  • Quality of the selection method: The strategy used to choose the most informative samples affects the model’s accuracy. If the strategy is poorly designed or ill-suited to the data, the model’s accuracy may suffer.
  • Not suitable for all data types: Pool-based sampling may not be suitable for every type of data, such as some unstructured or noisy datasets. With noisy data, for example, the samples the model is least certain about may simply be ambiguous or mislabeled. In these cases, other active learning approaches may be more appropriate.
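Because so much depends on the selection method, it helps to see how the three strategies named earlier differ. The sketch below uses common textbook formulations, which are assumptions here rather than the formulas any particular tool uses: least-confidence scoring for uncertainty sampling, vote entropy for query-by-committee, and density-weighted sampling computed as uncertainty multiplied by the mean cosine similarity to the pool, raised to a power `beta`. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty sampling: 1 - highest class probability (higher = less confident)."""
    return 1.0 - max(probs)


def vote_entropy(votes: Sequence[int], n_classes: int) -> float:
    """Query-by-committee: entropy of the committee members' hard votes for one example."""
    counts = [0] * n_classes
    for v in votes:
        counts[v] += 1
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def density_weighted(uncertainty: float, x: Sequence[float],
                     pool: Sequence[Sequence[float]], beta: float = 1.0) -> float:
    """Density-weighted sampling: uncertainty * (mean similarity to the pool) ** beta.

    Isolated outliers have low mean similarity, so they are down-weighted even if uncertain.
    Negative mean similarity is clamped to 0 so fractional beta stays real-valued.
    """
    mean_sim = sum(cosine(x, u) for u in pool) / len(pool)
    return uncertainty * max(mean_sim, 0.0) ** beta


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring pool examples, i.e. the ones sent for labeling."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Three unlabeled images with predicted probabilities for two classes.
probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]]
print(top_k([least_confidence(p) for p in probs], k=1))  # [1]

# A committee of four models splits 2-2 on one example and agrees on another.
print(vote_entropy([0, 1, 0, 1], n_classes=2) > vote_entropy([1, 1, 1, 1], n_classes=2))  # True
```

Density weighting is one way to make the selection method more robust: an uncertain but isolated outlier scores lower than an equally uncertain example from a dense region of the pool.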