Population Stability Index (PSI)
Encord Computer Vision Glossary
The Population Stability Index (PSI) quantifies the shift in the distribution of a variable between two datasets, typically a baseline (expected) sample and a comparison (actual) sample. It indicates whether the underlying characteristics of the data have changed over time or across different segments. This is crucial in predictive modeling, particularly where models need to maintain consistent performance across different populations or time frames.
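The formula for calculating the population stability index is as follows:

$$\mathrm{PSI} = \sum_{i=1}^{B} \left(A_i - E_i\right) \ln\frac{A_i}{E_i}$$

Here the variable is divided into $B$ bins, $E_i$ is the proportion of the baseline (expected) dataset that falls in bin $i$, and $A_i$ is the proportion of the comparison (actual) dataset that falls in the same bin.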
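In code, the usual binned calculation can be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference implementation: the function names, the `edges` argument (interior cut points chosen by the caller), and the `eps` floor for empty bins are all assumptions made for this example.

```python
import math
from bisect import bisect_right


def psi_from_proportions(expected, actual, eps=1e-6):
    """PSI from two lists of per-bin proportions given in the same bin order."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        # Assumption: floor empty bins at eps so the logarithm stays defined.
        e = max(e, eps)
        a = max(a, eps)
        # One term of the sum: (actual - expected) * ln(actual / expected).
        total += (a - e) * math.log(a / e)
    return total


def bin_proportions(values, edges):
    """Share of values in each bin; edges are sorted interior cut points."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right sends a value equal to an edge into the bin on its right.
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(baseline, comparison, edges, eps=1e-6):
    """PSI between two raw samples, binned with the same edges."""
    return psi_from_proportions(
        bin_proportions(baseline, edges),
        bin_proportions(comparison, edges),
        eps,
    )
```

For example, with baseline proportions `[0.2, 0.5, 0.3]` and comparison proportions `[0.3, 0.4, 0.3]`, `psi_from_proportions` sums 0.1 × ln(1.5), (−0.1) × ln(0.8), and 0, which comes to about 0.063.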
Interpretation
The PSI condenses the change in a variable's distribution between two datasets into a single non-negative number. It is zero when the two binned distributions are identical and grows as they diverge. A lower PSI indicates that the distribution remains stable, while a higher PSI suggests that the distribution has changed significantly.
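What counts as "low" or "high" is a matter of convention. A rule of thumb often quoted in credit scoring practice, and not taken from this glossary entry, treats values below 0.1 as stable, values from 0.1 up to 0.25 as a moderate shift, and values of 0.25 or more as a significant shift. The sketch below encodes those cut-offs as adjustable defaults; the function name `describe_psi` and its labels are hypothetical.

```python
def describe_psi(value, moderate=0.10, significant=0.25):
    """Map a PSI value to a coarse label using conventional cut-offs.

    The defaults are a common rule of thumb, not a universal standard;
    tune them to the variable and application at hand.
    """
    if value < 0:
        raise ValueError("PSI is non-negative by construction")
    if value < moderate:
        return "stable"
    if value < significant:
        return "moderate shift"
    return "significant shift"


for example in (0.063, 0.15, 0.40):
    print(example, "->", describe_psi(example))
```

Treat the labels as a prompt for investigation, not a verdict: a "moderate shift" in an influential variable may matter more than a "significant shift" in a marginal one.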
Use Cases
PSI has applications in various fields:
- Credit Risk Assessment: When evaluating credit risk models, PSI can assess if the distribution of variables (like income, age, etc.) has changed between the training and validation datasets, indicating potential model degradation.
- Marketing and Customer Analysis: In customer segmentation, PSI can identify shifts in customer demographics over time, which could affect targeted marketing strategies.
- Operational Monitoring: In operational analytics, PSI can track changes in key performance indicators (KPIs) across different periods to ensure consistent performance.
Limitations
While PSI is valuable, it has certain limitations:
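- Dependence on Bin Size: PSI is sensitive to the number of bins used to divide the data and to where the bin edges fall. More, narrower bins capture minor variations and sampling noise, leading to higher PSI values, and a bin that is empty in either dataset makes the logarithm undefined unless it is handled explicitly.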
- Variable Selection: PSI calculations should be restricted to variables that are relevant to the model or analysis. Seasonality or temporary shifts can raise the PSI without reflecting a lasting change, so they need to be taken into account when interpreting the result.
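The dependence on bin count can be seen with a small experiment. The sketch below draws two samples from the same distribution, so there is no real shift, and computes the PSI with 5, 10, and 50 bins. Everything here is an assumption for illustration: the sample size, the seed, the quantile-based edges taken from the baseline, and the `eps` floor for empty bins.

```python
import math
import random
from bisect import bisect_right


def quantile_edges(baseline, bins):
    """Interior cut points that put roughly equal baseline mass in each bin."""
    ordered = sorted(baseline)
    n = len(ordered)
    return [ordered[k * n // bins] for k in range(1, bins)]


def psi(baseline, comparison, edges, eps=1e-6):
    """Binned PSI; empty bins are floored at eps (an assumption of this sketch)."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(baseline), shares(comparison)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


rng = random.Random(0)  # fixed seed so the run is repeatable
# Both samples come from the same distribution, so there is no real shift.
baseline = [rng.gauss(0.0, 1.0) for _ in range(500)]
comparison = [rng.gauss(0.0, 1.0) for _ in range(500)]

psi_by_bins = {
    bins: psi(baseline, comparison, quantile_edges(baseline, bins))
    for bins in (5, 10, 50)
}
for bins, value in psi_by_bins.items():
    print(f"{bins:>3} bins: PSI = {value:.4f}")
```

Because the 10-bin and 50-bin edges in this setup subdivide the 5-bin edges, the PSI cannot go down as bins are added; what grows is the amount of sampling noise being counted as shift. Comparing PSI values is therefore only meaningful when the binning scheme is held fixed.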