Data Drift
Encord Computer Vision Glossary
Data drift is a phenomenon in machine learning in which the statistical properties of the data a model receives in production change over time, so that they no longer match the data the model was trained on. Because the model learned its patterns from the original data, its predictions on the new data become less accurate and its performance declines.

There are several types of data drift, including:
Target drift: This occurs when the output or target variable of the model changes over time. For example, if a machine learning model is trained to predict customer churn and the definition of churn later changes, the labels the model learned from no longer match the outcome it is asked to predict, and it may become less accurate.
Covariate shift: This occurs when the distribution of the input or predictor variables changes over time. For example, if a model is trained on data from a certain geographic region, but the data used to make predictions comes from a different region, the model may become less accurate.
Prior probability shift: This occurs when the overall probability of certain outcomes changes over time. For example, if a model is trained on data from a period when a certain outcome was common, and that outcome later becomes rare, the model may become less accurate.
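The covariate shift and prior probability shift described above can be checked numerically by comparing a reference sample (training data) with a recent sample (production data). The following is a minimal sketch using only the Python standard library, not a definitive implementation. The function names `ks_statistic` and `label_shift`, the synthetic Gaussian feature values, and the choice of metrics (the two-sample Kolmogorov-Smirnov statistic for a numeric input feature, total variation distance for class frequencies) are all assumptions made for illustration.

```python
import random
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples (0 = identical, 1 = fully separated)."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def label_shift(reference_labels, current_labels):
    """Total variation distance between two label distributions (0 to 1)."""
    ref, cur = Counter(reference_labels), Counter(current_labels)
    classes = set(ref) | set(cur)
    return 0.5 * sum(
        abs(ref[c] / len(reference_labels) - cur[c] / len(current_labels))
        for c in classes
    )


# Hypothetical data: one input feature observed in the training region,
# in the same region later, and in a different region (mean shifted by 1).
rng = random.Random(0)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same_region = [rng.gauss(0.0, 1.0) for _ in range(1000)]
new_region = [rng.gauss(1.0, 1.0) for _ in range(1000)]

print("same region:", ks_statistic(train_feature, same_region))
print("new region: ", ks_statistic(train_feature, new_region))

# Hypothetical labels: the share of the "churn" class rises from 50% to 75%.
print("label shift:", label_shift(["churn", "churn", "stay", "stay"],
                                  ["churn", "churn", "churn", "stay"]))
```

A larger statistic for the new region than for the same region suggests that the input distribution has moved. What counts as "too large" depends on sample size and on the application, so any alerting threshold would need to be chosen per feature.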
Data drift can have major repercussions in machine learning because it can lead to inaccurate predictions and poor decisions. To reduce its effects, it is important to update and retrain models regularly on fresh data.
It is also critical to monitor model performance closely and to identify any changes in the data that might be contributing to drift. Data preprocessing methods such as standardization and normalization can make a model less sensitive to changes in the scale of its inputs, although they do not correct a change in the underlying distribution.
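One way to turn this monitoring advice into an automated check is to compute a drift score for each feature on a schedule and flag the model for retraining when the score exceeds a threshold. The sketch below uses the Population Stability Index (PSI), a swapped-in metric that this glossary entry does not itself name. The function names, the equal-width binning, and the 0.2 threshold (a commonly quoted rule of thumb, not a universal standard) are assumptions for illustration.

```python
import math
import random


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, using equal-width bins that span
    the reference sample. Values outside that range go to the edge bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


def needs_retraining(reference, current, threshold=0.2):
    """Flag drift when PSI exceeds the (assumed) rule-of-thumb threshold."""
    return population_stability_index(reference, current) > threshold


# Hypothetical monitoring run on one feature.
rng = random.Random(42)
training_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
stable_batch = [rng.gauss(0.0, 1.0) for _ in range(2000)]
drifted_batch = [rng.gauss(1.0, 1.0) for _ in range(2000)]

print("stable batch needs retraining: ", needs_retraining(training_sample, stable_batch))
print("drifted batch needs retraining:", needs_retraining(training_sample, drifted_batch))
```

A check like this only signals that the inputs have changed. Whether retraining actually restores accuracy still has to be confirmed by evaluating the model on freshly labelled data.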