Drift monitoring is the practice of tracking ML models in production to detect drift. As part of the MLOps process, it helps keep models performant and relevant.
As the world changes, the input data for ML models changes as well. As a result, models that were previously accurate can degrade and produce unreliable predictions. This is known as “drift”.
Model drift occurs when production data diverges from the baseline data set (for example, the training set), causing the model to produce inaccurate results. In other words, the data the model sees in production has drifted away from the data it learned from.
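One common way to quantify how far production data has moved from a baseline is the Population Stability Index (PSI), which compares the share of values that fall into each bin. The sketch below is a minimal, stdlib-only illustration that assumes a single numeric feature and equal-width bins taken from the baseline's range. PSI is our choice of metric here, not one the text prescribes, and the function name `population_stability_index` is hypothetical.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    production: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample and a production sample of one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    # Equal-width bins over the baseline range; fall back to width 1.0 for a constant baseline.
    width = (hi - lo) / bins or 1.0

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clipped into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    expected = bin_shares(baseline)
    actual = bin_shares(production)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


if __name__ == "__main__":
    baseline = [x / 100 for x in range(1000)]  # training-time values
    shifted = [x + 3 for x in baseline]        # production values shifted upward
    print(population_stability_index(baseline, baseline))  # 0.0
    print(population_stability_index(baseline, shifted))   # well above 0.25
```

A frequently quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as significant drift. These cutoffs are conventions rather than guarantees, so they are usually tuned per feature.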
In other cases, drift is caused by data integrity issues, for example when a data pipeline malfunctions and produces erroneous data.
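Integrity problems like these are often caught with simple rule-based checks before any statistical drift test runs. The following sketch assumes a single numeric column in which `None` or NaN means "missing" and a known valid range applies. The names `check_integrity` and `IntegrityReport`, and the 1% tolerances, are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IntegrityReport:
    null_rate: float
    out_of_range_rate: float

    def is_healthy(self, max_null_rate: float = 0.01, max_out_of_range_rate: float = 0.01) -> bool:
        # Hypothetical tolerances: at most 1% missing and 1% out-of-range values.
        return self.null_rate <= max_null_rate and self.out_of_range_rate <= max_out_of_range_rate


def _is_missing(v: Optional[float]) -> bool:
    # Treat both None and NaN as missing values.
    return v is None or (isinstance(v, float) and math.isnan(v))


def check_integrity(values: Sequence[Optional[float]], valid_min: float, valid_max: float) -> IntegrityReport:
    """Summarize missing and out-of-range values in one feature column."""
    if not values:
        return IntegrityReport(0.0, 0.0)
    missing = sum(1 for v in values if _is_missing(v))
    out_of_range = sum(
        1 for v in values if not _is_missing(v) and not valid_min <= v <= valid_max
    )
    return IntegrityReport(missing / len(values), out_of_range / len(values))


if __name__ == "__main__":
    # A malfunctioning pipeline emits a null and a -1 sentinel for "age".
    ages = [34.0, 51.0, None, -1.0, 29.0]
    report = check_integrity(ages, valid_min=0.0, valid_max=120.0)
    print(report)               # IntegrityReport(null_rate=0.2, out_of_range_rate=0.2)
    print(report.is_healthy())  # False
```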
There are a few different types of drift. The main ones are:
Drift monitoring is the process of continuously tracking an ML model’s performance in production. This ensures that new real-time data, or data integrity issues, have not degraded model quality. Drift monitoring includes ongoing analysis of the data, with techniques such as sequential analysis, comparing data distributions across different time windows, adding timestamps to a decision-tree-based classifier, and more.
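To make "comparing distributions across time windows" concrete, here is a minimal sketch that uses the two-sample Kolmogorov-Smirnov (KS) statistic, one common choice among several. It assumes fixed-size windows over a single numeric feature, treats the first window as the reference, and uses the standard large-sample critical value at roughly a 0.05 significance level. The function names are hypothetical.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at an observed value.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def window_drifted(reference: Sequence[float], current: Sequence[float], c_alpha: float = 1.358) -> bool:
    """Flag drift when D exceeds the large-sample critical value (c_alpha=1.358 is about alpha=0.05)."""
    n, m = len(reference), len(current)
    return ks_statistic(reference, current) > c_alpha * math.sqrt((n + m) / (n * m))


def scan_windows(values: Sequence[float], window: int) -> list[bool]:
    """Compare each later fixed-size window with the first window, used as the reference."""
    reference = values[:window]
    return [
        window_drifted(reference, values[start:start + window])
        for start in range(window, len(values) - window + 1, window)
    ]


if __name__ == "__main__":
    stable = [i % 10 for i in range(200)]       # two windows with the same distribution
    shifted = [i % 10 + 5 for i in range(100)]  # one window whose values moved up by 5
    print(scan_windows(stable + shifted, window=100))  # [False, True]
```

Using the first window as a fixed reference mirrors comparing against a training baseline. Comparing each window with the one before it instead would catch abrupt changes but can miss slow, gradual drift.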
When drift is detected, a drift monitoring system triggers alerts and updates the affected models, for example by retraining them. This process takes place as part of the MLOps pipeline.
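The alert-then-update flow can be sketched as a small policy function. This is a hypothetical illustration: `alert` and `retrain` are placeholder callbacks standing in for whatever notification and retraining hooks a given MLOps pipeline provides. The thresholds assume a per-feature drift score such as a Population Stability Index value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DriftPolicy:
    # Hypothetical thresholds on a per-feature drift score (for example, PSI).
    alert_threshold: float = 0.1
    retrain_threshold: float = 0.25


def handle_drift_scores(
    scores: dict[str, float],
    policy: DriftPolicy,
    alert: Callable[[str, float], None],
    retrain: Callable[[list[str]], None],
) -> None:
    """Alert on every drifted feature; request retraining once any feature drifts severely."""
    for feature, score in scores.items():
        if score >= policy.alert_threshold:
            alert(feature, score)
    severe = [f for f, s in scores.items() if s >= policy.retrain_threshold]
    if severe:
        retrain(severe)


if __name__ == "__main__":
    alerts: list[str] = []
    retrain_requests: list[list[str]] = []
    handle_drift_scores(
        {"age": 0.05, "income": 0.15, "tenure": 0.4},
        DriftPolicy(),
        alert=lambda feature, score: alerts.append(feature),
        retrain=retrain_requests.append,
    )
    print(alerts)            # ['income', 'tenure']
    print(retrain_requests)  # [['tenure']]
```

Separating the alert threshold from the retrain threshold keeps mild drift visible to humans without triggering a costly retraining run for every small fluctuation.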
Drift monitoring helps ensure that models continue to perform well and provide accurate predictions. By alerting on drift and retraining models when needed, data scientists and engineers can keep models accurate, fair and unbiased. This is fundamental to keeping ML relevant and to delivering business value.
It is recommended to continuously monitor models to detect drift and ensure model stability. Monitoring can be either manual or automated. Automated drift monitoring is typically more consistent than manual review and saves data scientists time. If your use cases include streaming data, the monitoring system will also need to support automated real-time detection.
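For streaming use cases, sequential analysis lets a monitor make a decision after every event rather than per batch. Below is a minimal sketch of the Page-Hinkley test, one classic sequential change detector, chosen here for illustration rather than taken from any particular monitoring product. It assumes a numeric stream (for example, a feature value or a per-prediction error) and watches only for an upward shift in its mean. The `delta` and `threshold` defaults are hypothetical and need tuning.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    delta tolerates small fluctuations; threshold (often called lambda) sets how much
    cumulative evidence is needed before raising an alarm.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0        # running mean of the values seen so far
        self.cumulative = 0.0  # m_t: cumulative deviation from the running mean
        self.minimum = 0.0     # M_t: smallest m_t observed so far

    def update(self, x: float) -> bool:
        """Consume one value; return True if drift is detected (the detector then resets)."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        if self.cumulative - self.minimum > self.threshold:
            self.reset()
            return True
        return False


if __name__ == "__main__":
    # Simulated stream: a stable signal at 0.0, then a jump to 1.0 at index 100.
    stream = [0.0] * 100 + [1.0] * 100
    detector = PageHinkley(delta=0.005, threshold=5.0)
    alarms = [i for i, x in enumerate(stream) if detector.update(x)]
    print(alarms)  # [105]
```

In a real-time deployment, `update` would be called once per incoming event, so detection latency depends on `threshold`: a lower value reacts faster but raises more false alarms. Detecting downward shifts requires a mirrored version of the test.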
Drift monitoring takes place through a drift-aware system. Such a system monitors data and determines how to manage new data and models. A drift-aware system consists of four parts:
MLRun, an open source framework, supports the deployment and orchestration of production-ready AI applications. MLRun monitors models in production and identifies and mitigates drift on the fly. It detects model drift based on feature drift through its integrated feature store, and automatically triggers retraining. To see it in action, check out the MLRun Quickstart.