In a nutshell, model monitoring allows a data scientist or DevOps engineer to keep track of a machine learning model after it has gone into production. This includes monitoring resources, latency, invocations, and data/concept drift. With this set of tools, several use cases become possible:
- Keep track of how often and how quickly your model is being used. If a certain time of day is much busier than others, this will become immediately apparent and allow your team to plan accordingly. This is especially useful in conjunction with a view of the consumed cluster resources, as it enables overall capacity planning and may reveal opportunities to configure auto-scaling behavior (see the first sketch after this list).
- Calculate data and concept drift. This gives insight into both the incoming data and your model's outgoing predictions; either can drift over time, and drift in either can indicate problems (see the second sketch below).
- Trigger events based on drift. A drift event can start a re-training pipeline that trains and deploys an updated version of the model, a practice known as continuous training. It can also start a statistical analysis comparing your training data with the live data received in production to see where the differences lie (see the third sketch below).
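
To make the first use case concrete, here is a minimal sketch of summarizing request logs into per-hour invocation counts and latency percentiles. The log layout (`timestamp`, `latency_ms`) and the sample values are hypothetical; real serving platforms expose this data through their own metrics stores.

```python
import pandas as pd

# Hypothetical request log: one row per model invocation.
logs = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-05-01 09:05", "2023-05-01 09:17",
        "2023-05-01 14:02", "2023-05-01 14:03", "2023-05-01 14:04",
    ]),
    "latency_ms": [42.0, 38.5, 95.1, 88.0, 101.3],
})

# Bucket invocations by hour: "how often" and "how quickly".
grouped = logs.set_index("timestamp").resample("1h")["latency_ms"]
hourly = pd.DataFrame({
    "invocations": grouped.count(),        # traffic per hour
    "p95_latency_ms": grouped.quantile(0.95),  # tail latency per hour
})

# Busy hours stand out immediately, informing capacity planning
# and auto-scaling thresholds.
print(hourly[hourly["invocations"] > 0])
```

A dashboard built on this kind of aggregation makes the daily traffic pattern visible at a glance, which is exactly the signal needed to size clusters or tune auto-scaling rules.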
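For the second use case, one common way to quantify data drift is the Population Stability Index (PSI), which compares a feature's live distribution against its training (reference) distribution. This is a minimal sketch; the bin count and the 0.2 alert threshold are widely used conventions, not universal constants, and production systems typically compute this per feature on a schedule.

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample."""
    # Derive bin edges from the reference (training) data.
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Widen the outer edges so out-of-range live values are still counted.
    edges[0], edges[-1] = -np.inf, np.inf
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    # Guard against log(0) and division by zero in empty bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)
live_feature = rng.normal(loc=0.5, scale=1.2, size=1_000)  # shifted input

print(f"PSI = {psi(train_feature, live_feature):.3f}")  # > 0.2 often flags drift
```

The same pattern applies to concept drift by running the comparison on the model's outgoing predictions instead of its incoming features.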
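Finally, for the third use case, here is a minimal sketch of drift-based triggering under the assumption that a drift score arrives periodically. `run_training_pipeline` and `notify_team` are hypothetical stand-ins for whatever workflow engine and alerting hooks your platform provides, and the threshold is illustrative.

```python
DRIFT_THRESHOLD = 0.2  # illustrative; tune per feature and use case

def run_training_pipeline(dataset: str) -> None:
    # Placeholder: in practice this would launch the job that trains,
    # validates, and deploys a new model version.
    print(f"Triggered re-training pipeline on {dataset!r}")

def notify_team(message: str) -> None:
    # Placeholder: e.g. send to a chat channel or incident tracker so the
    # team can run a deeper training-vs-live statistical analysis.
    print(f"ALERT: {message}")

def on_drift_score(score: float) -> None:
    """React to a periodically computed drift score (e.g. PSI)."""
    if score <= DRIFT_THRESHOLD:
        return  # live data still resembles the training data
    # Continuous training: retrain and redeploy on fresh production data.
    run_training_pipeline(dataset="latest_production_window")
    notify_team(f"Drift detected (score={score:.2f}); re-training started")

on_drift_score(0.31)  # example: a score above threshold fires both actions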
This list is not exhaustive; these are just some of the many capabilities that proper model monitoring provides.