To maintain the accuracy of an ML model in production and detect drops in performance, it can be useful to create custom metrics that are specific to the product.
Coming up with the right mix of metrics to monitor can be challenging. To get started, consider these two questions:
1. What are the expectations of the end user?
Think of the metric as a basic user story:
For a healthcare mobility use case, a user story might be: “As a hospital worker who needs to triage patient care, I would like the most time-critical patient cases to be easily accessible, and therefore placed high on my screen.”
Which metrics capture whether that expectation is being met, from the end user’s perspective? (A sketch of one such metric follows these two questions.)
2. What is a bad user experience for your use case?
Instead of looking only at ideal or typical experiences, consider the edge cases: what happens when your service delivers a bad user experience? These can be instances where the model returns a fallback response, a low-confidence response, or even an empty response. Your model monitoring should be able to catch these cases.
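To make the first question concrete, here is a minimal sketch of how the triage user story above could become a custom metric: the share of requests in which every time-critical case lands within the top N positions shown to the user. The field name `is_time_critical` and the `top_n` threshold are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RankedCase:
    case_id: str
    is_time_critical: bool  # assumed label; adapt to your own data schema


def time_critical_placement_rate(ranked_lists: List[List[RankedCase]], top_n: int = 5) -> float:
    """Fraction of ranked lists in which every time-critical case
    appears within the first `top_n` positions shown to the user."""
    if not ranked_lists:
        return 0.0
    satisfied = 0
    for ranking in ranked_lists:
        critical_positions = [i for i, case in enumerate(ranking) if case.is_time_critical]
        # A list with no time-critical cases trivially meets the expectation
        # (all() over an empty list is True).
        if all(pos < top_n for pos in critical_positions):
            satisfied += 1
    return satisfied / len(ranked_lists)
```

Computed over a rolling window of production traffic, a drop in this rate signals that the model is no longer honoring the user story, even if aggregate accuracy looks unchanged.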
Bugs can and do happen regularly in any part of the codebase, at every enterprise. For ML models serving critical functions, real-time metrics help ensure that any drop in performance is caught and addressed as quickly as possible.
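One way to surface the failure modes above in real time is to emit a counter per bad-experience reason and alert on its rate. The sketch below uses the Prometheus Python client as one example backend; the confidence threshold and label names are illustrative assumptions, not fixed conventions.

```python
from prometheus_client import Counter, start_http_server

# One counter, labelled by the kind of bad experience we want to catch.
BAD_RESPONSE_COUNTER = Counter(
    "model_bad_responses_total",
    "Responses likely to give the user a bad experience",
    ["reason"],  # e.g. fallback, empty, low_confidence
)

CONFIDENCE_THRESHOLD = 0.5  # assumed cut-off; tune per use case


def record_response_quality(response_text: str, confidence: float, used_fallback: bool) -> None:
    """Classify a single model response and increment the matching counter."""
    if used_fallback:
        BAD_RESPONSE_COUNTER.labels(reason="fallback").inc()
    elif not response_text.strip():
        BAD_RESPONSE_COUNTER.labels(reason="empty").inc()
    elif confidence < CONFIDENCE_THRESHOLD:
        BAD_RESPONSE_COUNTER.labels(reason="low_confidence").inc()


if __name__ == "__main__":
    # Expose the metrics endpoint so a dashboard or alerting rule can scrape it.
    start_http_server(8000)
```

An alert on a rising rate of any of these labels gives you a near-immediate signal that performance has dropped, well before it shows up in offline evaluation.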