When deploying LLMs in production, monitoring helps mitigate risks such as malfunction, bias, toxic language generation, and hallucinations. It enables tracking application performance and identifying issues before they impact users. In addition, logging interactions provides insight into user queries and the model's responses, data that can later be used for fine-tuning the model.
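As a minimal sketch of such interaction logging, the snippet below appends each prompt/response pair to a JSONL file so it can be analyzed or turned into a fine-tuning dataset later. The function name `log_interaction` and the file path are illustrative assumptions, not part of any specific library:

```python
import json
import time
import uuid

def log_interaction(prompt: str, response: str, model: str,
                    path: str = "interactions.jsonl") -> None:
    """Append one LLM interaction to a JSONL log (hypothetical sink)."""
    record = {
        "id": str(uuid.uuid4()),   # unique id, so records can be joined with user feedback later
        "timestamp": time.time(),  # when the interaction happened
        "model": model,            # which model/version produced the response
        "prompt": prompt,
        "response": response,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage: call this right after each model call, e.g.
# log_interaction("What is our refund policy?", answer_text, model="gpt-4o")
```

A flat append-only log like this is deliberately simple; in practice teams often ship the same records to a database or observability platform, but the fields captured (id, timestamp, model, prompt, response) are the ones that make later analysis and fine-tuning possible.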
There are multiple levels of monitoring:
- Functional Monitoring - Ensures that the LLM is operating correctly and identifies which models and versions are currently in use within applications. This helps manage risk if a model version becomes problematic or poses a security threat (see the first sketch after this list).
- Governance and Compliance - Centralizes governance following the "AI Factory" approach. This helps organizations know which models and versions are being used, when updates or patches are needed, and when security risks require immediate action.
- Resource Monitoring - Tracks the consumption of resources such as CPU and memory by different applications or departments. This helps identify inefficient resource usage, or applications that consume too much without adding sufficient value (see the second sketch below).
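For functional monitoring, one common pattern is a periodic health check: send a known canary prompt, verify a sane response arrives within a latency budget, and record the model and version that answered so a bad version can be traced. The sketch below assumes a generic `call_model` callable standing in for whatever client the application actually uses:

```python
import time

def functional_check(call_model, model: str, version: str,
                     timeout_s: float = 10.0) -> dict:
    """Hypothetical health check: canary prompt + latency budget."""
    canary = "Reply with the single word: OK"
    start = time.time()
    try:
        reply = call_model(canary)
        latency = time.time() - start
        healthy = bool(reply and reply.strip()) and latency < timeout_s
    except Exception as exc:
        latency, healthy, reply = time.time() - start, False, repr(exc)
    # Tag the result with model/version so problematic versions stand out.
    return {"model": model, "version": version, "healthy": healthy,
            "latency_s": round(latency, 3), "reply": reply}
```

For resource monitoring, a minimal approach is to sample the serving process's CPU and memory and tag each sample with the application and department it belongs to, so consumption can be attributed. This sketch uses the third-party `psutil` library; the label names and the metrics sink are assumptions for illustration:

```python
import psutil  # third-party: pip install psutil

def sample_resources(app: str, department: str) -> dict:
    """Sample this process's CPU/memory, tagged by app and department."""
    proc = psutil.Process()
    return {
        "app": app,
        "department": department,
        "cpu_percent": proc.cpu_percent(interval=1.0),   # CPU over a 1 s window
        "rss_mb": proc.memory_info().rss / (1024 ** 2),  # resident memory in MB
    }

# Usage: push samples to whatever metrics backend is in place, e.g.
# print(sample_resources("support-chatbot", "customer-success"))
```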