Model management is the component of MLOps that ensures a machine learning model is set up correctly and the processes around it behave as expected through all the steps of its lifecycle. In doing this, model management also ensures best practices are set and met for both data scientists and ML engineers.
Model management outlines the strategies and workflows necessary to guarantee that all the stages of the machine learning model lifecycle act and interact consistently. This is particularly relevant, as the lifecycle is intrinsically experimental and iterative, with each of its stages requiring specific processes, actors, dependencies, and guarantees.
Within the ML lifecycle, we can outline two main phases for the model: experimentation, where the exploration and training of the ML model occur, and deployment, where the ML model is served.
During experimentation, model management allows us to track training parameters, metrics, collaborators, owners, and artifacts (both inputs, i.e., data and code, and outputs, i.e., trained models and evaluation reports). In this phase, model management enables data science teams to collaborate and iterate on exploratory data analysis (EDA) and modeling with guaranteed repeatability, scalability, and traceability.
During deployment, model management defines how and which models should be packaged, deployed, monitored, and retrained. This process is automated following well-defined CI/CD configurations and steps that provide guaranteed automation, tracking, performance, and compliance. In this phase, model management enables ML engineering teams to focus on improving the operationalization of the machine learning pipeline (e.g., expanding testing coverage and infrastructure support) rather than manually deploying, debugging, testing, and monitoring models.
Even though experimentation and deployment are two separate phases owned by different teams, it is fundamental that they communicate seamlessly. Model management ensures collaboration and a smooth handover between the two by supporting the concept of a centralized model registry, where lineage and version control are tracked throughout the ML model lifecycle. Tags play a fundamental role here, allowing us to organize models by their logical and functional behavior and requirements.
Without model management, data science and ML engineering teams need to rely on ad-hoc practices for experimentation and deployment that are manual, error-prone, not reproducible, and, ultimately, inefficient. In this scenario, iterating on models for performance improvements is risky: on the one hand, reproducing experiments relies heavily on the knowledge and memory of individuals; on the other, updates to the modeling are likely to introduce unknown and unsupported behaviors in deployment.
Through the multiple stages of automation introduced by model management, teams can instead rely on well-defined processes and practices that ensure best practices are set and met, including reproducibility, versioning, compliance, tracking, scalability, and collaboration. As a result, teams can make business value their main focus.
A not-so-trivial effect of having well-designed practices and policies for the model lifecycle is that responsibilities are also well-defined and clear. Data scientists own the experimentation phase, while machine learning engineers own the deployment and infrastructure.
Any communication, work, and handovers within and between the two teams thus follow a well-defined path that allows for truly agile development. This means that teams can build simple models fast to first prove their operability and business impact and then iterate on them quickly to maximize performance.
Tools for model management cover five main management areas. We present these below in an order where each area assumes teams have already implemented the previous one(s) (e.g., experiment tracking expects logging and artifact versioning to be in place), reflecting increasing MLOps maturity.
Logging refers to the process of saving relevant metadata for the ML pipeline. This metadata includes items such as training parameters, hyperparameters, evaluation metrics, and model performance results.
With logging, teams can begin to track model performance with respect to specific training regimes.
The ML frameworks used for model training and hypertuning typically output this metadata out-of-the-box, with the exception of model performance results, which are often extended with custom visualizations and reports.
Relevant open-source tools for this management area are ML Metadata (MLMD) and MLRun.
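To make this concrete, here is a minimal, tool-agnostic Python sketch of logging the configuration and outcome of a single training run to a JSON record. The function name, fields, and example values are hypothetical and simply stand in for the kind of metadata that dedicated tools such as MLMD or MLRun capture out-of-the-box.

```python
import json
import time
import uuid
from pathlib import Path


def log_run(params: dict, metrics: dict, log_dir: str = "runs") -> str:
    """Persist the metadata of a single training run as a JSON record."""
    run_id = uuid.uuid4().hex[:8]
    record = {
        "run_id": run_id,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,    # e.g., hyperparameters and data paths (hypothetical)
        "metrics": metrics,  # e.g., evaluation results (hypothetical)
    }
    out = Path(log_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{run_id}.json").write_text(json.dumps(record, indent=2))
    return run_id


# Example usage: record the configuration and outcome of one experiment.
run_id = log_run(
    params={"model": "xgboost", "max_depth": 6, "learning_rate": 0.1},
    metrics={"accuracy": 0.91, "auc": 0.95},
)
```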
Artifacts are the inputs and outputs of an ML pipeline and include items such as data sets, trained models, and evaluation reports.
Version control with full lineage allows teams to keep track of which artifacts belong to which training pipeline, thus tracing changes in models in relation to data sets and vice versa. With artifact versioning, teams can reliably compare and recover the inputs and outputs of experiments. Relevant open-source tools for this management area are DVC, MLMD, and MLRun.
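As a rough illustration of lineage, the sketch below versions artifacts by content hash and records which input versions produced which outputs. The file paths and function names are hypothetical, and tools such as DVC or MLRun implement this far more robustly.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: str) -> str:
    """Content hash used as an immutable version identifier for an artifact."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:12]


def record_lineage(inputs: list[str], outputs: list[str],
                   lineage_file: str = "lineage.json") -> dict:
    """Link output artifacts (e.g., models) to the exact input versions (e.g., data sets) that produced them."""
    entry = {
        "inputs": {p: file_digest(p) for p in inputs},
        "outputs": {p: file_digest(p) for p in outputs},
    }
    Path(lineage_file).write_text(json.dumps(entry, indent=2))
    return entry


# Example usage (hypothetical paths): tie a trained model and its evaluation
# report to the data set version that produced them.
# record_lineage(inputs=["data/train.csv"],
#                outputs=["models/model.pkl", "reports/eval.json"])
```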
Experiment tracking refers to storing and versioning the codebase used throughout the ML lifecycle, with a specific focus on the notebooks used during model training and hypertuning.
With experiment tracking, teams can reliably share, compare, and recover the codebase of each experiment. Together with logging and artifact versioning, this allows for the full collaboration and reproducibility of ML pipelines during experimentation.
Relevant open-source tools for this management area are Kubeflow Pipelines, Airflow, and MLRun.
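One simple way to connect an experiment to the exact code that produced it is to attach the current git commit hash to the run's metadata. The sketch below assumes the hypothetical run records from the logging sketch above and uses standard git commands; it only illustrates the idea, which the tools listed above handle automatically.

```python
import json
import subprocess
from pathlib import Path


def current_commit() -> str:
    """Return the git commit hash of the code used for this experiment."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def tag_run_with_code_version(run_id: str, run_dir: str = "runs") -> None:
    """Attach the code version to a previously logged run so it can be reproduced exactly."""
    run_file = Path(run_dir) / f"{run_id}.json"
    record = json.loads(run_file.read_text())
    record["code_version"] = current_commit()
    run_file.write_text(json.dumps(record, indent=2))
```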
A model registry is a centralized tracking system for models throughout their lifecycle. For each model, it stores information such as lineage, versioning, metadata, owners, configuration, tags, and producers (i.e., the function or pipeline that produced the model). With this information, technical and non-technical teams can seamlessly understand at which stage a model is (training, staging, or deployment) and act on it accordingly.
Relevant open-source tools for this management area are blob storage services such as MinIO or OpenIO, databases such as PostgreSQL or MongoDB, and MLRun.
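The kind of record a registry keeps per model can be sketched as a simple data structure. The fields and example values below are hypothetical; real registries add storage backends, APIs, and access control on top of such records.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistryEntry:
    """Minimal registry record; real registries add storage, APIs, and access control."""
    name: str
    version: str
    stage: str          # "training", "staging", or "deployment"
    owner: str
    producer: str       # function or pipeline that produced the model
    artifact_uri: str   # where the serialized model lives (e.g., blob storage)
    lineage: dict = field(default_factory=dict)   # input data/code versions
    tags: list = field(default_factory=list)      # logical/functional grouping
    metadata: dict = field(default_factory=dict)  # framework, metrics, config, ...


# Example usage (hypothetical values):
entry = ModelRegistryEntry(
    name="churn-classifier",
    version="1.3.0",
    stage="staging",
    owner="data-science-team",
    producer="train_pipeline_v2",
    artifact_uri="s3://models/churn/1.3.0/model.pkl",
    tags=["churn", "tabular"],
)
```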
Model monitoring allows teams to track the online performance of models during and after deployment. Monitoring involves setting up logs, alerts, summaries, dashboards, and triggers on events.
With model monitoring, teams can always rely on the model to satisfy SLAs and for automated processes to be activated if any error or noteworthy event occurs.
Relevant open-source tools for this management area are Prometheus, Seldon Core, and MLRun.
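As a concrete example of the metrics side of monitoring, the sketch below uses the Python prometheus_client library to expose prediction latency and error counts for Prometheus to scrape. The metric names, port, and prediction stand-in are assumptions for illustration; serving frameworks such as Seldon Core expose similar metrics automatically, and alerts and dashboards are then defined on top of them.

```python
import time
from prometheus_client import Counter, Histogram, start_http_server

# Metrics scraped by Prometheus; alert rules and dashboards build on them.
PREDICTION_LATENCY = Histogram(
    "model_prediction_latency_seconds", "Time spent producing a prediction"
)
PREDICTION_ERRORS = Counter(
    "model_prediction_errors_total", "Number of failed predictions"
)


def predict(features):
    """Placeholder for the deployed model's prediction call (hypothetical)."""
    return sum(features)


def monitored_predict(features):
    """Wrap predictions so latency and failures are always recorded."""
    start = time.time()
    try:
        return predict(features)
    except Exception:
        PREDICTION_ERRORS.inc()
        raise
    finally:
        PREDICTION_LATENCY.observe(time.time() - start)


if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        monitored_predict([1.0, 2.0, 3.0])
        time.sleep(1)
```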
Different model management tools can support different stages of MLOps maturity. Let’s consider experiment tracking.
For a beginner stage, a management tool should provide the capability to log the configuration (inputs) and outcome (trained model, evaluation metrics and reports, etc.) of ML experiments. For an intermediate stage, a management tool should do the above automatically. For an advanced stage, a management tool should also offer full collaboration and the ability to compare experiments.
As a company's MLOps maturity improves, the need to connect or migrate to new tools to support additional requirements is bound to drastically increase the complexity and time required for ML adoption.
Hence, we recommend that companies consider from the very beginning a management tool such as MLRun that provides all five capabilities and a clear MLOps roadmap.