Implementing CI/CD (Continuous Integration/Continuous Delivery) in DevOps is relatively straightforward: you code, build, test, and release to production. Adopting the same DevOps practices in the machine learning development process (MLOps) helps you reliably build and operationalize ML systems at scale.
However, applying CI/CD practices to the ML lifecycle as part of MLOps presents several unique challenges. ML development requires versioning data, model parameters, and configuration in addition to code, all of which contributes additional complexity to operationalizing ML systems.
In this article, we review the basics of a CI/CD pipeline and explain what implementing a CI/CD practice for ML entails.
A CI/CD pipeline is an automated workflow that facilitates the software delivery process from source to production. It usually consists of source, build, test, and deploy stages.
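As a rough illustration, the sketch below strings these stages together in plain Python; the commands and the deploy script are hypothetical placeholders rather than the syntax of any particular CI tool.

```python
# Hypothetical sketch of the stages a CI/CD pipeline automates for a Python project.
# The commands and the deploy.py script are assumptions for illustration only.
import subprocess

def run(stage: str, command: list[str]) -> None:
    """Run one pipeline stage and stop the pipeline if it fails."""
    print(f"--- {stage} ---")
    subprocess.run(command, check=True)

def pipeline() -> None:
    run("source", ["git", "pull", "--ff-only"])              # fetch the latest code
    run("build", ["python", "-m", "build"])                   # package the application
    run("test", ["python", "-m", "pytest", "tests/"])         # run the automated test suite
    run("deploy", ["python", "deploy.py", "--env", "prod"])   # hypothetical deploy script

if __name__ == "__main__":
    pipeline()
```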
A good CI/CD pipeline enables developers to rapidly implement code changes and automatically build, test, and deploy new iterations of the software to production. Deploying incrementally and more frequently lowers the risk of each deployment, as you can identify issues in production arising from these small code changes faster than with larger, less frequent deployments.
Another key benefit of a good CI/CD pipeline is that it reliably automates the software delivery process, eliminating the human errors that creep in with repetitive manual testing and deployment.
Machine learning (ML) models often degrade in accuracy when they are deployed in real-world scenarios because they fail to adapt to changes in the data. To maintain the performance of ML models in production, you need to actively monitor your model’s performance, retrain it with more recent data, and continuously experiment with different approaches within the ML development lifecycle.
Every step of this iterative process is often manual and incurs significant overhead, since ML engineers have to re-run the entire model training pipeline and productionize the new models to adapt to code and data changes.
While an ML system is a software system, CI/CD for ML presents distinct challenges from other software systems.
First off, since ML is experimental in nature, the development process involves running ML experiments to determine the modeling techniques and parameter configurations that work best for the given problem. The challenge here is tracking and maintaining the reproducibility of these experiments to ensure you can reuse the code and replicate the model’s performance on the same dataset.
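For example, an experiment tracking tool can record the parameters, metrics, random seed, and model artifact of each run so the experiment can be replicated later. Below is a minimal sketch assuming MLflow and scikit-learn are installed; the dataset and hyperparameter values are placeholders.

```python
# Sketch of a tracked, reproducible experiment run (assumes MLflow and scikit-learn are
# installed; the dataset and hyperparameter values are placeholders).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

SEED = 42  # fixed seed so the run can be replicated on the same dataset

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)

params = {"n_estimators": 100, "max_depth": 5, "random_state": SEED}

with mlflow.start_run():
    mlflow.log_params(params)                     # record the configuration being tested
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("test_accuracy", accuracy)  # record the result for later comparison
    mlflow.sklearn.log_model(model, "model")      # store the model artifact alongside the run
```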
Testing an ML system also presents more areas of operational complexity compared to other software systems because it involves data and models in addition to code. Besides typical unit and integration tests, you need to test and validate the data and models to ensure that the ML model performs sufficiently well on a holdout test set.
Lastly, deploying an ML system is not just about deploying an ML model that has been trained offline as a prediction service in production; it also requires deploying a multi-step pipeline that automatically retrains models and redeploys the model prediction service into production. This pipeline requires you to automate the steps for training and validating new models before deployment, adding complexity to the continuous delivery process.
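The sketch below illustrates the idea: the deliverable is a multi-step training pipeline, and the model prediction service is only published by the pipeline after a new model clears a quality gate. The step functions, synthetic data, and accuracy threshold are assumptions for illustration.

```python
# Hypothetical sketch of the multi-step pipeline that gets deployed: each step is a function,
# and the pipeline itself (not just a single trained model) is the deliverable.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_GATE = 0.8  # assumed quality bar a newly trained model must clear before deployment

def ingest():
    # Placeholder: in production this step would pull fresh data from a warehouse or feature store.
    X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
    return train_test_split(X, y, test_size=0.2, random_state=0)

def train(X_train, y_train):
    return LogisticRegression(max_iter=1_000).fit(X_train, y_train)

def evaluate(model, X_test, y_test) -> float:
    return accuracy_score(y_test, model.predict(X_test))

def deploy(model) -> None:
    # Placeholder: publish the model as a prediction service (e.g. a containerized REST endpoint).
    print("deploying new model version")

def training_pipeline() -> None:
    X_train, X_test, y_train, y_test = ingest()
    model = train(X_train, y_train)
    if evaluate(model, X_test, y_test) >= ACCURACY_GATE:  # validate before promoting
        deploy(model)

if __name__ == "__main__":
    training_pipeline()
```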
As mentioned above, implementing a CI/CD practice for ML pipelines entails automating the build, testing, and deployment of ML systems that continuously train and deploy ML models for prediction.
Figure 1: Stages of a CI/CD pipeline for machine learning (Source: Google Cloud)
A CI/CD workflow for ML pipelines can be described in terms of two processes: pipeline continuous integration (CI) and pipeline continuous delivery (CD).
During the pipeline continuous integration (CI) process, the ML pipeline and its components are built, tested, and packaged for delivery when changes are pushed to the source code repository (usually Git-based). The pipeline CI process consists of three stages: development, build, and testing.
During the development stage, you iteratively experiment with new ML algorithms and modeling approaches, with the experiment steps orchestrated and tracked. You then select an appropriate model and push the source code of the ML pipeline to the source code repository.
When changes are detected in the source code repository, a Git-based CI/CD workflow is triggered, starting the automated CI/CD pipeline process. During the build stage, the ML pipeline and its components are built with their dependencies in the form of packages, container images, and executables.
After the ML pipeline and its components are successfully built with dependencies, you then proceed to the testing stage. In the pipeline CI process, there are three components of automated testing for ML systems:
Unit tests can include tests for the feature engineering logic and the methods implemented in the model. The purpose of these tests for ML systems is to capture bugs in feature-creation code and detect possible errors in your model specifications.
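For instance, a unit test for a hypothetical feature-engineering function might look like the following (the function and column names are illustrative; run with pytest):

```python
# Hypothetical unit test for a feature-engineering function (the function and column names
# are assumptions for illustration; run with pytest).
import pandas as pd

def add_click_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Example feature: clicks per impression, guarding against division by zero."""
    out = df.copy()
    out["click_rate"] = out["clicks"] / out["impressions"].replace(0, 1)
    return out

def test_click_rate_is_bounded_and_non_null():
    df = pd.DataFrame({"clicks": [0, 5, 10], "impressions": [0, 10, 10]})
    result = add_click_rate(df)
    assert result["click_rate"].notna().all()        # no NaNs introduced by the transformation
    assert result["click_rate"].between(0, 1).all()  # rates stay in a valid range
```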
Besides unit tests, you also need data and model tests in the pipeline CI process. Data tests include validating the data to check its schema and statistical properties, as well as feature importance tests to measure data dependencies. Model tests include model training convergence to check that the loss of the ML model decreases over iterations and model performance validation to ensure the ML model is not overfitting or underfitting the training data.
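A few such checks might look like the sketch below, where the expected schema, the thresholds, and the tolerated train/holdout gap are all assumptions for illustration:

```python
# Sketch of data and model checks run during pipeline CI (the expected schema, thresholds,
# and tolerated train/holdout gap are assumptions for illustration).
import pandas as pd

def check_data(df: pd.DataFrame) -> None:
    """Validate the schema and basic statistical properties of the training data."""
    expected_columns = {"clicks", "impressions", "click_rate"}  # assumed schema
    assert expected_columns.issubset(df.columns), "unexpected schema"
    assert df["click_rate"].between(0, 1).all(), "click_rate out of range"
    assert df["impressions"].gt(0).mean() > 0.5, "too many rows with zero impressions"

def check_convergence(loss_history: list[float]) -> None:
    """Training loss should trend downwards over iterations."""
    assert loss_history[-1] < loss_history[0], "training loss did not decrease"

def check_generalization(train_score: float, holdout_score: float, max_gap: float = 0.05) -> None:
    """A large train/holdout gap suggests overfitting; the tolerated gap is an assumption."""
    assert train_score - holdout_score <= max_gap, "model appears to overfit"
```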
Lastly, integration tests in the pipeline CI process include testing each component in the ML pipeline to check that it produces the expected output, as well as integration testing between pipeline components to validate that each stage of the ML pipeline completes successfully and the resulting ML model performs as expected.
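An integration test along these lines could run a scaled-down version of the pipeline end to end on synthetic data and assert on the hand-offs between stages, as in this sketch (the minimum accuracy bar is an assumption):

```python
# Hypothetical integration test: run a scaled-down training pipeline end to end on synthetic
# data and check that each stage hands a valid result to the next (run with pytest; the
# minimum accuracy bar is an assumption).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def test_pipeline_end_to_end():
    # Data stage: produce a train/test split.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Training stage: the output must be a fitted model.
    model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
    assert hasattr(model, "coef_")

    # Evaluation stage: the resulting model must clear a minimum performance bar.
    assert accuracy_score(y_test, model.predict(X_test)) >= 0.7
```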
During the pipeline continuous delivery (CD) process, the CI/CD pipeline continuously deploys new ML pipeline implementations that in turn perform continuous training (CT) of ML models for prediction.
During CT of the ML model in the pipeline CD process, the deployed ML pipeline is automatically triggered to retrain the model in production based on triggers from the live ML pipeline environment. Model retraining can be triggered on a schedule, by model performance degradation, or by significant changes in the data distributions of the features used for prediction (also known as data drift). The trained ML model is then automatically deployed as a model prediction service as part of the model CD process.
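The trigger logic itself can be simple; the sketch below checks the three conditions mentioned above, with the retraining interval, accuracy floor, and the way the pipeline is invoked all being assumptions for illustration.

```python
# Sketch of retraining-trigger logic for continuous training (the retraining interval,
# accuracy floor, and the way the pipeline is invoked are assumptions for illustration).
from datetime import datetime, timedelta

RETRAIN_INTERVAL = timedelta(days=7)  # assumed schedule-based trigger
MIN_ACCURACY = 0.85                   # assumed performance floor for the live model

def should_retrain(last_trained: datetime, live_accuracy: float, drift_detected: bool) -> bool:
    if datetime.now() - last_trained > RETRAIN_INTERVAL:
        return True   # scheduled retraining
    if live_accuracy < MIN_ACCURACY:
        return True   # performance degradation observed in production
    if drift_detected:
        return True   # shift detected in the input data distribution
    return False

if should_retrain(last_trained=datetime(2024, 1, 1), live_accuracy=0.80, drift_detected=False):
    print("triggering the deployed training pipeline")  # placeholder for the actual pipeline call
```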
Continuous monitoring of your model’s performance on live data in production enables the detection of model performance degradation and data drift, so that the CT pipeline can be automatically triggered to retrain and deploy an updated ML model based on more recent data.
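As one way to detect such drift, a monitoring job could compare the live distribution of a feature against its training distribution, for example with a two-sample Kolmogorov-Smirnov test (assuming SciPy is available; the significance threshold and the simulated data are assumptions):

```python
# Sketch of a drift check: compare a feature's live distribution against its training
# distribution with a two-sample Kolmogorov-Smirnov test (assumes SciPy; the significance
# threshold and the simulated data are assumptions for illustration).
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values: np.ndarray, live_values: np.ndarray, alpha: float = 0.01) -> bool:
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha  # a small p-value suggests the live data no longer matches training data

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.5, scale=1.0, size=5_000)  # simulated shift in production data

print(feature_drifted(training_feature, live_feature))  # True -> candidate trigger for retraining
```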
Implementing ML in a production environment is not just about deploying an ML model for prediction. Setting up a CI/CD system for ML enables you to automatically build, test, and deploy new ML pipeline implementations and iterate rapidly based on changes in your data and business environments. You can gradually implement CI/CD practices in your ML model training and ML pipelines as part of your MLOps processes to reap the benefits of automating your ML system development and operationalization.