Model deployment is the process of putting machine learning models into production, making their predictions available to users, developers or systems. These consumers can then make data-driven business decisions, interact with application features (like recognizing a face in an image) and so on.
Model deployment is widely considered a challenging stage for data scientists, for two main reasons: it is often not treated as part of their core responsibility, and there are technological and mindset gaps between model development and training on one side and the organizational tech stack on the other. Concerns like versioning, testing and scaling are what make deployment difficult. These organizational and technological silos can be overcome with the right model deployment frameworks, tools and processes.
Only models that are deployed to production provide business value to customers and users. Anywhere from 60% to 90% of models never make it to production, according to various analyses. Deploying machine learning models makes them available for decision-making, predictions and insights, depending on the specific end product.
For example, let’s say a data scientist has built a model that runs sentiment analysis on YouTube comments. After building, debugging and training the model, the data scientist is happy with its excellent accuracy scores. But while the model sits in the research environment, its value is only theoretical: it can’t be tested on real-life data, where it might perform differently. So even if it were the highest-performing state-of-the-art NLP model in the world, the model only provides value after it has been tested and deployed to production, where it can analyze real data.
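To make the research-stage part of this example concrete, here is a minimal sketch of such a workflow using scikit-learn. The toy comments, labels and pipeline are illustrative assumptions, not details from the original example; the point is that a high score measured in this environment says nothing yet about production performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy YouTube-style comments with sentiment labels (1 = positive, 0 = negative).
comments = [
    "Loved this video, great content!",
    "Amazing explanation, thank you",
    "This was terrible and boring",
    "Worst upload ever, waste of time",
    "Really helpful tutorial",
    "Awful audio, could not watch",
]
labels = [1, 1, 0, 0, 1, 0]

# A simple research-stage pipeline: vectorize the text, then classify.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)

# An excellent score here is only a theoretical result until the model is
# deployed and evaluated on real production traffic.
print(model.score(comments, labels))
```

In practice the evaluation would use a held-out test set rather than the training data, but the gap remains the same: none of these numbers are validated against live traffic until the model is deployed.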
There are a number of reasons model deployment is a resource-intensive and challenging process:
Automating the deployment of models helps reduce friction and improves scalability and repeatability. By integrating CI/CD tools into the MLOps pipeline, data scientists can continuously train their models and automatically retrain them when drift is detected.
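One common way to detect drift is to compare the distribution of a feature in live traffic against its distribution at training time, for example with the Population Stability Index (PSI). The sketch below is a hedged illustration: the data is synthetic, and the 0.2 threshold is a common rule of thumb rather than a value any specific framework mandates.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of a numeric feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the bin fractions to avoid division by zero and log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
training_data = rng.normal(0.0, 1.0, 5_000)  # feature at training time
live_data = rng.normal(0.8, 1.0, 5_000)      # shifted production traffic

DRIFT_THRESHOLD = 0.2  # rule of thumb; tune for your own use case
if psi(training_data, live_data) > DRIFT_THRESHOLD:
    print("Drift detected: trigger the retraining pipeline")
```

In a CI/CD setup, this check would run on a schedule against recent production data, and crossing the threshold would kick off the automated retraining job rather than just print a message.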
When automating the model deployment pipeline, it is important to monitor that retraining is conducted correctly and that the outputs make sense. If the metrics show anomalies, the retrained model should probably not be deployed. So, automate with care: add alerts and triggers to your automation to ensure an accurate model is deployed.
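Such a safeguard can be as simple as a gate at the end of the retraining job that compares the new model's metrics against required thresholds and blocks deployment on any regression. The metric names, thresholds and `approve_deployment` helper below are hypothetical, not part of any specific framework.

```python
# Hypothetical gate run at the end of an automated retraining job.
def approve_deployment(metrics: dict, thresholds: dict) -> bool:
    """Return True only if every monitored metric clears its threshold."""
    failures = {
        name: value
        for name, value in metrics.items()
        if value < thresholds.get(name, float("-inf"))
    }
    if failures:
        # In a real pipeline this would fire an alert (Slack, PagerDuty, ...)
        # and block promotion of the retrained model.
        print(f"Blocking deployment, anomalous metrics: {failures}")
        return False
    return True

retrained_metrics = {"accuracy": 0.71, "f1": 0.64}
required = {"accuracy": 0.85, "f1": 0.80}
approve_deployment(retrained_metrics, required)  # blocked: metrics regressed
```

Wiring a gate like this into the CI/CD pipeline ensures that an anomalous retrained model never reaches production automatically, while healthy retrains still flow through without manual work.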
Model deployment can be a complex and time-consuming process. That’s why many ML teams turn to MLOps tools to ease the burden. MLRun is Iguazio’s open-source ML orchestration tool that, among other things, automates the deployment of real-time production pipelines.
With MLRun Serving, the ML team can work together to compose a series of steps (which can include data processing, model ensembles, model servers, post-processing steps, and so on). To see an example of how this works, check out this Advanced Model Serving Graph Notebook Example. Complex and distributed graphs can be composed with MLRun Serving, and they can include elements like streaming data, data/document/image processing, NLP, model monitoring and more.
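To illustrate the composition idea, here is a framework-agnostic sketch of a serving graph in plain Python, where each step's output feeds the next. This is not the MLRun Serving API; the `ServingGraph` class and the step functions are illustrative stand-ins for the kind of preprocess → model → postprocess flow described above.

```python
# A minimal serving-graph sketch: steps are chained so each step's output
# becomes the next step's input. Illustrative only; not the MLRun API.
class ServingGraph:
    def __init__(self):
        self.steps = []

    def to(self, step):
        """Append a step (any callable) and return self for chaining."""
        self.steps.append(step)
        return self

    def run(self, event):
        for step in self.steps:
            event = step(event)
        return event

def preprocess(event):
    event["text"] = event["text"].strip().lower()
    return event

def model(event):
    # Stand-in for a real model-server step (e.g. a sentiment model).
    event["sentiment"] = "positive" if "great" in event["text"] else "negative"
    return event

def postprocess(event):
    return {"result": event["sentiment"]}

graph = ServingGraph().to(preprocess).to(model).to(postprocess)
print(graph.run({"text": "  Great video!  "}))  # {'result': 'positive'}
```

In a real MLRun serving graph, steps like these could also run distributed and asynchronously, and include ensembles, streaming sources and monitoring hooks, as the text above describes.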
With MLRun you can: