
What Is Kubernetes for MLOps?

All machine learning (ML) models follow a lifecycle that is divided into two cyclical phases: model development and model deployment.

While model development is typically done in an interactive Jupyter notebook, as it is a highly experimental phase, putting models into production demands automation and scalability. The transition between these two phases is where ML projects most often fail.

Building on top of well-established DevOps practices, MLOps processes and tools focus on combining development and operations to enable production-grade ML lifecycles. 

Kubernetes, a DevOps tool open-sourced in 2014 and since adopted by over half of organizations worldwide, has similarly become the preferred MLOps tool to manage automated machine learning pipelines in a reproducible, safe, and scalable way.

What Is Kubernetes?

Kubernetes is a container orchestration tool. To understand exactly what that means, we first need to understand what containers are and how they revolutionized the industry.

Figure 1: The history behind Kubernetes (Source: Kubernetes)

Originally, applications ran on dedicated servers and were thus constrained by the available hardware and OS.

To speed up deployment and improve portability, virtualization was introduced, allowing for the abstraction of application code and environment in a virtual machine. Multiple virtual machines can run on the same hardware to reduce resource waste, with the hypervisor allocating processors, memory, and storage among them.

Still, virtualizing physical hardware is slow. Containers cut deployment time to seconds by sharing the host machine's OS kernel, so each container holds only the code, configuration, and packages that the application depends on.

Containers have the unique advantage of being lightweight and portable across on-premises and cloud systems. 

Figure 2: Architectural view of Kubernetes components (Source: Kubernetes)

As a container orchestration tool, Kubernetes lets you manage many running containers with zero downtime. It also provides autoscaling, failover, load balancing, and more (all of which would be nearly impossible to achieve manually), plus well-defined deployment patterns and strong community support.

What Is Docker for Kubernetes?

The collection of installations, application code, and dependencies required to configure an application environment is defined in an “image.” Docker is the preferred tool to create images:

Figure 3: An example Dockerfile (Source: Docker)

Images are defined in a Dockerfile and built in layers, ordered so that the lower a layer sits in the hierarchy, the less it is expected to change over time. This ordering makes builds efficient: layers are cached after the first build, and changing one layer only invalidates the cache for that layer and the layers above it. The example shown covers the typical steps for defining most images:

  • Import the parent image using the FROM instruction; refer to Docker Hub for a large collection of ready-made parent images.
  • Install custom packages using the RUN instruction.
  • Copy the relevant code into the preferred working directory using the WORKDIR and COPY instructions.
  • Specify the command the container should run using the CMD instruction.
  • Declare 3000 as the port the application listens on using the EXPOSE instruction.
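To see why layer order matters, here is a simplified Python model of Docker's build cache. It assumes each layer's cache key depends only on its instruction text and on the layer beneath it; real Docker also hashes the contents of files referenced by COPY and ADD, so a hypothetical "# contents: vN" suffix stands in for that here. The function names and the example Dockerfile are illustrative, not Docker's implementation.

```python
import hashlib
from typing import List, Optional


def layer_cache_keys(instructions: List[str]) -> List[str]:
    """Chain a hash through the layers: each key covers its own instruction
    plus the key of the layer below, so changing one layer changes every
    key above it (a simplification of Docker's cache rules)."""
    keys: List[str] = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(parent)
    return keys


def first_rebuilt_layer(previous: List[str], current: List[str]) -> Optional[int]:
    """Index of the first layer that misses the cache, or None if all hit."""
    cached = set(layer_cache_keys(previous))
    for index, key in enumerate(layer_cache_keys(current)):
        if key not in cached:
            return index  # this layer and every layer after it are rebuilt
    return None


# Hypothetical Dockerfile: dependencies are installed before the source is copied.
# "# contents: vN" stands in for the file-content hash Docker computes for COPY.
BUILD_V1 = [
    "FROM python:3.11-slim",
    "WORKDIR /app",
    "COPY requirements.txt . # contents: v1",
    "RUN pip install -r requirements.txt",
    "COPY src/ ./src # contents: v1",
    'CMD ["python", "src/serve.py"]',
]

# A source-code edit changes only the second COPY layer.
BUILD_V2 = BUILD_V1[:4] + ["COPY src/ ./src # contents: v2"] + BUILD_V1[5:]

print(first_rebuilt_layer(BUILD_V1, BUILD_V2))  # prints 4
```

Because the slow `pip install` layer sits below the source COPY, a code edit reuses it from the cache and only layers 4 and 5 are rebuilt; copying the source before installing dependencies would force a reinstall on every code change.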

A running instance of an image is called a “container,” and containers are managed by orchestration tools such as Kubernetes.

How to Run an Image on Kubernetes

To run an image on Kubernetes, we typically define a series of YAML configuration files. A minimal setup contains the following:

  • configmap.yaml stores non-confidential input key-value pairs; for confidential values, use a secret.yaml instead.
  • deployment.yaml starts n replicas of the specified container image, along with its mounted volumes, environment variables, ConfigMap/Secret references, hardware resource requests, and target port.
  • service.yaml exposes the deployment, routing requests from a public node port to the deployment’s target port.
  • autoscale.yaml defines a scaling policy, typically based on CPU or memory utilization.

We can then use Kubernetes’ command-line tool, kubectl, to apply the configurations.
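As an illustration of what the files listed above contain, the Python sketch below builds a minimal ConfigMap, Deployment, NodePort Service, and HorizontalPodAutoscaler as plain dictionaries and writes them out as JSON, which kubectl accepts alongside YAML. The app name, image, port, replica counts, resource requests, and 70% CPU target are placeholder assumptions; production manifests would usually also set probes, limits, and namespaces.

```python
import json
from pathlib import Path
from typing import Dict


def build_manifests(name: str, image: str, port: int, config: Dict[str, str]) -> Dict[str, dict]:
    labels = {"app": name}
    configmap = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{name}-config"},
        "data": config,  # non-confidential settings; use a Secret for credentials
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Expose every ConfigMap key as an environment variable.
                        "envFrom": [{"configMapRef": {"name": f"{name}-config"}}],
                        # CPU requests are needed for CPU-utilization autoscaling.
                        "resources": {"requests": {"cpu": "500m", "memory": "512Mi"}},
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",  # reachable on a port of every node
            "selector": labels,
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    autoscaler = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": 2,
            "maxReplicas": 10,
            "metrics": [{
                "type": "Resource",
                "resource": {"name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}},
            }],
        },
    }
    return {
        "configmap.json": configmap,
        "deployment.json": deployment,
        "service.json": service,
        "autoscale.json": autoscaler,
    }


def write_manifests(manifests: Dict[str, dict], directory: str) -> None:
    """Write each manifest to its own JSON file in `directory`."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for filename, manifest in manifests.items():
        (out / filename).write_text(json.dumps(manifest, indent=2))
```

For example, `write_manifests(build_manifests("model-server", "registry.example.com/model-server:1.0", 8080, {"MODEL_PATH": "/models/model.pkl"}), "manifests")` followed by `kubectl apply -f manifests/` would create all four objects; the registry URL and model path are hypothetical.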

What Is Kubeflow?

Kubeflow is Kubernetes for ML. 

More specifically, it is an open-source machine learning platform built on top of Kubernetes that makes the development and deployment of ML workflows simple, portable, and scalable.

Figure 4: Architectural view of Kubeflow components (Source: Kubeflow)

While Kubeflow supports model deployment through direct integrations with model serving frameworks such as TensorFlow Serving, KFServing (now KServe), MLRun, and Seldon, its main focus is model development.

There are three main Kubeflow components supporting model development:

  • Kubeflow Notebooks is a web-based development environment that allows users to create JupyterLab, RStudio, and Visual Studio Code (code-server) notebooks directly in the cluster with easy sharing and scalability.
  • Kubeflow Pipelines is an end-to-end pipeline orchestration tool that bridges the gap between ML experimentation and productionization. Because each pipeline step is a containerized task, any ML framework for data processing and training can be used.
  • Katib provides advanced, out-of-the-box AutoML functionality such as hyperparameter tuning with Bayesian optimization, Hyperband, and neural architecture search.
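Katib runs these search algorithms for you, but the core idea behind Hyperband is easy to sketch. The Python below implements successive halving, the subroutine that Hyperband repeats across several brackets with different trade-offs between the number of configurations and the per-configuration budget. It is a toy illustration with a made-up loss function, not Katib's implementation or API.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_budget: int = 1,
    eta: int = 3,
) -> Config:
    """Train every candidate on a small budget, keep the best 1/eta,
    multiply the budget by eta, and repeat until one candidate remains.

    `evaluate(config, budget)` returns a loss (lower is better) after
    training `config` for `budget` units, e.g., epochs.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        survivors.sort(key=lambda config: evaluate(config, budget))
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(config: Config, budget: int) -> float:
    # Hypothetical stand-in for a training run: the loss shrinks with more
    # budget and is lowest when the learning rate is close to 0.01.
    return abs(config["lr"] - 0.01) + 1.0 / budget


candidates = [{"lr": lr} for lr in (0.001, 0.005, 0.01, 0.02, 0.03, 0.05, 0.1, 0.5, 1.0)]
best = successive_halving(candidates, toy_loss)
print(best)  # prints {'lr': 0.01}
```

With nine candidates and eta=3, the first round spends 1 unit on each of the 9 configurations and the second spends 3 units on each of the 3 survivors, so poor learning rates are discarded before they consume much compute.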

Figure 5: Kubeflow’s central dashboard with a view of multiple tools on the left menu (Source: Kubeflow)

Last but not least, Kubeflow comes with a pre-built central dashboard, which makes it easy for practitioners to adopt.

How Can Kubernetes Support an End-to-End Model Lifecycle?

As mentioned before, Kubernetes supports end-to-end production-grade ML lifecycles.

Kubernetes for (Re-)Training Pipeline Automation

Automating training pipelines with Kubernetes is not yet common practice, largely because of Kubernetes’ steep learning curve. While engineers routinely use Kubernetes for a broad set of applications, it is rarely a skill expected of data scientists.

Still, containerizing training pipelines is bound to become the new standard, as it leads to the well-known benefits of scalability, modularity, portability, and reproducibility. 

In a Kubernetes-based training or batch inference pipeline, each task is a Dockerized container with well-defined inputs, logic, and outputs. A pipeline is composed of multiple tasks, each running with its own hardware and software specifications to optimize performance.

As previously mentioned, Kubeflow provides this capability through Kubeflow Pipelines.
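The task-and-artifact model described above can be sketched in a few lines of Python. Each hypothetical Task declares the artifacts it consumes and produces, plus the container image and resources it would request; here the steps run as local functions in dependency order (using the standard library's graphlib, Python 3.9+) rather than as containers, so this shows the structure, not the Kubeflow Pipelines SDK.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Task:
    """One pipeline step; on Kubernetes it would run in its own container."""
    name: str
    func: Callable[..., Dict[str, object]]
    inputs: List[str] = field(default_factory=list)   # artifacts consumed
    outputs: List[str] = field(default_factory=list)  # artifacts produced
    image: str = "python:3.11-slim"  # hypothetical per-task container image
    cpu: str = "1"                   # hypothetical per-task CPU request
    gpus: int = 0                    # hypothetical per-task GPU request


def run_pipeline(tasks: List[Task]) -> Dict[str, object]:
    """Run tasks in dependency order, passing artifacts between them."""
    producers = {artifact: t.name for t in tasks for artifact in t.outputs}
    # Each task depends on whichever tasks produce its inputs.
    graph = {t.name: {producers[i] for i in t.inputs} for t in tasks}
    by_name = {t.name: t for t in tasks}
    artifacts: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        result = task.func(**{i: artifacts[i] for i in task.inputs})
        missing = set(task.outputs) - set(result)
        if missing:
            raise ValueError(f"task {name!r} did not produce {sorted(missing)}")
        artifacts.update({k: result[k] for k in task.outputs})
    return artifacts


def load_data() -> Dict[str, object]:
    return {"raw": [1.0, 2.0, 3.0, 4.0]}


def scale(raw: List[float]) -> Dict[str, object]:
    peak = max(raw)
    return {"features": [x / peak for x in raw]}


def train(features: List[float]) -> Dict[str, object]:
    # Toy "model": the mean of the scaled features.
    return {"model": sum(features) / len(features)}


# Listed out of order on purpose: the dependency graph determines execution order.
pipeline = [
    Task("train", train, inputs=["features"], outputs=["model"],
         image="example/trainer:latest", cpu="4", gpus=1),
    Task("scale", scale, inputs=["raw"], outputs=["features"]),
    Task("load", load_data, outputs=["raw"]),
]
print(run_pipeline(pipeline)["model"])  # prints 0.625
```

In Kubeflow Pipelines the same idea is expressed with its Python SDK: each step becomes a containerized component, and the platform, rather than a local loop, schedules it onto nodes that satisfy its resource requests.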

Kubernetes for Real-Time Deployment Automation

The most common use of Kubernetes is deploying microservices, which makes real-time ML inference its most natural application in MLOps.

Any serving framework can be packaged and deployed, and Kubernetes’ built-in scalability makes it straightforward to tune real-time applications for latency and throughput.
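Horizontal scaling is a large part of that. The Kubernetes HorizontalPodAutoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a tolerance (0.1 by default). The Python sketch below reproduces only that arithmetic, with the replica bounds applied afterward; the real controller also handles stabilization windows, missing metrics, and pods that are not yet ready.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HorizontalPodAutoscaler rule for a single metric,
    e.g., average CPU utilization (%) across the current pods."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target: scale out to 6.
print(desired_replicas(4, current_metric=90, target_metric=60))
# 4 pods averaging 30% CPU against a 60% target: scale in to 2.
print(desired_replicas(4, current_metric=30, target_metric=60))
```

Rounding up means a partially overloaded deployment always gets another pod, and the tolerance band keeps the autoscaler from flapping on small metric fluctuations.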

Figure 6: A simple FastAPI app for model serving

Because Kubernetes handles container orchestration for us, model serving is as simple as writing an app like the one in Figure 6, building a Docker image for it, and running that image on Kubernetes.
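Figure 6 uses FastAPI. As a dependency-free stand-in, the sketch below serves a hypothetical toy linear model with Python's built-in http.server: POST /predict accepts JSON such as {"features": [1, 2, 3]}, and GET /healthz gives Kubernetes liveness and readiness probes something to call. The weights, routes, and port 8080 are assumptions, and http.server is not recommended for production traffic, which is one reason frameworks like FastAPI are used instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List

# Hypothetical "model": fixed linear weights standing in for a trained artifact.
WEIGHTS = [0.5, -0.25, 1.0]
BIAS = 0.1


def predict(features: List[float]) -> float:
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Lightweight endpoint for Kubernetes liveness/readiness probes.
        if self.path == "/healthz":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/predict":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            self._send_json(200, {"prediction": predict(body["features"])})
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, a missing "features" key, or a wrong feature count.
            self._send_json(400, {"error": str(exc)})

    def _send_json(self, status: int, payload: dict) -> None:
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    # Bind to all interfaces so the Service can reach the container port.
    HTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

Saved as app.py (a hypothetical filename), it could be started in a container with CMD ["python", "-c", "import app; app.serve()"], with 8080 set as the Deployment's containerPort.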

Should You Opt for Self-Hosted or Managed Kubernetes/Kubeflow?

Kubernetes and Kubeflow are open-source systems, and most MLOps products are built on top of them, so it can be difficult to choose between self-managing them and adopting one of their many managed solutions.

That said, unless you have highly custom requirements, a self-managed solution is rarely recommended. Managed Kubernetes and Kubeflow solutions such as Iguazio are the best option, as they minimize engineering overhead and maintenance while providing a set of well-established MLOps best practices to follow.