#MLOPSLIVE WEBINAR SERIES
Session #11
Handling Large Datasets in Data Preparation & ML Training Using MLOps
In this technical training session, we’ll explore how to use Dask, Kubernetes, and MLRun to scale data preparation and model training for maximum performance.
Dask is an open-source library for parallel computing written in Python. It can be used in conjunction with MLRun, an open-source MLOps orchestration tool, over Kubernetes to handle large-scale datasets.
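As a quick illustration (not taken from the session itself), the sketch below shows how Dask mirrors the familiar pandas API while distributing work across partitions; the dataset path and column names are placeholders.

```python
# Minimal Dask sketch: pandas-style code that runs in parallel.
# The dataset path and column names are illustrative placeholders.
import dask.dataframe as dd
from dask.distributed import Client

# With no address given, Client() starts a local cluster; in production this
# would point at a distributed scheduler (e.g. one running on Kubernetes).
client = Client()

# Lazily read a partitioned dataset that may be larger than memory.
df = dd.read_parquet("s3://example-bucket/events/*.parquet")

# Familiar pandas-style transformations, executed per partition in parallel.
daily_totals = df.groupby("event_date")["amount"].sum()

# compute() triggers the distributed execution and returns a pandas object.
print(daily_totals.compute())
```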
In this session, we will demonstrate how to use these tools to scale your data preparation and ML training with ease.
Watch this session to explore:
- An overview of the tools available for large-scale data processing in Python (PySpark, Dask, Vaex, and more), and how they are used with existing ML frameworks
- Understanding Dask and how to run the same native Python code at scale, without needing to learn other technologies such as Spark
- How to run Dask in a distributed and elastic way over Kubernetes to improve resource utilization (see the sketch after this list)
- How to deploy Dask-based data engineering and ML pipelines with MLRun and Kubeflow, in one click
- Further optimizations for handling large-scale data effectively and efficiently
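To ground the Kubernetes point above, here is a minimal sketch of spinning up an elastic Dask cluster with MLRun's Dask runtime. The function name, image, and resource values are illustrative assumptions, and attribute names may vary slightly between MLRun versions.

```python
# Sketch: an elastic Dask cluster on Kubernetes via MLRun's Dask runtime
# (kind="dask"). Names, image, and resource values are illustrative only;
# check the MLRun docs for the exact options in your version.
import mlrun

dask_cluster = mlrun.new_function("demo-dask", kind="dask", image="mlrun/ml-base")

# Run the scheduler and workers as pods in the cluster rather than locally.
dask_cluster.spec.remote = True

# Elastic scaling: the cluster can adapt between these worker counts.
dask_cluster.spec.min_replicas = 1
dask_cluster.spec.max_replicas = 8

# Per-worker resource requests so Kubernetes can schedule efficiently.
dask_cluster.with_requests(mem="2G", cpu="2")

# Accessing .client deploys the cluster (if needed) and returns a
# dask.distributed Client connected to it.
client = dask_cluster.client
```

From there, the same pandas-style Dask code shown earlier can run against `client`, and the function can be wired into an MLRun or Kubeflow pipeline as a single step.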