The MPI-Operator Horovod Service
The platform has a default, pre-deployed Kubeflow MPI Operator service (mpi-operator). This is a shared, single-instance service that is available tenant-wide, and it facilitates running Uber's Horovod distributed deep-learning framework.
Horovod, which is preinstalled as part of the platform's Jupyter Notebook service, is widely used to train machine-learning models in parallel across multiple GPUs or CPUs.
For more information about using Horovod to run applications over GPUs, see Running Applications over GPUs.
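To illustrate the MPI operator's role in a Horovod run, the following minimal Python sketch builds the kind of MPIJob custom resource that an MPI operator consumes. One launcher pod runs mpirun, and the worker pods provide the MPI slots in which the Horovod processes run. The sketch assumes the upstream Kubeflow kubeflow.org/v1 MPIJob schema (spec.slotsPerWorker and spec.mpiReplicaSpecs with Launcher and Worker replicas). The mpi-operator version deployed on the platform might use a different API version or different fields, so check the CRD installed on your cluster, for example with kubectl get crd mpijobs.kubeflow.org -o yaml. The function name build_horovod_mpijob, the container image, and the script path are hypothetical.

```python
import json


def build_horovod_mpijob(name, image, script, workers=2, slots_per_worker=1):
    """Build a hypothetical MPIJob manifest that runs a Horovod script via mpirun.

    ASSUMPTION: uses the upstream Kubeflow kubeflow.org/v1 MPIJob schema;
    verify it against the MPIJob CRD actually installed on your cluster.
    """
    # mpirun starts one process per slot, across all worker pods.
    num_procs = workers * slots_per_worker
    launcher = {
        "name": "mpi-launcher",
        "image": image,
        "command": ["mpirun"],
        "args": [
            "-np", str(num_procs),
            "--allow-run-as-root",
            "-bind-to", "none",
            "-map-by", "slot",
            "python", script,
        ],
    }
    # Workers only need the same image; the launcher drives them over MPI.
    worker = {"name": "mpi-worker", "image": image}
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": slots_per_worker,
            "mpiReplicaSpecs": {
                "Launcher": {
                    "replicas": 1,
                    "template": {"spec": {"containers": [launcher]}},
                },
                "Worker": {
                    "replicas": workers,
                    "template": {"spec": {"containers": [worker]}},
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image and script path, for illustration only.
    job = build_horovod_mpijob(
        "horovod-demo",
        "example.com/horovod-train:latest",
        "/workspace/train.py",
        workers=2,
        slots_per_worker=2,
    )
    print(json.dumps(job, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output could be saved to a file and submitted with kubectl apply -f. The -np value is derived from the number of workers multiplied by the slots per worker, so that mpirun starts exactly one Horovod process per available slot.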