Machine learning (ML) has the potential to impact businesses of all types and sizes by delivering an accelerated return on investment (ROI) through improved efficiency. For this reason, ML adoption is steadily increasing and is projected to keep growing.
Many organizations have established internal data science and machine learning teams, but developing custom machine learning solutions can still require excessive amounts of time and resources, which can discourage the adoption of new ML use cases.
Automated machine learning aims to lower both the barrier to entry and the ongoing resource requirements for developing real-life AI solutions. It does so by automating ML processes and best practices, making standardized, expert-created processes available to all.
While AutoML originally referred to the automatic selection of the model training approach alone, the term has grown to encompass the complete training phase: algorithm selection, feature engineering, feature selection, hyperparameter selection, and evaluation metric definition. The training phase is a central step of the end-to-end ML lifecycle, but not the only one. At Iguazio, we extend the concept of artificial intelligence automation to embrace all ML operations. This is AutoMLOps: the extension of AutoML to the entire ML lifecycle.
AutoMLOps provides automation in the form of pre-built and customizable components, pipelines, and infrastructure, to accelerate ML productionization using a production-first approach.
This article presents an introduction to AutoML and AutoMLOps, an overview of their benefits, a discussion of their controversial nature, and a walkthrough of how teams can adopt automated machine learning to boost their ML productivity.
In this article, AutoML refers to automated model training only; AutoMLOps refers to the automated productionization of ML models following MLOps best practices; and automated machine learning refers to both.
Automated machine learning is the process of using automation to provide ML solutions for real-world business applications.
AutoML is the process of automating the selection and parametrization of the machine learning algorithm for model training. AutoML is mostly implemented for gradient boosting on tabular data and for deep learning. Two prevalent strategies deserve a deeper dive: neural architecture search (NAS) and transfer learning (TL).
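Before turning to those two strategies, the basic idea of automated algorithm and hyperparameter selection can be sketched in a few lines. This is a minimal illustration, not how any particular AutoML product works: the toy dataset, the two candidate algorithms (a majority-class baseline and a hand-written k-nearest-neighbours classifier), and all function names are assumptions made up for the example. Each candidate is trained, scored on a held-out validation split, and the best one is selected automatically.

```python
import random
from collections import Counter


def make_data(n=200, seed=0):
    """Toy 2-D binary classification data: the label is 1 when x + y > 1."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(p, int(p[0] + p[1] > 1.0)) for p in points]


def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn_classifier(train, k=3):
    """k-nearest neighbours with a majority vote among the k closest points."""
    def predict(x):
        nearest = sorted(
            train,
            key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
        )[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical search space: (algorithm name, factory, hyperparameters).
CANDIDATES = [
    ("majority", majority_classifier, {}),
    ("knn", knn_classifier, {"k": 1}),
    ("knn", knn_classifier, {"k": 5}),
    ("knn", knn_classifier, {"k": 15}),
]


def auto_select(train, valid, candidates=CANDIDATES):
    """Train every candidate and return the best one by validation accuracy."""
    results = []
    for name, factory, params in candidates:
        model = factory(train, **params)
        results.append((accuracy(model, valid), name, params))
    best = max(results, key=lambda r: r[0])
    return best, results


data = make_data()
train, valid = data[:150], data[150:]
best, results = auto_select(train, valid)
print("selected:", best[1], best[2], "validation accuracy:", best[0])
```

Real AutoML tools replace the exhaustive loop with smarter search and add feature engineering and metric selection, but the train, evaluate, and select structure is the same.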
NAS automates the design of one or more neural network (NN) architectures. The NAS algorithm defines, trains, and evaluates multiple NN architectures selected from its search space given its search strategy, and proposes candidate architectures that optimize the performance estimation strategy.
Most commonly, candidate architectures are optimized for predictive performance alone, but multi-objective search should be considered for advanced applications that must also optimize for other objectives, such as memory consumption and latency.
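The three NAS ingredients named above (search space, search strategy, and performance estimation strategy) can be illustrated with a small sketch. Everything here is an assumption chosen for illustration: the search space covers only the depth and width of a fully connected network, the search strategy is plain random search, and `estimate_error` is a toy formula standing in for actually training and validating each network. The multi-objective part keeps the Pareto-optimal architectures with respect to estimated error and parameter count, the latter serving as a proxy for memory consumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    depth: int  # number of hidden layers
    width: int  # units per hidden layer


# Search space: the set of architectures the search may propose.
SEARCH_SPACE = {"depth": [1, 2, 3, 4], "width": [16, 32, 64, 128]}


def sample_architecture(rng):
    """Search strategy: plain random sampling from the search space."""
    return Architecture(
        rng.choice(SEARCH_SPACE["depth"]), rng.choice(SEARCH_SPACE["width"])
    )


def parameter_count(arch, n_inputs=20, n_outputs=1):
    """Weights plus biases of a fully connected network (memory proxy)."""
    sizes = [n_inputs] + [arch.width] * arch.depth + [n_outputs]
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))


def estimate_error(arch):
    """HYPOTHETICAL stand-in for the performance estimation strategy.

    A real NAS run would train the network and measure validation error;
    this toy formula only makes error shrink as capacity grows.
    """
    return 1.0 / (1.0 + 0.01 * arch.depth * arch.width)


def dominates(a, b):
    """a dominates b if it is no worse on all objectives and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def random_search(n_trials=20, seed=0):
    """Evaluate sampled architectures and return the Pareto-optimal ones."""
    rng = random.Random(seed)
    candidates = {sample_architecture(rng) for _ in range(n_trials)}
    scored = {a: (estimate_error(a), parameter_count(a)) for a in candidates}
    pareto = [
        a for a in scored
        if not any(dominates(scored[b], scored[a]) for b in scored if b != a)
    ]
    return scored, sorted(pareto, key=parameter_count)


scored, pareto = random_search()
for arch in pareto:
    print(arch, "estimated error and parameters:", scored[arch])
```

A single-objective search would simply return the architecture with the lowest estimated error; the Pareto front instead leaves the final trade-off between accuracy and size to the practitioner.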
TL enables developers to use pre-trained models, typically state-of-the-art models trained on large datasets, as the starting point for a new model for their ML task. The pre-trained model belongs to the same ML application as the new task, e.g., an Inception model pre-trained on ImageNet for an image classification task. This model is extended with a (small) new set of tunable parameters and layers that are trained on the new dataset to perform the new task. TL can reduce development time and costs considerably, and can also unlock use cases where only small datasets are available.
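The mechanics of transfer learning, a frozen pre-trained base extended with a small trainable head, can be sketched without any deep learning framework. This is a toy illustration under stated assumptions: `PRETRAINED_WEIGHTS` is a hand-picked matrix standing in for a real pre-trained model such as Inception, the dataset is 20 synthetic points, and the new head is a logistic regression trained by gradient descent. Only the head and its bias are updated; the base is never touched.

```python
import math
import random

# HYPOTHETICAL "pre-trained" base: fixed weights standing in for a large
# model trained elsewhere. They stay frozen throughout training.
PRETRAINED_WEIGHTS = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]


def extract_features(x):
    """Frozen base model: a fixed linear layer followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)))
        for row in PRETRAINED_WEIGHTS
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(head, bias, data):
    """Mean binary cross-entropy of the base plus head on the dataset."""
    total = 0.0
    for x, y in data:
        z = sum(h * f for h, f in zip(head, extract_features(x))) + bias
        p = min(max(sigmoid(z), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train_head(data, epochs=200, lr=0.5):
    """Train only the new head (logistic regression) on frozen features."""
    head = [0.0] * len(PRETRAINED_WEIGHTS)
    bias = 0.0
    # Features are computed once because the base never changes.
    features = [(extract_features(x), y) for x, y in data]
    n = len(features)
    for _ in range(epochs):
        grad_head = [0.0] * len(head)
        grad_bias = 0.0
        for f, y in features:
            err = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias) - y
            for i, fi in enumerate(f):
                grad_head[i] += err * fi
            grad_bias += err
        head = [h - lr * g / n for h, g in zip(head, grad_head)]
        bias -= lr * grad_bias / n
    return head, bias


# Small new dataset for the new task: label is 1 when x0 > x1.
rng = random.Random(0)
data = []
for _ in range(20):
    x = (rng.random(), rng.random())
    data.append((x, int(x[0] > x[1])))

head, bias = train_head(data)
print("loss before:", log_loss([0.0] * 3, 0.0, data))
print("loss after: ", log_loss(head, bias, data))
```

Because only four numbers are trained, a handful of examples is enough, which mirrors why TL helps when only small datasets are available.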
AutoMLOps is the process of automating the deployment and productionization of machine learning models. Building on MLOps best practices, AutoMLOps focuses on providing serverless and automated ML pipelines and ML operations. This includes:
AutoMLOps is a recent development that extends AutoML beyond model training to encompass the end-to-end ML lifecycle. For more information on this development, check out Iguazio’s webinar episode.
Automated machine learning can unlock new ML potential for teams of all sizes, democratizing access to ML use cases. Its most obvious benefit is improved efficiency: automated machine learning can reduce the effort needed to develop AI applications from weeks or months to just days, or even hours. A greater number of AI applications can thus be tested and productionized via a production-first approach, which has the potential to deliver value to businesses in a short time frame.
AI applications powered by automated ML are often better than non-automated ML applications with respect to both operational performance and model performance. Operational performance is improved because automated machine learning applications come with a combination of:
Automated machine learning is based on established out-of-the-box best practices, and reduces the possibility of human error and bias. More accurate models and model pipelines lead to increased revenue, access to new business opportunities, and a better customer experience.
AutoML has risen to fame for being a step towards general AI but, at the same time, it has been criticized for being a “black box”—the impossibility of knowing the inner workings of the automated solution makes it hard for practitioners to trust AutoML and tailor it to their applications.
Both perspectives have some truth, but are based on an incorrect view of automated ML: practitioners should look at it not as a complete solution, but as an enabler or facilitator.
AutoML can be applied to experiment on data science solutions. Data scientists can use AutoML to:
Approaches recommended by AutoML should always be reviewed by the data science team for efficiency, performance, and transparency.
AutoMLOps can be applied to standardize experimentation and productionize ML applications by deploying pre-built components, pipelines, and infrastructure. These can—and should—be parameterized and customized by the ML engineering team for optimal performance for the specific use case.
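The idea of a pre-built pipeline that the team parameterizes for its own use case can be sketched as follows. This is a generic illustration and not the API of MLRun or any other tool: the `Step` class, the three toy steps, and the `overrides` mechanism are all hypothetical names invented for the example. The pipeline ships with default parameters, and a team overrides only the ones that matter for its application.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Step:
    """One pre-built pipeline component with its default parameters."""
    name: str
    run: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)


def clean(rows, drop_missing=True):
    """Drop missing values, or replace them with 0.0 when asked not to drop."""
    if drop_missing:
        return [r for r in rows if r is not None]
    return [0.0 if r is None else r for r in rows]


def scale(rows, factor=1.0):
    """Multiply every value by a constant factor."""
    return [r * factor for r in rows]


def summarize(rows):
    """Final step: report simple statistics on the processed data."""
    return {"count": len(rows), "mean": sum(rows) / len(rows)}


def build_default_pipeline() -> List[Step]:
    """Pre-built pipeline with defaults chosen by the (hypothetical) provider."""
    return [
        Step("clean", clean, {"drop_missing": True}),
        Step("scale", scale, {"factor": 1.0}),
        Step("summarize", summarize),
    ]


def run_pipeline(
    steps: List[Step],
    data: Any,
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Any:
    """Run the steps in order, letting per-step overrides replace defaults."""
    overrides = overrides or {}
    for step in steps:
        params = {**step.params, **overrides.get(step.name, {})}
        data = step.run(data, **params)
    return data


raw = [1.0, None, 3.0]
print(run_pipeline(build_default_pipeline(), raw))
print(run_pipeline(build_default_pipeline(), raw, {"scale": {"factor": 2.0}}))
```

Keeping defaults and overrides separate means the pre-built pipeline stays reusable across projects while each team's customizations remain explicit and reviewable.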
With this perspective, it becomes evident that automated ML is not replacing data scientists and ML engineers, but enhancing their productivity by:
While it is possible to create an automated machine learning solution from scratch, the complexity of the task warrants the exploration and use of tools made available by third-party providers.
All major cloud providers offer tools for AutoML, such as Google Vertex AI AutoML and Azure Automated ML. These are proprietary, black-box implementations that support image, tabular, text, and video data formats for a variety of ML tasks within a no-code UI workflow or Python SDK. Open-source solutions exist too, such as AutoKeras and Auto-Sklearn.
For AutoMLOps, tools are available from all cloud providers, numerous third-party providers, and some open-source projects. Typically, these are mixed and matched to create a complete automated solution for the end-to-end ML lifecycle. Examples include Amazon SageMaker for various MLOps tasks, Neptune for experiment tracking, and Feast as a feature store. Alternatively, MLRun covers all of these components and more in one completely open, managed, and integrated framework.
Iguazio’s MLOps platform offers an enterprise-grade integrated platform for data science and MLOps, providing automation for the end-to-end ML lifecycle. Try it for free today and see how you can optimize your ML pipelines!