# LLMs.txt - Sitemap for AI content discovery
# Iguazio
> The Iguazio AI platform operationalizes and de-risks ML & gen AI applications at scale. Turn your AI projects into real business impact.
---
## Pages
- [De-Risk Your Gen AI Applications](https://www.iguazio.com/solutions/de-risk-your-gen-ai-applications/): Eliminate LLM risks for Gen AI apps. Meet compliance, reduce toxicity and bias and ensure peak model performance.
- [Company](https://www.iguazio.com/company/): We’re building a faster way for data science to flow seamlessly from raw idea to real-world impact. This is data science brought to life.
- [Gen AI Ops](https://www.iguazio.com/mlops-for-generative-ai/): Generative AI Ops streamlines the way you operationalize and scale your generative AI models with an approach that focuses on business value.
- [CI/CD for ML](https://www.iguazio.com/solutions/ci-cd-for-ml/): Automate and simplify the building, testing, deployment and monitoring of AI. Continuously train, test and deploy your LLMs and ML models.
- [AI in Secure IT Environments: AWS GovCloud & SCIF](https://www.iguazio.com/secure-it-environments-aws-govcloud/): The Iguazio AI Platform allows data science teams to run their machine learning pipelines on AWS GovCloud or in a SCIF.
- [Data Mesh](https://www.iguazio.com/solutions/data-mesh-for-mlops/): Implement a Data Mesh approach to data architecture to make data accessible, interconnected and valuable across the organization
- [Gaming](https://www.iguazio.com/solutions/gaming/): Use data science to optimize gaming experiences. Anticipate users' next move, reduce churn, increase ROI and provide the best user experience possible.
- [Security](https://www.iguazio.com/security/): Securely develop, deploy and manage real-time AI applications at scale. Secure Environment for Development.
- [ODSC MLOps Resource Center](https://www.iguazio.com/mlops-resource-center/): Thanks for meeting with us! Enjoy the full MLOps resource center below, with the best resources from our experts.
- [Technology](https://www.iguazio.com/technology/): The Iguazio AI Platform enables you to develop, deploy and manage real-time AI applications at scale.
- [Iguazio’s ESG Strategy](https://www.iguazio.com/esg-strategy/): Iguazio is fully committed to sustainability and to helping our customers and partners make real-world impact where it matters most.
- [Questions](https://www.iguazio.com/questions/): Find answers to questions about data science, machine learning and serverless automation, and meet Iguazio experts at the industry's leading events.
- [Energy and Utilities](https://www.iguazio.com/solutions/energy-and-utilities/): Energy and utilities operate in data-rich environments of customer and sensor data which can be leveraged using AI to transform the way they do business.
- [Hackathon Terms](https://www.iguazio.com/hackathon-terms/): Find the MLOps for Good Hackathon terms and conditions here. Read them in full and contact us with any questions or suggestions.
- [MLOps Glossary](https://www.iguazio.com/glossary/): This glossary will help you understand the main terms, acronyms, and abbreviations related to AI apps at scale in an end-to-end MLOps platform.
- [Technology OLD](https://www.iguazio.com/technology-old-2/): The Iguazio Data Science Platform enables you to develop, deploy and manage real-time AI applications at scale.
- [MLRun](https://www.iguazio.com/open-source/mlrun/): Open-source AI orchestration framework to run code either locally on your PC or on a large-scale Kubernetes cluster.
- [MLOps](https://www.iguazio.com/mlops/): MLOps is the practice of creating continuous development, integration and delivery (CI/CD) of data and ML intensive applications.
- [Real-Time Feature Engineering](https://www.iguazio.com/real-time-feature-engineering/): Feature engineering made simple: Ingest real-time data, perform data transformation, generate and share real-time features across teams.
- [Model Monitoring](https://www.iguazio.com/solutions/model-monitoring/): Continuously track models in production to automatically detect drift and maintain accuracy in rapidly changing live environments
- [Healthcare](https://www.iguazio.com/solutions/healthcare/): The Iguazio Data Science Platform for Healthcare enables healthcare facilities to develop, deploy and manage AI applications.
- [Home - New](https://www.iguazio.com/): The Iguazio AI platform operationalizes and de-risks ML & gen AI applications at scale. Turn your AI projects into real business impact.
- [Customers](https://www.iguazio.com/customers/): Transforming AI initiatives into real-world outcomes with MLOps. Learn how customers use the Iguazio MLOps Platform.
- [Integrated Feature Store](https://www.iguazio.com/feature-store/): Engineer online and offline features, and make them accessible to everyone with one fully integrated feature store.
- [What Is A Machine Learning Pipeline?](https://www.iguazio.com/machine-learning-pipeline/): A machine learning pipeline helps to streamline and speed up the process by automating these workflows and linking them together.
- [What Are Machine Learning Pipeline Tools?](https://www.iguazio.com/machine-learning-pipeline-tools/): A machine learning pipeline tool helps automate and streamline machine learning pipelines. Learn more on our page.
- [What Is Enterprise Data Science?](https://www.iguazio.com/enterprise-data-science/): Enterprise data science combines data scientists, data engineers, IT teams, and more to generate value out of big data.
- [Machine Learning Operations (MLOps)](https://www.iguazio.com/machine-learning-operations-mlops/): Machine learning operations (MLOps) is considered to be the backend supporting ML applications in business.
- [What is Operationalizing Machine Learning?](https://www.iguazio.com/operationalizing-machine-learning/): Operationalizing machine learning is one of the final stages before deploying and running an ML model in a production environment.
- [MLOps Live Webinar Series](https://www.iguazio.com/mlops-live-webinar-series/): Webinars with industry leaders sharing practical advice & demonstrating how they’ve made real business impact by bringing data science to life.
- [Iguazio Support Policy](https://www.iguazio.com/supportpolicy/): Explore Iguazio's support policy for seamless assistance. Find out how we are dedicated to helping you every step of the way.
- [Customer Support](https://www.iguazio.com/support/): Iguazio’s dedicated specialists are here to support you in deploying real-world data science applications for business impact.
- [Terms of Use](https://www.iguazio.com/terms-of-use/): Discover the comprehensive terms of use for the Iguazio platform. Ensure safe & compliant use by acquainting yourself with our policies.
- [Privacy Policy](https://www.iguazio.com/privacy-policy/): Understand your privacy rights on the Iguazio website. Learn how we protect your data and prioritize your privacy.
- [Career Inner Page](https://www.iguazio.com/career-inner-page/): We’re building the future of data science. Join our team of experts, see our open positions.
- [AI Pipeline Orchestration](https://www.iguazio.com/solutions/pipeline-orchestration/): Manage your gen AI and ML workflows at scale to get from pilot to production faster. Hybrid and on-prem deployment. Automation.
- [News & Events](https://www.iguazio.com/news-events/): Read the latest news on Iguazio and meet our experts at the industry's leading events for data science, ML and AI.
- [Nuclio](https://www.iguazio.com/open-source/nuclio/): Discover Nuclio for seamless serverless pipeline automation, enhancing scalability and performance for your data applications.
- [Careers](https://www.iguazio.com/careers/): We're building the future of AI. If you think you're the perfect fit for Iguazio send us your CV. Join Us!
- [Serverless Automation](https://www.iguazio.com/solutions/serverless-automation/): Automate each step of the pipeline with enterprise performance, scale and reliability using Nuclio, the open source serverless framework
- [Ad-Tech](https://www.iguazio.com/solutions/ad-tech/): Use data science to optimize ad-tech. Anticipate users' next move, increase engagement and provide the best user experience possible.
- [Smart Mobility](https://www.iguazio.com/solutions/smart-mobility/): The Iguazio AI platform for Smart Mobility enables you to operationalize and de-risk AI applications.
- [Retail](https://www.iguazio.com/solutions/retail/): Transform retail with Iguazio's AI-driven platform. Boost sales, optimize operations, and enhance customer experience.
- [Manufacturing](https://www.iguazio.com/solutions/manufacturing/): Enhance manufacturing efficiency with Iguazio's AI platform. Real-time insights and AI-driven solutions.
- [Telecommunications](https://www.iguazio.com/solutions/telecommunications/): Automatically create self-healing systems, improving network optimization, churn reduction, customer engagement and more.
- [Solutions](https://www.iguazio.com/solutions/): Leverage artificial intelligence to power true business impact with AI across industries, technologies and use-cases.
- [GPU Management](https://www.iguazio.com/solutions/gpu-as-a-service/): Accelerate AI, deep learning and data processing using a GPU provisioning solution, for faster and scalable AI-based applications
- [Open Source](https://www.iguazio.com/open-source/): Iguazio's open source projects for Real-time Serverless Functions and ML Pipeline Orchestration
- [Partners](https://www.iguazio.com/partners/): Join our channel and technology partner programs to work with a pioneering data science platform and benefit from Iguazio's growing partner ecosystem.
- [Contact](https://www.iguazio.com/contact/): Have a question or just need more info about Iguazio? Contact us and we’ll get back to you!
- [Financial Services](https://www.iguazio.com/solutions/financial-services/): From mitigating risks proactively, to creating personalized customer services, harness the power of AI to transform your bottom line.
- [AI Blog](https://www.iguazio.com/blog/): Keep up to date on the latest trends in AI, genAI, data science & MLOps (operationalizing machine learning).
- [Platform](https://www.iguazio.com/platform/): Meet the platform that automates AI and cuts the time to impact of your gen AI and ML applications. Click for more.
## Posts
- [Introducing Agentic RAG: The Best of Both Worlds](https://www.iguazio.com/blog/introducing-agentic-rag-the-best-of-both-worlds/): How Agentic RAG revolutionizes AI-powered applications by making them more autonomous, intelligent, and context-aware.
- [Gen AI Trends and Scaling Strategies for 2025](https://www.iguazio.com/blog/gen-ai-trends-and-scaling-strategies-for-2025/): How should enterprises respond to the turbo charged changes in gen AI trends? Gartner's strategies for the year ahead.
- [AI Agent Training: Essential Steps for Business Success](https://www.iguazio.com/blog/ai-agent-training-essential-steps-for-business-success/): How to train AI agents that are not only powerful but also reliable, scalable, and aligned with business goals. Read more here.
- [Best 13 Free Financial Datasets for Machine Learning [Updated]](https://www.iguazio.com/blog/best-13-free-financial-datasets-for-machine-learning/): Developing ML models for financial or economic use cases? Here are 13 great open financial datasets to develop and train ML models.
- [Gen AI or Traditional AI: When to Choose Each One](https://www.iguazio.com/blog/gen-ai-or-traditional-ai-when-to-choose-each-one/): How do you decide when to use traditional AI vs. generative AI? Here's our roadmap to achieving your business goals with each.
- [Top Gen AI Demos of AI Applications With MLRun](https://www.iguazio.com/blog/top-gen-ai-demos-of-ai-applications-with-mlrun/): A roundup of our top gen AI demo videos showing how to build and manage AI applications with open-source MLRun.
- [6 Best Practices for Implementing Generative AI](https://www.iguazio.com/blog/6-best-practices-for-implementing-generative-ai/): This guide outlines six best practices to ensure your generative AI initiatives are effective: valuable, scalable, compliant and future-proof.
- [2025 Gen AI Predictions: What Lies Ahead?](https://www.iguazio.com/blog/2025-gen-ai-predictions-what-lies-ahead/): Cheers to a successful 2025! Who will thrive in this new ecosystem? Here are my predictions for the upcoming year.
- [Choosing the Right-Sized LLM for Quality and Flexibility: Optimizing Your AI Toolkit](https://www.iguazio.com/blog/choosing-the-right-sized-llm-for-quality-and-flexibility-optimizing-your-ai-toolkit/): Are you using an LLM that is fit to your specific use case? Here's our guide on choosing and optimizing LLMs for performance and cost.
- [MLRun v1.7 Launched — Solidifying Generative AI Implementation and LLM Monitoring](https://www.iguazio.com/blog/mlrun-v1-7-launched-solidifying-generative-ai-implementation-and-llm-monitoring/): MLRun 1.7 is now available with powerful features for GenAI implementation, with a special emphasis on LLM monitoring.
- [Gen AI for Marketing - From Hype to Implementation](https://www.iguazio.com/blog/gen-ai-marketing-hype-implementation/): Gen AI for marketing use cases - how? Read to learn a staged approach for rolling out gen AI, use cases, a demo & examples
- [Implementing Gen AI in Regulated Sectors: Finance, Telecom, and More](https://www.iguazio.com/blog/how-to-implement-gen-ai-in-highly-regulated-environments/): Gen AI challenges are exacerbated in highly-regulated industries, such as financial services and telecommunications. Here's what to do.
- [Building Scalable Gen AI Apps with Iguazio and MongoDB](https://www.iguazio.com/blog/building-gen-ai-applications-with-iguazio-and-mongodb/): Discover how to build & scale gen AI apps with Iguazio (acquired by McKinsey) and MongoDB - with simplicity, performance & risk mitigation
- [RAG vs Fine-Tuning: Navigating the Path to Enhanced LLMs](https://www.iguazio.com/blog/rag-vs-fine-tuning/): RAG augments models with external resources at inference time; fine-tuning involves further training on specialized datasets. Learn when to use each one and why.
- [Commercial vs. Self-Hosted LLMs: A Cost Analysis & How to Choose the Right Ones for You](https://www.iguazio.com/blog/commercial-vs-self-hosted-llms/): How to choose a proprietary LLM vs. self-hosting open-source. See use cases, requirements and benchmarks, and optimization tools for LLMs.
- [Transforming Enterprise Operations with Gen AI](https://www.iguazio.com/blog/transforming-enterprise-operations-with-gen-ai/): Effectively implement and scale gen AI while avoiding risk. See use cases, from R&D to automotive to the supply chain.
- [Future-Proofing Your App: Strategies for Building Long-Lasting Apps](https://www.iguazio.com/blog/future-proofing-your-gen-ai-aplications-strategies-for-building-long-lasting-apps/): Learn to build gen AI pipelines that are modular, so they can support up-to-date LLM deployment and management
- [LLM Validation and Evaluation](https://www.iguazio.com/blog/llm-validation-and-evaluation/): LLM validation & evaluation is about assessing the performance and capabilities. Learn LLM methods and see a demo with crowd sourcing.
- [Integrating LLMs with Traditional ML: How, Why & Use Cases](https://www.iguazio.com/blog/integrating-llms-with-traditional-ml-how-why-use-cases/): Integrating LLMs with traditional ML models enhances each model’s capabilities. Discover the benefits of integration and example use cases.
- [LLM Metrics: Key Metrics Explained](https://www.iguazio.com/blog/llm-metrics-key-metrics-explained/): The top LLM metrics to measure for higher performing models at higher efficiency, while ensuring privacy and eliminating bias and toxicity.
- [Generative AI in Call Centers: How to Transform and Scale Superior Customer Experience](https://www.iguazio.com/blog/generative-ai-in-call-centers-how-to-transform-and-scale-superior-customer-experience/): The coming transformative impact of gen AI on call centers, and how to build a gen AI call center analysis app now.
- [Why You Need GPU Provisioning for GenAI](https://www.iguazio.com/blog/why-you-need-gpu-as-a-service-for-genai/): GPU provisioning simplifies the management of GPUs to improve performance and save significant costs. Here's how and why to leverage it.
- [Best 10 Free Datasets for Manufacturing [UPDATED]](https://www.iguazio.com/blog/free-manufacturing-datasets/): Here are 10 excellent open manufacturing datasets and data sources for machine learning. Get them now.
- [Implementing Gen AI for Financial Services](https://www.iguazio.com/blog/implementing-gen-ai-for-financial-services/): How Gen AI revolutionizes financial services, its potential impact, pitfalls, and strategies. Insights from industry experts included.
- [LLMOps vs. MLOps: Understanding the Differences](https://www.iguazio.com/blog/llmops-vs-mlops-understanding-the-differences/): LLMOps applies MLOps principles to LLMs. This post delves into the concepts of LLMOps and MLOps, explaining how and when to use each one.
- [Implementing Gen AI in Practice](https://www.iguazio.com/blog/implementing-genai-in-practice/): How to build an ‘AI Factory’ that streamlines and simplifies the process of rolling out new generative AI applications. Read more here.
- [How HR Tech Company Sense Scaled their ML Operations using Iguazio](https://www.iguazio.com/blog/how-hr-tech-company-sense-scaled-their-ml-operations-using-iguazio/): Customer story: How Sense built their AI chatbot using a complex NLP serving pipeline, and how they overcame some ML challenges with Iguazio.
- [What Lies Ahead in 2024? AI/ML Predictions for the New Year](https://www.iguazio.com/blog/ai-ml-predictions-for-the-new-year/): Iguazio CTO Yaron Haviv's predictions for the upcoming year: What will be different in the way businesses approach generative AI this year?
- [16 Best Free Human Annotated Datasets for Machine Learning](https://www.iguazio.com/blog/best-free-human-annotated-datasets-for-ml/): Human-annotated datasets offer a level of precision, nuance, and contextual understanding that automated methods struggle to match.
- [Introducing our New Book: Implementing MLOps in the Enterprise](https://www.iguazio.com/blog/introducing-our-new-book-implementing-mlops-in-the-enterprise/): A practical guide that helps IT leaders bring data science to life across a variety of real-world MLOps scenarios. Find it here.
- [Scaling MLOps Infrastructure: Components and Considerations for Growth](https://www.iguazio.com/blog/scaling-mlops-infrastructure-components-and-considerations-for-growth/): When it comes to scaling your MLOps operations, a high-quality, reliable and effective MLOps platform is essential for growth.
- [11 Best Free Retail Datasets for Machine Learning](https://www.iguazio.com/blog/13-best-free-retail-datasets-for-machine-learning/): Retail datasets for ML can be hard to find. Here are 11 excellent open retail datasets and data sources for your next ML project.
- [How to Build a Smart GenAI Call Center App](https://www.iguazio.com/blog/how-to-build-a-smart-genai-call-center-app/): Building a generative AI smart call center app is a promising solution for improving the customer experience and call center efficiency.
- [Top 27 Free Healthcare Datasets for Machine Learning](https://www.iguazio.com/blog/top-22-free-healthcare-datasets-for-machine-learning/): We pulled together 27 excellent open datasets in the field of healthcare for your next machine learning project. Learn more about it here.
- [Top 10 ODSC West Sessions You Must Attend in 2023](https://www.iguazio.com/blog/odsc-west-sessions-you-must-attend/): Planning to attend ODSC West 2023? We've compiled the top sessions at this conference that we're most looking forward to.
- [28 Best Free NLP Datasets for Machine Learning](https://www.iguazio.com/blog/nlp-datasets/): Building an AI application with NLP? You'll need a robust dataset. Here are some of the top open NLP datasets for you to leverage.
- [How to Mask PII Before LLM Training](https://www.iguazio.com/blog/how-to-mask-pii-before-llm-training/): The PII Recognizer is an open source function that can detect and anonymize PII data in datasets, so you can build Gen AI apps securely.
- [Model Observability and ML Monitoring: Key Differences and Best Practices](https://www.iguazio.com/blog/model-observability/): We delve into the distinctions between model observability and ML monitoring, shedding light on their unique attributes and functionalities.
- [Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects](https://www.iguazio.com/blog/implementing-mlops-5-key-steps-for-successfully-managing-ml-projects/): MLOps accelerates the ML deployment process. Here are the critical steps of MLOps and what to look for in an MLOps platform.
- [MLOps for Generative AI in the Enterprise](https://www.iguazio.com/blog/mlops-for-generative-ai-in-the-enterprise/): How to leverage LLMs in live enterprise applications, and embed Responsible AI principles into the process. Find more here.
- [Mastering ML Model Performance: Best Practices for Optimal Results](https://www.iguazio.com/blog/mastering-ml-model-performance-best-practices-for-optimal-results/): Evaluating ML model performance is essential for ensuring the reliability, quality, accuracy and effectiveness of your ML models.
- [What are the Advantages of Automated Machine Learning Tools?](https://www.iguazio.com/blog/automl-advantages/): The benefits of AutoML, the available tools and how to choose the right tool for your industry, plus how to implement AutoMLOps
- [Integrating MLOps with MLRun and Databricks](https://www.iguazio.com/blog/integrating-mlops-with-mlrun-and-databricks/): How to leverage multiple MLOps tools to streamline model serving for complex real-time use cases. Find here.
- [Deploying Machine Learning Models for Real-Time Predictions Checklist](https://www.iguazio.com/blog/deploying-machine-learning-models-for-real-time-predictions-checklist/): Here are our recommendations for data professionals who want to improve and streamline their real-time model deployment process.
- [Top 7 ODSC East Sessions You Can’t Afford to Miss](https://www.iguazio.com/blog/odsc-east-2023/): We've compiled the top sessions at ODSC East Boston 2023 that we're most looking forward to, covering topics like real-time ML, LLMs, data privacy and more.
- [Kubeflow Vs. MLflow Vs. MLRun: Which One is Right for You?](https://www.iguazio.com/blog/kubeflow-vs-mlflow-vs-mlrun/): We dive into these three tools to better understand their capabilities, and how they fit into the ML lifecycle. Learn more here.
- [How Seagate Runs Advanced Manufacturing at Scale With Iguazio](https://www.iguazio.com/blog/how-seagate-runs-advanced-manufacturing-at-scale-with-iguazio/): How Seagate tackled their predictive manufacturing use case with continuous data engineering at scale, keeping costs low and productivity high.
- [McKinsey Acquires Iguazio: Our Startup’s Journey](https://www.iguazio.com/blog/mckinsey-acquires-iguazio-our-startups-journey/): When I founded Iguazio with my co-founders, I never thought I would be making this announcement on our company blog: McKinsey acquired Iguazio!
- [HCI’s Journey to MLOps Efficiency](https://www.iguazio.com/blog/hcis-journey-to-mlops-efficiency/): Deploying ML in the enterprise is a complex process. Jiri Steuer from HCI shares his top tips and ideas for achieving MLOps efficiency.
- [Distributed Feature Store Ingestion with Iguazio, Snowflake, and Spark ](https://www.iguazio.com/blog/distributed-feature-store-ingestion-with-iguazio-snowflake-and-spark/): Our step-by-step guide on how to leverage distributed ingestion into the Iguazio feature store with Snowflake and Spark for ingestion.
- [Looking into 2023: Predictions for a New Year in MLOps](https://www.iguazio.com/blog/mlops-predictions-for-2023/): As we raise our glasses to the upcoming year, here are my predictions of what we'll see in the MLOps industry in 2023
- [Iguazio Named a Major Player in the IDC MLOps MarketScape 2022](https://www.iguazio.com/blog/idc-mlopmarketscape-2022/): IDC MarketScape is based on a rigorous framework that highlights the factors expected to be the most influential in both the short and long term.
- [Iguazio Named a Leader and Outperformer In GigaOm Radar for MLOps 2022](https://www.iguazio.com/blog/iguazio-named-a-leader-and-outperformer-in-gigaom-radar-for-mlops-2022/): Iguazio is thrilled to be named an Outperforming Leader in GigaOm’s latest 2022 report on MLOps.
- [Deploying Your Hugging Face Models to Production at Scale with MLRun](https://www.iguazio.com/blog/deploying-your-hugging-face-models-to-production-at-scale-with-mlrun/): Here's how to continuously deploy Hugging Face models into real business environments at scale, along with the required application logic.
- [Top 10 ODSC West Sessions You Must Attend!](https://www.iguazio.com/blog/top-10-odsc-west-sessions-you-must-attend/): We've compiled the sessions at ODSC West 2022 in San Francisco that we're most looking forward to, covering a wide range of topics.
- [How to Run Workloads on Spark Operator with Dynamic Allocation Using MLRun](https://www.iguazio.com/blog/how-to-run-workloads-on-spark-operator-with-dynamic-allocation-using-mlrun/): With Spark Operator, dynamic executor allocation saves big costs. Here's how to abstract away the implementation complexities with MLRun.
- [Building an Automated ML Pipeline with a Feature Store Using Iguazio & Snowflake](https://www.iguazio.com/blog/building-an-automated-ml-pipeline-with-a-feature-store-using-iguazio-snowflake/): Here's how to activate your Snowflake data by using the Iguazio feature store to build, store and share features from your Snowflake data.
- [Iguazio Product Update: Optimize Your ML Workload Costs with AWS EC2 Spot Instances](https://www.iguazio.com/blog/iguazio-product-update-optimize-your-ml-workload-costs-with-aws-ec2-spot-instances/): Choosing a spot instance is a cost-saving choice if you can be flexible about when your applications run and if they can be interrupted.
- [From AutoML to AutoMLOps: Automated Logging & Tracking of ML](https://www.iguazio.com/blog/from-automl-to-automlops/): Here's how to embrace AutoMLOps, to automate ML engineering tasks so that your code is automatically ready for production.
- [How to Deploy an MLRun Project in a CI/CD Process with Jenkins Pipeline ](https://www.iguazio.com/blog/how-to-deploy-an-mlrun-project-in-a-ci-cd-process-with-jenkins-pipeline/): In this article, we will walk you through steps to run a Jenkins server in docker and deploy the MLRun project using Jenkins pipeline.
- [Beyond Hyped: Iguazio Named in 8 Gartner Hype Cycles for 2022](https://www.iguazio.com/blog/beyond-hyped-iguazio-named-in-8-gartner-hype-cycles-for-2022/): We’re very excited to share that Iguazio has been named a sample vendor in eight Gartner Hype Cycles for 2022.
- [Build an AI App in Under 20 Minutes](https://www.iguazio.com/blog/build-an-ai-app-in-under-20-minutes/): Here's how to build simple AI applications that leverage pre-built ML models and allow you to interact with a UI to visualize the results.
- [Machine Learning Experiment Tracking from Zero to Hero in 2 Lines of Code ](https://www.iguazio.com/blog/machine-learning-experiment-tracking-from-zero-to-hero-in-2-lines-of-code/): Here's how to turn your existing model training code into an MLRun job and get the benefit of all the experiment tracking, plus more.
- [Iguazio Recognized in Gartner's 2022 Market Guide for DSML Engineering Platforms](https://www.iguazio.com/blog/gartner-2022-market-guide-for-dsml-engineering-platforms/): We’re proud to share that Iguazio has been included in Gartner's 2022 Market Guide for DSML Engineering Platforms.
- [The Easiest Way to Track Data Science Experiments with MLRun](https://www.iguazio.com/blog/experiment-tracking/): Learn how to solve experiment tracking complexity concerns with MLRun, a new open source framework which optimizes the management of ML operations.
- [Top 9 ODSC Europe Sessions You Can’t Miss!](https://www.iguazio.com/blog/top-9-odsc-europe-sessions-you-cant-miss/): See you soon at ODSC Europe in London! Here's our list of the top recommended sessions we're looking forward to attending.
- [Best Practices for Succeeding with MLOps ](https://www.iguazio.com/blog/best-practices-for-succeeding-with-mlops/): In our latest webinar, Noah Gift sat down with us to discuss best practices for succeeding with MLOps, and much more.
- [Top 8 Recommended MLOps World 2022 Sessions](https://www.iguazio.com/blog/top-8-recommended-mlops-world-2022-sessions/): We've compiled the top sessions at MLOps World 2022 in Toronto that we're most looking forward to. See you there!
- [Using Snowflake and Dask for Large-Scale ML Workloads](https://www.iguazio.com/blog/using-snowflake-and-dask-for-large-scale-ml-workloads/): Snowflake’s Connector for Python is a great fit for large-scale ML workloads with Dask. Here’s how to use it.
- [ODSC East Boston 2022 - Top 11 Sessions for AI and ML Professionals to Attend](https://www.iguazio.com/blog/odsc-east-boston-2022-top-11-sessions-for-ai-and-ml-professionals-to-attend/): ODSC East Boston is fast approaching! Our picks for the top sessions covering AI/ML, MLOps, ML use cases, AI explainability, and more.
- [Real-Time Streaming for Data Science](https://www.iguazio.com/blog/real-time-streaming-for-data-science/): Explore real-time streaming for data science with serverless architecture and time series analysis to drive smarter insights.
- [GigaOm Names Iguazio a Leader and Outperformer for 2022](https://www.iguazio.com/blog/gigaom-names-iguazio-a-leader-and-outperformer-for-2022/): GigaOm gave Iguazio top scores on several evaluation metrics in the GigaOm Radar for Data Science Platforms 2022 report.
- [Top 8 Machine Learning Resources for Data Scientists, Data Engineers and Everyone](https://www.iguazio.com/blog/top-8-machine-learning-resources-for-data-scientists-data-engineers-and-everyone/): Explore our list of the top ML and MLOps learning resources, with blogs, video series, online communities and more.
- [Iguazio named in Forrester's Now Tech: AI/ML Platforms](https://www.iguazio.com/blog/iguazio-named-in-forresters-now-tech-ai-ml-platforms-q1-2022/): We are delighted to share that Iguazio has been named in Forrester’s Overview of the leading AI/ML Platform Providers
- [ML Workflows: What Can You Automate?](https://www.iguazio.com/blog/ml-workflows-what-can-you-automate/): How automating various steps in the ML workflow using a MLOps approach can help data teams achieve faster deployment of ML models
- [Orchestrating ML Pipelines at Scale with Kubeflow](https://www.iguazio.com/blog/orchestrating-ml-pipelines-scale-kubeflow/): It’s okay if you’re a hobbyist, but data science models are meant to be incorporated into real business applications. Read more here.
- [Automating MLOps for Deep Learning: How to Operationalize DL With Minimal Effort](https://www.iguazio.com/blog/automating-mlops-for-deep-learning-how-to-operationalize-dl-with-minimal-effort/): How to use MLRun to orchestrate and automate the process of taking deep learning pipelines from research to production.
- [What Are Feature Stores and Why Are They Critical for Scaling Data Science?](https://www.iguazio.com/blog/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science/): Feature stores enable data scientists to reuse features instead of rebuilding these features again for different models, saving them valuable time.
- [The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 4](https://www.iguazio.com/blog/the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml-part-4/): Part 4 of our 4-part blog series that will take you through how the Iguazio feature store works with Microsoft Azure Cloud.
- [The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 3](https://www.iguazio.com/blog/the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml-part-3/): Part 3 of a 4-part blog series that will take you through how the Iguazio feature store works with Microsoft Azure Cloud.
- [The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 2](https://www.iguazio.com/blog/the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml-part-2/): Part 2 of a 4-part blog series that will take you through how the Iguazio feature store works with Microsoft Azure Cloud.
- [The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 1](https://www.iguazio.com/blog/part-one-the-complete-guide-to-using-the-iguazio-feature-store-with-azure-ml/): This 4-part blog series will take you through how the Iguazio feature store works with Microsoft Azure Cloud.
- [Looking into 2022: Predictions for a New Year in MLOps](https://www.iguazio.com/blog/2022-predictions/): Before we toast to a new year ahead, here are my predictions of what awaits the MLOps industry in 2022.
- [Adopting a Production-First Approach to Enterprise AI](https://www.iguazio.com/blog/adopting-a-production-first-approach-to-enterprise-ai/): Modern AI applications require a continuous operational pipeline and a production-first approach to make it all feasible.
- [Introduction to TF Serving](https://www.iguazio.com/blog/introduction-to-tf-serving/): Our in-depth guide to what TensorFlow Serving is, why you need it, and how to use it, for beginners to experts.
- [ODSC West Conference - Top 6 Sessions You Must Attend](https://www.iguazio.com/blog/odscwest2021/): ODSC West Reconnect is the place to be for MLOps, data science, and AI. Here are our top 6 recommended sessions for this year's conference.
- [It Worked Fine in Jupyter. Now What?](https://www.iguazio.com/blog/it-worked-fine-in-jupyter-now-what/): How to use MLRun to quickly deploy applications and run them on Kubernetes without changing code or learning a new technology.
- [How to Bring Breakthrough Performance and Productivity To AI/ML Projects](https://www.iguazio.com/blog/how-to-bring-breakthrough-performance-and-productivity-to-ai-ml-projects/): Pure Storage and Iguazio empower enterprises to focus on their business applications and not the underlying infrastructure.
- [Building Machine Learning Pipelines with Real-Time Feature Engineering](https://www.iguazio.com/blog/building-real-time-ml-pipelines-with-a-feature-store/): An online feature store enables ML teams to harness real-time data, perform complex calculations in real time and make fast decisions based on fresh data.
- [Implementing Automation and an MLOps Framework for Enterprise-scale ML](https://www.iguazio.com/blog/implementing-automation-and-an-mlops-framework-for-enterprise-scale-ml/): Enterprise AI in production is still immature. Companies are implementing MLOps frameworks to get to production and scale up.
- [Using Automated Model Management for CPG Trade Success](https://www.iguazio.com/blog/using-automated-model-management-for-cpg-trade-success/): AI/ML can help CPG companies execute promotions with greater precision to obtain impactful results and higher ROI.
- [All That Hype: Iguazio Listed in 7 Gartner Hype Cycles for 2021](https://www.iguazio.com/blog/iguazio-listed-in-7-gartner-hype-cycles-for-2021/): We're proud to be mentioned in 7 Gartner Hype Cycles for the second year in a row!
- [Announcing the Winners of the MLOps for Good Hackathon](https://www.iguazio.com/blog/announcing-the-winners-mlops-for-good-hackathon/): We just wrapped up the first-ever MLOps for Good hackathon, and we are so thrilled by the incredible response we’ve gotten from the ML community.
- [MLOps for Good Hackathon Roundup](https://www.iguazio.com/blog/mlops-for-good-hackathon-roundup/): Our roundup of some impactful ML projects from the MLOps for Good hackathon that bring data science to production with MLOps.
- [Operationalizing Machine Learning for the Automotive Future](https://www.iguazio.com/blog/operationalizing-machine-learning-for-the-automotive-future/): New mobility application scenarios require complex MLOps planning, to process data efficiently and cost-effectively for a connected car world.
- [Building Unified Data Integration and ML Pipelines with Azure Synapse](https://www.iguazio.com/blog/azure-synapse-analytics-and-iguazio/): By implementing Azure Synapse along with Iguazio's feature store, enterprises can build a single end-to-end pipeline and rapidly run AI/ML.
- [Top 10 Recommended MLOps World 2021 Sessions](https://www.iguazio.com/blog/top-10-recommended-mlops-world-2021-sessions/): MLOps World begins (soon!) on June 14, and it’s full of interesting topics. We’ve put together our pick of the top 10 MLOps World sessions.
- [Top 9 Recommended ODSC Europe 2021 Sessions](https://www.iguazio.com/blog/top-9-recommended-odsc-europe-2021-sessions/): ODSC Europe is a great opportunity to meet with and learn from the top data science professionals in the industry.
- [Announcing Iguazio Version 3.0: Breaking the Silos for Faster Deployment](https://www.iguazio.com/blog/announcing-iguazio-version-3-0-breaking-the-silos-for-faster-deployment/): We’re delighted to announce the release of the Iguazio Data Science Platform version 3.0, with features to help you get to production, fast.
- [Iguazio Named A Fast Moving Leader by GigaOm in the ‘Radar for MLOps’ Report](https://www.iguazio.com/blog/iguazio-named-a-fast-moving-leader-by-gigaom-in-the-radar-for-mlops-report/): We’re proud to share that the Iguazio Data Science Platform has been named a fast moving leader in the GigaOm Radar for MLOps report.
- [Join us at NVIDIA GTC 2021](https://www.iguazio.com/blog/join-us-at-nvidia-gtc-2021/): Join Iguazio at GTC21 for four sessions on our favorite topics, MLOps and AI!
- [Simplify Your AI/ML Journey with Higher-Level Abstraction, Automation](https://www.iguazio.com/blog/how-to-tap-into-higher-level-abstraction-efficiency-automation-to-simplify-your-ai-ml-journey/): Explore how to use NetApp Astra and the Iguazio Data Science Platform together to abstract away complexity and accelerate AI deployment
- [Iguazio Honored in 2021 Gartner Magic Quadrant for Data Science](https://www.iguazio.com/blog/iguazio-receives-an-honorable-mention-in-the-2021-gartner-magic-quadrant-for-data-science-and-machine-learning-platforms/): Iguazio has received an honorable mention in the 2021 Gartner Magic Quadrant for Data Science and Machine Learning Platforms.
- [Concept Drift Deep Dive: How to Build a Drift-Aware ML System](https://www.iguazio.com/blog/concept-drift-deep-dive-how-to-build-a-drift-aware-ml-system/): Can your ML applications cope with the unexpected? We're sharing a deep dive into building a drift-aware ML system.
- [Accelerating ML Deployment in Hybrid Environments](https://www.iguazio.com/blog/accelerating-ml-deployment-in-hybrid-environments/): Deploying AI on local AWS Outposts using Iguazio provides a simple way for ML teams to work across hybrid cloud & edge environments.
- [Handling Large Datasets in Data Preparation & ML Training Using MLOps](https://www.iguazio.com/blog/handling-large-datasets-with-mlops-dask-on-kubernetes/): Learn MLOps best practices for scaling data preparation and ML training, plus code examples of running Dask on Kubernetes.
- [The Importance of Data Storytelling in Shaping a Data Science Product](https://www.iguazio.com/blog/the-importance-of-data-storytelling-in-shaping-a-data-science-product/): Skillful data storytelling delivers insights to audiences with compelling visuals and narratives, placing new perspectives on changing data.
- [How to Build Real-Time Feature Engineering with a Feature Store](https://www.iguazio.com/blog/how-to-build-real-time-feature-engineering-with-a-feature-store/): It’s already possible to build a real-time feature store, and it could revolutionize real-time feature engineering for a number of use cases.
- [Predictive Real-Time Operational ML Pipeline: Fighting First-Day Churn](https://www.iguazio.com/blog/predictive-real-time-operational-ml-pipeline-fighting-customer-churn/): Fight first-day churn with a data science platform that enables rapid deployment of a real-time operational ML pipeline at scale.
- [Kubeflow: Simplified, Extended and Operationalized](https://www.iguazio.com/blog/extending-kubeflow-into-an-end-to-end-ml-solution/): Extend Kubeflow functionality by enabling small teams to build complex real-time data processing and model serving pipelines.
- [Elevating Data Science Practices for the Media, Entertainment & Advertising Industries](https://www.iguazio.com/blog/data-science-salon-review-elevating-data-science-practices-for-media-entertainment-advertising/): Data Science Salon wrapped up last week with insightful talks on many topics in Media, Arts & Entertainment. Check out our roundup!
- [Building ML Pipelines Over Federated Data & Compute Environments](https://www.iguazio.com/blog/building-ml-pipelines-over-federated-data-compute-environments/): Enabling data engineers to take advantage of high-performance training infrastructure whether within on-premise data centers or in the cloud.
- [How to Run Spark Over Kubernetes to Power Your Data Science Lifecycle](https://www.iguazio.com/blog/spark-over-kubernetes/): Follow this step-by-step tutorial on working with Spark in a Kubernetes environment to modernize your data science ecosystem.
- [MLOps for Python: Real-Time Feature Analysis](https://www.iguazio.com/blog/mlops-for-python/): With MLOps you can deploy Python code straight into production without rewriting it, saving you time & resources without sacrificing accuracy or performance
- [What's All the Hype About? Iguazio Listed in Five 2020 Gartner Hype Cycles](https://www.iguazio.com/blog/iguazio-listed-in-five-2020-gartner-hype-cycle-reports/): Iguazio has been named a sample vendor in the 2020 Gartner Hype Cycle for Data Science and Machine Learning and four additional Gartner Hype Cycles.
- [Predicting 1st Day Churn in Real Time](https://www.iguazio.com/blog/predicting-1st-day-churn-in-real-time/): The average for 1st day churn hovers at 70%. The solution? Predict user retention in the crucial first seconds and minutes after a new user onboards.
- [Breaking the Silos Between Data Scientists, Engineers & DevOps with New MLOps Practices](https://www.iguazio.com/blog/breaking-the-silos-between-data-scientists-engineers-and-devops-with-new-mlops-practices/): Effectively bringing machine learning to production is one of the biggest challenges that data science teams today struggle with. MLOps is the solution.
- [Git-based CI / CD for Machine Learning & MLOps](https://www.iguazio.com/blog/git-based-ci-cd-for-machine-learning-mlops/): ML teams should be able to achieve MLOps by using their preferred frameworks, platforms, and languages to experiment, build & train their models.
- [Iguazio Releases v2.8 with Automated Pipeline Management, Monitoring](https://www.iguazio.com/blog/iguazio-releases-data-science-platform-version-2-8/): Iguazio releases Data Science Platform version 2.8, which includes enterprise-grade automated pipeline management, model monitoring & drift detection.
- [5 Incredible Data Science Solutions For Real-World Problems](https://www.iguazio.com/blog/5-incredible-data-science-solutions-for-real-world-problems/): Data science has come a long way. Here are five of the incredible business and community challenges that data science has managed to solve.
- [Concept Drift and the Impact of COVID-19 on Data Science](https://www.iguazio.com/blog/concept-drift-and-the-impact-of-covid-19-on-data-science/): Data science needs to quickly adapt to the fast-paced changes happening all over the world. This is where the true value and impact of MLOps lies.
- [AI, ML and ROI – Why your balance sheet cares about your technology choices](https://www.iguazio.com/blog/ai-ml-and-roi-why-your-balance-sheet-cares-about-your-technology-choices/): As businesses continue to evolve it’s become imperative to include AI & ML in their strategic plans in order to remain competitive
- [How GPUaaS On Kubeflow Can Boost Your Productivity](https://www.iguazio.com/blog/how-gpuaas-on-kubeflow-can-boost-your-productivity/): Using GPUaaS in this way simplifies and automates data science, boosting productivity and significantly reducing time to market.
- [MLOps Challenges, Solutions and Future Trends](https://www.iguazio.com/blog/mlops-challenges-solutions-future-trends/): Summary of my MLOps NYC talk: major AI/ML & data challenges and how they will be solved with emerging open source technologies
- [Top Trends for Data Science in 2020](https://www.iguazio.com/blog/data-science-trends-2020/): 2020 will be about simplifying the way from data science to production, with an emphasis on bringing real – and scalable – business value.
- [SUSE and Iguazio Offer Open Source Solution for Data Science Teams](https://www.iguazio.com/blog/suse-iguazio/): The notions of collaborative innovation, openness and portability are driving enterprises to embrace open source technologies.
- [Iguazio + NVIDIA EGX: Unleash Data Intensive Processing at the Intelligent Edge](https://www.iguazio.com/blog/iguazio-nvidia-edge/): Discover Iguazio's Intelligent Edge, powered by NVIDIA EGX, which enables data and compute intensive processing with seamless usability.
- [MLOps NYC Panel: Recorded Sessions](https://www.iguazio.com/blog/mlops-nyc-sessions/): Iguazio's MLOps NYC conference brought together top-tier speakers, discussing how their companies tackle machine learning pipeline automation.
- [Modernize IT Monitoring by Combining Time Series Databases, Machine Learning](https://www.iguazio.com/blog/modernize-it-infrastructure/): Explore the complexity of IT infrastructure and how to build a modern IT infrastructure monitoring solution, combining time series data with ML.
- [Python Pandas at Extreme Performance](https://www.iguazio.com/blog/python-pandas-performance/): What if you could write simple code in Python and run it faster than using Spark, without requiring any re-coding, and without devops overhead?
- [Why is it So Hard to Integrate Machine Learning into Real Business Applications?](https://www.iguazio.com/blog/machine-learning-hard/): You’ve played around with ML and now you feel ready to bring all this to real world impact. It’s time to build some real AI-based applications.
- [Automating Machine Learning Pipelines on Azure and Azure Stack](https://www.iguazio.com/blog/automating-ml-pipelines-on-azure-and-azure-stack/): Iguazio’s partnership with Microsoft creates new possibilities for Azure and Azure Stack customers to develop end-to-end ML-based apps.
- [Horovod for Deep Learning on a GPU Cluster](https://www.iguazio.com/blog/horovod-for-deep-learning-on-a-gpu-cluster/): The main benefits of Horovod are the minimal code modification required and the speed at which it enables jobs to run.
- [Data Science in the Post Hadoop Era](https://www.iguazio.com/blog/data-science-post-hadoop/): With all the turmoil surrounding large Hadoop distributors, many wonder what's happening to the data framework we've all been working with for years.
- [Paving the Data Science Dirt Road](https://www.iguazio.com/blog/paving-the-data-science-dirt-road/): Organizations that unleash the potential of data, rapidly and at scale, have a tremendous advantage in a world in which data drives competitive value.
- [Kubernetes, The Open and Scalable Approach to ML Pipelines](https://www.iguazio.com/blog/kubernetes-the-open-scalable-approach-to-ml-pipelines/): It’s okay if you’re a hobbyist, but data science models are meant to be incorporated into real business applications.
- [Serverless: Can It Simplify Data Science Projects?](https://www.iguazio.com/blog/serverless-can-it-simplify-data-science-projects/): Simplify data science development and accelerate time to production by adopting a serverless architecture for collection, exploration, training and serving.
- [Operationalizing Data Science](https://www.iguazio.com/blog/operationalizing-data-science/): Imagine a system where one can easily develop an ML model, click on a magic button and run the code in production without any heavy lifting.
- [Intelligent Cloud-to-Edge Solution with Google Cloud](https://www.iguazio.com/blog/intelligent-edge-iguazio-google/): Iguazio’s Intelligent Cloud-to-Edge solution with Google Cloud addresses the challenges of various industries including leading retail companies like Trax
- [Will Kubernetes Sink the Hadoop Ship?](https://www.iguazio.com/blog/will-kubernetes-sink-the-hadoop-ship/): Explore the potential impacts of Kubernetes on Hadoop’s future in the landscape of data processing and cloud orchestration.
- [Wrapping Up Serverless NYC 2018](https://www.iguazio.com/blog/wrapping-up-serverless-nyc-2018/): Recap the insights and discussions from the Serverless NYC 2018 event focused on innovations in serverless technologies.
- [Can Open-Source Serverless Be Simpler than Lambda?](https://www.iguazio.com/blog/can-open-source-serverless-be-simpler-than-lambda/): Investigate how open-source serverless solutions like Nuclio simplify deployment and scalability compared to traditional options.
- [Big Data Must Begin with a Clean Slate](https://www.iguazio.com/blog/big-data-must-begin-with-clean-slate/): More than a decade has passed since we coined the term “big data,” and a decade in the tech world is almost infinity. Is big data now obsolete?
- [CNCF Webinar on Serverless and AI](https://www.iguazio.com/blog/cncf-webinar-serverless-ai/): iguazio's Yaron Haviv and Microsoft's Tomer Rosenthal provide an overview of serverless architectures and the efforts to encourage collaboration.
- [In 2018, Can Cloud, Big Data and AI Stand More Turmoil?](https://www.iguazio.com/blog/2018-can-cloud-big-data-ai-stand-turmoil/): We will see several trends emerge in 2018, and their key focus will be on making new technology easy and consumable.
- [Tutorial: Faster AI Development with Serverless](https://www.iguazio.com/blog/faster-ai-development-serverless/): Serverless platforms such as nuclio help test, develop and productize AI faster.
- [Cloud Native Storage: A Primer](https://www.iguazio.com/blog/cloud-native-storage-primer/): We recently debated at a technical forum what cloud native storage is, which led me to believe that this topic deserves a deeper discussion and more clarity.
- [NYC Meetup: How to Go Serverless to Enable Faster and Simpler Analytics](https://www.iguazio.com/blog/nyc-meetup-jan2018/): Watch this video of Yaron Haviv talking about using serverless in Big Data and AI, at a Meetup as part of the NYC Database Month series.
- [AWS re:Invent is about Data, Serverless, and AI](https://www.iguazio.com/blog/aws-reinvent-data-serverless-ai/): AWS re:Invent is all about how managed data services, serverless, and AI work together to enable new business applications.
- [The Future of Serverless Computing](https://www.iguazio.com/blog/nuclio-future-serverless-computing/): Serverless computing allows developers to focus on building and running auto-scaling applications without worrying about managing servers.
- [VMworld 2017: VMware Feeds Off OpenStack Decay](https://www.iguazio.com/blog/iguazio-rvmworld-2017-vmware-feeds-off-openstack-decay/): Explore how VMware capitalized on OpenStack's decline at VMworld 2017, reshaping the cloud landscape and enterprise strategies.
- [iguazio Raises $33M to Accelerate Digital Transformation](https://www.iguazio.com/blog/iguazio-raises-33m-accelerate-digital-transformation/): Today we announced a $33M investment from top VCs - Verizon Ventures, Robert Bosch Venture Capital, CME Ventures and Dell Technologies Capital.
- [IT Vendors Don't Stand a Chance Against the Cloud](https://www.iguazio.com/blog/it-vendors-dont-stand-a-chance-against-the-cloud/): Last week I sat in on an AWS event in Tel Aviv. I didn’t hear a single word about infrastructure or IT and nothing about VMs or storage, either.
- [Using Containers As Mini-VMs is NOT Cloud-Native!](https://www.iguazio.com/blog/using-containers-as-mini-vms-is-not-cloud-native/): Examine why utilizing containers as mini-VMs doesn’t align with genuine cloud-native principles and practices.
- [Continuous Analytics: Real-time Meets Cloud-Native](https://www.iguazio.com/blog/continuous-analytics-real-time-meets-cloud-native/): Traditional big data solutions involve building complex data pipelines that have separate stages for collecting and preparing data, ingestion, analytics, etc.
- [AWS S3 Outage Signals We MUST Decentralize Cloud](https://www.iguazio.com/blog/aws-s3-outage-signals-we-must-decentralize-cloud/): A significant portion of the internet went down due to an Amazon Web Services S3 outage. Why have we created such a dependency on services like AWS?
- [Serverless: Background, Challenges and Future](https://www.iguazio.com/blog/serverless-background-challenges-and-future/): Serverless computing is the latest buzz, driven by the digital economy’s demand for instant results without the hassle. Learn more here.
- [2017 Predictions: Clouds, Thunder and Fog](https://www.iguazio.com/blog/2017-predictions-clouds-thunder-and-fog/): Reflect on 2017’s IT trends in cloud computing, predicting future shifts towards digital and hybrid cloud environments.
- [Did Amazon Just Kill Open Source?](https://www.iguazio.com/blog/did-amazon-just-kill-open-source/): AWS announced many more products, all fully integrated and simple to use and if you thought infrastructure companies are its competition, think again.
- [iguazio Collaborates with Equinix to Offer Data-Centric Hybrid Cloud Solutions](https://www.iguazio.com/blog/iguazio-collaborates-with-equinix-to-offer-data-centric-hybrid-cloud-solutions/): The platform also accelerates analytics processing for real-time insights, while maintaining strict security and governance requirements.
- [VMware on AWS: A Scorecard for Winners and Losers](https://www.iguazio.com/blog/vmware-on-aws-a-scorecard-for-winners-and-losers/): Amazon and VMware announced a partnership which will enable VMware software to run on a dedicated space within Amazon cloud.
- [Streamlined IoT at Scale with iguazio](https://www.iguazio.com/blog/streamlined-iot-at-scale-with-iguazio/): We built an end-to-end IoT application for monitoring and controlling smart connected cars to demonstrate the magnitude of innovation.
- [Cloud Data Services Sprawl … it’s Complicated](https://www.iguazio.com/blog/cloud-data-services-sprawl-its-complicated/): You’d buy storage from your vendor of choice, add a database on top and use it for all your workloads. Learn more here.
- [The Next Gen Digital Transformation: Cloud-Native Data Platforms](https://www.iguazio.com/blog/the-next-gen-digital-transformation-cloud-native-data-platforms/): The software world is rapidly transitioning to agile development using micro-services and cloud-native architectures. Read about it more here.
- [It’s Time for Reinventing Data Services](https://www.iguazio.com/blog/reinventing-data-services/): Everything around those stacks changed from the ground up — including new storage media, distributed computing, NoSQL, and the cloud.
- [DC/OS Enables Data Center “App Stores”](https://www.iguazio.com/blog/dcos-apps/): This is the case today with cloud native Apps deployed in Amazon, they use managed data services like S3, DynamoDB, Kinesis, RedShift, etc.
- [Re-Structure Ahead in Big Data & Spark](https://www.iguazio.com/blog/re-structure-in-big-data/): Big Data has evolved since then – the need for real-time performance, data governance and higher efficiency is forcing back some structure and context.
- [Wanted! A Storage Stack at the speed of NVMe & 3D XPoint](https://www.iguazio.com/blog/wanted-a-faster-storage-stack/): Major changes are happening in storage media hardware – Intel announced storage media 100X faster, far outpacing the current software stack.
- [Cloud-Native Will Shake Up Enterprise Storage!](https://www.iguazio.com/blog/cloud-native-will-shake-up-enterprise-storage/): If you are about to deploy a micro-services and agile IT architecture, don't be tempted to reuse your existing IT practices. Learn how here.
- [Architecting BigData for Real Time Analytics](https://www.iguazio.com/blog/realtime-bigdata/): BigData is quite new, yet when we examine the common solutions and deployment practices it seems like we are going backwards in time.
## News
- [Iguazio CTO: Successful AI depends on data AND trust](https://www.iguazio.com/news-events/news/iguazio-cto-successful-ai-depends-on-data-and-trust/): "Building the right technologies for data pipelines and data storage, building the automation for how to move from development to...
- [Iguazio Named to the Constellation ShortList™ for MLOps Q1 2025](https://www.iguazio.com/news-events/news/iguazio-named-to-the-constellation-shortlist-for-mlops-q1-2025/): Iguazio has been named to the Constellation ShortList for MLOps for Q1 2025—marking this our third consecutive year on the...
- [An Architect’s Guide to the Top 10 Tools Needed to Build the Modern Data Lake](https://www.iguazio.com/news-events/news/an-architects-guide-to-the-top-10-tools-needed-to-build-the-modern-data-lake/): Iguazio and MLRun are listed in this top-10 list article of vendors and tools needed to build the modern data...
- [McKinsey offering aims to bridge the gap from AI prototypes to production](https://www.iguazio.com/news-events/news/mckinsey-offering-aims-to-bridge-the-gap-from-ai-prototypes-to-production-2/): McKinsey & Co. today announced an expanded artificial intelligence platform that offers enterprises a consolidated, software-based approach to building, implementing,...
- [2024 Top Performer MLOps Platform](https://www.iguazio.com/news-events/news/2024-top-performer-mlops-platform/): We are pleased to announce that Iguazio has been placed as a 2024 Top MLOps Platform Performer by FeaturedCustomers in...
- [AiThority Interview with Asaf Somekh, Co-Founder & CEO of Iguazio (acquired by McKinsey)](https://www.iguazio.com/news-events/news/aithority-interview-with-asaf-somekh-co-founder-ceo-of-iguazio-acquired-by-mckinsey/): "Open-source models will play a key role in the ecosystem... if built right, can achieve a high level of accuracy...
- [5 Best End-to-End Open Source MLOps Tools](https://www.iguazio.com/news-events/news/5-best-end-to-end-open-source-mlops-tools/): KDnuggets has recognized MLRun, Iguazio's open-source AI orchestration framework, as one of the top five end-to-end open-source MLOps tools.
- [The Architect’s Guide to the GenAI Tech Stack — 10 Tools](https://www.iguazio.com/news-events/news/mckinsey-offering-aims-to-bridge-the-gap-from-ai-prototypes-to-production/): MLRun, Iguazio's open-source AI orchestration framework, is listed among the top 10 capabilities that can be found in the modern data lake...
- [McKinsey accelerates gen AI value creation with Iguazio](https://www.iguazio.com/news-events/news/mckinsey-accelerates-gen-ai-value-creation-with-iguazio/): McKinsey’s Iguazio platform offers enterprises a consolidated, software-based approach to build, implement, scale, and govern gen AI solutions. Working on...
- [Iguazio named to Constellation's ShortList™ MLOps – Feb 2024](https://www.iguazio.com/news-events/news/iguazio-named-to-constellations-shortlist-mlops-feb-2024/): Iguazio is proud to be listed amongst the top leaders in MLOps by Constellation Research, Inc. for the third year...
- [Iguazio named to the CB Insights LLMOps (Large Language Model Operations) Market Map](https://www.iguazio.com/news-events/news/iguazio-named-to-the-cb-insights-llmops-large-language-model-operations-market-map/): CB Insights is a leader in tech market intelligence that produces research on innovation and new technologies across industries. The...
- [McKinsey & Company receives the MongoDB 2024 Transformation Partner Award for its work with Iguazio](https://www.iguazio.com/news-events/news/mckinsey-company-receives-the-mongodb-2024-transformation-partner-award-for-its-work-with-iguazio/): "McKinsey’s transformational work with MongoDB on their Iguazio (Acquired by McKinsey) MLOps Platform makes them a fitting winner for this...
- [The 21 Best Artificial Intelligence Platforms Of 2024](https://www.iguazio.com/news-events/news/the-21-best-artificial-intelligence-platforms-of-2024/): "Why I Picked Iguazio: I like the platform's methodology for accelerating MLOps through various mechanisms. For example, its integrated feature...
- [Iguazio Listed in Constellation's ShortList™ MLOps](https://www.iguazio.com/news-events/news/iguazio-listed-in-constellations-shortlist-mlops/): Iguazio is proud to be listed amongst the top leaders in MLOps by Constellation Research, Inc. for the third year...
- [The (un)real world of Generative AI](https://www.iguazio.com/news-events/news/the-unreal-world-of-generative-ai/): "Gen AI will change the way we all work, change roles, create new roles and while it will make some...
- [Musk, AI, And 'Civilizational Destruction': Prophecy or Product Launch?](https://www.iguazio.com/news-events/news/musk-ai-and-civilizational-destruction-prophecy-or-product-launch/): Asaf Somekh observes that “We and our clients understand the moral responsibility that comes with AI. The key question is...
- [McKinsey acquires Iguazio, a leader in AI and machine-learning technology](https://www.iguazio.com/news-events/news/iguazio-acquired-by-mckinsey/): Global management consulting firm McKinsey & Company announced on Monday that it has acquired Iguazio. With the addition of Iguazio’s...
- [Iguazio Named a Major Player in the IDC MLOps MarketScape 2022](https://www.iguazio.com/news-events/news/iguazio-named-a-major-player-in-the-idc-mlops-marketscape-2022/): Iguazio has been named a Major Player in the IDC MarketScape for MLOps 2022.
- [Iguazio Named a Leader and Outperformer In GigaOm Radar for MLOps 2022](https://www.iguazio.com/news-events/news/iguazio-named-a-leader-and-outperformer-in-gigaom-radar-for-mlops-2022/): Iguazio has been named a Leader and Outperformer in the 2022 GigaOm Radar for MLOps report.
- [Iguazio Named in 8 Gartner Hype Cycles for 2022](https://www.iguazio.com/news-events/news/iguazio-named-in-8-gartner-hype-cycles-for-2022/): Iguazio has been named a sample vendor in 8 Gartner Hype Cycles for 2022.
- [Sense Selects Iguazio for AI Chatbot Automation](https://www.iguazio.com/news-events/news/sense-selects-iguazio-for-ai-chatbot-automation/): With Iguazio, Sense powers a wide range of AI products aimed at increasing the efficiency and scalability of their AI operations.
- [Iguazio Partners with Snowflake to Automate and Accelerate MLOps](https://www.iguazio.com/news-events/news/iguazio-partners-with-snowflake-to-automate-and-accelerate-mlops/): The Iguazio Platform and built-in Feature Store now offers connectivity to the Snowflake Data Cloud, providing a full solution for...
- [Gartner 2022 Market Guide for DSML Engineering Platforms](https://www.iguazio.com/news-events/news/gartner-2022-market-guide-for-dsml-engineering-platforms/): Gartner’s new Market Guide for Data Science and Machine Learning Engineering Platform is a thorough deep dive into the best...
- [Iguazio named in The Coolest Data Science And Machine Learning Tool Companies Of The 2022 Big Data 100](https://www.iguazio.com/news-events/news/the-coolest-data-science-and-machine-learning-tool-companies-of-the-2022-big-data-100/): CRN’s Big Data 100 includes a look at the vendors solution providers should know in the data science and machine...
- [Iguazio named in Forrester's Now Tech: AI/ML Platforms, Q1 2022](https://www.iguazio.com/news-events/news/iguazio-named-in-forresters-now-tech-ai-ml-platforms-q1-2022/): We are delighted to share that Iguazio has been named along with Microsoft, Databricks, Cloudera, Alteryx and others in Now...
- [All That Hype: Iguazio Listed in 5 Gartner Hype Cycles for 2021](https://www.iguazio.com/news-events/news/all-that-hype-iguazio-listed-in-5-gartner-hype-cycles-for-2021/): We are proud to announce that Iguazio has been named a sample vendor in five 2021 Gartner Hype Cycles, including...
- [Iguazio Named a Fast-Moving Leader by Gigaom in the Radar for MLOps Report](https://www.iguazio.com/news-events/news/iguazio-named-a-fast-moving-leader-by-gigaom-in-the-radar-for-mlops-report-2/): Iguazio has been named a fast-moving leader in the GigaOm Radar for MLOps report.
- [Iguazio Partners with Pure Storage to Operationalize AI for Enterprises](https://www.iguazio.com/news-events/news/iguazio-announces-mlops-for-good-virtual-hackathon-2/): "Together, Iguazio and Pure Storage empower enterprises to continuously roll out new AI services by adopting a production-first mindset using...
- [Iguazio MLOps Platform Launches in AWS Marketplace](https://www.iguazio.com/news-events/news/iguazio-mlops-platform-launches-in-aws-marketplace/): “We believe all enterprises deserve a faster and easier path to rolling out new AI services,” commented Asaf Somekh, Co-Founder...
- [MLOps: The Latest Shift in the AI Market in Israel](https://www.iguazio.com/news-events/news/mlops-the-latest-shift-in-the-ai-market-in-israel/): "The speed at which technological breakthroughs are occurring has no precedent in previous periods of transformation. The adoption of AI...
- [Iguazio Announces ‘MLOps for Good’ Virtual Hackathon](https://www.iguazio.com/news-events/news/iguazio-announces-mlops-for-good-virtual-hackathon/): Iguazio announced its first ever global virtual hackathon, which is starting today and will take place until June 29th, 2021....
- [Boston Limited and Iguazio Partner to Operationalize AI for the Enterprise](https://www.iguazio.com/news-events/news/boston-limited-and-iguazio-partner-to-operationalize-ai-for-the-enterprise/): Iguazio, the data science & MLOps platform built for production, announced a strategic partnership with Boston Limited, an NVIDIA Elite...
- [Iguazio Named A Fast Moving Leader by GigaOm in the ‘Radar for MLOps’ Report](https://www.iguazio.com/news-events/news/iguazio-named-a-fast-moving-leader-by-gigaom-in-the-radar-for-mlops-report/): We’re proud to share that the Iguazio MLOps Platform has been named a fast moving leader in the GigaOm Radar...
- [The Coolest Data Science And Machine Learning Tool Companies Of The 2021 Big Data 100](https://www.iguazio.com/news-events/news/the-coolest-data-science-and-machine-learning-tool-companies-of-the-2021-big-data-100/): "Iguazio launched an integrated feature store within its Data Science Platform to accelerate the deployment of AI applications across hybrid-...
- [Iguazio Named Leader and Fast Mover in GigaOm Radar for Evaluating Machine Learning Operations (MLOps)](https://www.iguazio.com/news-events/news/gigaom-radar-for-evaluating-machine-learning-operations-mlops/): "We review the MLOps sector and the offerings of 14 vendors that aim to help customers automate, govern, and monitor...
- [The Next-Level of Operationalizing Machine Learning: Real-time Data Streaming into Data Science Environments](https://www.iguazio.com/news-events/news/the-next-level-of-operationalizing-machine-learning-real-time-data-streaming-into-data-science-environments/): Applying data science models to streaming data in real-time delivers several advantages. It can supercharge the performance and accuracy of...
- [Iguazio Receives an Honorable Mention in the 2021 Gartner Magic Quadrant for Data Science and Machine Learning Platforms](https://www.iguazio.com/news-events/news/iguazio-receives-an-honorable-mention-in-the-2021-magic-quadrant-for-data-science-and-machine-learning-platforms/): We’re proud to share that Iguazio has received an honorable mention in the Gartner Magic Quadrant for Data Science and...
- [The AI Infrastructure Alliance Launches With 25 Members to Create the Canonical Stack for Artificial Intelligence Projects](https://www.iguazio.com/news-events/news/the-ai-infrastructure-alliance-launches-with-25-members-to-create-the-canonical-stack-for-artificial-intelligence-projects/): "Today, the AI Infrastructure Alliance (AIIA), officially launched with the mission to create a robust collaboration environment for companies and...
- [Git-based CI/CD for Machine Learning and MLOps](https://www.iguazio.com/news-events/news/git-based-ci-cd-for-machine-learning-and-mlops/): ML engineers aiming to truly automate ML pipelines need a way to natively enable continuous integration of machine learning models...
- [Sheba, Iguazio to Develop Real-Time AI to Optimise Patient Care](https://www.iguazio.com/news-events/news/sheba-iguazio-to-develop-real-time-ai-to-optimise-patient-care/): Iguazio was selected to assist the digital transformation of Sheba with real-time AI and machine learning operations (MLOps) in several...
- [Iguazio Signs Strategic Agreement with Sheba Medical Center for Real-Time Covid-19 Treatment](https://www.iguazio.com/news-events/news/iguazio-signs-strategic-agreement-with-sheba-medical-center-for-real-time-covid-19-treatment/): "We are honored to be supporting Sheba... incorporating AI into these many real-time use cases is setting a new standard...
- [Iguazio Launches the First Integrated Feature Store Within its Data Science Platform](https://www.iguazio.com/news-events/news/iguazio-launches-the-first-integrated-feature-store-within-its-data-science-platform/): “Using Iguazio, we are revolutionizing the way we use data, by unifying real-time and historic data from different sources and...
- [The 12 Coolest Machine-Learning Startups Of 2020](https://www.iguazio.com/news-events/news/the-12-coolest-machine-learning-startups-of-2020/): "The 12 coolest machine-learning startups include companies that are developing leading-edge technology for building machine-learning models and collecting and managing...
- [An AI Engineer Walks Into A Data Shop...](https://www.iguazio.com/news-events/news/an-ai-engineer-walks-into-a-data-shop/): "These are data science software systems with extremely complicated convoluted structures... We have, perhaps unsurprisingly, gone beyond the point where...
- [Why You Need to Start Thinking About MLOps](https://www.iguazio.com/news-events/news/why-you-need-to-start-thinking-about-mlops/): Companies like Amazon and Uber achieved dominance by leveraging AI/ML as a core competency. In the vast majority of companies...
- [SFL Scientific And Iguazio Partner To Speed Up Custom AI Development For Fortune 1000 Companies](https://www.iguazio.com/news-events/news/sfl-scientific-and-iguazio-partner-to-speed-up-custom-ai-development-for-fortune-1000-companies/): Top tier consultancy partners with a leading data science platform company to simplify and expedite development and deployment of AI...
- [SFL Scientific and Iguazio Partner to Speed Up Custom AI Development for Fortune 1000 companies](https://www.iguazio.com/news-events/news/sfl-scientific-and-iguazio-partner-to-speed-up-custom-ai-development-for-fortune-1000-companies-2/): Top tier consultancy partners with a leading data science platform company to simplify and expedite development and deployment of AI...
- [NetApp Deploys Iguazio to Run AI-Driven Digital Advisor on Active IQ](https://www.iguazio.com/news-events/news/netapp-deploys-iguazio-to-run-ai-driven-digital-advisor-on-active-iq/): Iguazio announced a new strategic customer, NetApp, which is using its platform to analyze 10 trillion data points per month,...
- [Iguazio Becomes Certified for NVIDIA DGX-Ready Software Program](https://www.iguazio.com/news-events/news/iguazio-becomes-certified-for-nvidia-dgx-ready-software-program/): Iguazio's Data Science Platform is now part of the NVIDIA DGX-Ready Software program.
- [Iguazio & NetApp Partner to Accelerate Deployment of AI](https://www.iguazio.com/news-events/news/iguazio-netapp-partner-to-accelerate-deployment-of-ai/): NetApp and Iguazio provide a simple, end-to-end solution for deploying AI at scale and in real-time on top of the...
- [NetApp, Iguazio Build Joint Tech To Accelerate AI Deployments](https://www.iguazio.com/news-events/news/netapp-iguazio-build-joint-tech-to-accelerate-ai-deployments/): Iguazio is joining NetApp's flash storage and cloud and AI capabilities to shorten the pipeline between data storage and data...
- [The Coolest Data Science And Machine Learning Tool Companies Of The 2020 Big Data 100](https://www.iguazio.com/news-events/news/the-coolest-data-science-and-machine-learning-tool-companies-of-the-2020-big-data-100/): Iguazio featured in CRN’s 2020 Big Data 100 and highlighted as a vendor solution providers need to know in the...
- [Enabling end-to-end machine learning workflows with Iguazio](https://www.iguazio.com/news-events/news/enabling-end-to-end-machine-learning-workflows-with-iguazio-2/): Streamline machine learning processes using Iguazio's comprehensive end-to-end workflow solutions for seamless efficiency and innovation.
- [Iguazio Receives an Honorable Mention in the Gartner MQ for Data Science and ML Platforms](https://www.iguazio.com/news-events/news/gartner-gives-iguazio-an-honorable-mention-in-its-2020-magic-quadrant-for-data-science-and-machine-learning-platforms/): Iguazio Receives an Honorable Mention in the Gartner MQ for Data Science and ML Platforms. Read more here.
- [Iguazio raises $24 million for AI development and management](https://www.iguazio.com/news-events/news/iguazio-raises-24-million-for-ai-development-and-management-tools/): Iguazio's latest investment brings its total funding to $72M. Iguazio’s data science platform automates machine learning pipelines...
- [Dell Technologies Introduces New Solutions with Iguazio](https://www.iguazio.com/news-events/news/dell-technologies-introduces-new-solutions-to-advance-high-performance-computing-and-ai-innovation/): To further simplify AI deployments, Dell Technologies is introducing a new reference architecture for optimizing Dell EMC technologies with Iguazio.
- [PICSIX Partners with Iguazio](https://www.iguazio.com/news-events/news/picsix-launches-an-investigative-intelligence-platform-powered-by-iguazios-real-time-data-science-platform/): PICSIX uses Iguazio to provide an AI-based platform addressing the ever-changing threats to homeland security and public safety
- [The Rise of MLOps: What We Can All Learn from DevOps](https://www.iguazio.com/news-events/news/the-rise-of-mlops-what-we-can-all-learn-from-devops/): One of the most interesting technologies shared during the conference was Iguazio’s Nuclio. Orit Nissan-Messing, Iguazio's VP R&D sat down...
- [Bringing AI and Machine Learning to the Masses](https://www.iguazio.com/news-events/news/bringing-ai-and-machine-learning-to-the-masses/): “You want data scientists and analysts to spend as much time as possible on the data science, rather than the...
- [Takeaway from MLOps NYC: Open Source Frameworks Need TLC](https://www.iguazio.com/news-events/news/takeaway-from-mlops-nyc-open-source-frameworks-need-tlc/): A panel at last week’s MLOps NYC conference, discussed best practices for multiplatform MLOps with Kubeflow and MLflow that might...
- [Top 10 IoT Startups Of 2019](https://www.iguazio.com/news-events/news/top-10-iot-startups-of-2019/): Iguazio is a hot startup that provides a state-of-the-art data science platform for various verticals, including Industrial IoT, Smart Mobility...
- [Hitting the Reset Button on Hadoop](https://www.iguazio.com/news-events/news/hitting-the-reset-button-on-hadoop/): For Somekh, Hadoop still has value and the software is good. But instead of positioning Hadoop as the solution for...
- [CEO Q&A: Modern Platforms for Data Science](https://www.iguazio.com/news-events/news/ceo-qa-modern-platforms-for-data-science/): We see that enterprises looking to implement data science and AI in business applications struggle with complex, siloed data pipelines...
- [Iguazio Brings Its Data Science Platform to Azure and Azure Stack](https://www.iguazio.com/news-events/news/iguazio-brings-its-data-science-platform-to-azure-and-azure-stack/): Given that Azure and Azure Stack are essentially the same platform, as far as the APIs are concerned, Iguazio can...
- [Samsung SDS Backs Data Company Iguazio](https://www.iguazio.com/news-events/news/samsung-sds-backs-data-company-iguazio/): Herzliya-based Iguazio offers data management services and artificial intelligence tools designed to improve the performance, security, and scalability of machine...
- [Q&A with Iguazio: on Data Science, Data Analytics, and Serverless](https://www.iguazio.com/news-events/news/qa-with-iguazio-on-data-science-data-analytics-and-serverless/): "In reality, data scientists spend very little of their time on actual data science. Instead, they dedicate too much time to...
- [How Serverless Platforms Could Power an Event-Driven AI Pipeline](https://www.iguazio.com/news-events/news/how-serverless-platforms-could-power-an-event-driven-ai-pipeline/): “With Iguazio, we are then able to provide a distributed application and database layer that can treat data very fast,...
- [Removing Data Blockage at the Edge](https://www.iguazio.com/news-events/news/removing-data-blockage-at-the-edge/): “Gartner released an amazing number in that 85 percent of such projects are failing. The main reasons for the failures...
- [The 10 Coolest New Open-Source Technologies And Tools Of 2018](https://www.iguazio.com/news-events/news/the-10-coolest-new-open-source-technologies-and-tools-of-2018/): Serverless computing is taking the industry by storm. While the red-hot paradigm originated in the public cloud, Nuclio is among...
- [Google Cloud collaborating with Iguazio to enable real-time AI across the cloud and intelligent edge](https://www.iguazio.com/news-events/news/google-cloud-collaborating-with-iguazio-to-enable-real-time-ai-across-the-cloud-and-intelligent-edge/): Iguazio, the serverless platform for intelligent applications, today announced it is partnering with Google Cloud to enable real-time AI across...
- [Don’t get cloudwashed: The case for cloud on-prem in hybrid computing](https://www.iguazio.com/news-events/news/dont-get-cloudwashed-the-case-for-cloud-on-prem-in-hybrid-computing/): Buyers of any product promising on-prem cloud had better thoroughly scan the ingredients list. The technologies that enable true cloud...
- [Even in the cloud, banking is tied to legacy tech](https://www.iguazio.com/news-events/news/even-in-the-cloud-banking-is-tied-to-legacy-tech/): Companies such as Iguazio show what’s possible for banks with serverless. The company is working with financial institutions to deliver...
- [Other Vendors to Consider for Operational DBMSs](https://www.iguazio.com/news-events/news/other-vendors-to-consider-for-operational-dbmss/): Many vendors did not qualify for the 2018 Magic Quadrant for OPDBMSs. This report identifies 15 such vendors that may...
- [Iguazio’s New Nuclio Release Enables Serverless Agility for Enterprises Deploying Real-time Intelligent Applications](https://www.iguazio.com/news-events/news/iguazios-new-nuclio-release-enables-serverless-agility-for-enterprises-deploying-real-time-intelligent-applications/): Leading open source serverless framework now includes capabilities that enable faster end-to-end enterprise and IoT deployments with reduced operational complexity
- [SD Times Blog: Getting a serverless reality check](https://www.iguazio.com/news-events/news/sd-times-blog-getting-a-serverless-reality-check/): Orit Nissan-Messing, chief architect and co-founder of Iguazio, found that while a lot of businesses are trying to move toward...
- [Iguazio revamps its Nuclio serverless computing platform](https://www.iguazio.com/news-events/news/iguazio-revamps-its-nuclio-serverless-computing-platform/): Data analytics company Iguazio Systems Ltd. today unveiled a major update to its Nuclio serverless framework for multicloud, on-premises and...
- [SD Times news digest: Iguazio’s Nuclio release, Kotlin 1.3, and reCAPTCHA v3](https://www.iguazio.com/news-events/news/sd-times-news-digest-iguazios-nuclio-release-kotlin-1-3-and-recaptcha-v3/): Continuous data platform Iguazio has announced a new version of Nuclio, an integrated multi-cloud, on-premise, and edge serverless platform.
- [Iguazio Selected for CNBC's Upstart 100 List of Promising Startups](https://www.iguazio.com/news-events/news/iguazio-selected-for-cnbcs-upstart-100-list-of-promising-startups/): Iguazio simplifies the development and deployment of high-volume, real-time, AI applications across clouds, edge and on-premises environments.
- [Instead of sending data to the cloud, why not send the cloud to the edge?](https://www.iguazio.com/news-events/news/instead-of-sending-data-to-the-cloud-why-not-send-the-cloud-to-the-edge/): "In telecommunications, edge computing serves as a vital tool for monitoring and predicting network health in real time. This can...
- [Is cloud native starting to kill Hadoop? This CTO says yes](https://www.iguazio.com/news-events/news/is-cloud-native-starting-to-kill-hadoop-this-cto-says-yes/): "If I can take the traditional tools people are now evolving in and using, like Jupyter Notebooks, Spark, Tensorflow ......
- [This startup thinks it knows how to speed up real-time analytics on tons of data](https://www.iguazio.com/news-events/news/this-startup-thinks-it-knows-how-to-speed-up-real-time-analytics-on-tons-of-data/): "Essentially, it's providing a very high-performance data engine for ingestion, cross-correlation enrichment, AI and analysis and then serving it all...
- [Equinix and Iguazio partner to drive smart mobility](https://www.iguazio.com/news-events/news/equinix-and-iguazio-partner-to-drive-smart-mobility/): “The ever-growing ride-hailing landscape in Southeast Asia deals with constant new vulnerabilities such as fraud and analytical accuracy,” said Jiffry...
- [Enabling smart transportation in today's interconnected world](https://www.iguazio.com/news-events/news/enabling-smart-transportation-in-todays-interconnected-world/): One such case is Iguazio, which digitally transforms businesses by streamlining data volumes for real-time, intelligent applications, to optimise performance...
- [Iguazio’s Nuclio Serverless Software Aims to Outrun AWS](https://www.iguazio.com/news-events/news/iguazios-nuclio-serverless-software-aims-to-outrun-aws/): Yaron Haviv, Chief Technology Officer and co-founder of Iguazio joins us for this episode of The New Stack Makers podcast.
- [The Car's Eyes and Ears at TC TLV 2018](https://www.iguazio.com/news-events/news/the-cars-eyes-and-ears-at-tc-tlv-2018/): Iguazio's Chief Architect, Orit Nissan-Messing spoke on this TechCrunch panel in Tel Aviv last week.
- [Enterprise startups in Israel worth getting to know](https://www.iguazio.com/news-events/news/enterprise-startups-in-israel-worth-getting-to-know/): Iguazio is a real time continuous analytics platform that runs as a Platform as a Service (PaaS).
- [Bigger than Linux: The rise of cloud native](https://www.iguazio.com/news-events/news/bigger-than-linux-the-rise-of-cloud-native/): Omri Harel, senior software developer at Iguazio, the company behind an open source serverless framework called Nuclio, told us that...
- [2018 Big Data 100: 45 Coolest Data Management And Integration Vendors](https://www.iguazio.com/news-events/news/2018-big-data-100-45-coolest-data-management-and-integration-vendors/): Earlier this year the company launched its inaugural channel program in a bid to globally recruit VAR, systems integrator and...
- [Serverless computing takes a big step into the multicloud world](https://www.iguazio.com/news-events/news/serverless-computing-takes-a-big-step-into-the-multicloud-world/): Iguazio is at the forefront of vendors providing flexible serverless fabrics that can be deployed into private, public or hybrid...
- [Serverless framework Nuclio released for enterprise customers](https://www.iguazio.com/news-events/news/serverless-framework-nuclio-released-for-enterprise-customers/): Nuclio, built by Iguazio, can be used in the cloud or on-premises, though the company has worked with Microsoft to...
- [This Israeli Startup Partners With Amazon -- But Could Compete With AWS In $8B Edge Cloud Market](https://www.iguazio.com/news-events/news/this-israeli-startup-partners-with-amazon-but-could-compete-with-aws-in-8b-edge-cloud-market/): Iguazio offers a service that enables companies to get the right data quickly so they can make better decisions.
- [Can Open-Source Serverless Be Simpler than AWS Lambda?](https://www.iguazio.com/news-events/news/can-open-source-serverless-be-simpler-than-aws-lambda/): I will prove my point with a common function use-case: watching a native cloud provider service (AWS S3) and acting...
- [Asaf Somekh, Founder & CEO at iguazio, talks delivering dreams](https://www.iguazio.com/news-events/news/asaf-somekh-founder-ceo-at-iguazio-talks-delivering-dreams/): In this podcast for Enterprise Management 360, Asaf Somekh talks about redesigning the data stack to support real-time analytics, complementing Amazon...
- [To reach its full promise, big data must begin with a clean slate](https://www.iguazio.com/news-events/news/to-reach-its-full-promise-big-data-must-begin-with-a-clean-slate/): Focus on a model which makes continuous use of data to improve the business bottom line, because data at rest...
- [Big Data Startup iguazio Debuts Its First Channel Program, Seeks VARs and SIs with Vertical Industry Expertise](https://www.iguazio.com/news-events/news/big-data-startup-iguazio-debuts-its-first-channel-program-seeks-vars-and-sis-with-vertical-industry-expertise/): "It's basically scale-up time for the business and the best way to scale is through partners and the channel," said...
- [Asia Pacific expansion on the horizon for iguazio with opening of Singapore headquarters](https://www.iguazio.com/news-events/news/asia-pacific-expansion-on-the-horizon-for-iguazio-with-opening-of-singapore-headquarters/): "We plan to keep growing our eco-system of channels by partnering with more system integrators and resellers. Additionally, we place...
- [iguazio Boldly Taunts AWS’ Lambda with nuclio Serverless Platform](https://www.iguazio.com/news-events/news/iguazio-boldly-taunts-aws-lambda-with-nuclio-serverless-platform/): iguazio CTO Yaron Haviv said nuclio can process up to 400,000 events per second, compared to just 2,000 events per...
- [Surge pricing: How it works and how to avoid it](https://www.iguazio.com/news-events/news/surge-pricing-how-it-works-and-how-to-avoid-it/): Ride-hailing apps like Uber and Grab price journeys higher when demand spikes. But how does surge pricing work and what...
- [Serverless Framework for Real-Time Apps Emerges](https://www.iguazio.com/news-events/news/serverless-framework-for-real-time-apps-emerges/): Dubbed nuclio and written in the Go programming language, iguazio CTO Yaron Haviv says this serverless computing framework is unique...
- [AI depends on having the right data for real-time decision-making](https://www.iguazio.com/news-events/news/ai-depends-on-having-the-right-data-for-real-time-decision-making/): Real-time actionable insights are enabling new solutions in telecommunications, financial services, and transportation, leading to smarter business decisions, better understanding...
- [With an eye on Asia, Israeli startup iguazio counts IPO in its roadmap](https://www.iguazio.com/news-events/news/with-an-eye-on-asia-israeli-startup-iguazio-counts-ipo-in-its-roadmap/): iguazio has recently opened its Asia Pacific headquarters in Singapore and kicked off a penetration phase for Asia’s big data...
- [iguazio releases high-speed serverless platform to open source](https://www.iguazio.com/news-events/news/iguazio-releases-high-speed-serverless-platform-to-open-source/): There are multiple cloud-based serverless platforms out there, but “none was developed by infrastructure people,” Haviv said. “nuclio is extremely...
- [Tutorial: Faster AI Development with Serverless](https://www.iguazio.com/news-events/news/tutorial-faster-ai-development-with-serverless/): nuclio’s stand-alone version can be deployed with a single Docker command on a laptop, making it simpler to play with...
- [This tech firm offers a big data solution for businesses](https://www.iguazio.com/news-events/news/this-tech-firm-offers-a-big-data-solution-for-businesses/): Asaf Somekh, CEO of iguazio, says his firm offers a secure and effective solution for companies struggling to set up...
- [2 lessons cloud native companies have for enterprise leaders](https://www.iguazio.com/news-events/news/2-lessons-cloud-native-companies-have-for-enterprise-leaders/): “Data is everything, and we are laser-focused on collecting all of the data we can to make the most optimized...
- [nuclio and the Future of Serverless Computing](https://www.iguazio.com/news-events/news/nuclio-and-the-future-of-serverless-computing/): Meet nuclio - a new advanced and high-performance open source serverless framework which takes usability and applicability to the next...
- [Actionable Insights: Obliterating BI, Data Warehousing as We Know It](https://www.iguazio.com/news-events/news/actionable-insights-obliterating-bi-data-warehousing-as-we-know-it/): The time is ripe for re-architecting analytics to maximize the value of machine learning and real-time streaming, drive actionable insights,...
- [Iguazio, the Anti-Hadoop, Goes GA](https://www.iguazio.com/news-events/news/iguazio-the-anti-hadoop-goes-ga/): “We have yet to see anything that compares directly to iguazio’s combination of data analytics and cloud architecture.” -...
- [Yaron Haviv, iguazio's CTO on theCube](https://www.iguazio.com/news-events/news/yaron-haviv-iguazios-cto-on-thecube/): Yaron Haviv, iguazio's CTO sits down with theCube at Strata NYC and explains how modern data architectures simplify the development...
- [As its data cloud launches, iguazio nabs Grab as a marquee customer](https://www.iguazio.com/news-events/news/as-its-data-cloud-launches-iguazio-nabs-grab-as-a-marquee-customer/): iguazio Systems Ltd. announced today that Singapore ride-hailing giant Grab will use its Unified Data Platform. At the same time,...
- [Entrepreneur Spotlight: Asaf Somekh, Founder & CEO of iguazio](https://www.iguazio.com/news-events/news/entrepreneur-spotlight-asaf-somekh-founder-ceo-of-iguazio/): Fresh from a Series B funding round of $33 million, Herzliya-based start-up iguazio now plans to go big. MarketBrains talks...
- [AI&ML tech talk: iguazio](https://www.iguazio.com/news-events/news/aiml-tech-talk-iguazio/): A tech talk featuring iguazio on AI and machine learning.
- [Reimagining the Data Pipeline Paradigm as a Continuous Data Insights Platform](https://www.iguazio.com/news-events/news/reimagining-the-data-pipeline-paradigm-as-a-continuous-data-insights-platform/): Rethinking the data pipeline paradigm as a platform for continuous data insights.
- [Verizon, CME Group, Bosch Invest In Continuous Data Analytics Start-Up Iguazio](https://www.iguazio.com/news-events/news/verizon-cme-group-bosch-invest-in-continuous-data-analytics-start-up-iguazio/): Iguazio raises $33 million in the second round from Verizon (NYSE: VZ), CME Group (NASDAQ: CME) and Bosch. Existing investors...
- [Extend Kubernetes 1.7 with Custom Resources](https://www.iguazio.com/news-events/news/extend-kubernetes-1-7-with-custom-resources/): Yaron Haviv explains how to customize Kubernetes by plugging in your own managed object and application as if it were...
- [Bosch: Investing In Data-Driven Innovation](https://www.iguazio.com/news-events/news/bosch-investing-in-data-driven-innovation/): RBVC’s most recent investment is iguazio, a data platform and analytics vendor who has rebuilt the data stack from the...
- [Iguazio nabs $33M to bring big data edge analytics to IoT, finance and other enterprises](https://www.iguazio.com/news-events/news/iguazio-nabs-33m-to-bring-big-data-edge-analytics-to-iot-finance-and-other-enterprises/): Big data analytics — where vast troves of information are structured and used to help businesses gain more insights into...
- [Robert Bosch Venture Capital invests in iguazio](https://www.iguazio.com/news-events/news/robert-bosch-venture-capital-invests-in-iguazio/): Robert Bosch Venture Capital GmbH (RBVC), the corporate venture capital company of the Bosch Group, has completed an investment in...
- [Data analytics startup Iguazio reaps $33m in second funding round](https://www.iguazio.com/news-events/news/data-analytics-startup-iguazio-reaps-33m-in-second-funding-round/): Data analytics startup Iguazio has raised $33m in a B-round. That takes total funding for the three-year-old Israeli firm to...
- [Israeli Startup Iguazio Attracts $33M Series B](https://www.iguazio.com/news-events/news/israeli-startup-iguazio-attracts-33m-series-b/): Analytics firm Iguazio scored $33 million in a Series B round from investor firm Pitango Venture Capital, with additional funds...
- [CRN: The 10 Coolest Big Data Startups Of 2017 (So Far)](https://www.iguazio.com/news-events/news/crn-the-10-coolest-big-data-startups-of-2017-so-far/): Case in point: big data ecosystems, where numerous overlapping implementations rarely share components or use common APIs and layers. They...
- [Opinion: We’ll Be Enslaved to Proprietary Clouds Unless We Collaborate](https://www.iguazio.com/news-events/news/opinion-well-be-enslaved-to-proprietary-clouds-unless-we-collaborate/): Case in point: big data ecosystems, where numerous overlapping implementations rarely share components or use common APIs and layers. They...
- [Opinion: It’s time open source focused on usability](https://www.iguazio.com/news-events/news/opinion-its-time-open-source-focused-on-usability/): Open source is slouching towards individualization as every new framework or open source architecture has its own particular API, layers,...
- [And the 2017 Cool Vendors Are…](https://www.iguazio.com/news-events/news/and-the-2017-cool-vendors-are/): iguazio (Cool Vendors in Data Management) solves a pervasive problem of unifying different data workloads and data types — records,...
- [Podcast: Intel and iguazio Processing with Continuous Analytics](https://www.iguazio.com/news-events/news/podcast-intel-and-iguazio-processing-with-continuous-analytics/): In this episode of Conversations in the Cloud, Yaron Haviv, Founder and CTO at iguazio discusses cloud native serverless processing...
- [To get the most from containers, go cloud-native or go home](https://www.iguazio.com/news-events/news/to-get-the-most-from-containers-go-cloud-native-or-go-home/): Sentiment around containers often sounds like: “Oh, here’s my lightweight VM that happens to be called a Docker container, and...
- [iguazio Re-Architects the Stack for Continuous Analytics](https://www.iguazio.com/news-events/news/iguazio-re-architects-the-stack-for-continuous-analytics/): According to iguazio co-founder and CTO Yaron Haviv, the platform can deliver 2 million transactions per second, with an average...
- [iguazio highlights continuous analytics use cases for converged data services platform](https://www.iguazio.com/news-events/news/iguazio-highlights-continuous-analytics-use-cases-for-converged-data-services-platform/): We continue to see opportunities for iguazio driven by its ability to reduce the complexity of deploying multiple data processing...
- [Opinion: Managing Data on the Edge](https://www.iguazio.com/news-events/news/opinion-managing-data-on-the-edge/): We need a new type of edge, one that is capable of processing data in real-time, rather than simply caching...
- [Cool Company: iguazio](https://www.iguazio.com/news-events/news/cool-company-iguazio/): iguazio’s unified solution simplifies management and maintenance for customers who might otherwise need to deploy multiple data platforms and services...
- [Strata: Cloudera, MapR and others focus on consolidating the sprawl](https://www.iguazio.com/news-events/news/strata-cloudera-mapr-and-others-focus-on-consolidating-the-sprawl/): Cloudera, MapR, Pentaho and iguazio have announcements around data science, edge computing and continuous data applications.
- [iguazio speeds up big data delivery with continuous analytics platform](https://www.iguazio.com/news-events/news/iguazio-speeds-up-big-data-delivery-with-continuous-analytics-platform/): iguazio’s latest solution employs what the company terms a “continuous data consumption” model – ingesting, enriching, analyzing and serving up...
- [Exclusive Interview with Asaf Somekh, CEO and Co-Founder, iguazio](https://www.iguazio.com/news-events/news/exclusive-interview-with-asaf-somekh-ceo-and-co-founder-iguazio/): Data stack design to accelerate performance in big data, IoT and cloud-native apps.
- [A Hacker’s Guide to Kubernetes Networking](https://www.iguazio.com/news-events/news/a-hackers-guide-to-kubernetes-networking/): Yaron Haviv explains how iguazio uses Kubernetes and the Container Networking Interface with some hacking tricks.
- [iguazio takes its performance-boosting data platform global with help from Equinix](https://www.iguazio.com/news-events/news/iguazio-takes-its-performance-boosting-data-platform-global-with-help-from-equinix/): The alliance will see the infrastructure giant make its global network of co-location centers available for organizations looking to deploy...
- [iguazio and Equinix join forces to deliver a new Data-Centric Processing platform](https://www.iguazio.com/news-events/news/iguazio-and-equinix-join-forces-to-deliver-a-new-data-centric-processing-platform/): iguazio continues its market penetration with an ecosystem announcement during the AWS re:Invent conference in Las Vegas.
- [iguazio: Made from Kia parts but faster than a Ferrari with 1,000 drivers](https://www.iguazio.com/news-events/news/iguazio-made-from-kia-parts-but-faster-than-a-ferrari-with-1000-drivers/): Get yourself a virtual wetsuit and let yourself be drenched by a demo and presentation by Iguazio’s waterfall wizards. You...
- [The Hadooponomics Podcast – Building Big Data, Better: Why Integration, Not Infrastructure, Is Key](https://www.iguazio.com/news-events/news/the-hadooponomics-podcast-building-big-data-better-why-integration-not-infrastructure-is-key/): We are excited to welcome Yaron Haviv on the show. Yaron is an entrepreneur and thought leader in storage, networking,...
- [Organizations Look for Simplicity, Affordability in Data Lakes](https://www.iguazio.com/news-events/news/organizations-look-for-simplicity-affordability-in-data-lakes/): The biggest challenge in our view is that of secure data sharing, which is also differential. What we mean here...
- [Fast Enterprise Data Cloud Platform by iguazio](https://www.iguazio.com/news-events/news/fast-enterprise-data-cloud-platform-by-iguazio/): Storing up to 10PB per rack, with costs starting at $0.03/GB/month
- [Disruptive Technology, Monotonous Marketing At Strata+Hadoop World](https://www.iguazio.com/news-events/news/disruptive-technology-monotonous-marketing-at-stratahadoop-world/): iguazio’s disruption: most database technologies assume storage is too slow to keep up with queries, so they build in layers...
- [New products & solutions shaping the enterprise IT landscape](https://www.iguazio.com/news-events/news/new-products-solutions-shaping-the-enterprise-it-landscape/): We survey today’s emerging enterprise IT technology by speaking with six vendors about their newest products and solutions.
- [iguazio’s CTO, Yaron Haviv on theCUBE](https://www.iguazio.com/news-events/news/iguazios-cto-yaron-haviv-on-thecube/): Rethinking the cloud platform with an integrated, turnkey solution
- [iguazio launches Enterprise Data Cloud service to speed Big Data](https://www.iguazio.com/news-events/news/iguazio-launches-enterprise-data-cloud-service-to-speed-big-data/): iguazio today is introducing what it calls a data platform-as-a-service, with an aim to untangle the mess of technologies that...
- [Rethinking the cloud platform with an integrated, turnkey solution](https://www.iguazio.com/news-events/news/rethinking-the-cloud-platform-with-an-integrated-turnkey-solution/): Yaron Haviv, founder and CTO at iguazio, joined Dave Vellante and Peter Burris, cohosts of theCUBE, from the SiliconANGLE Media...
- [iguaz.io Unveils Virtualized Data Services Architecture](https://www.iguazio.com/news-events/news/iguaz-io-unveils-virtualized-data-services-architecture/): iguaz.io Unveils Virtualized Data Services Architecture. Read more here about our news and book a demo today.
- [Iguaz.io promises AWS-like storage in the data center](https://www.iguazio.com/news-events/news/iguaz-io-promises-aws-like-storage-in-the-data-center/): Newcomer iguaz.io is the latest software startup that will try and deliver the Holy Grail of storage — the...
- [iguaz.io Unveils Virtualized Data Services Architecture](https://www.iguazio.com/news-events/news/iguaz-io-unveils-virtualized-data-services-architecture-2/): This new architecture makes data services and big data tools consumable for mainstream enterprises that have been unable to harness...
- [Data services startup Iguaz.io aims to untangle Big Data hairball](https://www.iguazio.com/news-events/news/data-services-startup-iguaz-io-aims-to-untangle-big-data-hairball/): Big data processing platforms such as Hadoop and Spark enable many of today’s consumer and business applications, from Amazon and...
- [Harnessing data in real time | #SparkSummit](https://www.iguazio.com/news-events/news/harnessing-data-in-real-time-sparksummit/): Big data is all about harnessing high volumes of data, massive streams of information and powerful technological currents.
- [These are the coolest big data startups of 2015!](https://www.iguazio.com/news-events/news/these-are-the-coolest-big-data-startups-of-2015/): Big Data related industries are one of the fastest and coolest growing segments of IT world. And very recently, Wikibon...
- [These Are the Big Data Startups That Won 2015](https://www.iguazio.com/news-events/news/these-are-the-big-data-startups-that-won-2015/): If you think about just how much data is on the web right now, it’s not surprising that Big Data...
- [Israeli Stealthy Start-Up Iguaz.io Raises $15 Million in Series A](https://www.iguazio.com/news-events/news/israeli-stealthy-start-up-iguaz-io-raises-15-million-in-series-a/): Iguaz.io, a provider of data management and storage solutions for big data, IoT and cloud applications announced a $15...
- [Startup Iguaz.io is creating real-time Big Data analytics storage](https://www.iguazio.com/news-events/news/startup-iguaz-io-is-creating-real-time-big-data-analytics-storage/): One-year-old Iguaz.io, an Israeli Big Data startup, has just won a $15m A-round from Magma Venture Partners, JVP and...
- [Iguaz.io Raises $15 Million in Series A Funding to Disrupt Big Data Storage](https://www.iguazio.com/news-events/news/iguaz-io-raises-15-million-in-series-a-funding-to-disrupt-big-data-storage/): Iguaz.io, a provider of innovative data management and storage solutions for Big Data, IoT and cloud applications today announced...
- [Data management start-up iguaz.io raises $15m](https://www.iguazio.com/news-events/news/data-management-start-up-iguaz-io-raises-15m/): Iguaz.io, a provider of innovative data management and storage solutions for Big Data, IoT and cloud applications, today announced...
## Events
- [MLOps Live #37 Building Agent Co-pilots for Proactive Call Centers](https://www.iguazio.com/events/mlops-live-37-building-agent-co-pilots-for-proactive-call-centers/): This webinar explores how GenAI-powered agent co-pilots can identify missed opportunities, deliver real-time personalized insights, and help agents close deals...
- [MLOps Live #36 How to Manage Thousands of Real-Time Models in Production](https://www.iguazio.com/events/mlops-live-36-how-to-manage-thousands-of-real-time-models-in-production/): We were thrilled to welcome back Seagate to share the progress they made with their ML pipeline & MLOps automation...
- [MWC 2025](https://www.iguazio.com/events/mwc-2025/): Our AI and gen AI experts will be on the ground at MWC 2025 and we'd love to meet you...
- [MLOps Live #35 - Beyond the Hype: Gen AI Trends and Scaling Strategies for 2025](https://www.iguazio.com/events/mlops-live-35-beyond-the-hype-gen-ai-trends-and-scaling-strategies-for-2025/): AI experts Svetlana Sicular from Gartner and Yaron Haviv share strategies for scaling GenAI in 2025, tackling bottlenecks, trust, and...
- [MLOps Live #34 - Agentic AI Frameworks: Bridging Foundation Models and Business Impact](https://www.iguazio.com/events/mlops-live-34-agentic-ai-frameworks-bridging-foundation-models-and-business-impact/): Discover how conversational AI transforms customer experiences - watch to learn practical tips for building impactful AI agents.
- [MLOps Live #33 - Deploying Gen AI in Production with NVIDIA NIM & MLRun](https://www.iguazio.com/events/mlops-live-33-deploying-gen-ai-in-production-with-nvidia-nim-mlrun/): In this webinar, we explored how to successfully deploy your Gen AI applications while mitigating these challenges, using NVIDIA NIM...
- [MLOps Live #32 - Gen AI for Marketing - From Hype to Implementation](https://www.iguazio.com/events/mlops-live-32-gen-ai-for-marketing-from-hype-to-implementation/): In this session we were joined by a Modern Marketing Capabilities Leader at McKinsey, to delve into how data scientists...
- [MLOps Live #31 - Building Scalable Customer-Facing Gen AI Applications Effectively & Responsibly](https://www.iguazio.com/events/mlops-live-31/): In this session, we shared the Iguazio & MongoDB one-stop-shop solution for building gen AI applications that scale effectively and...
- [MLOps Live #30 - Implementing Gen AI in Highly Regulated Environments](https://www.iguazio.com/events/mlops-live-30-implementing-gen-ai-in-highly-regulated-environmentsmlops-live-30/): Our CEO & CTO discussed the challenges of implementing gen AI in highly regulated industries and innovative ways to mitigate them. We shared...
- [MLOps Live #29 - Transforming Enterprise Operations with Gen AI](https://www.iguazio.com/events/mlops-live-29-transforming-enterprise-operations-with-gen-ai/): In our webinar with McKinsey, we dove into the transformative impact of gen AI on enterprise operations, spotlighting advancements across...
- [MLOps Live #28 - Improving LLM Accuracy & Performance](https://www.iguazio.com/events/mlops-live-28-improving-llm-accuracy-performance/): Watch our session with Databricks to hear advice on improving the accuracy & performance of LLMs while mitigating challenges like...
- [MLOps Live #27 - LLM Validation & Evaluation](https://www.iguazio.com/events/mlops-live-27/): In this webinar we demonstrated how to effectively validate and evaluate your LLM, and dove into the pipeline to show...
- [MLOps Live #26 - Implementing a Gen AI Smart Call Center Analysis App](https://www.iguazio.com/events/mlops-live-26-implementing-a-gen-ai-smart-call-center-analysis-app/): In this session, Oana Cheta, Partner & Lead of Gen AI Service Ops NA at McKinsey & Company, joined Yaron Haviv...
- [MLOps Live #25 - GenAI for Financial Services](https://www.iguazio.com/events/mlops-live-25-genai-for-financial-servicesmlops-live-25/): In this session we discussed how leading global Financial Service companies are using GenAI today to serve their clients better,...
- [MLOps Live #24 - How to Build an Automated AI ChatBot](https://www.iguazio.com/events/mlops-live-23-mlops-for-generative-ai/): In this session, Gennaro, Director of Data Science at Sense, shared how he and his team have built and perfected...
- [ODSC West 2023](https://www.iguazio.com/events/odsc-west-2023/): Watch Yaron Haviv, our Co-Founder and CTO, present the MLOps Keynote at ODSC West 2023.
- [AI at Scale](https://www.iguazio.com/events/ai-at-scale/): We are excited to be speaking at AIIA's upcoming event! Join us as we dive deep into scaling AI, including...
- [NVIDIA GTC](https://www.iguazio.com/events/nvidia-gtc-2/): We are excited to be participating in NVIDIA GTC 2023. Yaron Haviv, Iguazio's Co-Founder and CTO, will be presenting as...
- [Future of AI 2022](https://www.iguazio.com/events/future-of-ai-2022/): We are excited to be sponsoring the Future of AI Conference 2022. Future of AI brings together the leading Israeli...
- [MLOps Live #22 How Seagate Handles Data Engineering at Scale](https://www.iguazio.com/events/mlops-live-21-hcis-journey-to-mlops-efficiency/): Join Vamsi and Yaron for a fascinating deep dive into Seagate's journey to MLOps efficiency. Hear Seagate’s story, watch a...
- [MLOps Live #20: How to Easily Deploy Your Hugging Face Model to Production at Scale](https://www.iguazio.com/events/mlops-live-20-how-to-easily-deploy-your-hugging-face-model-to-production-at-scale/): Join us for this technical session with Hugging Face, and learn from the experts!
- [AIIA: Data-Centric AI Summit](https://www.iguazio.com/events/aiia-data-centric-ai-summit/): In our session, we will demonstrate how to use Iguazio & Snowflake to create a simple, seamless, and automated path...
- [MLOps LATAM Micro-Summit (Hybrid)](https://www.iguazio.com/events/mlops-latam-micro-summit-hybrid/)
- [TMLS MLOps World: Conference on Machine Learning in Production 2022](https://www.iguazio.com/events/tmls-mlops-world-conference-on-machine-learning-in-production-2022/): We are proud to be Gold Sponsors of the 3rd annual MLOps World Summit Conference and Expo. MLOps World is...
- [Snowflake Summit](https://www.iguazio.com/events/snowflake-summit/): We are proud to be sponsoring the 2022 Snowflake Summit. The summit gathers top technical, data, and business experts to...
- [MLOps: Machine Learning in Production / New York City Summit](https://www.iguazio.com/events/mlops-machine-learning-in-production-new-york-city-summit/): We are excited to be a featured speaker for NYC's virtual ML in Production conference. This event gathers industry leaders...
- [ODSC Europe](https://www.iguazio.com/events/odsc-europe-london/): ODSC Europe combines immersive in-person sessions and hands-on training with innovative virtual ones. We hope to see you there!
- [ODSC East](https://www.iguazio.com/events/odsc-east-boston/): We are proud sponsors of the ODSC East Conference, and the keynote speaker of the Data Engineering and MLOps track.
- [Understanding Fraud Prediction for Banking & Finance Sector through Iguazio & Royal Cyber](https://www.iguazio.com/events/webinar-understanding-fraud-prediction-for-banking-finance-sector-through-iguazio-royal-cyber/): The webinar will focus on how banking and financial institutions can use cutting-edge machine learning and AI tools and technologies...
- [MDLI Ops Conference 2023](https://www.iguazio.com/events/mdli-ops-conference-2022/): We are proud to be speaking and sponsoring MDLI Ops 2023. This yearly conference brings together experts in the field...
- [ODSC Webinar: Git Based CI/CD for ML](https://www.iguazio.com/events/git-based-ci-cd-for-ml-a-complimentary-odsc-webinar/): In this live webinar we examine naive ML workflows and explore why they break down, showcase system design principles that...
- [Session #17: Scaling NLP Pipelines at IHS Markit](https://www.iguazio.com/events/session-17-scaling-nlp-pipelines-at-ihs-markit/): MLOps LIVE is back and better than ever! The data science team at IHS Markit will be sharing practical advice...
- [GTC](https://www.iguazio.com/events/gtc/): We are excited to be participating in GTC 2021. Join us as we accompany our partners on a panel: Mastering...
- [ODSC West](https://www.iguazio.com/events/odsc-west/): We are excited to be returning to ODSC! Mark your calendars for the joint Track Keynote with IHS Markit: "MLOps...
- [NetApp Insight 2021](https://www.iguazio.com/events/netapp-insight-201/): Pop by our booth to say hi!
- [Iguazio MLOps Platform Launches in AWS Marketplace](https://www.iguazio.com/events/iguazio-mlops-platform-launches-in-aws-marketplace/): Iguazio announced its availability in the AWS Marketplace, a digital catalog with thousands of software listings from independent software vendors...
- [Kubecon North America 2021](https://www.iguazio.com/events/kubecon-north-america-2021/): Stay tuned for more information.
- [MLOps: Machine Learning in Production New York City](https://www.iguazio.com/events/mlops-machine-learning-in-production-new-york-city/): We are proud to be sponsoring this year's MLOps NYC Conference, dedicated to propagating a clearer understanding of best practices, methodologies,...
- ["Building a Real-Time ML Pipeline with a Feature Store" - MLOps Live Webinar #16](https://www.iguazio.com/events/mlops-live-webinar-series-session-16/): Save your spot for our upcoming MLOps webinar and learn about challenges associated with online feature engineering, how feature stores...
- [#MLOpsforGood Award Ceremony](https://www.iguazio.com/events/mlopsforgood-award-ceremony/): After six excellent weeks of hands-on hacking, 300 participants, and over 30 awesome projects, we invite you to the live...
- [MLOps in Finance Summit](https://www.iguazio.com/events/mlops-in-finance-summit/): Catch us in the upcoming MLOps in Finance Summit! Yaron Haviv, Iguazio's CTO, will be presenting on how to build...
- [MLOps for Good Hackathon](https://www.iguazio.com/events/mlops-for-good-hackathon/): Join us for the first-ever MLRun hackathon, to help bring data science to production for social Good. Iguazio will be...
- [MLOps World: Machine Learning in Production Conference](https://www.iguazio.com/events/mlops-world-machine-learning-in-production-conference/): Join Iguazio at the 2nd Annual MLOps World Conference on Machine Learning in Production. We invite you to attend our...
- [ODSC EUROPE 2021](https://www.iguazio.com/events/odsc-europe/): ODSC Europe Virtual Conference 2021 is one of the largest applied data science conferences and Iguazio will be there! Attend...
- [Expert Panel: AI for Connected Vehicles](https://www.iguazio.com/events/expert-panel-ai-for-connected-vehicles/): In this panel, experts from NetApp, NVIDIA, Strategy Analytics and Iguazio will discuss the main use cases, and also examine...
- [Webinar: "Activate Data: Data Science Innovation with MongoDB & Iguazio"](https://www.iguazio.com/events/mlops-live-webinar-15-activate-data-data-science-innovation-with-mongodb-iguazio/): The MLOps Live Webinar Series is a complimentary webcast where you will learn how to manage and automate machine learning...
- [KubeCon NA](https://www.iguazio.com/events/kubecon-na/): The Cloud Native Computing Foundation’s flagship conference gathers adopters and technologists from leading open source and cloud native communities virtually.
- [Toronto Machine Learning Summit](https://www.iguazio.com/events/toronto-machine-learning-summit/): The Toronto Machine Learning Summit (TMLS) is a uniquely interactive experience with a community of over 9,000 active members that...
- [NetApp INSIGHT 2020](https://www.iguazio.com/events/netapp-insight-2020/): NetApp INSIGHT 2020 is a fully digital, totally immersive event which explores how NetApp can help customers unlock the best...
- [Data Science Salon | Applying AI & ML to Healthcare, Finance & Technology](https://www.iguazio.com/events/data-science-salon-media-advertising-entertainment/): The data science salon is a unique vertical focused conference which grew into a diverse community of senior data science,...
- [MLOps Series Library](https://www.iguazio.com/events/mlops-in-flip-flops/): Access all past MLOps Live webinars to learn all about MLOps from industry experts and thought leaders.
- [MLOps Live Webinar #15: 'Automated Model Management for CPG Trade Effectiveness with Tredence'](https://www.iguazio.com/events/mlops-live/): The MLOps Live Webinar Series is a complimentary webcast where you will learn how to manage and automate machine learning...
- [MLOps NYC](https://www.iguazio.com/events/mlops-nyc/): MLOps NYC 2019 gathered industry leaders from companies like Netflix, Google, Twitter and Uber to share their biggest MLOps challenges...
- [NVIDIA GTC: Deep Learning & AI Conference](https://www.iguazio.com/events/nvidia-gtc/): NVIDIA's GPU Technology Conference (GTC) is the must attend digital event for developers, researchers, engineers, and innovators looking to enhance...
- [KubeCon Europe](https://www.iguazio.com/events/london-in-february/): The Cloud Native Computing Foundation’s virtual conference will gather adopters and technologists from leading open source and cloud-native communities.
## PRs
- [McKinsey & Company Acquires Iguazio to Accelerate & Scale Enterprise AI](https://www.iguazio.com/news-events/pr/mckinsey-company-acquires-iguazio-to-accelerate-scale-enterprise-ai/): Acquisition will enable AI's full power and potential to be realized across commercial, social, and environmental initiatives.
- [Sense Selects Iguazio for AI Chatbot Automation with AWS, Snowflake and NVIDIA](https://www.iguazio.com/news-events/pr/sense-selects-iguazio-for-ai-chatbot-automationwith-aws-snowflake-and-nvidia/): Sense will use the Iguazio MLOps platform for a large range of AI products, beginning with the Sense AI Chatbot.
- [Iguazio Partners with Snowflake to Automate and Accelerate MLOps](https://www.iguazio.com/news-events/pr/iguazio-partners-with-snowflake-to-automate-and-accelerate-mlops/): Iguazio announced a new partnership with Snowflake, the Data Cloud company, which includes connectivity of Iguazio’s solution for automating ML...
- [LATAM Airlines Chooses Iguazio to Operationalize Machine Learning](https://www.iguazio.com/news-events/pr/latam-airlines-group-selects-iguazio-to-operationalize-machine-learning-as-part-of-its-post-pandemic-innovation-strategy/): Iguazio announced that LATAM – the leading airline group in Latin America – has selected its MLOps platform for a...
- [Iguazio Partners with Pure Storage to Operationalize AI Production-First](https://www.iguazio.com/news-events/pr/pure-storage-and-iguazio-form-strategic-partnership-to-operationalize-ai-for-enterprises-taking-a-production-first-approach/): Iguazio announced a strategic partnership with Pure Storage. The new partnership will empower enterprises to unlock the value of their...
- [Iguazio MLOps Platform Now Supports Amazon FSx for NetApp ONTAP](https://www.iguazio.com/news-events/pr/iguazio-mlops-platform-now-supports-amazon-fsx-for-netapp-ontap/): Iguazio announced its support for the new FSx for ONTAP. FSx for ONTAP provides fully managed shared file and...
- [Iguazio MLOps Platform Launches in AWS Marketplace](https://www.iguazio.com/news-events/pr/iguazio-mlops-platform-launches-in-aws-marketplace/): Iguazio announced its availability in the AWS Marketplace. This new availability provides AWS customers with access to Iguazio’s MLOps solution.
- [Boston Limited and Iguazio Partner to Operationalize AI for the Enterprise](https://www.iguazio.com/news-events/pr/boston-limited-and-iguazio-partner-to-operationalize-ai-for-the-enterprise/): Enabling both companies to extend their offerings to enterprises across industries looking to bring data science into real life applications.
- [Iguazio Announces First-Ever ‘MLOps for Good’ Virtual Hackathon](https://www.iguazio.com/news-events/pr/iguazio-announces-first-ever-mlops-for-good-virtual-hackathon/): With the mission to foster projects that can immediately impact real-world issues, Iguazio, partners Microsoft and MongoDB, and sponsor Aztek launch the first-ever virtual hackathon.
- [Iguazio Launches Integrated Feature Store to Accelerate AI Deployment](https://www.iguazio.com/news-events/pr/iguazio-launches-the-first-integrated-feature-store-within-its-data-science-platform-to-accelerate-deployment-of-ai-in-any-cloud-environment/): The first production-ready integrated solution for enterprises to catalog, store and share features centrally, and use them to manage AI applications.
- [Sheba Medical Center Partners with Iguazio for Real-Time COVID-19 AI](https://www.iguazio.com/news-events/pr/sheba-medical-center-inks-strategic-agreement-with-iguazio-to-deliver-real-time-ai-for-covid-19-patient-treatment-optimization/): Read more about Sheba Medical Center Inks Strategic Agreement with Iguazio to Deliver Real-Time AI for COVID-19.
- [Iguazio Achieves AWS Outposts Ready Status to Accelerate AI in Hybrid](https://www.iguazio.com/news-events/pr/iguazio-achieves-aws-outposts-ready-designation-to-help-enterprises-accelerate-ai-deployment-in-hybrid-environments/): The new seamless integration of Iguazio’s technology with AWS Outposts allows customers to build ML pipelines in weeks instead of months.
- [Faktion and Iguazio Bring Data Science to Production for Smart Mobility](https://www.iguazio.com/news-events/pr/faktion-iguazio-bring-data-science-to-production-smart-mobility-customers/): The partnership enables both companies to provide AI infrastructure and services to smart mobility companies looking to harness big data.
- [PadSquad Deploys the Iguazio Data Science Platform to Predict Ad Performance in Real-Time](https://www.iguazio.com/news-events/pr/padsquad-deploys-the-iguazio-data-science-platform-to-predict-ad-performance-in-real-time/): Discover how PadSquad utilizes the Iguazio Data Science Platform to predict ad performance in real-time, enhancing marketing strategies.
- [SFL Scientific and Iguazio Partner to Accelerate Custom AI Development](https://www.iguazio.com/news-events/pr/sfl-scientific-and-iguazio-partner-to-speed-up-custom-ai-development-for-fortune-1000-companies/): Iguazio announces partnerships with SFL Scientific to simplify and expedite development and deployment of AI for top tier enterprises across...
- [NetApp Deploys Iguazio to Run AI-Driven Digital Advisor on Active IQ](https://www.iguazio.com/news-events/pr/netapp-deploys-iguazio-to-run-ai-driven-digital-advisor-on-active-iq-2/): NetApp deploys the Iguazio platform to boost the infrastructure behind its Active IQ solution, responding in real-time to 10 trillion data points.
- [Iguazio Becomes Certified for NVIDIA DGX-Ready Software Program](https://www.iguazio.com/news-events/pr/iguazio-becomes-certified-for-nvidia-dgx-ready-software-program/): Iguazio achieves NVIDIA DGX-Ready certification, validating its software for high-performance AI applications.
- [Iguazio and NetApp Collaborate to Accelerate Deployment of AI Applications](https://www.iguazio.com/news-events/pr/iguazio-and-netapp-collaborate-to-accelerate-deployment-of-ai-applications/): Partnership that provides enterprises with a simple, end-to-end solution for developing, deploying and managing AI applications.
- [Iguazio Deployed by Payoneer to Prevent Fraud with Real-time Machine Learning](https://www.iguazio.com/news-events/pr/iguazio-deployed-by-payoneer-to-prevent-fraud-with-real-time-machine-learning/): Payoneer uses Iguazio to move from detection to prevention of fraud with predictive machine learning models served in real-time.
- [Iguazio Raises $24M to Accelerate Growth of Its Data Science Platform](https://www.iguazio.com/news-events/pr/iguazio-raises-24m-to-accelerate-growth-and-global-penetration-of-its-data-science-platform/): Iguazio Raises $24M to Accelerate Growth and Global Penetration of its Data Science Platform enabling a wide range of industries.
- [PICSIX Launches Investigative Intelligence Platform Powered by Iguazio](https://www.iguazio.com/news-events/pr/picsix-iguazio/): PICSIX uses Iguazio to provide an AI-based platform addressing the ever-changing threats to homeland security and public safety
- [Iguazio Expands Serverless To Scale-out Machine Learning and Analytics Workloads](https://www.iguazio.com/news-events/pr/iguazio-expands-serverless/): New serverless capabilities in Iguazio’s Platform enable on-demand resource consumption, elastic scaling, and simpler ML pipelines
- [MLOps NYC19 Conference to Promote the Standardization of Machine Learning Operations](https://www.iguazio.com/news-events/pr/mlops-nyc19-conference-to-promote-the-standardization-of-machine-learning-operations/): MLOps NYC19 will reflect the current state of machine learning operations with accomplished industry leaders sharing insights and experiences.
- [Iguazio to Operationalize Data Science and AI on Azure and Azure Stack](https://www.iguazio.com/news-events/pr/iguazio-to-operationalize-data-science-and-ai-on-azure-and-azure-stack/): Enabling Leading Enterprises to Bring Machine Learning into Business Applications and Remove AI Project Complexity
- [Iguazio’s Platform Scales NVIDIA GPU-Accelerated Deployments](https://www.iguazio.com/news-events/pr/iguazio-platform-scales-nvidia-gpu-accelerated-deployments/): Samsung SDS Integrates Serverless and Big Data to Automate Machine Learning Application Scaling and Productization
- [Samsung SDS Invests in Iguazio to Boost Cloud Services](https://www.iguazio.com/news-events/pr/samsung-sds-invests-in-iguazio-to-boost-cloud-services/): Samsung Adopts Iguazio’s Nuclio Serverless PaaS for Real-time Intelligent Applications
- [Iguazio Powers the Intelligent Edge for Smart Retail and IoT Solutions with Google Cloud](https://www.iguazio.com/news-events/pr/google-iguazio-intelligent-edge-retail/): Collaborating with Trax on a Kubernetes-powered hybrid cloud for real-time supply chain and intelligent operations
- [Iguazio's Nuclio Update Enables Serverless Agility for Real-Time Apps](https://www.iguazio.com/news-events/pr/iguazio-new-nuclio-release-enables-serverless-agility-enterprise-deploying-real-time-intelligent-applications/): Leading open source serverless framework now includes capabilities that enable faster end-to-end enterprise and IoT deployments.
- [Iguazio Hosts Serverless NYC: Enterprise Deployments and Case Studies](https://www.iguazio.com/news-events/pr/iguazio-hosts-serverless-nyc/): Iguazio Hosts Serverless NYC to Go Beyond the Hype, Presenting Enterprise Deployments and Real-Life Case Studies - find more here.
- [Equinix and Iguazio Collaborate to Drive Smart Mobility Vision](https://www.iguazio.com/news-events/pr/equinix-iguazio-collaborate-drive-smart-mobility-vision/): Hybrid cloud solution enables leading Asian ride-hailing applications to deliver real time and event-driven insights
- [Iguazio's Real-Time Serverless Framework Now Available for Enterprises](https://www.iguazio.com/news-events/pr/real-time-serverless-available-for-enterprise/): Nuclio enterprise edition gains momentum with leading cloud providers, as well as in on-prem deployments and at the edge
- [PickMe Deploys Iguazio’s Platform for Real-Time Heatmaps, Fraud Detection](https://www.iguazio.com/news-events/pr/pickme-selects-iguazio/): Sri Lanka’s highest performing mobility app uses iguazio to maintain its competitive edge and operational efficiency
- [iguazio Featured in CRN’s 2018 Partner Program Guide](https://www.iguazio.com/news-events/pr/crn-partner-program-guide/): Annual Guide Recognizes the IT Channel’s Top Partner Programs
- [iguazio Extends Global Reach with New Channel Partner Program](https://www.iguazio.com/news-events/pr/iguazio-extends-global-reach-with-new-channel-partner-program/): Continuous Data Platform Now Available to More Enterprises Through Systems Integrators, VARs, OEMs
- [Unified Data Platform Provider iguazio Opens APAC Headquarters in Singapore](https://www.iguazio.com/news-events/pr/unified-data-platform-provider-apac-hq/): Demand for iguazio’s Hybrid Cloud and Edge Solutions drives continued global expansion with the opening of its APAC regional headquarters in Singapore
- [iguazio Debuts the nuclio Serverless Platform for Multi-Cloud and Edge Deployments](https://www.iguazio.com/news-events/pr/iguazio-debuts-serverless-platform/): nuclio expands iguazio’s data platform to provide a complete cloud experience that allows faster, flexible deployment in the cloud.
- [iguazio Announces General Availability of Its Unified Data Platform](https://www.iguazio.com/news-events/pr/iguazio-announces-general-availability-unified-data-platform/): Early customer adoption includes Grab, the Largest Ride-Hailing Service in Southeast Asia
- [Grab, Southeast Asia’s #1 Ride-Hailing Service, Selects iguazio’s Unified Data Platform](https://www.iguazio.com/news-events/pr/grab-south-east-asias-1-ride-hailing-service-selects-iguazios-unified-data-platform/): Largest Ride-Hailing Service in Southeast Asia uses Iguazio to Ingest, Enrich and Analyze Data for Continuous Analytics and more.
- [Iguazio Raises $33M in Series B as Leader in Real-Time Analytics, Edge Data](https://www.iguazio.com/news-events/pr/iguazio-leader-real-time-analytics-edge-data-platforms-raises-33m-series-b-funds/): New round includes strategic investors from financial services, IoT and service providers following successful early deployments
- [iguazio Selected as a Gartner Cool Vendor in Data Management, 2017](https://www.iguazio.com/news-events/pr/iguazio-selected-as-a-gartner-cool-vendor-in-data-management-2017/): Delivering continuous analytics and offering greater simplicity, performance, security and agility for next generation applications
- [iguazio Demos Industry’s First Integrated Real-time Continuous Analytics Solution](https://www.iguazio.com/news-events/pr/iguazio-demos-industrys-first-integrated-real-time-continuous-analytics-solution/): Complete re-thinking of the traditional data pipeline reduces time-to-insights from hours to seconds
- [iguazio Collaborates with Equinix to Offer Data-Centric Hybrid Cloud Solutions](https://www.iguazio.com/news-events/pr/iguazio-collaborates-with-equinix-to-offer-data-centric-hybrid-cloud-solutions/): Placing governed data and analytics closer to their sources, while leveraging Amazon Web Services compute elasticity.
- [iguazio Announces the World’s Fastest, Simplest and Lowest-Cost Enterprise Data Cloud](https://www.iguazio.com/news-events/pr/iguazio-announces-the-worlds-fastest-simplest-and-lowest-cost-enterprise-data-cloud/): Delivers 100x faster performance and 10x lower cost for on-premises and hybrid cloud deployments
- [Iguazio Unveils World’s First Virtualized Data Services Architecture](https://www.iguazio.com/news-events/pr/iguazio-unveils-worlds-first-virtualized-data-services-architecture/): Reveals Details of Extremely Efficient Architecture that Seamlessly Accelerates Spark and Hadoop, Busts Silos and Ends ETL
- [Iguaz.io Raises $15 Million in Series A Funding to Disrupt Big Data Storage](https://www.iguazio.com/news-events/pr/iguaz-io-raises-15-million-in-series-a-funding-to-disrupt-big-data-storage/): Iguaz.io, a provider of innovative data management and storage solutions for Big Data, IoT and cloud applications, announced its funding round.
## Sessions
- [Building Agent Co-pilots for Proactive Call Centers](https://www.iguazio.com/sessions/building-agent-co-pilots-for-proactive-call-centers/):
- [Real-time Agent Co-pilot Demo](https://www.iguazio.com/sessions/real-time-agent-co-pilot-demo/):
- [How to Manage Thousands of Real-Time Models in Production](https://www.iguazio.com/sessions/how-to-manage-thousands-of-real-time-models-in-production/):
- [Beyond the Hype: Gen AI Trends and Scaling Strategies for 2025](https://www.iguazio.com/sessions/beyond-the-hype-gen-ai-trends-and-scaling-strategies-for-2025/):
- [Agentic AI Frameworks: Bridging Foundation Models and Business Impact](https://www.iguazio.com/sessions/agentic-ai-frameworks-bridging-foundation-models-and-business-impact/): Explore how Agentic AI frameworks can enhance decision-making and operational efficiency across industries.
- [Deploying Gen AI in Production with NVIDIA NIM & MLRun](https://www.iguazio.com/sessions/deploying-gen-ai-in-production-with-nvidia-nim-mlrun/): Learn to deploy Gen AI applications efficiently using NVIDIA NIM and MLRun for streamlined AI orchestration.
- [Gen AI for Marketing - From Hype to Implementation](https://www.iguazio.com/sessions/gen-ai-for-marketing-from-hype-to-implementation/): Transform your marketing strategy with Gen AI insights and actionable steps for effective implementation and risk management.
- [Building Scalable Customer-Facing Gen AI Applications Effectively & Responsibly](https://www.iguazio.com/sessions/building-scalable-customer-facing-gen-ai-applications-effectively-responsibly/): Learn how to build scalable customer-facing Gen AI applications effectively and responsibly in our session.
- [Implementing Gen AI in Highly Regulated Environments](https://www.iguazio.com/sessions/implementing-gen-ai-in-highly-regulated-environments/): Discover unique challenges and solutions for deploying Gen AI in regulated industries to ensure compliance and performance.
- [Transforming Enterprise Operations with Gen AI](https://www.iguazio.com/sessions/transforming-enterprise-operations-with-gen-ai/): Explore how Gen AI enhances operations, focusing on real-world applications across manufacturing and supply chains.
- [Improving LLM Accuracy & Performance](https://www.iguazio.com/sessions/improving-llm-accuracy-performance/): Uncover best practices for improving LLM accuracy and performance while managing risk and costs effectively
- [LLM Validation & Evaluation](https://www.iguazio.com/sessions/llm-validation-evaluation/): Learn from our experts how to validate and evaluate LLMs effectively with automation strategies for improved accuracy.
- [Implementing a Gen AI Smart Call Center Analysis App](https://www.iguazio.com/sessions/implementing-a-gen-ai-smart-call-center-analysis-app/): Learn how to build a Gen AI-driven smart call center analysis app tailored for effective customer engagement.
- [GenAI for Financial Services](https://www.iguazio.com/sessions/genai-for-financial-services/): Generative AI has sparked the imagination with the explosion of tools, highlighting the importance of LLMs as the basis for modern AI.
- [Sheba Medical Center Improves Patient Outcomes and Experiences with AI](https://www.iguazio.com/sessions/sheba-medical-center-healthcare-ai/): Sheba Medical Center is driving a digital transformation by unifying real-time and historic data from different sources and more.
- [How to Build an Automated AI ChatBot](https://www.iguazio.com/sessions/how-to-build-an-automated-ai-chatbot/): Gennaro describes how he and his team built and perfected this chatbot, and what their ML pipeline looks like behind the scenes.
- [Demo: LLM Call Center Analysis with MLRun](https://www.iguazio.com/sessions/demo-llm-call-center-analysis-with-mlrun/): We showcase how to use LLMs to turn audio files of conversations between customers and agents at a call center into valuable data.
- [MLOps for Gen AI in the Enterprise](https://www.iguazio.com/sessions/mlops-for-gen-ai-the-live-webinar-series/): We will explore the effective integration of GenAI into real-time business applications, along with associated challenges & solutions.
- [MLOps for Generative AI](https://www.iguazio.com/sessions/mlops-for-generative-ai/): The influx of new tools sparks the imagination and highlights the importance of Generative AI and foundation models as the basis for modern AI applications.
- [MLOps for LLMs](https://www.iguazio.com/sessions/mlops-for-llms/): Unlocking Generative AI's potential. Learn MLOps strategies for deploying and optimizing Hugging Face models in real business environments efficiently.
- [How Seagate Runs Advanced Manufacturing at Scale](https://www.iguazio.com/sessions/how-seagate-runs-advanced-manufacturing-at-scale/): Like most enterprises, Seagate was facing numerous challenges around AI and MLOps, at their hard disk advanced manufacturing sites.
- [HCI’s Journey to MLOps Efficiency](https://www.iguazio.com/sessions/hcis-journey-to-mlops-efficiency/): This is just what the data science team at HCI is doing. In this session, Jiri will be sharing enterprise secrets to establishing efficient systems for ML/AI.
- [How to Easily Deploy Your Hugging Face Model to Production at Scale](https://www.iguazio.com/sessions/how-to-easily-deploy-your-hugging-face-model-to-production-at-scale/): Seems like almost everyone uses Hugging Face to simplify and reuse advanced models and work collectively as a community.
- [Breaking AI Bottlenecks with Iguazio + Amazon FSx for NetApp ONTAP](https://www.iguazio.com/sessions/fsx/): Learn how to overcome AI bottlenecks using Iguazio and Amazon FSx for faster data processing and enhanced analytics.
- [From AutoML to AutoMLOps: Automated Logging & Tracking of ML](https://www.iguazio.com/sessions/automl-to-automlops-automated-logging-and-tracking-of-ml/): In this session, Yaron and Guy outline the challenges, describe open-source tools available for Auto-MLOps, and finish off with a live demo.
- [Best Practices for Succeeding with MLOps](https://www.iguazio.com/sessions/best-practices-for-succeeding-with-mlops/): Discover effective MLOps strategies and best practices to streamline your machine learning operations and enhance collaboration.
- [Simplifying Deployment of ML in Federated Cloud and Edge Environments](https://www.iguazio.com/sessions/simplifying-deployment-of-ml-in-federated-cloud-and-edge-environments-p/):
- [Predicting 1st Day Churn with Real-Time AI](https://www.iguazio.com/sessions/predicting-1st-day-churn-with-real-time-ai/): This On-Demand MLOps Live Webinar goes beyond theory, with industry leaders sharing challenges and practical solutions.
- [LATAM Customer Testimonial](https://www.iguazio.com/sessions/latam-customer-testimonial/): Learn how Latam Airlines Group uses Iguazio to improve pilot training and fraud detection, creating a safer and faster way to operate their models.
- [Git Based CI/CD for ML](https://www.iguazio.com/sessions/git-based-cicd-for-ml/): In this session, Yaron Haviv discusses how to enable continuous delivery of machine learning to production using Git-based ML pipelines.
- [Scaling NLP Pipelines at S&P Global (IHS Markit)](https://www.iguazio.com/sessions/scaling-nlp-pipelines-at-ihs-markit/): The data science team at S&P Global (IHS Markit) share practical advice on building sophisticated NLP pipelines that work at scale.
- [Building a Real-Time ML Pipeline with a Feature Store](https://www.iguazio.com/sessions/building-a-real-time-ml-pipeline-with-a-feature-store/): Join us to explore real-time ML pipelines using a feature store, enhancing your machine learning workflows with efficiency.
- [Automated Model Management for CPG Trade Effectiveness](https://www.iguazio.com/sessions/automated-model-management-for-cpg-trade-effectiveness/): Leading consumer packaged goods companies excel at executing large-scale trade promotions, but struggle in running precisely the right promotion at scale.
- [Automating & Governing AI Over Production Data on Azure](https://www.iguazio.com/sessions/automating-governing-ai-over-production-data-on-azure/): Many enterprises today face numerous challenges around handling data for AI/ML. They find themselves having to manually extract datasets.
- [How Feature Stores Accelerate & Simplify Deployment of AI to Production](https://www.iguazio.com/sessions/how-feature-stores-accelerate-simplify-deployment-of-ai-to-production/): In this live session we will discuss how feature stores can be used to accelerate AI deployment, show a demo and present a customer use case.
- [Simplifying Deployment of ML in Federated Cloud and Edge Environments](https://www.iguazio.com/sessions/simplifying-deployment-of-ml-in-federated-cloud-and-edge-environments/): Unlock AI adoption: Hybrid solutions for data challenges, simplified cloud-edge deployment. Learn practical insights in ML edge apps, cloud-edge harmony.
- [Handling Large Datasets in Data Preparation & ML Training Using MLOps](https://www.iguazio.com/sessions/handling-large-datasets-in-data-preparation-ml-training-using-mlops/): Learn how to use Dask, Kubernetes, and MLRun to scale your data prep and ML training with ease.
- [Quadient Customer Testimonial](https://www.iguazio.com/sessions/quadient-customer-testimonial/): Experience Quadient's customer journey firsthand with a testimonial at Iguazio. Find inspiration for your business transformation.
- [NetApp Customer Testimonial](https://www.iguazio.com/sessions/netapp-customer-testimonial/): Discover the powerful testimonial of NetApp at Iguazio! Unlock transformative insights and strategies today.
- [Siemens on the Importance of Data Storytelling in Shaping a Data Science Product](https://www.iguazio.com/sessions/siemens-on-the-importance-of-data-storytelling-in-shaping-a-data-science-product/): A deep dive into storytelling with data and how to make sure all your hard work on developing the right model pays off.
- [NVIDIA on Industrializing Enterprise AI with the Right Platform](https://www.iguazio.com/sessions/nvidia-on-industrializing-enterprise-ai-with-the-right-platform/): Enterprises need a platform that brings together tools to streamline data science workflow and bring innovative concepts into production sooner.
- [NetApp's Michael Oglesby on Building ML Pipelines Over Federated Data](https://www.iguazio.com/sessions/building-ml-pipelines-over-federated-data-compute-environments/): Learn more about constructing ML pipelines across federated data and computing environments in our session.
- [Product Madness (an Aristocrat co.) on Predicting 1st-Day Churn in Real Time](https://www.iguazio.com/sessions/product-madness-an-aristocrat-co-on-predicting-1st-day-churn-in-real-time/): Hear from Product Madness about how technology and new work processes can help the gaming and mobile app industries.
- [Greg Hayes on Uniting Data Scientists, Engineers, and DevOps with MLOps](https://www.iguazio.com/sessions/breaking-the-silos-between-data-scientists-engineers-and-devops-with-new-mlops-practices/): Learn with Greg Hayes how Ecolab is accelerating the deployment of AI applications by using new MLOps methodologies.
- [NetApp’s Shankar Pasupathy on Building Scalable Predictive Maintenance](https://www.iguazio.com/sessions/how-to-build-a-predictive-maintenance-and-actionable-intelligence-solution-at-scale/): How NetApp built a solution for predictive maintenance and actionable intelligence that responds in real time to 10 trillion data points per month.
- [Microsoft & GitHub on Git-Based CI / CD for Machine Learning & MLOps](https://www.iguazio.com/sessions/git-based-ci-cd-for-machine-learning-mlops/): Learn about using Git to enable continuous delivery of ML to production, enable controlled collaboration across ML teams, and solve rigorous MLOps needs.
- [How to Deal With Concept Drift in Production with MLOps Automation](https://www.iguazio.com/sessions/how-to-deal-with-concept-drift-in-production-with-mlops-automation/): How to detect and handle problems that arise when models lose their accuracy and how to implement concept drift detection in production.
- [Quadient’s Jason Evans on Saving Time & Costs Bringing AI to Production](https://www.iguazio.com/sessions/quadients-jason-evans-on-saving-costs-bringing-ai-to-production/): Learn how industry leaders save costs and get to market faster by leveraging ML pipeline automation and open-source technology.
- [S&P Global’s Ganesh Nagarathnam on Bringing ML Pipelines to Production](https://www.iguazio.com/sessions/sp-globals-ganesh-nagarathnam-on-bringing-ml-pipelines-to-production/): Gain insights from S&P Global on operationalizing ML pipelines and aligning business needs with data-driven solutions.
## MLOps Terminologies
- [Arithmetic Intensity](https://www.iguazio.com/glossary/arithmetic-intensity/): Arithmetic intensity is the ratio of compute operations to memory traffic, used to determine whether a workload is compute-bound or memory-bound.
- [Frontier model](https://www.iguazio.com/glossary/frontier-model/): A frontier model is a highly advanced, large-scale AI model that pushes the boundaries of AI in areas like NLP, image generation, video and coding.
- [Context Window](https://www.iguazio.com/glossary/context-window/): The context window for an LLM refers to the maximum number of tokens the model can process in a single interaction.
- [Excessive Agency](https://www.iguazio.com/glossary/excessive-agency/): Excessive agency in LLMs refers to AI making decisions beyond intent, posing risks like bias, misinformation, and loss of control.
- [Reasoning Engine](https://www.iguazio.com/glossary/reasoning-engine/): Learn how to use an AI reasoning engine and why, overcoming hallucinations, components, use cases & how it works.
- [LLM Orchestration](https://www.iguazio.com/glossary/llm-orchestration/): Learn what LLM orchestration is, its key components, and the benefits it offers for optimizing large language model workflows and performance.
- [AI Scalability](https://www.iguazio.com/glossary/ai-scalability/): Discover what AI scalability means, explore key best practices, and tackle common challenges to optimize AI performance and growth.
- [LLM Tracing](https://www.iguazio.com/glossary/llm-tracing/): LLM tracing is the practice of tracking and understanding the decision-making and thought processes within LLMs as they generate responses.
- [Human in the Loop](https://www.iguazio.com/glossary/human-in-the-loop/): "Human in the loop" (HITL) is the process that blends human intervention in an automated or semi-automated AI/ML system.
- [LLM Monitoring](https://www.iguazio.com/glossary/what-is-llm-monitoring/): LLM Monitoring is the set of practices and tools used to track, validate and maintain the performance, safety and quality of LLMs. Learn how.
- [Random Forest](https://www.iguazio.com/glossary/what-is-random-forest/): Understand random forest - An ensemble learning method that combines results from multiple decision trees during training.
- [Prompt Management](https://www.iguazio.com/glossary/what-is-prompt-management/): Understand prompt management - Optimizing and maintaining the prompts used to interact with LLMs for quality, consistency and scalability.
- [LLM Customization](https://www.iguazio.com/glossary/llm-customization/): Explore LLM customization, its importance & applications in fine-tuning, prompt engineering, RAG, RAFT, agents, and performance optimization
- [LLM as a Judge](https://www.iguazio.com/glossary/llm-as-a-judge/): "LLM as a judge" refers to the use of LLMs to evaluate content, responses, or performances, including the performance of other AI models.
- [Chain-of-Thought Prompting](https://www.iguazio.com/glossary/chain-of-thought-prompting/): Chain-of-Thought (CoT) Prompting encourages the model to break down the problem into a series of smaller, logical steps
- [LLM Embeddings](https://www.iguazio.com/glossary/llm-embeddings/): LLM embeddings are vector representations of words, phrases, or entire texts generated by language models. Discover how they work.
- [Gen AI App](https://www.iguazio.com/glossary/gen-ai-app/): Gen AI apps use generative AI technologies and leverage advanced ML models like LLMs, GANs and VAEs, to create or generate new content.
- [AI Infrastructure](https://www.iguazio.com/glossary/ai-infrastructure/): AI infrastructure is the combination of hardware, software and networking systems needed to develop, deploy and manage AI applications at scale.
- [Diffusion Models](https://www.iguazio.com/glossary/diffusion-models/): Diffusion models are powerful generative models that are able to produce high-quality, diverse samples of image, audio or text.
- [Generative Agents](https://www.iguazio.com/glossary/generative-agents/): Generative agents are software entities that use generative models to simulate and mimic human behavior and responses.
- [LLM Optimization](https://www.iguazio.com/glossary/llm-optimization/): LLM optimization is the set of techniques used to improve the efficiency, speed, cost and output quality of large language models.
- [LLM Temperature](https://www.iguazio.com/glossary/llm-temperature/): LLM temperature is a parameter that influences the language model’s output, determining whether the output is more creative or predictable.
- [LLM Agents](https://www.iguazio.com/glossary/llm-agents/): LLM agents are AI systems that use large language models to reason, plan and execute tasks, often with access to external tools and data sources.
- [On-Premise AI Platform](https://www.iguazio.com/glossary/on-premise-ai-platform/): An on-premise AI platform runs AI services and applications within the organization's physical environment, rather than on the cloud.
- [False Positive Rate](https://www.iguazio.com/glossary/false-positive-rate/): Get insights into the false positive rate in machine learning, its implications, and how to assess model performance.
- [True Positive Rate](https://www.iguazio.com/glossary/true-positive-rate/): True positive rate (TPR) is a performance metric used to evaluate the effectiveness of binary classification models in machine learning
- [RLHF](https://www.iguazio.com/glossary/rlhf/): RLHF is an AI/ML model training approach that combines reward-based reinforcement learning methods and human-generated feedback.
- [Fine-Tuning LLMs](https://www.iguazio.com/glossary/fine-tuning/): Fine-tuning LLMs (Large Language Models) is the process of adapting a pre-trained language model to a specific task or dataset.
- [Prompt Engineering](https://www.iguazio.com/glossary/prompt-engineering/): Prompt engineering is the process of formulating inputs (prompts) to an AI model (usually an LLM) to achieve the desired outputs.
- [LLM Hallucinations](https://www.iguazio.com/glossary/llm-hallucination/): LLMs can sometimes produce outputs that are coherent and grammatically correct but factually incorrect or nonsensical.
- [Auto-Regressive Models](https://www.iguazio.com/glossary/auto-regressive-models/): Auto-regressive models are models used in time series datasets that predict future values based on past values.
- [AI Tokenization](https://www.iguazio.com/glossary/ai-tokenization/): Dive into AI tokenization, explore its usefulness in the AI ecosystem, and understand the advantages it brings to AI applications.
- [Model Behavior](https://www.iguazio.com/glossary/model-behavior/): Model behavior is the way in which a trained ML model makes predictions and decisions when it is exposed to new data.
- [Baseline Models](https://www.iguazio.com/glossary/baseline-models/): Baseline models are simple models that serve as the basis for evaluating the performance of more complex models
- [ML Stack](https://www.iguazio.com/glossary/ml-stack/): An ML Stack is the entire collection of technologies and frameworks used throughout ML development, deployment and management.
- [LLMOps](https://www.iguazio.com/glossary/llmops/): LLMOps, a portmanteau of ‘LLM’ and ‘MLOps’, refers to the set of practices and tools used to manage, streamline and operationalize large language models.
- [Transfer Learning](https://www.iguazio.com/glossary/transfer-learning/): Transfer learning works by leveraging a pre-trained model's learned features and fine-tuning it on a new task-specific dataset.
- [Foundation Models](https://www.iguazio.com/glossary/foundation-models/): These models, often referred to as "base models" or "pre-trained models," have quickly become the building blocks of many advanced AI systems.
- [Large Language Models](https://www.iguazio.com/glossary/large-language-model-llms/): A large language model is an advanced AI system that has been trained on extensive amounts of data to understand and generate human-like language.
- [Data Ingestion](https://www.iguazio.com/glossary/data-ingestion/): Data ingestion for machine learning refers to the process of collecting and preparing data for use in machine learning models.
- [Data Pipeline Automation](https://www.iguazio.com/glossary/data-pipeline-automation/): Data pipeline automation is the process of automating the flow of data from one system or application to another.
- [Risk Management](https://www.iguazio.com/glossary/risk-management/): This article introduces the concept of risk management for ML and what technical risks come with ML models.
- [MLOps Governance](https://www.iguazio.com/glossary/mlops-governance/): Model governance is a supplementary component to MLOps that supports AI compliance and traceability by improving visibility and control over ML deployments
- [Model Evaluation](https://www.iguazio.com/glossary/model-evaluation/): This article defines model evaluation, discusses its importance, and introduces best practices on how to report on and perform the evaluation.
- [Holdout Dataset](https://www.iguazio.com/glossary/holdout-dataset/): When labels are present (i.e., in supervised learning), the simplest and most commonly used approach for model evaluation is a holdout dataset.
- [Overfitting](https://www.iguazio.com/glossary/overfitting/): Model overfitting is a statistical error in supervised ML, whereby the trained model fits the noise in the training data rather than its actual pattern.
- [Open Source Model](https://www.iguazio.com/glossary/open-source-model/): Open source has always been an integral part of AI. Open source refers to software for which the original source code is made publicly available.
- [Cross-Validation](https://www.iguazio.com/glossary/cross-validation/): An introduction to cross-validation, an overview of its benefits, and a walk through of when and how to use this technique in machine learning.
- [Automated Machine Learning](https://www.iguazio.com/glossary/automated-machine-learning/): Automated machine learning aims to simplify the entry requirements and ongoing resource requirements for developing real life AI solutions.
- [Recall](https://www.iguazio.com/glossary/recall/): Recall in ML: an introduction to this machine learning metric, a discussion of when to use it, and a walk-through of how to improve it.
- [Classification Threshold](https://www.iguazio.com/glossary/classification-threshold/): A classification threshold is the cutoff value that converts a model's predicted probability into a discrete class label.
- [Regression](https://www.iguazio.com/glossary/regression/): This article presents an introduction to ML regression, a review of the most common associated evaluation metrics, a walk-through of when to use regression
- [Model Training](https://www.iguazio.com/glossary/model-training/): This article presents an introduction to model training, a discussion of its importance, and a walk-through of how to train ML models during experimentation.
- [Continuous Validation](https://www.iguazio.com/glossary/continuous-validation/): A comprehensive definition of continuous validation in ML, a review of its importance, and a walk-through of the most common tools used in the field.
- [Model Tuning](https://www.iguazio.com/glossary/model-tuning/): What hyper parameters and model tuning are, why model tuning is important, and how to successfully tune your machine learning models.
- [Noise in ML](https://www.iguazio.com/glossary/noise-in-ml/): Understand the concept of noise in machine learning and its impact on model accuracy and decision-making, and how to handle.
- [Model Accuracy in Machine Learning](https://www.iguazio.com/glossary/model-accuracy-in-ml/): AI accuracy is the percentage of correct classifications that a trained machine learning model achieves.
- [Explainable AI](https://www.iguazio.com/glossary/explainable-ai/): Explainable AI (XAI) is a set of tools and methods that attempt to help humans understand the outputs of machine learning models.
- [Drift Monitoring](https://www.iguazio.com/glossary/drift-monitoring/): A part of the MLOps process, drift monitoring ensures model performance and relevance.
- [Model Serving Pipeline](https://www.iguazio.com/glossary/model-serving-pipeline/): A machine learning (ML) model pipeline or system is a technical infrastructure used to automatically manage ML processes. Learn more here.
- [Image Processing Framework](https://www.iguazio.com/glossary/image-processing-framework/): Image processing is the series of operations aimed at improving the quality of images for computer vision tasks, making downstream models more predictive.
- [Kubernetes for MLOps](https://www.iguazio.com/glossary/kubernetes-for-mlops/): Kubernetes is the preferred MLOps tool to manage automated machine learning pipelines in a reproducible, safe, and scalable way.
- [GPU for Machine Learning](https://www.iguazio.com/glossary/gpu-for-machine-learning/): What a GPU is, how it works, how it compares with other typical hardware for ML, and how to select the best GPU for your application.
- [Machine Learning Infrastructure](https://www.iguazio.com/glossary/machine-learning-infrastructure/): A complete machine learning infrastructure involves a much wider variety of engineering and procedural components.
- [Model Retraining](https://www.iguazio.com/glossary/model-retraining/): ML model retraining is the MLOps capability to automatically and continuously retrain a machine learning model on a schedule or a trigger.
- [Model Deployment](https://www.iguazio.com/glossary/model-deployment/): Model deployment is the process of putting an ML model into production, and it can be resource-intensive and time-consuming.
- [Deep Learning Pipelines](https://www.iguazio.com/glossary/deep-learning-pipelines/): Bringing deep learning use cases to production is a particularly complex AI project, given the scale of data required.
- [Model Management](https://www.iguazio.com/glossary/model-management/): Model management is the component of MLOps that ensures a machine learning model is set up correctly.
- [Feature Vector](https://www.iguazio.com/glossary/feature-vector/): ML Glossary: A feature vector is an ordered list of numerical properties of observed phenomena.
- [Model Serving](https://www.iguazio.com/glossary/model-serving/): What production-grade model serving actually is, plus model serving use cases, tools, and model serving with Iguazio.
- [CI/CD for Machine Learning](https://www.iguazio.com/glossary/ci-cd-for-machine-learning/): In this article, we review the basics of a CI/CD pipeline and explain what implementing a CI/CD practice for ML entails
- [Feature Engineering](https://www.iguazio.com/glossary/feature-engineering/): Feature engineering selects and transforms the most relevant variables from raw data to create input features to machine learning models for inferencing.
- [Real Time ML](https://www.iguazio.com/glossary/real-time-ml/): Real-time machine learning applications need a real-time data pipeline to make time-critical decisions on fresh data.
- [Unsupervised Machine Learning](https://www.iguazio.com/glossary/unsupervised-ml/): Unsupervised machine learning algorithms can discover underlying features of a data set for further downstream processing and prediction tasks.
- [Kubeflow Pipelines](https://www.iguazio.com/glossary/kubeflow-pipelines/): Kubeflow pipelines is a platform for scheduling and orchestrating multi- and parallel-step ML workflows in a simple and robust way.
- [Machine Learning Lifecycle](https://www.iguazio.com/glossary/machine-learning-lifecycle/): The machine learning lifecycle is the cyclical process that data science projects follow, to bring business value from ML.
- [Model Monitoring](https://www.iguazio.com/glossary/model-monitoring/): Model performance monitoring is a basic operational task that is implemented after an AI model has been deployed.
- [Concept Drift](https://www.iguazio.com/glossary/concept-drift/): Concept drift is a natural part of an ML system. To ensure that models deliver value, ML teams need to build a drift-aware system.
- [Feature Store](https://www.iguazio.com/glossary/feature-store/): Feature stores are a central place to build, manage and share features across different teams in the organization.
- [ML Pipeline Tools](https://www.iguazio.com/glossary/machine-learning-pipeline-tools/): A machine learning pipeline tool helps automate and streamline machine learning pipelines. Learn more on our page.
- [ML Pipeline](https://www.iguazio.com/glossary/machine-learning-pipeline/): A machine learning pipeline helps to streamline and speed up the process by automating these workflows and linking them together.
- [Operationalizing ML](https://www.iguazio.com/glossary/operationalizing-machine-learning/): Operationalizing machine learning is one of the final stages before deploying and running an ML model in a production environment.
- [Enterprise Data Science](https://www.iguazio.com/glossary/enterprise-data-science/): Enterprise data science combines data scientists, data engineers, IT teams, and more to generate value out of big data.
## Case Studies
- [HCI Builds a Mature Enterprise MLOps Practice to Deploy 73+ Financial Use Cases](https://www.iguazio.com/case-study/hci-mature-mlops-finance/): With Iguazio, HCI built a mature enterprise MLOps practice, deploying more than 73 financial use cases at scale.
- [LATAM Airlines Group Builds an ML Factory to Generate Business Impact Across the Organization](https://www.iguazio.com/case-study/latam-airlines-group-builds-an-ml-factory-to-generate-business-impact-across-the-organization/): LATAM deploys over 40 AI services across commercial and operational departments to create business value and plan for the post-pandemic future.
- [Seagate Uses AI to Detect Defects on the Factory Floor, Reduce Cost and Improve Yield](https://www.iguazio.com/case-study/seagate-runs-advanced-manufacturing-at-scale/): Seagate transitioned from a manual inspection process via microscope to fully automated deep learning & computer vision inspection
- [Sense Scales Chatbot Automation and HR Tech](https://www.iguazio.com/case-study/sense-scales-chatbot-automation/): Sense leverages automation and AI to speed up the recruitment process, while delivering a hyper-personalized candidate experience.
- [Sense Personalizes the Recruitment Experience with AI Chatbot Automation](https://www.iguazio.com/case-study/sense-automates-and-personalizes-the-recruitment-experience-with-ai/): With Iguazio, Sense powers a wide range of AI products aimed at increasing the efficiency and scalability of their AI operations.
- [LATAM Airlines Group Drives Innovation with a Cross-Company AI Strategy](https://www.iguazio.com/case-study/latam-drives-innovation-with-a-cross-company-ai-strategy/): LATAM deploys over 40 AI services across commercial and operational departments to create business value and plan for the post-pandemic future.
- [LATAM Airlines Group Drives Innovation with a Cross Company AI Strategy](https://www.iguazio.com/case-study/latam-drives-innovation/): LATAM deploys over 40 AI services across commercial and operational departments to create business value and plan for the post-pandemic future.
- [S&P Global Makes Engineering Documents Searchable and Indexable with NLP](https://www.iguazio.com/case-study/s-and-p-global-makes-engineering-documents-searchable-indexable-with-nlp/): S&P Global deploys semantic extraction on engineering documents to drive better decision making, processing thousands of PDF files and more.
- [Ecolab Reduces Time to AI Deployment from 12 months to 30 days](https://www.iguazio.com/case-study/ecolab-reduces-time-to-ai-deployment-from-12-months-to-30-days/): Hygiene technologies leader Ecolab brings data science to production with Microsoft Azure and Iguazio
- [Ecolab Deploys Predictive Risk Models](https://www.iguazio.com/case-study/ecolab/): Hygiene technologies leader Ecolab brings data science to production with Microsoft Azure and Iguazio
- [Sheba Medical Center Improves Patient Outcomes and Experiences with AI](https://www.iguazio.com/case-study/sheba-medical-center/): Sheba Medical Center is driving a digital transformation by rapidly deploying and monitoring AI models across clinical and logistical use cases.
- [NetApp Deploys Real-Time Predictive Maintenance](https://www.iguazio.com/case-study/netapp-deploys-real-time-predictive-maintenance/): NetApp deployed Iguazio at the core of Active IQ, analyzing 10 trillion data points in real time from storage sensors worldwide.
- [NetApp Deploys Real-Time Predictive Maintenance & Advanced Analytics](https://www.iguazio.com/case-study/netapp-deploys-real-time-predictive-maintenance-advanced-analytics/): NetApp deployed Iguazio at the core of Active IQ, analyzing data points from storage sensors worldwide to generate actionable intelligence.
- [Quadient Saves Time and Costs Getting AI to Production](https://www.iguazio.com/case-study/quadient-case-study/): Quadient unifies and combines every single data type they work with, to help its clients deliver more meaningful customer experiences to consumers.
- [Ecolab Breaks the Silos Between Data Scientists, Engineers and DevOps with New MLOps Practices](https://www.iguazio.com/case-study/video-title-lorem-ipsum-dolor-sit-amet-consectetur-adipiscing-elit-sed-do-2/): How Ecolab is accelerating the deployment of AI applications by using new MLOps methodologies, leveraging microservices and more.
- [Payoneer Uses Real-Time AI for Fraud Prevention](https://www.iguazio.com/case-study/payoneer-case-study/): With a scalable and reliable fraud prediction and prevention model, fraud attacks are almost impossible on Payoneer.
- [PadSquad Predicts Ad Performance in Real Time Based on Multivariate Data](https://www.iguazio.com/case-study/padsquad/): PadSquad aggregates and processes real-time ad data, to optimize performance in real-time and deliver timely insights to customers
## Solutions
- [Iguazio for Data Engineers](https://www.iguazio.com/blog/solution/iguazio-for-data-engineers/)
## Q&A
- [Why should you combine traditional ML with LLMs?](https://www.iguazio.com/questions/why-should-you-combine-traditional-ml-with-llms/): Traditional machine learning (ML) has been reliably used for computational tasks due to its deterministic and reproducible nature. These models...
- [What are the milestones of developing a multimodal agent?](https://www.iguazio.com/questions/what-are-the-milestones-of-developing-a-multimodal-agent/): When designing a multi-agent system for process automation, such as in a contact center, the process typically starts with analyzing...
- [Managing and Optimizing Costs in Production-Ready Generative AI](https://www.iguazio.com/questions/what-are-some-ways-to-manage-and-optimize-costs-when-deploying-generative-ai-in-production/): Need to know about Managing Costs When Deploying Generative AI Solutions - Check our answer on Iguazio Q&A section.
- [What training is recommended to upskill (gen) AI talent in the organization?](https://www.iguazio.com/questions/what-training-is-recommended-to-upskill-gen-ai-talent-in-the-organization/): Unlike traditional machine learning, generative AI introduces unique challenges and opportunities, necessitating specialized skills and a proactive approach to learning....
- [What are some examples of how gen AI is impacting operational and business results?](https://www.iguazio.com/questions/what-are-some-examples-of-how-gen-ai-is-impacting-operational-and-business-results/): Gen AI positively impacts operational and business performance. For example, companies using gen AI-powered operator and technician co-pilots in maintenance...
- [How do gen AI and traditional AI complement each other?](https://www.iguazio.com/questions/how-do-gen-ai-and-traditional-ai-complement-each-other/): Gen AI and traditional AI serve different purposes. They can be used separately and together. Traditional AI, such as classification...
- [What are the recommended steps for evaluating gen AI outputs?](https://www.iguazio.com/questions/what-are-the-recommended-steps-for-evaluating-gen-ai-outputs/): Gen AI outputs need to be evaluated for accuracy, relevancy, comprehensiveness, how they de-risk bias and toxicity, and more. This...
- [Why is It Important to Monitor LLMs?](https://www.iguazio.com/questions/why-is-it-important-to-monitor-llms/): When deploying LLMs in production, monitoring prevents risks such as malfunction, bias, toxic language generation, or hallucinations. This allows for...
- [What Guardrails Can Be Implemented in (Gen) AI Pipelines?](https://www.iguazio.com/questions/guardrails-gen-ai-pipelines/): Effective gen AI guardrails are required throughout the data, development, deployment and monitoring stages of AI pipelines. These guardrails help in mitigating...
- [RAG vs. Fine-tuning: When to Use Each One?](https://www.iguazio.com/questions/rag-vs-fine-tuning-when-to-use-each-one/): In RAG, the model queries an external dataset or knowledge base, typically using a vector space model where documents or...
- [What are some tips and steps for improving LLM prediction accuracy?](https://www.iguazio.com/questions/what-are-some-tips-and-steps-for-improving-llm-prediction-accuracy/): Looking for tools to assist with LLM prediction accuracy? Try open-source DeepEval or RAGAS. Additional metrics that can help you evaluate...
- [Which gen AI smart call center app use cases are other companies implementing?](https://www.iguazio.com/questions/which-gen-ai-smart-call-center-app-use-cases-are-other-companies-implementing/): Call center companies can use gen AI for a wide range of use cases. Gen AI can assist with demand...
- [What is hyper-personalization in gen AI?](https://www.iguazio.com/questions/what-is-hyper-personalization-in-gen-ai/): Gen AI has opened up new opportunities for segmentation and personalization. Access to real-time behavioral data and analytics allows catering...
- [How can organizations address risks in gen AI?](https://www.iguazio.com/questions/how-can-organizations-address-risks-in-gen-ai/): There are several risk factors to avoid when implementing gen AI. The most important ones are accuracy and hallucination. Despite...
- [Can LLMs be implemented in non-English languages?](https://www.iguazio.com/questions/can-llms-be-implemented-in-non-english-languages/): Iguazio supports any and all languages for gen AI applications. Today, there are projects in English, Turkish, Arabic, Portuguese, Spanish,...
- [What level of PII reduction accuracy in the AI pipeline is acceptable?](https://www.iguazio.com/questions/what-level-of-pii-reduction-accuracy-in-the-ai-pipeline-is-acceptable/): The acceptable level of PII reduction accuracy in an AI pipeline depends on various factors, including the specific use case,...
- [What is an LLM pricing strategy?](https://www.iguazio.com/questions/what-is-a-llm-pricing-strategy/): What is an LLM pricing strategy? Check our answer on the Iguazio Q&A section, covering all you need to know about LLMOps.
- [What are the risks of open source LLMs?](https://www.iguazio.com/questions/what-are-the-risks-of-open-source-llms/): Open source Large Language Models (LLMs), like any other software, can pose security risks if not properly managed and used....
- [How can LLMs help determine user intent?](https://www.iguazio.com/questions/how-can-llms-help-determine-user-intent/): User intent is a classification problem. By giving the LLM model different examples of classifications, it can help simplify the...
- [What’s the best way to get related context to feed the prompt using similarity search?](https://www.iguazio.com/questions/whats-the-best-way-to-get-related-context-to-feed-the-prompt-using-similarity-search/): If you weren’t able to select the correct chunks, you can: For example, let’s assume you have Uber’s financial reports...
- [What are the privacy and security implications of using open source components in AI?](https://www.iguazio.com/questions/what-are-the-privacy-and-security-implications-of-using-open-source-components-in-ai/): The topic of privacy and security is of utmost concern when dealing with AI, and specifically with generative AI. It...
- [What's the best way to extract data from video with generative AI?](https://www.iguazio.com/questions/whats-the-best-way-to-extract-data-from-video-with-generative-ai/): To summarize information and sentiment from videos, it’s recommended to take a text classification approach. This is a similar approach...
- [What’s the best way to perform search queries in documents with generative AI?](https://www.iguazio.com/questions/whats-the-best-way-to-perform-search-queries-in-documents-with-generative-ai/): There are a number of use cases that require searching for and summarizing dynamic data. Examples include news websites, which...
- [Why is it important for data scientists and DevOps teams to collaborate and communicate around GenAI and MLOps?](https://www.iguazio.com/questions/why-is-it-important-for-data-scientists-and-devops-teams-to-collaborate-and-communicate-around-genai-and-mlops/): Need to know about the Importance of Collaboration in GenAI and MLOps - Check our answer on Iguazio Q&A section.
- [Addressing Scalability and Performance Challenges in Generative AI](https://www.iguazio.com/questions/whats-the-best-way-to-address-scalability-and-performance-challenges-for-a-generative-ai-app/): Need to know best way to address scalability and performance challenges for a generative AI app - Check our answer on Iguazio Q&A section.
- [What are some best practices for establishing thresholds to trigger a new iteration of a generative AI model with new prompts in the ML lifecycle?](https://www.iguazio.com/questions/what-are-some-best-practices-for-establishing-thresholds-to-trigger-a-new-iteration-of-a-generative-ai-model-with-new-prompts-in-the-ml-lifecycle/): What are best practices for establishing thresholds to trigger a new iteration of a generative AI model with new prompts in the ML lifecycle?
- [How can organizations implement guardrails to ensure ethical use of AI?](https://www.iguazio.com/questions/how-can-organizations-implement-guardrails-to-ensure-ethical-use-of-ai/): A human-centered approach is the cornerstone of ethical AI. To implement this approach: Interested in learning more? Check out this...
- [Is there a privacy issue or data leaking risk with custom models that utilize proprietary or public data?](https://www.iguazio.com/questions/is-there-a-privacy-issue-or-data-leaking-risk-with-custom-models-that-utilize-proprietary-or-public-data/): Need to know about privacy issue or data leaking risk with custom models that use public data - Check our answer on Iguazio Q&A section.
- [What is the key difference between fine tuning and embedding a foundational model?](https://www.iguazio.com/questions/what-is-the-key-difference-between-fine-tuning-and-embedding-a-foundational-model/): Fine-tuning is the process of taking a pre-trained model and training it further on a smaller, domain-specific dataset. This additional...
- [What are the steps in the MLOps workflow that are specific to LLMs?](https://www.iguazio.com/questions/what-are-the-steps-in-the-mlops-workflow-that-are-specific-to-llms/): Most of the MLOps steps are focused on structured data. To adapt them to LLMs, the steps need to be...
- [How can LLMs be customized with company data?](https://www.iguazio.com/questions/how-can-llms-be-customized-with-company-data/): Organizations that do not have an external dataset for training their models or want to have a model that is...
- [How can costs be optimized when deploying LLMs?](https://www.iguazio.com/questions/how-can-costs-be-optimized-when-deploying-llms/): Training and deploying LLMs can be a costly activity. This is because training and deploying LLMs requires substantial computational power,...
- [MLRun vs. Seldon: What's the difference?](https://www.iguazio.com/questions/mlrun-vs-seldon-whats-the-difference/): Seldon is an MLOps solution but it is not a serverless technology. Users will need to provide Seldon with code,...
- [Can MLRun be used with Amazon SageMaker?](https://www.iguazio.com/questions/can-mlrun-be-used-with-amazon-sagemaker/): MLRun can run as easily on Amazon SageMaker as it does on a local computer. In fact, it is environment-agnostic....
- [Can MLRun Support Models Built in AWS Dev Accounts for Upstream?](https://www.iguazio.com/questions/can-mlrun-support-models-built-in-an-aws-dev-account-and-promote-them-to-upstream-environments/): Can MLRun support models built in an AWS Dev Account and promote them to upstream environments? Check our answer on the Iguazio Q&A section.
- [Does MLRun orchestrate on a Kubernetes operator or use a classic Helm chart?](https://www.iguazio.com/questions/mlrun-k8s-or-helm-chart/): Does MLRun orchestrate on a Kubernetes operator or use a classic Helm chart? Check our answer on Iguazio Q&A section
- [Can MLRun utilize GPUs when running ML jobs?](https://www.iguazio.com/questions/can-mlrun-utilize-gpus-when-running-ml-jobs/): Can MLRun use GPUs? - Check our answer on Iguazio Q&A section.
- [How do I use MLRun for batch sizing?](https://www.iguazio.com/questions/batch-sizing/): How to use MLRun for batch sizing - Check our answer on Iguazio Q&A section.
- [How do I use MLRun for real-time streaming configuration?](https://www.iguazio.com/questions/how-do-i-use-mlrun-for-real-time-streaming-configuration/): Real-time streaming is the process of collecting and ingesting data and processing it in real-time to answer business use cases....
- [How do MLRun and Iguazio plug into the ML ecosystem?](https://www.iguazio.com/questions/how-do-mlrun-and-iguazio-plug-into-the-ml-ecosystem/): MLRun is an open source MLOps orchestration tool at the core of the Iguazio MLOps Platform. MLRun integrates data preparation,...
- [MLRun vs. Airflow vs. MLFlow](https://www.iguazio.com/questions/mlrun-vs-airflow-vs-mlflow/): There is some overlap between MLFlow and MLRun, but they have totally different goals. MLRun is an end to end...
- [Does Iguazio support data versioning and data labelling?](https://www.iguazio.com/questions/does-iguazio-support-data-versioning-and-data-labelling/): Does Iguazio support data versioning and data labelling? Check our answer on the Iguazio Q&A section.
- [How do you define the overlapping responsibilities between data scientists, data engineers and MLOps engineers?](https://www.iguazio.com/questions/how-do-you-define-the-overlapping-responsibilities-between-data-scientists-data-engineers-and-mlops-engineers/): Need to know overlapping responsibilities between data scientists, data engineers and MLOps engineers - Check our answer
- [Iguazio vs. MLRun vs. Nuclio: What's the Difference?](https://www.iguazio.com/questions/iguazio-vs-mlrun-vs-nuclio-whats-the-difference/): Nuclio and MLRun are two open-source technologies that the Iguazio team maintains. Nuclio is a serverless platform, and MLRun is...
- [What Are the Tradeoffs Between a Data Lake and a Data Warehouse?](https://www.iguazio.com/questions/what-are-the-tradeoffs-between-a-data-lake-and-a-data-warehouse%ef%bf%bc/): A data warehouse is for structured reporting data that is typically used for reporting data with consistent business data points...
- [Batch Processing vs. Stream Processing: What’s the Difference?](https://www.iguazio.com/questions/batch-processing-vs-stream-processing/): Batch and stream processing are two types of methods used to process data, which is one of the steps in feature engineering. The...
- [Static Deployment vs. Dynamic Deployment: What’s the Difference?](https://www.iguazio.com/questions/static-vs-dynamic-deployment/): In static deployment, the model is trained offline with batch data. The model is trained once with features generated from historical batch...
- [How do I train a model on very large datasets?](https://www.iguazio.com/questions/how-do-i-train-a-model-on-very-large-datasets/): Distributed computing tools are a great way to run training on very large datasets. Many ML use cases require big...
- [How do I move my batch pipeline over to real time?](https://www.iguazio.com/questions/how-do-i-move-my-batch-pipeline-over-to-real-time/): Moving your pipelines from batch to real-time is a complex endeavor. In many cases, the pipeline needs to be redesigned...
- [How do I automate the training pipeline with my CI/CD framework?](https://www.iguazio.com/questions/how-do-i-automate-the-training-pipeline-with-my-ci-cd-framework/): When automating a pipeline of any kind, whether it be ML pipelines like model training or data-focused pipelines such as...
- [What is the difference between data drift and concept drift?](https://www.iguazio.com/questions/what-is-the-difference-between-data-drift-and-concept-drift/): Data drift vs. concept drift: It’s important to understand the difference between them, because they require different approaches.
- [What is self-supervised learning in machine learning and how is it different from supervised learning?](https://www.iguazio.com/questions/what-is-self-supervised-learning-in-machine-learning-and-how-is-it-different-from-supervised-learning/): Self-supervised learning is an evolving technique of helping ML models to learn from more data, without the human-labelled datasets.
- [How do I serve models for real-time enterprise applications?](https://www.iguazio.com/questions/how-do-i-serve-models-for-real-time-enterprise-applications/): You are basically asking for model serving or a way to manage and deliver your models in a secure and...
- [What can I do with model monitoring?](https://www.iguazio.com/questions/what-can-i-do-with-model-monitoring/): In a nutshell, model monitoring allows a data scientist or DevOps engineer to keep track of a machine learning model after it...
- [Why is model monitoring so important?](https://www.iguazio.com/questions/why-is-model-monitoring-so-important/): After spending a long time developing and training our model, it’s finally time to go to production. But how do...
- [Where does a feature store fit into the ML lifecycle?](https://www.iguazio.com/questions/where-does-a-feature-store-fit-into-the-ml-lifecycle/): Where the feature store fits into the overall ML lifecycle depends on the functionality of the feature store. Some feature stores...
- [Who benefits from a feature store?](https://www.iguazio.com/questions/who-benefits-from-a-feature-store/): Aside from the technical benefits of a feature store, one of the main benefits is organizational. In a typical enterprise...
- [How do I select monitoring metrics specific to my use case?](https://www.iguazio.com/questions/how-do-i-select-monitoring-metrics-specific-to-my-use-case/): To maintain the accuracy of an ML model in production, and detect drops in performance, it can sometimes be useful...
- [Data preprocessing vs. feature engineering](https://www.iguazio.com/questions/data-preprocessing-vs-feature-engineering-whats-the-difference/): What is Data Preprocessing? Data preprocessing is the process of cleaning and preparing the raw data to enable feature engineering....
- [How are ML pipelines evolving to make way for MLOps?](https://www.iguazio.com/questions/how-are-ml-pipelines-evolving-to-make-way-for-mlops-2/): The concept of an ML pipeline is an automated pipeline that can be created from steps that take a model...
- [What businesses benefit the most from MLOps?](https://www.iguazio.com/questions/what-businesses-benefit-the-most-from-mlops/): In short, any business with a data science team that produces ML models that address business operations. The phrase “data-driven”...
- [What are Kubeflow Pipelines?](https://www.iguazio.com/questions/what-are-kubeflow-pipelines/): On a high level, the concept of pipelines in ML refers to a way of linking sequential components of the...
- [What are the pros and cons of MLOps?](https://www.iguazio.com/questions/what-are-the-pros-and-cons-of-mlops/): In the race to solve business problems, more companies have invested considerable capital into becoming data-driven. The hiring cycle is...
- [What are the key components of a successful MLOps strategy?](https://www.iguazio.com/questions/what-are-the-key-components-of-a-successful-mlops-strategy/): The key components of a successful MLOps strategy revolve around having standards and best practices that will help you develop...
- [What are a feature store’s capabilities?](https://www.iguazio.com/questions/what-are-a-feature-stores-capabilities/): Robust Data Transformation and Real Time Feature Engineering A feature store provides a means for creating a feature list in...
- [What is MLOps, and why should we care?](https://www.iguazio.com/questions/what-is-mlops-and-why-should-we-care/): MLOps (Machine Learning Operations) is a combination of Machine Learning and DevOps principles. It uses DevOps concepts to manage the entire lifecycle of...
- [Why does governance come first in MLOps?](https://www.iguazio.com/questions/why-does-governance-come-first-in-mlops/): Why does governance come first in MLOps? Check our answer on the Iguazio Q&A section.
- [How are ML pipelines evolving to make way for MLOps?](https://www.iguazio.com/questions/how-are-ml-pipelines-evolving-to-make-way-for-mlops/): The concept of an ML pipeline is an automated pipeline that can be created from automated steps that take a...
---
# Detailed Content
## Pages
### De-Risk Your Gen AI Applications
---
### Company
---
### Gen AI Ops
---
### CI/CD for ML
---
### AI in Secure IT Environments: AWS GovCloud & SCIF
---
### Data Mesh
---
### Gaming
---
### Security
---
### ODSC MLOps Resource Center
---
### Technology
---
### Iguazio’s ESG Strategy
---
### Questions
---
### Energy and Utilities
---
### Hackathon Terms
---
### MLOps Glossary
---
### Technology OLD
---
### MLRun
---
### MLOps
---
### Real-Time Feature Engineering
---
### Model Monitoring
---
### Healthcare
---
### Home - New
---
### Customers
---
### Integrated Feature Store
Transforming Data into Advanced Offline and Online Features Building a Feature Vector From Features Real-Time Features and Drift Detection
---
### What Is A Machine Learning Pipeline?
Want to learn more about machine learning pipelines? Book a live demo here.
---
### What Are Machine Learning Pipeline Tools?
Want to learn more about machine learning pipeline tools? Book a live demo here.
---
### What Is Enterprise Data Science?
Want to learn more about enterprise data science? Book a live demo here.
---
### Machine Learning Operations (MLOps)
---
### What is Operationalizing Machine Learning?
Want to learn more about operationalizing data science? Book a live demo here.
---
### MLOps Live Webinar Series
---
### Iguazio Support Policy
---
### Customer Support
---
### Terms of Use
---
### Privacy Policy
---
### Career Inner Page
---
### AI Pipeline Orchestration
---
### News & Events
---
### Nuclio
---
### Careers
---
### Serverless Automation
---
### Ad-Tech
---
### Smart Mobility
---
### Retail
---
### Manufacturing
---
### Telecommunications
---
### Solutions
---
### GPU Management
---
### Open Source
---
### Partners
---
### Contact
---
### Financial Services
---
### AI Blog
---
### Platform
---
## Posts
### Introducing Agentic RAG: The Best of Both Worlds
RAG and Agentic AI shape how intelligent systems interact with data and users. RAG enhances LLMs by retrieving external information to improve accuracy and contextual relevance, while Agentic AI introduces autonomy, decision-making, and adaptability into AI-driven workflows. Agentic RAG combines the power of both, transforming RAG into a multi-step, autonomous, complex process that can self-improve. In this article, we’ll explore how Agentic RAG revolutionizes AI-powered applications by making them more autonomous, intelligent, and context-aware. What is RAG? RAG (Retrieval-Augmented Generation) is an AI approach that enhances LLM outputs by retrieving relevant information from external sources, like databases or the Internet, before generating an answer. This improves response accuracy, reduces hallucinations, and provides more up-to-date responses. It also allows for domain expertise, since the LLM can retrieve industry-specific or proprietary knowledge from related databases and sources. What is Agentic AI? Agentic AI is a type of AI system in which AI applications exhibit a level of autonomy, goal-directed behavior, and adaptability. Unlike traditional AI models that respond only to user inputs, agentic AI can proactively take action, learn from feedback, and optimize its behavior toward achieving specific objectives. This allows the application to make decisions, plan actions, and interact dynamically with its environment. Examples include agents for software development that can design advanced code based on autonomous data analysis or AI personal assistants that proactively manage calendars and emails. Exploring the Concept of Agentic RAG Traditional RAG frameworks retrieve relevant documents from a knowledge base and feed them into a language...
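The retrieval step described above (find the most relevant documents, then prepend them to the prompt) can be sketched in a few lines. This is a minimal, purely illustrative sketch: the `embed`, `retrieve` and `build_prompt` names are hypothetical, and a toy bag-of-words similarity stands in for the learned embedding model and vector database a real RAG system would use.

```python
# Illustrative RAG retrieval: rank documents by cosine similarity to the
# query, then assemble an augmented prompt for the LLM to answer from.
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the top-k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=1):
    """Augment the user question with retrieved context before calling the LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

docs = [
    "MLRun is an open source MLOps orchestration framework.",
    "Feature stores manage features across teams.",
]
prompt = build_prompt("What is MLRun?", docs)
```

An agentic RAG system would wrap this single retrieve-then-generate step in a loop, letting the agent decide when to retrieve again, reformulate the query, or stop.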
---
### Gen AI Trends and Scaling Strategies for 2025
Generative AI isn’t just moving fast—it’s on turbo mode. Gartner confirms it in their popular Hype Cycle: compared to other evaluated technologies, gen AI tech is rocketing through the stages faster than anything else. In under three years, it’s already crashing into the trough of disillusionment, while prompt engineering shot to peak hype almost the second it emerged. In this blog post, we bring insights from AI leaders Svetlana Sicular, Research VP, AI Strategy, Gartner, and Yaron Haviv, co-founder and CTO, Iguazio (acquired by McKinsey). To see the complete conversation and dive into their insights, watch the webinar here. What Gen AI Trends is Gartner Seeing? Some of the significant fast-paced trends Gartner is seeing include: AI-ready data - large players are announcing partnerships and acquisitions. See the webinar for more Gartner trends. How should organizations respond to these changes? Gartner sees this as a matter of appetite and strategy. Enterprises seeking immediate productivity should look at implementing mature technologies on the right-hand side of the hype cycle. Enterprises seeking competitive differentiation should focus on emerging technologies on the left-hand side of the hype cycle. However, they should be mindful of skill availability. Established technologies have more implementers, whereas cutting-edge innovations require experts and ongoing skill development. At the early trigger stage of the hype cycle, enterprises should expect to invest in growing expertise rather than relying on readily available talent. What Does it Take to Productize Gen AI Applications? The technologies evaluated in Gartner’s hype cycle...
---
### AI Agent Training: Essential Steps for Business Success
AI agents are transforming business operations by automating processes, improving decision-making and unlocking new efficiencies. However, their effectiveness depends on how well they are trained. AI Agent Training is the structured process of teaching AI models to perform multi-step assignments, make decisions and adapt to real-world scenarios. Through various training methodologies, such as supervised learning, reinforcement learning, and transfer learning, AI agents can enhance business functions across industries, from fraud detection in finance to customer support automation and predictive maintenance. This guide explores the core principles, methodologies, challenges and best practices for training AI agents that are not only powerful but also reliable, scalable, and aligned with business goals. What is AI Agent Training? AI Agent Training is a conceptual term that refers to the process of teaching an LLM how to perform multi-step tasks, make complex decisions and adapt to real-world scenarios. This is done by training the LLM on data and providing feedback about the results. There are numerous training methods, like supervised or unsupervised learning, reinforcement learning, few-shot or zero-shot learning, and more (see below). By training them, LLMs can act agentically to provide business value across complex use cases, like customer service chatbots, finance fraud prediction, diagnosing diseases and more. Why AI Agent Training Matters for Businesses AI agents are autonomous software programs that interact with their surroundings to achieve a goal. The effectiveness of achieving this goal depends on how well they are trained. Organizations that effectively train agents can turn them into an...
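The excerpt above defines AI agents as autonomous programs that interact with their surroundings to achieve a goal. The observe-decide-act loop behind that definition can be sketched as follows; this is a minimal illustration with hypothetical names (`run_agent`, `policy`, `act`), not the Iguazio platform's API, and a production agent would wrap an LLM and tools rather than a toy policy.

```python
# Minimal sketch of an agent loop: observe the current state, decide on an
# action toward the goal, act on the environment, and repeat until done.
def run_agent(goal, state, policy, act, max_steps=10):
    """Drive the environment toward `goal`, choosing actions via `policy`."""
    for _ in range(max_steps):
        if state == goal:          # observe: goal reached, stop
            return state
        action = policy(state, goal)  # decide
        state = act(state, action)    # act on the environment
    return state

# Toy environment: move an integer counter toward a target value.
policy = lambda s, g: 1 if s < g else -1
act = lambda s, a: s + a
final = run_agent(goal=5, state=0, policy=policy, act=act)
```

Training, in the sense used in the post, is what produces a good `policy`: supervised or reinforcement learning replaces the hand-written rule with behavior learned from data and feedback.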
---
### Best 13 Free Financial Datasets for Machine Learning [Updated]
Financial services companies are leveraging data and machine learning to mitigate risks like fraud and cyber threats and to provide a modern customer experience. By following these measures, they are able to comply with regulations, optimize their trading and answer their customers’ needs. In today’s competitive digital world, these changes are essential for ensuring their relevance and efficiency. How can financial services companies build, expand and optimize their use of data and ML? Open and free financial datasets and economic datasets are an essential starting point for data scientists and engineers who are developing and training ML models for finance. But sadly, they can be hard to come by. Here are 13 excellent open financial and economic datasets and data sources for financial data for machine learning. 1. Data.gov A US governmental website hosted by the General Services Administration Technology Transformation Service, data.gov provides a catalog of government data in open, machine-readable formats. To find finance-related datasets, you can search for relevant keywords, e.g. “credit card”, and get a list of the available datasets for you to consume. Get the datasets here 2. Data.gov.in Nationwide datasets from across India, intended to make Indian government-owned shareable data accessible in human- and machine-readable formats. There are 2,394 resources in 484 catalogs related to finance, covering topics like consumer price indexes, GDP estimates, prices and more. Get the datasets here 3. data.europa.eu data.europa.eu is the official portal for European data...
---
### Gen AI or Traditional AI: When to Choose Each One
When it comes to leveraging AI to capture business value, it’s worth asking, “what kind of AI do we need, exactly?” There are significant differences between the methodologies collectively referred to as AI. While 2024 might have almost convinced us that gen AI is the end-all-be-all, there is also what’s sometimes called ‘traditional’ AI, deep learning, and much more. The question is: how do you decide when to use traditional AI, which excels in structured data and predictive modeling, versus generative AI, which shines in creating new content and enhancing human-like interactions? Choosing the right type of AI for your specific use case is critical to maximizing value, minimizing risk, and ensuring sustainable growth. In this blog, we’ll explore the economic potential of generative AI, discover its transformative capabilities across industries, and compare it to the applications of traditional AI. By the end, you’ll have a clear roadmap for identifying when to lean on gen AI and when to choose traditional AI to achieve your business goals. The Potential Economic Value of Gen AI The McKinsey report “The economic potential of generative AI: The next productivity frontier” estimates that generative AI has the potential to add $2.6 trillion to $4.4 trillion to the global economy. These sums increase the impact of all AI by an astounding 15 to 40 percent. In addition, the report states, this impact could even double if gen AI is extended beyond the analyzed use cases. The impact of generative AI is...
---
### Top Gen AI Demos of AI Applications With MLRun
Gen AI applications can bring invaluable business value across multiple use cases and verticals. But sometimes it can be beneficial to experience different types of applications that can be created and operationalized with LLMs. Better understanding the potential value can help: Garner excitement and collaboration across the organization Help secure resources Support planning strategies, from a single use case to many and from PoC to operationalization Drive more ideas for innovation Help de-risk pitfalls And more In this blog post, we’ve curated the top gen AI demos of AI applications that can be developed with open-source MLRun. Each of these demos can be adapted to a number of industries and customized to specific needs. Follow along and choose the most relevant ones for your needs. You can also watch the complete library of demos here. 1. Smart Call Center Analysis Application Build a call analysis platform for call centers. The application analyzes customer calls and generates actionable insights for agents, management and downstream applications. It helps support agents, create tailored recommendations for customers, and more. This improves the customer experience, helps optimize first call resolution, enhances operational efficiency, supports decision-making and helps meet compliance regulations. The gen AI application is built on a multi-step pipeline that includes diarization, transcription, PII filtering, analysis and post-processing, among other steps. Output structured data is stored in a database, accessible for reporting or downstream applications. Visual dashboards provide metrics like topic summaries and data filtering outcomes. Open-source MLRun automates the entire workflow, auto-scales resources as...
---
### 6 Best Practices for Implementing Generative AI
Generative AI has rapidly transformed industries by enabling advanced automation, personalized experiences and groundbreaking innovations. However, implementing these powerful tools requires a production-first approach. This will maximize business value while mitigating risks. This guide outlines six best practices to ensure your generative AI initiatives are effective: valuable, scalable, compliant and future-proof. From designing efficient pipelines to leveraging cutting-edge customization techniques, we’ll walk through the steps that will help your organization harness the full power of generative AI. 1. Create Modular Pipelines to Orchestrate Operationalization from End-to-End AI pipelines ensure that each step of the AI lifecycle—across data management, development, deployment and monitoring—is automated and optimized for performance. This is particularly important in large-scale gen AI projects, which handle massive datasets and complex models. Streamlining the entire process saves engineering resources, reduces operational bottlenecks and fosters collaboration across teams, from data scientists to DevOps. This ensures models stay accurate and reliable over time. Here’s our suggestion for four pipelines that will bring your models from the lab to production. Data management - Ensuring data quality through data ingestion, transformation, cleansing, versioning, tagging, labeling, indexing, and more. Development - High quality model training, fine-tuning or prompt tuning, validation and deployment with CI/CD for ML. Application - Bringing business value to live applications through a real-time application pipeline that handles requests, data, model and validations. LiveOps - Improving performance, reducing risks and ensuring continuous operations by monitoring data and models for feedback. 2. De-Risk Models and Establish AI Governance and Compliance Operationalizing LLMs...
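The four pipelines above can be sketched as composable, swappable stages. This is our own minimal illustration in plain Python (not MLRun’s API); each stage is a toy stand-in for its real counterpart.

```python
# Each pipeline stage is a callable taking and returning a shared context, so
# any stage can be replaced independently of the others.
from typing import Callable

Step = Callable[[dict], dict]

def data_management(ctx: dict) -> dict:
    # Ingest and clean the raw records (toy stand-in for the data pipeline).
    ctx["clean_records"] = [r.strip().lower() for r in ctx["raw_records"]]
    return ctx

def development(ctx: dict) -> dict:
    # "Train" a trivial model: a vocabulary built from the cleaned data.
    ctx["model"] = {"vocab": sorted(set(" ".join(ctx["clean_records"]).split()))}
    return ctx

def application(ctx: dict) -> dict:
    # Serve a live request against the model produced upstream.
    ctx["response"] = [w for w in ctx["request"].split() if w in ctx["model"]["vocab"]]
    return ctx

def liveops(ctx: dict) -> dict:
    # Record a simple monitoring signal that feeds back into earlier stages.
    ctx["metrics"] = {"hit_rate": len(ctx["response"]) / max(len(ctx["request"].split()), 1)}
    return ctx

def run_pipeline(steps: list[Step], ctx: dict) -> dict:
    for step in steps:
        ctx = step(ctx)
    return ctx

result = run_pipeline(
    [data_management, development, application, liveops],
    {"raw_records": ["Refund my ORDER ", "order delayed"], "request": "refund order"},
)
print(result["metrics"])  # → {'hit_rate': 1.0}
```

The point of the structure, not the toy logic: because each stage only touches the shared context, a team can swap in a real data-versioning step or a fine-tuning step without rewriting the rest of the workflow.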
---
### 2025 Gen AI Predictions: What Lies Ahead?
In 2024, organizations realized the revolutionary business potential of gen AI. They accelerated their gen AI operationalization processes: explored new use cases to implement, researched LLMs and AI pipelines and contemplated underlying ethical issues. And with the seeds of the AI revolution now planted, the market is maturing accordingly. This means that in 2025, we’re likely to already see organizations offering gen AI services to customers, embedded in commercial applications - from SaaS CRMs to chatbots. To support this leap forward, the underlying tech stack will mature as well. Multi-agent systems and multimodality will become widely available, gen AI service providers will expand features and offerings and open-source will challenge commercial vendors with new capabilities. And with LLMs becoming more accurate and intelligent, and new guardrails being introduced, the risk of hallucinations diminishes, perpetuating the trust that drives this entire ecosystem forward. Two years after OpenAI released ChatGPT to the public, will 2025 finally be the year gen AI becomes an inseparable part of business? What kind of obstacles do organizations need to overcome? Who will thrive in this new ecosystem? Here are my predictions for the upcoming year: 1. Gen AI Embedded as a Service Forward-thinking enterprises have been working on implementing gen AI as part of their innovation strategies. Excited about the opportunities and vast potential in gen AI, they are ramping up their efforts to implement gen AI in their applications. These could either support internal operations (like McKinsey’s Lilli or a co-pilot agent to support...
---
### Choosing the Right-Sized LLM for Quality and Flexibility: Optimizing Your AI Toolkit
LLMs are the foundation of gen AI applications. To effectively operationalize and de-risk LLMs and ensure they bring business value, organizations need to consider not just the model itself, but the supporting infrastructure, including GPUs and operational frameworks. By optimizing them for your use case, you can ensure you are using an LLM that is the right fit for your needs. Unlike a one-size-fits-all approach (e.g., using an out-of-the-box public model), adapting your LLM and infrastructure to the right size allows tailoring the model to specific use cases, ensures better performance and saves costs. In this blog post, we’ll take you through the various phases for choosing and optimizing your LLM: model selection, GPU selection, GPU utilization and MLOps practices. In the end, you’ll have the tools to make the necessary choices for building your gen AI application foundation. Step 1: Model Selection Choosing the Model The first step, even before choosing the model, is separating your use case into tasks. These tasks should be as small and focused as possible. Now it’s time to choose the right model for the tasks at hand. While it’s tempting to jump into complex solutions that provide a wide range of capabilities - we recommend taking a different approach and starting with smaller, simpler models, so you can assign a small model to each small task. Each task should be handled by the smallest model possible. You can join tasks if your chosen model can handle a number of them at the same...
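The smallest-model-per-task idea above can be sketched as a simple routing table. This is an illustration only: the model names, prices and capability tags below are made-up assumptions, not real benchmarks.

```python
# Hypothetical catalog, ordered smallest/cheapest to largest/most capable.
MODELS = [
    {"name": "tiny-3b",   "cost_per_1k_tokens": 0.0002, "skills": {"classify", "extract"}},
    {"name": "mid-13b",   "cost_per_1k_tokens": 0.0010, "skills": {"classify", "extract", "summarize"}},
    {"name": "large-70b", "cost_per_1k_tokens": 0.0060, "skills": {"classify", "extract", "summarize", "reason"}},
]

def pick_model(task_skill: str) -> dict:
    """Return the first (smallest, cheapest) model that covers the skill."""
    for model in MODELS:
        if task_skill in model["skills"]:
            return model
    raise ValueError(f"No model supports skill: {task_skill}")

# Decompose the use case into small, focused tasks, then assign each one
# the smallest capable model rather than sending everything to the largest.
tasks = ["extract", "summarize", "reason"]
plan = {t: pick_model(t)["name"] for t in tasks}
print(plan)  # → {'extract': 'tiny-3b', 'summarize': 'mid-13b', 'reason': 'large-70b'}
```

With real catalogs the same routing logic lets you merge tasks onto one model when a single small model covers several of them, which is exactly the consolidation step the post describes.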
---
### MLRun v1.7 Launched — Solidifying Generative AI Implementation and LLM Monitoring
As the open-source maintainers of MLRun, we’re proud to announce the release of MLRun v1.7. MLRun is an open-source AI orchestration tool that accelerates the deployment of gen AI applications, with features such as LLM monitoring, fine-tuning, data management, guardrails and more. We provide ready-made scenarios that can be easily implemented by teams in organizations. This new version brings substantial enhancements that address the increasing demands of gen AI deployments, with a particular focus on monitoring LLMs. Additional updates introduce performance optimizations, multi-project management, and more. MLRun v1.7 is the culmination of months of hard work and collaboration between the Iguazio engineering team, MLRun users and the open-source community. We’ve listened to what our users are saying and have designed this version (and the upcoming ones) to address needs and gaps in managing ML and AI across the lifecycle. With this new version, users ranging from individual contributors to large teams in enterprises will be able to deploy AI applications much faster and with more flexibility than before. Brief Intro for Newcomers: What is MLRun? MLRun is an open-source AI orchestration framework for managing ML and generative AI applications across their lifecycle, to accelerate their productization. It automates data preparation, model tuning, customization, validation and optimization of ML models, LLMs and live AI applications over elastic resources. MLRun enables the rapid deployment of scalable real-time serving and application pipelines, while providing built-in observability and flexible deployment options, supporting multi-cloud, hybrid, and on-prem environments. Why This Release Focuses on...
---
### Gen AI for Marketing - From Hype to Implementation
Gen AI has the potential to bring immense value to marketing use cases, from content creation to hyper-personalization to product insights, and many more. But if you’re struggling to scale and operationalize gen AI, you’re not alone. That’s where most enterprises struggle. To date, many companies are still in the excitement and exploration phase of gen AI. A few have a number of initial pilots deployed, and even fewer are running simultaneous pilots and building differentiating use cases. Only a handful have select use cases running at scale, and no company, to McKinsey’s knowledge, is using gen AI as the new normal in their day-to-day use cases. In this blog post, we provide a staged approach for rolling out gen AI, together with use cases, a demo and examples that you can implement and follow. For more details, watch the webinar this blog post is based on. The webinar features Eli Stein, Partner and Modern Marketing Capabilities Leader from McKinsey, Ze’ev Rispler, ML Engineer from Iguazio (acquired by McKinsey), and myself. Watch the entire webinar here. Marketing and Technology: A Historical Perspective Marketing is repeatedly disrupted by technology. This occurred with computing in the 1970s, the Internet in the 1990s and mobile in the 2000s. In 2022, “AI everywhere” enabled zero marginal cost of content generation. In the upcoming years, we can expect to see AGI (Artificial General Intelligence), which will provide intelligence beyond human capabilities, at zero marginal cost. And similar to how some marketers were...
---
### Implementing Gen AI in Regulated Sectors: Finance, Telecom, and More
If 2023 was the year of gen AI experimentation, 2024 is the year of gen AI implementation. As companies embark on their implementation journey, they need to deal with a host of challenges, like performance, GPU efficiency and LLM risks. These challenges are exacerbated in highly regulated industries, such as financial services and telecommunications, adding further implementation complexities. Below, we discuss these challenges and present some best practices and solutions to take into consideration. We also show a banking chatbot demo that includes fine-tuning a model and adding guardrails. This blog post is based on the webinar “Implementing Gen AI in Highly Regulated Environments” with Asaf Somekh, co-founder and CEO, and Yaron Haviv, co-founder and CTO, of Iguazio (acquired by McKinsey). You can watch the entire session here. Challenges Implementing and Scaling Gen AI Implementing and scaling gen AI comes with a host of challenges: Performance vs. cost - The massive scale of LLMs makes these models relatively inefficient and compute-heavy. Costs can especially spike in highly regulated industries, where the models need to be run within cloud accounts. In addition, data science teams need flexibility to switch between different LLMs to fit their use case, further increasing costs. The data science team needs to find a balance between performance accuracy and cost efficiency. At a business level, the organization needs to ensure the cost associated with development and deployment is lower than the value the application will bring to the business. GPU efficiency - GPUs are scarce and costly. The team...
---
### Building Scalable Gen AI Apps with Iguazio and MongoDB
AI and generative AI can lead to major enterprise advancements and productivity gains. By offering new capabilities, they open up opportunities for enhancing customer engagement, content creation, virtual experts, process automation and optimization, and more. According to McKinsey & Company, gen AI has the potential to deliver an additional $200-340B in value for the banking industry. One popular gen AI use case is customer service and personalization. Gen AI chatbots have quickly transformed the way that customers interact with organizations. They can handle customer inquiries and provide personalized recommendations while empathizing with them and offering nuanced support that is tailored to the customer’s individual needs. Another less obvious use case is fraud detection and prevention. AI offers a transformative approach by automating the interpretation of regulations, supporting data cleansing, and enhancing the efficacy of surveillance systems. AI-powered systems can analyze transactions in real time and flag suspicious activities more accurately, which helps institutions take informed actions to prevent financial losses. In this blog post, we introduce the joint MongoDB-Iguazio gen AI solution, which allows for the development and deployment of resilient and scalable gen AI applications. Before diving into how it works and its value for you, we will introduce MongoDB and Iguazio (acquired by McKinsey). We will then list the challenges enterprises are dealing with today when operationalizing gen AI applications. In the end, we’ll provide resources on how to get started. MongoDB for end-to-end AI data management MongoDB Atlas, an integrated suite of data services centered around...
---
### RAG vs Fine-Tuning: Navigating the Path to Enhanced LLMs
RAG and Fine-Tuning are two prominent LLM customization approaches. While RAG involves providing external and dynamic resources to trained models, fine-tuning involves further training on specialized datasets, altering the model. Each approach can be used for different use cases. In this blog post, we explain each approach, compare the two and recommend when to use them and which pitfalls to avoid. What is Fine-Tuning? LLM fine-tuning is an AI/ML LLM customization process where a pre-trained model is further trained on a new dataset that is specific to a particular task. Unlike RAG, another form of LLM customization, this includes modifying the model’s weights and parameters based on the new data. By adapting and “tweaking” the LLM, fine-tuning improves the model’s performance and accuracy for the required task. This allows for better applicability to specific tasks, domains and use cases and brings more business value. For example: 1. If you need a model that excels in legal document analysis, you can fine-tune an LLM pre-trained in English, using a corpus of legal texts. The fine-tuned model will then better understand legal jargon, context and nuances. The result will be a model highly effective for tasks like legal document classification or summarization. 2. A pre-trained image recognition model can be fine-tuned to identify specific objects relevant to a particular industry, such as medical imaging for tumor detection or industrial inspection for defect identification. 3. Models pre-trained on general speech data can be fine-tuned to recognize industry-specific jargon or accents. This can improve...
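The retrieval half of the RAG approach described above can be shown in a few lines. This is a toy sketch of our own: the bag-of-words `embed` function is a stand-in for a real embedding model, and the documents are invented examples. The key contrast with fine-tuning is visible in the code: the model’s weights are never touched, only the prompt changes.

```python
# Minimal retrieval step: embed the query and each document, pick the most
# similar document, and prepend it to the prompt as context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    return max(docs, key=lambda d: cosine(embed(query), embed(d)))

docs = [
    "Our refund policy allows returns within 30 days.",
    "The API rate limit is 100 requests per minute.",
]
context = retrieve("refund policy for returns", docs)
prompt = f"Answer using this context:\n{context}\nQuestion: refund policy for returns"
print(context)
```

Because the knowledge lives in `docs` rather than in the model, updating the system is a data operation (add or edit a document), whereas the fine-tuning path would require another training run.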
---
### Commercial vs. Self-Hosted LLMs: A Cost Analysis & How to Choose the Right Ones for You
As can be inferred from their name, foundation models are the foundation upon which developers build AI applications for tasks like language translation, text summarization, sentiment analysis and more. Models such as OpenAI's GPT, Google's Gemini, Meta’s Llama and Anthropic’s Claude, are pre-trained on vast amounts of text data and have the capability to understand and generate human-like language. In addition to the hype surrounding these commercial models, there's also a rising interest in self-hosted open-source LLMs, like Mistral. As businesses and organizations increasingly rely on AI for various applications, the demand for advanced foundation models is expected to continue growing, driving further innovation and competition in the market. In this blog post, we dive into the considerations to make when choosing a proprietary model* vs. self-hosting an open-source model. We also provide some example use cases that exemplify how to implement these requirements. Then, we show how these requirements are implemented in models, through Artificial Analysis’s website comparison. Finally, we provide a list of additional tools and capabilities that can help optimize LLMs even more. (*Clarification: While in some cases commercial LLMs like Cohere and AI21 can be hosted in your own environments, this is not the common use case. Throughout the blog post, when referring to commercial LLMs we mean vendor-hosted models). Commercial vs. Self-Hosted: How to Make the Choice When deciding whether to choose commercial versus self-hosted models, several key factors should be considered. These will help determine the best fit for your particular use cases or...
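One concrete way to frame the commercial-vs-self-hosted choice is a back-of-the-envelope cost comparison. The sketch below is ours, and every price, token volume and GPU figure in it is a made-up placeholder; substitute your vendor’s per-token pricing and your actual GPU costs.

```python
# Vendor-hosted: pay per token, no infrastructure. Self-hosted: pay for GPU
# time, roughly flat with request volume.

def commercial_monthly_cost(tokens_per_month: int, price_per_1k_tokens: float) -> float:
    return tokens_per_month / 1000 * price_per_1k_tokens

def self_hosted_monthly_cost(gpu_hourly_rate: float, gpus: int, hours_per_month: int = 730) -> float:
    return gpu_hourly_rate * gpus * hours_per_month

tokens = 500_000_000  # 500M tokens/month (assumed workload)
commercial = commercial_monthly_cost(tokens, price_per_1k_tokens=0.002)
self_hosted = self_hosted_monthly_cost(gpu_hourly_rate=2.5, gpus=2)

print(f"commercial: ${commercial:,.0f}/mo, self-hosted: ${self_hosted:,.0f}/mo")
```

The shape of the result matters more than the numbers: per-token pricing scales linearly with volume, while the self-hosted GPU bill is mostly flat, so there is a break-even volume above which self-hosting can win (ignoring the engineering effort it adds, which the post discusses separately).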
---
### Transforming Enterprise Operations with Gen AI
Enterprises are beginning to implement gen AI across use cases, realizing its enormous potential to deliver value. Since we are all charting new technological waters, being mindful of recommended strategies, pitfalls to avoid and lessons learned can assist with the process and help drive business impact and productivity. In this blog post, we provide a number of frameworks that can help enterprises effectively implement and scale gen AI while avoiding risk. We also include a number of use cases, from R&D to automotive to the supply chain. In the end, we list potential hurdles and how to overcome them. This blog post is based on the webinar “Transforming Enterprise Operations with Gen AI”, which was held with Dinu de Kroon, Partner and Operations Hub Lead, Nicola Unfer, Sr. Program Delivery Analyst and Davide Di Lucca, Research Science Analyst, from McKinsey, and Yaron Haviv, Co-Founder and CTO, Iguazio (acquired by McKinsey). You can watch the webinar recording here. 5 Questions to Ask and Answer About Gen AI in the Enterprise The evolution of AI began in the 1950s, but the advent of ChatGPT and other generative AI capabilities have created the “perfect storm” of AI. This revolution has been driven by the convergence of massive computing power, enabling data processing at unprecedented scale and speed; an abundance of data available through the internet for model training; and pre-trained transformers that empower us to efficiently work with unstructured data. At this point in time, it’s recommended that enterprises ask themselves five questions...
---
### Future-Proofing Your App: Strategies for Building Long-Lasting Apps
The generative AI industry is changing fast. New models and technologies (Hello GPT-4o) are emerging regularly, each more advanced than the last. This rapid development cycle means that what was cutting-edge a year ago might now be considered outdated. The rate of change demands a culture of continuous learning and technological adaptation. To ensure AI applications remain relevant, effective, secure and capable of delivering value, teams need to keep up with the latest research, technological developments and potential use cases. They also need to understand regulatory and ethical implications of deploying AI models, taking into consideration issues like data privacy, security and ethical AI use. But building gen AI pipelines and operationalizing LLMs requires significant engineering resources. How can organizations ensure their architecture remains robust, resilient, scalable and secure, so it can support up-to-date LLM deployment and management? The Solution: Designing Modular Pipelines with Swappable Components To stay relevant and future-proof, applications need to be designed with adaptability in mind. One of the most strategic approaches to addressing this need is designing your AI architecture in a modular fashion, where different components of the pipelines can be swapped out or updated as needed without overhauling the entire system. The benefits of a modular architecture include: The ability to easily upgrade components and modules in your framework with minimal adjustments. Reducing the risk associated with changes, since updates are confined to specific modules. This containment makes it easier to trial new features and roll them back if they don’t perform as...
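The swappable-component idea above can be illustrated with dependency injection behind a small interface. The provider classes here are stubs of our own invention, not real SDK calls; the point is that the application only depends on the abstract interface.

```python
# The application talks to an abstract LLMProvider, so a model or vendor can
# be replaced without touching the rest of the pipeline.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class StubOpenModel(LLMProvider):
    def generate(self, prompt: str) -> str:
        return f"[open-model] {prompt[:20]}"

class StubCommercialModel(LLMProvider):
    def generate(self, prompt: str) -> str:
        return f"[commercial] {prompt[:20]}"

class Application:
    def __init__(self, llm: LLMProvider):
        self.llm = llm  # injected dependency = the swappable module

    def answer(self, question: str) -> str:
        return self.llm.generate(question)

app = Application(StubOpenModel())
print(app.answer("What is our SLA?"))
app.llm = StubCommercialModel()  # swap the component; nothing else changes
print(app.answer("What is our SLA?"))
```

This containment is what makes the trial-and-rollback pattern described above cheap: a new model is one constructor call, and reverting is another.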
---
### LLM Validation and Evaluation
LLM evaluation is the process of assessing the performance and capabilities of LLMs. This helps determine how well the model understands and generates language, ensuring that it meets the specific needs of applications. There are multiple ways to perform LLM evaluation, each with different advantages. In this blog post, we explain the role of LLM evaluation in AI lifecycles and the different types of LLM evaluation methods. In the end, we show a demo of a chatbot that was developed with crowdsourcing. This blog post is based on a webinar with Ehud Barnea, PhD, Head of AI at Tasq.AI, Yaron Haviv, co-founder and CTO of Iguazio (acquired by McKinsey) and Guy Lecker, ML Engineer Team Lead at Iguazio, which you can watch here. Gen AI Reference Architecture Following Established ML Lifecycles Building generative AI applications requires four main elements: Data management -
---
### Integrating LLMs with Traditional ML: How, Why & Use Cases
Ever since the release of ChatGPT in November 2022, organizations have been trying to find new and innovative ways to leverage gen AI to drive organizational growth. LLM capabilities like contextual understanding and response to natural language prompts enable the development of applications like automated AI chatbots, smart call center apps and applications for financial services. Generative AI is by no means a replacement for the previous wave of AI/ML (now sometimes referred to as ‘traditional’ AI/ML), which continues to deliver significant value and represents a distinct approach with its own advantages. By integrating LLMs with traditional ML models, organizations can significantly enhance and augment each model’s capabilities, leading to new and exciting applications that bring value to their customers. In this blog post, we detail LLMs’ and ML models’ strengths, evaluate the benefits of integration and provide a number of example use cases, from advanced chatbots to synthetic data generation. In the end, we explain how MLOps can help accelerate the process and bring these models to production. LLMs vs. Classical ML Models: Strengths and Capabilities LLMs and ML models each have their distinct strengths, which can be applied to different kinds of tasks and objectives. Strengths of LLMs: Natural Language Understanding and Generation - LLMs excel at comprehending and producing human-like text. They can generate coherent and contextually relevant responses over a wide range of topics, making them ideal for applications like chatbots, content creation and language translation. Contextual Learning - With their deep learning architecture, LLMs can...
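One common integration pattern implied above: an LLM turns free text into structured features, and a traditional ML model makes the final prediction. The sketch below is our toy illustration; both the "LLM" extractor and the fixed-weight scorer are stand-ins for real models, and the feature names and weights are invented.

```python
# Hybrid pattern: LLM-style extraction feeds a classical-ML-style scorer.

def llm_extract_features(ticket: str) -> dict:
    # Stand-in for an LLM extracting structured signals from raw text.
    text = ticket.lower()
    return {
        "mentions_refund": int("refund" in text),
        "is_angry": int("!" in ticket),
        "length": len(text.split()),
    }

def churn_risk_score(features: dict) -> float:
    # Stand-in for a trained classical model (e.g. logistic regression),
    # with fixed illustrative weights instead of learned ones.
    weights = {"mentions_refund": 0.5, "is_angry": 0.3, "length": 0.01}
    return sum(weights[k] * v for k, v in features.items())

ticket = "I want a refund now!"
score = churn_risk_score(llm_extract_features(ticket))
print(round(score, 2))  # → 0.85
```

The division of labor is the point: the LLM handles unstructured language, while the classical model stays small, fast, auditable and cheap to retrain on the structured features.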
---
### LLM Metrics: Key Metrics Explained
Organizations that monitor their LLMs will benefit from higher-performing models at higher efficiency, while meeting ethical considerations like ensuring privacy and eliminating bias and toxicity. In this blog post, we bring the top LLM metrics we recommend measuring and explain when to use each one. In the end, we explain how to implement these metrics in your ML and gen AI pipelines. Why Do We Need to Monitor LLM Metrics? Monitoring the metrics of LLMs enhances performance optimization, enables understanding user interaction and ensures ethical compliance. In more detail: Accuracy - Monitoring the outputs of the model is the primary means of validating its reliability at a given task. It is the first signal that the model needs to go into another phase of development, whether that means prompt engineering the inputs to the model or fine-tuning the model itself. Resource Management - LLMs require significant computational resources. Metrics related to resource utilization help in managing these resources effectively and reducing operational costs. User Interaction - Monitoring metrics related to user interactions helps in understanding how users engage with the model. These insights can guide enhancements in user experience, making the model more intuitive and responsive to user needs. Ethical Compliance and Bias Reduction - Monitoring metrics related to the ethical use of LLMs ensures the trustworthiness of the model. This is important for preventing incomplete and incorrect responses, responses with the wrong tone, violation of privacy or ethical standards (like preventing ePHI leakage...
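To ground the metric categories above, here is a minimal sketch of per-request logging and aggregation. It is our own illustration, not a specific product’s API; the field names and the guardrail flag are assumptions about what a real monitoring hook would record.

```python
# Aggregate per-request LLM metrics (latency, token usage, guardrail flags)
# into the kind of summary a monitoring dashboard would display.
from statistics import mean

requests_log = [
    {"latency_s": 0.8, "tokens": 420, "flagged": False},
    {"latency_s": 1.9, "tokens": 900, "flagged": True},
    {"latency_s": 1.1, "tokens": 510, "flagged": False},
]

def aggregate(log: list[dict]) -> dict:
    return {
        "avg_latency_s": round(mean(r["latency_s"] for r in log), 2),  # user experience
        "total_tokens": sum(r["tokens"] for r in log),                 # resource cost
        "flag_rate": sum(r["flagged"] for r in log) / len(log),        # ethical compliance
    }

metrics = aggregate(requests_log)
print(metrics)
```

Each aggregated field maps to a monitoring category from the list above: latency to user interaction, tokens to resource management, and the flag rate to ethical compliance; accuracy needs labeled or judged outputs and is usually tracked separately.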
---
### Generative AI in Call Centers: How to Transform and Scale Superior Customer Experience
Customer care organizations are facing the disruptions of an AI-enabled future, and gen AI is already impacting customer care organizations across use cases like agent co-pilots, summarizing calls and deriving insights, creating chatbots and more. In this blog post, we dive deep into these use cases and their business and operational impact. Then we show a demo of a call center app based on gen AI that you can follow along. For more details on this topic, you can watch the webinar this blog post is based on. Oana Cheta, Partner and Lead Gen AI Service Ops for North America at McKinsey & Company and Yaron Haviv, Co-Founder and CTO of Iguazio (acquired by McKinsey), share insights, examples and more details. Watch here. The 3 Driving Forces of AI Impact AI is broadly redefining work and the economy. The three main areas of impact are: 1. Dramatic productivity gains in how companies are run. This includes improving functions like coding, customer interactions, creative work and content synthesis. 2. Transformation of products to change how customer needs are met. This includes new enhanced features, like conversational interfaces, co-pilots and hyper-personalization. 3. A redistribution of profit pools through a new layer of AIaaS in the value chain. This could enable solutions like ChatGPT that replace entire value chains, low-cost native AI startups that quickly create new solutions at low costs and simplified onboarding and migration processes between products. How Are Industries Being Transformed with Gen AI? As a result, entire industries could...
---
### Why You Need GPU Provisioning for GenAI
GPU provisioning serves as a cost-effective solution for organizations that need more GPUs for their ML and gen AI operations. By optimizing the use of existing resources, GPU provisioning allows organizations to build and deploy their applications without waiting for new hardware. In this blog post, we explain how GPU provisioning as a service works, how it can close the GPU shortage gap, when to use GPU provisioning and how it fits with gen AI. How Companies are Dealing with the GPU Shortage Organizations need GPUs to be able to process large amounts of data simultaneously, speed up computational tasks and handle specific applications like AI, data visualization and more. This need is likely to grow as the demand for computational power and real-time processing increases across industries. However, growing demand is meeting a lack of supply, and organizations are encountering obstacles when attempting to purchase GPUs, with years-long waitlists. As a result, companies are hustling their way to GPU access. They’re trying to leverage their industry connections and reach out to anyone who might help them gain access to GPUs. This includes seeking assistance through professional networks, applying for government grants and forming partnerships with cloud providers to secure access to GPUs for AI ventures. Others are more creative, setting up initiatives like renting out GPUs or trying to repurpose other hardware. The unexpected wide adoption of gen AI has further exacerbated these attempts. Companies across industries are looking to implement LLMs in their operations, which require GPUs...
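The core provisioning idea, sharing a fixed GPU pool across jobs instead of waiting for new hardware, can be sketched as a toy allocator. This is our illustration, not a real scheduler; production systems add priorities, preemption and fractional GPU sharing on top of this basic shape.

```python
# Toy GPU pool: jobs are served from existing capacity when possible and
# queued otherwise, so hardware is reused rather than idly over-provisioned.
from collections import deque

class GPUPool:
    def __init__(self, total_gpus: int):
        self.free = total_gpus
        self.queue: deque = deque()  # jobs waiting for capacity

    def request(self, job: str, gpus: int) -> bool:
        if gpus <= self.free:
            self.free -= gpus        # provision from existing capacity
            return True
        self.queue.append((job, gpus))
        return False

    def release(self, gpus: int) -> None:
        self.free += gpus            # returned GPUs become available again

pool = GPUPool(total_gpus=8)
print(pool.request("fine-tune-llm", 6))    # served immediately
print(pool.request("batch-inference", 4))  # queued until GPUs free up
pool.release(6)
print(pool.free)
```

Even this toy version shows the economics: the second job does not trigger a hardware purchase, it simply waits for the pool, which is the trade-off provisioning-as-a-service makes at scale.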
---
### Best 10 Free Datasets for Manufacturing [UPDATED]
The manufacturing industry can benefit from AI, data and machine learning to advance manufacturing quality and productivity, minimize waste and reduce costs. With ML, manufacturers can modernize their businesses through use cases like forecasting demand, optimizing scheduling, preventing malfunctions and managing quality. These all significantly contribute to bottom-line improvement. In times of global recession, supply chain cut-offs and difficulties meeting consumer demands for materials and products, manufacturing optimization becomes even more important for companies that wish to remain competitive and relevant without impairing their revenue streams. How can manufacturers develop, grow and optimize their use of data and ML? Open and free datasets for machine learning are an important starting point for data scientists and engineers who are developing and training ML models for manufacturing. But these datasets for manufacturing can be hard to come by, since manufacturing often takes a legacy approach and data is not always available. Here are 10 excellent open manufacturing datasets and data sources for manufacturing data for machine learning. 1. Eurostat Industrial Production Index The output and activity of the European industry sector, measured on a monthly basis. The dataset’s base year is 2015 and it depicts monthly growth rates. Get the dataset here. 2. US Manufacturing Trends Manufacturing trends in the US related to wage rates, profits, employment, production, capacity utilization, productivity, exports and shipments. The dataset provides information for the present and year-to-date. Get the dataset here. 4. Personal Protective Equipment Computer Vision Dataset and Model A dataset and model...
---
### Implementing Gen AI for Financial Services
Gen AI is quickly reshaping industries, and the pace of innovation is incredible to witness. The introduction of ChatGPT, Microsoft Copilot, Midjourney, Stable Diffusion and many more incredible tools has opened up new possibilities we couldn’t have imagined 18 months ago. While building gen AI application pilots is fairly straightforward, scaling them to production-ready, customer-facing implementations is a novel challenge for enterprises, and especially for the financial services sector. Risk, compliance, data privacy and escalating costs are just a few of the acute concerns that financial services companies are grappling with today. This blog post will discuss: The potential impact of generative AI for financial services The challenges of deploying LLMs in production Which engineering and risk-related considerations financial services companies need to take to successfully implement gen AI in their business environments. To learn more, watch the webinar “Implementing Gen AI for Financial Services” with Larry Lerner, Partner & Global Lead - Banking and Securities Analytics, McKinsey & Company, and Yaron Haviv, Co-founder and CTO, Iguazio (acquired by McKinsey), which this blog post is based on. View the entire webinar here. What is the Potential Value of Gen AI and Analytics for Financial Services? The potential annual value of AI and analytics for global banking could reach as high as $1 trillion. The evolution from analytical AI to generative AI has led to major advancements in the power of advanced analytics. Gen AI has the potential to deliver significant incremental value, potentially leading to 3-5% margin improvements. These...
---
### LLMOps vs. MLOps: Understanding the Differences
Data engineers, data scientists and other data professionals have been racing to implement gen AI into their engineering efforts. But a successful deployment of LLMs has to go beyond prototyping, which is where LLMOps comes into play. LLMOps is MLOps for LLMs. It’s about ensuring rapid, streamlined, automated and ethical deployment of LLMs to production. This blog post delves into the concepts of LLMOps and MLOps, explaining how and when to use each one. To read more about LLMOps and MLOps, check out the O’Reilly book “Implementing MLOps in the Enterprise”, authored by Iguazio’s CTO and co-founder Yaron Haviv and by Noah Gift. What is LLMOps? LLMOps (Large Language Model Operations) is a specialized domain within the broader field of machine learning operations (MLOps). LLMOps focuses specifically on the operational aspects of large language models (LLMs). LLM examples include GPT, BERT, and similar advanced AI systems. LLMs are large deep learning models that are trained on vast datasets, are adaptable to various tasks and specialize in NLP tasks. They are characterized by their enormous size, complexity, and the vast amount of data they process. These elements need to be taken into consideration when managing, streamlining and deploying LLMs in ML pipelines, hence the specialized discipline of LLMOps. Addressing LLM risks is an important part of gen AI productization. These risks include bias, IP and privacy issues, toxicity, regulatory non-compliance, misuse and hallucination. Mitigation starts by ensuring the training data is reliable, trustworthy and adheres to ethical values. How Does...
---
### Implementing Gen AI in Practice
Across the industry, organizations are attempting to find ways to implement generative AI in their business and operations. But doing so requires significant engineering, quality data and overcoming risks. In this blog post, we show all the elements and practices you need to take to productize LLMs and generative AI. You can watch the full talk this blog post is based on, which took place at ODSC West 2023, here. Definitions: Foundation Models, Gen AI, and LLMs Before diving into the practice of productizing LLMs, let’s review the basic definitions of GenAI elements: Foundation Models (FMs) - Large deep learning models that are pre-trained with attention mechanisms on massive datasets. These models are adaptable to a wide variety of downstream tasks, including content generation. GenAI / Generative AI - The methods used to generate content with algorithms. Typically, foundation models are used. Large Language Models (LLMs) - A type of foundation model that can perform a variety of NLP tasks. These include generating and classifying text, answering questions in a conversational manner and translating text from one language to another. The Landscape: The Race to Open and Commercial LLMs As mentioned, gen AI is garnering industry-wide interest. This has encouraged rapid evolution and intense competition between organizations. New and innovative solutions are constantly emerging, making it hard to pinpoint a single "winning" solution. One notable trend in this space is the rise of specialized or application-specific LLMs. Unlike generalized models, these models are becoming more vertical-oriented, catering to specific...
---
### How HR Tech Company Sense Scaled their ML Operations using Iguazio
Sense is a talent engagement company whose platform improves recruitment processes with automation, AI and personalization. Since AI is a central pillar of their value offering, Sense has invested heavily in a robust engineering organization, including a large number of data and AI professionals. This includes a data team, an analytics team, DevOps, AI/ML, and a data science team. The AI/ML team is made up of ML engineers, data scientists and backend product engineers. The Challenge Like many organizations, the AI/ML team at Sense was finding it challenging to scale its ML operations. This was mainly due to three factors: Complexity when managing multiple projects and experiments - Sense had to determine the best strategy for conducting and controlling all their projects and versions, at scale. The need for speed while supporting developer efficiency - Sense needed to ensure fast time-to-market while managing resources efficiently. Establishing a deployment and monitoring strategy - Sense needed to create a sound deployment and monitoring strategy in a cost-effective and straightforward manner. The Solution Sense chose Iguazio as their MLOps solution. Iguazio is an essential component in Sense’s MLOps and DataOps architecture, acting as the ML training and serving component of the pipeline. With Iguazio, Sense’s ML team members can pull data, analyze it, train and run experiments, making the process automated, scalable and cost-effective. “With Iguazio, data scientists and ML engineers start having superpowers,” says Gennaro Frazzingaro, Head of AI/ML at Sense. Iguazio: A Key Component in Sense’s AI/ML and Data Stack...
---
### What Lies Ahead in 2024? AI/ML Predictions for the New Year
2023 was the year of generative AI, with applications like ChatGPT, Bard and others becoming so mainstream we almost forgot what it was like to live in a world without them. Yet despite its seemingly revolutionary capabilities, it's important to remember that Generative AI is an extension of “traditional AI”, which in itself is a step in the digital transformation revolution. This means that in 2024, we’re likely to see businesses continue to seek ways to adopt generative AI as a way to enhance their operations. But this year, businesses will go beyond the hype. They will focus their resources on optimizing and adapting generative AI, and other AI technologies, attempting to turn them into a driving force for the business. For data science practitioners, productization is key, just like any other AI or ML technology. Successful demos alone just won’t cut it, and they will need to take implementation efforts into consideration from the get-go, and not just as an afterthought. Considerations such as reducing and controlling potential risks, cost effectiveness, scalability, modularity and extensibility, and continuous operations must be part of any implementation. What will be different in the way businesses approach generative AI this year? What are their expectations from this hyped technology? Will generative AI continue to be one of the hottest topics in 2024 as well? Here are my predictions for the upcoming year: 1. Beyond the Demo: From Prototyping to Generating Business Value The excitement and attention surrounding generative AI is well-deserved, considering...
---
### 16 Best Free Human Annotated Datasets for Machine Learning
Successfully training AI and ML models relies not only on large quantities of data, but also on the quality of their annotations. Data annotation accuracy directly impacts the accuracy of a model and the reliability of its predictions. This is where human-annotated datasets come into play. Human-annotated datasets offer a level of precision, nuance, and contextual understanding that automated methods struggle to match. In this blog post, we bring you the top 16 free human-annotated datasets you can use for your model training and evaluation. To cater to a wide variety of needs, these free datasets for machine learning cover a diverse set of categories and use cases: Sentiment analysis Language and docs For ethical AI use Images and videos When training and developing your models, don’t neglect the final phase - deploying your LLM (or other type of model). Use MLOps solutions to ensure the process is automated, streamlined, scalable and iterative. This will ensure the successful implementation of your model. To learn more about how to build and scale your pipelines, click here. What are Human-annotated Datasets? Human-annotated datasets are data records that have been annotated by humans. This means that humans have added information, like labels or tags. For example, humans can provide inputs for categorization, sentiment analysis, bounding boxes for images, etc. Human annotation helps advance ML and AI model training and evaluation. By providing the ground truth for models, algorithms can understand patterns and make better predictions on new, unseen data. As such, human annotation is...
---
### Introducing our New Book: Implementing MLOps in the Enterprise
Introducing The New O'Reilly Book: Implementing MLOps in the Enterprise “Implementing MLOps in the Enterprise: A Production-First Approach” is a practical guide, authored by MLOps veterans Yaron Haviv and Noah Gift and published by O’Reilly, which guides leaders of data science, MLOps, ML engineering and data engineering on how to bring data science to life for a variety of real-world MLOps scenarios, including for generative AI. Drawing from their extensive experience in the field, the authors share their strategies, methodologies, tools and best practices for designing and building a continuous, automated and scalable ML pipeline that delivers business value. With practical code examples and specific tool recommendations, the book empowers readers to implement the concepts effectively. After reading the book, ML practitioners and leaders will know how to deploy their ML models to production and scale their AI initiatives, while overcoming the challenges many other businesses are facing. Who This Book Is For This book is for practitioners in charge of building, managing, maintaining, and operationalizing the ML process end to end: Data science / AI / ML leaders: Heads of Data Science, VPs of Advanced Analytics, AI Lead etc. Data scientists Data engineers MLOps engineers / Machine learning engineers This book can also be valuable for technology leaders who want to efficiently scale the use of ML and generative AI across their organization, create AI applications for multiple business use cases, and bridge organizational and technological silos that prevent them from doing so today: CIOs CTOs CDOs Finally, this...
---
### Scaling MLOps Infrastructure: Components and Considerations for Growth
An MLOps platform enables streamlining and automating the entire ML lifecycle, from model development and training to deployment and monitoring. This helps enhance collaboration between data scientists and developers, bridge technological silos, and ensure efficiency when building and deploying ML models, which brings more ML models to production faster. When it comes to scaling your MLOps operations, a high-quality, reliable and effective MLOps platform is essential for growth. Some organizations might opt to build one themselves, while others will buy a commercial solution and yet a third group will take a hybrid approach that combines both building and buying. In this blog post, we explore what is required from you for each option and provide tools that will help you make the right choice for your organization to scale your ML and AI activities. Scaling MLOps with Custom Infrastructure Building an MLOps platform might be the right choice for your organization’s growth plans in certain scenarios. Here are some considerations to take into account when you’re contemplating whether to build such a platform. Specialized Requirements - If your ML workflows have unique requirements that off-the-shelf solutions can't address, building your own platform will enable you to customize the solution to your specific needs. Complex Integrations with Existing Systems - If you have a complex ecosystem of tools and databases, building your own MLOps platform may make it easier to achieve seamless integration, as opposed to adapting a commercial solution and getting it to fit into your existing architecture. Data Privacy...
---
### 11 Best Free Retail Datasets for Machine Learning
The retail industry has been shaped and fundamentally transformed by disruptive technologies in the past decade. From AI-assisted customer service experiences to advanced robotics in operations, retailers are pursuing new technologies to address margin strains and rising customer expectations. AI use cases like personalized product recommendations, demand forecasting for optimized inventory and supply chain management, optimized pricing strategies based on market dynamics, and sales forecasting are generating value for companies that have adopted AI. By leveraging AI, retailers can maintain or increase their competitiveness in a saturated market. How can retailers use, grow and optimize their use of data and machine learning? For data scientists tasked with building and training machine learning models for retailers, open and free retail datasets are an important starting point. But these datasets for retailers can be hard to come by, since they include personal customer information and business competitive information, which is why not many retailers share this data. This blog post is here to help. Here are 11 excellent open datasets and data sources for retailer data for machine learning. Customer Behavior and Items E-commerce data from a real website that includes customer behavior data, item properties and a category tree. The behavior data includes events like clicks, add to cart and transactions, and was collected over a period of four and a half months. Get the dataset here. E-Commerce Sales Data A comprehensive dataset with sales data across channels and financial information. Data includes SKUs, design numbers, stock levels, product...
---
### How to Build a Smart GenAI Call Center App
Building a smart call center app based on generative AI is a promising solution for improving the customer experience and call center efficiency. But developing this app requires overcoming challenges like scalability, costs and audio quality. By building and orchestrating an ML pipeline with MLRun, which includes steps like transcription, masking PII and analysis, data science teams can use LLMs to analyze audio calls from their call centers. In this blog post, we explain how. The Benefits of Generative AI for Customer Support The number of calls to call centers is increasing, creating a need for improving call center efficiency. AI solutions can help improve operational efficiency by automating processes, streamlining workflows and providing real-time insights. This can help organizations reduce costs, improve customer service and make better decisions. For example, AI solutions can reduce customer frustrations by documenting call details, providing contextual recommendations and analyzing the sentiment of calls. This data can then be leveraged by downstream applications that improve the customer experience, like live agent support, customer profiles, auto-generated content, tailored recommendations, customized offers and more. Challenges of Developing a Call Center Generative AI App Developing a generative AI call center application can improve call center performance, but development comes with its own set of challenges. These include: LLMs for Non-English Languages - LLM performance for non-English languages is usually considered poor. If the call center accepts calls in non-English languages, there needs to be a technological solution to address the gap. Audio Quality - The audio quality...
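As a rough illustration of the pipeline steps described above (transcription, PII masking, analysis), here is a minimal Python sketch. The function names, the regex-based masking and the hard-coded transcript are illustrative assumptions; the real pipeline orchestrates these steps as MLRun functions backed by speech-to-text and LLM models.

```python
import re

def transcribe(audio_path: str) -> str:
    # Stand-in for a real speech-to-text step over the audio file.
    return "Hi, this is John Doe, you can reach me at 555-123-4567."

def mask_pii(text: str) -> str:
    # Illustrative regex masking; production systems use NER-based recognizers.
    text = re.sub(r"\b\d{3}-\d{3}-\d{4}\b", "<PHONE>", text)
    text = re.sub(r"\bJohn Doe\b", "<PERSON>", text)  # stand-in for a name model
    return text

def analyze(text: str) -> dict:
    # Stand-in for an LLM-based analysis step (summary, sentiment, topics).
    return {"masked_text": text, "num_words": len(text.split())}

# Chain the steps in order, as a pipeline orchestrator would.
result = analyze(mask_pii(transcribe("call_001.wav")))
print(result["masked_text"])
```

In the real pipeline each step runs as a separate, independently scalable function; the chaining above only shows the data flow between them.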
---
### Top 27 Free Healthcare Datasets for Machine Learning
Machine Learning is revolutionizing the world of healthcare. ML models can help predict patient deterioration, optimize logistics, assist with real-time surgery and even determine drug dosage. As a result, medical personnel are able to work more efficiently, serve patients better and provide higher quality healthcare. When developing and training machine learning models for healthcare, open and free datasets are an essential starting point for data scientists and engineers, and they can be hard to come by. Here are 27 excellent open datasets for healthcare machine learning: General Healthcare, Medical and Life Sciences Datasets 1. WHO Global Health Observatory (GHO) resources by the WHO (World Health Organization). The GHO includes datasets and reports from 194 countries on a wide variety of topics. Health topics include mortality, child nutrition, water and sanitation, HIV/AIDS, health systems, injuries, and more. 2. DHS Program Medical datasets from the DHS (Demographic and Health Surveys) Program spanning multiple topics. These datasets include data from around the globe, both from individual countries as well as cross-country comparisons. They are based on surveys, biomarker testing and geographic data. 3. HealthData.gov The official US government healthcare website, which includes multiple datasets of the US population. Dataset topics range from COVID-19 to health equity, and more. 4. Life Science Database Archive A life science dataset from Japan, gathered by life scientists over long periods of time. Includes datasets about organs, antigens, chemicals and more. 5. Data.gov.au The official source of Australian open government data. Includes all Australian...
---
### Top 10 ODSC West Sessions You Must Attend in 2023
ODSC West 2023, one of the leading AI conferences, will take place this year from Oct. 30 to Nov. 2, in San Francisco and virtually. This time, 250 speakers will be sharing 300 hours of valuable content. Sessions will be spread across multiple tracks: NLP and LLMs, MLOps, Generative AI, Machine Learning, Responsible AI, and more. Don’t miss this opportunity to hear from top speakers, get hands-on training, see demos from the latest innovators in the field and network with the data science community. 300 hours of content is a lot, which makes it tough to choose the sessions you’d like to attend. To help, we put together a list of our top 10 recommended sessions. We chose them because they provide practical information that can be implemented today in your environments and organizations, along with a forward-thinking approach that discusses ideas, considerations and opportunities for the future. As can be expected, LLMs and Generative AI are attracting a lot of attention this year, and our list includes sessions about those topics as well. Here are our top 10 recommended sessions: 1. Causality and LLMs Wed., Nov. 1, 11:20am - 11:50am Robert Osazuwa Ness, PhD, Senior Researcher, Microsoft Following the rapid development of LLMs, innovative opportunities for causal analysis have emerged. In this workshop, participants will learn how to effectively use LLMs in complex causal models and in the field of causal AI. First, it will explain how to extract and interpret causal knowledge from LLMs. Then, it will...
---
### 28 Best Free NLP Datasets for Machine Learning
NLP is a field of AI that enables machines to understand, interpret, and generate human language in a way that is both meaningful and contextually relevant. Recently, ChatGPT and similar applications have created a surge in consumer and business interest in NLP. Now, many organizations are trying to incorporate NLP into their offerings. To help with these efforts, we’ve compiled a list of the top NLP datasets for NLP projects that data scientists and data professionals can use for training their models. This list is a starting point for training your NLP models. The list is divided into a number of groups and types: Q&A Reviews and Ratings Sentiment Analysis Synonyms Emails Long-form Content Audio You can use these datasets for a number of use cases, like creating personal assistants, automating customer service, language translation, and more. The sky's the limit! When planning how to train your NLP models with NLP training datasets, it's important to start with the end in mind — with deployment. To learn more about how to build and scale your NLP pipelines, click here. Now let’s dive into the list: Q&A 1. Stanford Question Answering Dataset (SQuAD) A reading comprehension dataset, comprising pairs of questions and answers based on Wikipedia articles. Get the dataset here. 2. Jeopardy Questions A JSON file with 216,930 Jeopardy questions, answers and additional data like the air date. Get the dataset here. 3. The WikiQA Corpus Question and answer pairs that link to Wikipedia pages with the answer. The data...
---
### How to Mask PII Before LLM Training
Generative AI has recently emerged as a groundbreaking technology and businesses have been quick to respond. Recognizing its potential to drive innovation, deliver significant ROI and add economic value, business adoption is rapid and widespread. They are not wrong. A research report by QuantumBlack, AI by McKinsey, titled “The Economic Potential of Generative AI”, estimates that generative AI could unlock up to $4.4 trillion in annual global productivity. However, GenAI usage also needs to be fair, compliant and ethical. One of the main safety concerns is the accidental leaking of PII (Personally Identifiable Information), compromising individuals’ privacy. In this post, we share an open source solution that can help identify and mask PII: the PII Recognizer. The Challenge: Masking PII PII (Personally Identifiable Information) is any information that can be used to identify an individual. This could include their name, address, phone number, social security number, or credit card number. One of the challenges businesses face when implementing GenAI and training LLMs is accidental exfiltration of PII. LLMs are trained on large datasets of text and code. If this data contains PII, it becomes part of the models’ training dataset. This means it is incorporated into the models and can be used when generating future responses to any user. This could result in a data breach, the compromising of individual privacy and breaking compliance regulations, among other implications. There is also the risk of data retention. Even if the data isn't immediately shared...
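To make the masking idea concrete, here is a minimal Python sketch that scrubs a toy training corpus with regex patterns. The patterns and corpus are illustrative assumptions only; the PII Recognizer described in the post combines NER models with far more robust detection logic.

```python
import re

# Illustrative patterns; a production recognizer combines NER models with
# many more entity types (names, addresses, credit cards, etc.).
PATTERNS = {
    "<EMAIL>": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "<SSN>": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(record: str) -> str:
    # Replace every detected PII span with its placeholder token.
    for token, pattern in PATTERNS.items():
        record = pattern.sub(token, record)
    return record

corpus = [
    "Contact jane@example.com about the invoice.",
    "SSN on file: 123-45-6789.",
]
clean_corpus = [scrub(r) for r in corpus]
print(clean_corpus)
# → ['Contact <EMAIL> about the invoice.', 'SSN on file: <SSN>.']
```

Scrubbing the corpus before it ever reaches the training pipeline is what prevents the PII from being memorized by the model in the first place.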
---
### Model Observability and ML Monitoring: Key Differences and Best Practices
AI has fundamentally changed the way business functions. Adoption of AI has more than doubled in the past five years, with enterprises engaging in increasingly advanced practices to scale and accelerate AI applications to production. As ML models become increasingly complex and integral to critical decision-making processes, ensuring their optimal performance and reliability has become a paramount concern for technology leaders. This is where model observability and ML monitoring step in, playing a pivotal role in empowering organizations to gain comprehensive visibility into their ML models' behavior and performance in real-world applications. In this article, we delve into the fundamental distinctions between model observability and ML monitoring, shedding light on their unique attributes and functionalities. Moreover, we explore the best practices that enable data scientists and ML engineers to harness the full potential of these practices, ensuring seamless model deployment, rapid issue detection, and continual enhancement. What is Model Monitoring? Model monitoring is a fundamental practice in machine learning that focuses on the systematic observation and evaluation of ML models during their deployment and operation in live applications. As ML models are employed in critical decision-making processes across various domains, ensuring their continued effectiveness and reliability is extremely important. Model monitoring involves the continuous collection and analysis of key performance metrics related to the ML model's behavior, accuracy, and overall performance. These metrics may include prediction accuracy,
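As one concrete example of the monitoring practice described above, a common drift metric is the Population Stability Index (PSI), which compares a feature's live distribution against its training baseline. This sketch uses made-up bin proportions and is illustrative only, not a description of any specific product feature.

```python
import math

def psi(expected: list, actual: list) -> float:
    """Population Stability Index over pre-binned proportions.
    expected/actual are per-bin proportions, each summing to 1."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0
    )

# Baseline (training) vs. live distributions over 4 bins -- made-up numbers.
baseline = [0.25, 0.25, 0.25, 0.25]
live = [0.10, 0.20, 0.30, 0.40]

score = psi(baseline, live)
print(f"PSI = {score:.3f}")  # rule of thumb: > 0.2 signals significant drift
```

A monitoring system would compute this per feature on a schedule and raise an alert when the score crosses a threshold.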
---
### Implementing MLOps: 5 Key Steps for Successfully Managing ML Projects
MLOps accelerates the ML model deployment process to make it more efficient and scalable. This is done through automation and additional techniques that help streamline the process. Looking to improve your MLOps knowledge and processes? You’ve come to the right place. In this blog post, we detail the steps you need to take to build and run a successful MLOps pipeline. What is MLOps? MLOps (Machine Learning Operations) is the set of practices and techniques used to efficiently and automatically develop, test, deploy and maintain ML models, applications and data in production. An extension of DevOps, MLOps streamlines and monitors ML workflows. With MLOps, organizations can ensure ML projects are delivered efficiently and consistently, while overcoming challenges related to deploying, scaling and maintaining ML models, as well as silos between teams in the organization. As a result, with MLOps, ML models can be deployed much faster to production so they can bring business value. MLOps pipelines support a production-first approach. This means that they are designed to automatically deploy models to production at scale, starting from data collection and all the way to model monitoring. The Challenges MLOps Solves MLOps processes ensure ML models are brought to production and can bring business value. Implementing MLOps solves the following challenges: Siloed Teams - Before MLOps, data scientists, data engineers and DevOps used to work in silos and with different tools and frameworks. Consequently, the models needed to be technologically converted across the different stages, from the lab to production...
---
### MLOps for Generative AI in the Enterprise
Generative AI has already had a massive impact on business and society, igniting innovation while delivering ROI and real economic value. According to research by QuantumBlack, AI by McKinsey, titled “The economic potential of generative AI”, generative AI use cases have the potential to add $2.6T to $4.4T annually to the global economy. This potential spans more than 60 use cases across all industries. 75% of this impact is focused on marketing and sales, software engineering, customer operations and product and R&D. This is because generative AI has the potential to automate work activities that absorb 60%-70% of employees’ time today, especially mundane activities. In this post, we’ll dive into the potential of generative AI and see how organizations should approach leveraging Large Language Models (LLMs) in live business applications. We’ll also discuss how to do it responsibly by embedding Responsible AI principles into generative AI while taking a human-centered approach. This article is based on our recent webinar “MLOps for Generative AI” with guests Nayur Khan, partner at QuantumBlack, AI by McKinsey, Mara Pometti, associate design director, McKinsey & Company, and Yaron Haviv, CTO and co-founder of Iguazio. You can watch the webinar here. The Power and Promise of Generative AI In less than nine months, generative AI has dominated the tech landscape. Tools like Midjourney, Stable Diffusion, Dall-E 2, ChatGPT, Bard and the AI-powered version of Bing have burst onto the scene, revolutionizing our technological capabilities as consumers and business users. They have also...
---
### Mastering ML Model Performance: Best Practices for Optimal Results
Evaluating ML model performance is essential for ensuring the reliability, quality, accuracy and effectiveness of your ML models. In this blog post, we dive into all aspects of ML model performance: which metrics to use to measure performance, best practices that can help and where MLOps fits in. Why Evaluate Model Performance? ML model evaluation is an essential part of the MLOps pipeline. By evaluating models, data professionals can assess the reliability, accuracy, quality and effectiveness of ML models and ensure they meet the desired technological and business objectives and requirements. In other words, this means ensuring the model answers the business use case as expected and should continue to be deployed in production. In some cases, model performance evaluation can also help meet compliance standards. ML Model Performance Metrics Different metrics can be used to evaluate the performance of ML models. They vary according to the model type and use cases. Therefore, when monitoring the performance of your ML models, choose the relevant ones for your needs. Some of the most common performance metrics for machine learning models include: Classification Model Metrics A classification model is a model that is trained to assign class labels to input data based on certain patterns or features. Common metrics include: Accuracy - The ratio of correctly classified instances to the total number of instances in the dataset. Precision - The proportion of true positive predictions (correctly predicted positive instances) to the total number of positive predictions. Recall (Sensitivity or True Positive Rate)...
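The classification metrics described above can be computed directly from confusion-matrix counts. A minimal sketch with made-up numbers (the counts are purely illustrative):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision: fraction of positive predictions that were truly positive.
    precision = tp / (tp + fp)
    # Recall (sensitivity / true positive rate): fraction of actual
    # positives the model caught.
    recall = tp / (tp + fn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Made-up confusion-matrix counts for illustration.
m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(m)  # accuracy 0.85, precision 0.8, recall ≈ 0.889
```

Which metric matters most depends on the use case: recall for fraud or medical screening, precision when false alarms are costly.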
---
### What are the Advantages of Automated Machine Learning Tools?
AutoML (Automated Machine Learning) helps organizations deploy Machine Learning (ML) models faster, by making the ML pipeline process more efficient and less error-prone. If you’re getting started with AutoML, this article will take you through the first steps you need to find a tool and get started. If you’re at an advanced stage, it will help you validate that you’re on the right track. After reading this article, you’ll have a good understanding of the benefits of AutoML, the available tools and how to choose the right tool for your industry. What is Automated Machine Learning? Automated Machine Learning (AutoML) is the automation of part or all of the ML pipeline, in order to increase productivity and accuracy and eliminate organizational silos. ML pipeline stages that can be automated include model selection, the selection and parametrization of the ML algorithm for model training, data preprocessing, feature engineering, hyperparameter tuning and deployment. Tasks that can be automated include the selection of appropriate algorithms and feature sets, scaling numerical features, handling missing values, encoding categorical variables, and testing and tuning hyperparameters. By automating the complex and time-consuming tasks and activities required for building and deploying ML models, machine learning automation creates standardization. This makes these expert-created processes accessible to a wider range of users and increases efficiency. As a result, AutoML accelerates the development and deployment of ML models and makes them useful for business use cases. What are the Advantages of Automated Machine Learning Tools? AutoML tools have become increasingly popular in...
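At its simplest, hyperparameter tuning, one of the stages AutoML automates, can be an exhaustive grid search over candidate values. This sketch substitutes a made-up scoring function for real model training and validation, so the grid and scores are illustrative assumptions only:

```python
import itertools

def evaluate(lr: float, depth: int) -> float:
    # Stand-in for training a model and scoring it on a validation set.
    # This made-up formula peaks at lr=0.1, depth=5.
    return 1.0 - abs(lr - 0.1) - 0.01 * abs(depth - 5)

# Candidate hyperparameter values to sweep.
grid = {"lr": [0.01, 0.1, 0.5], "depth": [3, 5, 8]}

# Try every combination and keep the best-scoring one.
best = max(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda params: evaluate(**params),
)
print(best)  # → {'lr': 0.1, 'depth': 5}
```

AutoML tools layer smarter search strategies (random search, Bayesian optimization, early stopping) on top of this same loop.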
---
### Integrating MLOps with MLRun and Databricks
How to train on Databricks and deploy with MLRun. Every organization aiming to bring AI to the center of their business and processes strives to shorten machine learning development cycles. Even data science teams with robust MLOps practices struggle with an ecosystem that is in a constant state of change and infrastructure that is itself evolving. Of course, no single MLOps stack works for every use case or team, and the scope of individual tools and platforms varies greatly. Databricks is a hugely popular data platform with core ML capabilities, reportedly used by over 5,000 enterprises. But for use cases requiring automation and scale, or complex use cases with real-time applications, data science teams may need some extra tooling for model serving. To understand why you might want that extra tooling, consider a complex but fairly common scenario: fetching historical data for the training side, with the security requirement that live data remains on-prem. Databricks does offer real-time model serving capabilities, but with MLRun's real-time serving and its complex serving graph, you can deploy pre-processing, model inference and post-processing in the same graph and scale different parts of the graph independently. In this blog post, I’ll show how to integrate models trained in Databricks and perform model serving with the open-source tool MLRun. What is MLRun and the Iguazio MLOps Platform? MLRun is a popular open-source MLOps framework that streamlines machine learning projects from data collection, experimentation, model development, feature creation, production model serving deployment and model...
---
### Deploying Machine Learning Models for Real-Time Predictions Checklist
Deploying trained models takes models from the lab to live environments and ensures they meet business requirements and drive value. Model deployment can bring great value to organizations, but it is not a simple process, as it involves many phases, stakeholders and different technologies. In this article, we provide recommendations for data professionals who want to improve and streamline their model deployment process. This list is based on our experience deploying models for large and small organizations across the globe. Overview of Deploying a Machine Learning Model Deployment processes will differ between organizations, depending on their technologies, environments and use cases. However, in most organizations, there are a few steps that will always repeat themselves. These steps include: Packaging: Adding dependencies and parameters, running scripts and performing builds. Scaling: Managing load balancing, data partitions, model distribution and AutoML. Tuning: Performing data parallelism, managing GPUs, tuning queries and caching. Instrumentation: Monitoring, logging, versioning and managing security. Automation: Of CI/CD, workflows, rolling upgrades and A/B testing. To minimize friction and errors and increase the chance of deployment success, it is recommended to streamline the machine learning deployment process by automating it as much as possible. An efficient and comprehensive model deployment framework will go a long way to support this. Data Preparation One of the most important steps that support successful model deployment is data preparation. In this stage, the data is pre-processed so it can be used in the deployed model. Proper data preparation ensures the model provides accurate and...
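The data-preparation step described above can be sketched with two common operations — mean imputation of missing values and min-max scaling — chosen here purely as illustrative examples of pre-processing for a deployed model:

```python
# A hedged sketch of data preparation before deployment: fill missing
# numeric values with the column mean, then min-max scale to [0, 1]
# so the served model always sees consistently ranged inputs.
def prepare_column(values):
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    filled = [v if v is not None else mean for v in values]
    lo, hi = min(filled), max(filled)
    span = hi - lo or 1.0  # avoid division by zero on constant columns
    return [(v - lo) / span for v in filled]

print(prepare_column([10.0, None, 30.0]))
```

In production, the same transformation (with the *training-time* mean and min/max) must be applied to live inference data, which is why the preparation logic belongs inside the deployment pipeline rather than in a one-off notebook.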
---
### Top 7 ODSC East Sessions You Can’t Afford to Miss
ODSC East is one of the leading data science conferences. This year, it’s taking place in Boston and virtually, between May 9 and 11. The event will be packed with valuable content, with 250 speakers across tracks like Machine Learning, Hands-on Training, Deep Learning, NLP, Responsible AI, and many more. Don’t miss this opportunity to stay up-to-date on the latest technological advancements, build your job skills, and network with peers. With so many options, identifying the sessions you’d like to attend is challenging. To help, we gathered this list of our top seven recommended sessions. We chose them based on their ability to provide practical, yet innovative, ways to solve challenges encountered by data professionals. Here are our top recommended sessions: 2. Keynote: Infuse Generative AI in your apps using Azure OpenAI Service Tue., May 9, 9:40am - 10:10am Eve Psalti, Principal Group Program Manager, Microsoft Azure OpenAI Service can help organizations apply AI models like Dall-E 2, GPT-3.5, Codex and ChatGPT to language-related applications they are building. The platform can help improve efficiency and mitigate risks through capabilities like security, privacy controls, geo-diversity, content filtering and responsible AI. Eve Psalti from Microsoft will explain how. Read the full abstract here. 4. Containers + GPUs In Depth Wed., May 10, 3:35pm - 4:20pm Emily Curtin, Staff MLOps Engineer, Intuit Mailchimp Emily Curtin from Intuit Mailchimp was able to tackle the challenge of connecting abstract containerized processes to hardware and scaling that process across people and...
---
### Kubeflow Vs. MLflow Vs. MLRun: Which One is Right for You?
The open source ML tooling ecosystem has become vast in the last few years, with many tools covering different aspects of the complex and expansive process of building, deploying and managing AI in production. Some tools overlap in their capabilities while others complement each other nicely. In part because AI/ML is still an emerging and ever-evolving practice, the messaging around what all these tools can accomplish can be quite vague. In this article, we’ll dive into three tools to better understand their capabilities, the differences between them, and how they fit into the ML lifecycle. Kubeflow, MLflow, and MLRun are popular open source tools in the ML ecosystem with similar attributes that actually address different facets of the ML lifecycle. How are they alike? Each of them enables cross-functional collaboration, with some level of parameter, artifact, and model tracking. Each is supported and maintained by major players in the AI industry. Each of them addresses data science and experimentation requirements. While there are a handful of feature-level similarities among these three tools, it’s important to understand that they each solve different challenges, and one is not a true replacement for another. What is Kubeflow? Kubeflow started as an internal tool at Google and is now a multi-architecture, multi-cloud framework, described as the ML toolkit for Kubernetes. It provides a system for managing ML components on top of Kubernetes and acts as the bricks and mortar for model development, with a focus on the automation, scaling and tracking...
---
### How Seagate Runs Advanced Manufacturing at Scale With Iguazio
Transforming Manufacturing at Seagate: Using AI to Detect Defects on the Factory Floor. Seagate wanted to leverage their vast petabytes of sensor and image data to reduce chip manufacturing times and capital costs and maintain quality, but faced challenges: Efficient data processing methods (compute and orchestration) while handling petabytes of real-time data Long manufacturing cycles, with 40% of time spent on measurement Sampling, logistics and time limitations With Iguazio, they were able to transition from a manual inspection process via microscope to fully automated deep learning and computer vision inspection. By leveraging fully managed data engineering and automation, they were able to: Double first-pass wafer yield and eliminate contamination by automating measurement and implementing deep learning visual inspection steps Reduce manufacturing steps by 40%, thereby reducing capital costs by modularizing and streamlining data engineering processes Improve cloud usage efficiency by 6x Business Background Seagate is the world’s leading data storage solutions provider. Together with Iguazio, Seagate is able to manage data engineering at scale while harnessing petabytes of data, efficiently utilize resources, bridge the gap between data engineering and data science and create one production-ready environment with enterprise capabilities. In this new webinar, Vamsi Paladugu, Sr. Director of Lyve Cloud Analytics at Seagate, talks about Seagate’s partnership with Iguazio, shares how Iguazio helped them overcome business and technical challenges and dives into a real-life use case where Iguazio helped leverage ML to improve productivity and cut costs. Watch the webinar here. About Seagate Seagate is the data storage industry leader, powering...
---
### McKinsey Acquires Iguazio: Our Startup’s Journey
Eight years ago, when I founded Iguazio together with my co-founders Yaron Haviv, Yaron Segev & Orit Nissan-Messing, I never thought I would be making this announcement on our company blog: McKinsey acquired Iguazio! When we first embarked on this journey, we realized that while AI has the ability to transform any industry - from banking to retail to manufacturing - in reality most data science projects fail. In fact, according to McKinsey research, over $490B has been invested in AI between 2012 and 2021, with little return. Only one out of ten attempts to deploy AI in enterprise-level operations succeeds - 90% of AI projects never even make it out of the lab. This gap between the great potential of AI and its actual business impact was our north star as we built and expanded the Iguazio MLOps Platform, adding features and functionality to automate, accelerate and simplify the data science process. We set out on a mission to change this reality for enterprises, and I’m happy to say we’ve been fortunate enough to work with some incredible clients and see some great success stories, with results such as a 12x acceleration of the data science process, a 50% reduction in cost and a 90% reduction in code. Managing a startup is a roller coaster: the highest highs and the lowest lows. From tremendous technological achievements, incredible partnerships, industry alliances and events with standing room only - to a global pandemic, economic recession, and everything in between. It’s been...
---
### HCI’s Journey to MLOps Efficiency
HCI (Home Credit International) is a global consumer financial provider. As leaders in their space, they identified the potential of ML models in financial institutions, and especially in risk-related use cases. However, deploying ML models in enterprises is not always an efficient process: time to delivery is long and access to data is limited. Jiri Steuer, Enterprise Architect at HCI, recently joined us for a webinar to share his top tips and ideas for achieving MLOps efficiency. In this blog post, we provide a concise overview of his findings. To see Steuer’s entire presentation, including an in-depth description of an MLOps efficiency solution, you can watch the entire webinar here. Machine Learning Use Cases in Financial Institutions There is high potential for ML in financial institutions, especially with regard to risk-related requirements. Some of the top use cases include: Campaign management - Identifying the most fitting consumers for products Generating offers - Defining default and pre-approval limits Risk-based pricing - Calculating prices while balancing attractive products with product security Next best offer (cross-selling) - Predicting the next best offer Next best offer with the ability to decrease insurance-based risks Penalties when servicing products Payment behavior, calendars and promise to pay during collection processes Anti-fraud to protect clients, through device fingerprinting, mobile device scanning, malware detection, and more Creating a client behavioral profile based on predictions from data mining Improving ML Efficiency Yet, despite the value of ML, and according to HCI’s internal research, nearly 80% of the...
---
### Distributed Feature Store Ingestion with Iguazio, Snowflake, and Spark
Enterprises that are actively increasing their AI maturity in a bid to achieve business transformation often find that with increased maturity comes increased complexity. For use cases that require very large datasets, the tech stacks required to meet business needs quickly become unwieldy. AI services that involve predictive maintenance, fraud detection, and NLP serve critical business functions, but under the hood, the data wrangling required is a major bottleneck for the teams tasked with delivering business value. Distributed ingestion is one piece of the data complexity puzzle. With distributed ingestion tools like Spark, data science teams can increase scalability to process large amounts of data quickly and efficiently, improve performance by speeding up the overall time it takes to ingest and process the data, and leverage a simple API to abstract away much of the complexity of using multiple machines. But a distributed ingestion tool is just one component of an end-to-end ML pipeline, so it needs to work together with all the other parts. A tech stack for a particularly complex use case could include components like a feature store, a data lake and much more. Adding a tool to the stack is never simple: integrating and maintaining multiple services can become an overwhelming task for engineering teams. For enterprises deploying AI at scale, a feature store has become a critical component. The Iguazio feature store is integrated with popular tools such as Spark, Dask, Snowflake, BigQuery, and more. In addition to integration with external tools,...
---
### Looking into 2023: Predictions for a New Year in MLOps
In 2022, AI and ML came into the mainstream consciousness, with generative AI applications like Dall-E and GPT AI becoming massively popular among the general public, and ethical questions of AI usage stirring up impassioned public debate. No longer a side project for forward-thinking businesses or CEOs that find it intriguing, AI and ML are now moving towards the center of the business. For enterprises, this means that in 2023 AI and ML have the potential to exceed every technological and business expectation. As an ML industry veteran, I’m excited to see the rapid changes in AI and ML’s technological capabilities and the accelerated adoption rate in the past years. As we raise our glasses to the upcoming year, here are my predictions of what we’re expected to face as an industry in 2023: From ML Models to AI Apps Up to now, the popular approach has been to start with building models and thinking about the overall application and business integration later. Models were used initially by business analysts and reporting systems which didn’t require integration with business applications. As AI becomes central to the business, data science teams now need to integrate with live data sources and existing applications, and turn manually executed applications into interactive and real time applications. The traditional approach is to produce model serving endpoints. These endpoints accept a numeric feature vector and respond with a prediction output. In this scenario, the critical functionality of integrating with data, adding business logic, acting on the...
---
### Iguazio Named a Major Player in the IDC MLOps MarketScape 2022
The IDC MarketScape: Worldwide Machine Learning Operations Platforms 2022 Vendor Assessment is an annual study that evaluates technology vendors based on a comprehensive framework. It provides an in-depth quantitative and qualitative assessment of MLOps solution vendors in a long-form research report, to help buyers make important technology decisions that will create long term business success. We're proud to announce that Iguazio has been named a Major Player in the IDC MarketScape: Worldwide Machine Learning Operations Platforms 2022 Vendor Assessment. This report is the latest recognition from several top industry analyst firms for the Iguazio MLOps Platform. The Iguazio MLOps Platform was Recognized for Enterprise AI Acceleration As AI adoption increases among enterprises, the need to operationalize ML presents a number of obstacles, including legacy tech stacks inadequate for running ML at scale, a lack of collaboration between different departments, ad-hoc processes and insufficient automation. Enterprises that are serious about AI are now looking to build ‘ML factories’—where repeatable and reliable workflows can help quickly roll out new AI services to production. To build such an 'ML factory', data science teams are looking for solutions to automate and accelerate machine learning projects, deploy them to production, monitor and manage them in live business environments. The Iguazio MLOps Platform enables multi-functional data science teams to develop, deploy and manage AI applications effectively and efficiently, even at scale and in real-time environments. Iguazio enables data science, data engineering and ML engineering teams to work on a single platform to operationalize machine learning...
---
### Iguazio Named a Leader and Outperformer In GigaOm Radar for MLOps 2022
The GigaOm Radar reports support leaders looking to evaluate technologies with an eye towards the future. In this year's Radar for MLOps report, GigaOm gave Iguazio top scores on multiple evaluation metrics, including Advanced Monitoring, Autoscaling & Retraining, CI/CD, and Deployment. Iguazio was therefore named a leader and also classified as an Outperformer for its rapid pace of innovation. The goal of MLOps is to make ML a fully integrated part of business operations. Enterprise MLOps involves implementing machine learning applications in live business environments, to solve specific business problems, often across multiple teams. This requires a platform with enterprise-grade capabilities, not only for development, but also for deployment at scale and post-deployment monitoring and management of models in production, in live business environments. The GigaOm Radar for MLOps helps buyers become familiar with the growing range of MLOps solutions and vendor offerings. Why GigaOm Named Iguazio an Outperforming Leader for 2022 Iguazio is honored to be named an Outperforming Leader in GigaOm’s latest MLOps report. This recognition highlights our rigorous production-first approach to MLOps and differentiated capabilities that address the entire AI/ML lifecycle. We're proud to see that the Iguazio MLOps Platform has been recognized for its ability to help enterprises scale, automate and accelerate enterprise AI, with features such as: CI/CD for ML Real-time serving pipelines An integrated online and offline feature store Built-in monitoring and retraining What the Iguazio MLOps Platform Brings to Enterprise AI As organizations continue to adopt machine learning, technical teams are shifting their...
---
### Deploying Your Hugging Face Models to Production at Scale with MLRun
Hugging Face is a popular model repository that provides simplified tools for building, training and deploying ML models. The growing adoption of Hugging Face usage among data professionals, alongside the increasing global need to become more efficient and sustainable when developing and deploying ML models, make Hugging Face an important technology and platform to learn and master. Together with MLRun, an open source platform for simplifying the deployment and management of MLOps pipelines, Hugging Face enables data scientists and engineers to get their models to production faster and in a more efficient manner. In this blog post, we introduce Hugging Face and MLRun and show the value of running them together. This blog is based on the webinar “How to Easily Deploy Your Hugging Face Models to Production”. The webinar also shows a live demo of a Hugging Face deployment with MLRun, including data preparation, a real application pipeline, post-processing and model retraining. You can watch the webinar, presented by Julien Simon, Chief Evangelist at Hugging Face, Noah Gift, MLOps Expert and author, and Yaron Haviv, co-founder and CTO of Iguazio, here. How Transformers Have Revolutionized Deep Learning One of the most recent trends in ML is the reinvention of deep learning. Or rather, it is more accurate to say that transformer models are “swallowing up” traditional deep learning. Traditional deep learning architectures like CNNs, LSTMs and RNNs, which attempt to solve problems with unstructured data like audio and images, were very popular until recently. But while these...
---
### Top 10 ODSC West Sessions You Must Attend!
ODSC West is a fascinating three-day professional event, taking place in San Francisco and virtually, from Nov. 1 - 3, 2022. Topics like ML and deep learning, data engineering and MLOps, NLP and responsible AI will be discussed and showcased during ODSC West through training sessions, workshops, bootcamps and keynote talks. As part of our pre-event tradition here at Iguazio, we’re excited to share with the ML and AI community our list of recommended sessions. We chose them based on their novel approaches to ML and AI, their diverse ways of approaching data science and their applicability to a wide variety of business and real-life use cases. Here are our top recommended 10 sessions: 1. Hyper-productive NLP with Hugging Face Transformers Tue., Nov. 1, 9:40am - 10:55am A workshop showing how to use Hugging Face Transformers, the popular open source NLP library. During the workshop, Julien Simon, Chief Evangelist at Hugging Face, will show how to train and deploy a text classification model that predicts ratings based on a dataset of Amazon product reviews. The code from the session will also be shared with attendees and available for reuse. 2. Self-Supervised and Unsupervised Learning for Conversational AI and NLP Tue., Nov. 1, 11:25 AM - 12:40 PM Self-supervised and unsupervised learning techniques are restructuring AI in fields like NLP, computer vision and robotics. In this workshop, Chandra Khatri, Chief Scientist and Head of AI at Got It AI, will show how to leverage transformers and large language...
---
### How to Run Workloads on Spark Operator with Dynamic Allocation Using MLRun
With the Apache Spark 3.1 release in early 2021, the Spark on Kubernetes project has been production-ready for a few years. Spark on Kubernetes has become the new standard for deploying Spark. In the Iguazio MLOps platform, we built the Spark Operator into the platform to make deploying Spark much simpler. This blog post will cover the benefits of running your workload on Spark Operator, and how to run workloads on Spark Operator with dynamic allocation of executors using MLRun, Iguazio’s open source MLOps orchestration framework. Spark Operator Concept Here is a quick introduction to how Spark Operator works. For more details, you can refer to the Spark documentation. Spark creates a Spark driver running within a Kubernetes pod. The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code. When the application is complete, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in a “completed” state in the Kubernetes API until it is eventually garbage collected or manually cleaned up. Note that in the completed state, the driver pod does not use any computational or memory resources. Scheduling of the driver and executor pods is handled by Kubernetes. Communication with the Kubernetes API is done via fabric8. It is possible to schedule the driver and executor pods on a subset of available nodes through a node selector using the configuration property for it. It will...
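As a rough illustration of the driver/executor model with dynamic allocation described above, a spark-submit invocation on Kubernetes might look like the following. The API server address, image name, node label and script path are placeholders, and property names should be checked against your Spark version's documentation:

```shell
# Illustrative spark-submit flags for Spark on Kubernetes with
# dynamic allocation of executors and a node selector (placeholders only).
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.kubernetes.node.selector.app=spark \
  local:///opt/app/main.py
```

With these settings, Spark scales the number of executor pods up and down between the configured bounds based on workload, instead of holding a fixed executor count for the job's lifetime.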
---
### Building an Automated ML Pipeline with a Feature Store Using Iguazio & Snowflake
When operationalizing machine and deep learning, a production-first approach is essential for moving from research and development to scalable production pipelines in a much faster and more effective manner. Without the need to refactor code, add glue logic and spend significant effort on data and ML engineering, more models will make it to production, and with fewer issues like drift. In this blog post, we’ll demonstrate how to use Iguazio & Snowflake to create a simple, seamless, and automated path to production at scale. This post is based on a talk I gave at the MLOps NYC Summit. Prefer to see the video? You can watch the whole thing here: The MLOps Pipeline: A Short Intro Before diving into how the Iguazio feature store and Snowflake operate together, let’s review why we need a solution for automating the ML pipeline in the first place. Deploying machine learning models to production introduces four main challenges: 1. Siloed Work - The machine learning pipeline requires collaboration between data scientists, data engineers and DevOps. Today, these teams often work in silos and use different tools and frameworks. As a result, code and models that were developed in the lab need to be technologically converted so they match the production environment’s requirements. This process creates friction that results in some models not making it to production. 2. Lengthy Processes - The route from the lab to production is long and time-consuming. It consists of multiple phases, including testing, security, versioning, tuning, ensuring scalability, CI/CD and more....
---
### Iguazio Product Update: Optimize Your ML Workload Costs with AWS EC2 Spot Instances
Iguazio users can now run their ML workloads on AWS EC2 Spot instances. When running ML functions, you might want to control whether to run on Spot nodes or On-Demand compute instances. When deploying the Iguazio MLOps Platform on AWS, users running a job (e.g., model training) or deploying a serving function can now choose to run it on AWS EC2 Spot compute instances. Choosing a Spot instance is a great cost-saving option if you can be flexible about when your applications run and if, from an ML perspective, your applications can tolerate interruption. Configuring Spot nodes inside the Iguazio MLOps Platform. About Amazon EC2 Spot Instances: “Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices. You can use Spot Instances for various stateless, fault-tolerant, or flexible applications such as big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and test & development workloads. Because Spot Instances are tightly integrated with AWS services such as Auto Scaling, EMR, ECS, CloudFormation, Data Pipeline and AWS Batch, you can choose how to launch and maintain your applications running on Spot Instances. Moreover, you can easily combine Spot Instances with On-Demand, RIs and Savings Plans Instances to further optimize workload cost with performance. Due to the operating scale of AWS, Spot Instances can offer the scale and cost savings to run hyper-scale workloads. You also have the option to hibernate, stop or terminate...
---
### From AutoML to AutoMLOps: Automated Logging & Tracking of ML
AutoML with experiment tracking enables logging and tracking results and parameters, to optimize machine learning processes. But current AutoML platforms only train models based on provided data. They lack solutions that automate the entire ML pipeline, leaving data scientists and data engineers to deal with manual operationalization efforts. In this post, we provide an open source solution for AutoMLOps, which automates engineering tasks so that your code is automatically ready for production. For more in-depth explanations and a live demo of the solution, you can watch the webinar this blog post is based on, here. Experiment Tracking: Benefits and Challenges Traditionally, machine learning models are developed by data scientists and engineers obtaining data from a data warehouse, a data lake or standalone files. They need to prepare it, train their model or use an AutoML service, evaluate the model for accuracy and finally generate a model for production. This process needs to be tracked, so data professionals can understand how any parameter changes they make impact the model’s behavior and accuracy. Data scientists often use experiment tracking for monitoring and logging model results, since it provides them with better decision making (understanding which parameters and inputs yield the best results and reusing them in future models) and visualization (for better understanding the experiment results and model performance evaluation). Experiment tracking can also help with tracking additional aspects of the ML workload: Accelerating the process of moving results from a successful experiment to production. Simplifying pipelines - building and deploying pipelines in...
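At its core, the experiment tracking described above records each run's parameters and resulting metrics so runs can be compared later. A stdlib-only sketch of that idea (real trackers such as MLRun also capture artifacts, code versions and datasets):

```python
# Minimal illustration of experiment tracking: log each run's parameters
# and metrics, then query for the best run by a chosen metric.
import json

class ExperimentTracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        self.runs.append({"params": params, "metrics": metrics})

    def best_run(self, metric):
        # Return the run with the highest value for the given metric.
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.01}, {"accuracy": 0.81})
tracker.log_run({"lr": 0.10}, {"accuracy": 0.88})
print(json.dumps(tracker.best_run("accuracy")))
```

This is exactly the "which parameters and inputs yield the best results" question from the paragraph above, answered by a lookup instead of memory.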
---
### How to Deploy an MLRun Project in a CI/CD Process with Jenkins Pipeline
In this article, we will walk you through the steps to run a Jenkins server in Docker and deploy an MLRun project using a Jenkins pipeline. Before we dive into the actual setup, let’s have a brief background on MLRun and Jenkins. What is MLRun? MLRun is a popular open source MLOps orchestration framework that streamlines machine learning projects from data collection, experimentation, model development, feature creation, production model serving deployment and model monitoring, through the full lifecycle management of machine learning. The Iguazio MLOps Platform is built with MLRun at its core, with added enterprise features such as data management, user management, real-time model serving, security, autoscaling, high availability and more. Please see MLRun and the Iguazio MLOps Platform for more information. What is Jenkins Pipeline? A continuous delivery (CD) pipeline is an automated expression of your process for getting software from version control right through to your users and customers. Every change to your software (committed in source control) goes through a complex process on its way to being released. This process involves building the software in a reliable and repeatable manner, as well as progressing the built software through multiple stages of testing and deployment. In a traditional build process, the code and the build process are separated: the code lives in the SCM system, while the build process resides inside the Jenkins configuration on a Jenkins server. This approach has a lot of management challenges. As we are moving towards CI/CD “as...
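The "pipeline as code" idea contrasted above can be sketched as a declarative Jenkinsfile kept in source control alongside the project. The stage names and the `run_pipeline.py` script are hypothetical placeholders, not the article's exact setup:

```groovy
// A hedged sketch of a declarative Jenkinsfile for an MLRun project;
// stages and commands are illustrative placeholders.
pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps { checkout scm }   // pull the project code from source control
        }
        stage('Run MLRun pipeline') {
            steps {
                // Placeholder command: load the MLRun project and run its workflow
                sh 'python run_pipeline.py'
            }
        }
    }
}
```

Because the Jenkinsfile is versioned with the code, every change to the build process goes through the same review and history as the application itself, which is the management benefit the paragraph above alludes to.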
---
### Beyond Hyped: Iguazio Named in 8 Gartner Hype Cycles for 2022
We’re so proud to share that Iguazio has been named a sample vendor in eight Gartner Hype Cycles in 2022: the Hype Cycle for Data Science and Machine Learning, the Hype Cycle for Artificial Intelligence, the Hype Cycle for Analytics and Business Intelligence, the Hype Cycle for Infrastructure Strategy, the Hype Cycle for ITSM, the Hype Cycle for Healthcare Data, Analytics and AI, the Hype Cycle for Customer Experience Analytics and the Hype Cycle for CRM Sales Technology. Iguazio was mentioned in the following categories: MLOps, Logical Feature Store and Adaptive ML. 1. The Hype Cycle for Data Science and Machine Learning, 2022 analyzes the maturity of the DSML landscape and how it is evolving to meet the requirements of the enterprise while delivering business value. According to the DSML Hype Cycle, innovation and novel techniques are being used by data and analytics leaders to find solutions to challenges and to stay ahead of the pack. In this DSML Hype Cycle, Iguazio is mentioned in the following categories: MLOps, which discusses the streamlining of the end-to-end development, testing, validation, deployment, operationalization and instantiation of ML models, and how MLOps is standardizing the process to drive ML value. Logical Feature Store, which explains how feature stores enable reusability, reproducibility and reliability of features for ML, breaking down silos and accelerating feature engineering. Adaptive ML, which reviews how to conduct online retraining of ML models so they can quickly adapt to real-world requirements. 2. The Hype Cycle for Artificial Intelligence, 2022 evaluates the use of AI innovations for real business utility and high...
---
### Build an AI App in Under 20 Minutes
What is an AI App? Machine learning is more accessible than ever, with datasets available online and Jupyter notebooks providing an easy way to explore and train models with minimal expertise. But when building a model, it's easy to forget that its real value lies in being incorporated into a live application that will provide value to the user. Therefore, we wanted to demonstrate how we can very easily leverage the models we build into a full application, with minimal engineering. In this blog, we'll demonstrate how quickly and easily you can create what we’re calling an AI App, in under 20 minutes. The idea is to create a set of simple AI applications that leverage a machine learning model and allow you to interact with a UI to visualize the results of their predictions. We will show that you don't have to build the whole pipeline from scratch to deploy and interact with these models. Along with interactive UIs, these AI apps will also contain behind-the-scenes code snippets showing how the models were deployed and how the application interacts with them. How Do We Embed AI into an Application? The tool that makes this all possible is MLRun - an end-to-end, open-source MLOps orchestration framework that includes experiment tracking, job orchestration, model deployment, feature store, and much more. Our focus in this blog will be on MLRun's model deployment capabilities. In machine learning, deployment refers to the process of making a model available for use. As...
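As a sketch of the pattern described here - a model wrapped behind a JSON handler that a serving framework such as MLRun/Nuclio would then expose over HTTP - consider this minimal, stdlib-only example. The toy model, function names and payload shape are illustrative assumptions, not the blog's actual code.

```python
import json

# Toy stand-in for a trained model: word-count "sentiment" scoring.
# In a real AI app this would be a model loaded from a registry (e.g. via MLRun).
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def predict(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def handler(body: str) -> str:
    """HTTP-style entry point: JSON request in, JSON prediction out,
    the same contract a deployed serving function would offer a UI."""
    payload = json.loads(body)
    return json.dumps({"prediction": predict(payload["text"])})

result = handler('{"text": "I love this product"}')
```

The UI layer then only needs to POST user input to the handler's endpoint and render the returned prediction, which is why no bespoke pipeline code is required per app.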
---
### Machine Learning Experiment Tracking from Zero to Hero in 2 Lines of Code
Why Experiment Tracking In your machine learning projects, have you ever wondered: “Why is model Y performing better than model Z? Which dataset was model Y trained on? What training parameters did I use for model Y? And which performance metrics did I use to select model Y?” Does this sound familiar? Have you wondered if there is a simple way to answer these questions? Data science experiments can get complex, which is why you need a system to simplify tracking. In this blog, you will learn how you can perform all those tasks (as well as set up an automated machine learning pipeline) with a few simple steps using MLRun. Automated Experiment Tracking with MLRun MLRun is an open source framework to orchestrate MLOps from the research stage to production-ready AI applications. With a feature store and a modular strategy, MLRun enables a simple, continuous, and automated way of creating scalable production pipelines. MLRun automates the build process, execution, data movement, scaling, versioning, parameterization, output tracking, CI/CD integration, deployment to production, monitoring, and more. MLRun empowers cross-functional ML teams to collaborate on the orchestration of the entire ML lifecycle and automate MLOps. MLRun simplifies and accelerates the time to production. To learn more about the MLRun open source project, please refer to its Git repo. In this article, you will see how to turn your existing model training code into an MLRun job and get the benefit of all the...
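To make the idea concrete, here is a toy, stdlib-only illustration of what an automated tracker records for each run: the model name, parameters, metrics, and a hash tying the run to the exact dataset it saw. MLRun captures all of this for you automatically; the class and field names below are illustrative, not MLRun's API.

```python
import hashlib
import json
import time

class ExperimentTracker:
    """Toy sketch of per-run experiment tracking (illustrative names only)."""

    def __init__(self):
        self.runs = []

    def log_run(self, model_name, params, metrics, dataset):
        self.runs.append({
            "model": model_name,
            "params": params,
            "metrics": metrics,
            # Hash the dataset so each run is tied to the exact data it trained on
            "dataset_hash": hashlib.sha256(
                json.dumps(dataset, sort_keys=True).encode()
            ).hexdigest()[:12],
            "timestamp": time.time(),
        })

    def best(self, metric):
        # Answer "which model performed best, and on what settings?"
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run("model_Y", {"lr": 0.1}, {"accuracy": 0.92}, dataset=[1, 2, 3])
tracker.log_run("model_Z", {"lr": 0.5}, {"accuracy": 0.87}, dataset=[1, 2, 3])
```

With records like these in a single store, the "why is model Y better, and what was it trained on?" questions above become one-line queries rather than archaeology.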
---
### Iguazio Recognized in Gartner's 2022 Market Guide for DSML Engineering Platforms
We’re proud to share that Iguazio has been named in Gartner's 2022 Market Guide for Data Science & Machine Learning Engineering Platforms. According to Gartner, “The AI & data science platform market is due to grow to over $10 billion by 2025 at a 21.6% compounded annual growth rate. This growth mirrors the investments made by organizations in DSML initiatives, which are largely turning from strategy to execution.” This new Gartner guide is intended for technology leaders, to gain insights into the DSML platform market and help them choose the right solution for their specific needs. The guide assesses engineering requirements and provides a framework for vendor comparison and selection. As one of the first vendors in the MLOps market, with highly differentiated capabilities and an aggressive approach to innovation, Iguazio is uniquely positioned to help enterprises bring data science to production. We’re proud to see the Iguazio MLOps platform recognized by customers, partners and the industry for capabilities such as: end-to-end MLOps automation, CI/CD for ML, real-time serving pipelines, a built-in feature store, real-time feature engineering, model monitoring and more. Iguazio is used by companies like the LATAM Airlines Group, which is rolling out a large-scale, cross-company AI innovation project using Iguazio on GCP, alongside many other key customers across verticals. The Iguazio MLOps platform abstracts away the complexities of MLOps and empowers enterprises to create real business impact with AI, across a variety of use cases. In addition to Iguazio being mentioned in this new Gartner guide, Iguazio...
---
### The Easiest Way to Track Data Science Experiments with MLRun
As a very hands-on VP of Product, I have many, many conversations with enterprise data science teams who are in the process of developing their MLOps practice. Almost every customer I meet is in some stage of developing an ML-based application. Some are just at the beginning of their journey while others are already heavily invested. It’s fascinating to see how data science, a once commonly used buzzword, is becoming a real and practical strategy for almost any company. In the following post, I’ll address one of the challenges that customers bring up time and again: running and tuning data science experiments. With a step-by-step tutorial, I’ll cover complexity concerns and show how to solve them with MLRun, our open source MLOps orchestration framework which enables a simple, continuous, and automated way of creating scalable production pipelines. MLRun is all about automation: with a few simple lines of code, it automates the build process, execution, data movement, scaling, versioning, parameterization, output tracking, CI/CD integration, deployment to production, monitoring, and more. MLRun enables data science teams to track machine learning experiments by providing a generic and easy-to-use mechanism to describe and track the code, metadata, inputs and outputs of machine learning related tasks (executions). Machine learning experiment management is part of any healthy MLOps process. MLRun tracks various elements, stores them in a database and presents all running jobs as well as historical jobs in a single report. The database location is configurable and users run queries...
---
### Top 9 ODSC Europe Sessions You Can’t Miss!
ODSC Europe 2022 will be an exciting hybrid event, taking place in London and virtually, from June 15-16. With 50 workshops, 150 sessions and thousands of attendees, it is surely the place to learn about AI and ML and meet colleagues and experts from around the globe. Choosing which sessions to attend is not always easy, so we put together this list of our top recommended sessions for the event. We chose them based on their innovative way of thinking and the new approaches and methodologies they offer, while remaining connected to real-life examples and challenges. Here are our top 9 recommended sessions, in order of appearance: 1. Keynote: MLOps Beyond Training: The Production-First Approach to AI Wed., June 15, 10:00am - 10:40am AI journeys usually start by manually building AI models. But to create business value, we need to take a production-first approach to MLOps. This means designing a continuous operational pipeline, automating components and measuring metrics. In this keynote session, Yaron Haviv, Iguazio’s co-founder and CTO, will explain what it means to operationalize ML, the challenges and how to overcome them. The session will include an overview of all the required steps alongside real-world examples. 2. Explainability by Design: a Methodology to Support Explanations in Decision-making Systems Wed., June 15, 10:50am - 11:30am How can we ensure automated decision-making is meaningful? A new method, explainability by design, introduces proactive measures to include explanations in the design, instead of treating them as an afterthought. In this eye-opening...
---
### Best Practices for Succeeding with MLOps
Data science is an important skill, but the hard truth is that many organizations aren’t seeing ROI that shows their data science work is making a business impact. Today, many organizations are still struggling to adopt a holistic approach centered around creating business value; instead, they are focused on theoretical work. Here at Iguazio, we recently held a webinar with Noah Gift, founder of Pragmatic AI Labs, professor, author and MLOps consultant. In his talk, he provided best practices for succeeding with MLOps and connecting data science to a clear ROI. Below, you can find the main best practices and future trends he presented. The entire webinar is available on-demand, with more in-depth explanations and examples. 1. Pick the Right Technology Partners and Solutions for You The right technology partners will help you scale, adapt to future technological changes and be innovative. We recommend having two to three technology partners/solutions in place: The first, your foundational solution, is meant to be your primary product. These are solutions like AWS, Azure or Google Cloud. The second solution is a platform that solves a specific problem very well. For example, Iguazio, Splunk, Snowflake or Databricks. The third is a long-term R&D investment. For example, frameworks for ML/DL, Kubernetes or edge computing. How should you choose these primary, secondary and investment partners? While there is no silver bullet for choosing the right platform or technology provider, there is a set of considerations Gift recommends taking into account. Considerations for choosing...
---
### Top 8 Recommended MLOps World 2022 Sessions
MLOps World is taking place this year in Canada and virtually, from June 7 to 10, 2022. Packed with more than 80 workshops and case study talks, MLOps World is an important global gathering for anyone working with ML or AI. With so many incredible sessions, it can be hard to choose which ones to attend. To help, we compiled our top recommendations. We chose them based on the use cases they cover, how comprehensively they address productizing ML, and the practical tools they offer. We will also be there - more details below. Here are our top eight recommended sessions, in order of appearance: 1. Machine Learning Monitoring in Production: Lessons Learned from 30+ Use Cases June 7, 10am - 12pm Machine learning monitoring is essential for detecting problems in ML stacks. But how should data scientists and engineers get started? In this session, Lina Weichbrodt from DKB Bank will explain the four golden signals, how to prioritize a service response and how to monitor with the tools you already have! 2. Implementing MLOps Practices on AWS using Amazon SageMaker June 7, 10am - 12:30pm A hands-on workshop by Shelbee Eigenbrode, Bobby Lindsey and Kirit Thadaka from AWS, teaching how to use Amazon SageMaker Pipelines to implement ML pipelines with a CI/CD approach. 3. Production ML for Mission-Critical Applications June 7, 1pm - 1:40pm How does Google implement production applications in ML pipeline architectures while ensuring they are production-ready? In this talk, Robert Crowe from...
---
### Using Snowflake and Dask for Large-Scale ML Workloads
Many organizations are turning to Snowflake to store their enterprise data, as the company has expanded its ecosystem of data science and machine learning initiatives. Snowflake offers many connectors and drivers for various frameworks to get data out of their cloud warehouse. For machine learning workloads, the most attractive of these options is the Snowflake Connector for Python. Snowflake added some new additions to the Python API in late 2019 that improved performance when fetching query results from Snowflake using Pandas DataFrames. In their blog post just a few months later, internal tests showed a 10x improvement when downloading directly into a Pandas DataFrame using the new Python client APIs. In this article, we’re going to show you how to use this new functionality. Install the Snowflake Connector Python Package:

```shell
pip install snowflake-connector-python
```

The Snowflake connector needs some parameters supplied so it can connect to your data: the Snowflake user, password, warehouse, and account all need to be supplied. In our code examples you will see these params given as **connection_info for brevity. Below is a very simple example of using the connector. This uses the fetch_pandas_all function, which retrieves all the rows from a SELECT query and returns them in a pandas DataFrame. To be clear, this does not replace the pandas read_sql method, as it only supports SELECT statements.

```python
import snowflake.connector as snow

ctx = snow.connect(**connection_info)
query = "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER"
cur = ctx.cursor()
cur.execute(query)
df = cur.fetch_pandas_all()
```

This...
---
### ODSC East Boston 2022 - Top 11 Sessions for AI and ML Professionals to Attend
On April 19, 2022, data scientists, data engineers and AI professionals will gather in Boston and virtually to attend ODSC East. Over the course of three days, from April 19 to April 21, expert speakers will discuss deep learning, ML, MLOps, data visualization, responsible AI, and more. This year, many of the talks focus on governance and security, as well as various interesting use cases. Since there are so many great sessions and it’s so hard to choose which ones to attend, we’ve put together our recommendations. We will also be there - more details below. Here are our top 11 recommended sessions for ODSC East: 1. Adversarial Robustness: How to Make Artificial Intelligence Models Attack-proof! Serg Masís, Climate Data Scientist, Syngenta April 19, 2PM - 4PM A security-focused session about evasion attacks, which occur when perpetrators trick classifiers into making false predictions. The session will examine an evasion attack use case and explain two defense methods for ML models. Then there will be a demonstration of a robustness evaluation method and a certification method, to assure that the ML model will resist evasion attacks. 2. ODSC Keynote - The Big Wave of AI at Scale Luis Vargas, Partner Technical Advisor, Microsoft April 19, 9 AM - 9:40 AM In this session, the Partner Technical Advisor to the CTO of Microsoft discusses the trend of larger AI models that enable new tasks in language, vision, and multi-modality. He will provide an overview of the research and engineering efforts that comply with...
---
### Real-Time Streaming for Data Science
This tutorial demonstrates the availability of streaming data in a data science environment, which is useful for working with real-time and fresh datasets: First, we collect data from an existing Kafka stream into an Iguazio time series table. Next, we visualize the stream with a Grafana dashboard; and finally, we access the data in a Jupyter notebook using Python code. We use a Nuclio serverless function to “listen” to a Kafka stream and then ingest its events into our time series table. Iguazio gets you started with a template for Kafka to time series. We visualize the data with Grafana and work with time series data using Python code in Jupyter. Data scientists can easily access both historical and real-time data in a full Python environment for exploration and training with Iguazio. If you'd prefer to read along, here's a transcript of how we did it: Here’s how to make streaming data available for data scientists in just a few minutes. We're going to collect data into an Iguazio time series table from an existing Kafka stream, and then create a quick Grafana dashboard and access the data using Python code from a Jupyter notebook. The idea here is to have data scientists getting real-time and fresh data into their data science environment, where they can immediately start working with streaming datasets. Create the Time Series Table Go to the services view and then to the shell service. I'm using the tsdb CLI. Specify the container name and table name with the...
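The ingestion step - a Nuclio function "listening" to Kafka and writing into the time series table - boils down to parsing each event into a timestamped row. Here is a minimal, stdlib-only sketch of that parsing; the payload shape and field names are assumptions, and the actual TSDB write (done with Iguazio's client libraries inside the Nuclio handler) is left as a comment.

```python
import json
from datetime import datetime, timezone

def event_to_row(body: bytes) -> dict:
    """Turn one Kafka event into a time-series row.
    The payload shape ({"metric", "value", "ts"}) is an assumed example."""
    payload = json.loads(body)
    return {
        # Epoch seconds -> timezone-aware ISO timestamp for the time column
        "time": datetime.fromtimestamp(payload["ts"], tz=timezone.utc).isoformat(),
        "metric": payload["metric"],
        "value": float(payload["value"]),
    }

# In the Nuclio function, the handler would call event_to_row(event.body)
# and write the returned row to the Iguazio time series table.
row = event_to_row(b'{"metric": "cpu", "value": 0.73, "ts": 1600000000}')
```

Once rows land in the table, Grafana and Jupyter are just two different readers of the same data, which is what makes the fresh stream immediately usable for data scientists.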
---
### GigaOm Names Iguazio a Leader and Outperformer for 2022
We’re proud to share that the Iguazio MLOps Platform has been named a leader and outperformer in the GigaOm Radar for Data Science Platforms: Pure-Play Specialist and Startup Vendors report. The GigaOm Radar reports take a forward-looking view of the market and are geared towards IT leaders tasked with evaluating solutions with an eye to the future. GigaOm analysts emphasize the value of innovation and differentiation over incumbent market position. In this Radar Report for Data Science Platforms, GigaOm gave Iguazio top scores on several evaluation metrics, including Integrated MLOps Capabilities, Data Security, Scalability, Deployment Flexibility, Usability and Integrations with Third-Party Frameworks. Why GigaOm Named Us an Outperforming Challenger for 2022 Iguazio’s leader placement in the report highlights our differentiated capabilities and our commitment to a production-first approach to delivering AI/ML services. We’re thrilled to see the Iguazio MLOps Platform recognized for the capabilities every company should embrace when adopting MLOps and data science: End-to-end automation, ML CI/CD pipelines, scalability, feature store usage and real-time feature engineering, and more. Of all the vendors reviewed in this report, Iguazio ranked highest for innovation, reflecting the Platform’s aggressive approach to technical innovation, over a more conservative stance. GigaOm noted that nimble startups without legacy technologies are particularly well suited to a rapidly evolving field like data science. GigaOm analysts ranked Iguazio as an outperformer, which denotes their assessment that the Platform will progress rapidly over the next 12 to 18 months, based on strategy and pace of innovation. What the Iguazio MLOps Platform...
---
### Top 8 Machine Learning Resources for Data Scientists, Data Engineers and Everyone
Machine learning is a practice that is evolving and developing every day. Newfound technologies, inventions and methodologies are being introduced to the community on a daily basis. As ML professionals, we can enrich our knowledge and become better at what we do by constantly learning from each other. But with so many resources out there, it might be overwhelming to choose which ones to stay up-to-date on. So where is the best place to start? We tapped our experienced team to compile a list of eight ML resources for anyone interested in ML: data scientists, data engineers and more. These include YouTube channels, influencers, communities, blogs, and more. We hope you find this list valuable for your professional development. If you have more to add, feel free to drop us a line and we’ll gladly add it to the list. 1. The AI Epiphany A popular YouTube channel by Aleksa Gordić. Each video covers a paper, code or another AI-related topic and breaks it down in a clear and easy-to-follow way. Playlists cover topics like diffusion models, robotics, computer vision, AI breakthroughs, and more. The AI Epiphany has 24.8 thousand subscribers and more than 600,000 views, just two years in the making. Another reason to love this source: explanations and tutorials are presented in Gordić’s soothing voice and the videos are entertaining. 2. Towards Data Science A Medium-based independent publication that covers all data science-related concepts, like model training, serving, feature engineering, monitoring and more. Consistent reading of articles...
---
### Iguazio named in Forrester's Now Tech: AI/ML Platforms
We are delighted to share that Iguazio has been named along with Microsoft, Databricks, Cloudera, Alteryx and others in Now Tech: AI/ML Platforms, Q1 2022, Forrester’s Overview of the Leading AI/ML Platform Providers, by Mike Gualtieri. This report by Forrester Research looks at AI/ML platform providers, to help technology executives evaluate and select one based on functionality aligned with their needs. Enterprises who are infusing AI into their applications need a platform that will support the various roles involved. This report provides a survey of the market landscape for technology decision makers. The Forrester report describes Iguazio as a code-first AI/ML platform for enterprises whose primary teams are coders, along with GUI tools to perform various tasks around building, deploying and managing models. The code-first paradigm that Forrester identifies in the report allows data scientists and engineers to work together with their preferred open-source tools (like Jupyter), while enabling collaboration via the UI. The Iguazio MLOps Platform is an end-to-end solution for enterprise ML teams that automates and orchestrates the entire ML pipeline. Data science, data engineering and DevOps teams can collaborate on one platform to rapidly deploy operational ML pipelines with an online and offline feature store, built-in model monitoring and dynamic scaling capabilities, all packaged in an open and managed platform. For a complete look at the evaluation criteria and Forrester’s recommendations for navigating this rapidly evolving market, access the full report here (for Forrester subscribers).
---
### ML Workflows: What Can You Automate?
When businesses begin applying machine learning (ML) workflows to their use cases, it’s typically a manual and iterative process - each step in the ML workflow is executed until a suitably trained ML model is deployed to production. In practice, the performance of ML models in the real world often degrades over time, as the workflows fail to adapt to changes in the dynamics and data that describe the environment, and models require frequent retraining with fresh data. Data science teams also need to experiment with new implementations of the ML workflow, such as feature engineering, model architecture, and hyperparameter optimization, to improve ML model performance. This requires experiment tracking to monitor changes in model accuracy that are caused by changes in the ML workflow. In this article, you will learn about the challenges of managing ML workflows, as well as how automating various steps in the workflow using an MLOps approach can help data teams achieve faster deployment of ML models. Understanding the ML Workflow Figure 1: ML workflow with performance monitoring (Source: Adapted from Google Cloud) A typical ML workflow involves the following steps: Data ingestion: Data is extracted from various sources and integrated into an input dataset for the ML task. Data analysis: Data scientists and engineers perform exploratory data analysis to understand the data schema and characteristics, so that they can identify the data preparation and feature-engineering operations needed for building the ML model. Data preparation: Data is prepared from the input dataset for building the ML model. This...
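The workflow steps above can be sketched as composable functions, which is precisely what makes them automatable: a pipeline runner can re-execute the whole chain whenever fresh data arrives, instead of a human repeating each step manually. This is a toy, stdlib-only illustration with a deliberately trivial "model"; all names and the data are made up for the example.

```python
import statistics

def ingest():
    # Data ingestion: pull raw records from sources (hard-coded here)
    return [{"x": 1, "y": 2.0}, {"x": 2, "y": 4.1}, {"x": 3, "y": 6.2}]

def prepare(records):
    # Data preparation: derive the feature and target arrays the model needs
    return [r["x"] for r in records], [r["y"] for r in records]

def train(xs, ys):
    # "Training": fit y = slope * x by averaging per-sample ratios
    slope = statistics.mean(y / x for x, y in zip(xs, ys))
    return lambda x: slope * x

def evaluate(model, xs, ys):
    # Evaluation: mean absolute error (on the training data, for illustration)
    return statistics.mean(abs(model(x) - y) for x, y in zip(xs, ys))

def run_pipeline():
    # The automatable unit: rerun end-to-end on every batch of fresh data
    xs, ys = prepare(ingest())
    model = train(xs, ys)
    return model, evaluate(model, xs, ys)

model, mae = run_pipeline()
```

Because every step is a function of its inputs, retraining on fresh data and comparing the resulting metric against the previous run (the experiment-tracking concern mentioned above) becomes a scheduling problem rather than a manual one.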
---
### Orchestrating ML Pipelines at Scale with Kubeflow
Still waiting for ML training to be over? Tired of running experiments manually? Not sure how to reproduce results? Wasting too much of your time on DevOps and data wrangling? Spending lots of time tinkering around with data science is okay if you’re a hobbyist, but data science models are meant to be incorporated into real business applications. Businesses won’t invest in data science if they don’t see a positive ROI. This calls for the adoption of an “engineered” approach - otherwise it is no more than a glorified science project with data. Engineers use microservices and automated CI/CD (continuous integration and deployment) in modern agile development. You write code, push it, and it gets tested automatically on a cluster at scale. If it passes the tests, it then goes into some type of beta/canary testing phase, and onto production from there. Kubernetes, a cloud-native cluster orchestrator, is the tool now widely used by developers and DevOps teams to build an agile application delivery environment. Leading ML engineers and AI/ML-driven companies are already using Kubernetes, which comes with pre-integrated tools and scalable frameworks for data processing, machine learning and model deployment. Some Kubernetes data pipeline frameworks also enable horizontal scaling and efficient use of GPUs, which further cut down overall wait times and costs. Don’t settle for proprietary SaaS/cloud solutions or legacy architectures like Hadoop. Get a Kubernetes-based solution for a scalable and forward-looking data science platform. The following post reviews best practices for the process and Kubernetes frameworks used to scale and operationalize...
---
### Automating MLOps for Deep Learning: How to Operationalize DL With Minimal Effort
Operationalizing AI pipelines is notoriously complex. For deep learning applications, the challenge is even greater, due to the complexities of the types of data involved. Without a holistic view of the pipeline, operationalization can take months, and will require many data science and engineering resources. In this blog post, I'll show you how to move deep learning pipelines from the research environment to production, with minimal effort and without a single line of code. To automate the entire process, we will use MLRun, an open source MLOps orchestration framework. This blog post is based on a talk I gave with Iguazio co-founder and CTO Yaron Haviv at the Open Data Science Conference, called “Automating MLOps for Deep Learning”, which you can watch here. The Challenges of Operationalizing Deep Learning Models When developing deep learning models, the prevalent mindset most data scientists have today is to begin by developing in the research environment. Data scientists will take the data (images, text, etc.) from an object store, data lake or warehouse, prepare it, train the model and evaluate it. However, this interactive and iterative process quickly breaks down when moved into production. There, it becomes a convoluted mess. This is because the production environment requires the data scientist to make their model a reliable asset in the product, and like any other aspect of the product, the model will require performance monitoring, tests and evaluation, logging of every training/retraining process, and versioning in order to pass on the best version of the...
---
### What Are Feature Stores and Why Are They Critical for Scaling Data Science?
The field of MLOps has grown up around the reality that while the theoretical ability of machine learning to make accurate predictions and solve complex problems is incredibly sophisticated, actually operationalizing machine learning is still a major blocker for most companies. Most of the complexities arise from the data: work is typically done in silos, the path to production is resource-intensive, there’s a general lack of simple access to production-ready features at scale that are consistent with the features used in the data science research phase, and model and feature monitoring processes are disjointed or nonexistent once the AI service is live. ML teams need a way to continuously deploy AI applications in a way that creates real, ongoing business value for the organization. Features are the fuel driving AI for the organization, and feature stores are the architectural answer that can simplify processes, increase model accuracy and accelerate the path to production. A feature store provides a single pane of glass for sharing all available features across the organization. When data scientists start a new project, they can go to this catalog and easily find the features they are looking for. But a feature store is not only a data layer; it is also a data transformation service enabling users to manipulate raw data and store it as features ready to be used by any machine learning model. These features can then accelerate machine learning use cases through the reduction of duplicate work. Some of the largest...
---
### The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 4
Hybrid Cloud + On-Premises Model Serving + Model Monitoring Recap Last time in this blog series, we provided an overview of how to leverage the Iguazio Feature Store with Azure ML in part 1. We built out a training workflow that leveraged Iguazio and Azure, and trained several models via Azure's AutoML using the data from Iguazio's feature store in part 2. Finally, we downloaded the best models back to Iguazio and logged them using the experiment tracking hooks in part 3. In this final blog, we will:

- Discuss the benefits of a hybrid cloud architecture
- Define model load and predict behavior
- Create a model ensemble using our top three trained models
- Enable real-time enrichment via the feature store during inferencing
- Deploy our model ensemble in a Jupyter notebook and on a Kubernetes cluster
- Enable model monitoring and drift detection
- View model/feature drift in specialized dashboards

Hybrid Cloud Benefits and Motivation Hybrid clouds are all the rage. While cloud computing has given many organizations access to near-infinite compute power, the reality is that on-premise infrastructure is not going away. From data privacy concerns, to latency requirements, to simply owning hardware, there are many legitimate reasons for having an on-premise footprint. However, with this increased flexibility comes increased complexity - the right tools are needed for the job. With the multitude of end-to-end platforms, SaaS services, and cloud offerings, it is increasingly difficult to find those correct tools - especially those that work across cloud and on-premise environments. In the last blog of this series, we will combine the...
---
### The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 3
Part 3: Model Training with Azure ML and Iguazio Recap In parts one and two, we introduced Iguazio's feature store and discussed the benefits of using one in the ML workflow. Additionally, we ingested and transformed the data that we will be using to train our model. In this blog, we will do the following:

- Upload data from Iguazio into Azure and register the dataset in Azure ML
- Train several models in Azure using AutoML
- Retrieve trained models from Azure back to Iguazio
- Log trained models with experiment tracking and metrics

MLRun Function - Overview Before running any code, we need to take a moment to discuss one of the core tenets of the Iguazio platform: the MLRun function. This abstraction allows for simple containerization and deployment of code. Users are able to execute workloads on Kubernetes by specifying code and configuration options via high-level Python syntax. An MLRun function has its own name/tag, code, Docker image, resources, and runtime. From there, the code can be executed locally in Jupyter or on the cluster using several runtime engines including Job, Spark, Dask, Horovod, and Nuclio real-time functions. Create Azure MLRun Function from Python File Now that we have the background on what an MLRun function is, we are going to create one with our Azure code. I have written a Python file called azure_automl.py that performs several tasks such as: Upload the...
---
### The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 2
Part 2: Data Ingestion + Transformation into Iguazio's Feature Store Recap Last time, we discussed why organizations might require the functionality of a feature store like Iguazio's. In this blog, we will actually get into the project and cover the following:

- Detailed overview of Iguazio feature store functionality
- How to ingest and transform datasets into the feature store
- How to retrieve features in batch and in real time

Overview of Iguazio's Feature Store While some feature stores are more focused in their functionality, Iguazio's feature store is designed to facilitate the entire end-to-end workflow. In the previous blog post, we discussed the Iguazio feature store at a high level. The functionalities include (but are not limited to) the following:

- Ingest and transform data sources in batch or real time
- Easily retrieve features in batch or real time
- Dual storage formats to facilitate batch and real-time workloads
- Complex real-time feature engineering (e.g., sliding window aggregations)
- Integration with model monitoring
- Integration with model serving

We will be exploring all of this functionality in this blog series. Iguazio Terminology: FeatureSet and FeatureVector The Iguazio feature store introduces two new terms:

- FeatureSet: a group of features from one data source (file, data frame, table, etc.)
- FeatureVector: a group of features from one or more FeatureSets (i.e., a few columns from here, a few columns from there)

These act as the building blocks of the Iguazio feature store and will be used heavily throughout this blog series. They fit into the overall picture as...
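A conceptual, stdlib-only sketch of how these two building blocks compose: a FeatureSet holds columns from one source, and a FeatureVector joins selected columns across sets for a given entity. This is an illustration of the idea only, not the Iguazio/MLRun feature store API; all names, keys and data shapes are assumptions.

```python
class FeatureSet:
    """A group of features (columns) coming from one data source."""
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows  # {entity_key: {feature_name: value}}

class FeatureVector:
    """A selection of features drawn from one or more FeatureSets."""
    def __init__(self, selections):
        self.selections = selections  # [(feature_set, [feature names])]

    def get(self, key):
        # Join the requested columns across sets for one entity key
        out = {}
        for fset, names in self.selections:
            for name in names:
                out[name] = fset.rows[key][name]
        return out

# Two sources, one entity key ("u1"): users from a table, orders from a stream
users = FeatureSet("users", {"u1": {"age": 31, "country": "US"}})
orders = FeatureSet("orders", {"u1": {"order_count": 7, "total_spend": 120.5}})

# "A few columns from here, a few columns from there"
vec = FeatureVector([(users, ["age"]), (orders, ["order_count"])])
features = vec.get("u1")
```

The same FeatureVector definition can then serve both batch training (retrieve for all keys) and real-time inference (retrieve for one key), which is the dual access pattern the series goes on to demonstrate.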
---
### The Complete Guide to Using the Iguazio Feature Store with Azure ML - Part 1
Feature Store Motivation

In this series of blog posts, we will showcase an end-to-end hybrid cloud ML workflow using the Iguazio MLOps Platform & Feature Store combined with Azure ML. This blog will be more of an overview of the solution and the types of problems it solves, while the next parts will be a technical deep dive into each step of the process:

- Part 1: Feature Store Motivation
- Part 2: Data Ingestion + Transformation into Iguazio's Feature Store
- Part 3: Model Training via Azure ML leveraging Iguazio
- Part 4: Hybrid Cloud + On-Premise Model Serving + Model Monitoring with Iguazio

The Gaps: Challenges When Operationalizing Data Science

Regardless of the environment, one of the main challenges when operationalizing data science is fostering collaboration between teams and eliminating tech silos. This is both a technological and an organizational challenge, requiring the right processes in place and the right tools to support these processes. In a typical data science project, there are usually three different teams involved at different points in the pipeline:

- Data Engineer: ingests and transforms raw data from various sources
- Data Scientist: utilizes the transformed data to train models
- MLOps Engineer: containerizes and deploys models at scale with monitoring, drift detection, and re-training capabilities

The Data Science Pipeline - Image by Iguazio

While the pipeline itself looks straightforward, there are more than a few places where things can go wrong - mostly at the handoff points between teams. What happens when a data scientist needs additional or different features? What happens when the...
---
### Looking into 2022: Predictions for a New Year in MLOps
In an era where the passage of time seems to have changed somehow, it definitely feels strange to already be reflecting on another year gone by. It’s a cliche for a reason–the world definitely feels like it’s moving faster than ever, and in some completely unexpected directions. Sometimes it feels like we’re living in a time lapse when I consider the pace of technological progress I’ve witnessed in just a year. The cool thing about being in the ML industry for so long is that I have a front row seat to a fascinating market characterized by rapid innovation. So before we toast to a new (and better!) year ahead, here are my predictions of what awaits the ML industry in 2022:

From AutoML to AutoMLOps

2022 will be the start of a focus shift from the practice of model creation to a holistic view towards productizing AI. The next steps will be a set of best practices and repeatable MLOps processes that will help small teams roll out complete AI services on an ongoing basis. We’re not talking about putting a notebook into production, but about building services with ML baked-in, that continuously deliver bottom-line business value. Until recently, much of the focus was on automating the ML training process, that is, AutoML. But the most significant problem in ML is not about how to find the best algorithm and parameters, but rather about deploying those algorithms as part of an application with business impact. This...
---
### Adopting a Production-First Approach to Enterprise AI
After a year packed with one machine learning and data science event after another, it’s clear that there are a few different definitions of the term ‘MLOps’ floating around. One convention uses MLOps to mean the cycle of training an AI model: preparing the data, evaluating, and training the model. This iterative or interactive model often includes AutoML capabilities, and what happens outside the scope of the trained model is not included in this definition. My preferred definition of MLOps refers to the entire data science process, from ingestion of the data to the actual live application that runs in a business environment and makes an impact at the business level. This isn’t just a semantic issue: this discrepancy in usage has major consequences for the ways enterprises bring ML to production—or don’t.

The Training-First Mindset in MLOps

As a vestige of an era when ML was confined mostly to academia, or to analytics services, most data science solutions and platforms today still start with a research workflow and fail to deliver when it comes time to turn the generated models into real-world AI applications. So much so, that even the term ‘CI/CD pipeline’ is sometimes used to refer to the training loop, and not extended to include the entire operational pipeline. This mindset forces the ML team to re-engineer the entire flow to fit the production environment. At this late stage, building the actual AI application, deploying it, and maintaining it in production become acutely painful, and sometimes nonviable. Modern applications in which AI models provide real-time recommendations, prevent fraud, predict failures and guide self-driving cars require significant engineering efforts and a new...
---
### Introduction to TF Serving
Machine learning (ML) model serving refers to the series of steps that allow you to create a service out of a trained model that a system can then ping to receive a relevant prediction output for an end user. These steps typically involve required pre-processing of the input, a prediction request to the model, and relevant post-processing of the model output to apply business logic. Out of the vast spectrum of tools and techniques that enable model serving, we will focus on TensorFlow Serving in this blog post.

Figure 1: The high-level steps of ML model serving (Source: Kubernetes Blog)

Still, serving machine learning models is not a solo act. For a full-scale production system, other external processes perform a vital support role to model serving: services that fetch and ensure consistency of features between serving and training time, services that monitor and log requests, and services that automatically deploy the latest model(s), to name a few. If you are solely interested in technical guidance on TF Serving and TensorFlow production deployment, you can jump to the What TensorFlow Serving Can Do for You section for a hands-on guide tailored to different levels of experience. However, if you are also curious about how model serving fits into the intertwined scene of MLOps and how to move toward MLOps maturity, you’ll be interested in what’s coming next.

The Need for MLOps and the Central Role of Model Serving

The history of machine learning has been marked by a few major...
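As a small hands-on taste of the prediction-request step described above, the helper below builds a request for TensorFlow Serving's REST API (`POST /v1/models/<name>:predict` with a JSON `"instances"` payload). The host, port, and model name are placeholder assumptions; actually sending the request (e.g. with `urllib.request`) requires a running TF Serving instance:

```python
import json

def build_predict_request(instances, model="my_model",
                          host="localhost", port=8501):
    """Return the (url, body) pair for a TF Serving REST predict call.

    `instances` is a list of model inputs; TF Serving responds with a
    JSON object containing a matching "predictions" list.
    """
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances})
    return url, body
```

Pre- and post-processing wrap around this call: the input is encoded into `instances` before sending, and business logic is applied to the returned `predictions`.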
---
### ODSC West Conference - Top 6 Sessions You Must Attend
ODSC West Reconnect starts in a few days (Nov. 16-18), which means now is the best time to put the finishing touches to your session attendance schedule. Whether you’re going in-person or joining virtually, ODSC West brings together many AI and data science experts who will present trainings, immersive workshops, keynotes, and more. So if you’re interested in machine learning, data engineering, MLOps, NLP, big data and anything data science - ODSC West Reconnect is the place to be. To help you choose which sessions to attend, we put together a list of our six top recommended sessions. And of course, we will also be there. More details below.

Here are our top 6 recommended sessions for ODSC West Reconnect 2021:

1. Unifying Development and Production Environments for Machine Learning Projects
Chip Huyen, Adjunct Lecturer, Stanford University
Nov. 16, 11:30 am - 1:00 pm PT
A two-part workshop covering the challenges of productionizing ML models and how to overcome them. The first part of the workshop reviews the topic in theory, while the second part is a hands-on tutorial showing how to use Metaflow to push code to production on AWS Batch. (Just make sure you’re comfortable with Python for the tutorial section.) This session includes a hands-on tutorial that answers one of the most important MLOps challenges: pushing to production.

2. How to Effectively Scale ML & AI in Any Organization
Ella Hilal, PhD, Director of Data Science, Shopify
Nov. 16, 12:20 - 12:50 pm PT
This session will cover...
---
### It Worked Fine in Jupyter. Now What?
You got through all the hurdles getting the data you need; you worked hard training that model, and you are confident it will work. You just need to run it with a more extensive data set, more memory and maybe GPUs. And then... well. Running your code at scale and in an environment other than yours can be a nightmare. You have probably experienced this or read about it in the ML community. How frustrating is that? All your hard work and nothing to show for it.

What are the Challenges?

Your coding style will impact how easy it is for others to take your code and run it elsewhere. For example, hardcoded parameters and file locations can make it difficult to move your code. Those are under your control: you can adopt best practices to make your code more portable. However, you still need to be aware of how your work will integrate with the release process. Is it flexible enough to be part of a CI/CD pipeline? In today’s enterprise application world, you are dealing with several infrastructure-as-software technologies:

- Docker
- Kubernetes
- Cloud virtual machines

You can certainly learn all these technologies, but wouldn’t you rather focus on your model training instead? How awesome would it be if you could fully integrate with the infrastructure while concentrating 99% of your time on actual data science, with the satisfaction of knowing that your AI/ML project can make it to production fast, at the speed of your business? This blog will explore how to use MLRun to quickly deploy applications and run them on Kubernetes without changing code or learning a new technology.

Getting Set Up

MLRun can be configured to run on Docker or Kubernetes. Docker Desktop gives you the most flexibility and the fastest path to running MLRun; for this exercise, however, we will be working with MLRun on Kubernetes. You can follow the installation instructions here. At the end of the installation, you...
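The portability habit called out above, avoiding hardcoded parameters and file locations, can be illustrated in a few lines of plain Python. The variable names (`MYAPP_DATA_PATH`, `MYAPP_EPOCHS`) and their defaults are made up for the example; the point is that the same code then runs unchanged in Jupyter, Docker, or Kubernetes, with the environment supplying the values:

```python
import os

def load_config():
    """Read run parameters from the environment, falling back to
    defaults, instead of hardcoding paths and hyperparameters."""
    return {
        "data_path": os.getenv("MYAPP_DATA_PATH", "data/train.csv"),
        "epochs": int(os.getenv("MYAPP_EPOCHS", "10")),
    }
```

In a Kubernetes deployment these values would typically arrive via the pod spec's `env` entries or a ConfigMap, so the container image never has to change between environments.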
---
### How to Bring Breakthrough Performance and Productivity To AI/ML Projects
By Jean-Baptiste Thomas, Pure Storage & Yaron Haviv, Co-Founder & CTO of Iguazio

You trained and built models using interactive tools over data samples, and are now working on building an application around them to bring tangible value to the business. However, a year later, you find that you have spent an endless amount of time and resources, but your application is still not fully operational, or isn’t performing as well as it did in the lab. Don’t worry, you are not alone. According to industry analysts, that’s the case in more than 85% of AI (artificial intelligence) deployments. Moving from research (using small data sets) to robust online production environments is far from trivial. Production environments need to process large amounts of real-world data and meet application performance and SLA goals. The move from lab to production involves multiple practitioners across teams (data scientists, data engineers, software engineers, DevOps, SecOps, etc.) who need to work in synchronization to build production-grade AI applications. In order to effectively bring data science to production, enterprises need to change their mindset. Most organizations today initially focus on research and model development because that is often seen as the natural first step of the data science process. However, taking a production-first approach is a sign of operational maturity and strategic thinking. Enterprises that take a production-first approach can plan for the future and make sure to harness the tooling to support the data science process as they mature and later scale. To industrialize AI, an ML (machine learning) solution that delivers scalable, production-grade AI/ML applications is needed: one that streamlines the workflow and fosters collaboration across teams, with a focus on automation to improve productivity.
One option is to build an AI solution from discrete open-source tools, infrastructure components, and/or cloud services, but that requires a significant amount of IT and engineering resources. Another option is to choose a pre-integrated and managed solution which lets you focus...
---
### Building Machine Learning Pipelines with Real-Time Feature Engineering
Real-time feature engineering is valuable for a variety of use cases, from service personalization to trade optimization to operational efficiency. It can also be helpful for risk mitigation through fraud prediction, by enabling data scientists and ML engineers to harness real-time data, perform complex calculations in real time and make fast decisions based on fresh data, for example to predict credit card fraud before it occurs. A feature store can help make this process repeatable and reproducible, saving time and effort on building additional AI services. This blog post will explain how real-time feature engineering can be seamlessly performed with an online feature store, and how, with seamless integration to an MLOps platform, model monitoring, drift detection and automatic re-triggering of training can be performed, keeping your models at peak accuracy even in changing environments. This article is based on the talk Adi Hirschtein, VP Product at Iguazio, gave at the ML in Finance summit. You can view the entire presentation here.

ML Pipelines in Organizations

Companies invest a lot of time and resources putting together a machine learning (ML) team. However, 85% of their projects never make it to production. This is due to business as well as technical challenges.

Business Challenges in Building ML Pipelines

From the business side, silos between data scientists, ML engineers and DevOps make it challenging for everyone to collaborate on the machine learning pipeline in an efficient way. On top of that, organizations also need to incorporate aspects like security, network,...
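The kind of real-time calculation used for fraud features can be sketched in a few lines of dependency-free Python: a per-key sliding window that returns a fresh average on every incoming event. A production feature store computes such aggregations on ingest and serves them from an online store; this toy class only illustrates the mechanics:

```python
from collections import deque

class SlidingWindowAvg:
    """Average over the last `window` events per key, updated on each
    event, e.g. a card's mean transaction amount over its last N swipes."""

    def __init__(self, window=3):
        self.window = window
        self._buffers = {}

    def update(self, key, value):
        """Record one event and return the refreshed windowed average."""
        buf = self._buffers.setdefault(key, deque(maxlen=self.window))
        buf.append(value)  # the oldest value falls out automatically
        return sum(buf) / len(buf)
```

A fraud model would consume features like this at serving time, e.g. comparing a new transaction amount against the card's recent windowed average to flag anomalies.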
---
### Implementing Automation and an MLOps Framework for Enterprise-scale ML
With the explosion of the machine learning tooling space, the barrier to entry has never been lower for companies looking to invest in AI initiatives. But enterprise AI in production is still immature. How are companies getting to production and scaling up with machine learning in 2021? Implementing data science at scale used to be an endeavor reserved for the tech giants with their armies of developers and deep pockets. Today, building a machine learning application is feasible for even the leanest startup. Yet even with the massive growth of the ML tooling space, enterprise AI in production is still immature, lacking a convergence on a common set of best practices and tools. As a response to market conditions caused by the COVID-19 pandemic and its acceleration of all things digital, many companies are forecasting even more investment in AI initiatives in the coming year. To get beyond the experimentation phase, organizations need an automated and streamlined approach to ML operationalization (MLOps). This approach is not just about machine learning workflow automation to accelerate the deployment of ML models to production. It’s also important at an enterprise level to manage risk as ML gets scaled across the organization to more use cases in dynamic environments, and to ensure that the applications continually fulfill business goals. The role of MLOps in the broader organization shouldn’t be underestimated. Whereas building software is by now a mature and straightforward practice with decades of best practices and a large pool of veteran...
---
### Using Automated Model Management for CPG Trade Success
CPG executives invest billions of dollars in trade and consumer promotion investments every year, spending as much as 15-20% of their total annual revenues on these initiatives. However, studies show that 72% of these promotions don’t break even and 59% of them fail. Despite these troubling statistics, most CPG organizations continue to design and execute essentially the same promotions year after year with negligible hope of obtaining sustained ROI. Trade promotions are becoming increasingly complex and harder to manage as CPG organizations struggle to respond to evolving digital channels of engagement, increasing demands from retailers, and ever-changing consumer behaviors. Companies are discovering the obsolescence of manual processes such as basic rules-based technologies and spreadsheets for optimizing CPG campaigns. This is due to the proliferation of trade promotions globally, the huge number of SKUs, and constantly evolving channels and consumer preferences. Using these outdated methods to facilitate CPG promotions at scale is virtually impossible. The billion-dollar question is: how can CPG players successfully run a profitable trade promotion in the face of today’s complex and diverse CPG industry? Thankfully, emerging technologies such as AI and machine learning can help CPG companies execute promotions with greater precision to obtain impactful results and higher ROI. CPG organizations are recognizing the value of AI, ML, and other advanced technologies in enhancing product visibility, increasing brand awareness, and overcoming operational challenges. To watch the full joint webinar with Tredence about MLOps solutions for the CPG industry,...
---
### All That Hype: Iguazio Listed in 7 Gartner Hype Cycles for 2021
We are proud to announce that Iguazio has been named a sample vendor in seven 2021 Gartner Hype Cycles, including the Hype Cycles for Data Science and Machine Learning; Artificial Intelligence; Analytics and Business Intelligence; Infrastructure Strategies; Customer Experience Analytics; Financial Analytics; and Hybrid Infrastructure Services, alongside industry leaders such as Google, IBM and Microsoft (who are also close partners of ours). Iguazio is listed under the following categories: MLOps, Feature Store, AI Orchestration and Automation Platform, and Continuous Intelligence.

The Hype Cycle for Data Science and Machine Learning, 2021 analyzes how accelerated digitization is driving the urgency to productize experimental data science and machine learning initiatives, and assesses the evolution of existing and emerging trends to orchestrate and productize DSML.

The Hype Cycle for Artificial Intelligence, 2021 takes a look at the acceleration of AI as more enterprises embrace digital transformation of their core operations, and how leaders can successfully navigate AI-specific innovations that are in various phases of maturation, adoption and hype.

The Hype Cycle for Analytics and Business Intelligence, 2021 evaluates the maturity of innovations across the ABI space, including consumer-focused augmented analytics, composability of D&A ecosystems, and the governance and education required to execute a variety of analytics at scale.

The Hype Cycle for Infrastructure Strategies, 2021 covers innovations and enhancements in infrastructure consumption models, automation/intelligence and architecture. The report includes a look at net zero data centers, the disruptions and opportunities of containers and cloud delivery, and the maturity of software-defined...
---
### Announcing the Winners of the MLOps for Good Hackathon
We just wrapped up the first-ever 'MLOps for Good' hackathon, and we are so thrilled by the incredible response we’ve gotten from the ML community. 300 participants from all parts of the globe—from the USA, Canada, Germany, Singapore, New Zealand, Australia, India, Portugal, the Philippines, Malaysia, Morocco and Pakistan—joined our cause of bringing data science to production for social good. After 6 weeks of intense hacking, 30 innovative projects were submitted. It is gratifying to pause and celebrate the ingenuity these hackathoners have shown. Last night we held the virtual Finals and Awards Ceremony. At the ceremony, our global audience heard from our panel of expert judges, including:

- Cecile Blilious – Head of Impact and Sustainability, Pitango Venture Capital
- Greg Hayes – Data Science Director, Ecolab
- Orit Nissan-Messing – Co-Founder and Chief Architect, Iguazio
- Nick Brown – Senior Data Scientist, IHS Markit
- Anna Anisin – Founder, DataScience.Salon
- Yaron Haviv – Co-Founder and CTO, Iguazio
- Adi Hirschtein – VP Product, Iguazio

They also heard from our keynote speakers, Tomer Simon, Chief Scientist at Microsoft Israel R&D Center, and Boris Bialik, Global Head of Enterprise Modernisation at MongoDB. Simon leads the Microsoft AI for Good program, and shared how Microsoft is donating resources and technology to solve global challenges relating to the environment, accessibility, healthcare, cultural heritage, and humanitarian issues. The goal is to “amplify human ingenuity, and bring it to scale,” he said. He encouraged the hackathon participants to check out the grants available with the AI for...
---
### MLOps for Good Hackathon Roundup
2020 was a tough year. So six weeks ago we launched the first-ever “MLOps for Good” virtual hackathon. Its purpose? Fostering projects that positively impact real-world issues by bringing data science to production with MLRun. Together with MongoDB, Microsoft and 12 leading ML communities, we set out on a journey. To be honest, even we were surprised by the traction it got. More than 300 participants from around the world created over 30 projects. They tackled important issues ranging from healthcare to making the web a safer place. We’d like to thank all the participants, who took the time and energy to do good in the world. In addition to the prizes, we’ll be donating $10 to charity for each one of you. We’re now only a few days away from the virtual awards ceremony where the winners will be announced. Before we see you all there, we’d like to highlight eight of the incredible projects that have been submitted.

Project #1: Heart Disease Prediction

The first project in this list is a heart disease predictor app, built by Amit Sharma, an aspiring MLOps engineer. By using Flask, Ansible, Kubernetes, Docker, Crio and Jenkins, he created an application that enables medical personnel to submit information and get a prediction about the patient’s heart health. What a great idea to help with the challenging task of identifying life-threatening diseases! Check it out here.

Project #2: AI Wonder Girls: ICU Ops

This important project was submitted by an all-women's team: Aruna...
---
### Operationalizing Machine Learning for the Automotive Future
It’s no secret that global mobility ecosystems are changing rapidly. Like so many other industries, automakers are experiencing massive technology-driven shifts. The automobile itself drove radical societal changes in the 20th century, and current technological shifts are again quickly restructuring the way we think about transportation. The rapid progress in AI/ML has propelled the emergence of new mobility application scenarios that were unthinkable just a few years ago. These complex use cases require some rigorous MLOps planning. What is the ideal way to process data efficiently and cost-effectively in a connected car world? Recently, Iguazio CEO Asaf Somekh moderated a panel on this topic, with experts Ken Obuszewski, the General Manager for Automotive Vertical at NetApp; Norman Marks, the Global Senior Manager for the Automotive Industry at NVIDIA; and Roger Lanctot, Director of Global Automotive Practice at Strategy Analytics, to discuss some of the core challenges of AI for smart mobility use cases. What they discussed:

- New technologies are driving a fundamental shift in how cars are used on a product level
- Data locality is a critical factor for most connected-car use cases
- Many smart-mobility use cases pose some rather big, as-yet unanswered questions around data governance
- 5G may be a significant enabling technology for the low-latency requirements of connected cars

The Car of the Near Future: Sometimes Autonomous, Always Connected

The term “connected car” means that our cars will be networked with the outside world—to other cars, to the OEM or to transportation infrastructure, for example. That connectivity will...
---
### Building Unified Data Integration and ML Pipelines with Azure Synapse
Across organizations large and small, ML teams are still faced with data silos that slow down or halt innovation. Read on to learn about how enterprises are tackling these challenges by integrating any data type to create a single end-to-end pipeline and rapidly run AI/ML with Azure Synapse Analytics and Iguazio.

Data Challenges in ML Pipelines

Data integration is an important requirement for the entire ML lifecycle. It affects issues like:

- Gathering raw data from different sources, structured or unstructured
- Preparing it at scale
- Feeding the data into both training and production environments
- Gathering additional data from production to feed back into the inferencing layer
- Model monitoring
- Model explainability
- Governance and compliance

Yet, data integration is also one of the biggest challenges data engineers and scientists have today. It is a siloed, cumbersome process that is full of friction. Data handling today is divided into three different pipelines:

1. The Research Pipeline

In the research pipeline, all data sets are thrown into the data lake through ETL processes. Batch transformation is led by the data engineering team, and additional transformations are run to train models. Quite often, the data is not kept up-to-date.

2. The Serving Pipeline

The serving pipeline gets data from operational databases. This includes up-to-date information, streaming events, and more. Results are stored in an interactive database or key-value store, from which they serve the model.

3. The Governance Pipeline

The governance pipeline collects data from the production environment, and then runs anomaly detection, accuracy analyses, and more. Data is stored for...
---
### Top 10 Recommended MLOps World 2021 Sessions
MLOps World begins (soon!) on June 14, and it’s full of interesting sessions covering methods, principles and best practices for bringing ML models to production. The talks, demos and workshops will discuss feature engineering, feature stores, version management, CI/CD architecture, optimization of pipeline schedules, ML strategies, and more. The depth and breadth of topics might make choosing which sessions you want to attend a difficult task. To help, we’ve put together our pick of the top 10 MLOps World sessions. We chose them based on the various real-life use cases they cover, the interesting stories they tell and their comprehensive approach to ML pipelines. We highly recommend checking them out. Some might also require reserving your spot in advance, so make sure to check. We will also be there, more details below.

Here are our top 10 recommended sessions for MLOps World 2021 (in chronological order):

1. From Concept to Production: Template for the Entire ML Journey
June 14, 3:00 pm - 7:25 pm ET
Speakers: Chanchal Chatterjee, Timothy Ma and Elvin Zhu
A two-part, hands-on workshop teaching how to build and deploy ML components from concept to production-ready. The workshop is based on an open source Python template created by the presenters and will build components like data prep, model hyper train, model train, model deploy and online/batch prediction. Deployment will take place in a Kubeflow pipeline. We chose this session because it shows the entire ML journey, because it provides a practical hands-on explanation and because it...
---
### Top 9 Recommended ODSC Europe 2021 Sessions
ODSC Europe is next week, which makes today the perfect time to finalize the sessions you want to attend. While we’ll still be seeing each other virtually this year, attending ODSC Europe is nevertheless a great opportunity to meet with and learn from the top data science professionals in the industry. The sessions this year will cover a wide array of topics, from ML and deep learning, through data engineering and MLOps and all the way to AI, big data and analytics. So whether you’re just getting started with machine learning or you’re an AI ninja, ODSC Europe is the place to be from June 8 to 10. We’ve gathered the top nine sessions we are looking forward to the most. We chose them because they all provide practical guidance and new, creative ways of thinking. We believe they can help us use AI and data science better, both for business and social value. We will also be there, more details below.

Here are our top 9 recommended sessions for ODSC Europe 2021:

1. Explainable AI Explained
A talk for managers, developers and data scientists who are interested in learning how to interpret the decisions ML models make. This session will explain the difference between white and black box models, the taxonomy of explainable models and approaches to XAI. We chose this session because learning why models work the way they do can help us debug and train them. In addition, this session includes both basic methods and advanced ones,...
---
### Announcing Iguazio Version 3.0: Breaking the Silos for Faster Deployment
Overview

We’re delighted to announce the release of the Iguazio Data Science Platform version 3.0. Data Engineers and Data Scientists can now deploy their data pipelines and models to production faster than ever, with features that break down silos between Data Scientists, Data Engineers and ML Engineers and give you more deployment options. The development experience has been improved, offering better visibility of the artifacts and greater freedom to develop with your IDE of choice. Finally, this version also includes new options to help optimize costs. As part of Iguazio’s commitment to the open-source community, we have also updated MLRun – the open-source MLOps orchestration framework. See https://www.mlrun.org/ for more details.

Version Highlights

Collaborate across functions with shared project visibility

Working in silos is still one of the main pain points we hear from customers and the industry at large. The approach of ‘throwing it over the wall’ gets a lot of deserved flak in software development, but ML teams are still lacking effective collaboration systems. In this 3.0 update, we’ve added features to help Data Scientists and Data Engineers exchange information and collaborate in a much more streamlined way. Each project now has a dashboard view, showing the status of the project, including the feature sets, models, jobs, and functions that are part of this project. The whole team can review and work together on different projects, starting from data integration, through model development, all the way to production deployment and governance. With...
---
### Iguazio Named A Fast Moving Leader by GigaOm in the ‘Radar for MLOps’ Report
At Iguazio, we’ve spoken and written
---
### Join us at NVIDIA GTC 2021
Bring your data science to production with Iguazio at this year's NVIDIA GTC event. Join Iguazio CTO Yaron Haviv and CEO Asaf Somekh at GTC for four sessions on our favorite topics, MLOps and AI! Registration is free this year, and sessions will be available in the GTC portal on-demand from Monday, April 12 through Friday, April 19.

Accelerating Data Science to Production with MLOps Best Practices
Speaker: Yaron Haviv, Co-Founder and CTO, Iguazio
April 13, 2021 | 1pm CET | 7am EST
Haviv will share how Iguazio customers, including Fortune 500 companies, are operationalizing machine learning in the enterprise. He will explain how to map a business problem into an automated ML production pipeline and identify the right tools for the job, how to leverage feature stores and automated model monitoring and drift detection capabilities, how to build real-time ML pipelines that work at scale, and ultimately how to run AI models in production, leveraging MLOps best practices to accelerate business value with AI.

Automating Machine Learning Pipelines and MLOps to Accelerate AI Deployment
Speaker: Yaron Haviv, Co-Founder and CTO, Iguazio
April 12-16, 2021 | On-demand
Haviv will demonstrate, through real customer use cases, how MLOps can make your enterprise production-ready and help your organization succeed with AI initiatives by accelerating AI deployment, all while keeping costs low.

Implementing AI with NVIDIA Systems at the Metro Edge (Presented by Equinix)
A panel with Matt Hull, Vice President, Global DGX And AI Data Center Solutions Sales at NVIDIA, Asaf Somekh, Co-Founder and CEO of...
---
### Simplify Your AI/ML Journey with Higher-Level Abstraction, Automation
You’ve already figured out that your data science team cannot keep developing models on their laptops or a managed automated machine learning (AutoML) service and keep their models there. You want to put artificial intelligence (AI) and machine learning (ML) into action and solve real business problems. And you’ve discovered that it’s not that easy to move from (data) science experiments to always-on services that integrate with your existing business applications, data, APIs, and infrastructure. Keeping to your SLA, addressing security and governance, and maintaining model accuracy are even harder. You already bet on Kubernetes as your cloud-native infrastructure, but now you have microservices sprawl. How do you manage persistent storage, backup, disaster recovery, and rapid cloning in that environment? How can you lower and control your costs? And can you abstract away all that complexity so you can focus on higher-level services and build modern applications instead of chasing Pods and YAMLs? Accelerating deployment with an integrated MLOps stack and application-aware data management According to industry reports, data science teams don’t get to focus on data science work. They spend most of their time on data wrangling, data preparation, managing software packages and frameworks, configuring infrastructure, and integrating various components. Many organizations underestimate the amount of effort it takes to incorporate machine learning (ML) into production applications. This problem leads to abandoning entire projects when they’re halfway done --
---
### Iguazio Honored in 2021 Gartner Magic Quadrant for Data Science
We’re proud to share that Iguazio has received an honorable mention in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms, 2021. This is the second year in a row that Iguazio has received this recognition. The 2021 report assesses 20 vendors of platforms enabling data scientists and engineers to develop, deploy and manage AI/ML in the enterprise, across a wide array of criteria relating to their capabilities, performance and completeness of vision. It’s an exciting time to be a part of the data science landscape, with new innovations, tools and technologies empowering data science teams to work better, faster and more efficiently than ever before. We are grateful to our customers for being on this journey with us. We remain committed to helping enterprises large and small bring their data science to life. Access the full report here (open to Gartner members). What is the Gartner Magic Quadrant? Gartner provides high-quality and unbiased research on the IT market, where complexity and rapid change are defining characteristics. With its global reach and meticulous research and analysis, Gartner determines both current industry standards and market trends that will soon become a reality. The Magic Quadrant is a series of market research publications produced by Gartner, Inc. Each report maps a market within the IT industry, rating technology vendors against defined criteria and providing a visualization of Gartner’s research into that market, comparing competitors. Companies are rated on their Completeness of Vision and...
---
### Concept Drift Deep Dive: How to Build a Drift-Aware ML System
There is nothing permanent except change. - Heraclitus In a world of turbulent, unpredictable change, we humans are always learning to cope with the unexpected. Hopefully, your machine learning business applications do this every moment, by adapting to fresh data. In a previous post, we discussed
---
### Accelerating ML Deployment in Hybrid Environments
We’re seeing an increase in demand for hybrid AI deployments. This trend can be attributed to a number of factors. First of all, many enterprises look to hybrid solutions to address data locality, in accordance with a rise in regulation and data privacy considerations. Secondly, there is a growing number of smart edge devices powering innovative new services across industries. As these devices generate volumes of complex data, which often needs to be processed and analyzed in real time, IT leaders must consider how—and where—to process that data. For many use cases, migrating data sets from on-premises or edge environments to the cloud can be inefficient, impractical, or prohibitively expensive. Deploying and managing ML on the edge also presents its own challenges. That’s why a hybrid approach may be right for many enterprises. Edge ML is an attractive option for data teams looking to analyze the data where it is generated or reduce reliance on cloud networks, but there are drawbacks. There can be multiple compute levels between the point where data is generated and where it will eventually reside. And in many situations, privacy, regulatory, security or contractual issues require that data be stored locally at the edge, while production data is sent to the application in the cloud. To handle these complexities, enabling the deployment of ML on both edge and cloud is often preferred. But this type of setup in turn introduces a new set of challenges. Operationalizing machine learning is a complex process,...
---
### Handling Large Datasets in Data Preparation & ML Training Using MLOps
Operationalizing ML remains the biggest challenge in bringing AI into business environments. Data science has become an important capability for enterprises looking to solve complex, real-world problems and generate operational models that deliver business value across all domains. More and more businesses are investing in ML capabilities, putting together data science teams to develop innovative, predictive models that provide the enterprise with a competitive edge — be it providing better customer service or optimizing logistics and maintenance of systems or machinery. While the development of AutoML tools and cloud platforms, along with the availability of inexpensive on-demand compute resources, is making it easier for businesses to become ML-driven and lowering the barrier to AI adoption, effectively developing production-ready ML requires data scientists to build operational ML pipelines. Along with this challenge, enterprises are finding it challenging to work with really large datasets, which are essential to creating accurate models in complex environments. To operationalize any piece of code in data science and analytics and get it to production, data scientists need to go through four major phases:
- Exploratory data analysis
- Feature engineering
- Training, testing and evaluating
- Versioning and monitoring

While each of the above phases can be complex and time consuming, the major challenge with productionizing ML models doesn’t end here. The real challenge lies in building, deploying and continuously operating a multi-step pipeline with the ability to automatically retrain, revalidate and redeploy models at scale. As such, data scientists need a way to effortlessly scale up their work and get...
---
### The Importance of Data Storytelling in Shaping a Data Science Product
Artificial intelligence and machine learning are relentlessly revolutionizing marketplaces and ushering in radical, disruptive changes that threaten incumbent companies with obsolescence. To maintain a competitive edge and gain entry into new business segments, many companies are racing to build and deploy AI applications. In the frenzy to be the first to deploy these AI solutions in the marketplace, it’s easy to overlook the ultimate objective of embarking on data science initiatives — not just extracting business-focused insights from data but also communicating these insights to the intended audience. Let’s face it... data is boring to the average audience. Communicating insights without visuals is tantamount to displaying data in a vacuum — it doesn’t help audiences understand the significance of what they’re looking at. While data scientists can analyze and extract intelligent information from mountains of historical and real-time data, they are sometimes unable to effectively relay these hidden insights to audiences. This is where data storytelling comes in. Data storytelling focuses on communicating insights to audiences through the use of appealing visuals and narratives. It holds the power to place the human perspective on increasingly complex, expanding and rapidly changing data sets. ML pipelines can also be problematic, since automating the entire flow from data preparation through training to deployment is a key concept of MLOps. MLOps ensures that the entire ML pipeline flows seamlessly by addressing and automating deployment, scalability, maintainability and the upgrade or...
---
### How to Build Real-Time Feature Engineering with a Feature Store
Simplifying feature engineering for
---
### Predictive Real-Time Operational ML Pipeline: Fighting First-Day Churn
Retaining customers is more important for survival than ever. For businesses that rely on very high user volume, like mobile apps, video streaming, social media, e-commerce and gaming, fighting churn is an existential challenge. Data scientists are leading the fight to convert and retain high LTV (lifetime value) users. Fighting First-Day Churn with Real-Time Data Consumers have round-the-clock access to infinite innovative products and services, and brands must work continually to keep users engaged. The challenge is to maximize the likelihood that people will stick around and use more of the product, and minimize the probability that they’ll quit. User acquisition is expensive: it’s five times cheaper to retain an existing user than to acquire a new one, and an existing customer is three times likelier to convert than a new one. Overall a 5% increase in customer retention can increase profits by 25% to 125%. This is where data science can be a critical ingredient for retention strategy. By processing data from multiple sources, a machine learning model can identify patterns and predict churn before it happens. Algorithms can identify patterns that can’t be detected with human cognition. Once a pattern has been detected, brands can take smart actions in real time—like offering a coupon, or directing them to more engaging content, or another reward—to ensure that users find continued success from day one. Most churn prediction algorithms are based on ensembles with a Survival algorithm that predicts when the “about to churn” moment is about to...
---
### Kubeflow: Simplified, Extended and Operationalized
The success and growth of companies can be determined by the technologies they rely on in their tech stack. To deploy AI-enabled applications to production, companies have discovered that they’ll need an army of developers, data engineers, DevOps practitioners and data scientists to manage Kubeflow — but do they really? Much of the complexity involved in delivering data-intensive products to production comes from the workflow between different organizational and technology silos. Integrating components manually requires a significant amount of resources, maintains the organizational silos and creates large technical debt. Our approach to the challenge of getting AI-enabled applications to production has been to embrace Kubeflow, adding it to our managed services catalog and bridging the functionality gaps that exist. Nuclio and MLRun — our open-source frameworks — extend Kubeflow’s functionality by enabling small teams to build, in a matter of minutes, complex real-time data processing and model serving pipelines that draw on training results and data from the real-time feature store. We believe that organizations should deploy a data science solution that breaks down silos between roles and abstracts away much of the complexity, while enabling high-performing, scalable and secure models that can be deployed in any cloud or on-prem. For our approach to extending Kubeflow’s functionality into a true end-to-end ML and MLOps solution, read the full article on Towards Data Science. Start Your Journey: Book a Live Demo.
---
### Elevating Data Science Practices for the Media, Entertainment & Advertising Industries
As more and more companies are embedding AI projects into their systems, attracted by the promise of efficiencies and competitive advantages, data science teams are feeling the growing pains of a relatively immature practice without widespread established and repeatable norms. The media and advertising industry has experienced an explosion in new technologies to serve and win customers, much of it driven by AI technologies. From the way consumers select content, to the way content is served and created, to how it’s measured, competitive brands must incorporate AI into their business strategy or risk being left behind. Content platforms are largely governed by AI now, and while it’s important to have a human in the loop, AI decision-making has become essential for ad and content management. Staying Ahead of the Curve The data science community has grown to support several conferences where business leaders and technical practitioners can meet to share experiences and discover new solutions in a rapidly changing ecosystem. One such event, the Data Science Salon (DSS) is a unique vertical-focused conference, with three separate salons for Media, Advertising and Entertainment; Retail and eCommerce; and Finance and Technology. This year, DSS went fully online, with “DSS: Applying AI & Machine Learning to Media, Advertising & Entertainment”, held September 22-25. Practitioners from Netflix, The New York Times, Salesforce, and others shared how their organizations have been leveraging AI projects to stay competitive in an industry that has seen dramatic change — both enormous growth of new brands...
---
### Building ML Pipelines Over Federated Data & Compute Environments
A
---
### How to Run Spark Over Kubernetes to Power Your Data Science Lifecycle
A step-by-step tutorial on working with Spark in a Kubernetes environment to modernize your data science ecosystem. Spark is known for its powerful engine, which enables distributed data processing. It provides unmatched functionality to handle petabytes of data across multiple servers, and its capabilities and performance unseated other technologies in the Hadoop world. Although Spark provides great power, it also comes with a high maintenance cost. In recent years, innovations to simplify the Spark infrastructure have emerged, supporting these large data processing tasks. Kubernetes, in its own right, offers a framework to manage infrastructure and applications, making it ideal for simplifying the management of Spark clusters. It provides a practical approach to isolating workloads, limiting resource use, deploying on demand and scaling as needed. With Kubernetes and the Spark Kubernetes operator, the infrastructure required to run Spark jobs becomes part of your application. Adoption of Spark on Kubernetes improves the data science lifecycle and the interaction with other technologies relevant to today's data science endeavors. However, managing and securing Spark clusters is not easy, and managing and securing Kubernetes clusters is even harder. So why work with Kubernetes? Well, unless you’ve been living in a cave for the last 5 years, you’ve heard about Kubernetes making inroads in managing applications. Your investment in understanding Kubernetes will help you leverage the functionality mentioned above for Spark as well as for various enterprise applications. Before You Start It’s important to understand how Kubernetes works, and even before that,...
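To make "the infrastructure becomes part of your application" concrete, here is a minimal sketch of a SparkApplication manifest for the Spark Kubernetes operator. The job name, namespace, image, application file and resource sizes are illustrative placeholders, not values from this tutorial:

```yaml
# Hypothetical SparkApplication for the spark-on-k8s operator:
# the Spark cluster's driver and executors are declared as part of the app.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: example-pi          # placeholder job name
  namespace: spark-jobs     # placeholder namespace
spec:
  type: Python
  mode: cluster
  image: "apache/spark:3.5.0"   # placeholder image
  mainApplicationFile: "local:///opt/spark/examples/src/main/python/pi.py"
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: "1g"
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: "1g"
```

Applying a manifest like this lets the operator create and tear down the driver and executor pods on demand, which is exactly the isolation and on-demand scaling described above.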
---
### MLOps for Python: Real-Time Feature Analysis
Data scientists today have to choose from a massive toolbox where every item has its pros and cons. We love the simplicity of Python tools like pandas and Scikit-learn, the operation-readiness of Kubernetes, and the scalability of Spark and Hadoop, so we just use all of them. What happens? Data scientists explore data using pandas, then data engineers recode the same logic in Spark so it can scale or work with live streams and operational databases. We keep walking the same path again and again whenever we need to swap datasets or change the logic. You have to manage the entire data and CI/CD pipelines manually, as well as taking care of the business logic and the clusters you build on Kubernetes, Hadoop, or possibly both. It takes a DevOps army to manage all these siloed solutions. We end up like hamsters on treadmills, working hard without getting anywhere. In addition, when you process your features using different tools or languages, you end up with different implementations and different values, leading to skewed models and lower accuracy. Well, here’s the good news — there’s an alternative, and it’s in the form of MLOps for Python code. When you operationalize machine learning, you can deploy your Python code into production easily without rewriting it, gaining the same accuracy, super-fast performance, scalability and operational readiness (logging, monitoring, security, etc.). This saves you significant time and resources. You can write your code in Python and run it fast without...
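To illustrate why reimplementing the same logic twice causes skew, here is a minimal plain-Python sketch: one feature definition shared by a batch path and a streaming path, so both produce identical values by construction. The names and the rolling-mean feature are illustrative assumptions, not platform APIs:

```python
from collections import deque

def rolling_mean(values):
    """Single feature definition shared by batch and streaming paths."""
    return sum(values) / len(values)

def batch_features(history, window=3):
    # Batch path: slide the same window over historical data.
    return [rolling_mean(history[max(0, i - window + 1):i + 1])
            for i in range(len(history))]

class StreamFeaturizer:
    # Streaming path: keep a bounded window, reuse the same function.
    def __init__(self, window=3):
        self.window = deque(maxlen=window)

    def update(self, value):
        self.window.append(value)
        return rolling_mean(self.window)

history = [10, 20, 30, 40]
batch = batch_features(history)
stream = StreamFeaturizer()
online = [stream.update(v) for v in history]
assert batch == online  # identical logic, identical feature values
```

When the feature logic lives in two codebases (pandas for exploration, Spark for production), nothing enforces that final assertion; that is the skew the post warns about.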
---
### What's All the Hype About? Iguazio Listed in Five 2020 Gartner Hype Cycles
We are delighted to announce that Iguazio has been named a sample vendor in the 2020 Gartner Hype Cycle for Data Science and Machine Learning, as well as four additional Gartner Hype Cycles for Infrastructure Strategies, Compute Infrastructure, Hybrid Infrastructure Services, and Analytics and Business Intelligence, among industry leaders such as DataRobot, Amazon Web Services, Google Cloud Platform, IBM and Microsoft Azure (some of whom are also close partners of ours). The 2020 Gartner Hype Cycle for Data Science and Machine Learning takes a look at how organizations are industrializing their DSML initiatives through increased automation and improved access to ML artifacts, and by accelerating the journey from proof of concept to production, including everything MLOps entails. The 2020 Gartner Hype Cycle for Infrastructure Strategies focuses on infrastructure architecture, automation/intelligence, AI/ML, IoT and hyperconverged innovations. The 2020 Hype Cycle for Compute Infrastructure covers AI, cloud and security with a focus on urgently supporting the new imperatives around remote work and cost reduction due to COVID-19. The 2020 Hype Cycle for Hybrid Infrastructure Services assesses the maturity of emerging and evolving services and solutions, enabling organizations to achieve business advantage through planned adoption. The 2020 Gartner Hype Cycle for Analytics and Business Intelligence evaluates the maturity of innovations across the analytics and BI space, including edge analytics, with a focus on enabling organizations to make appropriate use of data and analytics. What does Iguazio bring to the machine learning and compute infrastructure spaces? By marrying a strong data engineering infrastructure with the...
---
### Predicting 1st Day Churn in Real Time
Continuously predict user retention in the crucial first seconds and minutes after a new user onboards. Survival analysis is one of the most developed fields of statistical modeling, with many real-world applications. In the realm of mobile apps and games, retention is one of the initial focuses of the publisher once the app or game has been launched. And it remains a significant focus throughout most of the lifecycle of any endeavor. The inverse of retention is the churn rate. Stated simply, how many new users remain on board after a given period? Usually, this is measured in cohort data analysis across days, weeks or months. For some segments of the mobile gaming industry, the average for 1st day churn hovers at a mind-boggling 70%. This is not a new benchmark; there are endless SaaS vendors, blogs and other resources and solutions that address this problem. However, they seem to focus on dealing with 1st day churn only after it has occurred, where the price to re-engage and reactivate a churned user is already increasing. This is in addition to the even greater costs of acquiring new users daily to compensate for the persistent loss. Some companies go the extra step of combating churn in real time. They will often conduct a delicate exercise using programmatic rules set by product managers and analysts to segment users into groups based on similar attributes. They will then engage with the users according to their behavior patterns. Machine learning models...
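As a toy illustration of the cohort arithmetic behind these figures (the cohort numbers below are made up, not benchmarks from the post):

```python
def churn_rate(cohort_size, retained):
    """1st-day churn is the inverse of retention: the share of a
    new-user cohort that does not come back after day one."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 1 - retained / cohort_size

# Hypothetical cohort: 1,000 installs on day 0, 300 users return on day 1.
rate = churn_rate(1000, 300)
print(f"1st-day churn: {rate:.0%}")  # prints "1st-day churn: 70%"
```

A churn-prediction model tries to score each user's risk of joining that 70% in real time, so the brand can intervene before the user is gone rather than paying to re-engage afterwards.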
---
### Breaking the Silos Between Data Scientists, Engineers & DevOps with New MLOps Practices
Effectively bringing machine learning to production is one of the biggest challenges that data science teams struggle with today. As organizations embark on machine learning initiatives to derive value from their data and become more “AI-driven,” the work doesn’t end with training a model: it extends through deployment, monitoring and retraining of the model. All this becomes really complex once data scientists need to transition the entire work done in their Jupyter notebooks to a production environment and distribute it at scale. And every time there are changes to the data or the data preparation logic, the customer requires new features to be added, or the model needs to be retrained (due to drift), the entire cycle has to be repeated again. Furthermore, not only do teams need to manage the software code artifacts, but they also have to handle the machine learning models, data sets, parameters, and the hyperparameters used by said models. What’s more, these artifacts need to be managed, versioned, and promoted through various stages before being deployed to production, making it harder to achieve repeatability, reliability, quality control, auditability, and versioning through the lifecycle. Doing this in a traditional engineering environment is extremely difficult, if not impossible, leading many enterprises to abandon their data science initiatives mid-process. Also, data science and engineering teams are composed of experts with different skill sets, working processes, backgrounds, and varying degrees of exposure and preferences for open source tools. Essentially, each team ends up working in silos (using different tools, methodologies, and practices), leading to friction, stifled collaboration, and inadequate knowledge-sharing between teams, ultimately increasing the complexity...
---
### Git-based CI / CD for Machine Learning & MLOps
For decades, machine learning engineers have struggled to manage and automate ML pipelines in order to speed up model deployment in real business applications. Similar to how software developers leverage DevOps to increase efficiency and speed up release velocity, MLOps streamlines the ML development lifecycle by delivering automation, enabling collaboration across ML teams and improving the quality of ML models in production, all while addressing business requirements. Essentially, it’s a way to automate, manage, and speed up the very long process of bringing data science to production. Right now, data scientists are adopting a paradigm that centers on building “ML factories,” i.e., automated pipelines that take data, pre-process it, then train, generate, deploy, and monitor models. But as is the case with all models deployed in real-world scenarios, the code and data change, causing drift and compromising the accuracy of models. ML engineers often have to run most, if not all, of the pipeline again to generate new models and productionize them. And they have to do this each time the data or codebase changes. This is the major problem with operationalizing ML: it incurs significant overhead, because data scientists spend most of their time on data preparation and wrangling, configuring infrastructure, and managing software packages and frameworks. In DevOps, the twin development practices of Continuous Integration and Continuous Deployment (CI/CD) enable developers to continuously integrate new features and bug fixes, initiate code builds, run automated tests and deploy to production, thus automating the software development lifecycle and facilitating fast product iterations....
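As a sketch of what Git-based CI/CD looks like when applied to ML, here is a hypothetical GitHub Actions workflow that retrains and validates a model on every push, so changes to code or data preparation logic automatically trigger a fresh model build. The file names (`train.py`, `requirements.txt`, `tests/`) are placeholder assumptions, not artifacts from the post:

```yaml
# Hypothetical CI workflow: retrain and validate on every push to main.
name: ml-ci
on:
  push:
    branches: [main]
jobs:
  train-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python train.py            # placeholder training entry point
      - run: python -m pytest tests/    # validate the new model before deploying
```

The point of wiring training into CI is exactly the one made above: instead of an engineer manually rerunning the pipeline after every data or codebase change, the pipeline reruns itself.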
---
### Iguazio Releases v2.8 with Automated Pipeline Management, Monitoring
Overview We’re delighted to announce the release of the Iguazio Data Science Platform version 2.8. The new version takes another leap forward in solving the operational challenge of deploying machine and deep learning applications in real business environments. It provides a robust set of tools to streamline MLOps and a new set of features that address diverse MLOps challenges. This version takes another step in making the transition from AI in the lab to AI in production much easier, faster and more cost-efficient. Version 2.8 also includes the enterprise-grade integration of our open source project for E2E pipeline automation, MLRun, as a built-in service providing a full-fledged solution for experiment tracking and pipeline automation. Note that this integration is still in its early stages and is currently being released as a tech preview. Version Highlights: Next Level ML Pipeline & Lifecycle Automation The current trend in data science is to build “ML factories,” which, much like agile software development, consist of building automated pipelines that take data and pre-process it, and then train, generate, deploy and monitor the models. Iguazio has embedded Kubeflow Pipelines 1.0 within the platform as part of its managed services. Once you have a workflow, you can run it once, at scheduled intervals, or trigger it automatically. The pipelines, experiments and runs are managed, and their results are stored and versioned. Pipelines solve the major problem of reproducing and explaining ML models. They also enable you to visually compare between...
---
### 5 Incredible Data Science Solutions For Real-World Problems
Data science has come a long way, and it has changed organizations across industries profoundly. In fact, over the last few years, data science has been applied not for the sake of gathering and analyzing data but to solve some of the most pertinent business problems afflicting commercial enterprises. Streaming sites like Netflix and Spotify use their user data to personalize recommendations for end-users, while web and mobile apps like Facebook, OkCupid, and Twitter use specific algorithms to match their users to content they're likely to enjoy. In addition to that, some companies like Google and Uber use data to better understand human behavior and model better business outcomes. This has led to innovative solutions in various fields including personalized banking, fraud detection, ride-hailing optimization, and other applications that would not be possible without harnessing processed data. One example is the development of PCB data management – the administration of all data connected to how PCBs are assembled. As IoT use cases explode, new PCB designs need to be made; data science has addressed this by introducing real-time performance data and quality assurance, as well as predictive modeling to hasten prototyping. Professional Basketball Data science has also transformed this multibillion-dollar sport. In elite sports, the difference between success and failure comes down to the minute details. Pioneered by NBA teams, sports arenas have now installed video tracking systems to mine data from their games. In the case of the Houston Rockets, what they discovered changed the...
---
### Concept Drift and the Impact of COVID-19 on Data Science
Modern business applications leverage Machine Learning (ML) and Deep Learning (DL) models to analyze real-world and large-scale data, to predict or to react intelligently to events. Unlike data analysis for research purposes, models deployed in production are required to handle data at scale and often in real-time, and must provide accurate results and predictions for end-users. In production, these models must often be agile enough to continuously handle massive streams of real-time data. However, at times such
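A minimal sketch of what "drift-aware" can mean in practice, assuming a simple mean-shift check (production systems typically use proper statistical tests such as PSI or Kolmogorov-Smirnov); the threshold and data below are illustrative, not from the post:

```python
import statistics

def detect_drift(reference, live, threshold=3.0):
    """Flag drift when the live window's mean strays more than
    `threshold` reference standard deviations from the reference mean."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    shift = abs(statistics.mean(live) - ref_mean) / ref_std
    return shift > threshold

# Reference window: "normal" traffic; live window: a sudden regime change
# of the kind COVID-19 caused in many production models.
reference = [100, 102, 98, 101, 99, 100, 103, 97]
assert not detect_drift(reference, [101, 99, 100])
assert detect_drift(reference, [160, 170, 165])
```

Wiring a check like this into the serving pipeline turns drift from a silent accuracy loss into a signal that can trigger alerts or retraining.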
---
### AI, ML and ROI – Why your balance sheet cares about your technology choices
Much has been written on the growth of machine learning and its impact on almost every industry. As businesses continue to evolve and digitally transform, it has become imperative for businesses to include AI and ML in their strategic plans in order to remain competitive. In Competing in the Age of AI, Harvard professors Marco Iansiti and Karim R. Lakhani illustrate how this can be confounding for CEOs, especially in the face of AI-powered competition. Beyond cultural resistance to change, the accelerating pace of technological advancement has leaders facing the prospect of embracing the future as it’s being written. Statistics suggest that businesses haven’t yet reached critical mass with respect to ML having a significant effect on their balance sheets. This blog examines why that is, and what business leaders can do to operationalize machine learning applications faster and see positive business results sooner. Obstacles to Business Impact: Last year, Forbes reported that only 4% of executives had successfully adopted AI into their businesses, often due to a lack of depth in their understanding of the technology and the value it can unlock. Fortune reported that businesses using AI have yet to see meaningful impact. Why? For one, there is broad acknowledgment that more than 80% of AI and ML projects fail to move from research to production. This reality compounds the perception that ROI is elusive. This remains true even when the initial results from development are encouraging. Clearly the odds are stacked against those seeking to show ROI from...
---
### How GPUaaS On Kubeflow Can Boost Your Productivity
Tapping into more compute power is the next frontier of data science. Data scientists need it to complete increasingly complex machine learning (ML) and deep learning (DL) tasks without it taking forever. Otherwise, faced with a long wait for compute jobs to finish, data scientists give in to the temptation to test smaller datasets or run fewer iterations in order to produce results more quickly. NVIDIA GPUs are an excellent way to deliver the compute power data science teams demand, but they bring their own challenges. Unlike CPUs, you can't run multiple parallel workloads or containers on GPUs. The result is that GPUs stand idle when they complete their tasks, wasting your money and your work time. The solution lies in using orchestration, clustering, and a shared data layer to combine containers so that you can harness multiple GPUs to speed up tasks, and allocate tasks to them as desired. We use MLRun, an open-source ML orchestration framework, to define serverless ML functions that can run either locally or in dynamically provisioned containers. The whole system can run as one logical unit, sharing the same code and data through a low-latency shared data plane. MLRun builds on Kubernetes and Kubeflow, using the Kubernetes API and Kubeflow custom resource definitions (CRDs). Every task executed through MLRun is tracked with the MLRun service controller, while a versioned database stores all the inputs and outputs, logs, artifacts, etc. You can browse the database using a simple UI, SDK, or REST APIs, and link MLRun...
---
### MLOps Challenges, Solutions and Future Trends
Summary of my MLOps NYC talk: major AI/ML and data challenges and how they will be solved with emerging open source technologies. AI and ML practices are no longer the luxury of research institutes or technology giants; they are becoming an integral part of any modern business application. According to analysts, most organizations fail to successfully deliver AI-based applications and are stuck in the process of turning their models into production services. A new engineering practice called MLOps has emerged to address these challenges. As the name indicates, it combines AI/ML practices with DevOps practices, and its goal is to create continuous development and delivery (CI/CD) of data- and ML-intensive applications. Recently Iguazio, the provider of an end-to-end data science platform and developer of open-source MLOps technologies, together with the major cloud providers, leading enterprises and technology giants, held the MLOps NYC event. The agenda was to discuss different approaches and best practices, and to start creating collaboration and standardization for MLOps. My session summarized the current challenges of MLOps and the trends we will be seeing in the near future (see the 12-minute video). The MLOps Challenge Unlike research or postmortem data analysis, business applications need to be able to handle real-time and constantly changing data; they must be on 24/7, they must meet decent response times, support a large number of users, etc. What once was the goal — producing an ML model — today is just the first step in a very long process...
---
### Top Trends for Data Science in 2020
With 2019 coming to an end and 2020 just around the corner, we reflect on a year that was full of new innovations related to machine learning, deep learning and real-time analytics. Our customers, partners, and the industry in general are doing some incredible things with data science: from self-healing systems to personalized recommendations for millions of customers in real time. However, these innovations are just a glimpse into how AI will revolutionize our world in the coming year, and the question is whether the data science infrastructure will be able to keep up with these innovative AI applications. Because the truth is that even (or especially) today, companies still struggle to turn their advanced AI models into real business applications. Most companies lack the data science strategy and infrastructure needed to support their ambitions. 2020 will be about simplifying the path from data science to production, with an emphasis on bringing real, scalable business value. MLOps and Serverless will be the Foundation of Data Science Applications: As more businesses focus on bringing data science to production, it's no surprise that MLOps and serverless have become hot buzzwords and will continue to grow in 2020. Today, the overwhelming number of tools, data complexities and siloed development and engineering environments make bringing machine learning to production a serious challenge. MLOps (Machine Learning Operations) incorporates standardized methods to streamline machine learning to production and manage end-to-end pipelines. It brings CI/CD to machine learning, with...
---
### SUSE and Iguazio Offer Open Source Solution for Data Science Teams
The notions of collaborative innovation, openness and portability are driving enterprises to embrace open source technologies. Anyone can download and install Kubernetes, Jupyter, Spark, TensorFlow and PyTorch to run machine learning applications, but making these applications enterprise grade is a whole different story. Delivering enterprise grade applications involves scalability, high performance, tuning, monitoring, security and automation of infrastructure tasks. It can take months and typically requires a large team of developers, data scientists and data engineers. Today at KubeCon+CloudNativeCon in San Diego, SUSE demonstrated its new open, high-performance data science solution powered by Iguazio. Instead of working for months until reaching production, data science teams using SUSE can now develop and deploy enterprise grade machine learning applications in days, by leveraging a fully managed data science platform running over SUSE CaaS Platform. SUSE CaaS Platform is a certified Kubernetes software distribution. It provides an enterprise-class container management solution that enables IT and DevOps professionals to more easily deploy, manage and scale container-based applications and services. Together with Iguazio, it also simplifies and accelerates machine learning pipelines. Users can: Manage End-to-End Workflows: Users manage and automate the entire workflow while tracking experiments and models running in the serving layer. The platform is integrated with pre-installed open source tools like Jupyter, Spark, TensorFlow, PyTorch and Kubeflow, all managed from a friendly UI. Automate with Serverless: Nuclio serverless functions enable building, training, optimizing and deploying models in a production-ready environment, while automating DevOps tasks. Run in Real Time and at Scale: The platform provides...
---
### Iguazio + NVIDIA EGX: Unleash Data Intensive Processing at the Intelligent Edge
Nowadays everyone's moving core IT services to the cloud, and the incentive is clear. Why buy servers and software to manage your own CRM, ERP or Office systems when you can use cloud services like Salesforce or Office 365? Developers love the cloud. In just a few clicks they spin up a new VM or container and attach it to a scalable database as a service, object storage and API gateways. To think we used to wait for IT to buy, install and configure servers with that whole software stack. However, new classes of applications, the explosion of intelligent devices and privacy constraints all mandate data and computing at the edge. Maybe Peter Levine's end-of-the-cloud prophecy is overstated, but analysts agree that edge computing will become dominant. For example, retail stores embed cameras and sensors to track customer purchases, provide real-time recommendations and monitor inventory levels, but face challenges because forwarding massive volumes of video and sensor data to the cloud for processing is not practical. Here's another example: on a large ship or oil rig, large amounts of data are collected from sensors and cameras and used to detect critical failures. Short response times are needed, and internet connectivity cannot be relied on. The only practical solution is to have a "mini cloud" on board. Many organizations collecting large amounts of data at the edge or dealing with sensitive data want to run machine learning models. There must be a way to...
---
### MLOps NYC Panel: Recorded Sessions
It's a wrap! We had a full house at MLOps NYC, Iguazio's annual conference about managing and automating machine learning pipelines in order to bring data science into business applications. With an outstanding caliber of speakers and audience, the MLOps conference went beyond theory, shedding light on painful and successful machine learning experiences which involve running experiments at scale, versioning, delivery to production, reproducibility and data access. Watch sessions presented by leaders from Netflix, Uber, The New York Times, Iguazio and more: https://www.youtube.com/playlist?list=PLH8M0UOY0uy6d_n3vEQe6J_gRBUrISF9m
---
### Modernize IT Monitoring by Combining Time Series Databases, Machine Learning
Let’s explore the complexity and vulnerability of IT infrastructure and how to build a modern IT infrastructure monitoring solution by combining time series databases with machine learning. IT Infrastructure: Complex and Vulnerable. iCloud recently joined Google, Facebook and Amazon on the list of major companies that have experienced massive cloud outages. Check out ZDNet’s series of articles detailing the outages. The outage caused disruptions to the likes of YouTube, Snapchat and Gmail, among others. iCloud’s failure also affected all its third-party apps and Apple Pay, which resounded globally. We have quickly embraced the cloud as more resilient than on-premise infrastructure, so this news is sobering. It also shows the vulnerability of the IT infrastructures, both cloud-based and on-premise, that power much of our software-dependent world — a world that now includes entertainment and personal, as well as professional, connections. IT infrastructure encompasses all related components, including network, security, storage, operating systems, links to hubs, and computers. Each component has numerous subcomponents, such as memory, central processing units, etc. On top of that, cloud adoption and virtualization add more intricacy. Software-defined networks make fast and automatic infrastructure changes, making it harder to track which workload resides on which virtual machine and correlate them to the physical server. Measuring the impact of one machine’s performance at any given moment becomes a serious challenge! In this digital age, companies (and people!) depend on good infrastructure to power critical functions like communication and financing. Downtime is costly and damaging to a business. This...
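A minimal sketch of the pattern this post combines: score each new metric sample from a time series against a rolling statistical baseline and flag outliers. All names here are hypothetical illustrations; production systems would read samples from a time series database and use a trained model rather than a simple z-score rule.

```python
from collections import deque
from statistics import mean, stdev

def make_detector(window=20, threshold=3.0):
    """Flag samples more than `threshold` standard deviations from a rolling mean."""
    history = deque(maxlen=window)
    def score(value):
        anomalous = False
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalous = True
        history.append(value)  # the baseline keeps adapting to the metric
        return anomalous
    return score

detect = make_detector(window=10, threshold=3.0)
cpu_load = [50, 51, 49, 50, 52, 48, 50, 51, 49, 95]  # last sample spikes
flags = [detect(v) for v in cpu_load]  # only the spike is flagged
```

The sliding window is what a time series database provides at scale: cheap access to the recent history of each metric, so the detector can be stateless per sample.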
---
### Python Pandas at Extreme Performance
Today we all choose between the simplicity of Python tools (pandas, Scikit-learn), the scalability of Spark and Hadoop, and the operation readiness of Kubernetes. We end up using them all. We keep separate teams of Python-oriented data scientists, Java and Scala Spark masters, and an army of devops to manage those siloed solutions. Data scientists explore with pandas. Then other teams of data engineers re-code the same logic and make it work at scale, or make it work with live streams using Spark. We go through that iteration again and again when a data scientist needs to change the logic or use a different data set for his/her model. In addition to taking care of the business logic, we build clusters on Hadoop or Kubernetes or even both and manage them manually along with an entire CI/CD pipeline. The bottom line is that we’re all working hard, without enough business impact to show for it... What if you could write simple code in Python and run it faster than using Spark, without requiring any re-coding, and without devops overhead to address deployment, scaling, and monitoring? Continue reading on Towards Data Science.
---
### Why is it So Hard to Integrate Machine Learning into Real Business Applications?
You’ve played around with machine learning, learned about the mysteries of neural networks, almost won a Kaggle competition and now you feel ready to bring all this to real world impact. It’s time to build some real AI-based applications. But time and again you face setbacks and you’re not alone. It takes time and effort to move from a decent machine learning model to the next level of incorporating it into a live business application. Why? Continue reading on Towards Data Science.
---
### Automating Machine Learning Pipelines on Azure and Azure Stack
Ever wonder if it’s possible to train machine learning (ML) models with regulated data which can’t be sent to the cloud? Has your edge solution gathered so much data that it just doesn’t make sense to send it all to the cloud? Iguazio brings the cloud’s intelligence to the edge enabling you to perform analytics locally and send aggregated data to the cloud. Iguazio’s partnership with Microsoft creates new possibilities for Azure and Azure Stack customers to develop end-to-end ML-based applications which may reside in the cloud, at the edge or across a hybrid deployment. Iguazio augments the capabilities of Azure and Azure Stack by providing: A self-service framework for ML tools based on leading open source ML projects A high performance and unified data fabric Serverless automation with Nuclio A Complete Machine Learning Pipeline Running in the Cloud, at the Edge or as a Hybrid Deployment With Iguazio’s Nuclio Serverless Functions, users collect data from various sources and types. Nuclio provides fast and secure access to real-time and historical data at scale, including event-driven streaming, time series, NoSQL, SQL and files. Data scientists explore and access data using a Jupyter notebook and work with popular frameworks such as Spark, Presto and Pandas. Users store and access data with different formats, including NoSQL, time series, stream data and files, while leveraging different APIs to access and manipulate the data, all from a collaborative development environment running at the cloud and edge. Iguazio’s open Python environment with built-in ML libraries like...
---
### Horovod for Deep Learning on a GPU Cluster
Here’s the problem: we are always under pressure to reduce the time it takes to develop a new model, while datasets only grow in size. Running a training job on a single node is pretty easy, but nobody wants to wait hours and then run it again, only to realize that it wasn’t right to begin with. This is where Horovod comes in: an open source distributed training framework which supports TensorFlow, Keras, PyTorch and MXNet. Horovod makes distributed deep learning fast and easy via ring-allreduce, and requires only a few lines of modification to user code. It's an easy way to run training jobs on a distributed cluster with minimal code changes, as fast as possible. The main benefits of Horovod are (a) the minimal modification required to run code; and (b) the speed at which it enables jobs to run. We’ll use a Keras example in this post. Horovod is installed using pip, and it requires the prior installation of Open MPI and NVIDIA’s NCCL, two libraries which support inter-GPU communication. A data science platform such as Iguazio offers them already deployed. Woof vs. Meow: Here's an example that demonstrates how to use Horovod to take an existing training job and run it as a distributed job over several GPUs. We used an existing image classification demo for image recognition. The demo application builds and trains an ML model that recognizes and classifies images: data is collected by downloading images of dogs...
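The ring-allreduce at the heart of Horovod can be illustrated with a small pure-Python simulation. This is a concept sketch of the algorithm, not Horovod's actual MPI/NCCL implementation:

```python
def ring_allreduce(vectors):
    """Simulate ring-allreduce: every worker ends up with the elementwise sum.

    Each worker's vector is split into n chunks; a reduce-scatter phase
    accumulates sums around the ring, then an allgather phase circulates
    the finished chunks. Each of the 2*(n-1) steps moves only 1/n of the
    data per worker, which is what makes the algorithm bandwidth-optimal.
    """
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "vector length must be divisible by worker count"
    c = size // n
    # buf[w][j] is worker w's copy of chunk j
    buf = [[list(v[j * c:(j + 1) * c]) for j in range(n)] for v in vectors]

    # Phase 1: reduce-scatter. At step s, worker w sends chunk (w - s) % n
    # to its right neighbour, which adds it into its own copy.
    for s in range(n - 1):
        sends = [(w, (w - s) % n, list(buf[w][(w - s) % n])) for w in range(n)]
        for w, j, chunk in sends:
            dst = (w + 1) % n
            for i, x in enumerate(chunk):
                buf[dst][j][i] += x

    # Phase 2: allgather. The fully reduced chunk (w + 1) % n now lives on
    # worker w; circulate the finished chunks so every worker has them all.
    for s in range(n - 1):
        sends = [(w, (w + 1 - s) % n, list(buf[w][(w + 1 - s) % n])) for w in range(n)]
        for w, j, chunk in sends:
            buf[(w + 1) % n][j] = chunk

    return [[x for chunk in b for x in chunk] for b in buf]

workers = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
result = ring_allreduce(workers)  # every worker: [111, 222, 333, 444, 555, 666]
```

Horovod runs the same exchange over GPU gradients each training step, which is why only a few lines of user code (initialize, wrap the optimizer, broadcast initial state) need to change.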
---
### Data Science in the Post Hadoop Era
With all the turmoil and uncertainty surrounding the large Hadoop distributors in the past few weeks, many wonder what’s happening to the data framework we’ve all been working on for years. Hadoop was formed a decade ago, out of the need to make sense of piles of unstructured weblogs in an age of expensive and non-scalable databases, data warehouses and storage systems. Since then Hadoop has evolved and tried to take on new challenges, adding orchestration (YARN) and endless Apache projects. But times have changed, and businesses are discovering simpler solutions to facilitate their more sophisticated machine learning applications. These applications are real-time and use data in motion, requirements that Hadoop was never designed to handle. The Modern Data Science Toolkit: Today’s data scientists write code in Python using Jupyter notebooks or PyCharm and work with modern machine learning frameworks like TensorFlow, PyTorch and scikit-learn. All of these tools are now offered by open-source applications outside of the Hadoop ecosystem, running over Kubernetes. Kubernetes is Everywhere: The popularity of Kubernetes is exploding. IBM acquired Red Hat for its commercial Kubernetes version (OpenShift) and VMware purchased Heptio, a company founded by Kubernetes originators. This is a clear indication that more and more companies are betting on Kubernetes as their multi-cloud clustering and orchestration technology. While some still think it makes sense to manage big data as a technology silo on Hadoop, early adopters are realizing that they can run their big data stack (Spark, Presto, Kafka, etc.) on Kubernetes in a much simpler manner. Furthermore, they can run all of the...
---
### Paving the Data Science Dirt Road
Reimagine the Data Pipeline and Focus on Creating Intelligent Applications. Guest blog by Charles Araujo, Principal Analyst, Intellyx. Bump. Clank. Slosh. Moving around a city or town in the early 1800s was a bit of a slow, messy slog. But nobody knew any better. Then, in 1824, the city of Paris covered the Champs-Elysees with asphalt, creating the first paved road, and kicked off a new movement that would transform cities around the world. By the late 1800s, cities around the globe were in the throes of a massive effort to pave their roads — and the impact on commerce and the quality of life was phenomenal. But paving didn’t just transform the roads – it also transformed the nature of transportation itself, as a paved road opened the door to a wholesale reenvisioning of how cities worked. It may be shocking given the futuristic images it conjures, but when it comes to data science and the creation of intelligent applications, we are still riding on a dirt road. The Bane of the Data Scientist: Data & Data Complexity. I love the term data scientist. For me, it conjures up images of smart people, wearing white lab coats, with beakers full of ones and zeroes, gleefully experimenting and coming up with all new ways to apply data to solve business problems. If only it were so glamorous. If my conjuring were more realistic, I would see a frazzle-haired person running from room to room, collecting armfuls of data, none of which matched, dropping some along...
---
### Kubernetes, The Open and Scalable Approach to ML Pipelines
Still waiting for ML training to be over? Tired of running experiments manually? Not sure how to reproduce results? Wasting too much of your time on devops and data wrangling? It’s okay if you’re a hobbyist, but data science models are meant to be incorporated into real business applications. Businesses won’t invest in data science if they don’t see a positive ROI. This calls for the adoption of an “engineered” approach — otherwise it is no more than a glorified science project with data... Continue reading Yaron's post on Towards Data Science.
---
### Serverless: Can It Simplify Data Science Projects?
How much time do you think data scientists spend on developing and testing models? Only 10%, according to this post by Google’s Josh Cogan. Most of their time is spent on data collection, building infrastructure, devops and integration. When you finally build a model, how long and complex is the process of delivering it into production (assuming you managed to get that far)? And when you finally incorporate the models into some useful business applications, how do you reproduce or explain their results? Do you monitor their accuracy? And what about continuous application and model upgrades? One way to simplify data science development and accelerate time to production is to adopt a serverless architecture for data collection, exploration, model training and serving. This post will explain serverless and its limitations, and provide a hands-on example of using serverless to solve data science challenges. Serverless Overview: The term “serverless” was coined a few years ago by Amazon to describe its Lambda functions, a service where developers write some code and a specification and click “deploy.” Lambda automatically builds a containerized application, deploys it on a production cluster, and provides automated monitoring, logging, scaling and rolling upgrades. Other benefits of serverless are a lower-cost pay-per-use model and its native integration with platform resources and events. Overall, serverless addresses three main data science challenges: it reduces the overhead and complexity of developing, deploying and monitoring code. Serverless leads to a faster time to production and allows both data scientists and developers to...
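The programming model described above can be sketched as a Lambda-style function: the developer writes only the handler, and the platform supplies packaging, deployment, scaling and monitoring. The handler signature follows AWS Lambda's Python convention; the local harness below is purely illustrative, standing in for the platform's invoker.

```python
import json

def lambda_handler(event, context):
    """A minimal serverless function: receive an event, return a response.

    Everything around this function (build, deploy, scale, monitor) is the
    platform's job; the developer's unit of work is just this handler.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"greeting": f"hello {name}"}),
    }

# Local stand-in for the platform's invoker (illustration only).
response = lambda_handler({"name": "data scientist"}, context=None)
```

For data science, the same shape applies to ingestion and model-serving functions: the event carries a record or feature vector, and the platform fans the handler out across as many instances as the event stream demands.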
---
### Operationalizing Data Science
Imagine a system where one can easily develop a machine learning model, click on some magic button and run the code in production without any heavy lifting from data engineers... Why? Because the market is currently struggling with the entire process (watch the embedded talk in the original post). Shifting the Paradigm: We at Iguazio believe that the current paradigm is broken - there must be an easier way, based on the following principles: Data scientists work on datasets that are similar to the ones in production, with minimal deviation. This means that model behavior in training and in production is the same. Data is not moved around or duplicated just for the sake of building or training a model. The transition from training to inferencing is smooth: once a prediction pipeline (model and ETL) is created, it doesn’t require any further development effort in order to work in production. Updating models is automatic, without requiring human interference. Models are validated automatically as an ongoing process, and new models are automatically transferred to production. The environment supports languages and frameworks that are popular with data scientists, while at the same time enabling popular analytics frameworks for data exploration. Data scientists are able to collaborate and share notebooks in a secured environment, making sure users view data securely, based on their individual permissions. GPU resources are easily shared and used by data science teams without DevOps overhead. The Solution: Let’s take a look at what Iguazio does with each of the four steps in the data...
---
### Intelligent Cloud-to-Edge Solution with Google Cloud
Data gravity and privacy concerns require federated solutions across public clouds and multiple edge locations. For example, retail stores embed cameras and sensors to track customer purchases, monitor inventory and provide real-time recommendations, but face challenges as forwarding massive volumes of video and sensor data to the cloud for processing is not practical and adds significant latency. Edge solutions allow users to ingest, analyze and quickly act on large volumes of data. However, up until now they required significant development efforts and did not take full advantage of the vast resources running in the cloud. Iguazio’s Intelligent Cloud-to-Edge solution with Google Cloud addresses the challenges of various industries including leading retail software providers like Trax which required one solution federated across the cloud and thousands of retail stores. The Intelligent Cloud-to-Edge solution is the first solution which seamlessly extends the cloud experience to the edge: Develop and test software in the cloud and automatically deploy at the edge Manage, control and monitor multiple edge systems from the cloud Run real-time analytics and AI at the edge powered by machine learning in the cloud Automatically move data to/from the cloud and the edge Focus on building applications without managing infrastructure or middleware The solution leverages Kubernetes and microservices, making it possible to seamlessly migrate workloads and functions from the cloud to the edge and conduct live software upgrades. It includes unique managed services provided by Google and Iguazio as described below to deliver a comprehensive solution. The solution consists of multiple...
---
### Will Kubernetes Sink the Hadoop Ship?
The popularity of Kubernetes is exploding. IBM is acquiring Red Hat for its commercial Kubernetes version (OpenShift) and VMware just announced that it is purchasing Heptio, a company founded by Kubernetes originators. This is a clear indication that companies are increasingly betting on Kubernetes as their multi-cloud clustering and orchestration technology. At the same time, in far, far away IT departments, developers are struggling with a 10+ year-old clustering technology built specifically for big data, called Hadoop. Surprisingly enough, some of them still think it makes sense to manage big data as a technology silo, while early adopters are realizing that they can run their big data stack (Spark, Presto, Kafka, etc.) on Kubernetes in a much simpler manner. Furthermore, they can run all of the cool post-Hadoop AI and data science tools like Jupyter, TensorFlow, PyTorch or custom Docker containers on the same cluster. This trend is taking its toll, as Hadoop’s two leading rivals, Cloudera and Hortonworks, have recently decided to merge. The slow market growth just couldn’t justify the existence of two companies any longer. The History of Hadoop and the Kubernetes Transformation: Hadoop was formed a decade ago, out of the need to make sense of piles of unstructured weblogs in an age of expensive and non-scalable databases, data warehouses and storage systems. Hadoop’s value proposition was letting developers write all of their data to hundreds or even thousands of servers fitted with many cheap disks, orchestrated using a minimalistic distributed file system (HDFS), and clusters had...
---
### Wrapping Up Serverless NYC 2018
The serverless revolution isn’t coming soon, it’s already here! Our recent Serverless NYC 2018 show was packed, thanks to a wide range of great speakers. After a full day of serverless insights, our big takeaway is that serverless looks primed to become the fastest-growing new application architecture for both new and existing development, primarily because it lets developers focus on building and running auto-scaling applications without worrying about managing servers, as IT operations are all taken care of behind the scenes. The great lineup of vendor presenters included key serverless solution providers (such as Google, IBM, Microsoft and Iguazio) who directly create and deliver serverless solutions to the market. While this is an emerging and competitive market, our key speakers agreed that committed developers already have all they need to successfully build large-scale serverless applications today. To prove that point, several real-world enterprise serverless champions presented how they are deploying significant production applications built entirely out of serverless functions. In fact, more than one presenter explained how they’ve already moved their entire IT architecture into a serverless cloud and are now benefiting from vastly increased development agility, TCO savings and elastic scalability. Ben Kehoe of iRobot explained that once it went 100% serverless – no VMs, machine instances or containers – iRobot significantly lowered its IT costs and eliminated operational burdens, allowing it to bring new solutions to market faster. Kehoe now spends most of his time improving business value instead of fighting infrastructure fires. Watch Ben's presentation here,...
---
### Can Open-Source Serverless Be Simpler than Lambda?
While browsing the CNCF Serverless Slack channel recently, I noticed a message: someone needed help writing a function which processes S3 update events. He didn’t want to use AWS Lambda and was instead looking for an open source serverless solution over Kubernetes. I took on the challenge of writing, as a response, a function for nuclio, the open source high-performance serverless event and data processing platform. It was simpler than you would imagine. There is a constant debate around whether to use serverless (managed cloud) or FaaS (open source functions-as-a-service). Serverless platforms are simpler, fully orchestrated and cost less (you pay only per invocation). Why would anybody ever want to use open-source serverless/FaaS? Probably when reality hits hard... I attended the recent Serverlessconf in Paris, where serverless practitioners presented real-world use cases. One of them described a scenario in which the function needed more execution time than what AWS Lambda permits and was too slow due to lack of concurrency. She had to break the code into smaller tasks, use S3 to store intermediate state, use SQS for intermediate messages, and somehow make it all work. As you may have guessed, it took far more time and money than anticipated, and instead of a single function call she ended up with 65,000 calls! At a certain point the workflow accidentally took down the entire company service. Open-source serverless is not always as integrated as cloud provider services, but it gives you far more choices in setting your own parameters, choosing which data or API gateways you want...
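A function of the kind described above might look like the following sketch. The `handler(context, event)` signature is nuclio's Python convention, and the record layout follows AWS's S3 event notification format; the `_Event`/`_Context` classes are hypothetical stand-ins added here only so the handler can be exercised locally.

```python
import json

def handler(context, event):
    """nuclio-style handler that processes an S3 update notification.

    nuclio invokes handler(context, event); event.body carries the payload,
    and context.logger provides structured logging.
    """
    records = json.loads(event.body)["Records"]
    keys = [r["s3"]["object"]["key"] for r in records]
    context.logger.info(f"processing {len(keys)} updated object(s)")
    return {"processed": keys}

# Minimal stand-ins for nuclio's Event and Context, for local testing only.
class _Event:
    def __init__(self, body):
        self.body = body

class _Logger:
    def info(self, msg):
        pass  # nuclio would emit a structured log line here

class _Context:
    logger = _Logger()

payload = json.dumps({"Records": [{"s3": {"bucket": {"name": "models"},
                                          "object": {"key": "weights.h5"}}}]})
result = handler(_Context(), _Event(payload))
```

Because the handler is a plain Python function, the same code runs unchanged whether nuclio deploys it on Kubernetes or you invoke it in a notebook, which is much of the appeal over a cloud-locked FaaS.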
---
### Big Data Must Begin with a Clean Slate
More than a decade has passed since we coined the term “big data,” and a decade in the tech world is almost infinity. Is big data now obsolete? The short answer is that although big data in itself may still have its place for some apps, the focus has shifted, because elephants can’t fly. Technologies designed for log analysis using immutable column structures, or for unorganized textual and unstructured data, aren’t so useful when data keeps changing and responses are expected immediately. People add microbatch and streaming solutions, or real-time or NoSQL databases, to the unstructured and unindexed data lake, hoping it will solve the problem. Instead, they end up with a multiheaded beast built from discrete parts which cannot be easily tamed. They spend days tuning performance or resource and memory allocations and handling “occasional” hiccups, fantasizing about a better future. So let’s begin with a clean slate. Here’s what we want: Simple and continuous development, followed by automated testing and deployment into production systems, without compromising application security, scalability or availability. Analytics as part of a continuous workflow, with requests, events and data flowing in on one end and responses returning on the other, driving actions or presenting dashboards as quickly as possible. This is best served by a continuous analytics approach, combined with a cloud-native microservices-based architecture. Delivering Actionable Insights with Continuous Analytics: After completing the first step in data science — building a model for predicting behavior or classifying information — we deploy it...
---
### CNCF Webinar on Serverless and AI
Iguazio's Yaron Haviv and Microsoft's Tomer Rosenthal provide an overview of serverless architectures and the efforts to encourage collaboration and portability through CNCF working groups. Skip to minute 38:54 for a demo of nuclio on Azure.
---
### In 2018, Can Cloud, Big Data and AI Stand More Turmoil?
The amount of new technology in 2017 was overwhelming: the cloud was adopted faster than analysts projected and brought several new tools with it; AI was introduced into just about all areas of our lives; IoT and edge computing emerged; and a slew of cloud-native technologies came to fruition, such as Kubernetes, serverless, and cloud databases, to name a few. I covered some of these a year ago in my 2017 predictions, and it’s now time to analyze the trends and anticipate what will likely happen in the tech arena next year. While we love new tech, the average business owner, IT buyer and software developer glazes over at this massive innovation and doesn’t know how to start turning it into business value. We will see several trends emerge in 2018, and their key focus will be on making new technology easy and consumable. Integrated Platforms and Everything Becomes Serverless: Amazon and the other cloud providers are in a race to gain and maintain market share, so they keep raising the level of abstraction and cross-service integration to improve developer productivity and strengthen customer lock-in. We saw Amazon introduce new database-as-a-service offerings and fully integrated AI libraries and tools at last month’s AWS re:Invent. It also started making a distinction between different forms of serverless: AWS Lambda is now about serverless functions, while AWS Aurora and Athena are about “serverless databases,” broadening the definition of serverless to any service that hides the underlying servers. Presumably, many more cloud services will now be able...
---
### Tutorial: Faster AI Development with Serverless
https://hackernoon.com/tutorial-faster-ai-development-with-serverless-684f3701b004 The two most trending technologies are AI and serverless, and guess what? They even go well together. Before getting into some cool examples, let’s start with some AI basics: AI involves a learning phase in which we observe patterns in historical datasets, identify or learn patterns through training, and build machine-learned models. Once the model has been created, we use it for inferencing (serving) to predict some outcome or to classify some inputs or images. Traditional machine learning methods involve a long batch or iterative process, but we’re seeing a shift towards more continuous processes, such as reinforcement learning. The inferencing part is becoming more event driven; for example, a bot accepts a line of text from a chat and responds immediately; an ecommerce site accepts customer features and provides buying recommendations; a trading platform monitors market feeds and responds with a trade; or an image is classified in real time to open a smart door. AI has many categories. Different libraries and tools may be better at certain tasks or only support a specific coding language, so we need to learn how to develop and deploy each of those. Scaling the inferencing logic, making it highly available, and addressing continuous development, testing and operation make it even harder. This is where serverless comes to the rescue, providing the following benefits: accelerated development; simplified deployment and operations; integrated event triggers and auto scaling; support for multiple coding languages and simplified package dependencies. Serverless also comes with some performance and...
---
### Cloud Native Storage: A Primer
We recently debated at a technical forum what cloud-native storage is, which led me to believe that this topic deserves a deeper discussion and more clarity. First, though, I want to define what cloud-native applications are, as some may think that containerizing an application is enough to make it “cloud-native.” This is misleading and falls short of enabling the true benefits of cloud-native applications, which have to do with elastic services and agile development. The following three attributes are the main benefits, without which we’re all missing the point:
- Durability — services must sustain component failures
- Elasticity — services and resources grow or shrink to meet demand
- Continuity — versions are upgraded while the service is running

The cloud-native architecture originating in hyper-scale cloud applications revolves around microservices, i.e. small stateless and decoupled application fragments. Many similar microservice instances can be deployed (using Docker or Kubernetes) to address service elasticity and resiliency. Multiple tiers of microservices are part of a bigger and evolving application. The 12-Factor methodology specifies that microservice instances must not persist any configuration, logs, or data, enabling the cloud-native durability, elasticity and continuous-integration attributes. State and data are stored in decoupled scale-out log streams, message queues, object storage, key-value stores and databases. This methodology is quite different from the one used in traditional monolithic/scale-up enterprise apps and current IT infrastructure (sometimes called IT “pets”), where apps require lots of configuration, logs, data and state (stored per workload in “virtual disks”). Applications...
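The 12-Factor rule above (no persisted configuration, logs, or data inside the instance) can be sketched minimally in Python. The variable names (`DB_URL`, `LOG_LEVEL`) are illustrative, not from the original post: configuration comes from the environment, and logs are written to stdout as an event stream for an external collector to aggregate.

```python
import json
import os
import sys

def load_config():
    # 12-Factor style: configuration is read from the environment,
    # never from files baked into the container image.
    return {
        "db_url": os.environ.get("DB_URL", "postgres://localhost/dev"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }

def log(msg, cfg):
    # Logs go to stdout as a stream of events; the platform, not the
    # service, is responsible for routing and storing them.
    sys.stdout.write(json.dumps({"level": cfg["log_level"], "msg": msg}) + "\n")

cfg = load_config()
log("service started", cfg)
```

Because the instance itself holds no state, it can be killed, replaced, or scaled out at any time, which is exactly what enables the durability and elasticity attributes listed above.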
---
### NYC Meetup: How to Go Serverless to Enable Faster and Simpler Analytics
Watch this video of Yaron Haviv talking about using serverless in Big Data and AI. Yaron spoke at a Meetup which was part of the NYC Database Month series in November 2017:
---
### AWS re:Invent is about Data, Serverless, and AI
AWS re:Invent is all about how managed data services, serverless, and AI work together to enable new business applications. The focus is shifting from building the infrastructure of your choice in a playground that has an endless number of toys (services) to an opinionated, pre-packaged approach that enables customers to focus on business applications. Why? Businesses are aligning with an increasingly digital world. The ability to attract new customers and ensure retention depends on the constant delivery of new interactive services which leverage a variety of data assets and AI. Furthermore, profitability depends on automated and optimized operations, which, surprisingly enough, also require data and AI technologies. A great example is the Amazon store, where customers receive product recommendations and robots handle packing and shipping. This means that we need to run fast - faster than our competitors - and focus on applications as opposed to infrastructure hassles. AWS gets that, which is why it’s climbing up the stack with an integrated approach. Forward-thinking companies and startups get it as well and therefore challenge incumbent vendors, but most enterprises are still struggling with legacy approaches. If you think that buying some virtualization or HCI (hyper-converged) cluster, or even playing with Kubernetes, is “cloud native,” think again (see my post). You need to build and support the entire stack; use an endless number of commercial or open-source packages; integrate them; make sure they scale, work together and don’t break; integrate security; think about high availability and service upgrades; patching of OS and...
---
### The Future of Serverless Computing
Serverless computing allows developers to focus on building and running auto-scaling applications without worrying about managing servers, as server provisioning and maintenance are taken care of behind the scenes. Industry demand for instant results has therefore made serverless platforms the new buzz. However, serverless computing has challenges that limit its usability and applicability:
- Slow performance and lack of concurrency (single-threaded)
- Lock-in to platform-specific event and data sources
- Complexity of application state maintenance, code dependencies and service dependencies
- Difficulty developing, debugging, testing and deploying in a hybrid or multicloud environment

Latency of tens or hundreds of milliseconds is the norm today when developing serverless functions in the cloud, and we’re lucky if we manage to run more than a few thousand events/sec without taking on a second mortgage. This limits the usage to non-performance-sensitive front-end apps or glue logic. Serverless computing could address many more workloads if it were faster and more efficient. We need to feed functions from messaging or streaming sources as we break past the comfort zone of simple web apps. We must also store state in cloud-specific databases/storage and use cloud-specific logging and monitoring tools. These integrations are not trivial and mean that our function code is now tied to a specific cloud platform. What if we wanted to swap vendors? Or use a multicloud deployment? Or debug some code on our laptop? Or run regression tests on our mini cluster? Just forget about it. Yeah, But What about These Numerous Open...
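One common way to soften the lock-in problem described above is a thin adapter layer that normalizes provider-specific triggers into a single event shape, so the business logic never imports a cloud SDK. This is a hedged sketch of that pattern; the `Event` type and the payload field names are invented for illustration, not taken from any real platform.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Provider-neutral event the business logic is written against."""
    body: str
    source: str

def from_http(raw: dict) -> Event:
    # Adapter for an API-gateway-style HTTP payload (field names assumed).
    return Event(body=raw.get("body", ""), source="http")

def from_stream(record: dict) -> Event:
    # Adapter for a stream/queue record (field names assumed).
    return Event(body=record.get("data", ""), source="stream")

def business_logic(event: Event) -> str:
    # Portable core: no provider-specific types or SDK calls leak in here,
    # so the same function runs in the cloud, on a laptop, or in CI.
    return event.body.upper()
```

Swapping vendors then means writing one new adapter rather than rewriting every function, and the core logic can be unit-tested locally without any cloud emulator.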
---
### VMworld 2017: VMware Feeds Off OpenStack Decay
I used to be sure OpenStack would overtake VMware. Five years ago, when I worked on OpenStack, I felt freedom – users could manage the entire cluster as a cloud and say goodbye to $5,000 fees per license. However, in hindsight VMware hasn’t been cannibalized by OpenStack or by the public cloud. It is now increasing revenues, while OpenStack fades away. VMware adopted OpenStack APIs, kept the orchestration story and added in the battle-tested VMware stack. So how did VMware grow despite the massive transition to the public cloud, open-source adoption and new container technologies making hypervisors redundant? It’s actually quite simple: most IT organizations are not ready for a true digital transformation. They struggle to keep the lights on due to a myriad of legacy applications, Oracles and SAPs to manage. VMware, Azure Stack and Nutanix are the only integrated solutions for IT automation, as opposed to DIY and OpenStack solutions that require skilled DevOps and pro-services and can’t appropriately address business requirements. The VMware-Amazon partnership gives VMware viability and allows CIOs to put a checkmark near “cloud” without truly embracing the cloud’s service-centric model. In the long run many of those workloads will move to Amazon’s native services, databases and serverless functions. However, it still buys time for VMware... The Pivotal Container Service announcements make everyone happy: VMware will ride the cloud-native story, Pivotal embraces Kubernetes so it doesn’t hurt the rumored IPO, and Google gains access to more enterprise customers. Surprisingly enough, this major VMworld announcement doesn’t...
---
### iguazio Raises $33M to Accelerate Digital Transformation
Today we announced a $33M investment from top VCs - Verizon Ventures, Robert Bosch Venture Capital, CME Ventures and Dell Technologies Capital. It’s a major step in our ambitious goal of accelerating digital transformation by enabling modern cloud and analytical services close to the edge. iguazio simplifies the data stack and enables organizations to automate their operations, improve customer interactions and provide new services to stay competitive in the digital era. Raising financing from a leading service provider, IoT vendor, the financial sector and infrastructure players demonstrates the criticality of this transformation to enterprises. It also validates our vision and execution capabilities. Our approach is simple: companies want to consume well-engineered services and APIs so that they can focus on their actual business, instead of on the complexity of data pipelines. Public cloud is one option, but it doesn’t fit every need because data has gravity which is best handled with fast and agile services close to the edge. Yes, you can buy some legacy infrastructure or “converged” offerings disguised as a “private cloud”. Go ahead and try implementing 10-year-old Hadoop and BI technologies which force you to build and integrate the entire stack from fragments of technologies and layers of inefficiency. Continuous Analytics: Begin with the End in Mind iguazio’s new funding round follows successful deployments across major financial, ride-sharing, media, cyber security and IoT customers. Among other initiatives, we plan to utilize the investments to rapidly expand our field and data science teams to help more customers...
---
### IT Vendors Don't Stand a Chance Against the Cloud
Last week I sat in on an AWS event in Tel Aviv. I didn’t hear a single word about infrastructure or IT and nothing about VMs or storage, either. It was all about developers and how they can build production-grade solutions faster, at a fraction of the cost, while incorporating Amazon’s latest security, AI and analytics capabilities. Why mess with VMs, deploy a bunch of software packages, write tons of glue logic and abstraction layers, figure out end-to-end security, and debug and tune it all? The new model presented was based on taking an API gateway, plugging into a user directory-as-a-service, then databases-as-a-service (DBaaS), some “serverless” functions with business logic and voila! You’re done. Want to take it to the next level? Configure multi-factor authentication in a few clicks, add some AI services and analytics dashboards and even hook up IoT devices using the AWS SDK. Yes, they forgot to mention all those are proprietary APIs and services that lock you in, and that they're often quite expensive over time, but that's the price of agility. Now contrast that with the IT landscape, which is still too in love with hardware and system innovations that make no impact on the bottom line. Breaking news in IT land is hyper-converged infrastructure (HCI) and SSD storage sold to infrastructure teams, while those teams struggle to stay relevant in the face of cloud innovation. The business units, development, and DevOps in large enterprises have two options: either build fast with the cloud,...
---
### Using Containers As Mini-VMs is NOT Cloud-Native!
Cloud and SaaS companies invented the notion of micro-services and the “cloud-native” model to gain efficient scaling along with continuous development and operations. Legacy approaches don’t work for global services like Facebook, Google or eBay, which are always on. Containers and Docker were created as the ultimate packaging for such micro-services, and new orchestration platforms like Kubernetes, Docker Swarm, and DC/OS handle their deployment, scheduling and life-cycle. Serverless and FaaS are basically an evolution of this model with more automation. What Does Cloud-Native Mean? We want to deliver elastic applications which evolve or scale over time. The way to do this is by breaking apps into multiple tiers (micro-services), each with its own elastic scaling, while communicating between these micro-services with reliable messages. A micro-service cannot be stateful if we want to scale, handle failures or change versions on the fly. Unlike legacy apps, micro-services use immutable images and store configuration, logs, stats and data in elastic cloud data services (object, NoSQL/NewSQL, log/message streams). Cloud data services are usually built by clustering a set of commodity servers (with local disks). We use pre-integrated cloud provider data services or roll our own using open-source or commercial software. Developers and business owners immediately get the benefits of a cloud-native approach: it allows them to develop apps faster in an agile and continuous methodology, while elastic scaling meets demand fluctuations. Why Containers Are Not VMs Traditional infrastructure teams and vendors don’t think like cloud users and providers. They still see the world as...
---
### Continuous Analytics: Real-time Meets Cloud-Native
iguazio combines real-time analytics, concurrent processing and cloud-native agility. We started working on this completely new approach to data management after witnessing countless customers struggle with traditional big data and analytics solutions. As a result, we are seeing that iguazio’s platform leads to orders-of-magnitude faster time to insight, faster time to market, greater simplicity and robust security. This post provides information and details about one of the demos we are using this week at Strata + Hadoop World in San Jose. Continuous Analytics in a Nutshell Traditional big data solutions involve building complex data pipelines that have separate stages for collecting and preparing data, ingestion, analytics, etc. Pipelines are serialized: they involve many data transformations and copies, requiring different tools and data repositories for each stage. These solutions are complex to develop, require too much time to generate insights, are an operational nightmare and are vulnerable due to security gaps. Continuous analytics is better. Data is ingested, enriched, analyzed and served simultaneously to/from the same data repository. Various micro-services and processing frameworks access the data concurrently, each making its own real-time adjustments. They add insights or query in parallel, and the data doesn’t move. Results are always up to date. In continuous analytics, processing is 100% stateless and updates are atomic. This means we can elastically scale processing, add new applications and change versions on the fly in an agile cloud-native manner, allowing for rapid development and continuous integration. Since data doesn’t move, it is just enriched and enhanced. It’s also...
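The core pattern above (many stateless workers enriching one shared repository concurrently, with atomic updates keeping results consistent) can be illustrated with a toy in-process example. In the real platform the shared store is a database service accessed over the network, not a Python dict guarded by a lock; the sketch only demonstrates why update atomicity is what makes concurrent readers and writers safe.

```python
import threading

store = {"events": 0}   # stand-in for the shared data repository
lock = threading.Lock()

def worker(n):
    """A stateless 'micro-service': no local state survives between calls."""
    for _ in range(n):
        with lock:
            # Atomic update: no read-modify-write race between workers,
            # so a concurrent reader always sees a consistent count.
            store["events"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All 4 * 1000 increments are accounted for because each update was atomic.
```

Because each worker is stateless, adding a fifth worker (scaling out) or replacing one mid-run (a rolling upgrade) does not require moving data or draining a pipeline, which is the agility claim made above.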
---
### AWS S3 Outage Signals We MUST Decentralize Cloud
Earlier this week, a significant portion of the internet was down due to an Amazon Web Services (AWS) S3 outage. While AWS blames the problem on the removal of a couple of servers, the real question is why we have created such a dependency on services like AWS. Won’t it get worse with the coming IoT, 5G networking, DDoS wars and the constant movement of essential services to cloud infrastructures? Tweets from a gentleman whose home “didn’t work” because his home automation is connected to AWS may seem humorous at first. But the tweets raise legitimate questions about what will happen in the future when all our devices and driverless cars are connected. Here’s Amazon’s official statement on the outage: “The servers that were inadvertently removed supported two other S3 subsystems. One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region. This subsystem is necessary to serve all GET, LIST, PUT, and DELETE requests. The second subsystem, the placement subsystem, manages allocation of new storage and requires the index subsystem to be functioning properly to correctly operate.” What they are saying is that big chunks of the internet depend on just one or two local services to function. Clearly it’s a system design flaw that AWS can work to avoid in the future. However, this week’s outage happened without malicious intent. What happens if failures are initiated by terror organizations or rogue states? When DARPA...
---
### Serverless: Background, Challenges and Future
Serverless computing is the latest buzz, driven by the digital economy’s demand for instant results without the hassle. The serverless concept is a prepackaged flavor of modern cloud-native architecture, which decomposes applications into multiple stateless and elastic micro-service tiers. Micro-services shrink or grow to satisfy demand; they are restarted in case of a resource failure and change versions without breaking or taking down the entire app. The cloud-native approach is not entirely new and has roots in SOA and grid computing. It evolved to use containers for workload isolation. Cloud-native is used across hyper-scale applications and SaaS offerings (Google, Facebook, Netflix, eBay, etc.). We at iguazio have designed our platform as a high-performance cloud-native application, because it’s the only way to run continuous applications at scale. A fundamental requirement for achieving this state of nirvana is keeping micro-services stateless and immutable. This is tough, or at least it has been until our current age of “serverlessness.” Before using it for everything, beware of the limitations and inefficiencies of first-gen serverless solutions. Why is it Tough...? I Can Bring Up My Docker Image in a Snap! Right, I love Docker! Docker makes it very easy to package and run a micro-service while removing the challenge of installation and dependencies. But a commercial cloud-native app requires setting up quite a bit of infrastructure. If the micro-service is stateless, we need to store state somewhere, and state has many forms (configuration, data, logs, etc.). Apps...
---
### 2017 Predictions: Clouds, Thunder and Fog
The IT industry is still in flux and in 2016 we saw some tectonic shifts. Setting aside noise and hype, let’s analyze the trends and predict what will likely happen in 2017. Rapid Growth of Enterprise Cloud Adoption It doesn’t take a prophet to notice that the public cloud is growing and enterprise IT is shrinking. In yesterday’s world we had terminals and mainframes and locality mattered, but today, between a mobile workforce, mobile clients and globalization, there isn’t much incentive to have on-prem infrastructure. Most companies prefer using always-connected services on the internet’s backbone. This move of enterprise organizations to the cloud will accelerate in 2017, driven by two main factors. The greatest growth is in SaaS: small and large enterprises are moving to cloud-based email, collaboration, content management, ERP, CRM, HR, etc. Seventy percent of companies already use SaaS and IT infrastructure is taking a hit. This trend will likely accelerate, as there will be even less need for Exchange servers, Oracle, SAP, etc. and all their related hardware. Another key trend is the move to digital businesses. Companies must now align technology with their core business and maintain constant engagement with customers, otherwise they face disruption by new entrants and competitors. This requires agility and the adoption of new technologies like cloud-native, containers, serverless functions and real-time analytics. Business units turn to the cloud because the complexity of DIY hurts productivity and it’s just easier to avoid IT getting in the way. The Cloud of Musical Chairs...
---
### Did Amazon Just Kill Open Source?
After this week at re:Invent, it is clear that Amazon is unstoppable. AWS announced many more products, all fully integrated and simple to use, and if you thought infrastructure companies are its competition, think again. The new Amazon offering competes with established database vendors, the open-source big data eco-system, the container eco-system, security software and even developer and APM tools. Open source is a key ingredient, but Amazon seems to prove that usability and integration are more important to many customers than access to an endless variety of overlapping open-source projects. It is interesting to see how Amazon on one hand bashes the open-source eco-system and highlights the advantages of its own tools, while at the same time taking projects like Presto, which was developed in the open by Facebook, and turning them into packaged, revenue-generating products (the newly announced Athena service). This should be a wake-up call for the tech and software industry! Back in the day, we used to focus on creating modular architectures. We had standard wire protocols like NFS, RPC, etc. and standard API layers like BSD, POSIX, etc. Those were fun days. You could buy products from different vendors, they actually worked well together and were interchangeable. There were always open-source implementations of the standard, but people could also build commercial variations to extend functionality or durability. The most successful open-source project is Linux. We tend to forget it has very strict APIs and layers. New kernel implementations...
---
### iguazio Collaborates with Equinix to Offer Data-Centric Hybrid Cloud Solutions
Placing Governed Data and Analytics Closer to Their Sources, While Leveraging Amazon Web Services Compute Elasticity to Generate Business Insights at Extremely High Speeds AWS re:Invent, Las Vegas, NV, November 30th, 2016. iguazio today announced that it is collaborating with Equinix (Nasdaq: EQIX) to power the Enterprise Data Cloud, which was announced in September 2016 and is already operational in large-scale IoT deployments and leading financial institutions. iguazio’s platform will be generally available in Q2 2017. About iguazio iguazio was founded in 2014 with a fresh approach to the data management challenges faced by today’s enterprises. The iguazio Enterprise Data Cloud has fundamentally redesigned the entire data stack to bridge the enterprise skill gap and accelerate the performance of real-time and analytics processing in big data, the Internet of Things (IoT) and cloud-native applications. Backed by top VCs and strategic investors, the company is led by serial entrepreneurs and a diverse team of seasoned innovators in the USA and Israel. Visit http://www.iguaz.io/ or follow @iguazio to learn more about iguazio.
---
### VMware on AWS: A Scorecard for Winners and Losers
Last month Amazon and VMware announced a partnership which will enable VMware software to run in a dedicated space within the Amazon cloud. This unnatural act is mainly a response to Azure, which is getting stronger as the only cloud provider with a real hybrid cloud story. The same exact software stack, which was initially limited to a few hardware vendors like HP, Dell and Lenovo, can now run on AWS as well as on-prem. The public cloud is rapidly growing, running cloud-native workloads used by many of the new companies and startups. But major enterprises are still hesitant and use it for development and testing. Amazon, Microsoft and Google want to accelerate the enterprise's move to the cloud by offering a “hybrid” approach to ease this migration. The biggest issue is that public and on-prem clouds are quite different creatures. On-prem clouds focus on IT and virtual machines, while the public cloud is focused on services and APIs for developers and modern apps. Azure has a cohesive stack with the same services running on-prem and in the cloud, while this “AWMware” combination is basically made of two unrelated stacks. As I wrote before, the IT world is going through a radical change. Analysts have called it a “digital transformation,” “application modernization,” “the 3rd platform”... In essence it’s about building globally distributed cloud-native apps that are not tied to terminals and stand-alone VMs. It’s about stateless and elastic apps in micro-services that serve local or mobile users which constantly share and analyze data. The VMware/AWS move just...
---
### Streamlined IoT at Scale with iguazio
---
### Cloud Data Services Sprawl … it’s Complicated
Legacy data management didn’t offer the scalability one finds in Big Data or NoSQL, but life was simple. You’d buy storage from your vendor of choice, add a database on top and use it for all your workloads. In the new world, however, there are data services for every application workload. Targeted services may sound great, but multiple workloads mean complex data pipelines, multiple data copies across different repositories and complex data movement and ETL (Extract, Transform, Load) processes. With single-purpose data silos, the cost of storage and computation grows quickly. Companies like Amazon or Google have jumped in, selling targeted services – raking in lots of money, at higher margins and often with tricky pricing schemes. It’s all very complicated. But now it’s time for enterprises to demand unified data services that have better API variety and a combination of volume and velocity. There’s no need for so many duplicated copies, for such complex data pipelines or for ETLs. AWS as a Case Study Amazon Web Services (AWS) offers 10 or more data services. Each service is optimized for a specific access pattern and data “temperature” (see Figure 1 below). Each service has different (proprietary) APIs, and different pricing schemes based on capacity, number and type of requests, throughput, and more. In most applications, data may be accessed through several patterns. For example, it may be written as a stream but read as a file by Hadoop or as a table by Spark. Or, perhaps individual items are updated...
---
### The Next Gen Digital Transformation: Cloud-Native Data Platforms
The software world is rapidly transitioning to agile development using micro-services and cloud-native architectures. And there’s no turning back for companies that want to be competitive in the new digital transformation. Evolution in the application space has a significant impact on the way we build and manage infrastructure. Cloud-native applications in particular require shared cloud-native data services. The Old Apps and Storage To better understand this new approach, let’s take a quick look back. With legacy applications, servers had disk volumes that held the application data. As the markets matured and services changed, things shifted to clouds and infrastructure-as-a-service (IaaS) which meant virtualized servers (virtual machines, or VMs) were mapped to disk partitions (vDisks) in a 1:1 relationship. Storage vendors took pools of disks from one or more nodes, added redundancy, and provisioned them as virtual logical unit numbers (LUNs). Then came Hyper-Converged. This technology wave enhanced the process and pooled disks from multiple nodes. Real security wasn’t required; rather this solution relied on isolation to ensure only the relevant server talked to its designated LUN. The process is also known as zoning. The New Apps and Data As the evolution continues, the new phase is platform-as-a-service (PaaS). Rather than managing virtual infrastructures such as virtual machines, apps are now managed. Rather than managing virtual disks, data is now managed. The applications don’t store any data or state internally because they are elastic and distributed. The applications use a set of persistent and shared data services to store data...
---
### It’s Time for Reinventing Data Services
Over the last few decades, the IT industry has used and cultivated the same storage and data management stack. The problem is, everything around those stacks has changed from the ground up — including new storage media, distributed computing, NoSQL, and the cloud. Combined, those changes make today’s stack exceedingly inefficient — slowing application performance, creating huge operational overhead, and hogging resources. An additional impact of today’s stack is multiple data silos that are each optimized for a single application usage model, and the requirement for data duplication to handle cases where multiple access models are used. With the application stack now adopting a cloud-native and containerized approach, what we also need are highly efficient cloud-native data services. The Current Data Stack Figure 1 shows, in red, the same functionality being repeated in various layers of the stack. This needless repetition leads to inefficiencies. Breaking the stack into many standard layers is not the solution, however, as the lower layers have no insight into what the application was trying to accomplish, leading to potentially even worse performance. The APIs are usually serialized and force the application to call multiple functions to update a single data item, leading to high overhead and data consistency problems. A functional approach to updating data elements is being adopted in cloud applications and can eliminate a lot of this chatter. Figure 1: Current Data Stack In the current model the application state is stored with the application (known as stateful or persistent services). This is in contrast to...
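The "functional update" idea mentioned above can be sketched as follows. Instead of the chatty get/modify/put round trips that serialized APIs force on the application, the client sends an update expression that the data service applies atomically in place. The `KVStore` class below is a made-up in-memory stand-in for illustration, not a real product API.

```python
class KVStore:
    """Toy key-value store contrasting chatty vs functional updates."""

    def __init__(self):
        self._data = {}

    # Chatty path: two API calls per update, with a race window between
    # the get and the put if another client updates the same key.
    def get(self, key):
        return self._data.get(key, 0)

    def put(self, key, value):
        self._data[key] = value

    # Functional path: one call; the store applies the expression to the
    # current value in place, with no intermediate state exposed.
    def update(self, key, fn):
        self._data[key] = fn(self._data.get(key, 0))

kv = KVStore()
kv.put("counter", kv.get("counter") + 1)   # chatty: get + put, two round trips
kv.update("counter", lambda v: v + 1)      # functional: a single round trip
```

In a distributed setting the difference matters even more: the functional path halves the network round trips and removes the consistency gap between the read and the write.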
---
### DC/OS Enables Data Center “App Stores”
Containers are getting widely adopted as they simplify the application development and deployment process. The transition to continuous integration with microservices and cloud-native architectures will have a significant impact on business agility and competitiveness: it brings the notion of smartphone apps to the datacenter. Unfortunately, today there are still gaps, and cloud-native implementations require quite a bit of integration and DevOps. A few commercial and proprietary offerings try to tackle the integration challenge, but without a real community behind them they may not gain enough momentum. This is why Mesosphere's latest move to open source DC/OS can be a significant milestone in enabling the enterprise “App Store” experience. The Enterprise “App Store” Micro-Services Stack When building a micro-services architecture, we need several key ingredients, such as:
- Trusted and searchable image repository (the application marketplace)
- A way to monitor and manage the physical or virtual cluster (the devices)
- Scheduler and orchestrator to automate deployment and resource management of apps
- Cloud-native storage to host shared data and state

For a real enterprise deployment, you would need a bunch of additional components like service discovery, network isolation, identity management, and the list goes on. Integrating the above discrete components manually is resource-consuming and wasteful given everyone would need similar components, which is why we see the emergence of commercial “cloud-native stacks” and the formation of standards organizations like the CNCF (Cloud Native Computing Foundation). Commercial or “best-of-breed” stacks have limited impact since they do not attract collaboration from multiple vendors,...
---
### Re-Structure Ahead in Big Data & Spark
Big Data used to be about storing unstructured data in its raw form: “Forget about structures and schema, they will be defined when we read the data”. Big Data has evolved since: the need for real-time performance, data governance and higher efficiency is forcing back some structure and context. Traditional databases have well-defined schemas which describe the content and the strict relations between the data elements. This made things extremely complex and rigid. Big Data's initial application was analyzing unstructured machine log files, so a rigid schema was impractical. It then expanded to CSV or JSON files with data extracted (ETL) from different data sources. All were processed in an offline batch manner where latency wasn’t critical. Big Data is now at the forefront of the business and is used in real-time decision support systems, online customer engagement, and interactive data analysis with users expecting immediate results. Reducing time to insight and moving from batch to real-time is becoming the most critical requirement. Unfortunately, when data is stored as inflated, unstructured text, queries take forever and consume significant CPU, network, and storage resources. Big Data today needs to serve many use cases, users, and a large variety of content; data must be accessible and organized for it to be used efficiently. Unfortunately, traditional “data preparation” processes are slow and manual and don’t scale; data sets are partial and inaccurate, and dumped into the lake without context. As the focus on data security is growing,...
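The cost of schema-on-read versus pre-structured data can be shown with a toy comparison: the same query run once against raw text lines (parsed on every read) and once against data already organized into typed columns. The log format and field names are invented for the example; real systems would use a columnar format like Parquet rather than Python lists.

```python
raw_logs = [
    "user=alice latency=120",
    "user=bob latency=340",
    "user=alice latency=90",
]

# Schema-on-read: every query re-parses every line of inflated text.
def avg_latency_raw(lines):
    vals = [int(line.split("latency=")[1]) for line in lines]
    return sum(vals) / len(vals)

# Structured: parse once into typed columns, then query cheaply and repeatedly.
columns = {"user": [], "latency": []}
for line in raw_logs:
    fields = dict(kv.split("=") for kv in line.split())
    columns["user"].append(fields["user"])
    columns["latency"].append(int(fields["latency"]))

def avg_latency_columnar(cols):
    # No parsing on the query path: just arithmetic over a typed column.
    return sum(cols["latency"]) / len(cols["latency"])
```

Both functions return the same answer, but the structured version pays the parsing cost once at ingestion, which is exactly the trade-off driving structure back into Big Data systems.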
---
### Wanted! A Storage Stack at the speed of NVMe & 3D XPoint
Major changes are happening in storage media hardware: Intel announced storage media that is 100X faster, way faster than the current software stack. To make sure the performance benefits are evident, it also provides a new block storage API kit (SPDK) bypassing the traditional stack. So will the current stack become obsolete? Some background on the current storage APIs and stack: Linux has several storage layers: SCSI, block, and file. Those were invented when CPUs had a single core and disks were really slow. Things like IO scheduling, predictive pre-fetch, and page caches were added to minimize disk IO and seeks. As more CPU cores came, basic multi-threading was added, but in an inefficient way using locks. As the number of cores grew further and flash was introduced, this became a critical barrier. Benchmarks we carried out some time ago showed a limit of ~300K IOPs per logical SCSI unit (LUN) and high latency due to the IO serialization (see the Linux storage stack diagram). NVMe came to the rescue, borrowing the RDMA hardware APIs, and created multiple work queues per device which can be used concurrently by multiple CPUs. Linux added blk-mq, eliminating the locks in the block layer; with blk-mq we saw 10x more IOPs per LUN and lower CPU overhead. Unfortunately, locks and serialization are still there in many of the stack components like SCSI, file systems, and RAID/LVM drivers, and it will take years to eliminate them all. NVMe drivers are quite an improvement and solve the case for...
---
### Cloud-Native Will Shake Up Enterprise Storage!
Enterprise IT is on the verge of a revolution, adopting hyper-scale and cloud methodologies such as microservices, DevOps, and cloud-native. As you might expect, the immediate reaction is to try to apply the same infrastructure, practices, and vendor solutions to the new world, but many of them, SAN/vSAN and NAS among others, are becoming irrelevant. Read my previous blog post for background on cloud-native, or this nice post from an eBay expert. Overview: In the new paradigms, we develop software the way cloud vendors do: we assume everything can break; services need to be elastic; features are constantly added in an agile way; there is no notion of downtime. The way to achieve this nirvana is to use small, stateless, elastic, and versioned microservices deployed in lightweight VMs or Docker containers. When we need to scale, we add more microservice instances. When we need to upgrade, DevOps replaces the microservice version on the fly and declares its dependencies. If things break, the overall service is not interrupted. The data and state of the application services are stored in a set of “persistent” services (elaborated on later), and those have unique attributes such as atomicity, concurrency, and elasticity, specifically targeting the new model. If we contrast this new model with current Enterprise IT: today, application state is stored in virtual disks, which means we need complex and labor-intensive provisioning tools to build, snapshot, and back them up. Storage updates are not atomic, so we...
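The stateless pattern above can be sketched in a few lines: the handler keeps no instance-local state, so scaling out simply means running more handlers against the same persistent service. `SharedStore` here is a toy stand-in for such a service, not a real product:

```python
class SharedStore:
    """Stand-in for an external persistent service. A real deployment
    would use a managed key/value store or database with atomicity
    and concurrency guarantees."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def handle_request(store, user_id):
    """A stateless request handler: all state lives in the shared store,
    so any replica (or a freshly upgraded one) can serve the next request."""
    visits = store.get(user_id, 0) + 1
    store.set(user_id, visits)
    return visits
```

Because the handler owns no state, upgrading it is just swapping the function's version; the "virtual disk" problem of snapshotting and backing up per-instance state disappears into the persistent service.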
---
### Architecting BigData for Real Time Analytics
BigData is quite new, yet when we examine the common solutions and deployment practices, it seems like we are going backwards in time: manual processes, patches of glue logic, partial solutions, wasted resources, and more. Are we back in the '90s? Can we build it more efficiently to address real-world business challenges? Some background first: it all began more than a decade ago, when a few folks at Google wanted to rank internet pages and came up with a solution based on GFS (Google File System) and a MapReduce batch-processing concept. This concept was adopted by people at Yahoo, who formed the Hadoop open-source project. Hadoop's key components are the same: a distributed file system (HDFS) and MapReduce. Over time, more packages were added to the Hadoop and Apache family (some, promoted by competing vendors, have significant overlap). Due to the limitations of HDFS, a new pluggable file-system API called HCFS (Hadoop Compatible File System) was introduced. It allows running Hadoop over file or object storage solutions (e.g. Amazon S3, CephFS, etc.), and the performance limitations of MapReduce led to alternative solutions such as Apache Spark for processing and YARN for scheduling. Those changes are quite substantial, given that HDFS and MapReduce were Hadoop's foundations. The original assumptions and requirements for Hadoop environments were quite modest, not to say naïve: data was uploaded to the system for batch processing, so there was no need to support data modification; the entire data set was scanned, so no need...
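The HCFS idea, one path namespace with pluggable backends, boils down to dispatching on the URI scheme. A toy sketch of that dispatch (the helper is ours; the class names are shortened forms of Hadoop's real filesystem implementations):

```python
from urllib.parse import urlparse

# Illustrative mapping from URI scheme to the HCFS implementation Hadoop
# would load (full names live under org.apache.hadoop.fs / .hdfs):
HCFS_IMPLS = {
    "hdfs": "DistributedFileSystem",
    "s3a": "S3AFileSystem",
    "file": "LocalFileSystem",
}

def filesystem_for(path: str) -> str:
    """Pick a filesystem implementation from the path's URI scheme;
    scheme-less paths fall back to the local filesystem."""
    scheme = urlparse(path).scheme or "file"
    return HCFS_IMPLS.get(scheme, "<unknown scheme>")
```

The same Spark or MapReduce job can therefore read `hdfs://...` or `s3a://...` paths unchanged, which is exactly what lets Hadoop run over object stores instead of HDFS.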
---
## News
### Iguazio CTO: Successful AI depends on data AND trust
---
### Iguazio Named to the Constellation ShortList™ for MLOps Q1 2025
---
### An Architect’s Guide to the Top 10 Tools Needed to Build the Modern Data Lake
---
### McKinsey offering aims to bridge the gap from AI prototypes to production
---
### 2024 Top Performer MLOps Platform
---
### AiThority Interview with Asaf Somekh, Co-Founder & CEO of Iguazio (acquired by McKinsey)
---
### 5 Best End-to-End Open Source MLOps Tools
---
### The Architect’s Guide to the GenAI Tech Stack — 10 Tools
---
### McKinsey accelerates gen AI value creation with Iguazio
---
### Iguazio named to Constellation's ShortList™ MLOps – Feb 2024
---
### Iguazio named to the CB Insights LLMOps (Large Language Model Operations) Market Map
---
### McKinsey & Company receives the MongoDB 2024 Transformation Partner Award for its work with Iguazio
---
### The 21 Best Artificial Intelligence Platforms Of 2024
---
### Iguazio Listed in Constellation's ShortList™ MLOps
---
### The (un)real world of Generative AI
---
### Musk, AI, And 'Civilizational Destruction': Prophecy or Product Launch?
---
### McKinsey acquires Iguazio, a leader in AI and machine-learning technology
---
### Iguazio Named a Major Player in the IDC MLOps MarketScape 2022
---
### Iguazio Named a Leader and Outperformer In GigaOm Radar for MLOps 2022
---
### Iguazio Named in 8 Gartner Hype Cycles for 2022
---
### Sense Selects Iguazio for AI Chatbot Automation
---
### Iguazio Partners with Snowflake to Automate and Accelerate MLOps
---
### Gartner 2022 Market Guide for DSML Engineering Platforms
---
### Iguazio named in The Coolest Data Science And Machine Learning Tool Companies Of The 2022 Big Data 100
---
### Iguazio named in Forrester's Now Tech: AI/ML Platforms, Q1 2022
---
### All That Hype: Iguazio Listed in 5 Gartner Hype Cycles for 2021
---
### Iguazio Named a Fast-Moving Leader by Gigaom in the Radar for MLOps Report
---
### Iguazio Partners with Pure Storage to Operationalize AI for Enterprises
---
### Iguazio MLOps Platform Launches in AWS Marketplace
---
### MLOps: The Latest Shift in the AI Market in Israel
---
### Iguazio Announces ‘MLOps for Good’ Virtual Hackathon
---
### Boston Limited and Iguazio Partner to Operationalize AI for the Enterprise
---
### Iguazio Named A Fast Moving Leader by GigaOm in the ‘Radar for MLOps’ Report
---
### The Coolest Data Science And Machine Learning Tool Companies Of The 2021 Big Data 100
---
### Iguazio Named Leader and Fast Mover in GigaOm Radar for Evaluating Machine Learning Operations (MLOps)
---
### The Next-Level of Operationalizing Machine Learning: Real-time Data Streaming into Data Science Environments
---
### Iguazio Receives an Honorable Mention in the 2021 Gartner Magic Quadrant for Data Science and Machine Learning Platforms
---
### The AI Infrastructure Alliance Launches With 25 Members to Create the Canonical Stack for Artificial Intelligence Projects
---
### Git-based CI/CD for Machine Learning and MLOps
---
### Sheba, Iguazio to Develop Real-Time AI to Optimise Patient Care
---
### Iguazio Signs Strategic Agreement with Sheba Medical Center for Real-Time Covid-19 Treatment
---
### Iguazio Launches the First Integrated Feature Store Within its Data Science Platform
---
### The 12 Coolest Machine-Learning Startups Of 2020
---
### An AI Engineer Walks Into A Data Shop...
---
### Why You Need to Start Thinking About MLOps
---
### SFL Scientific And Iguazio Partner To Speed Up Custom AI Development For Fortune 1000 Companies
---
### SFL Scientific and Iguazio Partner to Speed Up Custom AI Development for Fortune 1000 companies
---
### NetApp Deploys Iguazio to Run AI-Driven Digital Advisor on Active IQ
---
### Iguazio Becomes Certified for NVIDIA DGX-Ready Software Program
---
### Iguazio & NetApp Partner to Accelerate Deployment of AI
---
### NetApp, Iguazio Build Joint Tech To Accelerate AI Deployments
---
### The Coolest Data Science And Machine Learning Tool Companies Of The 2020 Big Data 100
---
### Enabling end-to-end machine learning workflows with Iguazio
---
### Iguazio Receives an Honorable Mention in the Gartner MQ for Data Science and ML Platforms
---
### Iguazio raises $24 million for AI development and management
---
### Dell Technologies Introduces New Solutions with Iguazio
---
### PICSIX Partners with Iguazio
---
### The Rise of MLOps: What We Can All Learn from DevOps
---
### Bringing AI and Machine Learning to the Masses
---
### Takeaway from MLOps NYC: Open Source Frameworks Need TLC
---
### Top 10 IoT Startups Of 2019
---
### Hitting the Reset Button on Hadoop
---
### CEO Q&A: Modern Platforms for Data Science
---
### Iguazio Brings Its Data Science Platform to Azure and Azure Stack
---
### Samsung SDS Backs Data Company Iguazio
---
### Q&A with Iguazio: on Data Science, Data Analytics, and Serverless
---
### How Serverless Platforms Could Power an Event-Driven AI Pipeline
---
### Removing Data Blockage at the Edge
---
### The 10 Coolest New Open-Source Technologies And Tools Of 2018
---
### Google Cloud collaborating with Iguazio to enable real-time AI across the cloud and intelligent edge
---
### Don’t get cloudwashed: The case for cloud on-prem in hybrid computing
---
### Even in the cloud, banking is tied to legacy tech
---
### Other Vendors to Consider for Operational DBMSs
---
### Iguazio’s New Nuclio Release Enables Serverless Agility for Enterprises Deploying Real-time Intelligent Applications
---
### SD Times Blog: Getting a serverless reality check
---
### Iguazio revamps its Nuclio serverless computing platform
---
### SD Times news digest: Iguazio’s Nuclio release, Kotlin 1.3, and reCAPTCHA v3
---
### Iguazio Selected for CNBC's Upstart 100 List of Promising Startups
---
### Instead of sending data to the cloud, why not send the cloud to the edge?
---
### Is cloud native starting to kill Hadoop? This CTO says yes
---
### This startup thinks it knows how to speed up real-time analytics on tons of data
---
### Equinix and Iguazio partner to drive smart mobility
---
### Enabling smart transportation in today's interconnected world
---
### Iguazio’s Nuclio Serverless Software Aims to Outrun AWS
---
### The Car's Eyes and Ears at TC TLV 2018
---
### Enterprise startups in Israel worth getting to know
---
### Bigger than Linux: The rise of cloud native
---
### 2018 Big Data 100: 45 Coolest Data Management And Integration Vendors
---
### Serverless computing takes a big step into the multicloud world
---
### Serverless framework Nuclio released for enterprise customers
---
### This Israeli Startup Partners With Amazon -- But Could Compete With AWS In $8B Edge Cloud Market
---
### Can Open-Source Serverless Be Simpler than AWS Lambda?
---
### Asaf Somekh, Founder & CEO at iguazio, talks delivering dreams
---
### To reach its full promise, big data must begin with a clean slate
---
### Big Data Startup iguazio Debuts Its First Channel Program, Seeks VARs and SIs with Vertical Industry Expertise
---
### Asia Pacific expansion on the horizon for iguazio with opening of Singapore headquarters
---
### iguazio Boldly Taunts AWS’ Lambda with nuclio Serverless Platform
---
### Surge pricing: How it works and how to avoid it
---
### Serverless Framework for Real-Time Apps Emerges
---
### AI depends on having the right data for real-time decision-making
---
### With an eye on Asia, Israeli startup iguazio counts IPO in its roadmap
---
### iguazio releases high-speed serverless platform to open source
---
### Tutorial: Faster AI Development with Serverless
---
### This tech firm offers a big data solution for businesses
---
### 2 lessons cloud native companies have for enterprise leaders
---
### nuclio and the Future of Serverless Computing
---
### Actionable Insights: Obliterating BI, Data Warehousing as We Know It
---
### Iguazio, the Anti-Hadoop, Goes GA
---
### Yaron Haviv, iguazio's CTO on theCube
---
### As its data cloud launches, iguazio nabs Grab as a marquee customer
---
### Entrepreneur Spotlight: Asaf Somekh, Founder & CEO of iguazio
---
### AI&ML tech talk: iguazio
---
### Reimagining the Data Pipeline Paradigm as a Continuous Data Insights Platform
---
### Verizon, CME Group, Bosch Invest In Continuous Data Analytics Start-Up Iguazio
---
### Extend Kubernetes 1.7 with Custom Resources
---
### Bosch: Investing In Data-Driven Innovation
---
### Iguazio nabs $33M to bring big data edge analytics to IoT, finance and other enterprises
---
### Robert Bosch Venture Capital invests in iguazio
---
### Data analytics startup Iguazio reaps $33m in second funding round
---
### Israeli Startup Iguazio Attracts $33M Series B
---
### CRN: The 10 Coolest Big Data Startups Of 2017 (So Far)
---
### Opinion: We’ll Be Enslaved to Proprietary Clouds Unless We Collaborate
---
### Opinion: It’s time open source focused on usability
---
### And the 2017 Cool Vendors Are…
---
### Podcast: Intel and iguazio Processing with Continuous Analytics
---
### To get the most from containers, go cloud-native or go home
---
### iguazio Re-Architects the Stack for Continuous Analytics
---
### iguazio highlights continuous analytics use cases for converged data services platform
---
### Opinion: Managing Data on the Edge
---
### Cool Company: iguazio
---
### Strata: Cloudera, MapR and others focus on consolidating the sprawl
---
### iguazio speeds up big data delivery with continuous analytics platform
---
### Exclusive Interview with Asaf Somekh, CEO and Co-Founder, iguazio
---
### A Hacker’s Guide to Kubernetes Networking
---
### iguazio takes its performance-boosting data platform global with help from Equinix
---
### iguazio and Equinix join forces to deliver a new Data-Centric Processing platform
---
### iguazio: Made from Kia parts but faster than a Ferrari with 1,000 drivers
---
### The Hadooponomics Podcast – Building Big Data, Better: Why Integration, Not Infrastructure, Is Key
---
### Organizations Look for Simplicity, Affordability in Data Lakes
---
### Fast Enterprise Data Cloud Platform by iguazio
---
### Disruptive Technology, Monotonous Marketing At Strata+Hadoop World
---
### New products & solutions shaping the enterprise IT landscape
---
### iguazio’s CTO, Yaron Haviv on theCUBE
---
### iguazio launches Enterprise Data Cloud service to speed Big Data
---
### Rethinking the cloud platform with an integrated, turnkey solution
---
### iguaz.io Unveils Virtualized Data Services Architecture
---
### Iguaz.io promises AWS-like storage in the data center
---
### Data services startup Iguaz.io aims to untangle Big Data hairball
---
### Harnessing data in real time | #SparkSummit
---
### These are the coolest big data startups of 2015!
---
### These Are the Big Data Startups That Won 2015
---
### Israeli Stealthy Start-Up Iguaz.io Raises $15 Million in Series A
---
### Startup Iguaz.io is creating real-time Big Data analytics storage
---
### Iguaz.Io raises $15 million in series a funding to disrupt big data storage
---
### Data management start-up iguaz.io raises $15m
---
## Events
### MLOps Live #37 Building Agent Co-pilots for Proactive Call Centers
---
### MLOps Live #36 How to Manage Thousands of Real-Time Models in Production
---
### MWC 2025
---
### MLOps Live #35 - Beyond the Hype: Gen AI Trends and Scaling Strategies for 2025
---
### MLOps Live #34 - Agentic AI Frameworks: Bridging Foundation Models and Business Impact
---
### MLOps Live #33 - Deploying Gen AI in Production with NVIDIA NIM & MLRun
---
### MLOps Live #32 - Gen AI for Marketing - From Hype to Implementation
---
### MLOps Live #31 - Building Scalable Customer-Facing Gen AI Applications Effectively & Responsibly
---
### MLOps Live #30 - Implementing Gen AI in Highly Regulated Environments
---
### MLOps Live #29 - Transforming Enterprise Operations with Gen AI
---
### MLOps Live #28 - Improving LLM Accuracy & Performance
---
### MLOps Live #27 - LLM Validation & Evaluation
---
### MLOps Live #26 - Implementing a Gen AI Smart Call Center Analysis App
---
### MLOps Live #25 - GenAI for Financial Services
---
### MLOps Live #24 - How to Build an Automated AI ChatBot
---
### ODSC West 2023
---
### AI at Scale
---
### NVIDIA GTC
---
### Future of AI 2022
---
### MLOps Live #22 How Seagate Handles Data Engineering at Scale
---
### MLOps Live #20: How to Easily Deploy Your Hugging Face Model to Production at Scale
---
### AIIA: Data-Centric AI Summit
---
### MLOps LATAM Micro-Summit (Hybrid)
---
### TMLS MLOps World: Conference on Machine Learning in Production 2022
---
### Snowflake Summit
---
### MLOps: Machine Learning in Production / New York City Summit
---
### ODSC Europe
---
### ODSC East
---
### Understanding Fraud Prediction for Banking & Finance Sector through Iguazio & Royal Cyber
---
### MDLI Ops Conference 2023
---
### ODSC Webinar: Git Based CI/CD for ML
---
### Session #17: Scaling NLP Pipelines at IHS Markit
---
### GTC
---
### ODSC West
---
### NetApp Insight 2021
---
### Iguazio MLOps Platform Launches in AWS Marketplace
---
### Kubecon North America 2021
---
### MLOps: Machine Learning in Production New York City
---
### "Building a Real-Time ML Pipeline with a Feature Store" - MLOps Live Webinar #16
---
### #MLOpsforGood Award Ceremony
---
### MLOps in Finance Summit
---
### MLOps for Good Hackathon
---
### MLOps World: Machine Learning in Production Conference
---
### ODSC EUROPE 2021
---
### Expert Panel: AI for Connected Vehicles
---
### Webinar: "Activate Data: Data Science Innovation with MongoDB & Iguazio"
---
### KubeCon NA
---
### Toronto Machine Learning Summit
---
### NetApp INSIGHT 2020
---
### Data Science Salon | Applying AI & ML to Healthcare, Finance & Technology
---
### MLOps Series Library
---
### MLOps Live Webinar #15: 'Automated Model Management for CPG Trade Effectiveness with Tredence'
---
### MLOps NYC
MLOps NYC 2019 gathered industry leaders from companies like Netflix, Google, Twitter and Uber to share their biggest MLOps challenges and solutions.
---
### NVIDIA GTC: Deep Learning & AI Conference
---
### KubeCon Europe
The Cloud Native Computing Foundation’s flagship conference gathers adopters and technologists from leading open source and cloud-native communities in Amsterdam
---
## PRs
### McKinsey & Company Acquires Iguazio to Accelerate & Scale Enterprise AI
Acquisition will enable Artificial Intelligence’s (AI) full power and potential to be realized across commercial, social, and environmental initiatives. McKinsey & Company today (23rd January) announced that it has acquired Iguazio, a Tel Aviv-based leader in Artificial Intelligence and Machine Learning. McKinsey will be able to dramatically accelerate and scale AI deployments with the addition of Iguazio’s technology and a team of 70+ data and AI experts. To thrive in today’s competitive market, harnessing the power of Artificial Intelligence (AI) is essential. According to McKinsey research, more than $490B was invested in AI by organizations around the globe from 2012-2021. But for most, the actual value of those investments has yet to be realized, with only one in ten projects making it outside the lab. QuantumBlack, AI by McKinsey, has been working with clients to address these challenges head-on and embed AI into real-time decision-making. The final element we have been working on is a technology solution that will accelerate AI deployment, embedding it in real time and in any environment. “After analysing more than 1,000 AI companies worldwide, Iguazio was identified as the best fit to help us significantly accelerate our AI offering – from the initial concept to production, in a simplified, scalable and automated manner,” said Ben Ellencweig, McKinsey senior partner and QuantumBlack global leader of alliances and acquisitions. “By joining forces with Iguazio, we can now deepen the unparalleled, disruptive, end-to-end AI capabilities we offer to our clients.” Working with Iguazio, QuantumBlack will now be able to...
---
### Sense Selects Iguazio for AI Chatbot Automation with AWS, Snowflake and NVIDIA
Sense will use the Iguazio MLOps platform for a large range of AI products, beginning with the Sense AI Chatbot, an intelligent, automated recruiting assistant that speeds the hiring process and provides uncompromised personalization, already available to 700+ customers. San Francisco, 27th July, 2022 – Iguazio, the MLOps platform provider, and Sense, a market leader in artificial intelligence (AI) driven talent engagement for recruiting, today announced that the Iguazio MLOps platform has been selected to power a wide range of AI products aimed at increasing the efficiency and scalability of Sense’s AI operations. Sense is leveraging automation and AI to speed up the recruitment process, while delivering a hyper-personalized candidate experience. Sense is a leader in talent engagement, with over 700 customers and an annual growth rate of over 100%. Sense provides the Sense AI Chatbot - an automated recruiting assistant that can engage with candidates 24/7, responding to their queries in real-time even when human recruiters are offline. It engages with candidates across SMS, mobile, and web, matches them to jobs, schedules interviews, and handles intelligent communications, including FAQs. The chatbot pairs conversational AI with automated communication and engagement workflows so organizations can engage with candidates at scale. Sense has a large team of data scientists and machine learning (ML) engineers with deep expertise in conversational AI – both voice and text. The team’s challenge was building the complex natural language processing (NLP) serving pipeline, with custom model ensembles, to track question-to-question context and enable...
---
### Iguazio Partners with Snowflake to Automate and Accelerate MLOps
The Iguazio MLOps Platform and built-in Feature Store now offer connectivity to the Snowflake Data Cloud, providing a full solution for enterprises looking to generate real business value with data science in an efficient, scalable and repeatable way. Las Vegas, 14th of June, 2022 – Iguazio, the MLOps platform provider, today announced a new partnership with Snowflake, the Data Cloud company, which includes connectivity of Iguazio’s solution for automating ML pipelines and built-in feature store with Snowflake’s Data Cloud. The solution is already being implemented by numerous customers, including Fortune 500 companies, to abstract away the complexities of MLOps and create a joint, production-ready environment for data scientists to continuously roll out new AI services across the organization. The Iguazio MLOps platform can accelerate the data science process up to 12x and make more efficient use of AI resources, like GPUs, through better orchestration and automation. “We are excited to partner with Iguazio to bring end-to-end MLOps capabilities, including a built-in feature store, to our joint customers,” said Tarik Dwiek, Sr. Director Technology Alliances at Snowflake. “We see the need for MLOps automation continuing to grow as data science matures in the enterprise and we look forward to offering deeper integrations with Iguazio in the coming months.” The Iguazio platform now comes with a built-in Snowflake connector that powers the built-in online and offline feature store in Iguazio with data from Snowflake, allowing enterprises to seamlessly access the Data Cloud to build, store and share features that are...
---
### LATAM Airlines Chooses Iguazio to Operationalize Machine Learning
LATAM Airlines Group is planning to deploy over 40 AI products on the Iguazio MLOps platform in the upcoming weeks, paving the way for other airlines that are choosing AI innovation as their strategy for 2022. NEW YORK, NY, 29th of March, 2022 – Iguazio, the MLOps (machine learning operations) platform provider, today announced that LATAM - the leading airline group in Latin America - has selected its MLOps platform for a large-scale, cross-company AI innovation project. The project will span the entire organization and will include use cases such as optimizing and safeguarding the company’s popular frequent flyer program from fraud, improving pilot training through a better understanding of the factors that create unstabilized approaches to landing, and intelligent route planning to reduce CO2 emissions. “The airline industry needs to rethink its strategy post-COVID,” commented Juliana Rios, IT & Digital Vice President at LATAM. “We chose the Iguazio MLOps platform to operationalize data science, fueling more efficient and environmentally-friendly operations, enhanced safety, and better customer service.” LATAM Airlines Group works extensively with GCP, utilizing tools like Google BigQuery, Google Cloud Storage, and Google Workload Identity. Iguazio is fully compatible with GCP and has a strong partnership with Google. LATAM Airlines is planning to deploy its AI products on GCP using Iguazio. “LATAM is leading the way for other airlines in their vision, and how they are using AI to create business value and plan for the post-pandemic future,” said Asaf Somekh, Co-Founder and CEO at Iguazio. “They have...
---
### Iguazio Partners with Pure Storage to Operationalize AI Production-First
Through integrations with Pure Storage and the Iguazio MLOps platform, enterprises can deploy and scale AI across multi-cloud and hybrid use cases. New York, NY, 26th October 2021 - Iguazio, the MLOps (machine learning operations) company, today announced that it has become a Pure Storage technology partner. The new partnership will empower enterprises to unlock the value of their data and bring data science projects to life in an efficient, automated, and repeatable way. Together, Iguazio and Pure Storage empower enterprises to continuously roll out new AI services by adopting a production-first mindset, using technologies that later allow them to scale with AI. By providing enterprise-grade capabilities around data management, scale and performance, and a layer of abstraction and automation, enterprises can start to focus on their business applications and not the underlying infrastructure. “Many of our customers are seeing the enormous potential for AI-driven innovation, so we strive to provide our customers with the tools and infrastructure they need for the next step of AI,” said Grace Chung, Director, Strategic Alliances, Analytics at Pure Storage. “We’re excited to help Pure and Iguazio’s joint customers unlock the value of their data with an agile MLOps solution where compute and storage scale along with the needs of the business.” “Most organizations starting out with AI initially focus on research and model development, because that is often seen as the natural first step to the data science process,” commented Asaf Somekh, Co-Founder and CEO of Iguazio. “However, in order to effectively industrialize AI, enterprises need to think about the components they will need once they are running their AI applications in production at scale - as early on as possible.
Together, Pure and Iguazio provide a complete MLOps solution with enterprise-grade data management capabilities, providing the necessary building blocks to efficiently and cost-effectively support diverse and growing AI use cases which fuel business growth.” With the new...
---
### Iguazio MLOps Platform Now Supports Amazon FSx for NetApp ONTAP
Iguazio is the first MLOps platform to enable FSx for ONTAP as part of its end-to-end capabilities for bringing data science to production at scale, in real time and in hybrid environments. NEW YORK, NY, 20th October, 2021 – Iguazio, the MLOps (machine learning operations) company, today announced its support for the new FSx for ONTAP. FSx for ONTAP provides fully managed shared file and block storage on AWS Cloud with the popular data access and management capabilities of ONTAP. Iguazio provides a leading MLOps platform used by enterprises worldwide to accelerate the deployment of AI in production, reduce complexities and minimize the cost of AI infrastructure through end-to-end ML pipeline automation. Iguazio also supports complex use cases requiring scale and real time, or deployment in hybrid environments. “So many of our customers are adopting cloud-native strategies and AI to create new competitive advantages, increase agility and reduce costs,” commented Ronen Schwartz, SVP and GM Cloud Volumes at NetApp. “Enterprises building AI applications at scale with enterprise-grade storage now have the option to efficiently roll out new AI services and support real-time ML applications at peak performance with Iguazio and FSx for ONTAP, all from within their AWS account.” Iguazio is a strategic NetApp partner. The new support of FSx for ONTAP provides customers with enterprise-level data management, advanced storage services including tiering and snapshotting, and high performance and scalability, supporting the most extreme workloads. It provides a simple end-to-end solution for deploying and managing...
---
### Iguazio MLOps Platform Launches in AWS Marketplace
AWS customers globally can now access Iguazio and unlock a faster path to the deployment of AI at scale. NEW YORK, NY, 6th of October, 2021 – Iguazio, the MLOps (machine learning operations) company, today announced its availability in the AWS Marketplace, a digital catalog with thousands of software listings from independent software vendors that makes it easy to find, test, buy, and deploy software that runs on Amazon Web Services (AWS). This new availability provides AWS customers with access to Iguazio’s MLOps solution, which automates machine learning (ML) pipelines end-to-end and accelerates deployment of artificial intelligence (AI) to production by 12x. By way of example, the Hydroinformatics Institute (H2i) in Singapore uses Iguazio on AWS to build and run a real-time ML pipeline that predicts rainfall by analyzing videos of cloud formations and running CCTV-based rainfall measurements. “With Iguazio, we are now able to analyze terabytes of video footage in real time, running complex deep learning models in production to predict rainfall,” said Gerard Pijcke, Chief Consultancy Officer, H2i. “Repurposing CCTV-acquired video footage into rainfall intensity can be used to generate spatially distributed rainfall forecasts, leading to better management of urban flooding risks in densely populated Singapore.” AWS customers can now purchase Iguazio through their AWS Marketplace account in just a few clicks, without having to deal with new contracts or legal hassles. “Iguazio is a great addition to AWS Marketplace, with their unique solution to accelerate AI deployment, even at scale, in real time,” said Mona Chadha,...
---
### Boston Limited and Iguazio Partner to Operationalize AI for the Enterprise
Tel Aviv and London, 15th June 2021 - Iguazio, the data science & MLOps platform built for production, today announced a strategic partnership with Boston Limited, an NVIDIA Elite Partner and leading provider of high-performance, mission-critical server and storage solutions. The partnership enables both companies to extend their offerings to enterprises across industries looking to bring data science into real-life applications and accelerate their path to production. Data science is becoming a critical element of business strategy in enterprises across industries. Companies need better ways to implement their AI solutions in real-world environments, to help them cut costs, work more efficiently, and accelerate the rollout of new AI services and products for customers. Yet as enterprises navigate the journey from data science to live AI applications, they often find the move to production challenging and complex. They need new tools and technologies to help manage this, especially as they scale. This radical shift to transforming business models with AI typically requires highly customizable infrastructure, as well as a streamlined data science workflow to navigate the transformation effectively and efficiently. With the new partnership, Boston Limited will offer high-performance data center hardware and technical services, while Iguazio provides its data science platform, which saves time and cost on getting AI to production. “Our primary focus is to provide our customers with the ability to customize their solutions based on their toughest business requirements. The partnership with Iguazio allows us to facilitate greater enterprise AI capabilities within our existing...
---
### Iguazio Announces First-Ever ‘MLOps for Good’ Virtual Hackathon
Data Scientists, Data Engineers and MLOps Practitioners called upon to bring data science to production for immediate social impact. NEW YORK, NY, May 24, 2021 -- Iguazio, the Data Science Platform built for production, today announced its first-ever global virtual hackathon, which starts today and will run until June 29th, 2021. With a mission to foster projects that can immediately impact real-world issues, Iguazio, partners Microsoft and MongoDB, and sponsor Aztek seek individuals who want to bring data science to production to do good together. Teams can register to join the hackathon at mlopsforgood.devpost.com. 2020 was a tough year. AI models that can tackle topics like improving healthcare, detecting fake news and making the web safer for children have the potential to create positive change in many areas, but unfortunately, bringing data science to production so that it can generate real-world impact is still a largely unsolved challenge. This hackathon calls out to data scientists, data engineers and MLOps practitioners who want to create real-world impact now, by building not just models but fully functional AI applications to generate immediate positive change. Global challenges are defined by the UN Sustainable Development Goals, which will serve as guide stars for the teams to focus on real-life social and environmental goals. Submissions will be judged on solution innovation, commerciality & applicability, and ML operationalization / repeatability or the completeness of the functional ML pipeline, with additional points awarded for building real-time pipelines, creating the ability to retrain models,...
---
### Iguazio Launches Integrated Feature Store to Accelerate AI Deployment
Iguazio Launches the First Integrated Feature Store within its Data Science Platform to Accelerate Deployment of AI in Any Cloud Environment
- The first production-ready integrated solution for enterprises to catalogue, store and share features centrally, and use them to develop, deploy and manage AI applications across hybrid multi-cloud environments
- Iguazio’s feature store tackles one of the greatest challenges in machine learning operations (MLOps) today - feature engineering
- The feature store is a key component in Iguazio’s data science platform, which is used by customers such as Payoneer, Quadient and Tulipan to deploy AI faster, and has just been selected by the Sheba Medical Center to deliver real-time AI for COVID-19 patient treatment optimization
- Joint solutions with strategic partners Tredence, NetApp, MongoDB and others are already enabled by Iguazio’s feature store, offering reproducible real-time ML pipelines

Iguazio Founders, left to right: Yaron Haviv, Yaron Segev, Orit Nissan-Messing, Asaf Somekh

NEW YORK, NY, December 16, 2020 -- Iguazio, the Data Science Platform built for production and real-time machine learning (ML) applications, today announced that it has launched the first production-ready integrated feature store. The feature store, which sits at the heart of its data science platform, enables enterprises to catalogue, store and share features for development and deployment of AI in hybrid multi-cloud environments and is built to handle real-time use cases. According to Gartner, one of the top barriers to AI implementation is the “complexity of AI solution(s) integrating with existing infrastructure”. At the core of machine learning is the data, and...
---
### Sheba Medical Center Partners with Iguazio for Real-Time COVID-19 AI
- Iguazio was selected to facilitate Sheba’s transformation with AI through clinical and logistical use cases such as predicting and mitigating COVID-19 patient deterioration and optimizing the patient journey with smart mobility
- Joint projects include collaboration on hybrid and multi-cloud AI deployments using Microsoft Azure and Google GCP
- On Dec. 30, Sheba will be holding a Big Data and AI conference, where the projects will be presented. Other hospitals and medical centers worldwide are invited to get in touch with ARC (Accelerate Redesign Collaborate) at Sheba for more information and to discuss how to implement real-time AI in their health facilities

Sheba ARC and Iguazio ink agreement, left to right: Asaf Somekh, Eyal Zimlichman

(Herzliya, Israel, December 15, 2020) -- Iguazio, developers of the Data Science Platform built for production and real-time machine learning (ML) applications, announced that it is working with the Sheba Medical Center’s ARC innovation complex to deliver real-time AI across a variety of clinical and logistical use cases in order to improve COVID-19 patient treatment. Sheba is the largest medical facility in Israel and the Middle East and has been ranked amongst the Top 10 Hospitals in the World by Newsweek magazine. Iguazio was selected to facilitate Sheba’s transformation with real-time AI and MLOps (machine learning operations) in a variety of projects. One of these projects is optimization of patient care through clinical, real-time predictive insights. Using the Iguazio Data Science Platform, Sheba is incorporating real-time vital signs from patients by utilizing the patient’s medical history...
---
### Iguazio Achieves AWS Outposts Ready Status to Accelerate AI in Hybrid
NEW YORK, NY, November 30, 2020 -- Iguazio, the Data Science Platform built for production and real-time machine learning (ML) applications, today announced that it has achieved the AWS Outposts Ready designation, part of the Amazon Web Services (AWS) Service Ready Program. This is a notable development for AWS and Iguazio customers, who can utilize Amazon SageMaker to develop artificial intelligence (AI) models and data pipelines, and easily deploy and manage these in production using the Iguazio Data Science Platform on AWS and now also on AWS Outposts, benefiting from the same high performance at scale in hybrid AWS environments. Two main challenges are hindering adoption of AI for enterprises and government agencies. The first is an increase in the need for hybrid solutions to manage data and data science applications, to address data locality in accordance with a rise in regulation and data privacy considerations. The second is an increase in first-hand experiences with the challenges and complexities involved in operationalizing machine learning, especially when considering hybrid deployment options, and when scaling data science across the organization. AWS Outposts is a fully managed service that extends AWS infrastructure, AWS services, APIs and tools to virtually any datacenter, co-location space, or on-premises facility for a truly consistent hybrid experience. Iguazio’s platform offers a fully automated MLOps solution for data engineers, data scientists, and DevOps teams, which includes a high-performance serverless framework and a fast online and offline feature store. The new seamless integration of Iguazio’s technology with...
---
### Faktion and Iguazio Bring Data Science to Production for Smart Mobility
HERTZELIA, Israel, 21 September 2020 - Iguazio, the data science platform built for production and real-time machine learning applications, today announced a strategic partnership with Faktion, a boutique, end-to-end Artificial Intelligence (AI) service provider for smart mobility technologies across Europe. The partnership enables both companies to provide AI software infrastructure and services to smart mobility companies looking to harness big data and create AI applications that make travel in cities safer, greener, and more efficient. The two companies already have several joint European customers in the Smart Mobility space, tackling challenges such as dynamic congestion charges based on real-time data and detecting and acting upon driver fatigue. Congestion cost the U.K. nearly £8 billion in 2018. This is the main problem Airvi is working to fix: the startup works with cities to regulate and reduce traffic congestion and pollution using data delivered in real-time. Along with Faktion and using Iguazio's Data Science Platform, it is creating virtual, dynamic zones that adapt to fluid air and congestion patterns to better manage traffic. By harnessing data at scale, from sources such as cell phones, weather sensors and other diverse data sources, through Iguazio’s platform, it can recommend dynamic tariffs that will deter drivers from entering high traffic and pollution zones. “Faktion’s deep industry experience and Machine Learning skill set, combined with Iguazio’s platform, has made our product viable...
---
### PadSquad Deploys the Iguazio Data Science Platform to Predict Ad Performance in Real-Time
PadSquad uses Iguazio to deliver high-performing interactive campaigns served in real-time for Fortune 500 brands including Intel, Verizon and Novartis. The company analyzes behavioral data from multiple sources in order to serve the right ad creative to the right customer at the right moment. Iguazio selected for ease of AI deployment and ability to bring data science solutions to market quickly and cost-effectively. PadSquad uses Iguazio’s platform end-to-end on AWS. NEW YORK, NY, September 21st, 2020 -- Iguazio, the data science platform built for production and real-time machine learning applications, today announced it has been deployed by mobile software company PadSquad, to improve the relevance and performance of the digital campaigns they run for their customers worldwide. PadSquad is revolutionizing traditional media with interactive features and innovative technologies that transform the audiences’ experience and engagement with ad creatives. Iguazio was deployed by PadSquad to use AI to improve ad performance and reduce media costs for their customers. They do this by ingesting and acting upon real-time events - from contextual content on the page, engagement with creative elements like video views, swipeable panels and hot spots, to the season and time of day - at a rate of over 3,000 events per second. Utilizing online and offline behavioral data from multiple sources, available to them through third party platforms and their own internal tools, PadSquad can now harness machine learning to optimize ad performance and provide a better and more personalized user experience for their customers’ audiences. “Real time...
---
### SFL Scientific and Iguazio Partner to Accelerate Custom AI Development
- Top tier consultancy partners with a leading data science platform company to simplify and expedite development and deployment of AI for enterprises across industries: finance, insurance, healthcare, retail, manufacturing, gaming, AdTech, etc.
- Enterprises can now easily incorporate AI and MLOps automation to create business impact through endless applications such as predictive maintenance, real-time recommendations, KYC (know your client) and fraud prevention
- The partnership will speed up deployment of AI services at lower cost

14 July 2020 – Tel Aviv and New York, NY - Iguazio, the data science platform for real-time machine learning applications, today announced a strategic partnership with SFL Scientific, a leading data science consulting firm. The partnership will enable both companies to extend their offerings to enterprises of all industries looking to apply AI to real-life applications, regardless of the size or skill set of their internal teams. In today’s economic environment, enterprises across industries are looking to develop viable AI solutions that help them cut costs, work more efficiently, and develop new services and products for customers. However, as enterprises navigate the journey to develop viable AI solutions and derive the business benefits from analyzing big data, they must transform not only their workforce, standard processes, and operating models, but also modernize critical applications in their infrastructure and architecture. This is often a tremendous task, and one that requires expert support to navigate the transformation effectively and efficiently. With the new partnership, SFL Scientific will offer data strategy and support for algorithm...
---
### NetApp Deploys Iguazio to Run AI-Driven Digital Advisor on Active IQ
NetApp deploys the Iguazio Data Science Platform to boost the infrastructure behind its Active IQ solution, responding in real-time to 10 trillion data points per month collected from storage controllers globally, providing actionable intelligence for predictive maintenance and optimal data management. New York, June 10th, 2020 – Iguazio, the data science platform for real-time machine learning applications, today announced a new strategic customer, NetApp, which is using its platform to analyze 10 trillion data points per month to automate the support and optimization of storage. NetApp’s Active IQ uses predictive analytics to automate the proactive care and optimization of storage controllers owned by customers around the world. NetApp wanted to build a digital advisor that uses AI at scale and in real-time to continually gain insights on these devices and conduct predictive maintenance on storage, while constantly learning and getting smarter over time. Iguazio has implemented similar predictive maintenance solutions for other customers in the past. The service was previously built on Hadoop, and NetApp was also looking to modernize its infrastructure to reduce the complexities of deploying new AI services and the costs of running large-scale analytics. In addition, the shift was needed to enable real-time predictive AI and to abstract deployment, allowing the technology to run on multi-cloud or on-premises seamlessly. NetApp turned to Iguazio to replace their traditional data warehouse and Hadoop-based data lake with a Kubernetes-powered, cloud-native, serverless data science platform which can analyze massive amounts of data in real-time. The platform is deployed both in the cloud...
---
### Iguazio Becomes Certified for NVIDIA DGX-Ready Software Program
The Iguazio Data Science Platform enables greater GPU efficiency and helps to democratize AI infrastructure deployment for every enterprise. New York, May 14, 2020 - Iguazio, the data science platform for real-time machine learning applications, announced today that the company’s solution for utilizing GPU-as-a-Service has been certified as part of the NVIDIA DGX-Ready Software program.
---
### Iguazio and NetApp Collaborate to Accelerate Deployment of AI Applications
New York, May 4th, 2020 – Iguazio, the data science platform for real-time machine learning applications, today announced a strategic partnership with NetApp that provides enterprises with a simple, end-to-end solution for developing, deploying and managing AI applications at scale and in real-time on top of the ONTAP AI framework. Despite the great promise of AI for business applications, many data science projects fail to create business value. In fact, according to Gartner, 85 percent of data science projects fall short of expectations. One of the reasons is that model creation is just the first step, while moving working models into a production environment introduces a whole set of complexities, such as handling data at scale, working in hybrid environments, and harnessing real-time data for predictive applications. Businesses can overcome these challenges by working in a production-ready environment, with a simplified infrastructure that enables them to focus on creating business value. Iguazio’s data science platform provides end-to-end machine learning pipeline automation, coupled with performance and scale, enabling real-time machine learning (ML) applications. It introduces a fresh approach to simplifying MLOps and enabling enterprises to deploy their AI projects quickly and seamlessly. Iguazio’s integration with NetApp ONTAP AI leverages enterprise-grade data management, data versioning and NetApp Cloud Volumes for a seamless hybrid cloud experience. It is also fully compatible with Kubeflow 1.0, offering a managed Kubeflow solution for enterprises. The platform tightly integrates with NVIDIA DGX, allowing customers to utilize GPU-as-a-Service and NGC containers, making more efficient use...
---
### Iguazio Deployed by Payoneer to Prevent Fraud with Real-time Machine Learning
Payoneer uses Iguazio to move from fraud detection to fraud prevention with predictive machine learning models served in real-time New York, January 13th, 2020 - Iguazio, the data science platform for real-time machine learning applications, today announced that Payoneer, the digital payment platform empowering businesses around the world to grow globally, has selected Iguazio’s data science platform to provide its 4 million customers with a safer payment experience. By deploying Iguazio, Payoneer moved from a reactive fraud detection method to proactive prevention with real-time machine learning and predictive analytics. Payoneer overcomes the challenge of detecting fraud within complex networks with sophisticated algorithms tracking multiple parameters, including account creation times and name changes. However, prior to using Iguazio, fraud was detected retroactively, meaning Payoneer could only block users after damage had already been done. Payoneer is now able to take the same sophisticated machine learning models built offline and serve them in real-time against fresh data. This ensures immediate prevention of fraud and money laundering with predictive machine learning models identifying suspicious patterns continuously. The cooperation was facilitated by Innovigates-BeLocal, one of the leading Data and IT solutions integrators for mid-size and enterprise companies. “We’ve tackled one of our most elusive challenges with real-time predictive models, making fraud attacks almost impossible on Payoneer” noted Yaron Weiss, VP Corporate Security and Global IT Operations (CISO) at Payoneer. “With Iguazio’s Data Science Platform, we built a scalable and reliable system which adapts to new threats and enables us to prevent fraud...
---
### Iguazio Raises $24M to Accelerate Growth of Its Data Science Platform
Iguazio’s data science platform automates machine learning pipelines, enabling a wide range of industries to bring their data science to life. This investment brings Iguazio’s total funding to $72M. HERZLIYA, Israel – January 27th, 2020 – Iguazio, the data science platform for real-time machine learning applications, today announced that it has raised $24M of funding. The round was led by INCapital Ventures, with participation from existing and new investors, including Samsung SDS, Kensington Capital Partners, Plaza Ventures and Silverton Capital Ventures. The funds will be used by Iguazio to accelerate its growth and expand the reach of its data science platform to new global markets. The demand for AI applications is on the rise. According to Gartner, AI augmentation alone will create $2.9 trillion of business value in 2021. However, there are still many challenges in deploying AI solutions in an effective and scalable way. An estimated 87% of data science models which have shown great promise in the lab never make it to production. This is due to the challenges of transforming a great AI model, which is functional in lab conditions, to a fully operational AI application that can deliver business impact at scale and in real time. Iguazio solves this problem and brings data science to life for enterprises worldwide. The Iguazio data science platform helps data scientists create real-time AI applications while working within the familiar machine learning stack they know and love. The platform has been deployed by enterprises spanning a variety of...
---
### PICSIX Launches Investigative Intelligence Platform Powered by Iguazio
PICSIX uses Iguazio to provide an AI-based platform addressing the ever-changing threats to homeland security and public safety Paris, Milipol 2019, November 19th 2019 – Iguazio, the Data Science Platform for automating machine learning pipelines, today announced that PICSIX, a leader in tactical intelligence solutions, is using its platform to deliver an AI-based Investigative Intelligence Platform. The platform makes real-time machine learning accessible to any agency, providing the flexibility required to address a diverse range of ever-changing threats. As opposed to other managed solutions, PICSIX customers design their own mission-specific workflow, at a substantially lower cost. Furthermore, with Iguazio under the hood, users work in an efficient and open environment which automates technical heavy lifting and easily integrates with third-party systems and open source tools. Iguazio’s unified data layer analyzes all types of data sources at scale and in real-time, powering the integration of PICSIX Tactical Intelligence databases, government databases and publicly available electronic information (PAEI) used for OSINT. Machine learning models are applied on the data, generating up-to-date profiles and interactive dashboards for real-time insights, alerts and actions. PICSIX deploys the system in multiple edge locations, while Iguazio’s Data Science Platform can also be deployed on-premises or in multi-cloud environments. “PICSIX has been aiding law enforcement and homeland security agencies in their continuous battle against terror, drug trafficking, human trafficking and more, with the ultimate goal of making the world a safer place,” said Menachem Kenan, CEO. “By deploying our Investigative Intelligence Platform on Iguazio, we are able to help our...
---
### Iguazio Expands Serverless To Scale-out Machine Learning and Analytics Workloads
New serverless capabilities in Iguazio’s Data Science Platform enable on-demand resource consumption, elastic scaling, and simpler ML pipelines New York City, MLOps NYC19, September 24th 2019 – Iguazio, the Data Science Platform for automating machine learning pipelines, today announced Nuclio ML Functions, broadening the serverless capabilities of Iguazio’s Data Science Platform for scalable machine learning training and data preparation. Nuclio is the only open source serverless framework that extends beyond event-driven workloads to long-lasting and parallel workloads. Nuclio ML Functions was unveiled at MLOps NYC19 and will be generally available later this year across both cloud and edge versions of the Iguazio platform. *Gartner, How to Operationalize Machine Learning and Data Science Projects, Erick Brethenoux et al., 3 July 2018 About Iguazio Iguazio provides a Data Science Platform to automate machine learning pipelines. It accelerates the development and deployment of AI applications, enabling data scientists to focus on delivering better, more accurate and more powerful solutions instead of spending most of their time on infrastructure. The platform is open and deployable in public clouds, on-premises or at the intelligent edge. Iguazio powers data science applications for manufacturing, smart mobility, financial services and telcos and is backed by Bosch, Verizon Ventures, Samsung SDS, CME Group, Dell and top VCs. The company is led by serial entrepreneurs and a diverse team of seasoned innovators in the USA, UK, Singapore and Israel. Iguazio brings data science to life. Visit www.iguazio.com or follow @iguazio to learn more. Contacts Media: Kiki Keating kiki@kikinetwork.com
---
### MLOps NYC19 Conference to Promote the Standardization of Machine Learning Operations
Speakers from Google, Walmart, Netflix, Uber, Twitter, Microsoft, Bloomberg and more will convene in New York to improve machine learning automation of development and deployment New York City, August 6th 2019 – Iguazio, the Data Science Platform for automating machine learning pipelines, today announced MLOps NYC19, a community-organized conference taking place September 24th in New York’s Hudson Mercantile. The MLOps Call for Proposals is open and early bird ticket sales end August 24th. More than half of data science projects are not fully deployed, according to Gartner: “Many organizations struggle when it comes to systematically productizing machine learning results, as the production process is either overlooked or left solely to the DevOps team.”* MLOps NYC19 will reflect the current state of machine learning operations with accomplished industry leaders sharing insights and experiences, such as Bill Groves (Walmart), Julie Pitt (Netflix), Karl Weinmester (Google), Brittany Wills (Twitter) and Josh Patterson (NVIDIA). Attending MLOps NYC19 are data scientists, machine learning engineers and enterprise CTOs and CDOs, to participate in presentations and training sessions about:
- AI in business applications
- Kubeflow and MLSpec standardization
- MLflow
- Serverless in machine learning
- Best practices for ML workloads
- ML workflows for GPUs

“Machine learning is expanding from research and bleeding edge companies to any modern app,” said Yaron Haviv, Iguazio CTO. “Much like other software practices, the industry needs to define and adopt successful ML development and CI/CD patterns. MLOps NYC19 will drive collaboration to accelerate mainstream ML adoption, enabling portability and interoperability.” Microsoft’s Head of Open Source...
---
### Iguazio to Operationalize Data Science and AI on Azure and Azure Stack
Enabling Leading Enterprises to Bring Machine Learning into Business Applications and Remove AI Project Complexity Microsoft Build Seattle, May 8th 2019 – Iguazio, provider of the data science platform built for production, today announced the availability of its platform on Microsoft Azure and Azure Stack, serving a wide range of cloud and intelligent edge use cases. The platform enables fast development and deployment of data science applications with unified data services, serverless functions and integrated AI tools, all consumed in a managed environment. Iguazio is a Microsoft co-sell partner and offers its data science platform through the Azure Marketplace. “Iguazio is proud to collaborate with Microsoft to enable AI-driven actions both with Microsoft Azure and Azure Stack at the intelligent edge,” said Iguazio CEO, Asaf Somekh. “Iguazio’s platform brings data science to life with its production-native architecture and it can now serve the entire data science lifecycle with Azure and Azure Stack. Combining Iguazio and Azure, harnessing AI and cloud, enables high paced innovation and the faster delivery of advanced services. ” Henry Jerez, Principal Group Product Manager at Microsoft’s Intelligent Edge Solutions Platform Group said, “Partnering with Iguazio we can offer additional options for AI applications in the cloud to also run on the edge. Iguazio provides an additional path to run AI on the edge beyond our current Microsoft Azure Machine Learning inferencing on the edge. This new marketplace option provides an additional alternate path for our customers to bring intelligence close to the data sources for applications such...
---
### Iguazio’s Platform Scales NVIDIA GPU-Accelerated Deployments
Samsung SDS Integrates Serverless and Big Data to Automate Machine Learning Application Scaling and Productization GTC San Jose, March 19th 2019 – Iguazio, provider of the high-performance platform for serverless and machine learning applications, today unveiled native integration with NVIDIA® GPUs to eliminate data bottlenecks, provide greater scalability and shorten time to production. Iguazio’s platform powers machine learning and data science over Kubernetes, enabling automatic scaling to multiple NVIDIA GPU servers and rapid processing of hundreds of terabytes of data. Iguazio’s data science platform provides serverless functions that run on GPUs: its open source serverless framework (Nuclio) improves GPU utilization and sharing, resulting in almost four times faster application performance when compared to the use of GPUs within monolithic architectures. Nuclio is fifty times faster than serverless solutions that do not offer GPU support, such as Amazon’s Lambda. Serverless and Kubernetes target key challenges in data science: they simplify operationalization, eliminate manual DevOps processes and cut time to market. Samsung SDS will use Iguazio to speed up machine learning applications and leverage the automatic scaling of Iguazio’s serverless framework to increase efficiency and sharing. Samsung SDS announced its investment in Iguazio on March 6th. The company has accelerated its pipeline with Iguazio to streamline the delivery of intelligent applications, analyzing models built directly in its production environment and generating predictions. “The integration of Iguazio with NVIDIA RAPIDS provides a breakthrough in performance and scalability for data analysis and a broad set of machine learning algorithms,” said Iguazio CTO Yaron Haviv. “Our platform is already powering a...
---
### Samsung SDS Invests in Iguazio to Boost Cloud Services
Samsung Adopts Iguazio’s Nuclio Serverless PaaS for Real-time Intelligent Applications Herzliya, March 7th 2019 – Iguazio, provider of the high performance platform for serverless and machine learning applications, announced it is partnering with Samsung SDS, a global software solutions and IT services company, to accelerate and streamline the delivery of intelligent applications. Samsung SDS has invested in Iguazio and will incorporate its platform into Samsung’s cloud services portfolio, powering serverless agility and data science operations for cloud native and AI-driven applications. "Samsung SDS is excited to invest in Iguazio. We look forward to providing our customers with intelligent, serverless applications by implementing Iguazio's technology to our cloud's PaaS," said Dr. Shim Yoon, Executive Vice President, Cloud Business Division Leader of Samsung SDS. “Iguazio welcomes Samsung SDS as a strategic investor,” said Asaf Somekh, CEO, Iguazio. “We’re already working with different Samsung SDS groups on financial services and manufacturing deployments and are excited about the value Iguazio has created by powering Samsung’s cloud with serverless and machine learning.” Iguazio’s platform includes data services and AI tools, empowering end-to-end serverless agility in the enterprise and real-time applications to improve performance, security, collaboration and the scalability of machine learning. Iguazio’s Nuclio is the leading open source serverless framework, enabling the development of modern applications over Kubernetes without having to manage infrastructure. About Samsung SDS Samsung SDS was founded in 1985 and has been leading the digital transformation and innovation of its clients for over 30 years. With the vision to become a...
---
### Iguazio Powers the Intelligent Edge for Smart Retail and IoT Solutions with Google Cloud
Collaborating with Trax on a Kubernetes-powered hybrid cloud for real-time supply chain and intelligent operations KUBECON, Seattle, Tuesday, Dec 11, 2018 – Iguazio, the serverless platform for intelligent applications, today announced it is partnering with Google Cloud to enable real-time AI across the cloud and intelligent edge. Google Cloud and Iguazio’s hybrid cloud is enabling Trax, the leading provider of computer vision and analytics solutions for retail, to benefit from Kubernetes and a cloud-native architecture without managing its underlying infrastructure. Trax’s retail solutions leverage image recognition and predictive analytics to efficiently manage the physical shelf for consumer packaged goods manufacturers and retailers. “At Trax, we’re digitizing the world of retail by monitoring, predicting and optimizing store-and-field performance in real-time to improve on-shelf availability, optimize click-and-collect processes and modernize the shopping experience,” said Trax Chief Technology Officer, Yair Adato. “We recognized that we needed an edge-to-cloud solution that was built for speed, scale and intelligence – one that allowed us to focus on our application, versus the management of our infrastructure.” “Kubernetes exemplifies the power of the consistent platform, where customers appreciate learning once and using anywhere. Building applications on top of Kubernetes ensures companies can deploy workloads either on premises or in the cloud of their choice,” said Aparna Sinha, Group Product Manager, Kubernetes and GKE, Google Cloud. “The retail industry requires data portability across the cloud and intelligent edge. We are excited to collaborate with Iguazio to deliver a solution that enables real-time analytics of store data, all...
---
### Iguazio's Nuclio Update Enables Serverless Agility for Real-Time Apps
Leading open source serverless framework now includes capabilities that enable faster end-to-end enterprise and IoT deployments with reduced operational complexity SERVERLESS NYC, NEW YORK – October 30, 2018 – Iguazio, the Continuous Data Platform for real-time intelligent applications, today announced the release of a new version of Nuclio, the fastest fully integrated multi-cloud, on-prem and edge enterprise serverless platform. Nuclio is already powering real-time intelligent operation management applications for financial institutions, telcos, transportation and manufacturing companies. Some examples include:
- Predicting and eliminating network latencies in electronic trading platforms to guarantee continuous trading.
- Predicting network failures, outages and cyber-attacks in robust telco operations, enabling proactive problem resolution.
- Real-time stockout predictions for intelligent supply chain applications.
- Predictive maintenance and corrective actions for real-time smart manufacturing.

Nuclio gains enterprise capabilities when running on the Iguazio platform across all these use cases. It enables rapid development in the cloud alongside deployment at the edge for IoT, retail and manufacturing, delivering real-time analytics closer to the sources of data. “Nuclio is the first open, fast, enterprise-ready, ‘everywhere’ serverless framework,” said Iguazio’s CTO, Yaron Haviv. “This new version represents the second generation of serverless, taking usability, data integration and performance to the next level. We bring the serverless vision to enterprise applications and an end-to-end alternative over slower and cloud-specific solutions. Users seeking alternatives can now avoid tedious integration of point solutions.” New features in this Nuclio release include:
- Security: authentication, authorization, data security, “dark-site” and offline deployment
- Persistent functions: with real-time access to...
---
### Iguazio Hosts Serverless NYC: Enterprise Deployments and Case Studies
Serverless NYC Speakers Include Google, IBM, Oracle and Capital One NEW YORK – October 10, 2018 – Iguazio, the Continuous Data Platform for real-time, intelligent applications, announced today Serverless NYC, a community-organized ServerlessDays conference taking place October 30th at Galvanize New York. Serverless NYC will reflect the state of serverless computing as it continues to generate buzz. The conference attracts industry leaders such as Kelsey Hightower (Google), Jason Katzer (Capital One), Dave Grove (IBM) and Gwen Shapira (Confluent). Attending Serverless NYC are developers, architects, CTOs and engineers from a wide range of industries in the New York area. The conference will go beyond the hype to address real use cases ranging from success stories to challenges that have yet to be resolved. “Our goal is to make serverless available to the mainstream developer across multi-cloud and edge deployments,” said Yaron Haviv, Iguazio CTO. “We made this conference vendor neutral because serverless is still nascent. We’re bringing the community together to address issues currently holding serverless back from becoming the ubiquitous, mainstream development framework of the future.” In addition to panel discussions about serverless implementations, the conference will include training sessions, during which developers can experiment with serverless frameworks on Nuclio, OpenWhisk, MongoDB Stitch, Spotinst and Microsoft Azure to expand their skillsets. “Serverless computing is the next step toward unshackling developers so they can benefit from the flexibility and elasticity of the cloud,” said Principal Ovum Analyst Tony Baer. “While serverless may make infrastructure almost disappear, developers must understand how...
---
### Equinix and Iguazio Collaborate to Drive Smart Mobility Vision
Hybrid cloud solution enables leading Asian ride-hailing applications to deliver real time and event-driven insights SINGAPORE – August 15, 2018 – Equinix, Inc. (Nasdaq: EQIX), the global interconnection and data center company, today announced that it is collaborating with Iguazio to enable a single unified data platform to power continuous analytics and event-driven applications at scale. Iguazio’s Continuous Data Platform on Platform Equinix® enables a real time solution for high performance AI and event-driven applications to optimize performance in the smart mobility industry in Asia-Pacific. Ride-hailing services in Southeast Asia are expected to continue to grow in popularity, with six million rides booked through platforms daily, according to the “e-Conomy Southeast Asia Spotlight 2017” report by Google and Temasek. This, along with the booming internet economy in the region, is leading to an increased demand for interconnection that can enable businesses to securely and directly connect with customers, partners and service providers. Equinix Interconnection Oriented Architecture™ (IOA™) places strategic control points next to users, clouds and networks to build a digital edge. Iguazio, which simplifies the development and deployment of real time AI applications, leverages Platform Equinix to ensure that customers have the freedom to run at the edge for real time processing and reliability, while benefiting from multiple clouds for compute elasticity. This new hybrid cloud solution enables companies offering smart mobility to manage data intelligently and gain actionable business insights, ensuring reduced latency, maximized performance and fine-grained security. The combination of large data volumes and AI deployed in...
---
### Iguazio's Real-Time Serverless Framework Now Available for Enterprises
Nuclio enterprise edition gains momentum with leading cloud providers, as well as in on-prem deployments and at the edge KUBECON AND CLOUDNATIVECON EUROPE, Copenhagen, Denmark – Wednesday, May 2, 2018 – Iguazio, the Continuous Data Platform for real-time applications, today released the enterprise version of Nuclio, its open-source serverless framework. Nuclio is the first fully integrated, cloud-neutral serverless framework for high-volume data processing, real-time analytics and artificial intelligence (AI). Iguazio and Microsoft have developed native integration for Nuclio with multiple Microsoft Azure services, providing the fastest, most advanced serverless framework deployable at the edge, in leading cloud vendors or on-prem, while offering seamless workload portability among all three environments. Iguazio powers serverless for the enterprise with its Continuous Data Platform, simplifying operational challenges and enabling a faster delivery of solutions. “Nuclio is now running natively in the Azure Kubernetes Service and is fully integrated with Azure’s Application Insights,” said Liam Kelly, GM Commercial Software Engineering EMEA at Microsoft. “Iguazio is enabling Microsoft clients to rapidly build real-time applications, leveraging Azure’s native platform features such as events, monitoring and microservices.” “Iguazio’s platform is unique in that it combines high performance and data services with serverless functions,” added Mika Borner, Management Consultant for Data Analytics at LC Systems. “We use Nuclio to develop solutions in the cloud for deployment either at the edge or on-prem, allowing us to rapidly develop and operationalize applications.” “We’re working to define standards for serverless with the Cloud Native Computing Foundation (CNCF), enabling...
---
### PickMe Deploys Iguazio’s Platform for Real-Time Heatmaps, Fraud Detection
Sri Lanka’s highest performing mobility app uses iguazio to maintain its competitive edge and operational efficiency Herzliya, Israel – Wednesday, April 11, 2018 – iguazio, the provider of the Continuous Data Platform, announced today that it is powering PickMe, the leading on-demand transportation service in Sri Lanka. The PickMe ride-hailing app runs iguazio’s platform for data and operational efficiencies, including fraud detection and real-time heatmaps. The news of this partnership comes at a time when Asian internet connectivity is booming. As a result, the Southeast Asian market for connected vehicles has quadrupled in size since 2015 and will be valued at $20.1 billion by 2025, according to a report co-authored by Google. At present, six million rides are booked in Southeast Asia daily. PickMe’s deployment of iguazio enables the ride-hailing company to leverage iguazio’s real-time and unified database engine for rapid and actionable business insights via edge and hybrid cloud deployments. iguazio’s platform allows PickMe to combine historical and current data, using AI to detect and reduce fraud. iguazio’s smart mobility supply and demand heatmaps enable PickMe to maximize driver benefits and minimize passenger wait times. "As a fast-growing business, it is important that PickMe obtain better and more efficient technologies that keep up with the pace of our growth,” said Jiffry Zulfer, CEO of PickMe. “By deploying iguazio we wish to do just that, resulting in seamless door-to-door transport so that our growth team can produce better results in real-time data processing, fraud detection and analytical accuracy." "It’s been exciting to...
---
### iguazio Featured in CRN’s 2018 Partner Program Guide
Annual Guide Recognizes the IT Channel’s Top Partner Programs Herzliya, Israel, April 3, 2018 – iguazio, the provider of the Continuous Data Platform, announced today that CRN®, a brand of The Channel Company, has recognized iguazio in its 2018 Partner Program Guide. This annual guide is the definitive listing of partner programs from technology vendors that provide products and services through the IT channel. To compile the guide, The Channel Company’s research team assessed each vendor’s partner program based on investments in program offerings, partner profitability, partner training, education and support, marketing programs and resources, sales support and communication. iguazio’s channel partners are
---
### iguazio Extends Global Reach with New Channel Partner Program
Continuous Data Platform Now Available to More Enterprises Through Systems Integrators, VARs, OEMs HERZLIYA, February 28th, 2017 – iguazio, the leading data platform powering continuous analytics and event-driven applications, today announced its new channel partner program. iguazio’s iCDE certification (Certified Data Engineer) ensures iguazio’s partners and customers have the skills and expertise required to develop and deploy applications with a fully integrated, continuous data platform. iguazio’s platform is available either through software licensing or as an appliance. It offers an annual subscription license per server, while appliances are priced separately. “iguazio is unique in its ability to process critical and time-sensitive data close to the source,” said Krishna K. Chittabathini, CEO of California-based partner 3K Technologies, which specializes in financial, healthcare and government services, both local and federal. “iguazio offers an innovative platform for us to perform pilots, so that customers are able to test and evaluate its technology as part of the training and setup we offer.” “The European market is at a critical juncture around big data and data regulation,” said Rolf Niederer, CEO of Switzerland-based LC Systems. “iguazio addresses these issues for custom application development with a range of technologies in sectors such as manufacturing, telcos and automotive.” The iguazio Continuous Data Platform comes with fully integrated essential applications dynamically deployed over Kubernetes, including artificial intelligence and machine learning frameworks like Spark, R and TensorFlow, visualization and a real-time serverless framework. To join iguazio’s partner program or to find a partner near you, please visit https://www.iguazio.com/partners/. About iguazio The iguazio Continuous Data Platform...
---
### Unified Data Platform Provider iguazio Opens APAC Headquarters in Singapore
Demand for iguazio’s Hybrid Cloud and Edge Solutions Drives Continued Global Expansion STRATA DATA CONFERENCE, Singapore, December 6th, 2017 – iguazio, the leading data platform powering continuous analytics and event-driven applications, announced today the opening of its Asia Pacific (APAC) regional headquarters in Singapore. iguazio
---
### iguazio Debuts the nuclio Serverless Platform for Multi-Cloud and Edge Deployments
nuclio expands iguazio’s data platform to provide a complete cloud experience that allows faster, flexible deployment in the cloud, on prem or at the edge KUBECON AND CLOUDNATIVECON, Austin, December 6, 2017 – iguazio, the leading data platform enabling continuous analytics and event-driven applications, today unveiled nuclio: its open source, ultra-fast, multi-cloud serverless platform. nuclio functions are faster than bare metal, processing up to 400,000 events per second in a single process; they address a broad set of applications, are simpler to develop and can be deployed anywhere. Building modern applications with nuclio minimizes operational overhead and frees developers from lock-in to cloud-specific APIs or services. iguazio’s data platform converges and simplifies data management. With nuclio, iguazio completes a full-blown cloud experience of data services, AI and serverless - all delivered in one integrated and self-managed offering, at the edge or in a hosted cloud. “Customers tell us they want to bring the cloud experience to their data, not force fit all their workloads into the public cloud,” said SiliconANGLE Media CEO Dave Vellante. “In our opinion, iguazio’s data and serverless platforms are a leading example of just that; a set of on-prem services that enable modern application delivery with a true cloud experience close to data sources.” nuclio’s top strengths include: A real-time function OS delivering 30-100x faster performance and lower latency when compared to other public cloud or FaaS offerings (see benchmark) Automated function deployment, scaling and operations Support for a large variety of open or cloud-specific event and data...
---
### iguazio Announces General Availability of Its Unified Data Platform
Early customer adoption includes Grab, the Largest Ride-Hailing Service in Southeast Asia STRATA DATA CONFERENCE, New York, September 27, 2017 – iguazio, the leading data platform for continuous analytics and event-driven applications, announced today the general availability of its Unified Data Platform. iguazio simplifies data pipeline complexities while providing a turnkey solution to accelerate the development and deployment of machine learning and artificial intelligence in the enterprise, generating fresh insights in real-time. In a separate press release, iguazio today also announced that Grab, the Singapore-based ride-hailing giant, has selected iguazio’s Unified Data Platform to simplify and accelerate its data pipeline and analytics. See related press release
---
### Grab, Southeast Asia’s #1 Ride-Hailing Service, Selects iguazio’s Unified Data Platform
Largest Ride-Hailing Service in Southeast Asia Uses iguazio’s Platform to Ingest, Enrich and Analyze Data for Continuous Analytics and Event-Driven Applications STRATA DATA CONFERENCE, New York, September 27, 2017 – iguazio announced today that Grab, the leading on-demand transportation and mobile payments service in Southeast Asia, has selected iguazio’s Unified Data Platform to accelerate innovation and boost its competitive edge in a market that serves more than 600 million consumers. The iguazio Unified Data Platform deployed at Grab – which facilitates more than 3 million rides per day – is being used for a variety of data monetization and operational efficiency objectives. “Ride hailing is more than just an app – there’s a lot going on in the background that leads to great experiences for both the passenger and driver,” said Ditesh Gathani, Director of Engineering at Grab. “Grab’s business thrives because of our ability to manage a continuous stream of data that facilitates real-time matching, booking, payment and location services. iguazio will enable us to continue offering new innovative services at a faster time to market.” Grab selects iguazio By selecting iguazio’s Unified Data Platform, Grab is innovating a variety of applications, such as: Driver incentives that determine driver bonuses in real-time, by analyzing and increasing driver effectiveness, the number of runs during peak driving times and rider satisfaction. Maximizing driver profits while reducing passenger wait times by optimizing the driver decision-making process using advanced real-time supply and demand heatmaps. Surge pricing optimization, which correlates passenger demand data...
---
### Iguazio Raises $33M in Series B as Leader in Real-Time Analytics, Edge Data
New round includes strategic investors from financial services, IoT and service providers following successful early deployments HERZLIYA, July 25th, 2017 – iguazio, a global pioneer in real-time edge analytics, announced today a Series B investment of $33 million led by Pitango Venture Capital, with additional funds from Verizon Ventures, Robert Bosch Venture Capital GmbH (RBVC), CME Ventures and the company’s existing investors, Magma Venture Partners, Jerusalem Venture Partners and Dell Technologies Capital. iguazio accelerates the digital transformation of enterprise companies and simplifies real-time analytics at the edge, on-premises and in hybrid environments, complementing the offering of leading cloud providers. This new financing brings the company’s total investment to $48 million. “iguazio’s team has an outstanding track record of innovation and execution and we are delighted to back these stellar managers once again,” said Eyal Niv, Managing General Partner at Pitango. “While the majority of big data deployments fail due to over-complexity, iguazio’s platform has proven to be simple, fast and secure, making it exceptional for artificial intelligence and machine learning use cases. We’ve already received overwhelming feedback from beta customers generating actionable real-time insights with significant business impact.” “As one of the largest telecom companies in the world, we witness the importance of real-time continuous analytics and the way it has become crucial across businesses. Yet, there are not many existing scalable solutions,” said Merav Rotem-Naaman, Managing Director at Verizon Ventures Israel. “iguazio is aiming to become a trusted partner for companies looking to use data to make real-time...
---
### iguazio Selected as a Gartner Cool Vendor in Data Management, 2017
Delivering continuous analytics and offering greater simplicity, performance, security and agility for next generation applications HERZLIYA, May 4, 2017 – iguazio today announced it has been included in the “Cool Vendors in Data Management, 2017” report by Gartner, Inc. The company’s real-time continuous analytics platform simplifies the data pipeline to accelerate business insights and enhance security. According to the Gartner Cool Vendors in Data Management, 2017 report, “unified data access, multimodel DBMS approaches, metadata management techniques and real-time data integration top the list of innovations for CIOs and data and analytics leaders.” The report also finds that “real-time data integration and access to data remain core challenges for data and analytics leaders looking to modernize data management ecosystems.” “We are honored to be recognized by Gartner as a Cool Vendor in Data Management,” said Asaf Somekh, CEO of iguazio. “We’ve been helping our customers in IoT and financial services to dramatically simplify their data pipeline and generate real-time business insights. iguazio has taken the bold approach of completely redesigning the data stack with our high-performance unified data model architecture. Today’s digital transformation requires agile data consumption and we believe being one of Gartner’s Cool Vendors in Data Management is a testament to iguazio’s disruptive technology.” Data is at the center of iguazio’s continuous analytics platform: iguazio ingests, enriches, analyzes and serves data – securing data and allowing access to the same records simultaneously through streaming, object, file and database APIs in real-time. The iguazio solution integrates with the...
---
### iguazio Demos Industry’s First Integrated Real-time Continuous Analytics Solution
Complete re-thinking of the traditional data pipeline reduces time-to-insights from hours to seconds STRATA+HADOOP WORLD, SAN JOSE, March 14, 2017 – iguazio is demonstrating its groundbreaking continuous analytics solution at Strata + Hadoop World. Designed to help enterprises solve big data operational challenges and generate real-time insights, iguazio’s real-time continuous analytics platform reduces time to insights from hours to seconds, eliminating data pipeline complexities, while seamlessly integrating with Apache Spark and Kubernetes. Data is at the center of iguazio’s real-time continuous analytics platform, which simplifies the data pipeline and speeds it up: iguazio ingests, enriches, analyzes and serves data – all in one unified platform. It integrates with the open-source frameworks of Spark and Kubernetes to accelerate insight generation and enable rapid deployment of a variety of stateless analytics services and data processing tasks. iguazio’s platform secures the data and allows access to the same records simultaneously through streaming, object and database APIs. “Digital transformation is leading us to agile data consumption, and continuous analytics is its cornerstone. In different industries, especially in financial services, healthcare and IoT, organizations have a need to tackle the challenge of complexity across the entire data lifecycle,” said John L. Myers, Managing Research Director at Enterprise Management Associates – a Boulder, CO-based industry analysis firm. “iguazio’s continuous data consumption approach empowers organizations to ingest data into a unified repository and run multiple stateless processing engines which enrich, aggregate, infer and act on the data to enable this transformation.” iguazio early deployment customers benefitting from...
---
### iguazio Collaborates with Equinix to Offer Data-Centric Hybrid Cloud Solutions
Placing Governed Data and Analytics Closer to Their Sources, While Leveraging Amazon Web Services Compute Elasticity to Generate Business Insights at Extreme High Speeds AWS re:Invent, Las Vegas, NV, November 30th, 2016 – iguazio today announced that it is collaborating with Equinix (Nasdaq: EQIX) to power the
---
### iguazio Announces the World’s Fastest, Simplest and Lowest-Cost Enterprise Data Cloud
Delivers 100x faster performance and 10x lower cost for on-premises and hybrid cloud deployments STRATA+HADOOP WORLD NEW YORK 2016, NEW YORK, September 27, 2016 – iguazio today introduced its flagship product, the Enterprise Data Cloud platform, unleashing the full potential of megatrend applications and analytics for big data, the Internet of Things (IoT) and cloud-native applications. iguazio has pioneered a new service-driven approach to enterprise data management, redesigning the entire data stack to accelerate performance and bridge the enterprise skill gap. iguazio’s Enterprise Data Cloud is the only secure data platform-as-a-service deployed either on-premises or in hybrid cloud architectures, with self-service portals and APIs for developers and operators. Previewed at
---
### Iguazio Unveils World’s First Virtualized Data Services Architecture
Reveals Details of Extremely Efficient Architecture that Seamlessly Accelerates Spark and Hadoop, Busts Silos and Ends ETL SPARK SUMMIT 2016, SAN FRANCISCO, June 7, 2016 – iguaz.io, the disruptive company challenging the status quo for big data, the Internet of Things (IoT) and cloud-native applications, today unveiled its vision and architecture for revolutionizing data services for both private and public clouds. This new architecture makes data services and big data tools consumable for mainstream enterprises that have been unable to harness them because of their complexity and internal IT skills gaps. Data today is stored and moved between data silos optimized for specific applications or access patterns. The results include complex and difficult-to-maintain data lakes, constant data movement, redundant copies, the burdens of ETL (extract/transform/load), and ineffective security. While popular cloud services like Amazon’s AWS and Microsoft’s Azure Data Lake introduce some level of simplicity and elasticity, under the hood they still move data between different data stores, lock customers in through proprietary APIs and onerous pricing schemes, and, at times, provide unpredictable performance. Data is proliferating at an unprecedented pace — analyst firm Wikibon predicts the big data market will grow to $92.2B by 2026 — requiring a new paradigm for building and managing a growing and complex environment. With its first-ever high-performance virtualized data services architecture, iguaz.io addresses these challenges. Follow @iguazio to learn more about Iguazio.
---
### Iguaz.io Raises $15 Million in Series A Funding to Disrupt Big Data Storage
Herzliya, Israel, Nov. 25, 2015 – Iguaz.io, a provider of innovative data management and storage solutions for Big Data, IoT and cloud applications, today announced a $15 million Series A funding round. Led by Magma Venture Partners, the funding includes additional investments from JVP and large strategic investors. The iguaz.io founding team is comprised of former executives from successful technology companies in the fields of storage, cloud computing, high-speed networking, analytics and cyber-security. These companies include XtremIO (acquired by EMC), XIV (acquired by IBM), Mellanox (NASDAQ: MLNX), Voltaire (NASDAQ: VOLT, acquired by Mellanox) and Radvision (acquired by Avaya). Over the past two decades, the team has enabled enterprise customers to evolve their data centers and has helped leading cloud operators design and deploy their hyper-scale data centers. Iguaz.io will use the funding to continue growing its team of experts in the spaces of Big Data, storage, security and networking (see www.iguaz.io/careers). According to IDC, the market for Big Data will surpass $41 billion by 2018, growing six times faster than the overall IT market. “Big Data and cloud computing are creating tectonic shifts in the market, setting the stage for market disruptions,” said Yahal Zilka, Managing Partner at Magma Venture Partners. “The iguaz.io team, with its innovative approach, is well positioned to disrupt the market.” “Iguaz.io has attracted top talent with diverse, multidisciplinary skills and...
---
## Sessions
### Building Agent Co-pilots for Proactive Call Centers
---
### Real-time Agent Co-pilot Demo
---
### How to Manage Thousands of Real-Time Models in Production
---
### Beyond the Hype: Gen AI Trends and Scaling Strategies for 2025
---
### Agentic AI Frameworks: Bridging Foundation Models and Business Impact
---
### Deploying Gen AI in Production with NVIDIA NIM & MLRun
---
### Gen AI for Marketing - From Hype to Implementation
---
### Building Scalable Customer-Facing Gen AI Applications Effectively & Responsibly
---
### Implementing Gen AI in Highly Regulated Environments
---
### Transforming Enterprise Operations with Gen AI
---
### Improving LLM Accuracy & Performance
---
### LLM Validation & Evaluation
---
### Implementing a Gen AI Smart Call Center Analysis App
---
### GenAI for Financial Services
---
### Sheba Medical Center Improves Patient Outcomes and Experiences with AI
---
### How to Build an Automated AI ChatBot
---
### Demo: LLM Call Center Analysis with MLRun
---
### MLOps for Gen AI in the Enterprise
---
### MLOps for Generative AI
---
### MLOps for LLMs
---
### How Seagate Runs Advanced Manufacturing at Scale
---
### HCI’s Journey to MLOps Efficiency
---
### How to Easily Deploy Your Hugging Face Model to Production at Scale
---
### Breaking AI Bottlenecks with Iguazio + Amazon FSx for NetApp ONTAP
---
### From AutoML to AutoMLOps: Automated Logging & Tracking of ML
---
### Best Practices for Succeeding with MLOps
---
### Simplifying Deployment of ML in Federated Cloud and Edge Environments
---
### Predicting 1st Day Churn with Real-Time AI
---
### LATAM Customer Testimonial
---
### Git Based CI/CD for ML
---
### Scaling NLP Pipelines at S&P Global (IHS Markit)
---
### Building a Real-Time ML Pipeline with a Feature Store
---
### Automated Model Management for CPG Trade Effectiveness
---
### Automating & Governing AI Over Production Data on Azure
---
### How Feature Stores Accelerate & Simplify Deployment of AI to Production
---
### Handling Large Datasets in Data Preparation & ML Training Using MLOps
---
### Quadient Customer Testimonial
---
### NetApp Customer Testimonial
---
### Siemens on the Importance of Data Storytelling in Shaping a Data Science Product
---
### NVIDIA on Industrializing Enterprise AI with the Right Platform
---
### NetApp's Michael Oglesby on Building ML Pipelines Over Federated Data
---
### Product Madness (an Aristocrat co.) on Predicting 1st-Day Churn in Real Time
---
### Greg Hayes on Uniting Data Scientists, Engineers, and DevOps with MLOps
---
### NetApp’s Shankar Pasupathy on Building Scalable Predictive Maintenance
---
### Microsoft & GitHub on Git-Based CI / CD for Machine Learning & MLOps
---
### How to Deal With Concept Drift in Production with MLOps Automation
---
### Quadient’s Jason Evans on Saving Time & Costs Bringing AI to Production
---
### S&P Global’s Ganesh Nagarathnam on Bringing ML Pipelines to Production
---
## MLOps Terminologies
### Arithmetic Intensity
---
### Frontier model
---
### Context Window
---
### Excessive Agency
---
### Reasoning Engine
---
### LLM Orchestration
---
### AI Scalability
---
### LLM Tracing
---
### Human in the Loop
---
### LLM Monitoring
---
### Random Forest
---
### Prompt Management
---
### LLM Customization
---
### LLM as a Judge
---
### Chain-of-Thought Prompting
---
### LLM Embeddings
---
### Gen AI App
---
### AI Infrastructure
---
### Diffusion Models
---
### Generative Agents
---
### LLM Optimization
---
### LLM Temperature
---
### LLM Agents
---
### On-Premise AI Platform
---
### False Positive Rate
---
### True Positive Rate
---
### RLHF
---
### Fine-Tuning LLMs
---
### Prompt Engineering
---
### LLM Hallucinations
---
### Auto-Regressive Models
---
### AI Tokenization
---
### Model Behavior
---
### Baseline Models
---
### ML Stack
---
### LLMOps
---
### Transfer Learning
---
### Foundation Models
---
### Large Language Models
---
### Data Ingestion
---
### Data Pipeline Automation
---
### Risk Management
---
### MLOps Governance
---
### Model Evaluation
---
### Holdout Dataset
---
### Overfitting
---
### Open Source Model
---
### Cross-Validation
---
### Automated Machine Learning
---
### Recall
---
### Classification Threshold
---
### Regression
---
### Model Training
---
### Continuous Validation
---
### Model Tuning
---
### Noise in ML
---
### Model Accuracy in Machine Learning
---
### Explainable AI
---
### Drift Monitoring
---
### Model Serving Pipeline
---
### Image Processing Framework
---
### Kubernetes for MLOps
---
### GPU for Machine Learning
---
### Machine Learning Infrastructure
---
### Model Retraining
---
### Model Deployment
---
### Deep Learning Pipelines
---
### Model Management
---
### Feature Vector
Feature Vectors in Exploratory Data Analysis: In exploratory data analysis, researchers try to discover features from raw data. They may start with qualitative research, looking at visualizations and applying their domain expertise to deduce an idea that can transform an observation into feature vectors. For example, a feature vector in data mining represents a hidden pattern in large data sets, such as equity trading buy/sell signals derived from historical trading price and volume data. In natural language processing, the process of splitting sentences into distinct entities is known as tokenization. For instance, researchers could treat each word or phoneme as a unique token to generate feature vectors for further analysis and experiments. In computer vision, the RGB color scheme isn’t the only way to represent image pixels: there are also HSL (hue, saturation, lightness) and HSV (hue, saturation, value), and sometimes practitioners even use a monochrome scheme to reduce noise originating from color images. Ultimately, researchers explore different feature vectors to evaluate the performance of their predictive models. Once the feature design is ready, they move on to the next stage. Feature Vectors in Feature Engineering: Feature engineering is, in large part, the systematic process of generating feature vectors from raw data. There are, however, some obstacles to setting up such a process. First, we need a place to store generated feature vectors for later retrieval. We also need to update feature definitions from time to time to accommodate changes in...
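The tokenization-to-feature-vector step described above can be sketched with a minimal bag-of-words example (a hypothetical illustration using only the Python standard library, not code from the original article):

```python
# Hypothetical sketch: turning tokenized text into a bag-of-words
# feature vector, one dimension per vocabulary term.
import re
from collections import Counter

def tokenize(sentence):
    # Naive lowercase word tokenization; production NLP pipelines
    # typically use trained subword tokenizers instead.
    return re.findall(r"[a-z]+", sentence.lower())

def feature_vector(sentence, vocabulary):
    # Each vocabulary term becomes one dimension whose value is
    # that term's count in the tokenized sentence.
    counts = Counter(tokenize(sentence))
    return [counts[term] for term in vocabulary]

vocab = ["buy", "sell", "volume", "price"]
print(feature_vector("Sell on high volume, buy on low volume", vocab))
# → [1, 1, 2, 0]
```

A feature store would persist vectors like these under named feature definitions so they can be retrieved consistently at training and serving time.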
---
### Model Serving
---
### CI/CD for Machine Learning
---
### Feature Engineering
---
### Real Time ML
Want to learn more about real-time machine learning? Book a live demo here.
---
### Unsupervised Machine Learning
Want to learn more about industrializing unsupervised machine learning models? Book a live demo here.
---
### Kubeflow Pipelines
Want to learn more about extending Kubeflow into a full MLOps platform? Book a live demo here.
---
### Machine Learning Lifecycle
---
### Model Monitoring
---
### Concept Drift
---
### Feature Store
---
### ML Pipeline Tools
---
### ML Pipeline
---
### Operationalizing ML
---
### Enterprise Data Science
---
## Case Studies
### HCI Builds a Mature Enterprise MLOps Practice to Deploy 73+ Financial Use Cases
---
### LATAM Airlines Group Builds an ML Factory to Generate Business Impact Across the Organization
---
### Seagate Uses AI to Detect Defects on the Factory Floor, Reduce Cost and Improve Yield
---
### Sense Scales Chatbot Automation and HR Tech
---
### Sense Personalizes the Recruitment Experience with AI Chatbot Automation
---
### LATAM Airlines Group Drives Innovation with a Cross-Company AI Strategy
---
### LATAM Airlines Group Drives Innovation with a Cross Company AI Strategy
---
### S&P Global Makes Engineering Documents Searchable and Indexable with NLP
---
### Ecolab Reduces Time to AI Deployment from 12 months to 30 days
---
### Ecolab Deploys Predictive Risk Models
---
### Sheba Medical Center Improves Patient Outcomes and Experiences with AI
---
### NetApp Deploys Real-Time Predictive Maintenance
---
### NetApp Deploys Real-Time Predictive Maintenance & Advanced Analytics
---
### Quadient Saves Time and Costs Getting AI to Production
---
### Ecolab Breaks the Silos Between Data Scientists, Engineers and DevOps with New MLOps Practices
---
### Payoneer Uses Real-Time AI for Fraud Prevention
---
### PadSquad Predicts Ad Performance in Real Time Based on Multivariate Data
---
## Solutions
### Iguazio for Data Engineers
---
## Q&A
### Why should you combine traditional ML with LLMs?
Traditional machine learning (ML) has been reliably used for computational tasks due to its deterministic and reproducible nature. These models are excellent at handling numbers and structured outputs, which can then be leveraged by LLMs to ground responses in factual data, preventing LLM hallucinations. The key approach is to use the best tool for the job—traditional ML for structured, explainable tasks and LLMs for generative tasks. Explainability is crucial, as traditional ML models are much easier to interpret than LLMs, making them valuable in contexts where transparency is required. For more on explainability, see here. For a deeper dive on combining traditional ML with LLMs, check out this blog. How do gen AI and traditional AI complement each other? Check out this related question.
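As a rough sketch of this division of labor, a deterministic ML model produces the structured number and the LLM only phrases the answer around it. The toy price model and the `ask_llm` stub below are hypothetical placeholders:

```python
def predict_price(sqft):
    # Stand-in for a trained regression model: deterministic and reproducible
    return 50_000 + 120 * sqft

def grounded_answer(ask_llm, sqft):
    # The LLM never computes the number; it only phrases the ML model's output,
    # which grounds the response in factual data and limits hallucination.
    price = predict_price(sqft)
    prompt = (f"The predicted price is exactly ${price}. "
              "Write one friendly sentence reporting this number verbatim.")
    return ask_llm(prompt)

# Echo stub in place of a real LLM call, just to show the flow
reply = grounded_answer(lambda p: p, 1500)
# predict_price(1500) == 230000, so "230000" appears in the reply
```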
---
### What are the milestones of developing a multimodal agent?
When designing a multi-agent system for process automation, such as in a contact center, the process typically starts with analyzing historical data (e.g., call transcripts, user interactions) to understand how human agents complete tasks. An LLM-based model is often used to analyze this data and identify the common steps a person takes. For example, looking at calls from the past six months and understanding how agents authenticate users, determine intent, etc. Since there is variability in execution, a probabilistic model helps map out these actions in a structured way. Once a granular workflow is defined, SMEs validate and refine it, creating a detailed blueprint of human activities. This blueprint is then used to design agent milestones. First, decide whether workflows require one agent or a sequence of agents. If the workflow includes multiple tasks, choose an architecture with a sequence of agents. Each agent should be assigned a specific task that it can perform well, to optimize efficiency - e.g., one for user authentication, another for sentiment analysis, and another for breaking down complex requests. Finally, an orchestrator integrates these agents into a seamless system.
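The orchestration pattern described above might be sketched as follows; the agent names, state fields, and routing logic are illustrative placeholders, not a real contact-center implementation:

```python
# Each "agent" handles one task; the orchestrator runs them in sequence,
# passing shared state along - mirroring the blueprint-to-milestones flow above.

def authenticate_user(state):
    state["authenticated"] = state.get("user_id") is not None
    return state

def determine_intent(state):
    text = state["message"].lower()
    state["intent"] = "billing" if "invoice" in text else "general"
    return state

def route_request(state):
    state["queue"] = "billing-team" if state["intent"] == "billing" else "front-desk"
    return state

PIPELINE = [authenticate_user, determine_intent, route_request]

def orchestrate(state):
    for agent in PIPELINE:
        state = agent(state)
    return state

result = orchestrate({"user_id": "u123", "message": "Where is my invoice?"})
# result["queue"] == "billing-team"
```

In production each step would typically be an LLM-backed service, but the orchestrator's job stays the same: hand each specialized agent only the task it performs well.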
---
### Managing and Optimizing Costs in Production-Ready Generative AI
Open-source MLRun can be used for efficient resource management in a number of ways. A few examples include:
- Auto-scaling - automated resource allocation based on workload needs.
- Experiment tracking to compare models and choose the best-performing one without re-running the entire training pipeline.
- Serverless deployments with auto-scaling.
- Support for model quantization and pruning.
- Monitoring and logging for resource usage.
- Parallel pipeline execution and distributed compute capabilities.
- Micro-batching - processing multiple requests simultaneously, improving GPU utilization and lowering per-request costs.
Read more about auto-scaling GPUs, experiment tracking, and how to use open-source Nuclio for serverless deployment.
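Micro-batching, mentioned above, can be illustrated with a small stdlib sketch (this is not MLRun's actual API): collect several queued requests, up to a size or time limit, and hand them to the model as one batch:

```python
import time
from queue import Queue, Empty

def micro_batch(request_queue, max_batch=8, max_wait=0.05):
    """Collect up to max_batch requests, waiting at most max_wait seconds,
    so one model invocation can serve several requests at once."""
    batch = []
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        timeout = deadline - time.monotonic()
        if timeout <= 0:
            break  # time budget exhausted; serve what we have
        try:
            batch.append(request_queue.get(timeout=timeout))
        except Empty:
            break  # queue drained before the deadline
    return batch

q = Queue()
for prompt in ["a", "b", "c"]:
    q.put(prompt)
print(micro_batch(q))  # → ['a', 'b', 'c']
```

The `max_wait` knob trades a little latency for much better GPU utilization, since the per-request overhead is amortized across the batch.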
---
### What training is recommended to upskill (gen) AI talent in the organization?
Unlike traditional machine learning, generative AI introduces unique challenges and opportunities, necessitating specialized skills and a proactive approach to learning. To upskill talent for generative AI, training recommendations vary by role: Data scientists working with generative AI must go beyond standard analytical methods, developing expertise in prompt engineering and model output monitoring. This might require adapting AI model monitoring practices to ones that are specific to generative AI, for example addressing issues like hallucinations. Data engineers, software engineers and cloud architects focus on managing complex infrastructure needs for AI deployments. This requires staying current with cloud service offerings, AI-specific pipelines and model testing, as these are rapidly evolving. This knowledge is essential for transitioning from experimental models to production-ready deployments. Given the rapid pace of advancement in the AI field, it's critical for all professionals working with generative AI to prioritize continuous learning. This can be achieved by regularly engaging with content from industry analysts, AI vendors and thought leaders in the space. Blogs, newsletters, webinars and communities provide real-time insights into best practices, new tools and emerging trends. Additionally, attending industry events and joining online communities, such as AI research forums or professional groups on LinkedIn, helps professionals network with peers, share insights, and exchange knowledge about challenges and solutions.
---
### What are some examples of how gen AI is impacting operational and business results?
Gen AI positively impacts operational and business performance. For example, companies using gen AI-powered operator and technician co-pilots in maintenance have seen significant productivity and efficiency gains. These AI tools combine data from manuals, operating procedures and past root cause analyses, suggesting hypotheses and scenarios for troubleshooting. This allows technicians to troubleshoot issues more quickly and effectively on the production line. In traditional troubleshooting scenarios, root cause analyses could take hours or even days, but with Gen AI assistance, these steps are streamlined to seconds. This reduces breakdown time and boosts productivity, with some companies reporting up to 30-40% improvements in troubleshooting speed and 5-10% increases in line output due to faster response times. Similar gains have been observed in contract feedback and improvement processes, further enhancing productivity across value chains. Although these are still individual use cases, broader end-to-end transformations integrating these capabilities could yield even greater impacts in the future.
---
### How do gen AI and traditional AI complement each other?
Gen AI and traditional AI serve different purposes. They can be used separately and together. Traditional AI, such as classification and ML models, excels at specific tasks like pricing optimization and customer segmentation. These models are designed to be precise, stable and reliable. Gen AI, on the other hand, can be used for efforts like content creation, chatbots and agents, as a virtual assistant and more. Gen AI complements traditional AI by providing interpretive capabilities. For example:
- Generating informative reports or summaries based on traditional AI results
- A customer-facing chatbot that can route customer requests to traditional AI models and then answer queries, for example about package shipping times
- Generating synthetic data
- Analyzing the sentiment behind customer reviews, feedback, social media mentions, etc.
- Automating the labeling of textual data with high accuracy. See how this works here.
Rather than replacing traditional AI, Gen AI enhances its output, making insights more accessible and adding flexibility to existing systems.
---
### What are the recommended steps for evaluating gen AI outputs?
Gen AI outputs need to be evaluated for accuracy, relevancy, comprehensiveness, how they de-risk bias and toxicity, and more. This should be done before they are deployed to production and acted on, to avoid performance problems, ethical matters, legal issues and disruptions. Methods that can be used to evaluate outputs include:
- Comparing the results to the data source they were retrieved from
- Ensuring consistent responses by running similar prompts multiple times
- Using LLM-as-a-Judge to allow an additional LLM to evaluate the results
- Testing outputs and fine-tuning the model for adherence to industry-specific knowledge requirements or a specific brand voice
- Reviewing responses against a checklist of essential components for the given topic or field
- Implementing guardrails and filters for toxicity, hallucinations, harmful content, bias, etc.
- Implementing guardrails for security and privacy
- Continuous monitoring and feedback loops to ensure ongoing quality and relevancy
- Establishing LLM metrics to track the overall success of the AI model in meeting its intended purpose
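One of the methods above, running similar prompts multiple times to check consistency, might look roughly like this; `ask_llm` is a placeholder for a real model call:

```python
from collections import Counter

def consistency_score(ask_llm, prompt, runs=5):
    """Run the same prompt several times and return the most common answer
    plus the fraction of runs that agreed with it (1.0 = fully consistent)."""
    answers = [ask_llm(prompt) for _ in range(runs)]
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / runs

# Stubbed, deterministic "model" just to demonstrate the mechanics
answer, score = consistency_score(lambda p: "42", "What is 6 * 7?")
# score == 1.0
```

A low score flags prompts whose answers drift between runs; those outputs deserve a closer look (e.g., via LLM-as-a-Judge or human review) before reaching production.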
---
### Why is It Important to Monitor LLMs?
When deploying LLMs in production, monitoring prevents risks such as malfunction, bias, toxic language generation, or hallucinations. It allows for tracking application performance and helps identify issues before they impact users. In addition, logging interactions allows understanding user queries and the responses, which can be used for fine-tuning the model. There are multiple levels of monitoring:
- Functional Monitoring - Ensures that the LLM is operating correctly and identifies which models and versions are currently in use within applications. This helps manage risks if a model version becomes problematic or poses a security threat.
- Governance and Compliance - Centralized governance, according to the "AI Factory" approach. This helps organizations know which models and versions are being used, when updates or patches are needed, and when security risks require action.
- Resource Monitoring - Involves tracking the consumption of resources like CPU and memory by different applications or departments. This helps in identifying inefficient use of resources or applications that might be consuming too much without adding sufficient value.
---
### What Guardrails Can Be Implemented in (Gen) AI Pipelines?
Effective gen AI guardrails are required throughout the data, development, deployment and monitoring stages of AI pipelines. These guardrails help in mitigating risks such as biased outputs, data privacy breaches and toxic content generation. Here are some key examples of guardrails that can be implemented in gen AI pipelines:
- Prompt Engineering - Crafting input prompts to guide the AI model toward generating desirable outputs. These prompts discourage the model from generating toxic or biased outputs and provide context that steers the model in a safer direction.
- LLM as a Judge - Using an LLM to evaluate the output's compliance with predefined rules and standards. When a violation is detected, the LLM flags the issue or provides recommendations for modification to ensure alignment with the guidelines.
- Toxicity Measurement with Language Filters - Implementing language filters to measure and flag toxicity levels in generated outputs. These filters use pre-trained models or rule-based systems to detect offensive, harmful, or otherwise inappropriate language.
- Bias Detection and Mitigation - Techniques such as adversarial testing, fairness metrics and model retraining to identify and mitigate biases within AI models. This involves detecting biases towards specific populations, genders, or other demographic groups.
- Data Privacy Checks - Checks for sensitive information. This could involve using regular expressions (regex) to identify and redact personal information such as social security numbers, addresses, and other private data. This step allows complying with regulations like GDPR or CCPA and protecting user privacy.
- Hallucination Detection - Using knowledge bases or fact-checking algorithms to compare generated content...
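The regex-based data privacy check mentioned above can be sketched as follows; the patterns are deliberately simplified illustrations, and production systems typically use dedicated PII recognizers:

```python
import re

# Simplified patterns for demonstration only - real PII detection needs
# broader coverage (names, addresses, locale-specific formats, etc.)
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def redact_pii(text):
    # Replace each matched span with a labeled placeholder
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_pii("Reach me at jane@example.com, SSN 123-45-6789."))
# → Reach me at [EMAIL], SSN [SSN].
```

Running this as a pre-processing step on both training data and prompts helps keep private values out of model inputs and outputs.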
---
### RAG vs. Fine-tuning: When to Use Each One?
In RAG, the model queries an external dataset or knowledge base, typically using a vector space model where documents or data snippets are encoded as vectors. When a query (prompt) comes in, the model performs a similarity search to retrieve the most relevant information. This retrieved content is then used by the generative component of the model to craft a response that is informed by this external data. RAG is particularly useful in scenarios where the required knowledge is vast or frequently updated, such as news updates, scientific research, or detailed customer data. It ensures that the model can provide accurate and relevant information without needing frequent retraining. Fine-tuning is a technique where a pre-trained model is further trained (fine-tuned) on a smaller, specialized dataset. This secondary training phase adjusts the model’s weights to perform better on tasks specific to the characteristics of the new data. Fine-tuning involves continuing the training process of an already trained model but focuses on a narrower scope or a specific domain. This targeted training helps the model to better understand and generate responses that are aligned with the specific nuances and requirements of the target domain. Fine-tuning is helpful when the model needs to adopt a specific tone, style, or set of knowledge, such as legal terminology, technical support for a specific product, or company-specific guidelines. Fine-tuning is also used to implement "guardrails" or constraints that guide the model’s outputs to avoid undesirable content or biases. RAG and retraining are not mutually exclusive and...
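The retrieval step of RAG can be sketched with a toy word-count "embedding" and cosine similarity; real systems use learned embeddings and a vector database, so everything below is illustrative:

```python
import math

def embed(text, vocab):
    # Toy embedding: count of each vocabulary word in the text
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocab = ["refund", "shipping", "policy", "times"]
docs = ["refund policy and returns", "shipping times and tracking"]
query_vec = embed("how long are shipping times", vocab)

# Similarity search: rank documents by closeness to the query vector
best = max(docs, key=lambda d: cosine(embed(d, vocab), query_vec))
# best == "shipping times and tracking"
```

The retrieved chunk is then inserted into the prompt so the generative component can answer from up-to-date external data rather than from its frozen training set.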
---
### What are some tips and steps for improving LLM prediction accuracy?
Start by evaluating the LLM and understanding how well it performs. This involves testing the model with various inputs to understand its strengths and weaknesses. Establish clear metrics that reflect the goals you aim to achieve with the LLM. For instance, if the LLM is used for customer care, metrics could focus on accuracy and response time. Break the overall problem into small tasks. This approach allows you to address specific areas where the LLM underperforms. Optimize the LLM with different prompts. Experiment with different prompting strategies to find the most effective ways to guide the model's responses. This might involve tweaking the length, specificity, or format of prompts. You can also consider innovative approaches like fine-tuning the model on specific datasets, incorporating external knowledge bases, or using ensemble techniques where multiple models or prompts are used to generate a single output. Analyzing the model's responses to these variations can provide insights into more subtle aspects of its behavior and how it interprets instructions. Implement feedback loops where the model's predictions are regularly reviewed and corrected by human operators. This real-world feedback can be used to further train and refine the model. Looking for tools to assist with LLM prediction accuracy? Try open-source DeepEval or RAGAS. Additional metrics that can help you evaluate your LLM include Answer Relevancy, Faithfulness, Hallucination, and Toxicity. (These are all covered in DeepEval.) These metrics can be especially helpful when dealing with unstructured text data, which requires something to compare against. However, a metric like Faithfulness, for...
---
### Which gen AI smart call center app use cases are other companies implementing?
Call center companies can use gen AI for a wide range of use cases. Gen AI can assist with demand reduction, deflection and routing of calls, vendor optimization, improving operational efficiency and service optimization. Some of the most popular applications include:
- Co-pilots - Providing assistance and suggestions to agents in real time based on customer history, product capabilities, contracts, policies, etc.
- Post-call insights - Taking the data from conversations, emails and chats, embedding it into business analytics and deriving insights.
- Chatbots - Augmenting the use of automated chatbots to answer customer requests.
- Training - Simulations for new and experienced agents.
Companies that are able to combine their domain expertise with an understanding of their pain points, the gen AI tech, the future roadmap of gen AI products and the opportunities they hold, will be able to derive business value from gen AI apps.
---
### What is hyper-personalization in gen AI?
Gen AI has opened up new opportunities for segmentation and personalization. Access to real-time behavioral data and analytics allows catering to users based on what is appropriate for them at a given moment. Preferences can change for the same user during different moments, e.g. preferring text at one moment and a call at another. With hyper-personalization, organizations can provide an accurate and high-quality user experience, building trust, enhancing customer loyalty and growing their revenue. Hyper-personalization goes beyond the personalization techniques of the previous wave of AI, by understanding and catering to the nuances of individual preferences at a much higher scale. It represents a shift towards more intuitive, user-centric models of interaction, where services (like customer call centers) and content evolve along with the needs and desires of each user, enhancing user engagement, satisfaction, and loyalty. Using the insights gained from predictive modeling, generative AI systems can create or modify content, products, or services in real time to match the individual needs of each user (to see an example of this, watch this demo of a call center agent co-pilot). Hyper-personalization systems are dynamic, continuously learning from new data and user interactions. This allows the generative AI to refine its predictions and customizations over time, improving the relevance and accuracy of the personalized experiences it provides.
---
### How can organizations address risks in gen AI?
There are several risk factors to avoid when implementing gen AI. The most important ones are accuracy and hallucination. Despite the technological advancements, models hallucinate extensively and there are currently no good solutions for addressing this. Solutions like LLM-as-a-judge or RAG are incomplete, and many organizations put humans in the loop to label and tweak answers. This can also help with solving issues like data privacy, toxicity and bias. On the business level, enterprises must be aware that gen AI comes with unique risks that require a comprehensive strategic approach. Some of the most critical strategic initiatives are:
- Data governance: Implement robust data governance practices to ensure the quality, integrity, and security of data used by gen AI systems. This includes data collection, storage, processing, access control, and compliance with data protection regulations.
- Security measures: Implement robust security measures to protect gen AI systems from cyber threats, data breaches, unauthorized access, and malicious attacks. This includes encryption, access controls, authentication mechanisms, secure coding practices, and regular security audits.
- Continuous monitoring and evaluation: Continuously monitor and evaluate the performance, effectiveness, and impact of gen AI systems in enterprise applications. Identify emerging risks, trends, and issues and take proactive measures to address them.
For an overview and a few examples of gen AI risks, read the blog Implementing Gen AI in Practice. For a detailed approach, check out the book "Implementing MLOps in the Enterprise" by Yaron Haviv and Noah Gift.
---
### Can LLMs be implemented in non-English languages?
Iguazio supports any and all languages for gen AI applications. Today, there are projects in English, Turkish, Arabic, Portuguese, Spanish, Hebrew, Norwegian and more. New languages are implemented either through translations or by fine-tuning models in the new language. Fine-tuning and implementing LLMs in non-English languages involves several key steps and considerations:
1. Fine-tuning: Fine-tune the selected LLM on the collected data in the target language. This involves training the model on language-specific tasks or objectives, such as language modeling, text classification, machine translation, etc.
2. Evaluation: Evaluate the fine-tuned model on appropriate benchmarks or validation datasets to assess its performance, accuracy, and generalization ability in the target language.
3. Language-specific challenges: Address any language-specific challenges or nuances during the development process. These may include morphological complexity, syntactic differences, lack of labeled data, and domain-specific terminology.
4. Adaptation: Adapt the model to specific applications or use cases in the target language. This may involve domain adaptation, transfer learning, or customization of model outputs to meet the requirements of the application.
5. Testing and iteration: Test the developed LLM application rigorously in real-world scenarios to identify and address any issues or limitations. Iterate on the development process as needed to improve performance and user experience.
---
### What level of PII reduction accuracy in the AI pipeline is acceptable?
The acceptable level of PII reduction accuracy in an AI pipeline depends on various factors, including the specific use case, industry standards, regulatory requirements, and the sensitivity of the data involved. Generally, a higher level of accuracy is desirable to minimize the risk of exposure of sensitive information. However, it's essential to balance accuracy with other considerations such as performance, efficiency, and usability. Here are some factors to consider when determining acceptable PII reduction accuracy:
1. Regulatory requirements and industry standards: If your AI pipeline deals with PII, it must comply with relevant data protection regulations such as GDPR, CCPA, HIPAA, etc. These regulations may specify certain standards or requirements for the handling and protection of PII. Some industries may have established best practices or standards for handling sensitive data, including PII. Compliance with these regulations and standards may therefore necessitate a higher level of accuracy in PII reduction.
2. Risk assessment: It's a good practice to conduct a risk assessment to evaluate the potential consequences of inaccuracies in PII reduction. Consider the impact on individuals' privacy, the organization's reputation, legal liabilities, and the likelihood of data breaches or misuse.
3. Data sensitivity: The sensitivity of the PII involved should also influence the acceptable level of accuracy. Highly sensitive information, such as financial data or health records, may require a higher degree of accuracy compared to less sensitive information like public contact details.
4. Performance trade-offs: Achieving higher accuracy in PII reduction may come with trade-offs in terms of computational...
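A risk assessment usually starts from a measured accuracy number. One common way to quantify PII-reduction accuracy on a labeled sample is precision and recall over detected spans; for privacy, recall (how much real PII was caught) is typically the critical figure. The sample values below are illustrative:

```python
def precision_recall(detected, ground_truth):
    """Precision: fraction of flagged spans that were real PII.
    Recall: fraction of real PII spans that were caught."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_pos = len(detected & ground_truth)
    precision = true_pos / len(detected) if detected else 1.0
    recall = true_pos / len(ground_truth) if ground_truth else 1.0
    return precision, recall

detected = {"123-45-6789", "jane@example.com", "not-pii"}
truth = {"123-45-6789", "jane@example.com", "555-0199"}
p, r = precision_recall(detected, truth)
# p == 2/3 (one false positive), r == 2/3 (one missed PII span)
```

A pipeline handling highly sensitive data would typically set a recall threshold close to 1.0 and accept the extra false positives (lower precision) that come with it.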
---
### What is a LLM pricing strategy?
A Large Language Model (LLM) pricing strategy refers to how a company or organization determines the cost associated with using or accessing their language model, such as GPT-3 or similar models. Pricing strategies for LLMs can vary depending on factors such as the type of usage, the level of access, the volume of usage, and the target customer segments. Here are some common LLM pricing strategies:
- Pay-as-You-Go: In this pricing model, users are charged based on their actual usage of the LLM. It may involve pricing per token generated, per API request, or per hour of usage. This approach is flexible and allows users to pay only for what they use.
- Subscription Model: Some LLM providers offer subscription plans where users pay a fixed monthly or annual fee to access the model with certain usage limits. Subscriptions can offer cost predictability and may come with tiered pricing based on usage levels.
- Tiered Pricing: LLM providers may offer multiple pricing tiers with different features and usage limits. Customers can choose the tier that best suits their needs and budget, with higher tiers typically offering more features and higher usage limits at a higher price.
- Freemium Model: Some providers offer a free tier with limited functionality or usage to attract users, while offering premium paid tiers with more advanced features or higher usage limits. This strategy aims to convert free users into paying customers.
- Enterprise Plans: LLM providers may offer customized pricing for large enterprises or organizations that require high-volume access, specialized...
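A pay-as-you-go bill reduces to a simple per-1,000-token calculation. The rates below are made-up placeholders, since real per-token prices vary by provider and model:

```python
def request_cost(prompt_tokens, completion_tokens,
                 prompt_rate_per_1k=0.50, completion_rate_per_1k=1.50):
    """Cost in dollars for one request, billed per 1,000 tokens.
    Completion tokens are typically priced higher than prompt tokens."""
    return (prompt_tokens / 1000 * prompt_rate_per_1k
            + completion_tokens / 1000 * completion_rate_per_1k)

cost = request_cost(prompt_tokens=2000, completion_tokens=500)
# 2000/1000 * 0.50 + 500/1000 * 1.50 = 1.00 + 0.75 = 1.75
```

Multiplying this per-request cost by expected traffic is the quickest way to compare pay-as-you-go against a fixed subscription tier.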
---
### What are the risks of open source LLMs?
Open source Large Language Models (LLMs), like any other software, can pose security risks if not properly managed and used. Here are some of the potential security risks associated with open source LLMs:
1. Malicious use of generated content: Open source LLMs can be used to generate fake news, phishing emails, spam, or other malicious content. This can be a security risk as it can deceive users and spread misinformation.
2. Bias and discrimination: LLMs trained on open source data may inherit biases present in the training data, leading to biased or discriminatory outputs. This can have ethical and legal implications and potentially harm individuals or groups.
3. Privacy concerns: If an open source LLM is used to generate text that includes personal or sensitive information, there can be privacy concerns if this information is not properly protected or handled.
4. Unauthorized access: If an open source LLM is deployed on a server or in a cloud environment, there is a risk of unauthorized access to the model or the data it processes. Security measures must be in place to prevent such breaches.
5. Model poisoning: Malicious actors can attempt to manipulate the training data or the model itself to inject malicious code or biases. This can result in the generation of harmful or malicious content.
6. Data leaks: Open source LLMs may inadvertently leak sensitive information or proprietary data if not properly configured and secured. This could occur through generated text or via attacks on the model itself....
---
### How can LLMs help determine user intent?
User intent is a classification problem. By giving the LLM model different examples of classifications, it can help simplify the process, since LLMs understand nuances. For example, by classifying the difference between booking a ticket and booking a restaurant, LLMs can understand that “booking a table” means “booking a restaurant”, even though there is no use of the word “restaurant” in the statement. For more information, NLP has a vast amount of literature dedicated to determining the most efficient classification technique to recognize user intent from a sentence.
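The example-driven classification described above is often implemented as a few-shot prompt: labeled examples are embedded in the prompt so the LLM can classify a new utterance by analogy. The labels and `EXAMPLES` below are illustrative:

```python
# Hypothetical labeled examples for a booking assistant
EXAMPLES = [
    ("Book me a flight to Boston", "book_ticket"),
    ("Reserve a table for two tonight", "book_restaurant"),
    ("Get me two seats for the 8pm show", "book_ticket"),
]

def build_intent_prompt(utterance):
    """Assemble a few-shot classification prompt ending with the new utterance."""
    lines = ["Classify the user's intent. Examples:"]
    for text, label in EXAMPLES:
        lines.append(f'"{text}" -> {label}')
    lines.append(f'"{utterance}" ->')
    return "\n".join(lines)

prompt = build_intent_prompt("Book a table at an Italian place")
# The prompt now ends with the unlabeled utterance, ready to send to the model,
# which should complete it with "book_restaurant" despite the word
# "restaurant" never appearing in the input.
```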
---
### What’s the best way to get related context to feed the prompt using similarity search?
1. Go to the vector search.
2. Gather three to five chunks from the vector search.
3. Feed them into the LLM to generate a response.
4. Ensure a proper response.
If you weren't able to select the correct chunks, you can:
- Extract information from the question to refine the search
- Apply keywords in the indexing
For example, let's assume you have Uber's financial reports for 2021 and 2022. If a user asks a question about Uber's financials in 2022, you can extract "2022" and "Uber" from the question and use them as a keyword search to filter down to only Uber's financial document. If you don't have a 2022 document but you do have one that discusses how Lyft compared to Uber, or Uber's financials in 2023, vector search will be able to pick that one up. While the details won't be the most accurate for summarization, refinement using keywords and understanding of the text could help improve accuracy.
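The keyword-extraction step in the Uber example might be sketched like this; the document store, entity list, and extraction rules are all illustrative:

```python
DOCS = [
    {"company": "Uber", "year": 2021, "text": "Uber 2021 annual report"},
    {"company": "Uber", "year": 2022, "text": "Uber 2022 annual report"},
    {"company": "Lyft", "year": 2022, "text": "Lyft 2022 annual report"},
]

def extract_keywords(question, companies=("Uber", "Lyft")):
    # Pull the company name and a 4-digit year out of the question
    company = next((c for c in companies if c.lower() in question.lower()), None)
    years = [int(w) for w in question.replace("?", " ").split()
             if w.isdigit() and len(w) == 4]
    return company, years[0] if years else None

def filter_docs(question):
    # Narrow the candidate set by metadata before any similarity search
    company, year = extract_keywords(question)
    return [d for d in DOCS
            if (company is None or d["company"] == company)
            and (year is None or d["year"] == year)]

hits = filter_docs("What were Uber's financials in 2022?")
# hits → just the Uber 2022 document
```

Vector search then runs only over this filtered subset, which keeps the retrieved chunks on-topic and cheaper to score.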
---
### What are the privacy and security implications of using open source components in AI?
The topic of privacy and security is of utmost concern when dealing with AI, and specifically with generative AI. It has ethical, moral and legal implications, as can be seen in some of the recent data privacy lawsuits. One school of thought finds that open source is safer and more private. OpenAI, for example, trains on open sources like GitHub and Wikipedia. Some of the privacy concerns arise around how these solutions use the data users prompt. There have been cases where users were able to receive other users' private data, like phone numbers, as a response to their prompt. However, commercial solutions are not necessarily safer. When tuning your own model you own the data lifecycle and can control which data goes in. It's important to ensure that data pipelines filter out private information and do not emit it. This call center demo example uses an open source PII (personally identifiable information) recognizer to filter out private data patterns, like names, social security numbers, credit cards, etc. This ensures private data will not appear in the results when tuning or prompting the model.
---
### What's the best way to extract data from video with generative AI?
To summarize information and sentiment from videos, it's recommended to take a text classification approach. This is similar to the approach for managing audio streams. First, generate as many features as possible from the video stream. To do so, ask questions about the stream, e.g. "What is the sentiment of the video?", "What object is shown?", or "How many people are in the image?". Then, generate a table of metadata based on that information and structure the information in a database. Finally, look up the information and format it into text. Note: We don't recommend running queries that examine all the pictures, since this is not a cost-effective process.
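The metadata-table approach above can be sketched with an in-memory database; the per-segment answers (sentiment, people count) are illustrative values a vision model might have produced for each question:

```python
import sqlite3

# One row per video segment, one column per question asked about the stream
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_metadata (segment INT, sentiment TEXT, people INT)")
rows = [(0, "positive", 2), (1, "neutral", 1), (2, "negative", 3)]
conn.executemany("INSERT INTO video_metadata VALUES (?, ?, ?)", rows)

# Look up structured facts and format them as text for a summary -
# no need to re-examine any frames at query time
cur = conn.execute(
    "SELECT segment, people FROM video_metadata WHERE sentiment = 'negative'")
for segment, people in cur:
    print(f"Segment {segment}: negative sentiment, {people} people on screen")
```

Queries hit the cheap metadata table instead of the raw frames, which is what makes this approach cost-effective.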
---
### What’s the best way to perform search queries in documents with generative AI?
There are a number of use cases that require searching for and summarizing dynamic data. Examples include news websites, which require searching for news articles, or recruitment platforms, which require searching for candidates' CVs. The basic way to manage this need is to take the document as is and index it. However, the results may not be the best ones. Using associations, i.e. vector search, which is based on seeing a familiar word, might also result in hallucinations. Instead, it is recommended to perform data preparation: processing, structuring and keyword searching, based on the document's structure. Documents have a certain structure: a header, a body, tables, etc. Identify this structure. Then, determine which sections in the document highlight information you want to serve to your users and label them or add them as a question. The paragraph within that section would be the answer. This way, the LLM knows this section is the relevant one. Understanding the document structure is very important, because it reduces the chances of hallucinations and saves resources. If the data is being regularly updated in a dynamic way, it's important to also version the data, just like in data lakes and feature stores. Be wary of rushing to build a demo and abandoning engineering practices. When versioning, it's important to refer to which version of the data you are using in the index search with labels and keyword searches, perform rolling upgrades, conduct metadata-based filtering, and more.
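The structure-identification step described above can be sketched as a simple header-based splitter; the `# ` header convention and the CV example are assumptions for illustration:

```python
def split_by_headers(document):
    """Split a document on its headers so each labeled section can be
    indexed (and retrieved) on its own instead of as one raw blob."""
    sections, current = {}, None
    for line in document.splitlines():
        if line.startswith("# "):       # assumed header convention
            current = line[2:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {header: " ".join(body).strip() for header, body in sections.items()}

doc = "# Experience\nFive years in data engineering.\n# Education\nBSc in CS."
sections = split_by_headers(doc)
# sections["Experience"] == "Five years in data engineering."
```

Each header becomes the label (or question), and the section body becomes the answer served to the LLM, which narrows retrieval and cuts the chance of hallucination.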
---
### Why is it important for data scientists and DevOps teams to collaborate and communicate around GenAI and MLOps?
Collaboration is key for aligning on intents before implementation, to ensure the best results for the problem we are trying to solve. Collaboration does the glue work that ensures a consistent and streamlined workstream between data scientists, data engineers, designers, etc. For example, when implementing a human-centered approach for GenAI, joint decision making and sharing of practices and technologies help ensure the development of the right prompts that answer the customers’ needs and ethical requirements. Interested in learning more? Check out this 9-minute demo that covers MLOps best practices for generative AI applications. View this webinar with QuantumBlack, AI by McKinsey, which covers the challenges of deploying and managing LLMs in live, user-facing business applications. Check out this demo and repo that demonstrate how to fine-tune an LLM and build an application.
---
### Addressing Scalability and Performance Challenges in Generative AI
Training or even tuning a model requires a lot of computation, and you might need distributed frameworks. Two main distributed frameworks are used in this space. The first is Horovod, which interacts with a high-speed MPI layer for messaging underneath while running the distributed training over TensorFlow or PyTorch. The second popular framework in this space is Ray, which is also distributed Python. It allows you to distribute the workload and shard your model, whether scaling is based on the data or on the model, across multiple machines. There are also frameworks that address serving at scale. An LLM like GPT-3 requires a significant number of GPUs, say anywhere between four and eight; in budget terms, that’s roughly $20,000 - $30,000 per month for serving. Smaller models like OpenLLaMA may also perform well if you teach them enough knowledge, requiring only one or two GPUs. There are also frameworks that know how to partition your model across multiple GPUs. Currently, there is a lot of research on how to build distribution more efficiently, both for serving and for training.
---
### What are some best practices for establishing thresholds to trigger a new iteration of a generative AI model with new prompts in the ML lifecycle?
One method is to develop a second model that examines the first model and evaluates its responses. But this space and technology are constantly evolving, so there is currently no technological market leader or established player that can evaluate responses. This is expected to change as the space matures. Until then, it is recommended to experiment as much as possible while involving humans to validate.
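The judge-model idea above can be sketched as a threshold check over scored samples. Everything here is hypothetical: `call_judge_model` stands in for the second evaluating model (stubbed with a trivial heuristic), and the 0.7 threshold is an arbitrary example, not a recommendation.

```python
def call_judge_model(prompt, response):
    # Hypothetical second model that scores the first model's response
    # between 0 (bad) and 1 (good). Stubbed here with a length heuristic.
    return 0.9 if len(response) > 20 else 0.3

def needs_new_iteration(samples, threshold=0.7, escalate_to_human=None):
    # Score (prompt, response) pairs; if the average falls below the
    # threshold, flag the model for a new iteration with new prompts.
    scores = [call_judge_model(p, r) for p, r in samples]
    avg = sum(scores) / len(scores)
    if avg < threshold and escalate_to_human:
        escalate_to_human(samples, scores)   # keep a human in the loop
    return avg < threshold

flagged = needs_new_iteration([("q1", "short"), ("q2", "a sufficiently long answer")])
```

The `escalate_to_human` hook reflects the advice above: until automated evaluation matures, humans should validate whatever the judge flags.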
---
### How can organizations implement guardrails to ensure ethical use of AI?
A human-centered approach is the cornerstone of ethical AI. To implement this approach:
- Start by outlining the risks you want to avoid in terms of bias, transparency, explainability, fairness, toxicity, hallucinations and all the other dimensions that make up Responsible AI.
- Define the metrics you will use to measure the presence of these risks.
- Measure the developed models against these metrics to ensure their reliability and trustworthiness. For example, is the generated content compliant with the bias metrics?
You might develop custom-made algorithms that provide a layer of explainability about the models’ output. You can even develop an analytical engine that monitors the ML pipeline and its compliance with these metrics.
---
### Is there a privacy issue or data leaking risk with custom models that utilize proprietary or public data?
There have been cases where chat information was leaked, which makes it all the more important to implement guardrails and involve the legal team. This will help minimize the risk of a privacy breach. As a general rule of thumb, it’s recommended not to share private data with LLMs, since it might accidentally leak or be exfiltrated. This is especially important when running a POC with a public LLM. When the business hosts an internal LLM, the considerations might change.
---
### What is the key difference between fine tuning and embedding a foundational model?
Fine-tuning is the process of taking a pre-trained model and training it further on a smaller, domain-specific dataset. This additional training helps the model adapt to the nuances and requirements of the specific task or domain. Fine-tuning helps transfer the general knowledge learned during foundational training to specific tasks or domains, to enhance the model's performance on those tasks. Embedding a foundational model means integrating or incorporating the pre-trained model into another system or application without significantly altering its learned parameters. This means using the model's capabilities as they are. For example, the model could be used for generating text, answering questions, or any other task it was originally trained for.
---
### What are the steps in the MLOps workflow that are specific to LLMs?
Most MLOps steps were designed around structured data, so they need to be adapted for LLMs. This includes, for example, embeddings, tokenization, data cleansing, and more. In addition, steps like evaluation and monitoring are more complicated and require more innovative thinking. For example, a drift analysis requires building and comparing histograms. This could require text analysis, conversion to numeric values, clustering or another type of logic to compare the training data or the expected results with the actual results.
---
### How can LLMs be customized with company data?
Organizations that do not have an external dataset for training their models, or that want a model trained on their own data, can use prompt engineering or fine-tuning. Prompt engineering means feeding a model with engineered requests (prompts), which include specific content, details, clarifying instructions, and examples. These prompts guide the model toward the expected and most accurate answer. For example, ChatGPT is currently being trained on prompt engineering. Prompt engineering significantly improves model performance and the tailoring of AI outputs, ensuring they align with desired outcomes and ethical guidelines. It also simplifies user interactions by providing clearer instructions, and incurs lower computational costs than fine-tuning, since optimized prompts produce more accurate results with fewer resources. While it’s certainly possible to do prompt engineering on a 3rd-party model, it’s critical to understand the challenges of doing it that way:
- Inference performance
- Prompt size is usually limited
- Versioning sometimes changes the responses
- Company data is publicly exposed
Another viable option is fine-tuning. Fine-tuning involves taking a pre-trained model, which has already learned from vast amounts of data, and further refining it on a specific task or domain. The model is exposed to labeled data related to the target task, allowing it to adapt and specialize its knowledge to better perform on that particular task. This process is highly beneficial as it enables the transfer of general knowledge acquired during pre-training to more specific applications. Fine-tuning also helps enhance the model's performance, boost accuracy, and ensure...
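An engineered prompt of the kind described above typically combines instructions, company context, and few-shot examples. A minimal sketch, with entirely hypothetical content:

```python
def build_prompt(question, context, examples):
    # Engineered prompt: clarifying instructions, domain-specific context,
    # and few-shot examples that guide the model toward the expected answer.
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        "You are a support assistant. Answer only from the context below.\n"
        f"Context:\n{context}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Q: {question}\nA:"
    )

prompt = build_prompt(
    question="How long do refunds take?",
    context="Refunds are issued within 14 days.",
    examples=[("Do you ship abroad?", "Yes, to the EU.")],
)
```

Note that the whole string counts against the model's prompt-size limit, which is one of the 3rd-party challenges listed above.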
---
### How can costs be optimized when deploying LLMs?
Training and deploying LLMs can be a costly activity. This is because training and deploying LLMs requires substantial computational power, including high-performance GPUs. These computing resources are used for running multiple iterations of complex algorithms on massive datasets when training, as well as for fine-tuning and for deployment. The cost of renting or owning these resources, along with the electricity required to run them, contributes to the overall expense. Training a foundation model is extremely expensive (GPT-4 was trained on a cluster with 25K GPUs for over a month, at an estimated $10M), so it’s really not feasible for any but a select few technology companies. However, there is a large and growing landscape of open and commercial LLMs that organizations can leverage and customize. Customizing an LLM, whether via prompt engineering or fine-tuning and transfer learning, still comes with significant costs. MLRun and Nuclio are two open source solutions that can help optimize these customized LLM deployment costs. MLRun is an open source MLOps orchestration framework. It can help optimize training costs by using state-of-the-art algorithms and automated technologies like quantization and distributed training. Quantization enables using smaller, more cost-effective GPUs, and distributed training allows faster training, or even enables training at all when the user doesn't have access to large enough GPUs. To optimize serving, MLRun leverages Nuclio. Nuclio automates the data science pipeline with serverless functions. By leveraging those serverless advantages, Nuclio auto-scales and employs on-demand resource utilization automatically at each step of the...
---
### MLRun vs. Seldon: What's the difference?
Seldon is an MLOps solution, but it is not a serverless technology. Users need to provide Seldon with code, containers, YAMLs, etc. Then Seldon can be used as a framework for launching them; it will build the entire service, including auto-scaling and APIs. MLRun, on the other hand, orchestrates ML and MLOps end-to-end, including pipelines and serving capabilities. In addition, MLRun has metadata services for saving objects like models, datasets, features, etc. When training, MLRun automatically stores the model file along with information about the data schema, statistics for drift analysis, parameters, and more. In other words, MLRun supports glueless integration with the serving and monitoring layers without having to write a single line of code.
---
### Can MLRun be used with Amazon SageMaker?
MLRun can run as easily on Amazon SageMaker as it does on a local computer; in fact, it is environment-agnostic. For AWS users, the easiest way to install MLRun is to use a native AWS deployment. Here's how that works: MLRun operates with a server side and a client side. The client side can run anywhere, including AWS SageMaker, Azure ML, any Python environment, any notebook, and more. MLRun can also run on Kubernetes Minikube, with containers, with Docker Compose, on EKS, on a cloud Kubernetes environment, and more. The only steps required are to run pip install and configure environment variables. MLRun turns the requirements into a server function that uses auxiliary services. For running MLRun with Amazon SageMaker, MLRun provides two options:
- Using MLRun against workloads that will run on Amazon EKS
- Building a workflow around SageMaker services like SageMaker Autopilot
For more on all the ways Iguazio works together with AWS, including solution briefs, demos and more, check out the partner page. https://www.youtube.com/watch?v=ensWP77Yayo
---
### Can MLRun Support Models Built in AWS Dev Accounts for Upstream?
Yes. Here’s how it works: In MLRun, projects are the fundamental unit for working with the platform. First, users add or edit code and configurations in their project. Then, they run it and debug locally. Once ready, the code and configurations are pushed into a source repository (git) and tagged with labels. Finally, the code and configurations are loaded so they can run on development or production clusters. Users can load different versions of their projects, like development, staging and production. MLRun also integrates with CI systems. These CI systems enable automatically pushing code to the next required step. For example, to a development environment, to the testing phase, into a deployment flow with canaries, etc. Users can configure the CI to promote models to any environment or account. For more information from the docs on getting started with AWS and MLRun, see here. For more on all the ways Iguazio works together with AWS, including solution briefs, demos and more, check out the partner page.
---
### Does MLRun orchestrate on a Kubernetes operator or use a classic Helm chart?
MLRun is made up of multiple services: a UI, orchestration, security, RBAC, SSO, and more. These services are provided through multiple types of components, operators being one of them. MLRun uses operators when launching services like a Spark job or a real-time serving function; the operators take the requests and turn them into live services. MLRun supports eight different operators: Nuclio for real-time functions, an operator for online multi-stage serving graphs, a Spark operator, a Dask operator, and more. These operators are pluggable, which provides flexibility for adding more in the future. Operators are deployed as Helm charts, but CloudFormation is also supported.
---
### Can MLRun utilize GPUs when running ML jobs?
MLRun supports GPUs. When launching jobs or real-time functions like serving and data processing, you can turn on GPUs and specify how many CPUs and GPUs (or what ratio of each) are required. When building the underlying serverless components, MLRun knows how to automatically configure GPUs for optimal usage. For example, when running a training job and requesting a GPU, MLRun will configure the CUDA drivers automatically, enabling a seamless transition between using a GPU and not using one with a single flag. In this mask detection demo, you can see how to seamlessly move from distributed work on multiple GPUs to running locally without a GPU with the flip of a parameter.
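A configuration sketch of the above (it assumes a reachable MLRun environment, and `trainer.py` with its `train` handler are hypothetical names, so it is not runnable standalone):

```python
import mlrun

# Define a batch training job from local code and request GPU resources.
# with_limits sets the resource limits Kubernetes will enforce.
fn = mlrun.code_to_function(name="trainer", filename="trainer.py",
                            kind="job", image="mlrun/mlrun")
fn.with_limits(cpu="4", mem="8G", gpus=1)  # set gpus=0 to run the same code without a GPU

run = fn.run(handler="train", params={"use_gpu": True})
```

Flipping `gpus` (and a matching parameter in your code) is the "single flag" transition mentioned above.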
---
### How do I use MLRun for batch sizing?
A batch is a group of instances from the dataset. Batches are fed into the different phases of the MLOps pipeline, like the training phase, for processing. Batches are used for use cases that do not require real-time or online data, which makes them more resource-efficient and simpler to use. You can practice running batch inference with MLRun with this tutorial.
---
### How do I use MLRun for real-time streaming configuration?
Real-time streaming is the process of collecting and ingesting data and processing it in real time to answer business use cases. In MLOps architectures, streams store sequenced information, or records of data, that can be used for production and consumption. Consuming the data from the streams in real time enables running applications that require online data, like e-commerce or fraud detection. You can check out how to run a real-time pipeline with MLRun with the following demos:
- Model monitoring and drift detection (deploying a model to a live endpoint)
- Advanced model serving graph (deploying the graph as real-time serverless functions)
- Distributed (multi-function) pipeline example
---
### How do MLRun and Iguazio plug into the ML ecosystem?
MLRun is an open source MLOps orchestration tool at the core of the Iguazio MLOps Platform. MLRun integrates data preparation, model development, model and application delivery, and end-to-end model monitoring. MLRun provides an open and pluggable architecture, so you can build a pipeline in MLRun that launches jobs in Azure ML or SageMaker (and soon, in Databricks). Here are some examples of ways you can use other services with MLRun:
- Develop on SageMaker, deploy with MLRun
- Develop on Azure ML, deploy with MLRun
Once the model is generated, MLRun can build it into an application pipeline. With MLRun’s pluggable architecture, you’re not locked into a single environment.
---
### MLRun vs. Airflow vs. MLFlow
There is some overlap between MLFlow and MLRun, but they have totally different goals. MLRun is an end-to-end orchestration layer for ML and MLOps. It’s not primarily a tracking system, though it does offer that functionality. MLFlow is a way to track your experiments, a component in the experimentation phase. There are also some ways to define metadata in MLFlow. However, there are many parts of the MLOps lifecycle that MLFlow doesn’t cover, like applying automation, automating serverless functions, running jobs, model monitoring, preparing data logic, and so on. MLFlow is a great tool when you want to track experiments and show charts of experiment results, but it’s not a tool that will save time getting to production; Airflow, for its part, is a general-purpose workflow scheduler rather than an ML-specific operationalization layer. MLRun is for what we call AutoMLOps, where the entire operationalization process is automated. MLRun uses serverless function technology: write the code once, using your preferred development environment and simple “local” semantics, and then run it as-is on different platforms and at scale. MLRun automates the build process, execution, data movement, scaling, versioning, parameterization, output tracking, CI/CD integration, deployment to production, monitoring, and more. MLRun provides an open, pluggable architecture, so you have the option to use MLFlow (or any other tool) for the development side, and then use MLRun to automate the production distributed training environment without adding glue logic.
---
### Does Iguazio support data versioning and data labelling?
The short answer is yes. Everything in MLRun (Iguazio's open source MLOps orchestration framework that sits at the core of the Iguazio MLOps Platform) is versioned: data objects, feature sets, feature vectors, and models. There are actually two different versioning schemes, like in a git repository: hashes (unique identifiers) of a version, as well as tags and labels. Objects can carry a label, like production or development, and these can be changed. There is also the aspect of labeling the training set, and there are many different tools (that integrate with MLRun) that can do the labeling manually. Streaming data requires an automated approach, where the data gets labeled as it is ingested. For more on different approaches to automated labeling, check out chapter 2 in the book Implementing MLOps in the Enterprise by Yaron Haviv, Iguazio co-founder and CTO, and Noah Gift, MLOps expert and author, from O’Reilly.
---
### How do you define the overlapping responsibilities between data scientists, data engineers and MLOps engineers?
Overlapping areas of responsibility (as in cases where multiple roles are expected to do things like data preparation or model building) are a common problem on ML teams. By defining a “contract” between the different components of MLOps, as well as creating automations, you eliminate a lot of those challenges. Another way to make teams more efficient is for everyone to work on the same platform, using the same APIs and metadata databases. For instance: if a data scientist has built some data preparation code in a Jupyter notebook, it can be automatically turned into a pipeline that uses Spark. Now all you need is data from a data warehouse, and it will run the same business logic, just on a scalable machine. You’ve eliminated a lot of the friction between roles using automation. When a data scientist builds a feature, they don’t need to care about validation policies and biases. Instead, their contract is to define those features, and then move on to the training. The data engineer can take that definition and continue working on it—validating or cleaning the data in a different manner. One member of the team can build a serving pipeline just to test that it works, and then another member in charge of the production side can update it to work with GPUs or spot instances.
---
### Iguazio vs. MLRun vs. Nuclio: What's the Difference?
Nuclio and MLRun are two open-source technologies that the Iguazio team maintains. Nuclio is a serverless platform, and MLRun is an MLOps orchestration framework. These two open-source technologies are major components that run under the hood and power our enterprise offering, the Iguazio MLOps Platform. Read all about MLRun in the documentation here and join the Slack for community support here. Read about the technology behind the Iguazio MLOps Platform here, and the Iguazio documentation here.
---
### What Are the Tradeoffs Between a Data Lake and a Data Warehouse?
A data warehouse is for structured data that is typically used for reporting with consistent business data points (for example, metrics like quarterly sales). The structure of data in a data warehouse is well known, and it holds critical business data for reporting purposes. In a data warehouse, lots of work goes into structuring and cleaning the data upfront, so that it’s selective and useful. A data warehouse is housed in a costly proprietary system, like Oracle or Teradata, so it makes sense to store the most critical business data there. A data lake offers massive storage at a much cheaper price point, so you don’t need to know or care about the structure or content of the data. The approach with a data lake is to “dump it in, and deal with it later”. This kind of setup is for any kind of data science process, where the data could potentially hold some value, but exploratory analysis is required to uncover it. A data lake requires very little work up front, and heavy data engineering—processing, running transformations and calculations, etc.—to extract the value later. It's worth noting that data infrastructure changes along with the maturity of data use cases in organizations. Data warehouses are primarily for business intelligence, and data lakes are built once some kind of data science work has begun. Once multiple models need to be built and maintained, the next step of maturity is a feature store with advanced data transformations, where data scientists and...
---
### Batch Processing vs. Stream Processing: What’s the Difference?
Batch and stream processing are two methods used to process data, one of the steps in feature engineering. The processed data can be used in features for generating either real-time or batch predictions. In batch processing, historical static data is processed in batches for use in features. Batches might run at scheduled intervals, or run when compute resources are available. With batch processing, heavy feature computations can be performed offline, so the results are ready for fast inference. However, features do become stale as the real environment shifts over time, so it’s important to set up a drift-aware monitoring system for these features that includes a retraining step. In stream processing, predictions are made on real-time inputs with near-real-time or streaming features for a given entity. Predictions made with stream processing, and therefore with streaming features, can be of higher quality because more valuable data reaches the model. For instance, a recommender system with streaming features can make use of a user’s recent website behavior, combined with data like real-time inventory or purchase history. Stream processing is quite a bit more complicated to embed into an ML workflow, and requires a real-time feature store, mature streaming infrastructure, and an efficient development environment so data scientists and ML engineers can work together to validate new streaming features.
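The contrast can be sketched with a toy feature: a batch version recomputed offline over the whole history, versus a streaming version updated incrementally per event. This is a stdlib illustration of the concept, not a feature-store API.

```python
from collections import deque

def batch_feature(events):
    # Batch: recomputed offline on a schedule over the full history.
    return sum(events) / len(events)

def stream_features(events, window=3):
    # Stream: update a rolling feature one event at a time, in arrival order,
    # so the value stays fresh for real-time inference.
    buf, out = deque(maxlen=window), []
    for e in events:
        buf.append(e)
        out.append(sum(buf) / len(buf))
    return out

history = [10, 20, 30, 40]
batch = batch_feature(history)        # one stale-by-tomorrow value
stream = stream_features(history)     # a fresh value after every event
```

The streaming version reflects the latest events immediately, which is exactly what the recommender-system example above relies on.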
---
### Static Deployment vs. Dynamic Deployment: What’s the Difference?
In Static Deployment, the model is trained offline, with batch data. The model is trained once with features generated from historical batch data, and that trained model is deployed. In static deployment, model training is done on a local machine and then saved and transferred to a server to make live inferences. All predictions are precomputed in batch, and generated at a certain interval (for example, once a day, or once every 2 hours). Examples of use cases where this type of deployment is appropriate are situations like content-based recommendations (DoorDash’s restaurant recommendations, Netflix’s recommendations circa 2021) In Dynamic Deployment, the model is trained online, or with a combination of online and offline features. Data continually enters the system and is incorporated into the model through continuous updates. In this deployment scenario, predictions are made on-demand, using a server. The model is deployed using a web framework like FastAPI or Flask and is offered as an API endpoint that responds to user requests, whenever they occur. Examples of use cases where this type of deployment is used are real-time fraud detection and prevention, delivery or logistics time estimates, and real-time recommendations.
---
### How do I train a model on very large datasets?
Distributed computing tools are a great way to run training on very large datasets. Many ML use cases require big data to train the model, which complicates development and production processes. In cases where data is too large to fit into the local machine’s memory, it is possible to use native Python file streaming and other tools to iterate through the dataset without loading it into memory, though this method can be slow, since jobs run on a single thread. Parallelization can speed up the process, but you’re still limited by the resources in your local machine. Distributed computing tackles this challenge by distributing tasks to multiple independent worker machines, each handling a part of the dataset in its own memory with a dedicated processor. By using distributed training, data scientists can scale their code on very large datasets to run in parallel on several workers. This surpasses parallelization, where jobs run in multiple processes or threads on a single machine. There are two main options for transitioning to a distributed compute environment:
- Spark: A very mature and popular tool for data scientists who prefer a more SQL-oriented approach, especially in enterprises with legacy tech stacks and JVM infrastructure
- Dask: The go-to tool for large-scale data processing for those who prefer Python or native code. It has seamless integration with common Python data tools that data scientists already use. Dask is great for complex use cases or applications that don’t neatly fit into the Spark computing model.
More...
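The split-map-reduce pattern that Spark and Dask apply across machines can be illustrated on a single machine with the standard library; this thread-pool sketch is a stand-in for the distributed idea, not a substitute for either framework.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Per-worker partial computation over its own slice of the dataset.
    return sum(x * x for x in chunk)

def distributed_sum_of_squares(data, n_workers=4):
    # Split the dataset into chunks, hand each chunk to a worker,
    # then combine the partial results (the reduce step).
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)

result = distributed_sum_of_squares(list(range(1000)))
```

In Dask or Spark the chunks would be partitions living in the memory of separate worker machines, so the dataset no longer has to fit on any single node.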
---
### How do I move my batch pipeline over to real time?
Moving your pipelines from batch to real-time is a complex endeavor. In many cases, the pipeline needs to be redesigned with a new tech stack that supports event-based architectures. Before you even contemplate moving your pipeline online, you must determine whether the data that supports the pipeline can also be moved to (near) real-time (streaming tech is the most popular right now). If your batch pipelines are supported by daily ETL that takes hours to process, even if you move the pipeline processing online, the data doesn’t support that. Start with the data: Where does it originate? Can you get a hold of the data when it originates, such as when a user clicks a button on a mobile app? Solving these issues can involve changing source applications and using streams. If it can be done, the battle is halfway won! Now that you have access to data streaming from the source, you need a way to run that data through your pipeline logic at scale. Iguazio's serverless open-source framework Nuclio allows you to create complex pipelines with many different event-based triggers. It also scales horizontally to fit the demand of the workload.
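Nuclio functions are written as event handlers; each record arriving on a stream or HTTP trigger invokes the handler. The sketch below uses the real Nuclio handler shape (`handler(context, event)` with the record in `event.body`), but the pipeline logic and the stand-in event class are hypothetical so it can run locally.

```python
# Minimal Nuclio-style handler sketch: each incoming stream event triggers
# the pipeline logic at the moment the data originates.
def handler(context, event):
    record = event.body                        # e.g. a click event off a stream
    score = len(record.get("item", "")) * 0.1  # placeholder pipeline logic
    return {"user": record["user"], "score": round(score, 2)}

class FakeEvent:
    # Local stand-in for Nuclio's event object, just for testing the logic.
    def __init__(self, body):
        self.body = body

out = handler(context=None, event=FakeEvent({"user": "u1", "item": "sneakers"}))
```

Deployed on Nuclio, the same function would be wired to a stream trigger and scaled horizontally to match the event rate.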
---
### How do I automate the training pipeline with my CI/CD framework?
When automating a pipeline of any kind, whether it be ML pipelines like model training or feature engineering, the basic first task is to remove and/or replace any step in the pipeline that requires a human to do the work. There are instances where human logic is OK in a pipeline, such as “Review” and “Approve”, but generally we want everything else in the pipeline to execute without help from us. There are many tools out there for creating pipelines and job workflows. Here at Iguazio, we use Kubeflow Pipelines. Next, you need to design the pipeline in such a way that it’s robust and resilient. Think about how to answer questions like these:
- If my pipeline fails, can I easily restart it without manual fixes?
- If my pipeline executes multiple times in a row, do I still get the desired results?
- Is my pipeline dynamic in how it handles new data, parameters, scale, etc.?
- If my pipeline routinely has issues or risks, what are they and how do I mitigate them?
Before involving any CI/CD framework, you should be able to manually “trigger” the pipeline and answer these questions through some basic tests. Once you are satisfied with the pipeline, it’s time to automate it. What are the conditions under which this pipeline should run? If the answer is something along the lines of “every night at X time”, then you can accomplish that with a simple scheduling mechanism, which is oftentimes part of the...
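The restartability question above can be answered by making the runner record completed steps, so a restart skips finished work instead of needing manual fixes. A minimal sketch (the step names and in-memory `completed` set are illustrative; a real pipeline would persist this state):

```python
# Sketch: a pipeline runner that records completed steps so a restart
# skips work already done (idempotent re-execution, no manual fixes).
def run_pipeline(steps, completed=None):
    completed = completed if completed is not None else set()
    for name, fn in steps:
        if name in completed:
            continue                 # safe restart: skip finished steps
        fn()
        completed.add(name)          # checkpoint after each success
    return completed

log = []
steps = [("extract", lambda: log.append("e")),
         ("train", lambda: log.append("t"))]
done = run_pipeline(steps, completed={"extract"})   # simulate a restart mid-pipeline
```

The same checkpointing also makes repeated executions safe, which is the second question on the list.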
---
### What is the difference between data drift and concept drift?
The performance of a machine learning model degrades over time, absent intervention. This is why model monitoring is an important component of a production ML system. When a machine learning model’s predictions start to underperform, there can be several culprits. After ruling out any data quality issues, the two usual suspects are data drift and concept drift. It’s important to understand the difference between them, because they require different approaches. In data drift, the input has changed: the trained model is no longer relevant on the new data. In concept drift, the data distribution hasn't changed; rather, the interaction between inputs and outputs is different than before. In practice, this means that what we are trying to predict has changed. A classic example is spam detection: over time, spammers try new tactics, so the spam filters need to be retrained to react to these new patterns. For a deep dive on how to build a drift-aware production ML system, check out this blog. For a super-simple explanation of these concepts, check out Meor Amer's brilliant illustration of this idea.
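Data drift on a numeric input can be measured by binning the training and live values into histograms and comparing them. A minimal sketch using total variation distance (the bin count, range, and any alert threshold are illustrative choices):

```python
# Sketch of histogram-based drift detection: bin the training data and the
# live data, then measure how far the two normalized histograms diverge.
def histogram(data, bins, lo, hi):
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)   # clamp top edge into last bin
        counts[i] += 1
    total = len(data)
    return [c / total for c in counts]

def drift_score(train, live, bins=5, lo=0.0, hi=1.0):
    h1 = histogram(train, bins, lo, hi)
    h2 = histogram(live, bins, lo, hi)
    # Total variation distance: 0.0 = identical, 1.0 = fully disjoint.
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

same = drift_score([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4])
shifted = drift_score([0.1, 0.1, 0.2, 0.2], [0.8, 0.9, 0.9, 0.8])
```

A monitoring system would compute such a score per feature on a schedule and trigger retraining when it crosses a threshold; concept drift, by contrast, requires comparing predictions against ground-truth outcomes, not input histograms.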
---
### What is self-supervised learning in machine learning and how is it different from supervised learning?
Self-supervised learning is an evolving technique for helping ML models learn from more data, without the bottleneck of human-labelled datasets. In this technique, the model predicts a hidden part of the input from the unhidden part. This is similar to how you’re able to predict the missing letters in Wheel of Fortune: your brain fills in the hidden letters based on the clues in the letters you’re shown. By contrast, supervised learning is a technique where the ML model learns from labelled datasets, typically for tasks like classification. One benefit of supervised learning is that its performance can be measured during training, to assess how well the model has learned. So why would you choose one over the other? Machine learning models need to learn from massive amounts of good-quality labelled data. Finding or creating labelled data is a huge bottleneck, because it takes so long and costs so much – for instance, to teach a model to recognize a picture of a dog, a human has to label images as “dog” or “not a dog”. Practically speaking, it’s impossible for humans to label everything in the world (especially unstructured data), and there are some tasks for which there is simply not enough data, meaning any potential AI system would be limited by a small training set. Self-supervised learning addresses these limitations, allowing ML teams to scale the research and development of ML models at a low cost.
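The “predict the hidden part from the unhidden part” idea can be sketched as a toy masked-prediction task: training pairs are built from unlabeled text, with the masked-out token itself serving as the target, so no human labels are involved. The whitespace tokenizer and mask rate are illustrative simplifications.

```python
import random

def make_masked_pairs(tokens, mask_rate=0.3, seed=0):
    """Build (input-with-hole, target) training pairs from unlabeled tokens."""
    rng = random.Random(seed)
    pairs = []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
            pairs.append((masked, tok))  # the hidden token is the "label"
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
pairs = make_masked_pairs(sentence)

# Every pair is a self-supervised training example: the model sees the
# masked sequence and must predict the original token.
for inp, target in pairs:
    assert "[MASK]" in inp and target in sentence
```

Supervised learning, by contrast, would need a human to attach an external label (e.g., “spam” / “not spam”) to each sentence before training could begin.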
---
### How do I serve models for real-time enterprise applications?
You are basically asking for model serving: a way to manage and deliver your models to production in a secure and governed way. There are a few things you need to think about: How will my models be managed? How will my models be delivered (served) for inferencing? Do I need real-time or batch-level delivery? In its simplest form, you store or deploy the trained model to a remote repository known as a model server. Then, at runtime, you retrieve the model, pass features (inputs) into it and predict. There's a lot of value in this simple pattern. First, your models are stored in a central repository, which provides governance, shareability, versioning and reusability; storing a model should be as easy as a few function calls. Second, retrieving the model should also be as easy as a single function call. However, you must ensure the appropriate protocols are supported and secure. A great way to accomplish this is with MLRun Serving Pipelines, which use the Nuclio real-time serverless framework.
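A minimal sketch of the store/retrieve/predict pattern described above, using an in-memory registry; the names and the dict-based “model” are hypothetical stand-ins. A real model server would back this with versioned remote storage and a secured serving endpoint (e.g., MLRun Serving over Nuclio).

```python
import pickle

class ModelRegistry:
    """Toy central model repository: deploy and load by name + version."""
    def __init__(self):
        self._store = {}

    def deploy(self, name, version, model):
        # serialize once on deploy; real registries persist this remotely
        self._store[(name, version)] = pickle.dumps(model)

    def load(self, name, version):
        return pickle.loads(self._store[(name, version)])

def predict(model, features):
    # stand-in inference: threshold each input feature
    return [1 if x > model["threshold"] else 0 for x in features]

registry = ModelRegistry()
registry.deploy("churn", "v1", {"threshold": 0.5})  # a few calls to store

served = registry.load("churn", "v1")               # one call to retrieve
preds = predict(served, [0.2, 0.9, 0.7])            # pass features, predict
```

Versioned keys make it trivial to roll back to `("churn", "v0")` if a new deployment underperforms, which is part of the governance value of a central repository.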
---
### What can I do with model monitoring?
In a nutshell, model monitoring allows a data scientist or DevOps engineer to keep track of a machine learning model after it has gone into production. This includes keeping track of resources, latency, invocations, and data/concept drift. With this set of tools, we can address several types of use cases:
- Track how often and how quickly your model is being used. If a certain time of day is much busier than others, this becomes immediately apparent and allows your team to plan accordingly. This is especially useful in conjunction with viewing the consumed cluster resources, which allows for overall capacity planning and may reveal opportunities to configure auto-scaling behavior.
- Calculate data and concept drift. This gives insights on the incoming data as well as the outgoing predictions from your model – both of which can drift, and both of which can be indicative of problems.
- Trigger events based on drift. Based on a drift event, it is possible to start a re-training pipeline that trains and deploys an updated version of the model; this is known as continuous training. It is also possible to run a statistical analysis of your training data vs. the received live data to see where the differences lie.
These abilities are non-exhaustive, but rather some of the many capabilities provided by proper model monitoring.
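The first use case above – tracking how often and how quickly a model is used – can be sketched as a small wrapper that records invocation counts and latencies per endpoint. In practice these numbers would be shipped to a time-series store and dashboard rather than kept in memory; the endpoint name is illustrative.

```python
import time
from collections import defaultdict

class Monitor:
    """Record invocation counts and latencies for wrapped model endpoints."""
    def __init__(self):
        self.invocations = defaultdict(int)
        self.latencies = defaultdict(list)

    def track(self, name, fn):
        def wrapped(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            self.invocations[name] += 1
            self.latencies[name].append(time.perf_counter() - start)
            return result
        return wrapped

monitor = Monitor()
# stand-in model: flag any score above 0.5
predict = monitor.track("fraud-model", lambda xs: [x > 0.5 for x in xs])

for _ in range(3):
    predict([0.1, 0.9])

assert monitor.invocations["fraud-model"] == 3      # how often it was used
assert all(t >= 0 for t in monitor.latencies["fraud-model"])  # how quickly
```

Bucketing the recorded timestamps by hour would surface the busy-period pattern mentioned above and feed directly into capacity planning.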
---
### Why is model monitoring so important?
After spending a long time developing and training our model, it’s finally time to go to production. But how do you know whether your model is still making accurate predictions a week or a month from now? How do you know how many resources the model is using? In short, model monitoring keeps track of your model after it goes into production. There are several facets of monitoring, including:
- Kubernetes resources: How many resources are we using (CPU, memory, GPU)?
- Latency: How long does inferencing take?
- Number/average invocations: How often is our model being used?
- Data drift: How different (statistically) is the incoming live data from the data we used to train the model?
- Concept drift: Has the meaning (statistical properties) of our prediction target changed?
While all of these facets are important, some are easier to compute than others. Natively, Kubernetes reports resource utilization, and it’s possible to get latency and invocation information from most model serving frameworks. However, data and concept drift are incredibly difficult to compute in real time and on an ongoing basis. This is one of the cases where an MLOps platform such as Iguazio will aid in deploying your models with built-in model monitoring and dashboards.
---
### Where does a feature store fit into the ML lifecycle?
Where the feature store fits into the overall ML lifecycle depends on its functionality. Some feature stores are purely for storing and retrieving batch datasets. These would likely be used after an ETL job from a data engineer, to store the newly transformed data, or after a feature engineering job from a data scientist, to store the newly created features. From there, the end user can browse the feature store and see what is available to retrieve. Because this kind of feature store is purely for storing and retrieving features, it has more limited uses within the overall ML lifecycle. More advanced feature stores support batch and real-time transformation engines, real-time feature serving, and model monitoring integration. This allows the feature store to be used throughout the entire ML lifecycle, from data ingestion/transformation through model training and deployment to model monitoring. Keeping all of these processes centralized in a single place allows for standardization across an organization, as well as glue-less integration between the different aspects of your ML lifecycle.
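As a rough illustration of the “store and retrieve” core described above, here is an in-memory sketch: features are grouped into named feature sets and fetched by entity key. The set and feature names are hypothetical; real feature stores add versioning, point-in-time retrieval, and an online/offline split on top of this idea.

```python
class FeatureStore:
    """Toy feature store: ingest feature sets, retrieve features by entity."""
    def __init__(self):
        self._sets = {}

    def ingest(self, feature_set, records):
        # records: {entity_key: {feature_name: value}}
        self._sets.setdefault(feature_set, {}).update(records)

    def get_features(self, feature_set, entity_key, names):
        row = self._sets[feature_set][entity_key]
        return {n: row[n] for n in names}

store = FeatureStore()

# A data engineer's ETL job (or a data scientist's feature job) ingests:
store.ingest("customer_stats", {
    "cust_42": {"avg_order": 31.5, "orders_30d": 4, "churn_risk": 0.12},
})

# At training or serving time, a model fetches exactly the features it needs:
feats = store.get_features("customer_stats", "cust_42",
                           ["avg_order", "orders_30d"])
```

The same `get_features` call serving both training and inference is what gives a feature store its consistency guarantee between offline and online use.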
---
### Who benefits from a feature store?
Aside from the technical benefits of a feature store, one of the main benefits is organizational. In a typical enterprise ML team, the data engineers and data scientists have their own siloed workflows. As an over-simplification: data scientists ask data engineers for new features, and the engineers provide them. In this workflow, data scientists are decoupled from the data sources and are often unaware of problems in providing and transforming the data. Likewise, data engineers are decoupled from the usage of the data and are often unaware of problems in implementing and utilizing it. In this context, a feature store helps both parties. It allows both teams to gain a better understanding of the existing features, data sources, transformations, etc., and it empowers each side to aid the other. For example, data scientists gain the ability to ingest from various data sources and perform transformations in batch and real time, while data engineers gain the ability to see what features are currently available and retrieve them easily in batch and real time. In this way, organizations can use a feature store to ease the friction at hand-off points between teams.
---
### How do I select monitoring metrics specific to my use case?
To maintain the accuracy of an ML model in production, and detect drops in performance, it can sometimes be useful to create custom metrics that are specific to the product. Coming up with the right mix of metrics to monitor can be challenging. To get started, consider these two questions:
1. What are the expectations of the end user? Think of the metric as a basic user story. For a healthcare mobility use case, a user story might be something like: “As a hospital worker who needs to triage patient care, I would like the most time-critical patient cases to be easily accessible, and therefore placed high on my screen.” Which metrics have an impact on your end user’s experience?
2. What is a bad user experience for your use case? Instead of looking at ideal or typical experiences, consider some edge cases. What happens when your service delivers a bad user experience? This can be instances where the model delivers a fallback response, a low-certainty response, or even an empty response. Your model monitoring should be able to catch these instances.
Bugs in any number of code locations can and do happen regularly at every enterprise. For ML models serving critical functions, real-time metrics ensure that any drop in performance can be addressed ASAP.
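Question 2 above can be turned into a concrete custom metric: track the fraction of responses that were fallbacks, empty, or low-certainty, and alert when it crosses a product-specific threshold. The response fields and the 5% threshold below are illustrative assumptions, not a standard schema.

```python
class BadExperienceMetric:
    """Count 'bad user experience' responses and alert past a threshold."""
    def __init__(self, threshold=0.05, min_confidence=0.3):
        self.threshold = threshold
        self.min_confidence = min_confidence
        self.total = 0
        self.bad = 0

    def record(self, response):
        self.total += 1
        # fallback, empty, or low-certainty responses all count as "bad"
        if (response.get("fallback")
                or not response.get("text")
                or response.get("confidence", 1.0) < self.min_confidence):
            self.bad += 1

    @property
    def bad_rate(self):
        return self.bad / self.total if self.total else 0.0

    def should_alert(self):
        return self.bad_rate > self.threshold

metric = BadExperienceMetric()
for r in [{"text": "ok", "confidence": 0.9}] * 18:
    metric.record(r)                                # 18 good responses
metric.record({"fallback": True})                   # fallback response
metric.record({"text": "", "confidence": 0.9})      # empty response

assert metric.should_alert()   # 2 bad out of 20 = 10%, above the 5% line
```

In production this counter would be computed over a sliding time window and wired to an alerting channel, so a regression is caught within minutes rather than at the next offline evaluation.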
---
### Data preprocessing vs. feature engineering
What is Data Preprocessing? Data preprocessing is the process of cleaning and preparing raw data to enable feature engineering. After getting large volumes of data from sources like databases, object stores and data lakes, engineers prepare the data so that data scientists can create features. This includes basic cleaning, crunching and joining different sets of raw data. In an operational environment, this preprocessing would run as an ETL job for batch processing, or as part of a streaming process for live data. Once the data is ready for the data scientist, the feature engineering part begins. What is Feature Engineering? Feature engineering is the creation of features from raw data. It includes:
- Determining the required features for the ML model
- Analysis for understanding statistics and distributions, implementing one-hot encoding and imputation, and more. Tools like Python and Python libraries are used.
- Preparing features for ML model consumption
- Building the models
- Testing whether the features achieve what is needed
- Repeating the preparation and testing process by running experiments with different features – adding, removing and changing them. During the process, the data scientist might find that data is missing from the sources, and will request preprocessing again from the data engineer.
- Deployment to the ML pipeline
How Do Data Preprocessing and Feature Engineering Relate? In preprocessing, data engineers get and clean data from the sources to be used for feature engineering. Feature engineering is the part where the actual features are created.
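The split between the two roles can be sketched in a few lines: preprocessing cleans and de-duplicates the raw records (the ETL side), while feature engineering turns clean rows into model inputs via one-hot encoding and imputation. All field names are illustrative.

```python
RAW = [
    {"user": "a", "country": "US", "age": 31},
    {"user": "b", "country": "us", "age": None},   # messy casing, missing age
    {"user": "a", "country": "US", "age": 31},     # duplicate row
]

def preprocess(rows):
    """Data engineer's side: normalize values and drop duplicate records."""
    seen, clean = set(), []
    for r in rows:
        r = dict(r, country=r["country"].upper())
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            clean.append(r)
    return clean

def engineer_features(rows, countries=("US", "DE")):
    """Data scientist's side: one-hot encode country, impute missing age."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(ages) / len(ages)               # imputation value
    feats = []
    for r in rows:
        f = {f"country_{c}": int(r["country"] == c) for c in countries}
        f["age"] = r["age"] if r["age"] is not None else mean_age
        feats.append(f)
    return feats

clean = preprocess(RAW)            # ETL output: 2 clean rows
features = engineer_features(clean)  # model-ready inputs
```

If `engineer_features` discovered that, say, the `country` field were missing entirely, that would be the moment the data scientist loops back and asks the data engineer for another preprocessing pass, exactly the cycle described above.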
---
### How are ML pipelines evolving to make way for MLOps?
An ML pipeline is an automated sequence of steps that takes a model to production. In each step, different pieces of code run that handle different types of processing within the pipeline. A typical pipeline needs different functions that handle actions like collecting data sets, cleaning and preparing the data, transforming the data, creating feature sets, running training processes, selecting the best model from the training process, deploying the model on an inference layer, and finally, monitoring the model. An operational pipeline needs to support running those processes in a scalable way, so it needs to run on a framework like Kubernetes, which enables you to scale up or down based on your load. The pipeline also needs to support different frameworks: for example, you may want to run your data preparation using Spark or Dask, or, when it comes to processing data in real time, you may need to incorporate frameworks that can read streaming data, such as Nuclio or Spark Streaming. Another important capability when creating a pipeline is the ability to track and capture relevant metrics and logs, so the user can easily compare different runs and identify issues within the pipeline and their root cause. Businesses seeking to generate profit with machine learning will benefit from MLOps. Businesses that are already further along on their data science journey will be more likely to see the advantages of MLOps...
---
### What businesses benefit the most from MLOps?
In short, any business with a data science team that produces ML models that address business operations. The phrase “data-driven” is by now a cliche in the business world. Business decision-makers keep an eye on the bottom line with the help of internal data infrastructures. Depending on the industry, the method by which this value is extracted can look very different. In general, any business that collects and stores data can also train a model. Depending on the use case, models can deliver predictions and insights, and detect risks with data generated by the business. MLOps becomes relevant when the model creation process is iterated, automated, and monitored so that the model continues to produce relevant results and business value as data continues to flow.
---
### What are Kubeflow Pipelines?
At a high level, the concept of pipelines in ML refers to a way of linking the sequential components of an ML project's workflow, and their relationships. In ML projects, it is challenging to keep track of when and where steps like data prep, training, and monitoring take place, so pipelines serve as a step-by-step map that describes what needs to happen and when. Working with pipelines makes ML projects easily composable, shareable, and reproducible. Kubeflow Pipelines in particular is a set of services and a UI that enable the user to create and manage ML pipelines. Users can write their own code or build from a large set of pre-defined components and algorithms contributed by teams at companies like Google, IBM, Amazon, Microsoft, NVIDIA, Iguazio, and others. Much like a function (ingesting inputs and parameters, and producing outputs), each Kubeflow component is Python code packaged in a Docker image that executes one step within the ML pipeline. Kubeflow launches one or more Kubernetes pods for each step in your pipeline. Kubeflow can also run on Nuclio, a high-performance serverless platform that runs over Docker or Kubernetes and automates the development, operation, and scaling of code. The benefits of using Kubeflow are:
- Scalability: Easily spin up more resources when needed, and release them when you don’t.
- Composability: Each step is independent, which simplifies the orchestration of the whole pipeline – i.e., you can use many different ML-specific frameworks.
- Portability: Easily compose each step of the pipeline in one place without...
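The component idea can be sketched in plain Python, rather than with the actual Kubeflow SDK: each step is an isolated function with explicit inputs and outputs, and the pipeline wires them together in order. In Kubeflow Pipelines each step would instead run as a containerized component in its own Kubernetes pod; the step logic below is a toy stand-in.

```python
def ingest():
    """Step 1: produce a raw dataset (stand-in for a data source)."""
    return [3.0, 1.0, 4.0, 1.5]

def prepare(data):
    """Step 2: scale values into [0, 1]."""
    lo, hi = min(data), max(data)
    return [(x - lo) / (hi - lo) for x in data]

def train(data):
    """Step 3: produce a 'model' artifact (here, just summary statistics)."""
    return {"mean": sum(data) / len(data)}

def pipeline(steps):
    """Chain steps: each consumes the previous step's output artifact."""
    artifact = None
    for step in steps:
        artifact = step(artifact) if artifact is not None else step()
    return artifact

model = pipeline([ingest, prepare, train])
assert 0.0 <= model["mean"] <= 1.0   # trained on scaled data
```

Because each step only communicates through its inputs and outputs, steps can be swapped, reused across projects, or scaled independently, which is the composability and portability benefit listed above.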
---
### What are the pros and cons of MLOps?
In the race to solve business problems, many companies have invested considerable capital into becoming data-driven. Pros: Platforms like Iguazio allow for collaboration through their UI and feature store, continuous integration and deployment, and model monitoring. Cons: Initial costs. The only respect in which MLOps could be considered a “con” is its cost when viewed outside a long-term cost-benefit analysis. That is, introducing MLOps to your firm might look expensive if you think short-term. However, if you project the benefits of driving your decisions with data, then the only thing you should consider is whether to build or buy your MLOps solution.
---
### What are the key components of a successful MLOps strategy?
The key components of a successful MLOps strategy revolve around having standards and best practices that help you develop your ML service in a way that is almost production-ready from day one. It’s very important to have a production mindset even while you’re still in the development phase. This includes:
1. Tools & Frameworks: Choose the right tools for your data scientists, data engineers, and DevOps. These tools and frameworks should be used in both development and production environments, in a way that smooths the transition between the two.
2. Building Blocks: Build the right building blocks for data acquisition, feature engineering, model training, model serving, model monitoring, and governance. For example, feature stores can help with feature engineering, which in many cases is considered the hardest task when building ML pipelines. Feature stores are a catalog of features that data scientists and engineers can leverage. With feature stores, data scientists can reuse features instead of creating duplicates, and they can run complex feature engineering tasks using a simple, abstract API.
3. Strategy: A strategy that covers all of these steps, so that when you start, you can feel confident you have the right best practices and standards to complete each one in a way that reduces the time it takes to get those components to production. The state of mind should be that the development process is production-ready from the get-go.
---
### What are a feature store’s capabilities?
**Robust Data Transformation and Real-Time Feature Engineering** A feature store provides a means for creating lists of features in logical groups, ensuring they are ready for training or inference. However, a feature store is much more than just a catalog. It is also a data transformation service for creating features. Data scientists can easily create features using simple APIs that allow them to build complex functions, including aggregations, sliding windows, joins, custom functions, etc. By using these abstract APIs, data scientists can create complex features while reducing their dependency on data engineers. Advanced feature stores can handle features for batch processing as well as real-time feature engineering. As a result, the same abstract API can be used for calculating features in real time based on event streaming (e.g., streams coming from Kafka, Amazon Kinesis, etc.).
**Write Features Once** Data scientists often create features while preparing their models for training. However, once they need to take them to the operational production pipeline, they hand them over to the data engineers, who then need to rewrite the code in Spark or Java to make it ready for production. This process becomes much easier with a feature store: through the feature store APIs, features can be reused for both training and online inference, without the need to rewrite the code for production.
**Monitoring and Drift Detection** Alongside the feature definitions and the code for creating them, feature statistics are also captured. Advanced feature stores are...
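The sliding-window aggregation that a feature store API abstracts away can be sketched as a rolling average over the last N events, the kind of feature (e.g., “average transaction amount, last 3 events”) that must be computable at serving time as each stream event arrives. The window size is illustrative.

```python
from collections import deque

class SlidingAverage:
    """Rolling average over the last `window` events, updated per event."""
    def __init__(self, window):
        self.buf = deque(maxlen=window)
        self.total = 0.0

    def update(self, value):
        if len(self.buf) == self.buf.maxlen:
            self.total -= self.buf[0]   # oldest value falls out of the window
        self.buf.append(value)
        self.total += value
        return self.total / len(self.buf)

avg = SlidingAverage(window=3)
results = [avg.update(v) for v in [10, 20, 30, 40]]
assert results == [10.0, 15.0, 20.0, 30.0]   # window slides after 3 events
```

A feature store runs this kind of incremental computation against the event stream (Kafka, Kinesis, etc.) so the freshest value is already materialized when the model asks for it, instead of being recomputed per request.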
---
### What is MLOps, and why should we care?
MLOps (Machine Learning Operations) is a combination of machine learning and DevOps principles. It uses DevOps concepts to manage the entire lifecycle of developing and deploying machine learning models. When developing a new ML-based service, the data science element is just the first step. Making the model operational in the live environment is where the complexity lies, and that’s what MLOps addresses. With MLOps, the process of deploying a model is accelerated and streamlined, minimizing the level of DevOps effort, reducing time to market and improving model accuracy. MLOps seeks to increase automation and achieve a CI/CD approach to releasing production models. MLOps builds and automates the entire machine learning lifecycle. These steps are:
- Data collection and processing
- Feature engineering
- Model training
- Model deployment
- Model testing
- Monitoring / drift detection
- Drift remediation
Governance is another important element of MLOps. Enterprises need tools and processes to ensure data quality and security, enable explainability, and allow appropriate data access for auditing purposes.
---