How to Build Real-Time Feature Engineering with a Feature Store

Adi Hirschtein | December 17, 2020

Simplifying feature engineering for building real-time ML pipelines might just be the next holy grail of data science. It’s incredibly difficult and highly complex, but it’s also desperately needed for multiple use cases across dozens of industries. 

Currently, feature engineering is siloed between data scientists, who search for and create the features, and data engineers, who rewrite the code for a production environment. This siloed process is slow, and rewriting code to meet the operational requirements of production raises the risk of training-serving skew.

We see a path to a robust, fast feature store that operates as a data transformation service, with a single logic for generating features for both training and serving. 
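To make the "single logic" idea concrete, here is a minimal sketch in plain Python. It is not the API of any particular feature store; the function names and the event fields (`amount`, `avg_amount_30d`) are hypothetical and chosen only for illustration. The point is that the offline and online paths both call the same transformation, so there is no second implementation that can diverge.

```python
from typing import Dict, Iterable, List

Event = Dict[str, float]


def transaction_features(event: Event) -> Dict[str, float]:
    """The one and only definition of the feature logic (hypothetical fields)."""
    amount = event["amount"]
    avg = event["avg_amount_30d"]
    return {
        "amount": amount,
        # Guard against a zero average so the ratio is always defined.
        "amount_to_avg_ratio": amount / avg if avg else 0.0,
        "is_large": float(amount > 1000.0),
    }


def build_training_set(events: Iterable[Event]) -> List[Dict[str, float]]:
    """Offline path: apply the shared logic to a batch of historical events."""
    return [transaction_features(e) for e in events]


def serve_features(event: Event) -> Dict[str, float]:
    """Online path: apply the same shared logic to a single live event."""
    return transaction_features(event)
```

Because both paths delegate to `transaction_features`, a change to the feature definition reaches training and serving at the same time.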

A real-time feature store needs:

  • A high-speed serverless function that can read streaming data
  • A transformation service that processes real-time events via a simple SDK
  • A fast queuing framework
  • A fast key-value database for online serving

A key component of such a solution is a framework for analyzing and processing events in real time. An event processing library should be built in to support this, providing a layer of abstraction for the data scientist. Iguazio’s Storey is an event processing library that meets these needs. 
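As a rough illustration of what such a library does under the hood, the sketch below keeps a per-key sliding-window aggregate that is updated on every incoming event. It uses only the Python standard library and is not Storey's API; the class name `SlidingWindowAggregator` and its parameters are assumptions made for this example, and it assumes events for each key arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SlidingWindowAggregator:
    """Per-key count/sum/avg over the last `window_seconds` (illustrative only)."""

    def __init__(self, window_seconds: float) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        # For each key: the (timestamp, value) pairs currently inside the window.
        self._events: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)
        self._sums: Dict[str, float] = defaultdict(float)

    def update(self, key: str, timestamp: float, value: float) -> Dict[str, float]:
        """Add one event and return the refreshed aggregates for its key."""
        window = self._events[key]
        window.append((timestamp, value))
        self._sums[key] += value

        # Evict events that are at least `window_seconds` older than the new one.
        cutoff = timestamp - self.window_seconds
        while window and window[0][0] <= cutoff:
            _, old_value = window.popleft()
            self._sums[key] -= old_value

        count = len(window)  # always >= 1: the new event is inside the window
        total = self._sums[key]
        return {"count": float(count), "sum": total, "avg": total / count}
```

A caller would invoke `update("card-1", ts, amount)` for each event read from the stream and write the returned values to the online key-value store. A production library would also need to handle out-of-order events, persistence, and scaling across workers.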

Why does a feature store need to be integrated with modeling and training processes?

Model deployment is not finished the first time you deploy your model. Models in production need to be monitored on an ongoing basis because their predictions may become less accurate over time (model drift).

Data drift is one of the causes of model drift in production. By capturing feature vectors, predictions, and statistics in real time and storing them in the feature store as a product feature set, you can:

  • Identify real-time feature drift
  • Easily access fresh production data for model retraining
  • Monitor and troubleshoot data drift on a real-time dashboard 
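One common way to quantify feature drift is to compare the distribution of a feature at training time with its distribution in production. The sketch below uses the Population Stability Index (PSI) as an example metric; this choice is an assumption for illustration, not something the article prescribes, and the function name `population_stability_index` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a training sample (`expected`) and a production sample (`actual`)."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    # Equal-width bins derived from the training sample's range.
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range are clamped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of zero, and the value grows as the production distribution moves away from the training one. A frequently quoted rule of thumb treats values above roughly 0.2 as a significant shift, but the right threshold depends on the feature and the use case.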

With the right capabilities and frameworks in place, it’s possible to create a real-time feature store that underpins real-time feature engineering for close to instantaneous predictions. For more information about enabling real-time feature engineering, read the full article on Towards Data Science.
