InfoDive Labs
AI/ML · MLOps · Data Engineering

Feature Stores: The Missing Piece in Your ML Infrastructure

Understand what feature stores are, why they matter for production ML systems, and how to implement one using tools like Feast, Tecton, and custom solutions.

December 5, 2024 · 7 min read


If you have ever deployed a machine learning model and found that its predictions in production do not match its performance during training, you have likely encountered the training-serving skew problem. The model was trained on features computed one way in a batch pipeline, but at serving time, those same features are computed differently, perhaps by a different team using different code. The result is subtle but devastating: a model that looked great in evaluation quietly makes poor decisions in production.

Feature stores solve this problem by providing a centralized, consistent layer for computing, storing, and serving features to both training and inference pipelines. They are one of the most impactful pieces of ML infrastructure you can adopt, yet they remain one of the least understood. This post explains what feature stores do, when you need one, and how to get started.

The Feature Management Problem

To understand why feature stores exist, consider what happens in a typical ML project without one.

A data scientist writes a Jupyter notebook that joins several database tables, computes rolling averages, encodes categorical variables, and engineers a set of features. They train a model and achieve good results. Now it is time to deploy. An ML engineer rewrites the feature logic in the production language, connects to the live data sources, and deploys the model. The problem is that the two implementations, the notebook and the production code, are subtly different. The notebook used a left join; the production code uses an inner join. The notebook computed a 30-day rolling average as of the training date; the production code computes it as of "now," which includes future data leakage for some features.

These discrepancies are difficult to detect and cause silent model degradation. Multiply this by dozens of models, each with their own features, and you have a maintenance nightmare.
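To make the leakage failure mode concrete, here is a minimal sketch using hypothetical order data. The same rolling-average function produces different values depending on whether it is evaluated as of the training timestamp or as of "now", because the latter silently includes an order that did not exist at training time:

```python
import pandas as pd

# Hypothetical order history for one customer.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1],
    "order_value": [100.0, 200.0, 600.0],
    "order_date": pd.to_datetime(["2024-08-20", "2024-08-25", "2024-09-10"]),
})

def avg_order_value_30d(df: pd.DataFrame, as_of: pd.Timestamp) -> float:
    """30-day average order value as of a given timestamp."""
    window = df[(df["order_date"] <= as_of) &
                (df["order_date"] > as_of - pd.Timedelta(days=30))]
    return window["order_value"].mean()

# Training pipeline: feature computed as of the label timestamp.
train_value = avg_order_value_30d(orders, pd.Timestamp("2024-09-01"))  # 150.0

# "Production" variant computed as of now: the 2024-09-10 order leaks in.
serve_value = avg_order_value_30d(orders, pd.Timestamp("2024-09-15"))  # 300.0
```

The model learned against 150.0 but is served 300.0, with no error raised anywhere.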

A feature store addresses this by establishing a single source of truth for feature definitions that is used by both training and serving.

Core Components of a Feature Store

A feature store is not a single technology. It is a system with several interconnected components:

Feature registry is the metadata layer. It stores feature definitions, including their names, data types, descriptions, owners, data sources, and transformation logic. Think of it as a catalog that answers "what features exist and how are they computed?"

Offline store is optimized for batch access. It stores historical feature values used for training, typically in a data warehouse or data lake. When you need to create a training dataset, you query the offline store for feature values as of specific historical timestamps (point-in-time lookups), ensuring that your training data does not contain future information leakage.

Online store is optimized for low-latency access. It stores the latest feature values for real-time inference, typically in a key-value store like Redis, DynamoDB, or Cassandra. When a prediction request arrives, the serving system retrieves current feature values from the online store in single-digit milliseconds.

Feature transformation engine computes features from raw data sources according to the registered definitions. This can be a batch pipeline (Spark, dbt) for features that update periodically or a streaming pipeline (Flink, Spark Streaming) for features that need near-real-time freshness.
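A batch transformation is typically just an aggregation over raw events into per-entity feature rows. A simplified pandas sketch (a real pipeline would also apply the 30-day window and a snapshot timestamp):

```python
import pandas as pd

# Hypothetical raw order events.
orders = pd.DataFrame({
    "customer_id": [1001, 1001, 1002],
    "order_value": [50.0, 150.0, 80.0],
    "event_timestamp": pd.to_datetime(["2024-09-01", "2024-09-10", "2024-09-05"]),
})

# Batch transformation: aggregate events into one feature row per entity.
features = (
    orders.groupby("customer_id")
    .agg(total_orders_30d=("order_value", "size"),
         avg_order_value_30d=("order_value", "mean"))
    .reset_index()
)
# customer 1001 -> total_orders_30d=2, avg_order_value_30d=100.0
```

The same logic scales up directly to Spark or dbt; the feature store's job is to run it on a schedule and land the output in the offline and online stores.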


Implementing a Feature Store with Feast

Feast is one of the most widely adopted open-source feature stores. It provides the registry, offline retrieval, and online serving components while integrating with your existing data infrastructure.

Here is how to define and register features with Feast:

# feature_repo/features.py
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64, String
from datetime import timedelta
 
# Define the entity (primary key for feature lookups)
customer = Entity(
    name="customer_id",
    join_keys=["customer_id"],
    description="Unique customer identifier",
)
 
# Define the data source
customer_features_source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
)
 
# Define the feature view
customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=1),
    schema=[
        Field(name="total_orders_30d", dtype=Int64),
        Field(name="avg_order_value_30d", dtype=Float32),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="customer_segment", dtype=String),
        Field(name="lifetime_value", dtype=Float32),
    ],
    source=customer_features_source,
    online=True,
)

For training, you retrieve historical features with point-in-time correctness:

from feast import FeatureStore
import pandas as pd
 
store = FeatureStore(repo_path="feature_repo")
 
# Entity dataframe: who you want features for, and at what point in time
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1001],
    "event_timestamp": pd.to_datetime([
        "2024-09-01 10:00:00",
        "2024-09-01 10:00:00",
        "2024-09-15 14:00:00",
        "2024-10-01 08:00:00",  # Same customer, different time
    ]),  # Feast expects datetime values, not strings
})
 
# Get historical features (point-in-time join)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_features:total_orders_30d",
        "customer_features:avg_order_value_30d",
        "customer_features:days_since_last_order",
        "customer_features:lifetime_value",
    ],
).to_df()

For serving, you retrieve the latest feature values with low latency:

from datetime import datetime

# Materialize the latest feature values into the online store
store.materialize_incremental(end_date=datetime.now())
 
# Retrieve features for real-time inference
feature_vector = store.get_online_features(
    features=[
        "customer_features:total_orders_30d",
        "customer_features:avg_order_value_30d",
        "customer_features:days_since_last_order",
        "customer_features:lifetime_value",
    ],
    entity_rows=[{"customer_id": 1001}],
).to_dict()

The critical point is that the same feature definitions power both retrieval paths. The training pipeline and the serving pipeline always see the same feature values for the same entity at the same point in time, eliminating training-serving skew by design.

When Do You Actually Need a Feature Store?

Not every ML team needs a feature store from day one. The investment is justified when you encounter these signals:

Multiple models share features. If your churn prediction model and your recommendation model both use "customer lifetime value" and "days since last order," you are computing them in two places. A feature store computes them once and serves them to both.

Training-serving skew is causing production issues. If you have experienced models degrading after deployment for reasons unrelated to data drift, feature computation inconsistency is a likely culprit.

Feature engineering is a bottleneck. Data scientists commonly report spending the majority of their time on data preparation and feature engineering. If they are repeatedly building the same features from scratch because there is no catalog of existing features, a feature store directly reduces this waste.

Real-time features are required. If your models need features computed from streaming data, such as "number of transactions in the last 5 minutes," a feature store with streaming ingestion handles this elegantly.

If you have one model with a handful of features and a single data scientist, a feature store is premature infrastructure. Start simple and adopt one when the pain of not having one exceeds the cost of implementing one.
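The "transactions in the last 5 minutes" style of feature mentioned above boils down to counting events in a trailing time window. A minimal, self-contained sketch of that pattern:

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events in a trailing time window (e.g. transactions in 5 min)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events: deque[float] = deque()  # event timestamps, in order

    def record(self, timestamp: float) -> None:
        self.events.append(timestamp)

    def count(self, now: float) -> int:
        # Evict events that have aged out of the window, then count the rest.
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        return len(self.events)

counter = SlidingWindowCounter(window_seconds=300)
counter.record(100.0)
counter.record(250.0)
counter.record(380.0)
c1 = counter.count(400.0)  # -> 3 (all events within the last 300s)
c2 = counter.count(450.0)  # -> 2 (the event at t=100 aged out)
```

A streaming engine like Flink maintains exactly this kind of windowed state, keyed by entity, and pushes the resulting counts into the online store as they change.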

Common Pitfalls in Feature Store Adoption

Over-engineering from the start. Begin with a minimal setup: a feature registry, an offline store backed by your existing data warehouse, and an online store only for features that require low-latency serving. Add streaming and complex transformations as needs arise.

Ignoring data quality. A feature store centralizes feature computation, which means it also centralizes any data quality issues. Build monitoring that tracks feature value distributions, null rates, and staleness. Alert when features drift outside expected ranges.
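The null-rate and range checks described above can be expressed as a small validation function. This is a hypothetical helper, not part of any feature store API:

```python
import math

def check_feature_quality(values, expected_min, expected_max,
                          max_null_rate=0.05):
    """Return a list of alert strings for a batch of feature values.

    Hypothetical monitoring helper: flags excessive nulls and
    values outside the expected range.
    """
    def is_null(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    alerts = []
    null_rate = sum(1 for v in values if is_null(v)) / len(values)
    if null_rate > max_null_rate:
        alerts.append(f"null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")

    present = [v for v in values if not is_null(v)]
    out_of_range = [v for v in present if not expected_min <= v <= expected_max]
    if out_of_range:
        alerts.append(f"{len(out_of_range)} value(s) outside "
                      f"[{expected_min}, {expected_max}]")
    return alerts

alerts = check_feature_quality(
    [12.0, None, 15.0, 9000.0, 14.0], expected_min=0, expected_max=1000)
# -> two alerts: 20% nulls, one out-of-range value
```

In production you would run checks like this on every materialization run and wire the alerts into your existing monitoring stack.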

Poor naming conventions. Without conventions, you end up with features named "avg_val_30" and "average_value_last_month" that are actually the same thing. Establish clear naming standards and enforce them through code review and automated validation.
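Naming standards are easiest to enforce mechanically. As one illustrative convention (entirely hypothetical), require every feature name to be snake_case and end with an explicit window suffix like `30d` or `5m`:

```python
import re

# Hypothetical convention: snake_case name ending in a window suffix
# such as "30d", "24h", or "5m".
FEATURE_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*_\d+[dhm]$")

def validate_feature_name(name: str) -> bool:
    return bool(FEATURE_NAME_PATTERN.match(name))

ok = validate_feature_name("avg_order_value_30d")        # True
bad = validate_feature_name("average_value_last_month")  # False: no window suffix
```

Running a check like this in CI when feature definitions are registered catches inconsistent names before they ever reach the registry.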

Neglecting documentation. Every feature should have a human-readable description, its business meaning, the team that owns it, and its expected update frequency. The registry should be browsable by non-engineers.

Need help building this?

Our team specializes in turning these ideas into production systems. Let's talk.