High-Spending Shopper Identification
Task type: BinaryClassificationTask
Industry: Retail / FMCG
This recipe predicts whether a customer will spend above a defined threshold in specific product categories over the next N days. It is useful for targeting high-value shoppers with category-specific promotions, supplier-funded campaigns, or loyalty rewards.
What does "high-spending" mean here? A customer is labeled positive (1) if their total spend in the target categories meets or exceeds a configurable threshold (e.g., 20 EUR) within the target window. You control the categories, the threshold, and the window length.
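The labeling rule itself can be sketched without the library. The snippet below is an illustration only: `Purchase` and `label_high_spender` are hypothetical names, not part of monad, and the purchases are assumed to be already restricted to the target window.

```python
from typing import Iterable, NamedTuple


class Purchase(NamedTuple):
    """Hypothetical stand-in for one transaction row."""
    category: str
    amount: float


def label_high_spender(
    purchases: Iterable[Purchase],
    target_categories: frozenset,
    threshold: float,
) -> int:
    """Return 1 if spend in the target categories reaches the threshold, else 0."""
    total = sum(p.amount for p in purchases if p.category in target_categories)
    return 1 if total >= threshold else 0


window = [
    Purchase("Yoghurts", 12.5),
    Purchase("Bakery", 30.0),       # ignored: not a target category
    Purchase("Dairy Drinks", 7.5),
]
targets = frozenset({"Yoghurts", "Dairy Desserts", "Dairy Drinks"})
print(label_high_spender(window, targets, 20))  # prints 1 (12.5 + 7.5 = 20.0)
```

Note that the comparison is inclusive: a customer who spends exactly the threshold is labeled positive.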
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes a transactions data source with category and spend columns.
- The monad library installed in your environment (for Python App).
Target Function
The target function tells the model how to label each customer for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
The function must return one of:
- np.array([1], dtype=np.float32): customer met or exceeded the spend threshold
- np.array([0], dtype=np.float32): customer did not meet the threshold
- None: exclude this customer (e.g., incomplete observation window)
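Because the function has three possible outcomes, it can help to tally them on a sample of customers before training, for example to spot a threshold that yields almost no positives. A minimal sketch, assuming the outputs have already been unwrapped into plain 1, 0, or None values; `summarize_labels` is a hypothetical helper, not a monad function.

```python
from collections import Counter
from typing import Iterable, Optional


def summarize_labels(labels: Iterable[Optional[int]]) -> dict:
    """Count positive, negative, and excluded (None) outcomes."""
    counts = Counter(
        "excluded" if y is None else ("positive" if y == 1 else "negative")
        for y in labels
    )
    labeled = counts["positive"] + counts["negative"]
    return {
        "positive": counts["positive"],
        "negative": counts["negative"],
        "excluded": counts["excluded"],
        # Share of positives among customers that were actually labeled
        "positive_rate": counts["positive"] / labeled if labeled else None,
    }


summary = summarize_labels([1, 0, 0, None, 1, 0])
print(summary)
```

Excluded customers are left out of the positive rate, since they never reach training.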
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

# === Configuration ===
TARGET_WINDOW_DAYS = 21                  # Prediction horizon in days
TRANSACTION_DATA_SOURCE = "transactions"
CATEGORY_COLUMN = "category"             # Column containing product category
SPEND_COLUMN = "gross_amt"               # Column containing transaction amount
TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]
SPEND_THRESHOLD = 20                     # Minimum spend (e.g., in EUR)


def high_spend_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Label a customer as high-spender (1) or not (0) in target categories."""
    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # 3. Filter to target categories only
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )

    # 4. Sum spend and compare against threshold
    total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
    met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0
    return np.array([met_threshold], dtype=np.float32)
The same function in self-contained form, with the configuration inside the function body and library names accessed through the target_function namespace:

def high_spend_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Label a customer as high-spender (1) or not (0) in target categories."""
    # === Configuration ===
    TARGET_WINDOW_DAYS = 21                  # Prediction horizon in days
    TRANSACTION_DATA_SOURCE = "transactions"
    CATEGORY_COLUMN = "category"             # Column containing product category
    SPEND_COLUMN = "gross_amt"               # Column containing transaction amount
    TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]
    SPEND_THRESHOLD = 20                     # Minimum spend (e.g., in EUR)

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # 3. Filter to target categories only
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )

    # 4. Sum spend and compare against threshold
    total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
    met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0
    return np.array([met_threshold], dtype=np.float32)
Step-by-Step Breakdown
① Validate the training window
Skips samples where the temporal split is too close to the end of the dataset to observe a full 21-day window. This prevents training on incomplete data.
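The idea behind this check can be sketched in plain Python. This is not monad's implementation: `window_is_incomplete` and `data_end` are hypothetical names, and the exact boundary handling inside has_incomplete_training_window is an assumption.

```python
from datetime import datetime, timedelta, timezone


def window_is_incomplete(split_ts: datetime, data_end: datetime, window: timedelta) -> bool:
    """True when the target window would run past the last observed timestamp."""
    return split_ts + window > data_end


data_end = datetime(2024, 6, 30, tzinfo=timezone.utc)  # assumed end of the dataset
window = timedelta(days=21)

early_split = datetime(2024, 6, 1, tzinfo=timezone.utc)
late_split = datetime(2024, 6, 20, tzinfo=timezone.utc)

print(window_is_incomplete(early_split, data_end, window))  # False: window ends 22 June
print(window_is_incomplete(late_split, data_end, window))   # True: window ends 11 July
```

A customer split on 20 June would otherwise be labeled from only 10 days of data, which biases the label toward 0.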
② Trim future events to the target window
Narrows the future to exactly 21 days so every sample is evaluated against the same horizon.
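Conceptually, trimming keeps only the events whose timestamp falls inside the window that starts at the split. The sketch below uses a half-open interval, which is an assumption; how interval_from treats the endpoints is not stated here. `Event` and `trim_to_window` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Event = Tuple[datetime, float]  # (timestamp, amount); hypothetical shape


def trim_to_window(events: List[Event], split_ts: datetime, window: timedelta) -> List[Event]:
    """Keep events with split_ts <= timestamp < split_ts + window."""
    end = split_ts + window
    return [e for e in events if split_ts <= e[0] < end]


split = datetime(2024, 5, 1, tzinfo=timezone.utc)
events = [
    (datetime(2024, 5, 2, tzinfo=timezone.utc), 8.0),    # inside
    (datetime(2024, 5, 21, tzinfo=timezone.utc), 5.0),   # inside
    (datetime(2024, 5, 22, tzinfo=timezone.utc), 9.0),   # exactly split + 21 days: excluded
    (datetime(2024, 6, 10, tzinfo=timezone.utc), 40.0),  # far outside
]
kept = trim_to_window(events, split, timedelta(days=21))
print(len(kept))  # prints 2
```

Without trimming, the large purchase on 10 June would count toward the label even though it falls outside the 21-day horizon.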
③ Filter to target categories
focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
    by=CATEGORY_COLUMN,
    condition=lambda x: x in TARGET_CATEGORIES,
)
Keeps only transactions in the categories you care about. Everything else (e.g., bakery, beverages) is discarded. The filter method accepts a lambda that receives each event's category value.
④ Apply spend threshold
total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0
return np.array([met_threshold], dtype=np.float32)
Sums the spend column across all matching transactions and compares it to the threshold. Note that .events returns the raw NumPy array of values for a given column.
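Two edge cases are worth keeping in mind. A customer with no matching transactions sums to 0 and is labeled 0, and floating-point sums can land a hair above or below the threshold. If that matters for your data, one option is to compare in integer cents. A minimal sketch, assuming amounts carry at most two decimals; `to_cents` and `meets_threshold` are hypothetical helpers, not part of monad.

```python
from typing import Iterable


def to_cents(amount: float) -> int:
    """Convert a currency amount with up to two decimals to integer cents."""
    return round(amount * 100)


def meets_threshold(amounts: Iterable[float], threshold: float) -> int:
    """Return 1 if the summed amounts reach the threshold, comparing in cents."""
    total_cents = sum(to_cents(a) for a in amounts)
    return 1 if total_cents >= to_cents(threshold) else 0


print(meets_threshold([6.70, 6.60, 6.70], 20))  # prints 1 (670 + 660 + 670 = 2000 cents)
print(meets_threshold([], 20))                  # prints 0 (no purchases)
```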
Training
from pathlib import Path

from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=high_spend_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="recall", metric_name="Recall"),
    ],
)

results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)
Variations
Frequency-based instead of spend-based
Label a customer as positive if they make at least N purchases in the target categories, regardless of total spend:
def frequent_buyer_target_fn(
    history: Events, future: Events, attributes: Attributes, ctx: Dict
) -> np.ndarray | None:
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None

    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    MIN_PURCHASES = 3
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    met_threshold = 1 if focus_transactions.count() >= MIN_PURCHASES else 0
    return np.array([met_threshold], dtype=np.float32)
The same variation in self-contained form:

def frequent_buyer_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    TARGET_WINDOW_DAYS = 21
    TRANSACTION_DATA_SOURCE = "transactions"
    CATEGORY_COLUMN = "category"
    TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]

    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    MIN_PURCHASES = 3
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    met_threshold = 1 if focus_transactions.count() >= MIN_PURCHASES else 0
    return np.array([met_threshold], dtype=np.float32)
Require purchase history
Exclude customers who have never bought in the target categories before — they are unlikely to start now, and including them adds noise:
def high_spend_with_history_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    # === Configuration ===
    TARGET_WINDOW_DAYS = 21
    TRANSACTION_DATA_SOURCE = "transactions"
    CATEGORY_COLUMN = "category"
    SPEND_COLUMN = "gross_amt"
    TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]
    SPEND_THRESHOLD = 20

    # Require purchase history in target categories
    history_in_categories = history[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    if history_in_categories.count() == 0:
        return None

    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
    met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0
    return np.array([met_threshold], dtype=np.float32)
Recommended Metrics
| Metric | Why it matters |
|---|---|
| AUROC | Overall ranking quality — how well the model separates high-spenders from the rest. |
| AUPRC | More informative when high-spenders are a small minority. |
| Precision | Important when promotional budget is limited — you want to target the right people. |
| F1 Score | Balances precision and recall into a single number. |
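For reference, precision, recall, and F1 follow directly from confusion-matrix counts at a chosen decision threshold. The sketch below uses made-up counts purely for illustration; `precision_recall_f1` is a hypothetical helper, independent of the metric implementations used during training.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts: 40 customers targeted, 30 of them true high-spenders,
# and 30 high-spenders missed by the model.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=30)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # prints 0.75 0.5 0.6
```

In this example three out of four coupons reach a real high-spender, but half of all high-spenders receive no coupon, which is the trade-off F1 summarizes.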
Production Tips
- Adjust the threshold to match your campaign economics. If a coupon costs 5 EUR, set the spend threshold above the break-even point, so that a positive prediction marks a customer whose spend is expected to cover the cost of the promotion.
- Update category lists seasonally. Product categories that matter in summer (ice cream, cold drinks) differ from winter (soups, hot beverages). Keep TARGET_CATEGORIES aligned with your current campaign.
- Segment by customer type. Run separate predictions for new vs. returning customers; their spending patterns differ and the model may perform differently on each group.
- Combine with basket analysis. Pair predictions with co-purchase insights to recommend complementary products alongside the target categories.
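As a worked example of the first tip, the break-even spend can be estimated from the coupon cost and the gross margin rate on the target categories. The 25% margin below is an assumed figure, and the calculation deliberately ignores incrementality and other campaign costs; `break_even_spend` is a hypothetical helper.

```python
def break_even_spend(coupon_cost: float, margin_rate: float) -> float:
    """Spend at which the gross margin on the purchase covers the coupon cost."""
    if not 0 < margin_rate <= 1:
        raise ValueError("margin_rate must be in (0, 1]")
    return coupon_cost / margin_rate


# Assumed inputs: a 5 EUR coupon and a 25% gross margin on the target categories.
threshold = break_even_spend(coupon_cost=5.0, margin_rate=0.25)
print(threshold)  # prints 20.0
```

Under these assumptions a SPEND_THRESHOLD of 20 EUR sits exactly at break-even, so a higher value would be needed for the campaign to turn a profit.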