High-Spending Shopper Identification

Task type: BinaryClassificationTask
Industry: Retail / FMCG

This recipe predicts whether a customer will spend above a defined threshold in specific product categories over the next N days. It is useful for targeting high-value shoppers with category-specific promotions, supplier-funded campaigns, or loyalty rewards.

What does "high-spending" mean here? A customer is labeled positive (1) if their total spend in the target categories meets or exceeds a configurable threshold (e.g., 20 EUR) within the target window. You control the categories, the threshold, and the window length.
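The labeling rule can be illustrated without the library. The sketch below is plain Python under stated assumptions: the `Purchase` record and the `label_customer` helper are hypothetical stand-ins for illustration and are not part of monad.

```python
from dataclasses import dataclass

TARGET_CATEGORIES = {"Yoghurts", "Dairy Desserts", "Dairy Drinks"}
SPEND_THRESHOLD = 20.0  # EUR


@dataclass(frozen=True)
class Purchase:
    """Hypothetical record of one transaction inside the target window."""
    category: str
    amount: float


def label_customer(purchases: list[Purchase]) -> int:
    """Return 1 if spend in the target categories reaches the threshold."""
    spend = sum(p.amount for p in purchases if p.category in TARGET_CATEGORIES)
    return 1 if spend >= SPEND_THRESHOLD else 0


window = [
    Purchase("Yoghurts", 12.5),
    Purchase("Bakery", 30.0),       # ignored: not a target category
    Purchase("Dairy Drinks", 7.5),
]
print(label_customer(window))  # 1, because 12.5 + 7.5 = 20.0 meets the threshold
```

Spend outside the target categories never counts toward the label, however large it is.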


Prerequisites

Before writing a target function, you need:

  • A trained foundation model built on event data that includes a transactions data source with category and spend columns.
  • The monad library installed in your environment (for Python App).

Target Function

The target function tells the model how to label each customer for training. It receives four arguments:

Argument    Type        Description
history     Events      All events before the temporal split.
future      Events      All events after the temporal split.
attributes  Attributes  Static entity attributes.
ctx         Dict        Context dictionary containing SPLIT_TIMESTAMP, data mode, etc.

The function must return one of:

  • np.array([1], dtype=np.float32): the customer met or exceeded the spend threshold
  • np.array([0], dtype=np.float32): the customer did not meet the threshold
  • None: exclude this customer (e.g., incomplete observation window)
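When testing a target function locally, it can help to check its output against this contract before starting a training run. The helper below is a hypothetical sketch that uses only NumPy; `check_target_output` is not a monad function, and the shape check assumes the single-element arrays shown above.

```python
import numpy as np


def check_target_output(value) -> bool:
    """Return True if `value` matches the return contract described above."""
    if value is None:  # customer excluded from training
        return True
    return (
        isinstance(value, np.ndarray)
        and value.dtype == np.float32
        and value.shape == (1,)
        and value[0] in (0.0, 1.0)
    )


print(check_target_output(np.array([1], dtype=np.float32)))  # True
print(check_target_output(np.array([1])))                    # False: integer dtype
print(check_target_output(None))                             # True
```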

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window



# === Configuration ===
TARGET_WINDOW_DAYS = 21              # Prediction horizon in days
TRANSACTION_DATA_SOURCE = "transactions"
CATEGORY_COLUMN = "category"         # Column containing product category
SPEND_COLUMN = "gross_amt"           # Column containing transaction amount
TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]
SPEND_THRESHOLD = 20                 # Minimum spend (e.g., in EUR)


def high_spend_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Label a customer as high-spender (1) or not (0) in target categories."""

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # 3. Filter to target categories only
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )

    # 4. Sum spend and compare against threshold
    total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
    met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0

    return np.array([met_threshold], dtype=np.float32)
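
Python (variant that accesses the same helpers through the target_function namespace and keeps its configuration inside the function body)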
def high_spend_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Label a customer as high-spender (1) or not (0) in target categories."""

    # === Configuration ===
    TARGET_WINDOW_DAYS = 21              # Prediction horizon in days
    TRANSACTION_DATA_SOURCE = "transactions"
    CATEGORY_COLUMN = "category"         # Column containing product category
    SPEND_COLUMN = "gross_amt"           # Column containing transaction amount
    TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]
    SPEND_THRESHOLD = 20                 # Minimum spend (e.g., in EUR)

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # 3. Filter to target categories only
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )

    # 4. Sum spend and compare against threshold
    total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
    met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0

    return np.array([met_threshold], dtype=np.float32)

Step-by-Step Breakdown

① Validate the training window

Python
target_window = timedelta(days=TARGET_WINDOW_DAYS)
if has_incomplete_training_window(ctx, target_window):
    return None

Python (target_function namespace variant)
target_window = timedelta(days=TARGET_WINDOW_DAYS)
if target_function.has_incomplete_training_window(ctx, target_window):
    return None

Skips samples where the temporal split is too close to the end of the dataset to observe a full target window (21 days in this example). This prevents training on labels computed from a truncated window.
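Conceptually, the check compares the end of the target window with the end of the available data. The sketch below illustrates that idea with the standard library only. It is not the library's implementation: `window_is_incomplete` and `data_end` are hypothetical names, and the real helper reads what it needs from ctx.

```python
from datetime import datetime, timedelta, timezone


def window_is_incomplete(split: datetime, window: timedelta, data_end: datetime) -> bool:
    """True when the target window would extend past the end of the data."""
    return split + window > data_end


data_end = datetime(2024, 5, 1, tzinfo=timezone.utc)  # hypothetical last day of data
window = timedelta(days=21)

early_split = datetime(2024, 3, 1, tzinfo=timezone.utc)   # window ends 22 March
late_split = datetime(2024, 4, 20, tzinfo=timezone.utc)   # window ends 11 May

print(window_is_incomplete(early_split, window, data_end))  # False
print(window_is_incomplete(late_split, window, data_end))   # True
```

A sample split on 20 April would see only 11 of its 21 days, so a customer who spends late in the window would be mislabeled as negative.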

② Trim future events to the target window

Python
future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

Python (target_function namespace variant)
future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

Narrows the future to exactly 21 days so every sample is evaluated against the same horizon.
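The effect of trimming can be sketched on plain (timestamp, value) pairs. The helper `trim_to_window` is hypothetical, and the half-open interval (start included, end excluded) is an assumption made for this illustration; the recipe does not state how interval_from treats the boundaries.

```python
from datetime import datetime, timedelta, timezone


def trim_to_window(events, start, window):
    """Keep (timestamp, value) pairs with start <= timestamp < start + window."""
    end = start + window
    return [(ts, value) for ts, value in events if start <= ts < end]


split = datetime(2024, 4, 1, tzinfo=timezone.utc)
window = timedelta(days=21)
events = [
    (split + timedelta(days=1), 4.0),
    (split + timedelta(days=20), 6.0),
    (split + timedelta(days=21), 9.0),  # exactly at the window end: dropped
    (split + timedelta(days=40), 3.0),  # well past the window: dropped
]

trimmed = trim_to_window(events, split, window)
print([value for _, value in trimmed])  # [4.0, 6.0]
```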

③ Filter to target categories

Python
focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
    by=CATEGORY_COLUMN,
    condition=lambda x: x in TARGET_CATEGORIES,
)

Keeps only transactions in the categories you care about. Everything else (e.g., bakery, beverages) is discarded. The filter method accepts a lambda that receives each event's category value.
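Because the condition uses Python's `in` operator, matching is exact and case-sensitive, so a misspelled or differently cased category name silently matches nothing. A quick sanity check is sketched below; `unmatched_categories` is a hypothetical helper, and the `observed` list stands in for category values pulled from your own data.

```python
TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]


def unmatched_categories(targets, observed):
    """Return target names that never occur in the observed category values."""
    seen = set(observed)
    return [name for name in targets if name not in seen]


# Made-up category values; note the lowercase spelling of one entry.
observed = ["Yoghurts", "dairy desserts", "Bakery", "Dairy Drinks"]
print(unmatched_categories(TARGET_CATEGORIES, observed))  # ['Dairy Desserts']
```

An empty result suggests that every configured category appears in the data at least once.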

④ Apply spend threshold

Python
total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0
return np.array([met_threshold], dtype=np.float32)

Sums the spend column across all matching transactions and compares it to the threshold. Note that .events returns the raw NumPy array of values for a given column.
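One consequence worth noting: a customer with no matching transactions gets label 0 and is not excluded. The sketch below reproduces step ④ on plain NumPy arrays, assuming that .events yields an empty array when nothing matches; `spend_label` is a hypothetical wrapper used only for this illustration.

```python
import numpy as np

SPEND_THRESHOLD = 20


def spend_label(amounts: np.ndarray) -> np.ndarray:
    """Apply the sum-and-threshold rule from step 4 to an array of amounts."""
    total_spend = np.sum(amounts)
    met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0
    return np.array([met_threshold], dtype=np.float32)


no_purchases = np.array([], dtype=np.float64)
print(np.sum(no_purchases))                # 0.0, the sum of an empty array
print(spend_label(no_purchases))           # [0.]
print(spend_label(np.array([12.5, 7.5])))  # [1.]
```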


Training

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=high_spend_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Variations

Frequency-based instead of spend-based

Label a customer as positive if they make at least N purchases in the target categories, regardless of total spend:

Python
def frequent_buyer_target_fn(
    history: Events, future: Events, attributes: Attributes, ctx: Dict
) -> np.ndarray | None:
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    MIN_PURCHASES = 3
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    met_threshold = 1 if focus_transactions.count() >= MIN_PURCHASES else 0
    return np.array([met_threshold], dtype=np.float32)

Python (target_function namespace variant)
def frequent_buyer_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    TARGET_WINDOW_DAYS = 21
    TRANSACTION_DATA_SOURCE = "transactions"
    CATEGORY_COLUMN = "category"
    TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]

    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    MIN_PURCHASES = 3
    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    met_threshold = 1 if focus_transactions.count() >= MIN_PURCHASES else 0
    return np.array([met_threshold], dtype=np.float32)
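
The two definitions can disagree for the same customer. The plain-Python sketch below compares them on made-up (category, amount) pairs; `spend_label` and `frequency_label` are hypothetical helpers that mirror the two target functions without using the library.

```python
TARGET_CATEGORIES = {"Yoghurts", "Dairy Desserts", "Dairy Drinks"}
SPEND_THRESHOLD = 20.0
MIN_PURCHASES = 3


def spend_label(purchases) -> int:
    """1 if total spend in the target categories reaches the threshold."""
    total = sum(amount for category, amount in purchases if category in TARGET_CATEGORIES)
    return 1 if total >= SPEND_THRESHOLD else 0


def frequency_label(purchases) -> int:
    """1 if the number of target-category purchases reaches the minimum."""
    count = sum(1 for category, _ in purchases if category in TARGET_CATEGORIES)
    return 1 if count >= MIN_PURCHASES else 0


bulk_buyer = [("Yoghurts", 24.0)]
regular_buyer = [("Yoghurts", 2.5), ("Dairy Drinks", 3.0), ("Yoghurts", 2.5)]

print(spend_label(bulk_buyer), frequency_label(bulk_buyer))        # 1 0
print(spend_label(regular_buyer), frequency_label(regular_buyer))  # 0 1
```

Which definition fits depends on whether the campaign rewards basket value or repeat visits.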

Require purchase history

Exclude customers with no prior purchases in the target categories. They are less likely to start buying within a short target window, and including them can add noise:

Python
# Add after step 2 (trim future):
history_in_categories = history[TRANSACTION_DATA_SOURCE].filter(
    by=CATEGORY_COLUMN,
    condition=lambda x: x in TARGET_CATEGORIES,
)
if history_in_categories.count() == 0:
    return None

Python (full function, target_function namespace variant)
def high_spend_with_history_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    # === Configuration ===
    TARGET_WINDOW_DAYS = 21
    TRANSACTION_DATA_SOURCE = "transactions"
    CATEGORY_COLUMN = "category"
    SPEND_COLUMN = "gross_amt"
    TARGET_CATEGORIES = ["Yoghurts", "Dairy Desserts", "Dairy Drinks"]
    SPEND_THRESHOLD = 20

    # Require purchase history in target categories
    history_in_categories = history[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    if history_in_categories.count() == 0:
        return None

    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    focus_transactions = future[TRANSACTION_DATA_SOURCE].filter(
        by=CATEGORY_COLUMN,
        condition=lambda x: x in TARGET_CATEGORIES,
    )
    total_spend = np.sum(focus_transactions[SPEND_COLUMN].events)
    met_threshold = 1 if total_spend >= SPEND_THRESHOLD else 0
    return np.array([met_threshold], dtype=np.float32)
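
Returning None for customers without history shrinks the training population and usually raises the share of positives. The sketch below shows the effect on made-up labels; `positive_rate` is a hypothetical helper, and the numbers are illustrative only.

```python
from typing import Optional


def positive_rate(labels: list[Optional[int]]) -> float:
    """Share of positives among customers that were not excluded (None)."""
    kept = [label for label in labels if label is not None]
    return sum(kept) / len(kept) if kept else float("nan")


# Made-up labels for eight customers; the last four have no category history.
without_filter = [1, 0, 0, 0, 0, 0, 0, 0]
with_filter = [1, 0, 0, 0, None, None, None, None]

print(positive_rate(without_filter))  # 0.125
print(positive_rate(with_filter))     # 0.25
```

Because the filter changes both the population and the positive rate, metrics such as precision may not be directly comparable between the filtered and unfiltered variants.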

Metrics

Metric     Why it matters
AUROC      Overall ranking quality: how well the model separates high-spenders from the rest.
AUPRC      More informative than AUROC when high-spenders are a small minority.
Precision  Important when the promotional budget is limited and you want to target the right people.
F1 Score   Balances precision and recall in a single number.
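
For readers who want to see how the threshold-based metrics relate, the sketch below computes precision, recall, and F1 from binary labels in plain Python. The `precision_recall_f1` helper and the toy labels are made up for illustration; in practice these values come from the metrics configured in the training and evaluation parameters.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = high-spender)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # 2 true positives, 1 false positive, 2 misses

print(tuple(round(x, 3) for x in precision_recall_f1(y_true, y_pred)))
# (0.667, 0.5, 0.571)
```

Here two of the three targeted customers are real high-spenders (precision), while only half of all high-spenders are reached (recall).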

Production Tips

  1. Adjust the threshold to match your campaign economics. If a coupon costs 5 EUR, set the spend threshold above the break-even point. A positive prediction then identifies a customer whose expected category spend covers the cost of the coupon.

  2. Update category lists seasonally. Product categories that matter in summer (ice cream, cold drinks) differ from winter (soups, hot beverages). Keep TARGET_CATEGORIES aligned with your current campaign.

  3. Segment by customer type. Run separate predictions for new vs. returning customers — their spending patterns differ and the model may perform differently on each group.

  4. Combine with basket analysis. Pair predictions with co-purchase insights to recommend complementary products alongside the target categories.
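
As a rough illustration of the first tip, a break-even spend can be derived from the coupon cost and a gross margin rate. This is a deliberately simple sketch: `break_even_spend` and `margin_rate` are hypothetical, and the calculation ignores spend that would have happened without the coupon.

```python
def break_even_spend(coupon_cost: float, margin_rate: float) -> float:
    """Spend at which the margin earned equals the cost of the coupon."""
    if not 0 < margin_rate <= 1:
        raise ValueError("margin_rate must be in (0, 1]")
    return coupon_cost / margin_rate


# A 5 EUR coupon at an assumed 25% gross margin breaks even at 20 EUR of spend.
print(break_even_spend(5.0, 0.25))  # 20.0
```

Under these assumptions, SPEND_THRESHOLD should sit at or above the returned value.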