Positive Reviews for All Items

Task type: BinaryClassificationTask
Industry: E-commerce

Customers who rate every item in an order highly are prime candidates for loyalty programs, referral incentives, and user-generated content campaigns. Identifying these highly satisfied customers before they even place their next order lets marketing teams prepare personalized post-purchase flows — review request emails, social sharing prompts, or ambassador program invitations — timed to arrive when satisfaction is at its peak.

What makes this advanced? Cross-event join via extra columns — the target function uses .extra["order_id"] to link transactions with reviews, a pattern that requires configuring extra columns in the foundation model to make join keys available at prediction time.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources. The foundation model must be configured with order_id as an extra column on the transactions data source — add this entry to your FM config YAML:

    FM config (excerpt)
    data_sources:
      - type: event
        name: transactions
        extra_columns:
          - alias: order_id
            sql: order_id
    
  • The monad library installed in your environment.

  • Data source(s): transactions, reviews

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument     Type         Description
history      Events       All events before the temporal split.
future       Events       All events after the temporal split.
attributes   Attributes   Static entity attributes.
ctx          Dict         Context dictionary containing SPLIT_TIMESTAMP, data mode, etc.

The function must return one of:

  • np.array([1], dtype=np.float32): positive case
  • np.array([0], dtype=np.float32): negative case
  • None: exclude this entity from training
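The return contract above can be checked locally before a training run. The helper below is a minimal sketch that uses only NumPy; is_valid_binary_label is a hypothetical name introduced for illustration and is not part of monad.

```python
import numpy as np


def is_valid_binary_label(label) -> bool:
    """Return True if `label` matches the return contract described above.

    Hypothetical helper for local sanity checks; not part of monad.
    """
    if label is None:  # None means "exclude this entity from training"
        return True
    if not isinstance(label, np.ndarray):
        return False
    if label.shape != (1,) or label.dtype != np.float32:
        return False
    return bool(label[0] in (0.0, 1.0))


print(is_valid_binary_label(np.array([1], dtype=np.float32)))
print(is_valid_binary_label(None))
print(is_valid_binary_label(np.array([1])))  # integer dtype, not float32
```

Running such a check over a handful of hand-built inputs can catch dtype or shape mistakes early.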

Full Example

Python
import numpy as np
from typing import Dict

from monad.ui.target_function import Events, Attributes


# === Configuration ===
TRANSACTIONS_DATA_SOURCE = "transactions"
REVIEWS_DATA_SOURCE = "reviews"
POSITIVE_RATING_THRESHOLD = 8

def positive_reviews_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict if customer leaves positive reviews for all items in next order."""

    # 1. Get the next order ID from extra columns
    future_orders = future[TRANSACTIONS_DATA_SOURCE].extra["order_id"]
    if len(future_orders) == 0:
        return np.array([0], dtype=np.float32)

    next_basket_id = future_orders[0]

    # 2. Get ratings for items in that order
    next_basket_ratings = future[REVIEWS_DATA_SOURCE].filter(
        "order_id", lambda x: x == next_basket_id
    )["rating"].events

    # 3. Handle missing ratings
    if next_basket_ratings.dtype.kind == "f" and np.isnan(next_basket_ratings).any():
        return np.array([0], dtype=np.float32)

    # 4. Check if all ratings are positive
    all_positive = np.all(next_basket_ratings > POSITIVE_RATING_THRESHOLD)
    return np.array([int(all_positive)], dtype=np.float32)

Extra columns

This recipe uses .extra["order_id"] to access the order identifier from the transactions data source. Extra columns must be configured when building the foundation model — they are columns that are not used as features but are carried through for use in target functions and post-processing.

Step-by-Step Breakdown

① Get next order from extra columns

Python
future_orders = future[TRANSACTIONS_DATA_SOURCE].extra["order_id"]
if len(future_orders) == 0:
    return np.array([0], dtype=np.float32)

next_basket_id = future_orders[0]

The .extra accessor retrieves columns that were configured as extra columns in the foundation model. Here, order_id is used as a join key to link transactions with their reviews. The first order ID in the future represents the customer's next order. If no future transactions exist, the entity is labeled negative.
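The "first order ID" logic can be illustrated with a plain NumPy array. This is a sketch under two assumptions that the text does not state explicitly: that the values arrive in chronological order, and that there is one entry per transaction row. The function name first_order_id is hypothetical.

```python
import numpy as np


def first_order_id(order_ids: np.ndarray):
    """Return the first order ID, or None when there are no future transactions.

    Assumes `order_ids` is in chronological order, one entry per transaction row.
    """
    if len(order_ids) == 0:
        return None
    return order_ids[0]


# Toy data: two items from order "A17", then one item from order "B02".
future_order_ids = np.array(["A17", "A17", "B02"])
print(first_order_id(future_order_ids))
print(first_order_id(np.array([], dtype=str)))
```

Note that an order with several items repeats its ID, so taking element 0 selects the order rather than a single item.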

② Retrieve ratings for that order

Python
next_basket_ratings = future[REVIEWS_DATA_SOURCE].filter(
    "order_id", lambda x: x == next_basket_id
)["rating"].events

The reviews data source is filtered to only include reviews that match the next order's ID. The ["rating"] accessor extracts the numerical rating values. This cross-event join connects two different data sources (transactions and reviews) through the shared order_id.
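Conceptually, this filter behaves like a boolean mask over the review rows. The sketch below reproduces the idea with plain NumPy arrays; the arrays and their values are toy stand-ins, not the monad Events API.

```python
import numpy as np

# Toy stand-ins for the reviews data source (hypothetical values).
review_order_ids = np.array(["A17", "B02", "A17", "C99"])
review_ratings = np.array([9.0, 6.0, 10.0, 8.0])

next_order_id = "A17"

# Keep only ratings whose order_id equals the next order's ID.
mask = review_order_ids == next_order_id
next_order_ratings = review_ratings[mask]
print(next_order_ratings)
```

Only the two reviews attached to order "A17" survive the mask; reviews for other orders in the future window are ignored.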

③ Handle NaN ratings

Python
if next_basket_ratings.dtype.kind == "f" and np.isnan(next_basket_ratings).any():
    return np.array([0], dtype=np.float32)

Missing ratings (NaN values) indicate items that were not reviewed. Since we cannot confirm positive sentiment for unreviewed items, these entities are conservatively labeled negative. This prevents the model from treating incomplete data as positive evidence.
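The two ways of treating missing ratings (label negative, or exclude the entity by returning None) can be sketched side by side. Both function names below are hypothetical and the code uses only NumPy.

```python
import numpy as np


def has_missing_ratings(ratings: np.ndarray) -> bool:
    """True if a float-typed ratings array contains at least one NaN."""
    # Integer arrays cannot hold NaN, so only float dtypes need the check.
    return ratings.dtype.kind == "f" and bool(np.isnan(ratings).any())


def label_when_missing(exclude: bool):
    """Label to return when ratings are missing.

    exclude=False mirrors the recipe (negative label);
    exclude=True returns None so the entity is dropped from training.
    """
    return None if exclude else np.array([0], dtype=np.float32)


print(has_missing_ratings(np.array([9.0, np.nan])))
print(has_missing_ratings(np.array([9, 10])))
print(label_when_missing(exclude=False))
print(label_when_missing(exclude=True))
```

Which policy is appropriate depends on how often customers leave reviews in your data.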

④ Check all-positive condition

Python
all_positive = np.all(next_basket_ratings > POSITIVE_RATING_THRESHOLD)
return np.array([int(all_positive)], dtype=np.float32)

The label is positive only if every item in the order received a rating strictly above the threshold (8), which on a 10-point scale means 9 or 10. np.all enforces this strict criterion: even one mediocre rating means the entity is labeled negative. This identifies truly delighted customers, not just satisfied ones.
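One edge case is worth checking: np.all over an empty array evaluates to True, so an order with no matching review rows at all would pass the all-positive check. The sketch below shows that behavior and one possible guard. Treating such orders as negative is an assumption about the desired behavior, and all_positive_label is a hypothetical name.

```python
import numpy as np

THRESHOLD = 8  # same value as POSITIVE_RATING_THRESHOLD in the recipe


def all_positive_label(ratings: np.ndarray, threshold: float = THRESHOLD) -> np.ndarray:
    """Label 1 only if there is at least one rating and all exceed the threshold."""
    if ratings.size == 0:
        # np.all over an empty array is True, so guard explicitly.
        return np.array([0], dtype=np.float32)
    return np.array([int(np.all(ratings > threshold))], dtype=np.float32)


print(np.all(np.array([]) > THRESHOLD))       # vacuously True
print(all_positive_label(np.array([])))       # guarded: negative
print(all_positive_label(np.array([9, 10])))  # every rating is above 8
print(all_positive_label(np.array([8, 10])))  # 8 is not strictly above 8
```

Returning None instead of a negative label in the empty case would exclude such entities from training; either choice should be made deliberately.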


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=positive_reviews_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Metric      Why it matters
AUROC       Measures overall ranking quality.
AUPRC       More informative when the positive class is rare.
Recall      Proportion of actual positives caught.
Precision   Proportion of predicted positives that are correct.
F1 Score    Harmonic mean of precision and recall.
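To make the last three definitions concrete, the following sketch computes precision, recall, and F1 from toy labels with plain NumPy. The arrays are invented for illustration, and the predictions are assumed to be already thresholded to 0/1.

```python
import numpy as np

# Toy labels and thresholded predictions (hypothetical values).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

precision = tp / (tp + fp)  # predicted positives that are correct
recall = tp / (tp + fn)     # actual positives that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

AUROC and AUPRC are computed from the raw scores rather than thresholded predictions, so they are not covered by this sketch.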

Production Tips

  1. Configure extra columns during foundation model training. The order_id must be declared as an extra column on the transactions data source when building the foundation model. Without this configuration, .extra["order_id"] will not be available at prediction time.
  2. Choose the rating threshold carefully. The recipe counts a rating as positive only when it is strictly greater than 8 (out of 10), so only ratings of 9 or 10 qualify. Analyze your rating distribution first: if most customers rate 7-8, this setting captures only the top tier. Adjust the threshold based on whether you want to identify "satisfied" (threshold 6-7) or "delighted" (threshold 8-9) customers.
  3. Handle orders with missing reviews gracefully. Not all customers review every item. The current function labels these as negative, but you may want to exclude them (return None) if the review rate is very low, to avoid biasing the model toward customers who review frequently.
  4. Consider review timing. Reviews may arrive days or weeks after delivery. Ensure your future window is long enough to capture reviews for the order, or the model will undercount positive cases.
  5. Use predictions to time review requests. Score customers at the point of order placement, then send review request emails to high-probability customers first — they are most likely to leave positive reviews that boost your product ratings.
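For the threshold choice in tip 2, a quick look at the rating distribution can help. The sketch below reports the share of individual ratings strictly above several candidate thresholds; the ratings are toy values and share_above is a hypothetical helper. It measures per-rating shares, not the stricter per-order "all items positive" rate, which will be lower.

```python
import numpy as np


def share_above(ratings: np.ndarray, thresholds) -> dict:
    """Fraction of ratings strictly greater than each threshold."""
    return {t: float(np.mean(ratings > t)) for t in thresholds}


# Toy 1-10 ratings (hypothetical); replace with your own historical ratings.
ratings = np.array([10, 9, 8, 8, 7, 7, 7, 6, 5, 3])
for t, share in share_above(ratings, [6, 7, 8, 9]).items():
    print(f"rating > {t}: {share:.0%}")
```

If the share above the chosen threshold is very small, the positive class may be too rare to train on reliably, which is one reason to consider a lower threshold.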