Predict Days Until First Complaint

Task type: RegressionTask
Industry: General / Customer Service

Knowing when a complaint will arrive is as valuable as knowing whether it will arrive. By predicting the number of days until a customer's first complaint, service teams can prioritise outreach to customers whose complaints are imminent, schedule proactive check-ins, and allocate support resources ahead of anticipated complaint spikes.

What makes this advanced? This is a time-to-event prediction: the target function finds the earliest timestamp among the complaint events in the target window and converts it to days elapsed since the split point.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): complaint

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument     Type         Description
history      Events       All events before the temporal split.
future       Events       All events after the temporal split.
attributes   Attributes   Static entity attributes.
ctx          Dict         Context dictionary containing SPLIT_TIMESTAMP, data mode, etc.

For regression tasks, the function must return one of:

  • np.array([value], dtype=np.float32): the target value for this entity (here, days until the first complaint).
  • None: exclude this entity (e.g., no complaint in the window, or incomplete data).
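
The return contract above can be checked with a small helper before training. This is a minimal sketch, not part of monad: the name is_valid_regression_label is hypothetical, and the finiteness check is an extra assumption on top of what this page states.

```python
import numpy as np
from typing import Optional


def is_valid_regression_label(label: Optional[np.ndarray]) -> bool:
    """Hypothetical helper: check a label against the contract described above."""
    if label is None:
        return True  # None means "exclude this entity"
    return (
        isinstance(label, np.ndarray)
        and label.dtype == np.float32      # dtype stated in the contract
        and label.shape == (1,)            # one target value per entity
        and bool(np.isfinite(label).all()) # assumption: NaN/inf labels are unwanted
    )


print(is_valid_regression_label(np.array([12.5], dtype=np.float32)))  # True
print(is_valid_regression_label(None))                                # True
print(is_valid_regression_label(np.array([12.5])))                    # False (float64)
```

Running such a check on a handful of entities is a cheap way to catch dtype or shape mistakes before a long fine-tuning run.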

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window


# === Configuration ===
TARGET_WINDOW_DAYS = 90
COMPLAINT_DATA_SOURCE = "complaint"

def days_until_complaint_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict days until first complaint within 90 days."""

    split_ts = ctx[SPLIT_TIMESTAMP]

    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    future_window = future.interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS))
    complaints = future_window[COMPLAINT_DATA_SOURCE]

    if complaints.count() == 0:
        return None

    first_complaint_ts = np.min(complaints.timestamps)
    days = (first_complaint_ts - split_ts) / 86400.0

    return np.array([days], dtype=np.float32)

Step-by-Step Breakdown

① Validate the training window

Python
if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
    return None

Ensures 90 days of future data are available. Truncated windows would bias the model toward shorter time-to-event values.
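
The bias from truncated windows can be illustrated with a toy calculation. The sketch below uses made-up data (one hypothetical customer complaining on each day of the window) and plain NumPy; it does not call monad and does not reproduce the internals of has_incomplete_training_window.

```python
import numpy as np

WINDOW_DAYS = 90

# Hypothetical population: one customer whose first complaint falls on each day 0..89.
true_days = np.arange(WINDOW_DAYS, dtype=np.float32)

# Suppose only 30 days of data exist after the split point.
available_days = 30
observed_days = true_days[true_days < available_days]

full_mean = float(true_days.mean())           # 44.5 days with a complete window
truncated_mean = float(observed_days.mean())  # 14.5 days with a truncated window

print(full_mean, truncated_mean)
```

With the truncated window, only early complaints are ever labelled, so the average target drops from 44.5 to 14.5 days even though customer behaviour is unchanged.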

② Trim future events to the target window

Python
future_window = future.interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS))
complaints = future_window[COMPLAINT_DATA_SOURCE]

Restricts complaints to the 90-day observation window. Complaints beyond this horizon are ignored.

③ Exclude customers with no complaints

Python
if complaints.count() == 0:
    return None

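Customers who do not complain within the window are excluded from training. These customers are right-censored observations: all we know is that their first complaint, if any, comes after the 90-day window. Because the model learns only from customers who actually complained, its output should be read as days until a complaint, given that one occurs within the window. For survival-analysis-style modelling, consider encoding censored observations instead of dropping them.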
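
One common survival-analysis encoding keeps censored customers as a (duration, event indicator) pair instead of dropping them. The sketch below shows only that encoding in plain Python; encode_time_to_event is a hypothetical helper, timestamps are assumed to be Unix seconds, and how such a two-value label would be wired into a monad task is not covered on this page.

```python
from typing import Optional, Tuple

SECONDS_PER_DAY = 86400.0


def encode_time_to_event(
    first_complaint_ts: Optional[float],
    split_ts: float,
    window_days: float,
) -> Tuple[float, int]:
    """Hypothetical helper: return (duration_days, event_observed).

    event_observed is 1 if a complaint was seen inside the window,
    0 if the customer is censored at the end of the window.
    """
    if first_complaint_ts is None:
        return float(window_days), 0  # no complaint seen: censored at window end
    days = (first_complaint_ts - split_ts) / SECONDS_PER_DAY
    if days > window_days:
        return float(window_days), 0  # complaint falls outside the window: censored
    return days, 1


print(encode_time_to_event(None, 0.0, 90))                    # (90.0, 0)
print(encode_time_to_event(10 * SECONDS_PER_DAY, 0.0, 90))    # (10.0, 1)
```

A model trained on this encoding needs a loss that understands the event indicator; with a plain regression loss, the capped 90-day values would be treated as real complaint times.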

④ Compute days to first complaint

Python
first_complaint_ts = np.min(complaints.timestamps)
days = (first_complaint_ts - split_ts) / 86400.0
return np.array([days], dtype=np.float32)

np.min finds the earliest complaint timestamp. Dividing the difference by 86,400 converts seconds to days. The result is a single float that can be fractional: 0.0 means a complaint at the split timestamp itself, 0.5 means a complaint 12 hours later, and 45.0 means a complaint exactly 45 days later.
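
The arithmetic can be checked on a few hand-made values. This sketch assumes timestamps are Unix seconds, as the division by 86,400 implies; the split timestamp and complaint times are invented for illustration, and a plain NumPy array stands in for complaints.timestamps.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0

# Made-up split point and three complaint timestamps (Unix seconds).
split_ts = 1_700_000_000.0
complaint_timestamps = np.array([
    split_ts + 45 * SECONDS_PER_DAY,
    split_ts + 12.5 * SECONDS_PER_DAY,  # earliest complaint
    split_ts + 80 * SECONDS_PER_DAY,
])

first_complaint_ts = np.min(complaint_timestamps)
days = (first_complaint_ts - split_ts) / SECONDS_PER_DAY
label = np.array([days], dtype=np.float32)

print(label)  # [12.5]
```

The order of the timestamps does not matter; only the minimum contributes to the label.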


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=days_until_complaint_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Metric   Why it matters
MAE      Average absolute error in days; intuitive and less sensitive to outliers than squared-error metrics.
RMSE     Square root of MSE; penalises large errors more heavily than MAE.
R²       Proportion of variance explained by the model.
MAPE     Percentage-based error; useful for comparing across scales.
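
The four metrics can be reproduced by hand on a small set of predictions. The sketch below uses invented true and predicted values and plain NumPy rather than monad's metric implementations, so small definitional differences are possible.

```python
import numpy as np

# Made-up true and predicted days-until-complaint values.
y_true = np.array([10.0, 20.0, 40.0, 50.0])
y_pred = np.array([12.0, 18.0, 45.0, 45.0])

errors = y_pred - y_true

mae = np.mean(np.abs(errors))                    # 3.5
rmse = np.sqrt(np.mean(errors ** 2))             # about 3.81
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot                       # 0.942
# MAPE is undefined when a true value is 0 (a complaint exactly at the split).
mape = np.mean(np.abs(errors) / np.abs(y_true)) * 100.0  # 13.125 (%)

print(mae, rmse, r2, mape)
```

Note that the same absolute error weighs more in MAPE for short horizons: a 2-day miss on a 10-day target contributes 20%, while a 5-day miss on a 50-day target contributes 10%.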

Production Tips

  1. Consider censored observations. Excluding non-complainers introduces selection bias: the model never sees customers who did not complain, so it will predict a complaint date for everyone. For a more robust approach, consider adding a binary classification head that predicts whether a complaint will occur, and only use the regression output when the classifier predicts positive.
  2. Log-transform the target. Days-to-event distributions are typically right-skewed. Applying np.log1p(days) can improve model performance and prediction stability.
  3. Segment by complaint severity. Not all complaints are equal. Train separate models for minor feedback vs. formal escalations to get more actionable predictions.
  4. Validate against business calendars. Complaint patterns often spike after weekends and holidays when support channels reopen. Account for business-day effects in your evaluation.
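
The log transform from tip 2 needs a matching inverse at prediction time. A minimal sketch, assuming the model is trained on np.log1p(days) and its raw output is mapped back with np.expm1; the helper names are hypothetical and not part of monad.

```python
import numpy as np


def to_log_target(days: float) -> np.ndarray:
    """Hypothetical helper: label to return from the target function."""
    return np.array([np.log1p(days)], dtype=np.float32)


def from_log_prediction(prediction: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert a log-space prediction back to days."""
    return np.expm1(prediction)


label = to_log_target(45.0)
recovered_days = from_log_prediction(label)

print(label, recovered_days)
```

If the target is log-transformed, metrics such as MAE reported during training are in log space; convert predictions back to days before comparing them with business thresholds.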