Predict Installment Payment Defaults

Task type: BinaryClassificationTask
Industry: Finance / Banking

Early detection of installment payment defaults allows credit teams to intervene before a borrower falls into serious arrears. By predicting which customers will miss more than three installment deadlines over the next six months, lenders can offer restructuring options, adjust collection strategies, or tighten credit limits proactively, reducing losses while maintaining customer relationships.

What makes this advanced? Heavy pandas integration: the target function uses DataFrame.merge, groupby(...).cumcount() for one-to-one pairing of installments with repayments, and Timedelta comparisons for the grace-period logic.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): reminder_log, transactions

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

| Argument | Type | Description |
| --- | --- | --- |
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |

The function must return one of:

  • np.array([1], dtype=np.float32): positive case
  • np.array([0], dtype=np.float32): negative case
  • None: exclude this entity from training
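
The three possible return values can be illustrated without monad at all. The sketch below is a minimal, standalone illustration; `to_label` is a hypothetical helper (not part of the library) that maps a missed-installment count to a label, with `None` standing in for "exclude this entity".

```python
from typing import Optional

import numpy as np


def to_label(missed_count: Optional[int], threshold: int = 3) -> Optional[np.ndarray]:
    """Hypothetical helper: turn a missed-installment count into a target label.

    Returns None when the count is unavailable (entity excluded from training),
    otherwise a one-element float32 array holding 1 (positive) or 0 (negative).
    """
    if missed_count is None:
        return None
    # Strictly more than `threshold` misses is the positive case.
    return np.array([int(missed_count > threshold)], dtype=np.float32)


positive = to_label(4)   # more than 3 misses
negative = to_label(3)   # exactly 3 misses is not enough
excluded = to_label(None)
```

Note that the comparison is strict, so a customer with exactly `threshold` misses is labeled negative.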

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

import pandas as pd

# === Configuration ===
TARGET_WINDOW_MONTHS = 6
MAX_MISSED_THRESHOLD = 3
GRACE_PERIOD_DAYS = 3
REMINDERS_DATA_SOURCE = "reminder_log"
TRANSACTIONS_DATA_SOURCE = "transactions"

def installment_defaults_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict if customer misses >3 installment deadlines in 6 months."""

    target_window_days = 30 * TARGET_WINDOW_MONTHS
    if has_incomplete_training_window(ctx, timedelta(days=target_window_days)):
        return None

    # 1. Build installment DataFrame with computed due dates
    due_in_days = pd.to_timedelta(
        np.asarray(future[REMINDERS_DATA_SOURCE]["due_in_days"], dtype=int),
        unit="D",
    )
    reminders_ts = pd.to_datetime(
        future[REMINDERS_DATA_SOURCE].timestamps, unit="s"
    )
    installments = pd.DataFrame({
        "loan_id": future[REMINDERS_DATA_SOURCE]["loan_id"],
        "due_date": reminders_ts + due_in_days,
    })

    if installments.empty:
        return None

    # 2. Filter to target window and deduplicate
    window_end = pd.to_datetime(ctx[SPLIT_TIMESTAMP], unit="s") + pd.to_timedelta(
        target_window_days, unit="D"
    )
    installments = installments[installments["due_date"] < window_end]
    installments = (
        installments.drop_duplicates()
        .sort_values(["loan_id", "due_date"])
        .reset_index(drop=True)
    )

    # 3. Build repayments DataFrame
    transactions = future[TRANSACTIONS_DATA_SOURCE].interval_from(
        start=ctx[SPLIT_TIMESTAMP],
        interval_length=timedelta(days=target_window_days),
    )
    repayments_filtered = transactions.filter(
        by="trans_type", condition=lambda v: v == "repayment"
    )
    repayments = pd.DataFrame({
        "loan_id": pd.Series(repayments_filtered["loan_id"]),
        "timestamp": pd.to_datetime(repayments_filtered.timestamps, unit="s"),
    }).sort_values(["loan_id", "timestamp"])

    # 4. One-to-one pairing using cumcount
    installments["installment_index"] = installments.groupby("loan_id").cumcount()
    repayments["installment_index"] = repayments.groupby("loan_id").cumcount()

    paired = installments.merge(
        repayments[["loan_id", "installment_index", "timestamp"]],
        on=["loan_id", "installment_index"],
        how="left",
    )

    # 5. Count missed deadlines (no payment or payment > grace period)
    missed = paired["timestamp"].isna() | (
        paired["timestamp"] > paired["due_date"] + pd.Timedelta(days=GRACE_PERIOD_DAYS)
    )

    return np.array([int(missed.sum() > MAX_MISSED_THRESHOLD)], dtype=np.float32)

Step-by-Step Breakdown

① Build installment table with due dates

Python
due_in_days = pd.to_timedelta(
    np.asarray(future[REMINDERS_DATA_SOURCE]["due_in_days"], dtype=int),
    unit="D",
)
reminders_ts = pd.to_datetime(
    future[REMINDERS_DATA_SOURCE].timestamps, unit="s"
)
installments = pd.DataFrame({
    "loan_id": future[REMINDERS_DATA_SOURCE]["loan_id"],
    "due_date": reminders_ts + due_in_days,
})

Each reminder event contains a due_in_days field indicating how many days until the installment is due. The actual due date is computed by adding this offset to the reminder timestamp, producing a DataFrame with one row per installment deadline.
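
To see the due-date arithmetic in isolation, here is a small sketch on toy data. The timestamps, offsets, and loan ID are made up for illustration; only the pandas calls mirror the step above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two reminders, given as Unix timestamps in seconds
# (2024-01-01 and 2024-02-01, both at midnight UTC).
reminder_ts = pd.to_datetime([1704067200, 1706745600], unit="s")

# Each reminder says how many days remain until the installment is due.
due_in_days = pd.to_timedelta(np.asarray([7, 10], dtype=int), unit="D")

# Element-wise addition of a DatetimeIndex and a TimedeltaIndex
# yields one due date per reminder.
installments = pd.DataFrame({
    "loan_id": ["A", "A"],
    "due_date": reminder_ts + due_in_days,
})
```

With these inputs the two due dates land on 2024-01-08 and 2024-02-11.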

② Filter to window and deduplicate

Python
window_end = pd.to_datetime(ctx[SPLIT_TIMESTAMP], unit="s") + pd.to_timedelta(
    target_window_days, unit="D"
)
installments = installments[installments["due_date"] < window_end]
installments = (
    installments.drop_duplicates()
    .sort_values(["loan_id", "due_date"])
    .reset_index(drop=True)
)

Installments are filtered to the 6-month target window (approximated as 30 × 6 = 180 days after the split timestamp) and deduplicated. Sorting by loan ID and due date ensures consistent ordering for the one-to-one pairing step.
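
The effect of the filter and the deduplication is easiest to see on a few rows. The following is a sketch with invented data; it assumes, as one plausible reason for the deduplication, that the same deadline can appear more than once in the reminder data.

```python
import pandas as pd

# Hypothetical split timestamp and a 180-day window, as in the example above.
split = pd.Timestamp("2024-01-01")
window_end = split + pd.to_timedelta(180, unit="D")

installments = pd.DataFrame({
    "loan_id": ["B", "A", "A", "A"],
    "due_date": pd.to_datetime(
        ["2024-03-01", "2024-02-01", "2024-02-01", "2024-08-01"]
    ),
})

# Drop deadlines that fall on or after the end of the window ...
installments = installments[installments["due_date"] < window_end]
# ... then collapse identical (loan_id, due_date) rows and order them.
installments = (
    installments.drop_duplicates()
    .sort_values(["loan_id", "due_date"])
    .reset_index(drop=True)
)
```

The August deadline is outside the window and the duplicated February deadline for loan A collapses to one row, leaving two installments.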

③ Build repayments table

Python
transactions = future[TRANSACTIONS_DATA_SOURCE].interval_from(
    start=ctx[SPLIT_TIMESTAMP],
    interval_length=timedelta(days=target_window_days),
)
repayments_filtered = transactions.filter(
    by="trans_type", condition=lambda v: v == "repayment"
)
repayments = pd.DataFrame({
    "loan_id": pd.Series(repayments_filtered["loan_id"]),
    "timestamp": pd.to_datetime(repayments_filtered.timestamps, unit="s"),
}).sort_values(["loan_id", "timestamp"])

Repayment transactions are extracted from the same time window and converted to a DataFrame. Only transactions with type "repayment" are kept, sorted by loan ID and timestamp to align with the installment ordering.
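
For readers who want to prototype this step outside monad, the same selection can be approximated in plain pandas. This is only an analogue on made-up data, not the library's API; in particular, it assumes the window is half-open (start inclusive, end exclusive), which the text above does not confirm for `interval_from`.

```python
import pandas as pd

split = pd.Timestamp("2024-01-01")
window_end = split + pd.Timedelta(days=180)

# Hypothetical transaction log with mixed transaction types.
transactions = pd.DataFrame({
    "loan_id": ["A", "A", "B", "A"],
    "trans_type": ["repayment", "fee", "repayment", "repayment"],
    "timestamp": pd.to_datetime(
        ["2024-02-03", "2024-02-05", "2024-01-20", "2024-09-01"]
    ),
})

# Assumed half-open window: split <= timestamp < window_end.
in_window = (transactions["timestamp"] >= split) & (
    transactions["timestamp"] < window_end
)
is_repayment = transactions["trans_type"] == "repayment"

repayments = (
    transactions.loc[in_window & is_repayment, ["loan_id", "timestamp"]]
    .sort_values(["loan_id", "timestamp"])
    .reset_index(drop=True)
)
```

Here the fee and the September repayment are dropped, so one repayment per loan remains.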

④ Pair using cumcount merge

Python
installments["installment_index"] = installments.groupby("loan_id").cumcount()
repayments["installment_index"] = repayments.groupby("loan_id").cumcount()

paired = installments.merge(
    repayments[["loan_id", "installment_index", "timestamp"]],
    on=["loan_id", "installment_index"],
    how="left",
)

cumcount assigns a sequential, zero-based index within each loan group, creating a natural one-to-one pairing between the nth installment deadline and the nth repayment for each loan. A left merge keeps every installment, so unpaid installments appear with a missing (NaT) timestamp.
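
A toy run makes the pairing concrete. In this sketch (invented data), loan A has three deadlines but only two repayments, so its third installment has no partner after the merge.

```python
import pandas as pd

installments = pd.DataFrame({
    "loan_id": ["A", "A", "A", "B"],
    "due_date": pd.to_datetime(
        ["2024-02-01", "2024-03-01", "2024-04-01", "2024-02-15"]
    ),
})
repayments = pd.DataFrame({
    "loan_id": ["A", "A", "B"],
    "timestamp": pd.to_datetime(["2024-02-02", "2024-03-10", "2024-02-14"]),
})

# Number rows 0, 1, 2, ... within each loan on both sides.
installments["installment_index"] = installments.groupby("loan_id").cumcount()
repayments["installment_index"] = repayments.groupby("loan_id").cumcount()

# The left merge keeps all installments; unmatched ones get a NaT timestamp.
paired = installments.merge(
    repayments, on=["loan_id", "installment_index"], how="left"
)
```

Both frames are already sorted by loan and time here; with unsorted input the indices would not line up, which is why the earlier steps sort first.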

⑤ Apply miss logic with grace period

Python
missed = paired["timestamp"].isna() | (
    paired["timestamp"] > paired["due_date"] + pd.Timedelta(days=GRACE_PERIOD_DAYS)
)

return np.array([int(missed.sum() > MAX_MISSED_THRESHOLD)], dtype=np.float32)

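An installment is considered missed if no repayment was paired with it (the timestamp is NaT) or if the repayment arrived more than GRACE_PERIOD_DAYS (3) days after the due date. The customer is labeled positive if more than MAX_MISSED_THRESHOLD (3) installments were missed, so exactly 3 misses still yields a negative label.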
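
The boundary cases of the grace period are worth checking on a small example. The sketch below uses invented dates: a payment exactly 3 days late, one 4 days late, one missing, and one early.

```python
import numpy as np
import pandas as pd

GRACE = pd.Timedelta(days=3)
MAX_MISSED = 3

paired = pd.DataFrame({
    "due_date": pd.to_datetime(
        ["2024-02-01", "2024-03-01", "2024-04-01", "2024-05-01"]
    ),
    # None becomes NaT, i.e. no repayment was paired with that installment.
    "timestamp": pd.to_datetime(
        ["2024-02-04", "2024-03-05", None, "2024-04-30"]
    ),
})

# Comparisons against NaT are False, so the isna() term is what flags
# unpaid installments; the second term flags payments past the grace period.
missed = paired["timestamp"].isna() | (
    paired["timestamp"] > paired["due_date"] + GRACE
)

label = np.array([int(missed.sum() > MAX_MISSED)], dtype=np.float32)
```

Because the comparison is strict, the payment that arrives exactly 3 days after the due date still counts as on time; only the second and third installments are missed, and the label is negative.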


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=installment_defaults_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="recall", metric_name="Recall"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

| Metric | Why it matters |
| --- | --- |
| AUROC | Measures overall ranking quality. |
| AUPRC | More informative when the positive class is rare. |
| Recall | Proportion of actual positives caught. |
| Precision | Proportion of predicted positives that are correct. |
| F1 Score | Harmonic mean of precision and recall. |
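
As a quick reminder of how the threshold-based metrics relate, the sketch below computes precision, recall, and F1 from confusion-matrix counts. The function name and the counts are hypothetical and independent of monad's metric implementation.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 30 defaulters flagged correctly, 20 false alarms,
# 10 defaulters the model failed to flag.
precision, recall, f1 = precision_recall_f1(tp=30, fp=20, fn=10)
```

With these counts, precision is 0.6 and recall is 0.75; F1 sits between them, closer to the lower value.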

Production Tips

  1. Adjust the grace period to match your business rules. 3 days is a common default, but some lenders allow longer grace periods before marking a payment as late. Align the threshold with your internal collections policy.
  2. Handle partial repayments explicitly. This implementation assumes each repayment covers one full installment. If partial payments are common, consider matching on cumulative amounts rather than event counts.
  3. Consider loan types separately. Mortgages, personal loans, and credit card installments have very different default patterns. Training separate models or adding loan type as a feature can improve accuracy.
  4. Monitor for data freshness issues. Repayment events may arrive with delays due to bank processing times. Ensure your data pipeline accounts for settlement lag to avoid false positives.
  5. Validate the cumcount pairing assumption. The one-to-one pairing works when repayments are made in installment order. If customers can pay installments out of sequence, a more sophisticated matching algorithm may be needed.
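
Tip 2 suggests matching on cumulative amounts when partial repayments are common. One possible way to do that for a single loan is sketched below. It is not part of the example above: the `amount` columns and the `covered_at` helper are assumptions introduced for illustration, and the sketch assumes all amounts are positive.

```python
import numpy as np
import pandas as pd


def covered_at(installments: pd.DataFrame, repayments: pd.DataFrame) -> pd.Series:
    """For one loan, return the time at which each installment became fully covered.

    Installment i counts as covered once the running total of repayments
    reaches the running total due through installment i. NaT means never.
    Both frames need an `amount` column (hypothetical, positive values).
    """
    inst = installments.sort_values("due_date").reset_index(drop=True)
    rep = repayments.sort_values("timestamp").reset_index(drop=True)

    cum_due = inst["amount"].cumsum().to_numpy()
    cum_paid = rep["amount"].cumsum().to_numpy()

    # Position of the first repayment whose running total is >= the amount due.
    idx = np.searchsorted(cum_paid, cum_due, side="left")
    hit = idx < len(rep)

    out = np.full(len(inst), np.datetime64("NaT"), dtype="datetime64[ns]")
    out[hit] = rep["timestamp"].to_numpy()[idx[hit]]
    return pd.Series(out, index=inst.index, name="covered_at")


installments = pd.DataFrame({
    "due_date": pd.to_datetime(["2024-02-01", "2024-03-01", "2024-04-01"]),
    "amount": [100, 100, 100],
})
repayments = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-30", "2024-02-03", "2024-03-01"]),
    "amount": [50, 50, 150],
})

result = covered_at(installments, repayments)
```

In this toy case the first installment is only covered by the second partial payment on 2024-02-03, the second is covered on 2024-03-01, and the third is never fully covered. The resulting timestamps could then be compared against due date plus grace period in the same way as in step ⑤.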