Predict Installment Payment Defaults
Task type: BinaryClassificationTask
Industry: Finance / Banking
Early detection of installment payment defaults allows credit teams to intervene before a borrower falls into serious arrears. By predicting which customers will miss multiple deadlines, lenders can offer restructuring options, adjust collection strategies, or tighten credit limits proactively — reducing losses while maintaining customer relationships.
What makes this advanced? Heavy pandas integration — uses DataFrame merge, cumcount for one-to-one pairing of installments with repayments, and timedelta comparisons for grace period logic.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data source(s): reminder_log, transactions
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
The function must return one of:
- np.array([1], dtype=np.float32): positive case
- np.array([0], dtype=np.float32): negative case
- None: exclude this entity from training
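As a minimal sketch of this return contract, the helper below maps a boolean outcome (or a missing one) to the three allowed return values. The name to_label is hypothetical and not part of monad; it only illustrates the shape and dtype of the result.

```python
from typing import Optional

import numpy as np


def to_label(outcome: Optional[bool]) -> Optional[np.ndarray]:
    """Hypothetical helper (not part of monad): encode a binary outcome.

    None means the entity cannot be labeled and should be excluded.
    """
    if outcome is None:
        return None
    # One-element float32 array, matching the contract described above.
    return np.array([int(outcome)], dtype=np.float32)


positive = to_label(True)   # one-element array holding 1.0
negative = to_label(False)  # one-element array holding 0.0
excluded = to_label(None)   # None, entity is skipped
```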
Full Example
```python
import numpy as np
from datetime import timedelta
from typing import Dict
from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window
import pandas as pd

# === Configuration ===
TARGET_WINDOW_MONTHS = 6
MAX_MISSED_THRESHOLD = 3
GRACE_PERIOD_DAYS = 3
REMINDERS_DATA_SOURCE = "reminder_log"
TRANSACTIONS_DATA_SOURCE = "transactions"


def installment_defaults_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict if customer misses >3 installment deadlines in 6 months."""
    target_window_days = 30 * TARGET_WINDOW_MONTHS
    if has_incomplete_training_window(ctx, timedelta(days=target_window_days)):
        return None

    # 1. Build installment DataFrame with computed due dates
    due_in_days = pd.to_timedelta(
        np.asarray(future[REMINDERS_DATA_SOURCE]["due_in_days"], dtype=int),
        unit="D",
    )
    reminders_ts = pd.to_datetime(
        future[REMINDERS_DATA_SOURCE].timestamps, unit="s"
    )
    installments = pd.DataFrame({
        "loan_id": future[REMINDERS_DATA_SOURCE]["loan_id"],
        "due_date": reminders_ts + due_in_days,
    })
    if installments.empty:
        return None

    # 2. Filter to target window and deduplicate
    window_end = pd.to_datetime(ctx[SPLIT_TIMESTAMP], unit="s") + pd.to_timedelta(
        target_window_days, unit="D"
    )
    installments = installments[installments["due_date"] < window_end]
    installments = (
        installments.drop_duplicates()
        .sort_values(["loan_id", "due_date"])
        .reset_index(drop=True)
    )

    # 3. Build repayments DataFrame
    transactions = future[TRANSACTIONS_DATA_SOURCE].interval_from(
        start=ctx[SPLIT_TIMESTAMP],
        interval_length=timedelta(days=target_window_days),
    )
    repayments_filtered = transactions.filter(
        by="trans_type", condition=lambda v: v == "repayment"
    )
    repayments = pd.DataFrame({
        "loan_id": pd.Series(repayments_filtered["loan_id"]),
        "timestamp": pd.to_datetime(repayments_filtered.timestamps, unit="s"),
    }).sort_values(["loan_id", "timestamp"])

    # 4. One-to-one pairing using cumcount
    installments["installment_index"] = installments.groupby("loan_id").cumcount()
    repayments["installment_index"] = repayments.groupby("loan_id").cumcount()
    paired = installments.merge(
        repayments[["loan_id", "installment_index", "timestamp"]],
        on=["loan_id", "installment_index"],
        how="left",
    )

    # 5. Count missed deadlines (no payment or payment > grace period)
    missed = paired["timestamp"].isna() | (
        paired["timestamp"] > paired["due_date"] + pd.Timedelta(days=GRACE_PERIOD_DAYS)
    )
    return np.array([int(missed.sum() > MAX_MISSED_THRESHOLD)], dtype=np.float32)
```
Step-by-Step Breakdown
① Build installment table with due dates
```python
due_in_days = pd.to_timedelta(
    np.asarray(future[REMINDERS_DATA_SOURCE]["due_in_days"], dtype=int),
    unit="D",
)
reminders_ts = pd.to_datetime(
    future[REMINDERS_DATA_SOURCE].timestamps, unit="s"
)
installments = pd.DataFrame({
    "loan_id": future[REMINDERS_DATA_SOURCE]["loan_id"],
    "due_date": reminders_ts + due_in_days,
})
```
Each reminder event contains a due_in_days field indicating how many days until the installment is due. The actual due date is computed by adding this offset to the reminder timestamp, producing a DataFrame with one row per installment deadline.
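To see the due-date arithmetic in isolation, here is a small sketch on toy arrays that stand in for the reminder fields. The values are illustrative assumptions, not real data, and no monad objects are involved.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the reminder event fields (illustrative values only).
reminder_ts_seconds = np.array([1_704_067_200, 1_704_672_000])  # 2024-01-01 and 2024-01-08 (UTC)
due_in_days_raw = np.array([10, 3])
loan_ids = ["A", "A"]

reminders_ts = pd.to_datetime(reminder_ts_seconds, unit="s")
due_in_days = pd.to_timedelta(due_in_days_raw, unit="D")

# Element-wise addition: reminder timestamp + offset = due date.
installments = pd.DataFrame({
    "loan_id": loan_ids,
    "due_date": reminders_ts + due_in_days,
})
```

Both toy reminders resolve to the same due date (2024-01-11), which is the situation in which two reminders describe a single installment.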
② Filter to window and deduplicate
```python
window_end = pd.to_datetime(ctx[SPLIT_TIMESTAMP], unit="s") + pd.to_timedelta(
    target_window_days, unit="D"
)
installments = installments[installments["due_date"] < window_end]
installments = (
    installments.drop_duplicates()
    .sort_values(["loan_id", "due_date"])
    .reset_index(drop=True)
)
```
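Installments are filtered to the 6-month target window (approximated as 180 days) and deduplicated, so repeated reminders that resolve to the same loan and due date count as a single installment. Sorting by loan ID and due date ensures consistent ordering for the one-to-one pairing step.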
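A short sketch of the same filter-and-deduplicate step on a hand-built DataFrame. The split date and rows are made up for illustration.

```python
import pandas as pd

split = pd.Timestamp("2024-01-01")                    # assumed split date
window_end = split + pd.to_timedelta(180, unit="D")   # 2024-06-29

installments = pd.DataFrame({
    "loan_id": ["B", "A", "A", "A"],
    "due_date": pd.to_datetime(
        ["2024-02-01", "2024-01-11", "2024-01-11", "2024-08-01"]
    ),
})

# Keep due dates strictly before the end of the window.
in_window = installments[installments["due_date"] < window_end]

# Identical (loan_id, due_date) rows collapse into one installment.
deduped = (
    in_window.drop_duplicates()
    .sort_values(["loan_id", "due_date"])
    .reset_index(drop=True)
)
```

The duplicated row for loan A is dropped and the August installment falls outside the window, leaving one installment per loan.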
③ Build repayments table
```python
transactions = future[TRANSACTIONS_DATA_SOURCE].interval_from(
    start=ctx[SPLIT_TIMESTAMP],
    interval_length=timedelta(days=target_window_days),
)
repayments_filtered = transactions.filter(
    by="trans_type", condition=lambda v: v == "repayment"
)
repayments = pd.DataFrame({
    "loan_id": pd.Series(repayments_filtered["loan_id"]),
    "timestamp": pd.to_datetime(repayments_filtered.timestamps, unit="s"),
}).sort_values(["loan_id", "timestamp"])
```
Repayment transactions are extracted from the same time window and converted to a DataFrame. Only transactions with type "repayment" are kept, sorted by loan ID and timestamp to align with the installment ordering.
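For testing the logic without monad, a rough pandas-only equivalent of this step might look like the sketch below. It assumes the window is half-open, [split, split + window); whether interval_from treats its boundaries the same way is not confirmed by this page.

```python
from datetime import timedelta

import pandas as pd

split = pd.Timestamp("2024-01-01")   # assumed split date
window = timedelta(days=180)

transactions = pd.DataFrame({
    "loan_id": ["A", "A", "B", "A"],
    "trans_type": ["repayment", "fee", "repayment", "repayment"],
    "timestamp": pd.to_datetime(
        ["2024-01-10", "2024-01-12", "2024-03-01", "2024-09-01"]
    ),
})

# Assumption: half-open window [split, split + window).
in_window = (transactions["timestamp"] >= split) & (
    transactions["timestamp"] < split + window
)
is_repayment = transactions["trans_type"] == "repayment"

repayments = (
    transactions.loc[in_window & is_repayment, ["loan_id", "timestamp"]]
    .sort_values(["loan_id", "timestamp"])
    .reset_index(drop=True)
)
```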
④ Pair using cumcount merge
```python
installments["installment_index"] = installments.groupby("loan_id").cumcount()
repayments["installment_index"] = repayments.groupby("loan_id").cumcount()
paired = installments.merge(
    repayments[["loan_id", "installment_index", "timestamp"]],
    on=["loan_id", "installment_index"],
    how="left",
)
```
cumcount assigns a sequential, zero-based index within each loan group, which pairs the nth installment deadline with the nth repayment for each loan. The left merge keeps every installment, so an installment with no matching repayment gets a missing timestamp (NaT).
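The pairing is easiest to follow on a tiny example. In this sketch with made-up rows, loan A has three installments but only two repayments, so its third installment stays unpaired.

```python
import pandas as pd

installments = pd.DataFrame({
    "loan_id": ["A", "A", "A", "B"],
    "due_date": pd.to_datetime(
        ["2024-01-11", "2024-02-11", "2024-03-11", "2024-02-01"]
    ),
})
repayments = pd.DataFrame({
    "loan_id": ["A", "A", "B"],
    "timestamp": pd.to_datetime(["2024-01-10", "2024-02-20", "2024-02-02"]),
})

# Position of each row within its loan: 0, 1, 2, ... (both frames are pre-sorted).
installments["installment_index"] = installments.groupby("loan_id").cumcount()
repayments["installment_index"] = repayments.groupby("loan_id").cumcount()

# Left merge keeps every installment; unmatched ones get NaT in "timestamp".
paired = installments.merge(
    repayments[["loan_id", "installment_index", "timestamp"]],
    on=["loan_id", "installment_index"],
    how="left",
)
```

Because the index is computed from row order, both frames must be sorted by loan and time before cumcount is applied, as the earlier steps do.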
⑤ Apply miss logic with grace period
```python
missed = paired["timestamp"].isna() | (
    paired["timestamp"] > paired["due_date"] + pd.Timedelta(days=GRACE_PERIOD_DAYS)
)
return np.array([int(missed.sum() > MAX_MISSED_THRESHOLD)], dtype=np.float32)
```
An installment is considered missed if no repayment was paired with it (missing timestamp, NaT) or if the repayment arrived more than GRACE_PERIOD_DAYS (3) days after the due date. The customer is labeled positive if more than MAX_MISSED_THRESHOLD (3) installments were missed, that is, at least 4.
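The boundary cases are worth checking by hand. The sketch below uses made-up dates: one payment lands exactly on the last day of the grace period (not missed), two are late, and one installment has no payment at all.

```python
import numpy as np
import pandas as pd

GRACE_PERIOD_DAYS = 3
MAX_MISSED_THRESHOLD = 3

paired = pd.DataFrame({
    "due_date": pd.to_datetime(
        ["2024-01-11", "2024-02-11", "2024-03-11", "2024-04-11", "2024-05-11"]
    ),
    "timestamp": pd.to_datetime(
        ["2024-01-10", "2024-02-14", "2024-03-15", "2024-04-20", None]
    ),
})

# Late means strictly after due date + grace period; NaT compares as False here.
late = paired["timestamp"] > paired["due_date"] + pd.Timedelta(days=GRACE_PERIOD_DAYS)
missed = paired["timestamp"].isna() | late

# Strict inequality: exactly MAX_MISSED_THRESHOLD misses is still a negative label.
label = np.array([int(missed.sum() > MAX_MISSED_THRESHOLD)], dtype=np.float32)
```

With three missed installments the label is 0, because the comparison against MAX_MISSED_THRESHOLD is strict.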
Training
Once the target function is defined, fine-tune a downstream model:
```python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=installment_defaults_target_fn,
)
training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)
module.fit(training_params, seed=42)
```
Evaluation
```python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="recall", metric_name="Recall"),
    ],
)
results = module.test(testing_params)
```
Prediction
```python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)
predictions = module.predict(testing_params)
```
Recommended Metrics
| Metric | Why it matters |
|---|---|
| AUROC | Measures overall ranking quality. |
| AUPRC | More informative when the positive class is rare. |
| Recall | Proportion of actual positives caught. |
| Precision | Proportion of predicted positives that are correct. |
| F1 Score | Harmonic mean of precision and recall. |
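Precision, recall, and F1 depend on the decision threshold, unlike AUROC and AUPRC, which score the ranking itself. As a worked illustration with made-up labels (this is plain NumPy, not how monad computes its metrics), the three follow directly from the confusion counts:

```python
import numpy as np

# Toy ground truth and thresholded predictions (illustrative values only).
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # defaults correctly flagged
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false alarms
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # defaults that were not flagged

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```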
Production Tips
- Adjust the grace period to match your business rules. 3 days is a common default, but some lenders allow longer grace periods before marking a payment as late. Align the threshold with your internal collections policy.
- Handle partial repayments explicitly. This implementation assumes each repayment covers one full installment. If partial payments are common, consider matching on cumulative amounts rather than event counts.
- Consider loan types separately. Mortgages, personal loans, and credit card installments have very different default patterns. Training separate models or adding loan type as a feature can improve accuracy.
- Monitor for data freshness issues. Repayment events may arrive with delays due to bank processing times. Ensure your data pipeline accounts for settlement lag to avoid false positives.
- Validate the cumcount pairing assumption. The one-to-one pairing works when repayments are made in installment order. If customers can pay installments out of sequence, a more sophisticated matching algorithm may be needed.
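For the partial-repayments tip, one possible approach is to match on running totals instead of event counts. The sketch below treats an installment as covered at the first repayment where cumulative paid reaches cumulative due. The columns amount_due, amount, and covered_at are hypothetical (the example above has no amount fields), and the data is for a single loan with repayments already sorted by time.

```python
import numpy as np
import pandas as pd

GRACE_PERIOD_DAYS = 3

# Hypothetical columns: "amount_due" and "amount" are assumptions for illustration.
installments = pd.DataFrame({
    "due_date": pd.to_datetime(["2024-01-11", "2024-02-11", "2024-03-11"]),
    "amount_due": [100.0, 100.0, 100.0],
})
repayments = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-05", "2024-01-09", "2024-02-25"]),
    "amount": [60.0, 40.0, 100.0],
})

cum_due = installments["amount_due"].cumsum().to_numpy()
cum_paid = repayments["amount"].cumsum().to_numpy()

# First repayment index at which the running total paid reaches the running total due.
covering_idx = np.searchsorted(cum_paid, cum_due, side="left")

# An index past the last repayment means the installment was never fully covered.
installments["covered_at"] = [
    repayments["timestamp"].iloc[i] if i < len(cum_paid) else pd.NaT
    for i in covering_idx
]

missed = installments["covered_at"].isna() | (
    installments["covered_at"]
    > installments["due_date"] + pd.Timedelta(days=GRACE_PERIOD_DAYS)
)
```

Here two partial payments together cover the first installment on time, the second is covered late, and the third is never covered. For several loans the same logic would be applied per loan_id group, and real amounts may need a small tolerance for rounding.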