Loan Application Propensity
Task type: MultilabelClassificationTask
Industry: Banking / Financial Services
This recipe scores the propensity of each customer to apply for various loan products — personal loans, mortgages, auto loans, credit lines, and small business loans — within a defined future window. The output is a binary vector indicating which loan types the customer is likely to apply for, enabling cross-sell campaigns and proactive product recommendations.
Why multilabel? A single customer can apply for multiple loan types simultaneously (e.g., an auto loan and a credit line). Multilabel classification handles this naturally, producing an independent yes/no prediction per loan type.
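To make the distinction concrete, here is a minimal NumPy sketch of how one customer's applications map to a multi-hot label vector. The helper name `encode_multi_hot` and the sample customer are illustrative assumptions, not part of the monad library.

```python
import numpy as np

# Label order used throughout this recipe.
LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]


def encode_multi_hot(applied_types):
    """Return a float32 vector with 1.0 at the position of every applied loan type."""
    applied = set(applied_types)
    return np.array(
        [1.0 if loan_type in applied else 0.0 for loan_type in LOAN_TYPES],
        dtype=np.float32,
    )


# Hypothetical customer who applies for an auto loan and a credit line.
labels = encode_multi_hot(["Auto", "Credit_Line"])
print(labels)
```

A single-label (multiclass) target could mark only one of these positions, so the second application would be lost.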
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes a `loan_applications` data source (or equivalent) with a column identifying the loan type (e.g., `loan_type`).
- The monad library installed in your environment (for Python App).
Target Function
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
For multilabel classification, the function must return one of:
- A 1-D `float32` array of size `num_labels`: binary indicators (`0` or `1`) per loan type.
- `None`: exclude this customer from training.
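A quick way to catch contract violations before training is to run the target function's output through a small check. The following is a sketch only: `check_multilabel_target` is a hypothetical helper, not a monad API, and it deliberately does not enforce the dtype.

```python
import numpy as np


def check_multilabel_target(value, num_labels):
    """Raise ValueError if `value` does not match the return contract described above."""
    if value is None:
        return  # None is allowed: the customer is excluded from training.
    arr = np.asarray(value)
    if arr.ndim != 1 or arr.shape[0] != num_labels:
        raise ValueError(f"expected shape ({num_labels},), got {arr.shape}")
    if not np.isin(arr, (0, 1)).all():
        raise ValueError("labels must be 0 or 1")


# Passes silently: five binary indicators.
check_multilabel_target(np.array([1, 0, 1, 0, 0], dtype=np.float32), num_labels=5)
```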
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

# === Configuration ===
TARGET_WINDOW_DAYS = 60  # Prediction horizon in days
APPLICATION_DATA_SOURCE = "loan_applications"
LOAN_TYPE_COLUMN = "loan_type"
TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]


def loan_propensity_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Score propensity to apply for each loan type (1 = applied, 0 = did not)."""
    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # 3. Check which loan types the customer applied for
    loan_labels, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )
    return loan_labels
# Variant: same logic, with names accessed through the `target_function`
# namespace and the configuration defined inside the function body.
# Assumes `target_function`, `np`, `timedelta`, and `Dict` are already in scope.
def loan_propensity_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Score propensity to apply for each loan type (1 = applied, 0 = did not)."""
    # === Configuration ===
    TARGET_WINDOW_DAYS = 60  # Prediction horizon in days
    APPLICATION_DATA_SOURCE = "loan_applications"
    LOAN_TYPE_COLUMN = "loan_type"
    TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # 3. Check which loan types the customer applied for
    loan_labels, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )
    return loan_labels
Step-by-Step Breakdown
① Validate the training window
Uses a 60-day window — longer than typical retail recipes because loan decisions take weeks. Skips samples with insufficient future data.
② Trim future events
Narrows future events to exactly 60 days for a consistent horizon.
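Steps ① and ② can be pictured with plain `datetime` arithmetic. The sketch below only illustrates the idea; it does not reproduce how `has_incomplete_training_window` or `interval_from` are implemented. The `data_end` argument, the half-open interval, and both helper names are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def window_is_incomplete(split, window, data_end):
    """True when the data ends before `split + window` (assumed rule)."""
    return split + window > data_end


def trim_to_window(timestamps, split, window):
    """Keep timestamps in the half-open interval [split, split + window)."""
    end = split + window
    return [t for t in timestamps if split <= t < end]


split = datetime(2024, 1, 1, tzinfo=timezone.utc)
window = timedelta(days=60)

# Hypothetical application timestamps 5, 45, and 75 days after the split.
events = [split + timedelta(days=d) for d in (5, 45, 75)]

kept = trim_to_window(events, split, window)
print(len(kept))  # the day-75 event falls outside the 60-day horizon
```

Without the trim, customers with longer observed futures would have more chances to receive a positive label, which is what the consistent horizon avoids.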
③ Detect loan applications per type
loan_labels, _ = (
    future[APPLICATION_DATA_SOURCE]
    .groupBy(LOAN_TYPE_COLUMN)
    .exists(groups=TARGET_LOAN_TYPES)
)
This is the core logic:
- `groupBy(LOAN_TYPE_COLUMN)` groups future loan application events by type.
- `.exists(groups=TARGET_LOAN_TYPES)` returns a binary array: `1` if the group has at least one event, `0` otherwise.
- The return type is a tuple `(np.ndarray, List[str])`. We take only the array.
- Example output: `[1, 0, 1, 0, 0]` means the customer applied for Personal and Auto loans.

Note: `groupBy().exists()` returns a `float64` array. The Task layer accepts it as-is; no manual `astype(np.float32)` is required.
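The behavior described above can be imitated in a few lines of NumPy, which is handy for unit-testing label logic without a foundation model. This is a stand-in, not the library's implementation: `exists_by_group` is a hypothetical function, and ignoring loan types outside `groups` is an assumption of this sketch.

```python
import numpy as np


def exists_by_group(values, groups):
    """Return (indicator array, group names): 1.0 where a group occurs in `values`."""
    present = set(values)
    flags = np.array(
        [1.0 if group in present else 0.0 for group in groups],
        dtype=np.float64,
    )
    return flags, list(groups)


TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

# Hypothetical loan_type values from one customer's future window.
future_loan_types = ["Auto", "Personal", "Auto"]

labels, names = exists_by_group(future_loan_types, TARGET_LOAN_TYPES)
print(labels)
```

Repeated applications for the same type collapse into a single `1`, and the output order always follows `TARGET_LOAN_TYPES`, not the order of events.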
Training
from pathlib import Path

from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, MultilabelClassificationTask

# TARGET_LOAN_TYPES and loan_propensity_target_fn come from the Full Example above.
NUM_LABELS = len(TARGET_LOAN_TYPES)

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=MultilabelClassificationTask(class_names=TARGET_LOAN_TYPES),
    target_fn=loan_propensity_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "multilabel", "num_labels": NUM_LABELS}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "multilabel", "num_labels": NUM_LABELS}),
        MetricParams(alias="f1", metric_name="F1Score", kwargs={"task": "multilabel", "num_labels": NUM_LABELS}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="f1", metric_name="F1Score"),
    ],
)

results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)
Variations
Exclude existing loan holders
Only predict for customers who do not already hold a given loan type:
def loan_propensity_target_fn(
    history: Events, future: Events, attributes: Attributes, ctx: Dict
) -> np.ndarray | None:
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None

    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # Loan types already present in history.
    # Note: this data source holds applications, so a past application
    # is treated here as an existing loan.
    existing_loans, _ = (
        history[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Loan types applied for within the target window
    future_loans, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Zero out the label for every loan type already present in history,
    # so repeat applications are not counted as cross-sell positives.
    result = future_loans.copy()
    result[existing_loans == 1] = 0
    return result
# Variant: same logic, with names accessed through the `target_function`
# namespace and the configuration defined inside the function body.
# Assumes `target_function`, `np`, `timedelta`, and `Dict` are already in scope.
def loan_propensity_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    TARGET_WINDOW_DAYS = 60
    APPLICATION_DATA_SOURCE = "loan_applications"
    LOAN_TYPE_COLUMN = "loan_type"
    TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # Loan types already present in history
    existing_loans, _ = (
        history[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Loan types applied for within the target window
    future_loans, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Zero out the label for every loan type already present in history
    result = future_loans.copy()
    result[existing_loans == 1] = 0
    return result
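The effect of the masking step is easiest to see on concrete arrays. The values below are made up for illustration, and `mask_existing` is a hypothetical wrapper around the same two lines used in the variation.

```python
import numpy as np


def mask_existing(future_loans, existing_loans):
    """Zero out future labels for loan types that already appear in history."""
    result = future_loans.copy()  # copy so the input array is left untouched
    result[existing_loans == 1] = 0
    return result


# Order: Personal, Mortgage, Auto, Credit_Line, Small_Business
existing_loans = np.array([1, 0, 0, 1, 0], dtype=np.float64)
future_loans = np.array([1, 0, 1, 0, 0], dtype=np.float64)

result = mask_existing(future_loans, existing_loans)
print(result)
```

Here the repeat Personal application is dropped and only the new Auto application remains positive. Keep in mind that a zeroed label looks identical to a genuine negative, so the model is trained to predict `0` for products already in the customer's history.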
Recommended Metrics
| Metric | Why it matters |
|---|---|
| AUROC (per label) | Ranking quality for each loan type independently. |
| AUPRC (per label) | Better than AUROC when applications for a loan type are rare. |
| F1 Score (micro) | Overall balance across all labels combined. |
| Hamming Loss | Fraction of labels that are incorrectly predicted — lower is better. |
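For intuition, the last two metrics in the table can be computed by hand on a tiny batch. This is a NumPy sketch with made-up labels and predictions, independent of how the training pipeline computes its metrics; the function names are chosen for this example.

```python
import numpy as np


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    return float(np.mean(y_true != y_pred))


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


# Two hypothetical customers, five loan types each.
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 1, 0]])

print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 10
print(micro_f1(y_true, y_pred))      # tp=2, fp=1, fn=1
```

Because most customers apply for few or no loan types, Hamming loss can look low even for a model that predicts all zeros, which is one reason to read it alongside AUPRC and F1.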
Production Tips
- Threshold per loan type. Each loan product has different conversion rates and profit margins. Tune decision thresholds independently rather than using a single global threshold.
- Respect eligibility rules. Filter predictions by credit score, income, or other eligibility criteria before surfacing them to advisors; the model predicts intent, not eligibility.
- Time the outreach. A 60-day window gives your sales team ample time to engage. For more urgent products (e.g., credit lines), consider a shorter window.
- Retrain after product launches. Adding a new loan product requires updating `TARGET_LOAN_TYPES` and retraining to capture the new pattern.
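The first two tips can be combined in a small post-processing step. The sketch below assumes that per-label scores are available as probabilities in `TARGET_LOAN_TYPES` order; the threshold values, the eligibility flags, and the `recommend` helper are all hypothetical and would come from your own tuning and business rules.

```python
import numpy as np

LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

# Hypothetical per-product thresholds, tuned separately for each loan type.
THRESHOLDS = np.array([0.50, 0.30, 0.40, 0.60, 0.25])


def recommend(scores, eligible):
    """Return loan types whose score clears its own threshold and that pass eligibility."""
    mask = (scores >= THRESHOLDS) & eligible
    return [name for name, keep in zip(LOAN_TYPES, mask) if keep]


# Made-up scores and eligibility flags for one customer.
scores = np.array([0.55, 0.35, 0.10, 0.70, 0.20])
eligible = np.array([True, False, True, True, True])

offers = recommend(scores, eligible)
print(offers)
```

In this example the Mortgage score clears its threshold but is removed by the eligibility flag, which mirrors the point that the model predicts intent, not eligibility.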