Loan Application Propensity

Task type: MultilabelClassificationTask
Industry: Banking / Financial Services

This recipe scores the propensity of each customer to apply for various loan products — personal loans, mortgages, auto loans, credit lines, and small business loans — within a defined future window. The output is a binary vector indicating which loan types the customer is likely to apply for, enabling cross-sell campaigns and proactive product recommendations.

Why multilabel? A single customer can apply for multiple loan types simultaneously (e.g., an auto loan and a credit line). Multilabel classification handles this naturally, producing an independent yes/no prediction per loan type.
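As a minimal sketch of this encoding (plain Python, independent of the monad library; the helper name `to_multi_hot` is hypothetical), each loan type gets its own position in the label vector, so any combination of applications can be represented:

```python
# Loan types in a fixed order; position i of the label vector refers to LOAN_TYPES[i].
LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]


def to_multi_hot(applied: set[str], loan_types: list[str]) -> list[float]:
    """Return 1.0 for each loan type the customer applied for, 0.0 otherwise."""
    return [1.0 if loan_type in applied else 0.0 for loan_type in loan_types]


# A customer who applied for an auto loan and a credit line in the same window.
both = to_multi_hot({"Auto", "Credit_Line"}, LOAN_TYPES)

# A customer with no applications still gets a valid, all-zero label vector.
none_applied = to_multi_hot(set(), LOAN_TYPES)
```

A single-label (multiclass) target could not express the first customer, because it would have to pick exactly one of the two products.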

Prerequisites

Before writing a target function you need:

A trained foundation model built on event data that includes a loan_applications data source (or equivalent) with a column identifying the loan type (e.g., loan_type).
The monad library installed in your environment (needed for the Python App).

Target Function

Argument	Type	Description
`history`	`Events`	All events before the temporal split.
`future`	`Events`	All events after the temporal split.
`attributes`	`Attributes`	Static entity attributes.
`ctx`	`Dict`	Context dictionary containing `SPLIT_TIMESTAMP`, data mode, etc.

For multilabel classification, the function must return one of:

A 1-D float32 array of size num_labels — binary indicators (0 or 1) per loan type.
None — exclude this customer from training.
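The return contract above can be checked with a small helper while developing a target function. This is only a sketch: `check_multilabel_target` is a hypothetical name, not part of the monad library, and plain Python sequences stand in for NumPy arrays so the snippet runs without dependencies.

```python
from typing import Optional, Sequence


def check_multilabel_target(target: Optional[Sequence[float]], num_labels: int) -> bool:
    """Return True if `target` matches the contract described above.

    None is allowed (the customer is skipped); otherwise the vector must have
    exactly `num_labels` entries, each equal to 0 or 1.
    """
    if target is None:
        return True
    if len(target) != num_labels:
        return False
    return all(value in (0, 1) for value in target)


ok_vector = check_multilabel_target([1.0, 0.0, 1.0, 0.0, 0.0], num_labels=5)
ok_skip = check_multilabel_target(None, num_labels=5)
bad_length = check_multilabel_target([1.0, 0.0], num_labels=5)
bad_value = check_multilabel_target([0.5, 0.0, 0.0, 0.0, 0.0], num_labels=5)
```

Here `ok_vector` and `ok_skip` are accepted, while `bad_length` and `bad_value` are rejected.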

Full Example

Python App (first block) / GUI App (second block)

Python

import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window



# === Configuration ===
TARGET_WINDOW_DAYS = 60              # Prediction horizon in days
APPLICATION_DATA_SOURCE = "loan_applications"
LOAN_TYPE_COLUMN = "loan_type"
TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]


def loan_propensity_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Score propensity to apply for each loan type (1 = applied, 0 = did not)."""

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # 3. Check which loan types the customer applied for
    loan_labels, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    return loan_labels

Python

def loan_propensity_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Score propensity to apply for each loan type (1 = applied, 0 = did not)."""

    # === Configuration ===
    TARGET_WINDOW_DAYS = 60              # Prediction horizon in days
    APPLICATION_DATA_SOURCE = "loan_applications"
    LOAN_TYPE_COLUMN = "loan_type"
    TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # 3. Check which loan types the customer applied for
    loan_labels, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    return loan_labels

Step-by-Step Breakdown

① Validate the training window

Python App (first block) / GUI App (second block)

Python

target_window = timedelta(days=TARGET_WINDOW_DAYS)
if has_incomplete_training_window(ctx, target_window):
    return None

Python

target_window = timedelta(days=TARGET_WINDOW_DAYS)
if target_function.has_incomplete_training_window(ctx, target_window):
    return None

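The window is 60 days, longer than in typical retail recipes, because loan decisions often take weeks. Customers whose split point leaves less than 60 days of future data are skipped (the function returns None), so an application that simply has not been observed yet is never recorded as a 0.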
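To make the idea concrete, here is one plausible reading of such a check, written in plain Python. It is an assumption for illustration only: `window_is_incomplete` and `data_end` are hypothetical names, and the actual logic inside `has_incomplete_training_window` may differ.

```python
from datetime import datetime, timedelta


def window_is_incomplete(split: datetime, data_end: datetime, window: timedelta) -> bool:
    """True when less than `window` of observed data follows the split point."""
    return split + window > data_end


window = timedelta(days=60)
data_end = datetime(2024, 6, 30)  # hypothetical last timestamp in the dataset

# Split on 1 April: 1 April + 60 days = 31 May, which is inside the data.
early_split_incomplete = window_is_incomplete(datetime(2024, 4, 1), data_end, window)

# Split on 1 June: 1 June + 60 days = 31 July, which is past the end of the data.
late_split_incomplete = window_is_incomplete(datetime(2024, 6, 1), data_end, window)
```

Under this reading, the 1 June sample would be dropped: its labels would otherwise be based on only about 30 days of observations and would understate how often customers apply.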

② Trim future events

Python App (first block) / GUI App (second block)

Python

future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

Python

future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

Restricts future events to the 60 days following SPLIT_TIMESTAMP, so every customer is labeled over the same horizon.
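The effect of trimming can be illustrated without the library. The sketch below filters hypothetical `(timestamp, loan_type)` pairs to a half-open interval starting at the split; whether `interval_from` includes or excludes the end boundary is not stated here, so that detail is an assumption.

```python
from datetime import datetime, timedelta

split = datetime(2024, 4, 1)
window = timedelta(days=60)

# Hypothetical future events as (timestamp, loan_type) pairs.
future_events = [
    (datetime(2024, 4, 10), "Auto"),
    (datetime(2024, 5, 20), "Personal"),
    (datetime(2024, 7, 15), "Mortgage"),  # 105 days after the split
]

# Keep only events inside [split, split + window).
trimmed = [(ts, loan) for ts, loan in future_events if split <= ts < split + window]
kept_types = [loan for _, loan in trimmed]
```

The mortgage application falls outside the 60-day horizon, so it does not contribute to the label for this split.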

③ Detect loan applications per type

Python

loan_labels, _ = (
    future[APPLICATION_DATA_SOURCE]
    .groupBy(LOAN_TYPE_COLUMN)
    .exists(groups=TARGET_LOAN_TYPES)
)

This is the core logic:

groupBy(LOAN_TYPE_COLUMN) groups the future loan application events by loan type.
.exists(groups=TARGET_LOAN_TYPES) returns a binary array with one entry per loan type, in the order of TARGET_LOAN_TYPES: 1 if that group has at least one event, 0 otherwise.
The return type is a tuple (np.ndarray, List[str]); only the array is used here.
Example output: [1, 0, 1, 0, 0] means the customer applied for Personal and Auto loans.

Note: groupBy().exists() returns a float64 array. The Task layer accepts it as-is — no manual astype(np.float32) is required.
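The semantics described above can be emulated in a few lines of plain Python. This is a sketch of the behavior as documented in this recipe, not the library's implementation; `exists_by_group` and the event dictionaries are hypothetical.

```python
def exists_by_group(
    events: list[dict], column: str, groups: list[str]
) -> tuple[list[float], list[str]]:
    """Return (labels, groups), where labels[i] is 1.0 if any event has
    event[column] == groups[i], and 0.0 otherwise."""
    seen = {event[column] for event in events}
    labels = [1.0 if group in seen else 0.0 for group in groups]
    return labels, groups


TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

future_applications = [
    {"loan_type": "Auto"},
    {"loan_type": "Personal"},
    {"loan_type": "Auto"},     # repeated applications still yield a single 1
    {"loan_type": "Student"},  # not in TARGET_LOAN_TYPES, so it is ignored
]

loan_labels, group_names = exists_by_group(
    future_applications, "loan_type", TARGET_LOAN_TYPES
)
```

The resulting `loan_labels` matches the example output above: positions 0 (Personal) and 2 (Auto) are set, and the label order follows `TARGET_LOAN_TYPES` rather than the order of the events.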

Training

Python

from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, MultilabelClassificationTask


module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=MultilabelClassificationTask(class_names=TARGET_LOAN_TYPES),
    target_fn=loan_propensity_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "multilabel", "num_labels": len(TARGET_LOAN_TYPES)}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "multilabel", "num_labels": len(TARGET_LOAN_TYPES)}),
        MetricParams(alias="f1", metric_name="F1Score", kwargs={"task": "multilabel", "num_labels": len(TARGET_LOAN_TYPES)}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python

from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="f1", metric_name="F1Score"),
    ],
)

results = module.test(testing_params)

Prediction

Python

from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Variations

Exclude existing loan holders

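Count only first-time applications as positives: for any loan type the customer has already applied for in history, the label is forced to 0. Note that this uses past applications as a proxy for loans the customer already holds.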

Python App (first block) / GUI App (second block)

Python

def loan_propensity_target_fn(
    history: Events, future: Events, attributes: Attributes, ctx: Dict
) -> np.ndarray | None:
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # Check existing loans from history
    existing_loans, _ = (
        history[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Check future applications
    future_loans, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Force the label to 0 for loan types the customer already applied for in
    # history, so only first-time applications count as positives.
    result = future_loans.copy()
    result[existing_loans == 1] = 0
    return result

Python

def loan_propensity_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    TARGET_WINDOW_DAYS = 60
    APPLICATION_DATA_SOURCE = "loan_applications"
    LOAN_TYPE_COLUMN = "loan_type"
    TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # Check existing loans from history
    existing_loans, _ = (
        history[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Check future applications
    future_loans, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Only predict for loan types not already held
    result = future_loans.copy()
    result[existing_loans == 1] = 0
    return result
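A worked example of the masking step, using plain lists in place of the NumPy arrays so it runs without dependencies (the input values are made up for illustration):

```python
# Order: Personal, Mortgage, Auto, Credit_Line, Small_Business
existing_loans = [1.0, 0.0, 0.0, 1.0, 0.0]  # history: Personal, Credit_Line
future_loans = [1.0, 0.0, 1.0, 0.0, 0.0]    # target window: Personal, Auto

# Equivalent of `result[existing_loans == 1] = 0` on arrays.
result = [
    0.0 if held == 1 else future
    for held, future in zip(existing_loans, future_loans)
]
```

The repeat Personal application is zeroed out, and only the new Auto application remains a positive. One consequence of this design is that a 0 no longer distinguishes "did not apply" from "applied again for a product already in history", which is worth keeping in mind when interpreting per-label metrics.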

Recommended Metrics

Metric	Why it matters
AUROC (per label)	Ranking quality for each loan type independently.
AUPRC (per label)	Better than AUROC when applications for a loan type are rare.
F1 Score (micro)	Overall balance across all labels combined.
Hamming Loss	Fraction of labels that are incorrectly predicted — lower is better.
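To show how the two aggregate metrics in the table are computed, here is a small hand-rolled sketch on made-up labels for two customers. In practice the configured metric implementations would be used; the functions below only illustrate the definitions.

```python
def hamming_loss(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Fraction of individual label positions that are predicted incorrectly."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(t != p for t, p in pairs) / len(pairs)


def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """F1 computed over all label positions pooled together."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Order: Personal, Mortgage, Auto, Credit_Line, Small_Business
y_true = [[1, 0, 1, 0, 0], [0, 0, 0, 1, 0]]
y_pred = [[1, 0, 0, 0, 0], [0, 1, 0, 1, 0]]  # one missed Auto, one spurious Mortgage

loss = hamming_loss(y_true, y_pred)  # 2 wrong positions out of 10
f1 = micro_f1(y_true, y_pred)        # tp=2, fp=1, fn=1
```

With 2 of 10 positions wrong, the Hamming loss is 0.2; with 2 true positives, 1 false positive, and 1 false negative, micro F1 is 4/6, or about 0.67.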

Production Tips

Threshold per loan type. Each loan product has different conversion rates and profit margins. Tune decision thresholds independently rather than using a single global threshold.
Respect eligibility rules. Filter predictions by credit score, income, or other eligibility criteria before surfacing to advisors — the model predicts intent, not eligibility.
Time the outreach. A 60-day window gives your sales team ample time to engage. For more urgent products (e.g., credit lines), consider a shorter window.
Retrain after product launches. Adding a new loan product requires updating TARGET_LOAN_TYPES and retraining to capture the new pattern.
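The first two tips can be combined in a small post-processing step. The sketch below is illustrative only: the threshold values, the `recommend` helper, and the shape of the score and eligibility inputs are all assumptions, not part of the monad library or its prediction output format.

```python
LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

# Hypothetical per-product decision thresholds, tuned separately per loan type.
THRESHOLDS = {
    "Personal": 0.5,
    "Mortgage": 0.2,
    "Auto": 0.4,
    "Credit_Line": 0.6,
    "Small_Business": 0.3,
}


def recommend(scores: dict[str, float], eligible: set[str]) -> list[str]:
    """Return loan types whose score clears its own threshold and for which
    the customer passes the (externally computed) eligibility rules."""
    return [
        loan_type
        for loan_type in LOAN_TYPES
        if loan_type in eligible and scores[loan_type] >= THRESHOLDS[loan_type]
    ]


scores = {
    "Personal": 0.45,
    "Mortgage": 0.25,
    "Auto": 0.70,
    "Credit_Line": 0.55,
    "Small_Business": 0.35,
}
eligible = {"Personal", "Mortgage", "Auto", "Credit_Line"}

offers = recommend(scores, eligible)
```

For this customer the result is Mortgage and Auto: Personal and Credit_Line fall below their thresholds, and Small_Business clears its threshold but is removed by the eligibility filter. A single global threshold of 0.5 would have surfaced only Auto.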