Loan Application Propensity

Task type: MultilabelClassificationTask
Industry: Banking / Financial Services

This recipe scores the propensity of each customer to apply for various loan products — personal loans, mortgages, auto loans, credit lines, and small business loans — within a defined future window. The output is a binary vector indicating which loan types the customer is likely to apply for, enabling cross-sell campaigns and proactive product recommendations.

Why multilabel? A single customer can apply for multiple loan types simultaneously (e.g., an auto loan and a credit line). Multilabel classification handles this naturally, producing an independent yes/no prediction per loan type.
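
As a small illustration of the multilabel idea, the sketch below encodes a customer's applications as a multi-hot vector. The helper name `encode_multilabel` is hypothetical and not part of the monad library; the loan type names are the ones used later in this recipe.

```python
import numpy as np

LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]


def encode_multilabel(applied: set[str]) -> np.ndarray:
    """Hypothetical helper: one independent 0/1 slot per loan type."""
    return np.array(
        [1.0 if loan_type in applied else 0.0 for loan_type in LOAN_TYPES],
        dtype=np.float32,
    )


# A customer who applies for an auto loan and a credit line at the same time.
label = encode_multilabel({"Auto", "Credit_Line"})
print(label)
```

Two slots are set at once here, which a single-class label could not express.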


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes a loan_applications data source (or equivalent) with a column identifying the loan type (e.g., loan_type).
  • The monad library installed in your environment (for Python App).

Target Function

| Argument | Type | Description |
| --- | --- | --- |
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |

For multilabel classification, the function must return one of:

  • A 1-D float32 array of size num_labels — binary indicators (0 or 1) per loan type.
  • None, to exclude this customer from training.
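
The return contract above can be checked locally before training. The following is a minimal sketch; `check_target` is a hypothetical helper written for this illustration, not a monad function, and `NUM_LABELS = 5` assumes the five loan types used in this recipe.

```python
import numpy as np

NUM_LABELS = 5  # assumption: one slot per loan type in this recipe


def check_target(target, num_labels: int = NUM_LABELS) -> bool:
    """Hypothetical sanity check for a multilabel target value."""
    if target is None:  # None means "skip this customer"
        return True
    arr = np.asarray(target)
    return (
        arr.ndim == 1                          # must be a 1-D array
        and arr.shape[0] == num_labels         # one entry per loan type
        and bool(np.isin(arr, (0, 1)).all())   # binary indicators only
    )


print(check_target(np.array([1, 0, 1, 0, 0], dtype=np.float32)))
print(check_target(np.array([0.5, 0, 1, 0, 0], dtype=np.float32)))
```

Running a check like this on a handful of customers can catch shape or value mistakes before a full training run.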

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window



# === Configuration ===
TARGET_WINDOW_DAYS = 60              # Prediction horizon in days
APPLICATION_DATA_SOURCE = "loan_applications"
LOAN_TYPE_COLUMN = "loan_type"
TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]


def loan_propensity_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Score propensity to apply for each loan type (1 = applied, 0 = did not)."""

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # 3. Check which loan types the customer applied for
    loan_labels, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    return loan_labels
Python
def loan_propensity_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Score propensity to apply for each loan type (1 = applied, 0 = did not)."""

    # === Configuration ===
    TARGET_WINDOW_DAYS = 60              # Prediction horizon in days
    APPLICATION_DATA_SOURCE = "loan_applications"
    LOAN_TYPE_COLUMN = "loan_type"
    TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

    # 1. Ensure the training window is long enough
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None

    # 2. Trim future events to the target window
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # 3. Check which loan types the customer applied for
    loan_labels, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    return loan_labels

Step-by-Step Breakdown

① Validate the training window

Python
target_window = timedelta(days=TARGET_WINDOW_DAYS)
if has_incomplete_training_window(ctx, target_window):
    return None
Python
target_window = timedelta(days=TARGET_WINDOW_DAYS)
if target_function.has_incomplete_training_window(ctx, target_window):
    return None

The recipe uses a 60-day window, longer than typical retail recipes, because loan decisions take weeks. Returning None skips any sample whose future data does not cover the full window.
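
To make the idea concrete, here is a sketch of the kind of check this step performs. It is an assumption about the concept only: `window_is_incomplete` and `data_end` are hypothetical names, and the real `has_incomplete_training_window` may be implemented differently.

```python
from datetime import datetime, timedelta, timezone


def window_is_incomplete(split: datetime, window: timedelta, data_end: datetime) -> bool:
    """Hypothetical stand-in: True when the data ends before split + window."""
    return split + window > data_end


data_end = datetime(2024, 6, 30, tzinfo=timezone.utc)  # assumed last day of available data
window = timedelta(days=60)

early_split = datetime(2024, 3, 1, tzinfo=timezone.utc)  # window ends 2024-04-30
late_split = datetime(2024, 6, 1, tzinfo=timezone.utc)   # window ends 2024-07-31

print(window_is_incomplete(early_split, window, data_end))
print(window_is_incomplete(late_split, window, data_end))
```

Under these assumptions the late split would be skipped, because labeling it would silently treat unobserved days as "did not apply".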

② Trim future events

Python
future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)
Python
future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

Restricts future events to the 60 days after the split timestamp, so every customer is labeled over the same horizon.
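
The effect of the trim can be sketched with plain timestamps. `trim_to_window` is a hypothetical helper, and the half-open bounds (start inclusive, end exclusive) are an assumption; the exact boundary behavior of `interval_from` is not stated in this recipe.

```python
from datetime import datetime, timedelta, timezone


def trim_to_window(timestamps, split, window):
    """Keep timestamps t with split <= t < split + window (bounds are an assumption)."""
    end = split + window
    return [t for t in timestamps if split <= t < end]


split = datetime(2024, 3, 1, tzinfo=timezone.utc)
events = [
    datetime(2024, 3, 10, tzinfo=timezone.utc),  # inside the 60-day window
    datetime(2024, 4, 29, tzinfo=timezone.utc),  # inside the 60-day window
    datetime(2024, 5, 15, tzinfo=timezone.utc),  # after the window, dropped
]

kept = trim_to_window(events, split, timedelta(days=60))
print(len(kept))
```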

③ Detect loan applications per type

Python
loan_labels, _ = (
    future[APPLICATION_DATA_SOURCE]
    .groupBy(LOAN_TYPE_COLUMN)
    .exists(groups=TARGET_LOAN_TYPES)
)

This is the core logic:

  • groupBy(LOAN_TYPE_COLUMN) groups future loan application events by type.
  • .exists(groups=TARGET_LOAN_TYPES) returns a binary array: 1 if the group has at least one event, 0 otherwise.
  • The return type is a tuple (np.ndarray, List[str]). We take only the array.
  • Example output: [1, 0, 1, 0, 0] means the customer applied for Personal and Auto loans.

Note: groupBy().exists() returns a float64 array. The Task layer accepts it as-is — no manual astype(np.float32) is required.
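
For intuition, the behavior described above can be mimicked in plain NumPy. This is a stand-in, not the monad implementation: `exists_by_group` is a hypothetical function, and ignoring loan types outside `groups` (such as "Student" below) is an assumption.

```python
import numpy as np

TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]


def exists_by_group(event_types, groups):
    """Stand-in for groupBy(...).exists(groups=...): 1.0 if any event has that type."""
    seen = set(event_types)
    flags = np.array([1.0 if g in seen else 0.0 for g in groups])  # float64 by default
    return flags, list(groups)


# Loan types of the applications found in the trimmed future window.
future_types = ["Auto", "Personal", "Auto", "Student"]

labels, names = exists_by_group(future_types, TARGET_LOAN_TYPES)
print(labels)
```

The order of the returned array follows the order of `groups`, which is why the same list is passed to the task as class names.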


Training

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, MultilabelClassificationTask


module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=MultilabelClassificationTask(class_names=TARGET_LOAN_TYPES),
    target_fn=loan_propensity_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "multilabel", "num_labels": len(TARGET_LOAN_TYPES)}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "multilabel", "num_labels": len(TARGET_LOAN_TYPES)}),
        MetricParams(alias="f1", metric_name="F1Score", kwargs={"task": "multilabel", "num_labels": len(TARGET_LOAN_TYPES)}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="f1", metric_name="F1Score"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Variations

Exclude existing loan holders

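Zero out the label for any loan type the customer has already applied for, so the model is trained only on first-time applications per type. This treats a past application in history as a proxy for holding the product: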

Python
def loan_propensity_target_fn(
    history: Events, future: Events, attributes: Attributes, ctx: Dict
) -> np.ndarray | None:
    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if has_incomplete_training_window(ctx, target_window):
        return None
    future = future.interval_from(ctx[SPLIT_TIMESTAMP], target_window)

    # Check existing loans from history
    existing_loans, _ = (
        history[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Check future applications
    future_loans, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Zero out labels for loan types the customer already applied for in history.
    # Note: a 0 then also covers "applied again", not only "did not apply".
    result = future_loans.copy()
    result[existing_loans == 1] = 0
    return result
Python
def loan_propensity_target_fn(
    history: target_function.Events,
    future: target_function.Events,
    attributes: target_function.Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    TARGET_WINDOW_DAYS = 60
    APPLICATION_DATA_SOURCE = "loan_applications"
    LOAN_TYPE_COLUMN = "loan_type"
    TARGET_LOAN_TYPES = ["Personal", "Mortgage", "Auto", "Credit_Line", "Small_Business"]

    target_window = timedelta(days=TARGET_WINDOW_DAYS)
    if target_function.has_incomplete_training_window(ctx, target_window):
        return None
    future = future.interval_from(ctx[target_function.SPLIT_TIMESTAMP], target_window)

    # Check existing loans from history
    existing_loans, _ = (
        history[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Check future applications
    future_loans, _ = (
        future[APPLICATION_DATA_SOURCE]
        .groupBy(LOAN_TYPE_COLUMN)
        .exists(groups=TARGET_LOAN_TYPES)
    )

    # Zero out labels for loan types the customer already applied for in history
    result = future_loans.copy()
    result[existing_loans == 1] = 0
    return result
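
The masking step can be tried in isolation with hand-written arrays. The values below are made up for illustration and follow the loan type order Personal, Mortgage, Auto, Credit_Line, Small_Business.

```python
import numpy as np

# Illustrative values only; order: Personal, Mortgage, Auto, Credit_Line, Small_Business
existing_loans = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # applied in history
future_loans = np.array([1.0, 0.0, 1.0, 0.0, 0.0])    # applied in the target window

result = future_loans.copy()       # copy so the original array is not modified
result[existing_loans == 1] = 0    # boolean mask zeroes already-seen loan types

print(result)
```

The repeat Personal application is zeroed while the new Auto application is kept, so only the first-time application remains a positive label.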

| Metric | Why it matters |
| --- | --- |
| AUROC (per label) | Ranking quality for each loan type independently. |
| AUPRC (per label) | Better than AUROC when applications for a loan type are rare. |
| F1 Score (micro) | Overall balance across all labels combined. |
| Hamming Loss | Fraction of labels that are incorrectly predicted; lower is better. |
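
As a worked example of the last two metrics, the sketch below computes Hamming loss and micro-averaged F1 by hand with NumPy. The label matrices are made up for illustration; in practice the training and testing metrics would come from the configured MetricParams.

```python
import numpy as np

# Rows are customers, columns are loan types (illustrative values only).
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0],
                   [0, 1, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 0, 0, 1, 1],
                   [0, 1, 0, 0, 0]])

# Hamming loss: share of individual label slots that are wrong.
hamming = float(np.mean(y_true != y_pred))

# Micro F1: pool true/false positives and false negatives over all labels.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
micro_f1 = 2 * tp / (2 * tp + fp + fn)

print(round(hamming, 4), micro_f1)
```

Here 2 of the 15 label slots are wrong, so the Hamming loss is 2/15, and with 3 true positives, 1 false positive, and 1 false negative the micro F1 is 0.75.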

Production Tips

  1. Threshold per loan type. Each loan product has different conversion rates and profit margins. Tune decision thresholds independently rather than using a single global threshold.

  2. Respect eligibility rules. Filter predictions by credit score, income, or other eligibility criteria before surfacing to advisors — the model predicts intent, not eligibility.

  3. Time the outreach. A 60-day window gives your sales team ample time to engage. For more urgent products (e.g., credit lines), consider a shorter window.

  4. Retrain after product launches. Adding a new loan product requires updating TARGET_LOAN_TYPES and retraining to capture the new pattern.
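
Tips 1 and 2 can be combined in a short post-processing step. This is a sketch under assumptions: it presumes the model's predictions are available as per-label probability scores, and the scores, thresholds, and eligibility flags below are hypothetical values, not outputs of this recipe.

```python
import numpy as np

# Columns: Personal, Mortgage, Auto, Credit_Line, Small_Business
scores = np.array([[0.62, 0.08, 0.41, 0.30, 0.05],
                   [0.20, 0.35, 0.10, 0.55, 0.02]])   # hypothetical per-label scores

# Hypothetical thresholds tuned separately for each loan type.
thresholds = np.array([0.50, 0.30, 0.40, 0.50, 0.25])

# Hypothetical eligibility flags from business rules (credit score, income, ...).
eligible = np.array([[True, True, True, True, True],
                     [True, True, True, False, True]])

# Broadcasting compares every row against the per-type thresholds.
recommend = (scores >= thresholds) & eligible
print(recommend.astype(int))
```

The second customer scores above the Credit_Line threshold but is filtered out by the eligibility rule, which keeps intent and eligibility as separate decisions.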