Regression

When to use

Predict continuous numeric values — how much will the customer spend, how many items will they buy, how many days until their next visit?

For more use cases and complete solutions, see Regression Recipes.

Training script

A training script has four parts: imports, target function, training config, and run training.

Python

import numpy as np
from pathlib import Path

from monad.ui.module import RegressionTask
from monad.ui.config import TrainingParams
from monad.ui.module import load_from_foundation_model

The only task-specific import is RegressionTask. The remaining imports (numpy, Path, TrainingParams, load_from_foundation_model) are common to every training script.

The target function returns an np.float32 array of shape (num_targets,), or None to exclude the entity.

Python

return np.array([total_spend], dtype=np.float32)

Zero is a valid return value (e.g. zero spend). Use None only when the entity should be excluded entirely.
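A minimal sketch of this return contract, using plain NumPy. The helper `spend_target` and its `eligible` flag are hypothetical (not part of BaseModel); the flag stands in for whatever exclusion rule your own target function applies.

```python
from typing import Optional, Sequence

import numpy as np


def spend_target(amounts: Sequence[float], eligible: bool) -> Optional[np.ndarray]:
    """Hypothetical helper: total spend as a (1,)-shaped float32 array."""
    if not eligible:
        # None drops the entity from training entirely
        return None
    # An empty sequence sums to 0.0, which is a valid regression target
    total_spend = float(sum(amounts))
    return np.array([total_spend], dtype=np.float32)


no_purchases = spend_target([], eligible=True)        # array([0.], dtype=float32)
excluded = spend_target([19.99, 5.0], eligible=False)  # None
```

The entity with no purchases stays in the training set with a target of zero, while the ineligible entity is skipped regardless of its spend.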

See Target Function for the full data-access API and Target Examples → Regression for ready-made patterns.

Every training script requires two configuration objects: a task that defines the prediction type, and TrainingParams that control metrics and training behavior.

Task declaration

Python

task = RegressionTask(
    num_targets=1,
    max_value=10000.0,
)

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| `num_targets` | No | `1` | Number of values to predict simultaneously |
| `num_bins` | No | - | Discretisation bins (can improve training stability) |
| `max_value` | No | `None` | Cap on predicted values; helps prevent extreme outputs |
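For intuition only, the sketch below shows what capping and discretising mean for a handful of values, using plain NumPy. How RegressionTask applies `max_value` and `num_bins` internally is not described here; the equal-width bin layout between 0 and `max_value` and the choice of `num_bins = 10` are assumptions made for illustration.

```python
import numpy as np

max_value = 10000.0  # same cap as in the task declaration above
num_bins = 10        # hypothetical value, chosen for illustration

values = np.array([0.0, 125.5, 9999.0, 25000.0], dtype=np.float32)

# Capping: anything above max_value is pulled down to max_value
capped = np.minimum(values, max_value)

# Discretisation (assumed equal-width bins): map each value to a bin index
edges = np.linspace(0.0, max_value, num_bins + 1)
bin_ids = np.clip(np.digitize(capped, edges) - 1, 0, num_bins - 1)

print(capped)   # the 25000.0 entry becomes 10000.0
print(bin_ids)  # [0 0 9 9]
```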

Training parameters

Configure training with TrainingParams. At minimum, provide the checkpoint directory:

Python

training_params = TrainingParams(
    checkpoint_dir=scenario_model_path,
)

For all available options and their defaults, see Training Parameters. The default metrics for this task are:

| Metric | Alias | Monitoring |
| --- | --- | --- |
| `MeanSquaredError(squared=False)` (RMSE) | `val_loss` | Minimize |

To add or replace metrics, see Custom Metrics.
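For reference, RMSE is the square root of the mean squared difference between predictions and targets. The NumPy sketch below only illustrates the formula; the `rmse` helper is hypothetical and is not how BaseModel computes the metric internally.

```python
import numpy as np


def rmse(predictions, targets) -> float:
    """Root mean squared error between two equally sized sequences."""
    predictions = np.asarray(predictions, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    return float(np.sqrt(np.mean((predictions - targets) ** 2)))


# Three exact predictions and one that is off by 4:
# mean squared error = 16 / 4 = 4, so RMSE = 2
score = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0])
print(score)  # 2.0
```

Because the errors are squared before averaging, a single large miss moves RMSE more than several small ones, and the result is in the same units as the target (for example, currency for spend).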

Python

trainer = load_from_foundation_model(
    checkpoint_path="./foundation_model",
    downstream_task=task,
    target_fn=my_target_fn,
    training_params=training_params,
)
trainer.fit()

See Scenario Model reference for all load_from_foundation_model options.

Full example

Complete training script from the onboarding package — adapt the paths, target column, and window length to your data:

Python

from datetime import timedelta
from pathlib import Path
from typing import Dict

import numpy as np

from monad.batch import SPLIT_TIMESTAMP
from monad.ui.config import TrainingParams
from monad.ui.module import RegressionTask, load_from_foundation_model
from monad.ui.target_function import Attributes, Events, has_incomplete_training_window


# --- Names & Paths -----------------------------------------------------------

# EDIT: provide path to project directory, PARENT to /fm, /features, /lightning_checkpoints etc.
project_dir = Path("/basemodel/projects/project_dir").resolve()
# EDIT: define name for scenario checkpoints directory; the script will put it under the same parent directory as fm
scenario_name = "scenario_name"

# derive model paths from the project directory
foundation_model_path = project_dir / "fm"
scenario_model_path = project_dir / "scenarios" / scenario_name


# --- Target Definition -------------------------------------------------------

# regression definition, for reference:
# the target is a continuous value computed from events in the future window
# here: sum of TARGET_COLUMN over TARGET_EVENT_TABLE within TARGET_WINDOW_DAYS
# the model predicts this future value

# EDIT: target details
TARGET_EVENT_TABLE = "transactions"  # event data source where the target is calculated
TARGET_COLUMN = "price"  # column with values to aggregate (e.g. spend)
TARGET_WINDOW_DAYS = 60  # future window length (here: days)

# EDIT: number of values predicted per entity (use > 1 for multi-target regression)
num_targets = 1


def target_fn(_history: Events, future: Events, _entity: Attributes, _ctx: Dict) -> np.ndarray | None:

    # exclude entities whose remaining window is shorter than the target window
    if has_incomplete_training_window(_ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    # trims the future to desired window
    future_window = future.interval_from(
        _ctx[SPLIT_TIMESTAMP],
        timedelta(days=TARGET_WINDOW_DAYS),
    )

    # regression target: total spend in the future window
    y = np.sum(future_window[TARGET_EVENT_TABLE][TARGET_COLUMN].events)

    return np.array([y], dtype=np.float32)


# --- Training ----------------------------------------------------------------

# EDIT: hyperparameters - keep defaults unless experimenting
learning_rate = 3e-5
epochs = 3  # use 1 for smoke test

# EDIT: limited runs - use for smoke test, then comment out here and below
limit_train_batches = 5
limit_val_batches = 5

# EDIT: parallelised training - comment out to default to a single GPU
strategy = "ddp"
devices = [0, 1]  # list of GPU indices

# For more options refer to docs: https://docs.basemodel.ai/reference/trainingparams

task = RegressionTask(num_targets=num_targets)

training_params = TrainingParams(
    checkpoint_dir=scenario_model_path,
    learning_rate=learning_rate,
    epochs=epochs,
    devices=devices,
    strategy=strategy,
    limit_train_batches=limit_train_batches,  # smoke test, comment out for full runs
    limit_val_batches=limit_val_batches,  # smoke test, comment out for full runs
)

trainer = load_from_foundation_model(
    checkpoint_path=foundation_model_path,
    downstream_task=task,
    target_fn=target_fn,
)

trainer.fit(training_params=training_params, overwrite=True)  # replace with resume=True for resumed training

This script is part of the onboarding package shipped with every BaseModel installation.
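The target logic in the script can be traced on toy data without BaseModel installed. The sketch below is a plain-Python stand-in: `toy_target`, the `(timestamp, price)` tuples, and the `data_end` argument are hypothetical, and the half-open window `[split, split + 60 days)` is an assumption about how `interval_from` and `has_incomplete_training_window` behave, not a statement of their actual semantics.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

import numpy as np

TARGET_WINDOW_DAYS = 60


def toy_target(
    events: List[Tuple[datetime, float]],
    split: datetime,
    data_end: datetime,
) -> Optional[np.ndarray]:
    """Sum prices in the window after the split, or None if the window is cut short."""
    window = timedelta(days=TARGET_WINDOW_DAYS)
    # Stand-in for the incomplete-window check: not enough data after the split
    if data_end - split < window:
        return None
    window_end = split + window
    # Stand-in for trimming the future: keep events inside [split, window_end)
    total = sum(price for ts, price in events if split <= ts < window_end)
    return np.array([total], dtype=np.float32)


split = datetime(2024, 1, 1)
events = [
    (datetime(2023, 12, 20), 50.0),  # before the split: history, ignored
    (datetime(2024, 1, 10), 20.0),   # inside the window
    (datetime(2024, 2, 15), 30.0),   # inside the window
    (datetime(2024, 3, 15), 99.0),   # after the 60-day window, ignored
]

full = toy_target(events, split, data_end=datetime(2024, 4, 1))
short = toy_target(events, split, data_end=datetime(2024, 2, 1))
print(full, short)  # [50.] None
```

The second call returns None because only 31 days of data follow the split, so a 60-day total would be understated; excluding such entities keeps partially observed targets out of training.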