Regression
When to use
Predict continuous numeric values — how much will the customer spend, how many items will they buy, how many days until their next visit?
For more use cases and complete solutions, see Regression Recipes.
Training script
import numpy as np
from pathlib import Path
from monad.ui.module import RegressionTask
from monad.ui.config import TrainingParams
from monad.ui.module import load_from_foundation_model
The only task-specific import is RegressionTask. The remaining imports (numpy, Path, TrainingParams, load_from_foundation_model) are common to every training script.
The target function returns a np.float32 array with shape (num_targets,), or None to exclude the entity.
Zero is a valid return value (e.g. zero spend). Use None only when the entity should be excluded entirely.
See Target Function for the full data-access API and Target Examples → Regression for ready-made patterns.
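The return contract described above can be sketched with plain NumPy, independent of the data-access API. The helper name spend_target and its arguments (purchase_amounts, eligible) are hypothetical and exist only for illustration; a real target function receives the arguments described in Target Function.

```python
from typing import Optional, Sequence

import numpy as np

NUM_TARGETS = 1  # should match num_targets in the task declaration


def spend_target(purchase_amounts: Sequence[float], eligible: bool) -> Optional[np.ndarray]:
    """Hypothetical helper illustrating the return contract of a regression target."""
    if not eligible:
        # None excludes the entity from training entirely
        return None
    # An entity with no purchases yields 0.0, which is a valid target
    total = float(sum(purchase_amounts))
    return np.array([total], dtype=np.float32)
```

Note the difference between the two edge cases: an eligible entity with an empty purchase list still produces an array containing 0.0, while an ineligible entity produces None.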
Every training script requires two configuration objects: a task that defines the prediction type, and TrainingParams that control metrics and training behavior.
Task declaration
| Parameter | Required | Default | Description |
|---|---|---|---|
| num_targets | No | 1 | Number of values to predict simultaneously |
| num_bins | No | — | Discretisation bins (can improve training stability) |
| max_value | No | None | Cap on predicted values; helps prevent extreme outputs |
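This page does not describe how RegressionTask applies num_bins or max_value internally. As rough intuition only, the sketch below shows what capping and equal-width discretisation mean for an array of values in plain NumPy. The function names, the equal-width bins, and the lower bound of zero are all assumptions made for illustration, not the library's behaviour.

```python
import numpy as np


def cap_values(values: np.ndarray, max_value: float) -> np.ndarray:
    """Clip values from above at max_value (illustrative only)."""
    return np.minimum(values, max_value)


def to_bins(values: np.ndarray, num_bins: int, max_value: float) -> np.ndarray:
    """Map values in [0, max_value] to integer bin indices 0..num_bins-1.

    Assumes equal-width bins, which is an illustrative choice.
    """
    edges = np.linspace(0.0, max_value, num_bins + 1)
    # Use interior edges only, so every index falls in 0..num_bins-1
    return np.digitize(values, edges[1:-1])


values = np.array([0.0, 12.0, 49.0, 250.0], dtype=np.float32)
capped = cap_values(values, max_value=100.0)          # 250.0 is clipped to 100.0
bins = to_bins(capped, num_bins=4, max_value=100.0)   # bin width is 25.0
```

The outlier of 250.0 no longer dominates once capped, and binning replaces a wide continuous range with a small set of classes, which is one plausible reason such options can stabilise training.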
Training parameters
Configure training with TrainingParams. At minimum, provide the checkpoint directory through the checkpoint_dir argument.
For all available options and their defaults, see Training Parameters. The default metrics for this task are:
| Metric | Alias | Monitoring |
|---|---|---|
| MeanSquaredError(squared=False) (RMSE) | val_loss | Minimize |
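For reference, RMSE is the square root of the mean squared difference between predictions and targets. A minimal NumPy sketch of that formula follows; the function name rmse is ours, and this is not the metric implementation the library uses.

```python
import numpy as np


def rmse(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Root mean squared error: sqrt(mean((prediction - target)^2))."""
    errors = predictions.astype(np.float64) - targets.astype(np.float64)
    return float(np.sqrt(np.mean(errors ** 2)))


predictions = np.array([10.0, 0.0, 25.0])
targets = np.array([12.0, 0.0, 21.0])
score = rmse(predictions, targets)  # errors are -2, 0 and 4
```

RMSE is expressed in the same units as the target (for example, currency for a spend target), and lower values are better, which is consistent with the Minimize setting in the table.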
To add or replace metrics, see Custom Metrics.
trainer = load_from_foundation_model(
    checkpoint_path="./foundation_model",
    downstream_task=task,
    target_fn=my_target_fn,
    training_params=TrainingParams(...),
)
trainer.fit()
See Scenario Model reference for all load_from_foundation_model options.
Full example
Complete training script from the onboarding package — adapt the paths, target column, and window length to your data:
from datetime import timedelta
from pathlib import Path
from typing import Dict
import numpy as np
from monad.batch import SPLIT_TIMESTAMP
from monad.ui.config import TrainingParams
from monad.ui.module import RegressionTask, load_from_foundation_model
from monad.ui.target_function import Attributes, Events, has_incomplete_training_window
# --- Names & Paths -----------------------------------------------------------
# EDIT: provide path to project directory, PARENT to /fm, /features, /lightning_checkpoints etc.
project_dir = Path("/basemodel/projects/project_dir").resolve()
# EDIT: define name for scenario checkpoints directory; the script will put it under the same parent directory as fm
scenario_name = "scenario_name"
# creating the relative paths
foundation_model_path = project_dir / "fm"
scenario_model_path = project_dir / "scenarios" / scenario_name
# --- Target Definition -------------------------------------------------------
# regression definition, for reference:
# the target is a continuous value computed from events in the future window
# here: sum of TARGET_COLUMN over TARGET_EVENT_TABLE within TARGET_WINDOW_DAYS
# the model predicts this future value
# EDIT: target details
TARGET_EVENT_TABLE = "transactions" # event data source where the target is calculated
TARGET_COLUMN = "price" # column with values to aggregate (e.g. spend)
TARGET_WINDOW_DAYS = 60 # future window length (here: days)
# EDIT: number of values predicted per entity (use > 1 for multi-target regression)
num_targets = 1
def target_fn(_history: Events, future: Events, _entity: Attributes, _ctx: Dict) -> np.ndarray | None:
    # filters out users with too short remaining window
    if has_incomplete_training_window(_ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None
    # trims the future to desired window
    future_window = future.interval_from(
        _ctx[SPLIT_TIMESTAMP],
        timedelta(days=TARGET_WINDOW_DAYS),
    )
    # regression target: total spend in the future window
    y = np.sum(future_window[TARGET_EVENT_TABLE][TARGET_COLUMN].events)
    return np.array([y], dtype=np.float32)
# --- Training ----------------------------------------------------------------
# EDIT: metaparams - keep default unless experimenting
learning_rate = 3e-5
epochs = 3 # use 1 for smoke test
# EDIT: limited runs - use for smoke test, then comment out here and below
limit_train_batches = 5
limit_val_batches = 5
# EDIT: parallelised training - comment out to default to a single GPU
strategy = "ddp"
devices = [0, 1] # list GPU indices
# For more options refer to docs: https://docs.basemodel.ai/reference/trainingparams
task = RegressionTask(num_targets=num_targets)
training_params = TrainingParams(
    checkpoint_dir=scenario_model_path,
    learning_rate=learning_rate,
    epochs=epochs,
    devices=devices,
    strategy=strategy,
    limit_train_batches=limit_train_batches,  # smoke test, comment out for full runs
    limit_val_batches=limit_val_batches,  # smoke test, comment out for full runs
)
trainer = load_from_foundation_model(
    checkpoint_path=foundation_model_path,
    downstream_task=task,
    target_fn=target_fn,
)
trainer.fit(training_params=training_params, overwrite=True) # replace with resume=True for resumed training
This script is part of the onboarding package shipped with every BaseModel installation.
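The target logic of the script can be sanity-checked in isolation before launching a training run. The sketch below re-creates it with plain Python stand-ins: events are (timestamp, price) tuples, and window_sum_target, split, and data_end are hypothetical names. It assumes a half-open window, [split, split + window), and that an entity is skipped when less than a full window of data remains; the actual semantics of interval_from and has_incomplete_training_window may differ.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

import numpy as np

Event = Tuple[datetime, float]  # (timestamp, price), a stand-in for real event data


def window_sum_target(
    future: List[Event],
    split: datetime,
    data_end: datetime,
    window: timedelta,
) -> Optional[np.ndarray]:
    """Sum prices in [split, split + window); None if the window is incomplete."""
    window_end = split + window
    if window_end > data_end:
        # Not enough observed future to compute a trustworthy target
        return None
    total = sum(price for ts, price in future if split <= ts < window_end)
    return np.array([total], dtype=np.float32)


split = datetime(2024, 1, 1)
events = [
    (datetime(2024, 1, 10), 20.0),  # inside the 60-day window
    (datetime(2024, 2, 15), 5.5),   # inside the 60-day window
    (datetime(2024, 4, 1), 99.0),   # after the window ends, ignored
]
y = window_sum_target(events, split, datetime(2024, 6, 1), timedelta(days=60))
```

Running the same check with a data_end closer than 60 days to the split returns None, mirroring the early return in target_fn, while an entity with no events in the window gets a target of 0.0 rather than being excluded.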