API Reference

pretrain

monad.ui.pretrain

monad.ui.pretrain(config_path, output_path, use_last_basket_sketches=True, recency_sketch_timespan_days=RECENCY_SKETCH_TIMESPAN_DAYS_DEFAULT, storage_config_path=None, nan_threshold=0.9, callbacks=None, pl_logger=None, resume=False, overwrite=False, seed=None)

Validates the configuration, then automatically runs both training stages: first it fits the behavioral representation, then it trains the foundation model using the output of the fitting step.

from monad.ui import pretrain
from pathlib import Path

pretrain(
    config_path=Path("path/to/config.yaml"), 
    output_path=Path("path/to/store/pretrain/artifacts")
)
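The two-stage flow described above can be pictured with the following sketch. It is not the `monad` implementation. `validate_config`, `fit_representation`, `train_foundation_model`, `pretrain_sketch`, and the `representation`/`model` artifact names are all hypothetical stand-ins. They only show the documented ordering: validation comes first, and the second stage consumes what the first stage produced.

```python
from pathlib import Path


# Hypothetical stand-ins for the internal stages of pretrain; the real
# monad functions, their signatures, and their artifact layout are not documented here.
def validate_config(config_path: Path) -> dict:
    # Placeholder for validating the YAML configuration.
    return {"config": str(config_path)}


def fit_representation(config: dict, output_path: Path) -> Path:
    # Stage 1: fit the behavioral representation and return where it was stored.
    return output_path / "representation"


def train_foundation_model(config: dict, representation: Path, output_path: Path) -> Path:
    # Stage 2: train the foundation model on the output of stage 1.
    return output_path / "model"


def pretrain_sketch(config_path: Path, output_path: Path) -> list[str]:
    """Run the stages in the documented order and record each step."""
    steps = []
    config = validate_config(config_path)
    steps.append("validated")
    representation = fit_representation(config, output_path)
    steps.append(f"fitted -> {representation.name}")
    model = train_foundation_model(config, representation, output_path)
    steps.append(f"trained on {representation.name} -> {model.name}")
    return steps
```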

Parameters

config_path : pathlib.Path
Path to the YAML configuration file.


output_path : pathlib.Path
Path to the directory where training results are stored.


storage_config_path : Optional[pathlib.Path]
Default: None
Path to the file system (storage) configuration.


resume : bool
Default: False
If True, training resumes from the last checkpoint if one exists; otherwise, an error is raised.


overwrite : bool
Default: False
If True, any previous training results are overwritten. If False and resume is not set, an error is raised when checkpoints from a previous training run are present.


Note

The parameters resume and overwrite cannot both be set to True. Doing so will raise an error.
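The resume and overwrite rules above combine into a small decision table. The sketch below encodes that table as a hypothetical helper, `resolve_run_mode`, which is not part of `monad`. It uses `ValueError` as a stand-in because the exception type `pretrain` actually raises is not documented here.

```python
def resolve_run_mode(resume: bool, overwrite: bool, checkpoints_exist: bool) -> str:
    """Return "resume", "overwrite", or "fresh" following the documented rules.

    Hypothetical helper mirroring the described behavior of pretrain's
    resume and overwrite parameters; ValueError is an assumed error type.
    """
    if resume and overwrite:
        # The two flags are mutually exclusive.
        raise ValueError("resume and overwrite cannot both be True")
    if resume:
        if not checkpoints_exist:
            raise ValueError("resume=True but no checkpoint was found")
        return "resume"
    if overwrite:
        # Any previous training results are discarded.
        return "overwrite"
    if checkpoints_exist:
        # Neither flag set, but earlier checkpoints are present.
        raise ValueError("checkpoints exist; set resume=True or overwrite=True")
    return "fresh"
```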


callbacks : Optional[list[pytorch_lightning.callbacks.Callback]]
Default: None
List of additional PyTorch Lightning callbacks to use during training.


pl_logger : Optional[pytorch_lightning.loggers.Logger]
Default: None
A logger compatible with PyTorch Lightning, used to record metrics and training progress.


use_last_basket_sketches : bool
Default: True
Whether to include a sketch of the most recent events as an additional input.


recency_sketch_timespan_days : Optional[int]
Default: RECENCY_SKETCH_TIMESPAN_DAYS_DEFAULT
If set, defines the window in days for recency-based sketches. Recency sketches store information about how far in the past the interactions took place.
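To make "how far in the past" concrete, the toy function below computes each interaction's age in days relative to a reference date and keeps only interactions inside a window of `timespan_days`. It is not the recency sketch `monad` builds. The function name, the inclusive window boundary, and the choice to ignore future-dated events are assumptions made for illustration.

```python
from datetime import date


def days_ago_within_window(reference: date, event_dates: list[date], timespan_days: int) -> list[int]:
    """Toy illustration: age in days of each event inside a timespan_days window.

    Hypothetical helper; it shows the kind of "how long ago" signal the
    window bounds, not monad's sketch data structure.
    """
    ages = []
    for d in event_dates:
        age = (reference - d).days
        # Keep events from the reference day back to timespan_days ago
        # (inclusive, by assumption); future-dated events are skipped.
        if 0 <= age <= timespan_days:
            ages.append(age)
    return ages
```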


nan_threshold : float
Default: 0.9
Maximum fraction of missing values a column may contain for it to be processed.
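The threshold rule can be illustrated with a small helper that keeps only columns whose missing-value fraction does not exceed `nan_threshold`. `columns_to_process` is hypothetical, not a `monad` function. Several details are assumptions: treating both `None` and float NaN as missing, keeping a column whose fraction equals the threshold exactly, and counting an empty column as fully missing.

```python
import math
from typing import Optional


def columns_to_process(columns: dict[str, list[Optional[float]]], nan_threshold: float = 0.9) -> list[str]:
    """Return names of columns whose missing fraction is at most nan_threshold.

    Hypothetical helper illustrating the nan_threshold rule; boundary and
    missing-value handling are assumptions.
    """
    kept = []
    for name, values in columns.items():
        # Count None and float NaN as missing.
        missing = sum(1 for v in values if v is None or math.isnan(v))
        # Treat an empty column as entirely missing.
        fraction = missing / len(values) if values else 1.0
        if fraction <= nan_threshold:
            kept.append(name)
    return kept
```

With the default of 0.9, a column that is 90% empty is still processed under this reading, while a column with no observed values is not.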


seed : Optional[int]
Default: None
Random seed for training. When provided, it ensures reproducibility of the results.


Returns

Saves results under output_path.