`monad.ui.pretrain`

monad.ui.pretrain(config_path, output_path, use_last_basket_sketches=True, recency_sketch_timespan_days=RECENCY_SKETCH_TIMESPAN_DAYS_DEFAULT, storage_config_path=None, nan_threshold=0.9, callbacks=None, pl_logger=None, resume=False, overwrite=False, seed=None)

Validates the configuration, then runs both training stages automatically: it first fits the behavioral representation, then trains the foundation model using the output of the fitting step.

from monad.ui import pretrain
from pathlib import Path

pretrain(
    config_path=Path("path/to/config.yaml"),
    output_path=Path("path/to/store/pretrain/artifacts")
)

Parameters

config_path : pathlib.Path
Required
Path to YAML configuration file.

output_path : pathlib.Path
Required
Path to store training results.

resume : bool
Default: False
If True, training resumes from the last checkpoint if one exists; otherwise, an error is raised.

overwrite : bool
Default: False
If True, any previous training results are overwritten. Otherwise, if resume is not set and checkpoints from a previous training run are present, an error is raised.

❗️
Note
The parameters resume and overwrite cannot both be set to True. Doing so will raise an error.
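The rules described above for resume, overwrite, and existing checkpoints can be summarized as a small decision function. This is only an illustrative sketch: the helper name resolve_start_mode, its checkpoint_exists argument, the returned labels, and the exception types are all assumptions made for the example and are not part of the monad API.

```python
def resolve_start_mode(resume: bool, overwrite: bool, checkpoint_exists: bool) -> str:
    """Hypothetical helper mirroring the documented resume/overwrite rules.

    Returns one of "resume", "overwrite", or "fresh". The exception types
    are assumptions; the documentation only states that an error is raised.
    """
    # Both flags set at once is documented as an error.
    if resume and overwrite:
        raise ValueError("resume and overwrite cannot both be True")

    if resume:
        # Resuming requires an existing checkpoint.
        if not checkpoint_exists:
            raise FileNotFoundError("resume=True but no checkpoint was found")
        return "resume"

    if overwrite:
        # Previous results, if any, are replaced.
        return "overwrite"

    # Neither flag set: existing checkpoints block a new run.
    if checkpoint_exists:
        raise FileExistsError(
            "checkpoints from a previous run exist; set resume or overwrite"
        )
    return "fresh"
```

In short, a rerun into an output_path that already holds checkpoints needs exactly one of the two flags to be set.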

callbacks : Optional[list[pytorch_lightning.callbacks.Callback]]
Default: None
List of additional PyTorch Lightning callbacks to add to training.

pl_logger : Optional[pytorch_lightning.loggers.Logger]
Default: None
A logger compatible with PyTorch Lightning, used to record metrics and training progress.

use_last_basket_sketches : bool
Default: True
Whether to include a sketch of the most recent events as an additional input.

recency_sketch_timespan_days : Optional[int]
Default: RECENCY_SKETCH_TIMESPAN_DAYS_DEFAULT
If set, defines the window in days for recency-based sketches. Recency sketches store information about how far in the past the interactions took place. If not set, recency sketches won't be used.

nan_threshold : float
Default: 0.9
Maximum fraction of missing values a column may contain and still be processed.
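To make the threshold concrete, the following sketch computes the fraction of missing values in a single column and compares it against nan_threshold. It is not monad's implementation: the helper names missing_fraction and within_nan_threshold are hypothetical, treating None as missing is an assumption, and so is the inclusive comparison at the boundary.

```python
import math
from typing import Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are None or NaN (hypothetical helper)."""
    if not values:
        return 0.0
    missing = sum(
        1
        for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def within_nan_threshold(
    values: Sequence[Optional[float]], nan_threshold: float = 0.9
) -> bool:
    """True if the column's missing fraction does not exceed the threshold.

    Assumption: the boundary is inclusive, i.e. a column with exactly
    nan_threshold missing values is still accepted.
    """
    return missing_fraction(values) <= nan_threshold


# One value present and three missing gives a missing fraction of 0.75,
# which is below the default threshold of 0.9.
column = [1.0, float("nan"), float("nan"), float("nan")]
accepted = within_nan_threshold(column)
```

With the default of 0.9, only columns that are almost entirely empty would fall outside the limit under this reading.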

seed : Optional[int]
Default: None
Seed for training. When provided, it ensures reproducibility of the results.

Returns

Saves results under output_path.