Fine-tuning the Training Process

How do you configure the loading and training process?

In this module, we configure the model and training parameters of the downstream task and write the target function that defines the model's target.

By default, the training parameters are taken from the Pretrain config. This means that:

  1. Data sources are loaded automatically.
  2. Dates for the training, validation, and test sets are taken from the Pretrain config.
  3. All model training parameters for the downstream task are taken from the Foundation Model.

This is done for simplicity and to avoid replicating configuration unnecessarily. However, the parameters for your downstream task may differ from the ones used for the Foundation Model, so we allow you to change them.

The following sections walk through both scenarios.

Loading Foundation Model

To load a Foundation Model trained in the Pretrain phase, use the load_from_foundation_model method.

Example for a churn model:

from monad.ui.module import load_from_foundation_model
from monad.ui.module import BinaryClassificationTask

trainer = load_from_foundation_model(
    checkpoint_path="<path/to/store/pretrain/artifacts>",
    downstream_task=BinaryClassificationTask(),
    target_fn=churn_target_fn,
    num_outputs=1,
)
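
Here, churn_target_fn is user code. As orientation only, a minimal sketch is shown below; treating Events as a sized sequence of event records is an assumption about the API, so see the Model Target Function section for the authoritative contract.

import numpy as np

def churn_target_fn(input_events, target_events, attributes, ctx):
    # Hypothetical churn definition: label the user as churned (1.0)
    # when no events occur in the target window. Treating Events as a
    # sized sequence is an assumption; adapt to your actual schema.
    churned = len(target_events) == 0
    return np.array([float(churned)], dtype=np.float32)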

Example for a propensity model:

from monad.ui.module import load_from_foundation_model
from monad.ui.module import MultilabelClassificationTask

trainer = load_from_foundation_model(
    checkpoint_path="<path/to/store/pretrain/artifacts>",
    downstream_task=MultilabelClassificationTask(),
    target_fn=propensity_target_fn,
    num_targets=10,
)

Example for a recommendation model:

from monad.ui.module import load_from_foundation_model
from monad.ui.module import RecommendationTask

trainer = load_from_foundation_model(
    checkpoint_path="<path/to/store/pretrain/artifacts>",
    downstream_task=RecommendationTask(),
    target_fn=recommendation_fn,
)

load_from_foundation_model creates a MonadModuleImpl from a MonadCheckpoint; the model saved under checkpoint_path is assumed to be a Foundation Model.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| checkpoint_path | str | Directory where all the checkpoint artifacts are stored. | required |
| downstream_task | Task | One of the machine learning tasks defined in BaseModel. Possible values: RecommendationTask(), BinaryClassificationTask(), MultilabelClassificationTask(), MulticlassClassificationTask(). | required |
| target_fn | Callable[[Events, Events, Attributes, Dict], Union[Tensor, ndarray, Sketch]] | Target function for the specified task. | required |
| pl_logger | Optional[Logger] | Instance of a PyTorch Lightning logger. | None |
| loading_config | Optional[LoadingConfigParams] | A dictionary mapping a datasource name (or a datasource name and mode) to fields of DataSourceLoadingConfig. If provided, the listed parameters are overwritten. The datasource_cfg field cannot be changed. | None |

Additionally, you can pass any of the parameters defined in MonadDataParams to overwrite the values configured for Foundation Model training (an example follows the table):

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| features_path | str | Path to the folder with the features created during the pretrain phase. | required |
| data_start_date | datetime | Only events after this date are considered for training. | required |
| check_target_for_next_N_days | int | The number of days used to create the model's target. Not applicable to recommendation models. | None |
| validation_start_date | datetime | Start date for the validation set. | None |
| test_start_date | datetime | The date that the prediction is calculated for. Either validation_start_date or test_start_date must be provided. | None |
| test_end_date | datetime | End date of the test period, i.e. the prediction end date. | None |
| timebased_encoding | str | How to encode time-based features; available options are "fourier" and "two-hot". | 'two-hot' |
| target_sampling_strategy | str | "valid" or "random" sampling strategy. For the Foundation Model it should always be "random". | 'random' |
| maximum_splitpoints_per_entity | int | The maximum number of splits into input and target events per entity. | 1 |
| num_query_chunks | int | The number of segments a query is divided into. Splitting the query into smaller pieces can help reduce memory consumption on the database end. | 1 |
| use_recency_sketches | bool | If true, recency sketches are used in training. | True |
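
For example, a churn configuration could overwrite the dates and the target window as follows; the concrete dates and the 30-day window are illustrative values, not defaults:

from datetime import datetime

from monad.ui.module import load_from_foundation_model
from monad.ui.module import BinaryClassificationTask

trainer = load_from_foundation_model(
    checkpoint_path="<path/to/store/pretrain/artifacts>",
    downstream_task=BinaryClassificationTask(),
    target_fn=churn_target_fn,
    num_outputs=1,
    # MonadDataParams overrides; the values below are illustrative.
    data_start_date=datetime(2023, 1, 1),
    validation_start_date=datetime(2023, 6, 1),
    test_start_date=datetime(2023, 7, 1),
    check_target_for_next_N_days=30,
)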

Training the downstream task

Once you have loaded the Foundation Model, specify checkpoint_dir in MonadTrainingParams and overwrite any default parameters you need; a sketch follows the parameter table below.

Most of these parameters mirror the constructor parameters of the PyTorch Lightning Trainer.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| epochs | int | Number of epochs to train. | 1 |
| learning_rate | float | The learning rate. | 0.001 |
| devices | Union[List[int], str, int, None] | The devices to use. Can be a positive number (int or str), a sequence of device indices (list or str), -1 to use all available devices, or "auto" for automatic selection based on the chosen accelerator. | [0] |
| accelerator | Literal['cpu', 'gpu'] | The accelerator to use, as in the PyTorch Lightning Trainer. | 'gpu' |
| precision | Literal[64, 32, 16, '64', '32', '16', 'bf16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true'] | Double precision (64, '64' or '64-true'), full precision (32, '32' or '32-true'), 16-bit mixed precision (16, '16', '16-mixed') or bfloat16 mixed precision ('bf16', 'bf16-mixed'). | DEFAULT_PRECISION |
| limit_train_batches | Optional[Union[int, float]] | How much of the training dataset to check (float = fraction, int = number of batches). | 1.0 |
| limit_val_batches | Optional[Union[int, float]] | How much of the validation dataset to check (float = fraction, int = number of batches). | 1.0 |
| loss | Optional[Callable] | The loss function to use. If not provided, the default loss function for the task is used. | None |
| metrics | Optional[Dict[str, Metric]] | Metrics to use in validation. If not provided, the default validation metrics for the task are used. | None |
| checkpoint_dir | Optional[Union[str, Path]] | If provided, the location where checkpoints are stored. | None |
| metric_to_monitor | Optional[str] | The metric that determines model quality when saving the state dict. | None |
| metric_monitoring_mode | Optional[MetricMonitoringMode] | Whether a smaller or a greater metric value is better. Possible values are "min" and "max". | None |
| callbacks | List[Callback] | List of additional callbacks to add to training. | list() |
| gradient_clip_val | Optional[Union[int, float]] | Gradient clipping value passed to the PyTorch Lightning Trainer. | None |
| warm_start_steps | int | Number of warm-start training steps used while fine-tuning a supervised model. Ignored if no pretrained model is used. | 0 |
| top_k | int | Recommendation task only. Number of targets to recommend; the top k targets are included in validation metrics. Does not affect model training. | 12 |
| targets_to_include | List[str] | Recommendation task only. Target names used for validation metrics. Does not affect model training. | None |
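
A minimal sketch of overriding these defaults follows. The import path of MonadTrainingParams and the trainer.fit(...) call are assumptions made for illustration; consult the API reference for the exact invocation.

from monad.ui.module import MonadTrainingParams  # import path is an assumption

training_params = MonadTrainingParams(
    checkpoint_dir="<path/to/store/downstream/checkpoints>",
    epochs=5,            # illustrative override of the default (1)
    learning_rate=1e-4,  # illustrative override of the default (0.001)
    accelerator="gpu",
    devices=[0],
)

# Hypothetical call: launching fine-tuning on the loaded trainer.
trainer.fit(training_params)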

Task-specific training

BaseModelAI supports a broad range of ML tasks:

Classification Tasks

  • BinaryClassificationTask: each data point belongs to one of two categories, e.g. churn prediction. For the binary classification task we use the binary cross-entropy with logits loss function.
  • MulticlassClassificationTask: each data point belongs to one of multiple categories (3 or more), e.g. predicting the user's favorite brand. For multiclass classification we use the cross-entropy loss function.
  • MultilabelClassificationTask: each data point can be assigned zero or more labels, e.g. predicting the probability that the user will buy a product in each of several predefined categories. For the multilabel classification task we use the binary cross-entropy with logits loss function.
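
The classification losses named above correspond to standard PyTorch criteria; the library selects the appropriate one automatically unless you pass a custom loss (see the loss parameter in the training parameters table). The snippet below is purely illustrative:

import torch

bce_logits = torch.nn.BCEWithLogitsLoss()    # binary and multilabel classification
cross_entropy = torch.nn.CrossEntropyLoss()  # multiclass classification

# e.g. churn: raw logits and binary labels for a batch of 4 examples
logits = torch.randn(4, 1)
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
print(bce_logits(logits, labels))

# e.g. favorite brand: logits over 5 classes and integer class labels
class_logits = torch.randn(4, 5)
class_labels = torch.tensor([0, 3, 2, 4])
print(cross_entropy(class_logits, class_labels))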

Recommendation Tasks

  • RecommendationTask: for each data example, a list of n items is returned, e.g. personalized recommendation of offers or products.

Regression Tasks

  • RegressionTask: predict continuous values based on the provided input, e.g. predicting how much the user will spend in the next year. For the regression task we use the cross-entropy loss function.

Data

Data split

For training purposes, BaseModelAI can split the data into train, validation, and test sets. The split is based on the event time windows configured for each phase. The data_start_date is the dataset's starting point: only events after this date are considered.

To create the train, validation, and test sets, we use the subsequent dates validation_start_date and test_start_date to define targets. At each stage we consider user history from data_start_date onward; what differs are the split points, which determine which portion of events is used as input and which is used to create the model's target.

In the configuration of the downstream task, you must set at least one of validation_start_date or test_start_date.

🚧

Important:

If you set test_start_date but not validation_start_date, you cannot train the downstream task; you can only run prediction with an already trained downstream model.

Classification and regression

For training purposes, we consider user history from data_start_date until training_max_split_date, which is calculated automatically from validation_start_date:

training_max_split_date = validation_start_date - 1 day - check_target_for_next_N_days

We randomly select split points between data_start_date and training_max_split_date. Thanks to that, even when the latest possible split point is chosen (the one on training_max_split_date), the target is built from the next check_target_for_next_N_days, which are NOT included in the validation period, so no data leakage occurs. Multiple split points can be selected for one entity; their number is governed by the maximum_splitpoints_per_entity attribute. This lets us create multiple data points from a single user history.
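
A worked example of the split-date arithmetic, using hypothetical dates:

from datetime import datetime, timedelta

validation_start_date = datetime(2023, 6, 1)
check_target_for_next_N_days = 14

# Latest allowed split point for training examples:
training_max_split_date = validation_start_date - timedelta(
    days=1 + check_target_for_next_N_days
)
print(training_max_split_date.date())  # 2023-05-17

# A split point on 2023-05-17 yields a target window of
# 2023-05-18 .. 2023-05-31, which ends before validation starts.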

Analogously, for testing, we use the history from data_start_date to test_start_date as input, and the subsequent check_target_for_next_N_days to create the target.

For validation purposes, to create the model input we use the whole user history from data_start_date to validation_start_date. To create the target, we use the check_target_for_next_N_days after validation_start_date. This way we use all available history to make predictions, while preventing data leakage by predicting events that were not seen in the training phase.

Recommendations

In the recommendation setup we predict the next item in the basket, so the check_target_for_next_N_days parameter is not set; it is not relevant here.
For recommendations, training_max_split_date defaults to validation_start_date.
If you want to set aside a test dataset, you must define test_start_date; otherwise, the whole period is used.

Target function

Please refer to the Model Target Function section for details on how to create a target function that fulfills your business needs.