Hidden Expert Options
The list of options deliberately left out of the official documentation
The following options are not visible in the general public documentation.
Model Training Options
pretrain, fit_behavioral_representation

- uniqueness_threshold : float
  Required. Default: 0.9
  ⚠️ FOR TEST PURPOSES ONLY. Maximum uniqueness ratio to hash a column.
- sketch_depth : int | None
  Optional
  ⚠️ FOR TEST PURPOSES ONLY. When provided, sketches of the given depth will be produced. Otherwise, the default value will be used.
- sketch_width : int | None
  Optional
  ⚠️ FOR TEST PURPOSES ONLY. When provided, sketches of the given width will be produced. Otherwise, the default value will be used.
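As a rough illustration of what uniqueness_threshold controls (the internals are not documented; the helper below is hypothetical), a column would qualify for hashing only when its share of unique values does not exceed the threshold:

```python
def should_hash_column(values, uniqueness_threshold=0.9):
    """Hypothetical sketch: decide whether a column may be hashed.

    The uniqueness ratio is the fraction of distinct values in the
    column; hashing is allowed only when it stays at or below the
    configured maximum (default 0.9).
    """
    ratio = len(set(values)) / len(values)
    return ratio <= uniqueness_threshold

ids = list(range(100))       # 100% unique -> too unique to hash
cities = ["NY", "LA"] * 50   # 2% unique   -> may be hashed
```

This is only a conceptual sketch of the threshold semantics, not the library's actual implementation.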
data_loader_params

- worker_init_fn : Callable[[int], None] | None
  default: None
  Custom initialization function for PyTorch data loader workers. Primarily used in distributed training setups to control how data is partitioned (sharded) across multiple workers. When using multiple workers, the dataset is split into chunks, and each worker processes only its assigned chunk. The worker_init_fn ensures that each worker knows which chunk it is responsible for, enabling parallel, non-overlapping data loading. This setting is automatically managed by the system and Snowflake integration — users should not modify it manually.
data_params

- features_path : str
  default: None
  The path to the folder with features created during foundation model training. Please do not specify it in the YAML file: it should be provided as an argument to the pretrain function or terminal command, and it is then overwritten here. It can no longer be modified at the scenario stage.
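The override behaviour described above can be pictured as a simple merge (a conceptual sketch under assumed names, not the library's actual code): whatever the pretrain function or terminal command receives replaces any features_path present in the loaded configuration.

```python
def resolve_features_path(yaml_config, arg_features_path):
    """Hypothetical sketch: the function/CLI argument always wins.

    Any features_path found in the YAML configuration is overwritten
    by the value supplied at call time.
    """
    config = dict(yaml_config)
    config["features_path"] = arg_features_path
    return config

resolved = resolve_features_path(
    {"features_path": "/from/yaml"},  # ignored if present
    "/path/passed/to/pretrain",
)
```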
query_optimization_params

- sampling_params : SamplingParams, optional
  default: SamplingParams
  Sampling parameters used to change the size of the sample obtained from the dataset. The sample will be used to train proper model features. Changing the default values is not recommended. Keyword arguments are:
  - num_entities : int, optional
    Maximal number of entities to sample. If not provided, the optimal number of samples will be calculated automatically.
  - history_limit : int, optional
    Maximal number of events per entity to sample. If not provided, the optimal number of samples will be calculated automatically.
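How the two limits interact can be sketched as follows (a hypothetical helper; the actual SamplingParams logic is internal to the library): first cap the number of entities, then cap the number of events kept per entity.

```python
def sample_dataset(events_by_entity, num_entities=None, history_limit=None):
    """Hypothetical sketch of the two sampling caps.

    events_by_entity maps entity_id -> list of events, oldest first.
    num_entities caps how many entities are kept; history_limit caps
    how many (most recent) events each kept entity contributes.
    """
    entity_ids = list(events_by_entity)
    if num_entities is not None:
        entity_ids = entity_ids[:num_entities]
    sample = {}
    for eid in entity_ids:
        events = events_by_entity[eid]
        if history_limit is not None:
            events = events[-history_limit:]  # keep the most recent events
        sample[eid] = events
    return sample

data = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
subset = sample_dataset(data, num_entities=2, history_limit=2)
```

Whether the real implementation keeps the most recent or a random subset of events is not documented; the snippet only illustrates the shape of the two caps.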
Diagnostic options
Memory usage during training can be tracked with DiagnosticLogger, which has to be defined before any BaseModel stage: foundation model training, scenario training, or inference. The code snippet below shows how to save memory usage to a CSV file with DiagnosticLogger and how to visualize it on a plot with the visualise_memory_usage function.
from pathlib import Path
from monad.ui import pretrain
from monad.ui.utils import DiagnosticLogger, visualise_memory_usage
DiagnosticLogger(output_path=Path("/path/to/diagnostics/results"), devices=[0])
pretrain(
config_path=Path("/path/to/configuration/file"),
output_path=Path("/path/to/output/file"),
)
visualise_memory_usage(Path("/path/to/diagnostics/results"))
As a result, two files will be saved in the /path/to/diagnostics/results/.DIAGNOSTICS directory: memory_usage.csv with the memory usage values and memory_usage_plot.png with the memory usage plot.
Updated 7 months ago