Training Parameters
Parameters that control the foundation model training process. These are set in the training_params section of the YAML config or passed as a TrainingParams object in Python.
TrainingParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| epochs | int | 1 | Number of epochs to train the model for. |
| learning_rate | float | 0.0001 | Learning rate. |
| check_val_every_n_steps | int \| None | None | Run validation every N training steps. Disables validation on epoch end when set. |
| check_val_every_n_epochs | int \| None | 1 | Run validation every N epochs. |
| limit_train_batches | int \| None | None | Limit number of training batches per epoch. Useful for quick validation of setup. |
| limit_val_batches | int \| None | None | Limit number of validation batches per epoch. |
| loss | Callable \| None | None | Custom loss function. |
| checkpoint_dir | str \| Path \| None | None | Directory to store model checkpoints. |
| metric_to_monitor | str \| None | None | Metric alias to monitor for best-model selection. |
| metric_monitoring_mode | MetricMonitoringMode \| None | None | Whether to minimize ("min") or maximize ("max") the monitored metric. |
| gradient_clip_val | int \| float \| None | None | Value above which gradients are clipped. |
| checkpoint_every_n_steps | int \| None | None | Enable intra-epoch checkpointing at every N steps. |
| early_stopping | EarlyStopping \| None | None | Early stopping configuration. See EarlyStopping below. |
| precision | Literal[...] | "bf16-mixed" / "16-mixed" | Float precision for training. Defaults to "bf16-mixed" on GPUs with bfloat16 support, otherwise "16-mixed". See Precision Values below. |
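To illustrate how check_val_every_n_steps and check_val_every_n_epochs interact, the following minimal sketch computes the training steps after which validation would run. It is not the library's implementation: `validation_steps` and `steps_per_epoch` are hypothetical names, and the sketch assumes that steps are counted globally across epochs and that setting both parameters to None disables validation.

```python
from typing import Optional


def validation_steps(
    epochs: int,
    steps_per_epoch: int,
    check_val_every_n_steps: Optional[int] = None,
    check_val_every_n_epochs: Optional[int] = 1,
) -> list[int]:
    """Return the global training steps (1-based) after which validation runs."""
    total_steps = epochs * steps_per_epoch

    if check_val_every_n_steps is not None:
        # A step-based schedule replaces the end-of-epoch schedule.
        return list(
            range(check_val_every_n_steps, total_steps + 1, check_val_every_n_steps)
        )

    if check_val_every_n_epochs is None:
        # Assumption: with neither interval set, no validation is scheduled.
        return []

    # Epoch-based schedule: validate at the end of every N-th epoch.
    return [
        epoch * steps_per_epoch
        for epoch in range(
            check_val_every_n_epochs, epochs + 1, check_val_every_n_epochs
        )
    ]
```

Under these assumptions, 2 epochs of 100 steps validate after steps 100 and 200 with the defaults, and after steps 80 and 160 with check_val_every_n_steps=80.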
Inherited Parameters
These are inherited from the base parameter class and shared with TestingParams.
| Parameter | Type | Default | Description |
|---|---|---|---|
| devices | list[int] \| int \| "auto" | "auto" | GPU devices to use. "auto" selects the least-occupied GPU automatically, falling back to CPU if none are available. A positive int specifies count; a list[int] specifies device indices; -1 uses all available GPUs. |
| accelerator | "cpu" \| "gpu" | "gpu" | Accelerator type. |
| strategy | str \| None | None | Distributed training strategy: None (PyTorch Lightning default), "auto", "ddp", "fsdp", or "fsdp:A:B" (A = data parallelism, B = tensor parallelism). |
| nccl_timeout | timedelta \| None | None | Timeout for NCCL collective operations in distributed training (DDP/FSDP). A bare number is interpreted as seconds. Ignored on a single device. |
| rank_sync_timeout | timedelta \| None | None | Timeout for a dedicated per-step rank-synchronization barrier that absorbs data-loading skew between ranks, independently of nccl_timeout (which then only needs to cover the gradient sync). A bare number is interpreted as seconds. Ignored on a single device. |
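The strategy string format and the "bare number means seconds" rule for the two timeouts can be sketched as follows. This is an illustration of the documented value formats, not the library's own parsing code: `StrategySpec`, `parse_strategy`, and `as_timeout` are hypothetical helpers, and requiring A and B to be positive integers is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Union


@dataclass(frozen=True)
class StrategySpec:
    """Hypothetical container for a parsed strategy value."""

    name: Optional[str]
    data_parallel: Optional[int] = None
    tensor_parallel: Optional[int] = None


def parse_strategy(value: Optional[str]) -> StrategySpec:
    """Parse None, "auto", "ddp", "fsdp", or "fsdp:A:B"."""
    if value is None:
        return StrategySpec(name=None)
    if value in ("auto", "ddp", "fsdp"):
        return StrategySpec(name=value)

    parts = value.split(":")
    if (
        len(parts) == 3
        and parts[0] == "fsdp"
        and parts[1].isdecimal()
        and parts[2].isdecimal()
    ):
        a, b = int(parts[1]), int(parts[2])
        # Assumption: both parallelism degrees must be at least 1.
        if a > 0 and b > 0:
            return StrategySpec(name="fsdp", data_parallel=a, tensor_parallel=b)

    raise ValueError(f"Unsupported strategy: {value!r}")


def as_timeout(value: Union[timedelta, int, float, None]) -> Optional[timedelta]:
    """Interpret a bare number as seconds; pass timedelta and None through."""
    if value is None or isinstance(value, timedelta):
        return value
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    raise TypeError(f"Unsupported timeout value: {value!r}")
```

In this sketch, "fsdp:4:2" yields a data parallelism of 4 and a tensor parallelism of 2, and a timeout of 90 becomes a 90-second timedelta.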
Note
Parameters such as metrics, top_k, predictions_threshold, entity_ids, callbacks, and approximate_decoding_params are inherited from the base class but not applicable to foundation model training. Custom metrics are explicitly rejected during pretraining. These parameters are available for Scenario Model Training.
EarlyStopping
Wraps the configuration for PyTorch Lightning's early stopping callback.
```python
from monad.config import EarlyStopping

early_stopping = EarlyStopping(
    min_delta=0.001,
    patience=5,
    verbose=True,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| min_delta | float | 0.0 | Minimum change in the monitored metric to qualify as an improvement. |
| patience | int | 3 | Number of validation checks with no improvement after which training stops. |
| verbose | bool | False | Whether to log information about registered improvements. |
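How min_delta and patience combine can be shown with a small stand-alone tracker. This is a simplified sketch, not the PyTorch Lightning callback or the library's wrapper: `EarlyStoppingTracker` is a hypothetical class, the `mode` argument is an assumption standing in for metric_monitoring_mode, and an improvement is assumed to require a change strictly larger than min_delta.

```python
from typing import Optional


class EarlyStoppingTracker:
    """Hypothetical tracker mirroring the min_delta / patience semantics."""

    def __init__(self, min_delta: float = 0.0, patience: int = 3, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
        self.min_delta = abs(min_delta)
        self.patience = patience
        self.mode = mode
        self.best: Optional[float] = None
        self.checks_without_improvement = 0

    def _improved(self, value: float) -> bool:
        # Assumption: the metric must beat the best value by more than min_delta.
        if self.mode == "min":
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta

    def update(self, value: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if self.best is None or self._improved(value):
            self.best = value
            self.checks_without_improvement = 0
        else:
            self.checks_without_improvement += 1
        return self.checks_without_improvement >= self.patience


# Usage: a loss that stalls after the second validation check.
tracker = EarlyStoppingTracker(min_delta=0.001, patience=2)
decisions = [tracker.update(loss) for loss in [1.0, 0.9, 0.8995, 0.8999]]
# decisions == [False, False, False, True]
```

The last two losses are lower than 0.9 but by less than min_delta, so they count as checks without improvement and the tracker signals a stop once patience is exhausted.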
Precision Values
Valid values for the precision parameter:
| Value | Description |
|---|---|
| 32 or "32" or "32-true" | Full 32-bit float precision. |
| 64 or "64" or "64-true" | Full 64-bit float precision. |
| 16 or "16" or "16-true" | Pure 16-bit float precision. |
| "16-mixed" | Mixed precision with float16. |
| "bf16" or "bf16-true" | Pure bfloat16 precision. |
| "bf16-mixed" | Mixed precision with bfloat16. Default on supported GPUs. |
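The alias groupings in the table, together with the hardware-dependent default, can be summarized in a short sketch. It is illustrative only: `normalize_precision` and its `bf16_supported` argument are hypothetical, and the mapping simply follows the rows of the table rather than the library's actual resolution logic.

```python
from typing import Optional, Union

# Short aliases mapped to the canonical spelling, following the table rows.
_ALIASES = {
    "32": "32-true",
    "64": "64-true",
    "16": "16-true",
    "bf16": "bf16-true",
}

_CANONICAL = {
    "32-true",
    "64-true",
    "16-true",
    "16-mixed",
    "bf16-true",
    "bf16-mixed",
}


def normalize_precision(
    value: Union[int, str, None] = None,
    bf16_supported: bool = False,
) -> str:
    """Return the canonical precision string for a user-supplied value."""
    if value is None:
        # Default rule: bfloat16 mixed precision where supported, else float16.
        # With PyTorch, torch.cuda.is_bf16_supported() could supply this flag.
        return "bf16-mixed" if bf16_supported else "16-mixed"

    key = str(value)  # Accept the integer forms 16, 32, and 64.
    key = _ALIASES.get(key, key)
    if key not in _CANONICAL:
        raise ValueError(f"Unsupported precision: {value!r}")
    return key
```

In this sketch, 32, "32", and "32-true" all resolve to "32-true", and omitting the value yields "bf16-mixed" or "16-mixed" depending on the `bf16_supported` flag.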
Tip
"bf16-mixed" is the default on GPUs that support bfloat16 (e.g., A100, H100). On older GPUs, the default falls back to "16-mixed". Mixed precision provides a good balance between training speed and numerical stability.
YAML Example
In the foundation model config file:
```yaml
training_params:
  learning_rate: 0.0003
  epochs: 3
  precision: "bf16-mixed"
  strategy: "ddp"
  devices: [0, 1]
  early_stopping:
    min_delta: 0.001
    patience: 5
```
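Because a misspelled key in training_params is easy to miss, a quick pre-flight check against the parameter names documented on this page can help. The following is a minimal sketch under stated assumptions: `find_unknown_keys` is a hypothetical helper, the `config` dict is a hand-written mirror of the YAML example rather than the output of a real loader, and the library may perform its own validation.

```python
# Parameter names taken from the TrainingParams and Inherited Parameters tables.
TRAINING_PARAM_KEYS = {
    "epochs", "learning_rate", "check_val_every_n_steps",
    "check_val_every_n_epochs", "limit_train_batches", "limit_val_batches",
    "loss", "checkpoint_dir", "metric_to_monitor", "metric_monitoring_mode",
    "gradient_clip_val", "checkpoint_every_n_steps", "early_stopping",
    "precision", "devices", "accelerator", "strategy", "nccl_timeout",
    "rank_sync_timeout",
}
EARLY_STOPPING_KEYS = {"min_delta", "patience", "verbose"}


def find_unknown_keys(training_params: dict) -> list[str]:
    """Return keys that match no documented training parameter."""
    unknown = sorted(set(training_params) - TRAINING_PARAM_KEYS)
    early_stopping = training_params.get("early_stopping") or {}
    unknown += sorted(
        f"early_stopping.{key}"
        for key in set(early_stopping) - EARLY_STOPPING_KEYS
    )
    return unknown


# Hand-written mirror of the training_params section in the YAML example.
config = {
    "learning_rate": 0.0003,
    "epochs": 3,
    "precision": "bf16-mixed",
    "strategy": "ddp",
    "devices": [0, 1],
    "early_stopping": {"min_delta": 0.001, "patience": 5},
}

unknown = find_unknown_keys(config)  # empty list for the example config
```

A non-empty result, such as a list containing a misspelled `learning_rat`, points at keys worth fixing before a training run is launched.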