class monad.ui.config.TrainingParams
Defines model training setup.
from monad.ui.config import TrainingParams

training_params = TrainingParams(
    checkpoint_dir='/location/to/save/your/scenario/model',
    epochs=1,
    learning_rate=0.0001,
    overwrite=True,
    devices=[2],
    limit_train_batches=10
)

| Parameters |
|---|
epochs: int
Default: 1
Number of epochs to train the model for.
learning_rate: float
Default: 0.0003
The learning rate.
devices: list[int] | int
Default: 1
The devices to use. A positive integer defines how many devices to use, a list of integers specifies the indices of the devices to use, and the value -1 indicates that all available devices should be used.
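For illustration, a minimal sketch of the three accepted forms (the device counts and indices here are arbitrary):

from monad.ui.config import TrainingParams

TrainingParams(devices=2)        # use any 2 available devices
TrainingParams(devices=[0, 3])   # use the devices with indices 0 and 3
TrainingParams(devices=-1)       # use all available devices
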
accelerator: Literal["cpu", "gpu"]
Default: "gpu"
The accelerator to use: GPU or CPU.
strategy: str | None
Default: None
Strategy for distributed training. Supported strategies are:
- None: PyTorch Lightning's default strategy,
- "ddp": Distributed Data Parallel,
- "fsdp": Fully Sharded Data Parallel 2 with full tensor parallelism,
- "fsdp:%d:%d": Fully Sharded Data Parallel 2, where the first int defines the data parallelism (replication) and the second int defines the tensor parallelism (sharding); see the sketch below.
precision: Literal[64, 32, 16, "64", "32", "16", "bf16", "16-true", "16-mixed", "bf16-true", "bf16-mixed", "32-true", "64-true"]
Default: DEFAULT_PRECISION
Controls the float precision used for training: double precision (64, "64" or "64-true"), full precision (32, "32" or "32-true"), 16-bit mixed precision (16, "16" or "16-mixed"), or bfloat16 mixed precision ("bf16" or "bf16-mixed"). The DEFAULT_PRECISION constant sets precision to "bf16-mixed" if CUDA is available, else "16-mixed".
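The default-selection rule described above can be sketched as follows (a paraphrase of the documented behavior, not the library's actual source):

import torch

# Sketch of the documented DEFAULT_PRECISION rule.
DEFAULT_PRECISION = "bf16-mixed" if torch.cuda.is_available() else "16-mixed"
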
limit_train_batches: int | float | None
Default: None
Limits the number of train batches per epoch (float = fraction, int = number of batches). Useful e.g. for speeding up testing; a combined sketch follows limit_val_batches below.
limit_val_batches: int | float | None
Default: None
Limits the number of validation batches per epoch (float = fraction, int = number of batches). Useful e.g. for speeding up testing.
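A combined sketch of both limits; the values are arbitrary illustration values:

from monad.ui.config import TrainingParams

TrainingParams(
    limit_train_batches=0.1,  # float: use 10% of the train batches per epoch
    limit_val_batches=5,      # int: use 5 validation batches per epoch
)
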
gradient_clip_val: int | float | None
Default: None
Gradient clipping value (above which gradients are clipped), passed to the PyTorch Lightning trainer.
checkpoint_every_n_steps: int | None
Default: None
If set, intra-epoch checkpointing is performed every n training steps.
early_stopping: EarlyStopping
Default: None
If provided, adds an early stopping callback to the training. If there is no improvement in the model's performance after subsequent validations, training ends before the defined number of epochs.
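A hedged sketch, assuming EarlyStopping here refers to PyTorch Lightning's lightning.pytorch.callbacks.EarlyStopping; if monad ships its own EarlyStopping type, substitute that import:

from lightning.pytorch.callbacks import EarlyStopping  # assumed source of the type

from monad.ui.config import TrainingParams

TrainingParams(
    epochs=50,
    # Stop if "val_loss" has not improved for 3 consecutive validations
    # (metric name and patience are illustration values).
    early_stopping=EarlyStopping(monitor="val_loss", mode="min", patience=3),
)
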
entity_ids: EntityIds
Default: None
Restricts the set of entity IDs used during training or testing.
| Parameters Exclusive to Scenario Model |
|---|
loss: Callable | None
Default: None
The loss function to use. If not provided, the default loss function for the task will be used.
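A minimal sketch, assuming any callable matching the task's inputs is accepted; the specific loss here is an arbitrary choice:

import torch.nn.functional as F

from monad.ui.config import TrainingParams

TrainingParams(loss=F.cross_entropy)
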
metrics: list[MetricParams]
Default: None
Metrics to use in validation. If not provided, the default validation metrics for the task will be used.
checkpoint_dir: str | pathlib.Path | None
Default: None
If provided, points to the location where model checkpoints will be stored.
metric_to_monitor: str | None
Default: None
Determines which metric should be used to select the best model for saving.
metric_monitoring_mode: MetricMonitoringMode | None
Default: None
Indicates whether a smaller or greater value of the selected metric is better.
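A hedged sketch of best-checkpoint selection. The metric name and the MetricMonitoringMode member below are assumptions for illustration, not confirmed by this reference; check the actual enum definition:

from monad.ui.config import MetricMonitoringMode, TrainingParams  # import path assumed

TrainingParams(
    checkpoint_dir='/location/to/save/your/scenario/model',
    metric_to_monitor="val_loss",                     # assumed metric name
    metric_monitoring_mode=MetricMonitoringMode.MIN,  # "smaller is better"; member name assumed
)
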
callbacks: Optional[list[lightning.pytorch.callbacks.Callback]]
Default: empty list
List of additional PyTorch Lightning callbacks to add to the training.
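For example, adding a standard PyTorch Lightning callback (the specific callback is an arbitrary choice):

from lightning.pytorch.callbacks import LearningRateMonitor

from monad.ui.config import TrainingParams

TrainingParams(callbacks=[LearningRateMonitor(logging_interval="step")])
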
top_k: int | None
Default: None
Only valid at the downstream model stage, for a recommendation task. Number of targets to recommend. The top k targets are included in validation metrics; this does not affect model training. If set to None, 12 will be used.
