Training the Model
Usage of pretrain function
Note
This article refers to BaseModel accessed via a Docker container. If you are using BaseModel as a Snowflake GUI application, please refer to the Snowflake Native App section.
Once you have successfully configured your data and model parameters in the YAML file, it is time to train your foundation model! With BaseModel deployed as a Docker container, you can do that in two ways:
- run the pretrain Python function, or
- run the monad.pretrain command in your terminal.
Both ways are explained in more detail below.
Start the training using the Python function
The most basic syntax for launching the training from a Python environment is shown in the example below:
from monad.ui import pretrain
from pathlib import Path

pretrain(
    config_path=Path("path/to/config.yaml"),
    output_path=Path("path/to/store/pretrain/artifacts")
)
The pretrain function accepts various additional arguments to manage the training process. Below is the full list of arguments accepted at this stage.
Good to Know
Some of these arguments directly correspond to YAML parameters. If specified here, they will override the entries in the configuration file.
Parameters |
---|
- config_path : str
Required. No default.
The path to the YAML configuration file.
- output_path : str
Required. No default.
The path to the folder where you intend to store the results.
- storage_config_path : str
Optional. Default: None
The option to configure the file system.
- resume : boolean
Optional. Default: False
Whether to resume the training. If True, training will be resumed from the last checkpoint if one exists; otherwise, an error will be raised.
- overwrite : boolean
Optional. Default: False
Whether to overwrite the previous training results. If True, results will be overwritten. Otherwise, if resume is not set and checkpoints from a previous training are present, an error will be raised.
- callbacks : list[Callback]
Optional. Default: Lightning factory default
List of additional PyTorch Lightning callbacks to add to training.
- pl_logger : str
Optional. Default: None
PyTorch Lightning logger to use.
- uniqueness_threshold : float
Optional. Default: 0.9
DO NOT USE - TEST OPTION ONLY. Maximum uniqueness ratio to hash a column.
- nan_threshold : float
Optional. Default: 0.9
Maximum fraction of missing values allowed in a column to process.
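The interplay between resume and overwrite described above can be summarized as a small decision function. This is only a sketch of the documented rules, not BaseModel's own implementation: the name resolve_run_mode and its return labels are hypothetical, and because the case where both flags are set is not described here, the sketch simply rejects it as an assumption.

```python
def resolve_run_mode(resume: bool, overwrite: bool, checkpoints_exist: bool) -> str:
    """Hypothetical helper mirroring the documented resume/overwrite rules.

    Returns one of "resume", "overwrite" or "fresh", or raises an error
    in the situations where the documentation says an error is raised.
    """
    if resume and overwrite:
        # Assumption: this combination is not documented, so the sketch refuses it.
        raise ValueError("resume and overwrite together are not covered by this sketch")
    if resume:
        if not checkpoints_exist:
            # Documented: resuming without an existing checkpoint is an error.
            raise FileNotFoundError("resume=True but no checkpoint was found")
        return "resume"
    if overwrite:
        # Documented: previous results are overwritten.
        return "overwrite"
    if checkpoints_exist:
        # Documented: neither flag set and old checkpoints present is an error.
        raise FileExistsError("previous checkpoints found; set resume or overwrite")
    return "fresh"
```

Reading the function top to bottom gives a quick way to predict whether a given call to pretrain will start, continue, or fail before any training time is spent.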
Initiate training from the command line
An alternative way is to use your terminal; in this case, use the syntax below:
python -m monad.pretrain \
    --config "path/to/config.yml" \
    --features-path "path/to/store/pretrain/artifacts" \
    --overwrite
As before, you can add arguments to manage the training process, but fewer options are available:
Parameters |
---|
- --config : str
Required.
The path to the YAML configuration file. Equivalent to config_path in Python.
- --features-path : str
Required.
The path to the folder where you intend to store the results. Equivalent to output_path in Python.
- --storage-config : str
Optional.
The option to configure the file system.
- --resume
Optional.
If provided, training will be resumed from the last checkpoint if one exists; otherwise, an error will be raised.
- --overwrite
Optional.
If provided, previous training results will be overwritten. Otherwise, if resume is not set and checkpoints from a previous training are present, an error will be raised.
Good to Know
Some of these arguments directly correspond to YAML parameters (some names are slightly different). If specified here, they will override the entries in the configuration file.
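When the terminal command has to be launched from a script, it can help to assemble the argument list programmatically. The sketch below uses only the flags listed above; the helper name build_pretrain_command and its parameter names are hypothetical, and it assumes the interpreter is reachable as python, as in the example command.

```python
from typing import List, Optional


def build_pretrain_command(
    config: str,
    features_path: str,
    storage_config: Optional[str] = None,
    resume: bool = False,
    overwrite: bool = False,
    python: str = "python",  # assumption: interpreter name used in the container
) -> List[str]:
    """Hypothetical helper: assemble the documented monad.pretrain command line."""
    command = [
        python, "-m", "monad.pretrain",
        "--config", str(config),
        "--features-path", str(features_path),
    ]
    if storage_config is not None:
        command += ["--storage-config", str(storage_config)]
    if resume:
        command.append("--resume")      # flag without a value
    if overwrite:
        command.append("--overwrite")   # flag without a value
    return command
```

The returned list can be handed to subprocess.run(command, check=True) from the standard library, which avoids shell quoting issues with paths that contain spaces.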
End of Foundation Model training
Training is complete once the console output states that the model checkpoints have been saved and a _FINISHED folder containing the best model has been created under the output_path.
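For automated pipelines, the completion condition can also be checked from code rather than by watching the console. The following is a minimal sketch, assuming the marker folder is named exactly _FINISHED and sits directly under the output path; the helper name training_finished is hypothetical and not part of BaseModel.

```python
from pathlib import Path
from typing import Union


def training_finished(output_path: Union[str, Path]) -> bool:
    """Hypothetical helper: True if the _FINISHED folder exists under output_path."""
    # Assumption: the folder is located directly under output_path.
    return (Path(output_path) / "_FINISHED").is_dir()
```

A downstream step could call training_finished on the same folder that was passed as output_path and proceed only when it returns True.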