Testing scenario model

⚠️
Check This First!
This article refers to BaseModel accessed via Docker container. Please refer to Snowflake Native App section if you are using BaseModel as SF GUI application.

Once you have trained a downstream model, you will likely want to test its performance. To do this, you should prepare and execute a Python script with the following steps:

Import Required Packages, Classes and Functions: There are two required BaseModel.AI imports, but you may need additional ones if you want to use custom metrics, loggers, manipulate dates, etc.
Instantiate the Testing Module: Use the load_from_checkpoint method to load the best model according to the specified metric defined during training.
Define Testing Parameters: Use the TestingParams class to define testing parameters. You can override the parameters configured during the training of the downstream model.
Run evaluation: The test() method of the testing module will generate and save predictions.

Please see an example end-to-end script below:

from monad.config import DataTimeSplit, TimeRange
from monad.ui.config import OutputType, TestingParams, DataMode
from monad.ui.module import load_from_checkpoint
from datetime import datetime

# declare variables
checkpoint_path = "<path/to/downstream/model/checkpoints>" # location of scenario model checkpoints
save_path = "<path/to/predictions/predictions_and_ground_truth.tsv>" # location to store evaluation results
test_start_date = datetime(2023, 8, 1) # first day of test period
test_end_date = datetime(2023, 8, 22) # last day of test period

# load scenario model to instantiate testing module
testing_module = load_from_checkpoint(
    checkpoint_path = checkpoint_path,
    split={DataMode.TEST: TimeRange(start_date=test_start_date, end_date=test_end_date)}
)

# define testing parameters
testing_params = TestingParams(
    local_save_location = save_path,
    output_type = OutputType.DECODED,
)

# run evaluation
testing_module.test(testing_params = testing_params)

Necessary imports and variables

To evaluate the model, you need to import the required BaseModel functions and classes :

load_from_checkpoint from monad.ui.module - to instantiate training module.
TestingParams from monad.ui.config - to configure your predictions.

Instantiating the testing module

We instantiate the testing module by calling load_from_checkpoint and providing checkpoint_path (the location of your scenario model's checkpoints) along with any of the other optional arguments listed below.
This method will use the best of the model's checkpoints and the provided dataloaders to create an instance of the BaseModel module that enables the test() method, which we will use to generate predictions.

Arguments

checkpoint_path : str, required
No default
The directory where all the checkpoint artifacts of the scenario model are stored.
pl_logger : [Logger], optional
Default: None
An instance of PyTorch Lightning logger to use.
split: Optional[dict[DataMode, TimeRange]]
Default: None
Overwrites split defined during foundation model training phase in case when time-based split was used. If the test period was not defined in the pretrain configuration file, it can be assigned here. You need to set DataMode as TEST and provide dates in the TimeRange object, for example {DataMode.TEST: TimeRange(start_date=datetime(2023, 8, 1), end_date=datetime(2023, 8, 22)}
⚠️
Note!
Parameter split should be used if model was trained with time-based split. For models trained with entity-based split parameter prediction_date from TestingParams should be used.

Additonally, as kwargs, you can pass any parameters defined in data_params block in YAML configuration to overwrite those used during the training of the scenario model.

Have a look at an example of testing module instantiation with some additional arguments below.

testing_module = load_from_checkpoint(
    checkpoint_path = "<path/to/downstream/model/checkpoints>",
    split={DataMode.TEST: TimeRange(start_date=datetime(2023, 8, 1), end_date=datetime(2023, 8, 22)}
    pl_logger = neptune_logger # should be instantiated before
)

Configuring Inference with `TestingParams`

We should now use TestingParams, a class we imported from monad.ui.config, to set up our predictions.

Parameters

output_type: OutputType
No default
Output format in which to save the predictions. The table below explains how different values affect prediction outputs.

^Task	^{OutputType.RAW_MODEL}	^{OutputType.ENCODED}	^{OutputType.DECODED}	^{OutputType.SEMANTIC}
^{Binary Classification}	^{Raw model output}	^{Raw model output}	^{Recommended Sigmoid of the raw model output}	^{Sigmoid of the raw model output}
^{Multiclass Classification}	^{Raw model output}	^{Raw model output}	^{Recommended Softmax of the raw outputs}	^{Predicted class}
^{Multi-label classification}	^{Raw model output}	^{Raw model output}	^{Recommended Sigmoid of the raw outputs}	^{Class names sorted by score}
^Regression	^{Raw model output}	^{Transformed raw model output}	^Recommended ^{Predicted value}	^{Predicted value}
^{Recommendations}	^{Raw model output}	^{Log softmax of raw model output}	^{Internal BaseModel codes of entities}	^{Recommended A ranked list of recommended feature values}

prediction_date: Optional[datetime]
Default: None
For which date do test evaluation in case when entity-based split is used or for which date make predictions during the inference.
local_save_location: str
Default: None
If provided, points to the location in the local filesystem where evaluation results will be stored in TSV format.
remote_save_location: str
Default: None
If provided, defines a table in a remote database where the evaluation results will be stored.
limit_test_batches: int
Default: None
If provided, defines how many of batches to run evaluation over.
top_k: int
Default: None
Only for recommendation task. Number of top k values to recommend. It is highly advised to use this to reduce the size of the prediction file.

Additionally, stored parameters provided earlier as part of YAML training_params or during scenario model training as TrainingParamscan also be modified. This is useful e.g. if you want to change the device, modify the number of recommendation targets, or add a new callback. Please refer to this article for the full list of modifiable parameters.

Example

The example provided below demonstrates testing parameters. In addition to specifying the location for predictions, some parameters have been overwritten or added. Note that the metric added as an argument requires importing an additional class.

from monad.ui.config import OutputType

testing_params = TestingParams(  
  local_save_location="/path/where/evaluation/results/should/be/stored.tsv",  
  output_type=OutputType.DECODED,
  devices=[0],  
)

Running evaluation

Having instantiated the testing module and configured the testing parameters, we are now ready to start the evaluation run. This is done by simply calling the test() method of the testing module and providing testing_params as its argument and optionally seed.

Parameters

training_params: TestinfParams
No default Parameters defining the evaluation.
seed: int
Optional. Default: None Seed for the training, when provided, ensures reproducibility of the results.

testing_module.test(testing_params=testing_params)

Check This First!

Necessary imports and variables

Instantiating the testing module

Note!

Configuring Inference with TestingParams

Running evaluation

Configuring Inference with `TestingParams`