Testing Parameters
Parameters for model evaluation and prediction generation. Used with the .test() and .predict() methods on a scenario model.
test() vs predict()
Both methods accept TestingParams, but they differ in behavior:
| | test() | predict() |
|---|---|---|
| Purpose | Evaluate against ground truth | Score new/unseen data |
| Metrics | Computed and logged | Ignored |
| Data split | Uses the test split from training | Uses prediction_date as cutoff |
| prediction_date | Optional (defaults to test split boundary) | Required for entity-based splits |
Use test() during development to measure model quality. Use predict() to generate production predictions.
TestingParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| output_type | OutputType | required | Format in which to save the predictions. See OutputType below. |
| local_save_location | Path \| None | None | Local file path for predictions in TSV format. Must end with .tsv. |
| remote_save_location | DataLocation \| None | None | Remote database table for storing predictions. Snowflake and Databricks are supported. |
| limit_test_batches | int \| None | None | Limit number of test/predict batches to process. |
| precision | Literal[...] | 32 | Float precision used for testing. See Precision Values. |
| prediction_date | datetime \| None | None | Date for which to make predictions. Required when using entity-based splits. |
Inherited Parameters
Shared with TrainingParams:
| Parameter | Type | Default | Description |
|---|---|---|---|
| devices | list[int] \| int | 1 | GPU devices to use. |
| accelerator | "cpu" \| "gpu" | "gpu" | Accelerator type. |
| strategy | str \| None | None | Distributed strategy. |
| metrics | list[MetricParams \| CustomMetric] | [] | Metrics to compute during testing. See MetricParams and CustomMetric. |
| top_k | int \| None | None | Limit predictions to top-k items/classes (recommendation, multilabel). |
| predictions_threshold | float \| None | None | Classification threshold (binary, multilabel). Mutually exclusive with top_k. |
| entity_ids | EntityIds \| None | None | Limit predictions to specific entity IDs. |
| callbacks | list[Callback] | [] | PyTorch Lightning callbacks. |
| approximate_decoding_params | ApproximateDecodingParams \| None | None | Approximate decoding for recommendation tasks. |
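Two constraints stated in the tables above can be checked before starting a long run: the local path must end with .tsv, and top_k and predictions_threshold cannot both be set. The following is a minimal sketch of such a pre-flight check. The function check_testing_options is hypothetical and not part of the library, which may perform its own validation and report errors differently.

```python
from pathlib import Path
from typing import Optional


def check_testing_options(
    local_save_location: Optional[Path] = None,
    top_k: Optional[int] = None,
    predictions_threshold: Optional[float] = None,
) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    # The TestingParams table states that the local path must end with .tsv.
    if local_save_location is not None and Path(local_save_location).suffix != ".tsv":
        problems.append("local_save_location must end with .tsv")
    # The inherited-parameters table states that these two options are mutually exclusive.
    if top_k is not None and predictions_threshold is not None:
        problems.append("top_k and predictions_threshold are mutually exclusive")
    return problems


# A consistent combination yields no problems.
print(check_testing_options(Path("./predictions.tsv"), top_k=10))
# A wrong suffix plus both options set yields two problems.
print(check_testing_options(Path("./predictions.csv"), top_k=10, predictions_threshold=0.5))
```

Running a check like this before constructing TestingParams keeps configuration mistakes from surfacing only after a checkpoint has been loaded.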
OutputType
```python
from monad.config import OutputType

# Available values:
OutputType.RAW_MODEL
OutputType.ENCODED
OutputType.DECODED
OutputType.SEMANTIC
```
Meaning Per Task Type
| Task | RAW_MODEL | ENCODED | DECODED | SEMANTIC |
|---|---|---|---|---|
| Binary | Logits | Logits | Probabilities | 0 or 1 (based on threshold) |
| Multiclass | Log-softmax | Log-softmax | Probabilities (with filtering) | Class names (with filtering) |
| Multilabel | Logits | Logits | Probabilities (with filtering) | Class names (with filtering, requires top_k) |
| Regression | Raw output | Internal representation | Human-readable values | Human-readable values |
| Recommendation | Raw output | Sketch (compact) | Probabilities per item | Item IDs/names |
For all task types except recommendation, we suggest using DECODED.
Tip
Use DECODED for most inference pipelines. Use SEMANTIC when results need to be human-readable. Use ENCODED for recommendation models when you need compact output that can be decoded later with readout_sketch().
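To make the Binary row of the table concrete, the sketch below shows how the three representations could relate: logits, probabilities, and thresholded 0/1 labels. It assumes the usual sigmoid mapping from logits to probabilities and a default threshold of 0.5 with a greater-or-equal comparison. These are illustrative assumptions, not confirmed library behavior, and the helper names are hypothetical.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a logit to a probability, using a numerically stable form."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


def to_label(probability: float, threshold: float = 0.5) -> int:
    """Assumed rule: 1 if the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0


logits = [-2.0, 0.0, 3.0]                          # RAW_MODEL / ENCODED style values
probabilities = [sigmoid(x) for x in logits]       # DECODED style values
labels = [to_label(p) for p in probabilities]      # SEMANTIC style values

print(probabilities)
print(labels)
```

Saving DECODED output keeps the probabilities, so a different threshold can be applied later without rerunning inference.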
Usage Examples
Basic Test and Predict
Both methods accept an optional seed parameter for reproducible ordering of results.
```python
from pathlib import Path

from monad.config import MetricParams, OutputType, TestingParams

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0],
    local_save_location=Path("./predictions.tsv"),
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
    ],
)

# `module` is the scenario model these methods are called on (not defined in this snippet).
# Test: loads checkpoint, returns predictions
results = module.test(testing_params)

# Predict: loads checkpoint, saves predictions to local/remote location
module.predict(testing_params, seed=42)
```
Recommendation with Top-K
```python
from pathlib import Path
from monad.config import OutputType, TestingParams

testing_params = TestingParams(
    output_type=OutputType.SEMANTIC,
    devices=[0],
    top_k=10,
    local_save_location=Path("./top10_predictions.tsv"),
)
module.predict(testing_params)
```
Multi-GPU Inference
```python
from pathlib import Path
from monad.config import OutputType, TestingParams

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0, 1, 2, 3],
    strategy="ddp",
    local_save_location=Path("./predictions.tsv"),
)
module.predict(testing_params)
```
Writing Predictions to Snowflake
```python
from monad.config import TestingParams, OutputType
from monad.config.data_source import DataLocation

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0],
    remote_save_location=DataLocation(
        database_type="snowflake",
        connection_params={
            "user": "${SNOWFLAKE_USER}",
            "password": "${SNOWFLAKE_PASSWORD}",
            "account": "${SNOWFLAKE_ACCOUNT}",
            "warehouse": "${SNOWFLAKE_WAREHOUSE}",
            "database": "MY_DATABASE",
            "schema": "PUBLIC",
        },
        table_name="predictions_output",
    ),
)
module.predict(testing_params)
```
Writing Predictions to Databricks
```python
from monad.config import TestingParams, OutputType
from monad.config.data_source import DataLocation

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0],
    remote_save_location=DataLocation(
        database_type="databricks",
        connection_params={
            "host": "${DATABRICKS_HOST}",
            "warehouse_id": "${DATABRICKS_WAREHOUSE_ID}",
            "token": "${DATABRICKS_TOKEN}",
        },
        table_name="predictions_output",
    ),
)
module.predict(testing_params)
```
Databricks write behavior
The target table is created on demand if it does not exist, and rows are appended in batches (no surrounding transaction). Tune the batch size with the DATABRICKS_WRITE_BATCH_SIZE environment variable (default 1000).
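The batching described above can be pictured with a short sketch. It reads DATABRICKS_WRITE_BATCH_SIZE from the environment, falls back to 1000, and splits rows into consecutive chunks. The helpers write_batch_size and batched are hypothetical and only illustrate the semantics; the library's actual writer is not shown here and may differ.

```python
import os
from typing import Iterator, List, Sequence


def write_batch_size(default: int = 1000) -> int:
    """Hypothetical helper: read the batch size from the environment, else use the default."""
    raw = os.environ.get("DATABRICKS_WRITE_BATCH_SIZE")
    size = int(raw) if raw else default
    if size < 1:
        raise ValueError("batch size must be a positive integer")
    return size


def batched(rows: Sequence[tuple], size: int) -> Iterator[List[tuple]]:
    """Yield consecutive chunks of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


# Made-up rows, used only to show how they are grouped.
rows = [("entity_%d" % i, 0.5) for i in range(7)]
for batch in batched(rows, 3):
    # Each batch would be appended on its own; nothing wraps all batches together.
    print(len(batch))
```

Because no transaction surrounds the batches, an interrupted run can plausibly leave the table with only some of the rows appended, so it is worth checking row counts after a failure.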
TSV Output Schema
When local_save_location is set, predictions are saved as a tab-separated file. The columns depend on the task type and output_type:
| Task | Columns |
|---|---|
| Binary | entity_id, score, label |
| Multiclass | entity_id, score_<class1>, score_<class2>, ..., label |
| Multilabel | entity_id, score_<class1>, score_<class2>, ..., label_<class1>, ... |
| Regression | entity_id, prediction, label |
| Recommendation | entity_id, item_id, score (with SEMANTIC/DECODED) |
Tip
Inspect the header before parsing: head -1 predictions.tsv. Column names and ordering vary by task type and output type. The label column is only present in test() output (not predict()).
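Since the columns vary, a reader that discovers them from the header is more robust than one that hard-codes positions. The sketch below uses only the standard library's csv module. The sample content is made up to match the Binary row of the table, and read_predictions is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_predictions(handle: TextIO) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read a tab-separated predictions file and return (column names, rows)."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = list(reader.fieldnames or [])
    rows = list(reader)
    return columns, rows


# Made-up sample shaped like binary test() output; a real file would be opened
# with open("predictions.tsv", newline="") instead.
sample = io.StringIO("entity_id\tscore\tlabel\nu1\t0.91\t1\nu2\t0.12\t0\n")
columns, rows = read_predictions(sample)

# The label column is only expected in test() output, so check before using it.
has_labels = "label" in columns
print(columns, has_labels, len(rows))
```

Values come back as strings, so scores need an explicit float() conversion before any numeric use.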
Prediction Utilities
readout_sketch()
Decode recommendation predictions saved with OutputType.ENCODED into per-item scores.
```python
from monad.ui.module import readout_sketch

generator = readout_sketch(
    predictions_file="./predictions.tsv",
    checkpoint_path="./reco_model",
)
for entity_id, scores in generator:
    print(f"Entity: {entity_id}, Scores shape: {scores.shape}")
```
| Parameter | Type | Description |
|---|---|---|
| predictions_file | str | Path to predictions file saved with OutputType.ENCODED. |
| checkpoint_path | str | Path to the recommendation model checkpoint. |
Returns a generator yielding (entity_id: str, scores: np.ndarray) tuples.
read_target_entity_ids()
Get the mapping from target entity IDs (e.g., product IDs) to their indices in the decoded sketch.
```python
from monad.ui.module import read_target_entity_ids

target_to_index = read_target_entity_ids(
    checkpoint_path="./reco_model",
)
# Returns: {"product_001": 0, "product_002": 1, ...}
```
| Parameter | Type | Description |
|---|---|---|
| checkpoint_path | str | Path to the recommendation model checkpoint. |
Returns a dict[str, int] mapping entity IDs to indices.
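The two utilities are naturally combined: the scores yielded by readout_sketch() are indexed by position, and the mapping from read_target_entity_ids() says which target each position refers to. The sketch below picks the top-k targets for one entity. It uses plain Python values in place of real outputs, and top_items is a hypothetical helper, not a library function.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_items(
    scores: Sequence[float],
    target_to_index: Dict[str, int],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring targets as (target_id, score) pairs."""
    # Invert the mapping so a position in `scores` can be turned back into an ID.
    index_to_target = {index: target for target, index in target_to_index.items()}
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [(index_to_target[index], float(score)) for index, score in best]


# Stand-ins for read_target_entity_ids(...) and one `scores` array from readout_sketch(...).
target_to_index = {"product_001": 0, "product_002": 1, "product_003": 2}
scores = [0.10, 0.70, 0.20]

print(top_items(scores, target_to_index, k=2))
# → [('product_002', 0.7), ('product_003', 0.2)]
```

The helper only iterates over scores, so a one-dimensional NumPy array should work in place of the list.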