
Testing Parameters

Parameters for model evaluation and prediction generation. Used with the .test() and .predict() methods on a scenario model.

test() vs predict()

Both methods accept TestingParams, but they differ in behavior:

| | test() | predict() |
| --- | --- | --- |
| Purpose | Evaluate against ground truth | Score new/unseen data |
| Metrics | Computed and logged | Ignored |
| Data split | Uses the test split from training | Uses prediction_date as cutoff |
| prediction_date | Optional (defaults to test split boundary) | Required for entity-based splits |

Use test() during development to measure model quality. Use predict() to generate production predictions.

from monad.config import TestingParams, OutputType

TestingParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output_type | OutputType | required | Format in which to save the predictions. See OutputType below. |
| local_save_location | Path \| None | None | Local file path for predictions in TSV format. Must end with .tsv. |
| remote_save_location | DataLocation \| None | None | Remote database table for storing predictions. Snowflake and Databricks are supported. |
| limit_test_batches | int \| None | None | Limit number of test/predict batches to process. |
| precision | Literal[...] | 32 | Float precision used for testing. See Precision Values. |
| prediction_date | datetime \| None | None | Date for which to make predictions. Required when using entity-based splits. |

Inherited Parameters

Shared with TrainingParams:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| devices | list[int] \| int | 1 | GPU devices to use. |
| accelerator | "cpu" \| "gpu" | "gpu" | Accelerator type. |
| strategy | str \| None | None | Distributed strategy. |
| metrics | list[MetricParams \| CustomMetric] | [] | Metrics to compute during testing. See MetricParams and CustomMetric. |
| top_k | int \| None | None | Limit predictions to top-k items/classes (recommendation, multilabel). |
| predictions_threshold | float \| None | None | Classification threshold (binary, multilabel). Mutually exclusive with top_k. |
| entity_ids | EntityIds \| None | None | Limit predictions to specific entity IDs. |
| callbacks | list[Callback] | [] | PyTorch Lightning callbacks. |
| approximate_decoding_params | ApproximateDecodingParams \| None | None | Approximate decoding for recommendation tasks. |
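
Because top_k and predictions_threshold are mutually exclusive, it can help to check the pair before building TestingParams, for example when both values come from a config file or CLI flags. The helper below is a hypothetical, standalone sketch and not part of monad; the requirement that top_k be at least 1 is an assumption, and the library may validate these values differently.

```python
from typing import Optional


def check_selection_args(
    top_k: Optional[int],
    predictions_threshold: Optional[float],
) -> None:
    """Hypothetical pre-flight check; not a monad API.

    Raises ValueError when both filters are set, mirroring the
    "mutually exclusive" note in the parameter table.
    """
    if top_k is not None and predictions_threshold is not None:
        raise ValueError("top_k and predictions_threshold are mutually exclusive")
    # Assumption: a top-k filter only makes sense for k >= 1.
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be a positive integer")
```

Calling check_selection_args(top_k=10, predictions_threshold=None) returns silently, while passing both values raises ValueError.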

OutputType

from monad.config import OutputType

# Available values:
OutputType.RAW_MODEL
OutputType.ENCODED
OutputType.DECODED
OutputType.SEMANTIC

Meaning Per Task Type

| Task | RAW_MODEL | ENCODED | DECODED | SEMANTIC |
| --- | --- | --- | --- | --- |
| Binary | Logits | Logits | Probabilities | 0 or 1 (based on threshold) |
| Multiclass | Log-softmax | Log-softmax | Probabilities (with filtering) | Class names (with filtering) |
| Multilabel | Logits | Logits | Probabilities (with filtering) | Class names (with filtering, requires top_k) |
| Regression | Raw output | Internal representation | Human-readable values | Human-readable values |
| Recommendation | Raw output | Sketch (compact) | Probabilities per item | Item IDs/names |

For all task types except recommendation, we suggest using DECODED.

Tip

Use DECODED for most inference pipelines. Use SEMANTIC when results need to be human-readable. Use ENCODED for recommendation models when you need compact output that can be decoded later with readout_sketch().
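
To make the relationship between the output types concrete, the sketch below converts a binary logit into a probability and a 0/1 label, and a multiclass log-softmax vector into probabilities. It is a standalone illustration, not the library's implementation: the sigmoid mapping, the default threshold of 0.5, and the use of >= at the threshold are assumptions, and the filtering mentioned in the table is not modeled.

```python
import math
from typing import Dict, List, Union


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def binary_views(logit: float, threshold: float = 0.5) -> Dict[str, Union[float, int]]:
    """Show one binary prediction at three levels of decoding.

    Assumption: probabilities come from a sigmoid over the logit and
    the label is 1 when the probability reaches the threshold.
    """
    probability = sigmoid(logit)
    return {
        "logit": logit,                             # RAW_MODEL / ENCODED
        "probability": probability,                 # DECODED
        "label": int(probability >= threshold),     # SEMANTIC
    }


def multiclass_probabilities(log_softmax: List[float]) -> List[float]:
    """Exponentiate log-softmax values to recover class probabilities."""
    return [math.exp(value) for value in log_softmax]
```

For example, binary_views(0.0) gives a probability of 0.5, which maps to label 1 under the assumed >= rule.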

Usage Examples

Basic Test and Predict

Both methods accept an optional seed parameter for reproducible ordering of results.

from pathlib import Path
from monad.config import TestingParams, OutputType, MetricParams

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0],
    local_save_location=Path("./predictions.tsv"),
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
    ],
)

# Test — loads checkpoint, returns predictions
results = module.test(testing_params)

# Predict — loads checkpoint, saves predictions to local/remote location
module.predict(testing_params, seed=42)

Recommendation with Top-K

testing_params = TestingParams(
    output_type=OutputType.SEMANTIC,
    devices=[0],
    top_k=10,
    local_save_location=Path("./top10_predictions.tsv"),
)

module.predict(testing_params)

Multi-GPU Inference

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0, 1, 2, 3],
    strategy="ddp",
    local_save_location=Path("./predictions.tsv"),
)

module.predict(testing_params)

Writing Predictions to Snowflake

from monad.config import TestingParams, OutputType
from monad.config.data_source import DataLocation

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0],
    remote_save_location=DataLocation(
        database_type="snowflake",
        connection_params={
            "user": "${SNOWFLAKE_USER}",
            "password": "${SNOWFLAKE_PASSWORD}",
            "account": "${SNOWFLAKE_ACCOUNT}",
            "warehouse": "${SNOWFLAKE_WAREHOUSE}",
            "database": "MY_DATABASE",
            "schema": "PUBLIC",
        },
        table_name="predictions_output",
    ),
)

module.predict(testing_params)

Writing Predictions to Databricks

from monad.config import TestingParams, OutputType
from monad.config.data_source import DataLocation

testing_params = TestingParams(
    output_type=OutputType.DECODED,
    devices=[0],
    remote_save_location=DataLocation(
        database_type="databricks",
        connection_params={
            "host": "${DATABRICKS_HOST}",
            "warehouse_id": "${DATABRICKS_WAREHOUSE_ID}",
            "token": "${DATABRICKS_TOKEN}",
        },
        table_name="predictions_output",
    ),
)

module.predict(testing_params)

Databricks write behavior

The target table is created on demand if it does not exist, and rows are appended in batches (no surrounding transaction). Tune the batch size with the DATABRICKS_WRITE_BATCH_SIZE environment variable (default 1000).
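
The batching behavior described above can be pictured with the following standalone sketch. It is not the library's writer: the helper names are hypothetical, and how monad reads DATABRICKS_WRITE_BATCH_SIZE internally is an assumption. The sketch only shows why a failure partway through a run can leave earlier batches in the table, since each batch is appended independently.

```python
import os
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def write_batch_size(default: int = 1000) -> int:
    """Read the batch size from the environment (hypothetical helper)."""
    raw = os.environ.get("DATABRICKS_WRITE_BATCH_SIZE")
    if raw is None:
        return default
    size = int(raw)
    if size < 1:
        raise ValueError("DATABRICKS_WRITE_BATCH_SIZE must be a positive integer")
    return size


def iter_batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])
```

With five rows and a batch size of 2, iter_batches yields three batches, the last holding a single row. To change the size for a run, set the environment variable before calling predict().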


TSV Output Schema

When local_save_location is set, predictions are saved as a tab-separated file. The columns depend on the task type and output_type:

| Task | Columns |
| --- | --- |
| Binary | entity_id, score, label |
| Multiclass | entity_id, score_<class1>, score_<class2>, ..., label |
| Multilabel | entity_id, score_<class1>, score_<class2>, ..., label_<class1>, ... |
| Regression | entity_id, prediction, label |
| Recommendation | entity_id, item_id, score (with SEMANTIC/DECODED) |

Tip

Inspect the header before parsing: head -1 predictions.tsv. Column names and ordering vary by task type and output type. The label column is only present in test() output (not predict()).
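
The same header check can be done from Python before any parsing logic is committed to. The sketch below uses only the standard csv module; the helper names are hypothetical, and the label detection assumes the column naming shown in the table above (label or label_<class>).

```python
import csv
from typing import Iterable, List


def header_columns(lines: Iterable[str]) -> List[str]:
    """Return the column names from the first line of a TSV stream."""
    reader = csv.reader(lines, delimiter="\t")
    return next(reader, [])  # empty list for an empty file


def has_label_columns(columns: List[str]) -> bool:
    """True when the header carries ground-truth labels (test() output)."""
    return any(name == "label" or name.startswith("label_") for name in columns)
```

A typical use is to open the file with open("predictions.tsv", newline="", encoding="utf-8"), pass the handle to header_columns, and branch on has_label_columns to tell test() output from predict() output.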


Prediction Utilities

readout_sketch()

Decode recommendation predictions saved with OutputType.ENCODED into per-item scores.

from monad.ui.module import readout_sketch

generator = readout_sketch(
    predictions_file="./predictions.tsv",
    checkpoint_path="./reco_model",
)

for entity_id, scores in generator:
    print(f"Entity: {entity_id}, Scores shape: {scores.shape}")

| Parameter | Type | Description |
| --- | --- | --- |
| predictions_file | str | Path to predictions file saved with OutputType.ENCODED. |
| checkpoint_path | str | Path to the recommendation model checkpoint. |

Returns a generator yielding (entity_id: str, scores: np.ndarray) tuples.

read_target_entity_ids()

Get the mapping from target entity IDs (e.g., product IDs) to their indices in the decoded sketch.

from monad.ui.module import read_target_entity_ids

target_to_index = read_target_entity_ids(
    checkpoint_path="./reco_model",
)
# Returns: {"product_001": 0, "product_002": 1, ...}

| Parameter | Type | Description |
| --- | --- | --- |
| checkpoint_path | str | Path to the recommendation model checkpoint. |

Returns a dict[str, int] mapping entity IDs to indices.
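
The two utilities are naturally combined: the score vector from readout_sketch() is indexed by position, and the mapping from read_target_entity_ids() says which target each position belongs to. The helper below is a hypothetical sketch of that lookup using only the standard library; it accepts any sequence of floats, so a 1-D NumPy array or a plain list should both work.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_items(
    scores: Sequence[float],
    target_to_index: Dict[str, int],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring targets as (target_id, score) pairs.

    Hypothetical helper, not part of monad. Relies on position i in
    scores corresponding to index i in target_to_index.
    """
    # Invert the mapping so positions can be translated back to IDs.
    index_to_target = {index: target for target, index in target_to_index.items()}
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [(index_to_target[index], float(score)) for index, score in best]
```

Inside the readout_sketch() loop, calling top_items(scores, target_to_index, 10) for each entity would give that entity's ten best targets, ordered from highest to lowest score.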