Model Tuning

This page covers the training_params and data_params settings that control what the model learns and how. For hardware, distribution, and memory settings, see Scaling & Memory.

Optimization

The core training loop is controlled by a handful of parameters:

yaml

training_params:
  learning_rate: 0.0003
  epochs: 10
  precision: "bf16-mixed"
  gradient_clip_val: 1.0

See Training Parameters reference for all available fields and defaults.

Tuning tips:

The default learning rate (0.0003) works well for most datasets. The optimizer includes a built-in warmup, so learning_rate is the peak value: the schedule warms up to it and then decays from it.
bf16-mixed precision (default on CUDA) gives roughly a 2× speedup and ~50% memory reduction on A100/H100. Use 16-mixed on older GPUs, or 32 for debugging.
Set gradient_clip_val to null to disable gradient clipping.
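
To make the "peak rate" idea concrete, here is a minimal sketch of a warmup-then-decay schedule. The shape (linear warmup, cosine decay to zero), the warmup_steps value, and the function name lr_at_step are assumptions for illustration only; this page does not specify the schedule BaseModel actually uses.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 0.0003,
               warmup_steps: int = 100) -> float:
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

Under these assumptions, the configured learning_rate is never exceeded: it is reached at the end of warmup and every later step uses a smaller value.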

Early Stopping

Stop training automatically when validation performance stops improving:

yaml

training_params:
  early_stopping:
    min_delta: 0.001
    patience: 3
    verbose: false

See EarlyStopping reference for all available fields.

Recommendations: set min_delta above 0 and patience above 1 to avoid premature stopping on noisy validation curves.
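
The interaction between min_delta and patience can be sketched as follows. This is an illustrative reimplementation, not BaseModel's code; it assumes the monitored metric is a validation loss where lower is better, and stopping_epoch is a hypothetical helper name.

```python
def stopping_epoch(val_losses, min_delta=0.001, patience=3):
    """Return the 0-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if it beats the best loss
    seen so far by more than min_delta (assumes lower is better).
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None
```

With patience set to 1, a single noisy epoch is enough to end the run, which is why a larger value is recommended for noisy validation curves.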

Checkpointing

Save model state at regular intervals to guard against interrupted runs:

yaml

training_params:
  checkpoint_every_n_steps: 1000

When set, BaseModel writes an intra-epoch checkpoint every N training steps in addition to the end-of-epoch save.
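
As a rough illustration of when saves would occur, the sketch below lists checkpoint steps for a run. It assumes steps are counted globally across epochs rather than reset each epoch, which this page does not state; checkpoint_steps is a hypothetical helper, not part of BaseModel.

```python
def checkpoint_steps(steps_per_epoch, epochs, checkpoint_every_n_steps=None):
    """List (global_step, kind) pairs at which a checkpoint would be written."""
    saves = []
    for step in range(1, steps_per_epoch * epochs + 1):
        # Intra-epoch save every N steps (assumed to use a global step counter).
        if checkpoint_every_n_steps and step % checkpoint_every_n_steps == 0:
            saves.append((step, "intra-epoch"))
        # End-of-epoch save happens regardless of the setting.
        if step % steps_per_epoch == 0:
            saves.append((step, "end-of-epoch"))
    return saves
```

A smaller N bounds how much work an interruption can cost, at the price of more frequent writes to disk.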

Entity Filtering

Restrict which entities participate in training — useful for excluding test populations, focusing on a segment, or debugging on a subset:

yaml

training_params:
  entity_ids:
    subquery: 'SELECT DISTINCT("customer_id") FROM CUSTOMERS WHERE "age" > 18'
    matching: true

You can supply entity IDs in two ways:

Write a SQL subquery that returns the IDs you want to target.
Alternatively, list the IDs in a text file and point to it with the file field.

Use matching: true to keep only the listed IDs, or matching: false to exclude them.
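
The effect of the matching flag can be sketched as a simple include/exclude filter. The function below is illustrative only: filter_entities and its arguments are hypothetical names, and listed_ids stands in for whatever the subquery or file returns.

```python
def filter_entities(all_ids, listed_ids, matching=True):
    """Keep only listed IDs (matching=True) or drop them (matching=False)."""
    listed = set(listed_ids)
    if matching:
        return [e for e in all_ids if e in listed]
    return [e for e in all_ids if e not in listed]
```

For example, holding out a test population would pair a list of test IDs with matching: false, so that everything except those IDs is used for training.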

See Training Parameters reference for the full entity_ids schema.

Smoke Testing

Limit training to a few batches to verify the pipeline before committing to a full run:

yaml

training_params:
  limit_train_batches: 5
  limit_val_batches: 5

Remove batch limits for real training

Basic Configuration sets these batch limits by default. Remove both limits when you're ready for a real training run.

Data Sampling & Weighting

These data_params settings control how BaseModel samples and weights training examples. Defaults are optimized for most use cases — adjust only when you have a specific reason.

How sampling works

BaseModel generates training examples by splitting each entity's event history at various points in time into a "history" (the input) and a "future" (the basis for the target). Split-point placement is optimized automatically to ensure meaningful training examples.

yaml

data_params:
  maximum_splitpoints_per_entity: 20
  split_point_data_sources: null
  dynamic_events_sampling: true
  ignore_entities_without_events: true
  limit_entity_num_events: null
  window_shuffling_buffer_size: 100000

When to change these:

Large datasets — lower maximum_splitpoints_per_entity (default 20) to speed up training when you have many entities. This caps how many training examples are generated per entity.
Focusing on specific event sources — set split_point_data_sources to restrict which sources contribute split-point timestamps (default: all sources).
Long histories — set limit_entity_num_events to cap events per entity (keeping the most recent), useful when some entities have disproportionately long histories that slow down training.
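
To make the split-point idea concrete, here is a minimal sketch of how one entity's events could be turned into history/future examples. It is not BaseModel's algorithm: split points are drawn uniformly at random from event timestamps here, whereas the real placement is optimized automatically, and applying the event cap before splitting is an assumption. The name make_examples and its arguments are hypothetical.

```python
import random


def make_examples(timestamps, max_splitpoints=20, limit_num_events=None, seed=0):
    """Return (split_point, history, future) triples for one entity."""
    events = sorted(timestamps)
    if limit_num_events is not None:
        # Keep only the most recent events (assumed to happen before splitting).
        events = events[max(0, len(events) - limit_num_events):]
    # Candidate split points leave at least one event on each side.
    candidates = sorted(set(events))[1:]
    k = min(max_splitpoints, len(candidates))
    chosen = sorted(random.Random(seed).sample(candidates, k))
    examples = []
    for t in chosen:
        history = [e for e in events if e < t]   # model input
        future = [e for e in events if e >= t]   # basis for the target
        examples.append((t, history, future))
    return examples
```

In this sketch, lowering max_splitpoints reduces the number of examples per entity directly, while limit_num_events shortens every history by discarding the oldest events.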

Keep dynamic event sampling enabled

Leave dynamic_events_sampling enabled: it randomly samples events from the input history to reduce overfitting. The remaining fields rarely need changing; see Training Parameters reference for the full list.

Weighting

Adjust how much influence each training example has:

yaml

data_params:
  apply_event_count_weighting: false
  apply_recency_based_weighting: false

Both are disabled by default. Consider enabling them when:

Imbalanced activity levels — turn on apply_event_count_weighting to give equal influence to low-activity and high-activity entities. Without it, entities with many events dominate training.
Recency matters more — turn on apply_recency_based_weighting to prioritize recent training examples. Useful when customer behavior shifts over time and older patterns are less relevant.
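
The two weighting ideas can be sketched as below. Both formulas are assumptions chosen for illustration (inverse event count, and exponential decay with a half-life); this page does not document the formulas BaseModel applies, and the function names and half_life_days parameter are hypothetical.

```python
def event_count_weights(event_counts):
    """Weight each entity by 1 / its event count.

    If an entity's influence grows with its event count, this makes
    count * weight equal to 1 for every entity.
    """
    return {entity: 1.0 / count for entity, count in event_counts.items()}


def recency_weights(example_ages_days, half_life_days=90.0):
    """Exponential decay: an example half_life_days old gets weight 0.5."""
    return [0.5 ** (age / half_life_days) for age in example_ages_days]
```

Under these assumptions, an entity with 200 events no longer outweighs one with 2 events, and an example twice the half-life old contributes a quarter as much as a fresh one.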