Model Tuning
This page covers the training_params and data_params settings that control what the model learns and how. For hardware, distribution, and memory settings see Scaling & Memory.
Optimization
The core training loop is controlled by a handful of parameters:
training_params:
  learning_rate: 0.0003
  epochs: 10
  precision: "bf16-mixed"
  gradient_clip_val: 1.0
See Training Parameters reference for all available fields and defaults.
Tuning tips:
- The default learning rate (0.0003) works well for most datasets. The optimizer includes a built-in warmup, so this is the peak rate it warms up to and decays from.
- bf16-mixed precision (default on CUDA) gives ~2× speed and ~50% memory reduction on A100/H100. Use 16-mixed on older GPUs or 32 for debugging.
- Set gradient_clip_val to null to disable gradient clipping.
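As a sketch of the alternatives mentioned in the tips, a run on an older GPU with gradient clipping disabled might look like the fragment below. It uses only the fields shown above; the particular combination is an illustration, not a recommended default.

```yaml
training_params:
  learning_rate: 0.0003      # peak rate; the built-in warmup ramps up to it
  epochs: 10
  precision: "16-mixed"      # for older GPUs; switch to 32 when debugging
  gradient_clip_val: null    # null disables gradient clipping
```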
Early Stopping
Stop training automatically when validation performance stops improving:
See EarlyStopping reference for all available fields.
Recommendations: set min_delta above 0 and patience above 1 to avoid premature stopping on noisy validation curves.
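A minimal sketch of values that follow this recommendation. Only min_delta and patience are named on this page; the numbers are illustrative assumptions, and the fragment deliberately omits the enclosing key, so check the EarlyStopping reference for where these fields belong.

```yaml
# Illustrative values only; place these fields in the early-stopping
# section described in the EarlyStopping reference.
min_delta: 0.001   # assumed value: small positive threshold, so tiny fluctuations are not counted as improvement
patience: 3        # assumed value: greater than 1, so a single noisy validation check does not stop training
```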
Checkpointing
Save model state at regular intervals to guard against interrupted runs:
When set, BaseModel writes an intra-epoch checkpoint every N training steps in addition to the end-of-epoch save.
Entity Filtering
Restrict which entities participate in training — useful for excluding test populations, focusing on a segment, or debugging on a subset:
training_params:
  entity_ids:
    subquery: 'SELECT DISTINCT("customer_id") FROM CUSTOMERS WHERE "age" > 18'
    matching: true
You can supply entity IDs in two ways:
- Write a SQL subquery that returns the IDs you want to target.
- Alternatively, list the IDs in a text file and point to it with file.
Use matching: true to keep only the listed IDs, or matching: false to exclude them.
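To illustrate the file-based option together with exclusion, the fragment below removes a hold-out population from training. The path is hypothetical, and the file layout (for example, one ID per line) is an assumption; the reference describes the exact format.

```yaml
training_params:
  entity_ids:
    file: "/path/to/holdout_ids.txt"   # hypothetical path to a text file of entity IDs
    matching: false                    # false excludes the listed IDs from training
```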
See Training Parameters reference for the full entity_ids schema.
Smoke Testing
Limit training to a few batches to verify the pipeline before committing to a full run:
Remove batch limits for real training
This is what Basic Configuration uses by default. Remove both limits when you're ready for a real training run.
Data Sampling & Weighting
These data_params settings control how BaseModel samples and weights training examples. Defaults are optimized for most use cases — adjust only when you have a specific reason.
How sampling works
BaseModel generates training examples by splitting each entity's event history at various points in time into a "history" (input) and "future" (base for target). Split-point placement is optimized automatically to ensure meaningful training examples.
data_params:
  maximum_splitpoints_per_entity: 20
  split_point_data_sources: null
  dynamic_events_sampling: true
  ignore_entities_without_events: true
  limit_entity_num_events: null
  window_shuffling_buffer_size: 100000
When to change these:
- Large datasets: lower maximum_splitpoints_per_entity (default 20) to speed up training when you have many entities. This caps how many training examples are generated per entity.
- Focusing on specific event sources: set split_point_data_sources to restrict which sources contribute split-point timestamps (default: all sources).
- Long histories: set limit_entity_num_events to cap events per entity (keeping the most recent), useful when some entities have disproportionately long histories that slow down training.
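For a large dataset with a few very long histories, the first and last adjustments could be combined roughly as follows. The numeric values are assumptions chosen for illustration; suitable numbers depend on your data.

```yaml
data_params:
  maximum_splitpoints_per_entity: 5   # assumed value; the default is 20
  limit_entity_num_events: 1000       # assumed value; keeps the most recent events (default: null)
  dynamic_events_sampling: true       # left at its default
```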
Keep dynamic event sampling enabled
Leave dynamic_events_sampling enabled: it randomly samples events from the input history to reduce overfitting. The remaining fields rarely need changing; see Training Parameters reference for the full list.
Weighting
Adjust how much influence each training example has:
Both weighting options, apply_event_count_weighting and apply_recency_based_weighting, are disabled by default. Consider enabling them when:
- Imbalanced activity levels: turn on apply_event_count_weighting to give equal influence to low-activity and high-activity entities. Without it, entities with many events dominate training.
- Recency matters more: turn on apply_recency_based_weighting to prioritize recent training examples. Useful when customer behavior shifts over time and older patterns are less relevant.
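A sketch with both options switched on. It assumes the two fields are booleans that sit under data_params alongside the sampling settings; confirm the exact schema in the reference before relying on it.

```yaml
data_params:
  apply_event_count_weighting: true     # assumed boolean: evens out low- and high-activity entities
  apply_recency_based_weighting: true   # assumed boolean: favors more recent training examples
```

Either option can be enabled on its own; the two address different problems.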