

Release 0.17

July 4th, 2025

New features - Core BaseModel Repository:

Hybrid train/test split
Users can now combine entity-based training and validation splits with a time-based test set. This mirrors production scenarios more closely by ensuring that the test set simulates future data while respecting entity isolation during training.
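The idea can be illustrated with a minimal, library-independent sketch. The function name `assign_split`, the `test_start` and `validation_fraction` arguments, and the hashing scheme are assumptions made for illustration; they are not BaseModel's API or its actual partitioning logic.

```python
import hashlib
from datetime import date


def assign_split(entity_id: str, event_date: date, test_start: date,
                 validation_fraction: float = 0.2) -> str:
    """Return 'train', 'validation', or 'test' for a single event (illustrative only)."""
    # Time-based test set: events on or after test_start stand in for future data.
    if event_date >= test_start:
        return "test"
    # Entity-based split: hash the entity id so that every earlier event of one
    # entity lands in the same partition (hypothetical scheme, not BaseModel's).
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "validation" if bucket < validation_fraction else "train"


events = [
    ("user_a", date(2025, 1, 10)),
    ("user_a", date(2025, 3, 2)),
    ("user_b", date(2025, 2, 14)),
    ("user_b", date(2025, 6, 1)),
]
for entity_id, event_date in events:
    print(entity_id, event_date, assign_split(entity_id, event_date, date(2025, 5, 1)))
```

In this sketch an entity never appears in both the training and validation partitions, while any event dated on or after the cutoff goes to the test set regardless of entity.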

Limited end date of training and validation
Introduced the training_validation_end parameter to limit the latest date included in training and validation splits when using entity-based partitioning.
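As a rough illustration of what such a cutoff does, the sketch below filters events by date before they would be partitioned by entity. The helper `limit_training_window` is hypothetical, and treating the cutoff as inclusive is an assumption; only the parameter name `training_validation_end` comes from the release note.

```python
from datetime import date


def limit_training_window(events, training_validation_end: date):
    """Keep (entity_id, event_date) pairs dated on or before the cutoff.

    Inclusive comparison is an assumption made for this sketch.
    """
    return [(entity_id, event_date) for entity_id, event_date in events
            if event_date <= training_validation_end]


events = [
    ("user_a", date(2025, 1, 10)),
    ("user_b", date(2025, 4, 30)),
    ("user_c", date(2025, 5, 15)),
]
kept = limit_training_window(events, training_validation_end=date(2025, 4, 30))
print(kept)  # the user_c event falls after the cutoff and is dropped
```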

Flexible training validation interval
Introduced check_val_every_n_steps and check_val_every_n_epochs parameters for more granular control over validation frequency during training.
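To make the two schedules concrete, here is a small sketch of how a training loop might decide when to validate. The function `should_validate`, the `steps_per_epoch` argument, and the rule that the step-based setting takes precedence when both are given are assumptions for illustration, not documented BaseModel behavior.

```python
from typing import Optional


def should_validate(global_step: int, steps_per_epoch: int,
                    check_val_every_n_steps: Optional[int] = None,
                    check_val_every_n_epochs: Optional[int] = None) -> bool:
    """Decide whether to validate after finishing `global_step` (1-based)."""
    # Assumption: the step-based setting wins if both are provided.
    if check_val_every_n_steps is not None:
        return global_step % check_val_every_n_steps == 0
    if check_val_every_n_epochs is not None:
        at_epoch_end = global_step % steps_per_epoch == 0
        finished_epochs = global_step // steps_per_epoch
        return at_epoch_end and finished_epochs % check_val_every_n_epochs == 0
    return False


steps = range(1, 13)  # 3 epochs of 4 steps each
every_3_steps = [s for s in steps if should_validate(s, 4, check_val_every_n_steps=3)]
every_2_epochs = [s for s in steps if should_validate(s, 4, check_val_every_n_epochs=2)]
print(every_3_steps)   # [3, 6, 9, 12]
print(every_2_epochs)  # [8]
```

Step-based checks are useful when a single epoch is long, whereas epoch-based checks keep validation cost low on small datasets.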

Reproducible results
A seed parameter has been added to key methods (fit_behavioral_representation, train_foundation_model, pretrain, fit, evaluate, predict, and test) to ensure consistent and reproducible outputs across runs.
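The effect of a seed can be shown with the Python standard library alone. The function below is a hypothetical stand-in for any randomized step (such as shuffling batches); it does not reproduce how BaseModel applies its `seed` internally.

```python
import random
from typing import List


def sample_batch_order(n_batches: int, seed: int) -> List[int]:
    """Return a shuffled batch order that depends only on `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    order = list(range(n_batches))
    rng.shuffle(order)
    return order


run_1 = sample_batch_order(10, seed=42)
run_2 = sample_batch_order(10, seed=42)
print(run_1 == run_2)  # True: the same seed yields the same order
```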

Flexible Kerberos configuration
[BREAKING CHANGE]: In the Hive connection configuration, a separate realm for Kerberos can now be defined with the kinit_realm parameter, while the realm for the connection string can be defined in the ini file.

Improvements:

Refactored run continuation logic
The overwrite and resume parameters must now be passed directly to the fit method rather than being read from TrainingParams.

Refined interpretability date specification
The target date for interpretation should now be provided via the prediction_date parameter instead of being inferred from the split.

Normalized feature importance in interpretability
Feature importance scores are now normalized based on each feature’s input size, enabling fairer comparisons between features of different scales.
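One plausible reading of this normalization is a simple division of each raw score by the feature's input size. The sketch below assumes exactly that; the function name, the example feature names, and the formula itself are illustrative assumptions, since the release note does not state the precise rule.

```python
from typing import Dict


def normalize_importance(raw_scores: Dict[str, float],
                         input_sizes: Dict[str, int]) -> Dict[str, float]:
    """Divide each raw importance score by the feature's input size (assumed rule)."""
    return {name: raw_scores[name] / input_sizes[name] for name in raw_scores}


raw_scores = {"purchase_history": 12.0, "age": 0.5}
input_sizes = {"purchase_history": 48, "age": 1}  # number of input dimensions
normalized = normalize_importance(raw_scores, input_sizes)
print(normalized)  # {'purchase_history': 0.25, 'age': 0.5}
```

Under this assumption, a wide feature no longer dominates the ranking merely because it contributes more input dimensions.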

Improved Parquet cache behavior
- The cache is automatically refreshed if the source Parquet file has changed; no manual deletion is required.
- The cache_path parameter is now optional. However, disabling the cache is not recommended, as it can significantly slow down data processing.
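Automatic refresh of this kind is commonly built on a fingerprint of the source file. The sketch below compares the file's size and modification time against a stored value; the helper names and the fingerprint choice are assumptions, and the release note does not say how BaseModel detects changes.

```python
import os
import tempfile


def source_fingerprint(path: str) -> str:
    """Cheap fingerprint of a file: its size and modification time."""
    stat = os.stat(path)
    return f"{stat.st_size}-{stat.st_mtime_ns}"


def cache_is_fresh(source_path: str, fingerprint_path: str) -> bool:
    """True if a stored fingerprint exists and still matches the source file."""
    if not os.path.exists(fingerprint_path):
        return False
    with open(fingerprint_path) as fh:
        return fh.read() == source_fingerprint(source_path)


def refresh_cache(source_path: str, fingerprint_path: str) -> None:
    """Rebuild step placeholder: here it only records the new fingerprint."""
    with open(fingerprint_path, "w") as fh:
        fh.write(source_fingerprint(source_path))


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "events.parquet")  # stand-in file, not real Parquet
    marker = os.path.join(tmp, "events.fingerprint")
    with open(source, "wb") as fh:
        fh.write(b"version-1")
    print(cache_is_fresh(source, marker))  # False: nothing cached yet
    refresh_cache(source, marker)
    print(cache_is_fresh(source, marker))  # True
    with open(source, "ab") as fh:
        fh.write(b"-changed")              # size changes, so the fingerprint changes
    print(cache_is_fresh(source, marker))  # False: cache must be rebuilt
```

A content hash would be more robust than size and modification time, at the cost of reading the whole file on every check.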

Optimized training behavior
The warm_start_steps parameter has been removed following internal improvements to the training loop.

Accelerated training
Multiple internal optimizations make training 2–3× faster on benchmark datasets. They include:
- Accelerated data loading pipeline
- More efficient handling of time-based features for entities with long histories
- Streamlined data validation and comparison logic

Fixes:

- Resolved an issue where complex column types (e.g., lists of strings) caused errors during preprocessing. These types are now skipped with a warning.
- Fixed excessive memory usage during the prediction phase.
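The skip-with-a-warning behavior described in the first fix can be sketched with the standard `warnings` module. The function `skip_complex_columns`, the column-dictionary layout, and the set of types treated as complex are assumptions for illustration, not BaseModel's preprocessing code.

```python
import warnings
from typing import Any, Dict, List

COMPLEX_TYPES = (list, dict, tuple, set)  # assumed definition of "complex"


def skip_complex_columns(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Return only columns with scalar values, warning about the skipped ones."""
    kept = {}
    for name, values in columns.items():
        if any(isinstance(value, COMPLEX_TYPES) for value in values):
            warnings.warn(f"Skipping column '{name}': complex type is not supported",
                          UserWarning)
            continue
        kept[name] = values
    return kept


columns = {
    "price": [9.99, 14.5],
    "tags": [["sale", "new"], ["clearance"]],  # list of strings: skipped
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned = skip_complex_columns(columns)

print(list(cleaned))  # ['price']
print(len(caught))    # 1
```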