Release 0.17
July 4th, 2025
New features - Core BaseModel Repository:
- Hybrid train/test split
Users can now combine entity-based training and validation splits with a time-based test set. This mirrors production scenarios more closely by ensuring that the test set simulates future data while respecting entity isolation during training.
- Limited end date of training and validation
Introduced the training_validation_end parameter to limit the latest date included in training and validation splits when using entity-based partitioning.
- Flexible training validation interval
Introduced the check_val_every_n_steps and check_val_every_n_epochs parameters for more granular control over validation frequency during training.
- Reproducible results
A seed parameter has been added to key methods (fit_behavioral_representation, train_foundation_model, pretrain, fit, evaluate, predict, and test) to ensure consistent and reproducible outputs across runs.
- Flexible Kerberos configuration
[BREAKING CHANGE]: In the Hive connection configuration, a separate realm for Kerberos can now be defined with the kinit_realm parameter, while the realm for the connection string can be defined in the ini file.
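To make the hybrid split and the training/validation end date more concrete, here is a minimal, library-independent sketch in plain Python. It does not use or reproduce the BaseModel API: the function names `hybrid_split` and `_bucket`, the `val_percent` argument, the hash-based entity assignment, and the inclusive treatment of the cutoff date are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def _bucket(entity_id, n_buckets=100):
    """Map an entity ID to a stable bucket in [0, n_buckets)."""
    digest = hashlib.sha256(str(entity_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hybrid_split(events, test_start, training_validation_end=None, val_percent=20):
    """Split (entity_id, event_date) pairs into train, validation, and test.

    test:             every event on or after test_start (time-based)
    train/validation: earlier events, assigned per entity (entity-based)
    training_validation_end: optional cutoff (assumed inclusive); earlier
        events dated after it are left out of training and validation.
    """
    train, val, test = [], [], []
    for entity_id, event_date in events:
        if event_date >= test_start:
            test.append((entity_id, event_date))
            continue
        if training_validation_end is not None and event_date > training_validation_end:
            continue  # falls between the cutoff and the test period
        # All events of one entity share a bucket, so they land in the same split.
        target = val if _bucket(entity_id) < val_percent else train
        target.append((entity_id, event_date))
    return train, val, test


# Ten hypothetical users with one event on the first of each month, Jan-Jun 2025.
events = [(f"user_{i}", date(2025, m, 1)) for i in range(10) for m in range(1, 7)]
train, val, test = hybrid_split(
    events,
    test_start=date(2025, 6, 1),
    training_validation_end=date(2025, 4, 1),
)
```

In this sketch no entity contributes events to both training and validation, while the test set is defined purely by date, so it can contain entities from either group.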
Improvements:
- Refactored run continuation logic
The overwrite and resume parameters must now be passed directly to the fit method rather than being read from TrainingParams.
- Refined interpretability date specification
The target date for interpretation should now be provided via the prediction_date parameter instead of being inferred from split.
- Normalized feature importance in interpretability
Feature importance scores are now normalized based on each feature’s input size, enabling fairer comparisons between features of different scales.
- Improved Parquet cache behavior
  - The cache is automatically refreshed if the source Parquet file has changed; no manual deletion is required.
  - The cache_path parameter is now optional. However, disabling the cache is not recommended, as it can significantly increase runtime and slow down data processing.
- Optimized training behavior
The warm_start_steps parameter has been removed following internal improvements to the training loop.
- Accelerated training
Multiple internal optimizations have led to 2–3× faster performance on benchmark datasets, including:
  - Accelerated data loading pipeline
  - More efficient handling of time-based features for entities with long histories
  - Streamlined data validation and comparison logic
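The automatic Parquet cache refresh listed among the improvements is not described in detail here. As a rough illustration only, one common way to detect that a source file has changed is to compare a stored fingerprint (file size and modification time) with the current one. The helper names `source_fingerprint` and `cache_is_stale` are hypothetical, and this is not necessarily how BaseModel implements the check.

```python
import os
import tempfile


def source_fingerprint(path):
    """Return (size in bytes, modification time in ns) for the file at path."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def cache_is_stale(source_path, cached_fingerprint):
    """True if the source file no longer matches the fingerprint stored with the cache."""
    return cached_fingerprint != source_fingerprint(source_path)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder bytes stand in for a real Parquet file; only the metadata matters here.
    source = os.path.join(tmp, "events.parquet")
    with open(source, "wb") as f:
        f.write(b"v1")

    recorded = source_fingerprint(source)  # stored when the cache is built
    fresh_before = not cache_is_stale(source, recorded)

    with open(source, "ab") as f:  # simulate the source being updated
        f.write(b"-changed")
    stale_after = cache_is_stale(source, recorded)
```

A metadata comparison like this is cheap but can miss edits that preserve both size and timestamp; hashing the file contents is a stricter, slower alternative.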
Fixes:
- Resolved an issue where complex column types (e.g., lists of strings) caused errors during preprocessing. These types are now skipped with a warning.
- Fixed excessive memory usage during the prediction phase.
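The first fix above (skipping complex column types with a warning) can be pictured with a small, self-contained sketch. It works on plain Python dictionaries rather than on BaseModel's actual preprocessing code; the function name `drop_complex_columns` and the choice of which types count as scalar are assumptions for illustration.

```python
import warnings

# Types treated as simple in this sketch; anything else (lists, dicts, ...) is "complex".
SCALAR_TYPES = (int, float, str, bool, type(None))


def drop_complex_columns(rows):
    """Remove columns holding non-scalar values, emitting one warning per column.

    rows is a list of dicts mapping column name to value.
    Returns (cleaned_rows, skipped_column_names).
    """
    skipped = sorted(
        {
            name
            for row in rows
            for name, value in row.items()
            if not isinstance(value, SCALAR_TYPES)
        }
    )
    for name in skipped:
        warnings.warn(f"Skipping column {name!r}: unsupported complex type", stacklevel=2)
    cleaned = [{k: v for k, v in row.items() if k not in skipped} for row in rows]
    return cleaned, skipped


rows = [
    {"user_id": 1, "price": 9.5, "tags": ["new", "sale"]},
    {"user_id": 2, "price": 3.0, "tags": []},
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    clean_rows, skipped = drop_complex_columns(rows)
```

Here the list-valued `tags` column is dropped and reported, while the scalar columns pass through unchanged.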
