Release 0.17
New features - Core BaseModel Repository:

- Hybrid train/test split:
  Users can now combine entity-based training and validation splits with a time-based test set. This mirrors production scenarios more closely by ensuring that the test set simulates future data while respecting entity isolation during training.
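The mechanics of such a split can be sketched in plain Python. The record layout and the `hybrid_split` helper below are illustrative assumptions, not the library's API:

```python
from datetime import date
import random

def hybrid_split(records, test_start, val_fraction=0.2, seed=0):
    """Entity-based train/val split combined with a time-based test set.

    `records` is a list of dicts with "entity_id" and "event_date" keys
    (a hypothetical layout). Rows on or after `test_start` form the test
    set, simulating future data; the remaining entities are partitioned
    disjointly between train and validation, so no entity appears in both.
    """
    test = [r for r in records if r["event_date"] >= test_start]
    past = [r for r in records if r["event_date"] < test_start]

    # Entity-based partitioning of the pre-test-period data.
    entities = sorted({r["entity_id"] for r in past})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_val = max(1, int(len(entities) * val_fraction))
    val_entities = set(entities[:n_val])

    val = [r for r in past if r["entity_id"] in val_entities]
    train = [r for r in past if r["entity_id"] not in val_entities]
    return train, val, test
```

The key invariants are that every test row lies in the future relative to `test_start`, and that the train and validation entity sets are disjoint.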
- Limited end date of training and validation:
  Introduced the `training_validation_end` parameter to limit the latest date included in the training and validation splits when using entity-based partitioning.
- Flexible training validation interval:
  Introduced the `check_val_every_n_steps` and `check_val_every_n_epochs` parameters for more granular control over validation frequency during training.
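The scheduling idea behind step-based validation can be illustrated with a toy loop; the function and parameter names below mirror the release note but are a sketch, not the library's real training loop:

```python
def train_loop(num_steps, check_val_every_n_steps=None, validate=lambda step: None):
    """Toy training loop that runs validation every N optimizer steps.

    Illustrative only: shows how a step-based interval (as opposed to an
    epoch-based one) decides when validation fires.
    """
    validated_at = []
    for step in range(1, num_steps + 1):
        # ... one optimizer step would happen here ...
        if check_val_every_n_steps and step % check_val_every_n_steps == 0:
            validate(step)
            validated_at.append(step)
    return validated_at
```

An epoch-based interval (`check_val_every_n_epochs`) works the same way, just counting completed passes over the data instead of individual steps.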
- Reproducible results:
  A `seed` parameter has been added to key methods (`fit_behavioral_representation`, `train_foundation_model`, `pretrain`, `fit`, `evaluate`, `predict`, and `test`) to ensure consistent and reproducible outputs across runs.
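The pattern that makes such a `seed` parameter work can be shown with a minimal sketch, assuming a hypothetical stochastic prediction step; seeding a local RNG rather than the global one keeps repeated calls reproducible without side effects:

```python
import random

def predict_with_noise(xs, seed=None):
    """Hypothetical stochastic prediction step.

    A local random.Random(seed) instance makes outputs identical across
    runs for the same seed, without touching global RNG state.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, 0.01) for x in xs]
```

Calling the function twice with the same seed yields bit-identical outputs, which is what "reproducible across runs" means in practice.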
- Refactored run continuation logic:
  The `overwrite` and `resume` parameters must now be passed directly to the `fit` method rather than being read from `TrainingParams`.
- Refined interpretability date specification:
  The target date for interpretation should now be provided via the `prediction_date` parameter instead of being inferred from `split`.
- Flexible Kerberos configuration [BREAKING CHANGE]:
  In the Hive connection configuration, a separate realm for Kerberos can now be defined with the `kinit_realm` parameter, while the realm for the connection string can be defined in the `ini` file.
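A configuration along these lines shows the split; the section and key names in the `ini` fragment are illustrative assumptions, not the library's documented schema:

```ini
; Illustrative Hive connection settings (key names are assumptions).
[hive]
host = hive.example.com
port = 10000
; Realm embedded in the connection string:
realm = DATA.EXAMPLE.COM
```

The Kerberos realm used for ticket acquisition would then be supplied separately via `kinit_realm`, which matters when the ticket realm differs from the realm in the connection string.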
- Normalized feature importance in interpretability:
  Feature importance scores are now normalized by each feature's input size, enabling fairer comparisons between features of different scales.
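The normalization itself is simple; a sketch, assuming raw importance scores and per-feature input sizes are available as dicts (the data layout is hypothetical):

```python
def normalized_importance(raw_importance, input_sizes):
    """Divide each feature's raw importance by its input size.

    Illustrative of the idea: a wide feature (e.g. a long one-hot block)
    no longer dominates simply because it spans more inputs.
    """
    return {
        name: raw_importance[name] / input_sizes[name]
        for name in raw_importance
    }
```

After normalization, a 40-column one-hot feature and a single scalar feature are compared on a per-input basis.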
- Improved Parquet cache behavior:
  - The cache is automatically refreshed if the source Parquet file has changed; no manual deletion is required.
  - The `cache_path` parameter is now optional. However, disabling the cache is not recommended, as it can significantly increase runtime and slow down data processing.
- Optimized training behavior:
  The `warm_start_steps` parameter has been removed following internal improvements to the training loop.
Performance:

Multiple internal optimizations have led to 2–3× faster performance on benchmark datasets, including:
- An accelerated data loading pipeline
- More efficient handling of time-based features for entities with long histories
- Streamlined data validation and comparison logic
Fixes:
- Resolved an issue where complex column types (e.g., lists of strings) caused errors during preprocessing. These types are now skipped with a warning.
- Fixed excessive memory usage during the prediction phase.