Changelog

Release 0.17

New features - Core BaseModel Repository:

  • Hybrid train/test split
    Users can now combine entity-based training and validation splits with a time-based test set. This mirrors production scenarios more closely by ensuring that the test set simulates future data while respecting entity isolation during training.
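
As an illustration only (this is not the library's actual API), the idea behind a hybrid split can be sketched in plain Python: entities are partitioned into train/validation groups, while the test set is taken from all records after a time cutoff.

```python
from datetime import date

# Toy records: (entity_id, event_date)
records = [
    ("a", date(2023, 1, 5)), ("a", date(2023, 3, 1)),
    ("b", date(2023, 2, 1)), ("b", date(2023, 4, 2)),
    ("c", date(2023, 1, 20)), ("c", date(2023, 3, 15)),
]

test_start = date(2023, 3, 1)   # time-based test boundary
val_entities = {"c"}            # entity-based validation group

# Test set: everything on/after the cutoff, simulating future data.
test = [r for r in records if r[1] >= test_start]

# Train/validation: only pre-cutoff data, split by entity so that
# validation entities never leak into training.
pre_cutoff = [r for r in records if r[1] < test_start]
val = [r for r in pre_cutoff if r[0] in val_entities]
train = [r for r in pre_cutoff if r[0] not in val_entities]
```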

  • Limited end date of training and validation
    Introduced the training_validation_end parameter to limit the latest date included in training and validation splits when using entity-based partitioning.
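
Conceptually (the helper below is illustrative, not BaseModel code), the parameter acts as a date filter on the train/validation partitions:

```python
from datetime import date

def limit_by_end_date(rows, training_validation_end):
    """Drop rows dated after the given end date (illustrative helper)."""
    return [row for row in rows if row[1] <= training_validation_end]

train = [("a", date(2023, 1, 5)), ("a", date(2023, 6, 1))]
limited = limit_by_end_date(train, date(2023, 3, 31))
```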

  • Flexible training validation interval
    Introduced check_val_every_n_steps and check_val_every_n_epochs parameters for more granular control over validation frequency during training.
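
The effect of step- versus epoch-based validation frequency can be illustrated with a minimal training-loop sketch (the parameter names come from this release; the loop itself is an assumption):

```python
def validation_points(n_epochs, steps_per_epoch,
                      check_val_every_n_steps=None,
                      check_val_every_n_epochs=None):
    """Return the global step indices at which validation would run."""
    points = []
    global_step = 0
    for epoch in range(1, n_epochs + 1):
        for _ in range(steps_per_epoch):
            global_step += 1
            # Step-based trigger: validate every N optimization steps.
            if check_val_every_n_steps and global_step % check_val_every_n_steps == 0:
                points.append(global_step)
        # Epoch-based trigger: validate at the end of every N-th epoch.
        if check_val_every_n_epochs and epoch % check_val_every_n_epochs == 0:
            points.append(global_step)
    return points
```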

  • Reproducible results
    A seed parameter has been added to key methods (fit_behavioral_representation, train_foundation_model, pretrain, fit, evaluate, predict, and test) to ensure consistent and reproducible outputs across runs.
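
Reproducibility of this kind generally comes down to seeding every random number generator in play. A generic sketch (not BaseModel's implementation; real code would also seed NumPy, PyTorch, etc.):

```python
import random

def seed_everything(seed):
    """Seed Python's RNG (libraries typically also seed NumPy/torch here)."""
    random.seed(seed)

seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
# Same seed, same sequence of draws.
```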

  • Refactored run continuation logic
    The overwrite and resume parameters must now be passed directly to the fit method rather than being read from TrainingParams.

  • Refined interpretability date specification
    The target date for interpretation should now be provided via the prediction_date parameter instead of being inferred from split.

  • Flexible Kerberos configuration
    [BREAKING CHANGE]: In the Hive connection configuration, a separate realm for Kerberos can now be defined with the kinit_realm parameter, while the realm for the connection string can be defined in the .ini file.
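
For example (the section and key names below are illustrative; only kinit_realm is from this release), the connection-string realm stays in the .ini file:

```ini
; illustrative .ini fragment – section/key names other than "realm" are assumptions
[hive]
host = hive.example.com
port = 10000
realm = CORP.EXAMPLE.COM   ; realm used in the connection string
```

The kinit_realm parameter, passed in the connection configuration rather than the .ini file, would then carry the realm used to obtain the Kerberos ticket when it differs from the connection-string realm.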

  • Normalized feature importance in interpretability
    Feature importance scores are now normalized based on each feature’s input size, enabling fairer comparisons between features of different scales.
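
The idea can be sketched as dividing each raw importance score by the feature's input size (the exact normalization BaseModel applies may differ):

```python
def normalize_importance(raw_scores, input_sizes):
    """Divide each feature's raw importance by its input size."""
    return {name: raw_scores[name] / input_sizes[name] for name in raw_scores}

# A 512-dimensional feature no longer dominates a scalar feature
# merely because it feeds more inputs into the model.
scores = normalize_importance(
    {"embedding_512d": 10.0, "scalar_age": 0.5},
    {"embedding_512d": 512, "scalar_age": 1},
)
```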

  • Improved Parquet cache behavior

    • Cache is automatically refreshed if the source Parquet file has changed; no manual deletion is required.
    • The cache_path parameter is now optional. However, disabling the cache is not recommended, as it can significantly increase runtime and slow down data processing.
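
The automatic refresh described above amounts to comparing file modification times; a minimal, self-contained sketch of the pattern (not BaseModel's implementation):

```python
import os
import tempfile
import time

def load_with_cache(source_path, cache_path, build_cache):
    """Rebuild the cache whenever the source file is newer than it."""
    if (not os.path.exists(cache_path)
            or os.path.getmtime(source_path) > os.path.getmtime(cache_path)):
        build_cache(source_path, cache_path)
    return cache_path

# Demo: the builder runs once, then again only after the source changes.
builds = []
def build(src, dst):
    builds.append(src)
    with open(dst, "w") as f:
        f.write("cached")

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "data.parquet")
cache = os.path.join(tmp, "data.cache")
with open(src, "w") as f:
    f.write("rows")

load_with_cache(src, cache, build)   # cache missing -> build
load_with_cache(src, cache, build)   # cache fresh -> no rebuild
os.utime(src, (time.time() + 60, time.time() + 60))  # source "changed"
load_with_cache(src, cache, build)   # stale cache -> rebuild
```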
  • Optimized training behavior
    The warm_start_steps parameter has been removed following internal improvements to the training loop.

Performance

Multiple internal optimizations have led to 2–3× faster performance on benchmark datasets, including:

  • Accelerated data loading pipeline
  • More efficient handling of time-based features for entities with long histories
  • Streamlined data validation and comparison logic

Fixes

  • Resolved an issue where complex column types (e.g., lists of strings) caused errors during preprocessing. These types are now skipped with a warning.
  • Fixed excessive memory usage during the prediction phase.
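
The skip-with-warning behavior for complex column types can be illustrated as follows (the schema, column names, and detection rule are made up for the example):

```python
import warnings

def select_supported_columns(schema):
    """Keep scalar columns; skip complex types (e.g. lists) with a warning."""
    supported = {}
    for name, dtype in schema.items():
        if dtype.startswith("list<"):
            warnings.warn(f"Skipping column '{name}' with unsupported type {dtype}")
            continue
        supported[name] = dtype
    return supported

schema = {"user_id": "string", "tags": "list<string>", "amount": "double"}
kept = select_supported_columns(schema)
```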