Release 0.11.0

New features:

  • Improved handling of time series (BETA)
    Users can now enable improved handling of time series by declaring selected numeric columns as time series. This provides a better representation of event sequences and intervals.

  • Automated sanitization and qualification of column names in where_condition
    The resolve() function can now be used in where_condition to sanitize and qualify column names automatically, improving consistency and reducing the risk of errors.

  • Optimized memory utilization for datasources in parquet file format
    More stable handling of parquet files used as data sources, including filtering data at an early stage and reading parquet files in chunks to reduce peak memory usage.

  • Enhanced history / future splitting
    An additional sampling strategy ("existing") supports more modeling scenarios, such as basket context for next-purchase prediction. Regular timestamps are now used for split points instead of day timestamps.

  • Enhanced interpretability of time-based features
    Provides deeper insight into the impact of time-based features by separating out periodic counts, sums, and means.

  • Event aggregations without grouping
    Users can now perform aggregation operations such as sum(), count(), mean(), min() and max() in the target function without needing to group events.

  • Capping number of CPU resources at fit stage
    Users can now limit the utilization of computation resources during the fit stage with the num_cpus argument.
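
As an illustration of the split point change noted under "Enhanced history / future splitting", the sketch below compares a day-level split with a full-timestamp split. It uses only the Python standard library and is not the library's implementation: the split_events helper, the sample events, the midnight truncation, and the boundary rule (history strictly before the split point) are all assumptions made for this example.

```python
from datetime import datetime


def split_events(events, split_point):
    """Hypothetical helper: history is strictly before split_point, future is the rest."""
    history = [e for e in events if e[0] < split_point]
    future = [e for e in events if e[0] >= split_point]
    return history, future


# Illustrative (timestamp, event type) pairs; not taken from any real data source.
events = [
    (datetime(2024, 3, 1, 9, 0), "view"),
    (datetime(2024, 3, 2, 8, 30), "add_to_basket"),
    (datetime(2024, 3, 2, 14, 0), "purchase"),
]

split_point = datetime(2024, 3, 2, 12, 0)

# Day-level split: the split point is truncated to midnight (an assumption),
# so every event on the split day falls into the future part.
day_split = split_point.replace(hour=0, minute=0, second=0, microsecond=0)
history_day, future_day = split_events(events, day_split)

# Full-timestamp split: events earlier on the same day stay in history.
history_ts, future_ts = split_events(events, split_point)

print([name for _, name in history_day], [name for _, name in future_day])
print([name for _, name in history_ts], [name for _, name in future_ts])
```

With a full-timestamp split, the basket event from earlier on the split day remains available as history for the purchase that follows it, which is the kind of basket context the scenario above relies on.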

Fixes:

  • Fixed an issue where certain custom metrics were not automatically cast to the appropriate data type.

  • Fixed an issue where certain features were not saved after a pretraining failure.

  • Fixed an issue where the most frequently interacting entities could be partially ignored when calculating validation metrics in recommendation tasks.

  • Fixed an issue where column names repeated across joined data sources could result in a conflict.

  • Fixed an issue where the percentage of NaN values was incorrectly calculated for columns containing both NaN values and empty strings.

  • Fixed an issue where the CPU cap set with the num_cpus argument was ignored.

  • Fixed an issue where a .csv suffix was expected instead of .tsv for the predictions file.

  • Fixed an issue where a file lock set during event grouping resulted in a FileExistsError on slow storage.

  • Fixed an issue where interpret() resulted in an error for data sources with shared entities.

  • Fixed an issue where interpret() resulted in an error when quantiles were empty for groups with no events.

  • Updated packages to improve performance, security, and compatibility with the latest features.