Improved handling of time series (BETA)
Users can now enable improved handling of time series by declaring selected numeric columns as time-series. This feature provides superior representation of event sequences and intervals.
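The exact declaration syntax depends on your configuration format; the sketch below is purely illustrative (the data_source dict, column names, and the time_series flag are hypothetical, not the actual API):

```python
# Hypothetical data source definition -- all names here are illustrative only.
data_source = {
    "name": "sensor_readings",
    "columns": {
        "device_id": {"type": "entity"},
        "temperature": {"type": "numeric", "time_series": True},  # opts in to time-series handling
        "humidity": {"type": "numeric", "time_series": True},
        "batch_no": {"type": "numeric"},  # stays a plain numeric column
    },
}

# Collect the columns that opted in to the improved handling.
ts_columns = [name for name, spec in data_source["columns"].items()
              if spec.get("time_series")]
```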
Automated sanitization and qualification of column names in where_condition
The resolve() function can now be used in where_condition to improve consistency and reduce the risk of errors.
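resolve() is the product's own helper, so the body below is only an illustrative stand-in showing the kind of sanitization and qualification it performs (the quoting scheme and data-source prefixing are assumptions):

```python
# Illustrative stand-in for resolve(): double any embedded quotes, wrap the
# column name in double quotes (ANSI SQL style), and qualify it with its
# data source so reserved words or spaces cannot break the generated query.
def resolve(column: str, data_source: str) -> str:
    safe = column.replace('"', '""')
    return f'"{data_source}"."{safe}"'

# A where_condition built with the sanitized, qualified name.
where_condition = f"{resolve('order total', 'sales')} > 100"
```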
Optimized memory utilization for data sources in parquet file format
More stable handling of parquet files used as data sources, including filtering data at an early stage and reading parquet files in chunks to reduce peak memory usage.
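The parquet reader itself is internal; the two ideas (chunked reads and early filtering) can be sketched generically in plain Python with made-up rows:

```python
# Stream rows in fixed-size chunks and filter each chunk as soon as it is
# read, so only matching rows are retained and peak memory stays bounded
# by the chunk size rather than the file size.
def filtered_chunks(rows, predicate, chunk_size=2):
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield [r for r in chunk if predicate(r)]
            chunk = []
    if chunk:  # flush the final, possibly partial chunk
        yield [r for r in chunk if predicate(r)]

rows = [{"id": i, "value": i * 10} for i in range(5)]
kept = [r for chunk in filtered_chunks(rows, lambda r: r["value"] >= 20)
        for r in chunk]
```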
Enhanced history / future splitting
Additional sampling strategy ("existing") supports more modeling scenarios, such as basket context for next-purchase prediction. Regular timestamps are now used for split points instead of day timestamps.
Enhanced interpretability of time-based features
Provides deeper insight into the impact of time-based features by separating out periodic counts, sums, and means.
Event aggregations without grouping
Users can now perform aggregation operations such as sum(), count(), mean(), min(), and max() in the target function without needing to group events.
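As a hypothetical sketch (the real target function signature is product-specific), aggregating a flat event list with no grouping step might look like:

```python
# Hypothetical target function -- aggregations are applied directly to the
# event stream, with no intermediate group-by step.
def target(events):
    amounts = [e["amount"] for e in events]
    return {
        "sum": sum(amounts),
        "count": len(amounts),
        "mean": sum(amounts) / len(amounts) if amounts else 0.0,
        "min": min(amounts, default=0),
        "max": max(amounts, default=0),
    }

events = [{"amount": 10}, {"amount": 30}, {"amount": 20}]
result = target(events)
```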
Capping number of CPU resources at fit stage
Users can now limit the utilization of computation resources during the fit stage with the num_cpus argument.
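The fit stage's scheduler is internal; the capping idea can be illustrated with a standard-library thread pool, where the effective worker count is the smaller of the requested num_cpus and what the machine provides:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Cap the worker count at the requested num_cpus, but never exceed the
# machine's CPU count and never go below one worker.
def capped_workers(num_cpus: int) -> int:
    return max(1, min(num_cpus, os.cpu_count() or 1))

def run_capped(tasks, num_cpus=2):
    with ThreadPoolExecutor(max_workers=capped_workers(num_cpus)) as pool:
        return list(pool.map(lambda x: x * x, tasks))

results = run_capped([1, 2, 3], num_cpus=2)
```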
Fixes
Fixed an issue where certain custom metrics were not automatically cast to the appropriate data type.
Fixed an issue where certain features were not saved after a pretraining failure.
Fixed an issue where the most frequently interacting entities could be partially ignored when calculating validation metrics in recommendation tasks.
Fixed an issue where repeating column names across joined data sources might result in conflict.
Fixed an issue where the percentage of NaN values was incorrectly calculated for columns containing both NaN values and empty strings.
Fixed an issue where the CPU cap set with the num_cpus argument was ignored.
Fixed an issue where a .csv suffix was expected instead of .tsv for the predictions file.
Fixed an issue where a file lock set during event grouping resulted in a FileExistsError in case of slow storage.
Fixed an issue where interpret() resulted in an error for data sources with shared entities.
Fixed an issue where interpret() resulted in an error in case of empty quantiles for groups with no events.
Updated packages to improve performance, security, and compatibility with the latest features.
Grouped Decimal Features in Interpretability
Introduced the ability to handle and analyze grouped decimal features, enhancing model interpretability.
Event Attributions to interpret recommendation models
Users can now trace back and understand how specific events influence model outputs and predictions.
Prediction Storage in Snowflake Database
Added functionality to save predictions directly into a Snowflake database.
Data Source Name in Minimum Group Size Logs
Added logging of the data source name when enforcing minimum group size requirements.
Join Functionality for Attribute Data Sources (enhanced)
Expanded support to allow joining attribute data sources with multiple data sources.
Filtering on Extra Columns in Data Source Definition
Users can now filter, group, and leverage extra columns passed in the data source definition.
New Parameter in DataParams: training_end_date
Introduced the training_end_date parameter, providing more flexibility and control over model training timelines.
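As an illustration of the semantics (the parameter itself lives in DataParams; the event records below are made up), only events on or before training_end_date fall inside the training window:

```python
from datetime import date

# Hypothetical cut-off: events after this date are excluded from training.
training_end_date = date(2023, 6, 30)

events = [
    {"id": 1, "ts": date(2023, 5, 1)},
    {"id": 2, "ts": date(2023, 7, 15)},  # after the cut-off, excluded
    {"id": 3, "ts": date(2023, 6, 30)},  # on the cut-off, included
]

training_events = [e for e in events if e["ts"] <= training_end_date]
```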
New Parameters in TestingParams: local_save_location, remote_save_location
Introduced local_save_location and remote_save_location as parameters within TestingParams.
Note: Please adapt your configuration file to reflect this syntax change.
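A minimal sketch of the new parameters, using a hypothetical stand-in dataclass (the field names match the release note, but the class definition and paths are illustrative, not the library's own):

```python
from dataclasses import dataclass

# Illustrative stand-in for TestingParams -- only the two new fields shown.
@dataclass
class TestingParams:
    local_save_location: str
    remote_save_location: str

params = TestingParams(
    local_save_location="/tmp/predictions",       # written to local disk
    remote_save_location="s3://bucket/predictions",  # mirrored to remote storage
)
```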
Extended Group Max Retries
Default values for group computation retries and the retry interval have been increased: GROUPS_N_RETRIES now defaults to 20 and GROUPS_RETRY_INTERVAL to 60. This reduces the likelihood of failures due to transient issues and improves overall robustness. For more information, refer to the Dividing event tables section.
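The retry semantics can be sketched generically; the defaults below mirror the new GROUPS_N_RETRIES=20 and GROUPS_RETRY_INTERVAL=60 values, while the helper and the demo function are illustrative:

```python
import time

# Call fn, retrying up to n_retries times with retry_interval seconds
# between attempts; re-raise the last error if every attempt fails.
def with_retries(fn, n_retries=20, retry_interval=60):
    last_exc = None
    for attempt in range(n_retries):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt < n_retries - 1:
                time.sleep(retry_interval)
    raise last_exc

# Demo: a call that fails twice with a transient error, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient storage error")
    return "group computed"

result = with_retries(flaky, n_retries=5, retry_interval=0)  # zero interval for the demo
```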
Entity Number Limit for Target Function Validation
The number of entities that can be used when validating target functions is now capped to ensure efficiency and prevent overload during the validation process.
Enhanced Debug Messages for Target Function Validation
More comprehensive debug messages have been added during target function validation to assist in troubleshooting and increase transparency in the validation process.
Fixes
Fixed issues with None values in grouping.
Fixed regression loss calculation and logging.
Fixed errors in pandas query parsing.
Improved Neptune alerter logging.
Removed unused validations and loss functions.
Optimized memory usage in interpretability.
Fixed handling of missing metrics in Neptune.
Reduced memory consumption.
Improved directory creation based on cache path.
Enhanced schema selection in Hive builder.
Handled potential NaN values in decimal calculations.
Docs
Updated the documentation navigation to be more readable and user-friendly.
Added Recipes section for easy reference when building target functions.