Improved handling of time series (BETA)
Users can now enable improved handling of time series by declaring selected numeric columns as time-series. This feature provides superior representation of event sequences and intervals.
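The exact declaration syntax depends on your configuration format; the sketch below is purely illustrative (the data_source dict, column names, and the time_series flag are hypothetical, not the actual API):

```python
# Hypothetical data source definition -- all names here are illustrative only.
data_source = {
    "name": "sensor_readings",
    "columns": {
        "device_id": {"type": "entity"},
        "temperature": {"type": "numeric", "time_series": True},  # opts in to time-series handling
        "humidity": {"type": "numeric", "time_series": True},
        "batch_no": {"type": "numeric"},  # stays a plain numeric column
    },
}

# Collect the columns that opted in to the improved handling.
ts_columns = [name for name, spec in data_source["columns"].items()
              if spec.get("time_series")]
```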
Automated sanitization and qualification of column names in where_condition
The resolve() function can now be used in where_condition to improve consistency and reduce the risk of errors.
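resolve() is the product's own helper, so the body below is only an illustrative stand-in showing the kind of sanitization and qualification it performs (the quoting scheme and data-source prefixing are assumptions):

```python
# Illustrative stand-in for resolve(): double any embedded quotes, wrap the
# column name in double quotes (ANSI SQL style), and qualify it with its
# data source so reserved words or spaces cannot break the generated query.
def resolve(column: str, data_source: str) -> str:
    safe = column.replace('"', '""')
    return f'"{data_source}"."{safe}"'

# A where_condition built with the sanitized, qualified name.
where_condition = f"{resolve('order total', 'sales')} > 100"
```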
Optimized memory utilization for data sources in parquet file format
More stable handling of parquet files used as data sources, including filtering data at an early stage and reading parquet files in chunks to reduce peak memory usage.
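The parquet reader itself is internal; the two ideas (chunked reads and early filtering) can be sketched generically in plain Python with made-up rows:

```python
# Stream rows in fixed-size chunks and filter each chunk as soon as it is
# read, so only matching rows are retained and peak memory stays bounded
# by the chunk size rather than the file size.
def filtered_chunks(rows, predicate, chunk_size=2):
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield [r for r in chunk if predicate(r)]
            chunk = []
    if chunk:  # flush the final, possibly partial chunk
        yield [r for r in chunk if predicate(r)]

rows = [{"id": i, "value": i * 10} for i in range(5)]
kept = [r for chunk in filtered_chunks(rows, lambda r: r["value"] >= 20)
        for r in chunk]
```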
Enhanced history / future splitting
Additional sampling strategy ("existing") supports more modeling scenarios, such as basket context for next-purchase prediction. Regular timestamps are now used for split points instead of day timestamps.
Enhanced interpretability of time-based features
Provides deeper insight into the impact of time-based features by separating out periodic counts, sums, and means.
Event aggregations without grouping
Users can now perform aggregation operations such as sum(), count(), mean(), min(), and max() in the target function without needing to group events.
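As a hypothetical sketch (the real target function signature is product-specific), aggregating a flat event list with no grouping step might look like:

```python
# Hypothetical target function -- aggregations are applied directly to the
# event stream, with no intermediate group-by step.
def target(events):
    amounts = [e["amount"] for e in events]
    return {
        "sum": sum(amounts),
        "count": len(amounts),
        "mean": sum(amounts) / len(amounts) if amounts else 0.0,
        "min": min(amounts, default=0),
        "max": max(amounts, default=0),
    }

events = [{"amount": 10}, {"amount": 30}, {"amount": 20}]
result = target(events)
```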
Capping number of CPU resources at fit stage
Users can now limit the utilization of computation resources during the fit stage with the num_cpus argument.
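The fit stage's scheduler is internal; the capping idea can be illustrated with a standard-library thread pool, where the effective worker count is the smaller of the requested num_cpus and what the machine provides:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Cap the worker count at the requested num_cpus, but never exceed the
# machine's CPU count and never go below one worker.
def capped_workers(num_cpus: int) -> int:
    return max(1, min(num_cpus, os.cpu_count() or 1))

def run_capped(tasks, num_cpus=2):
    with ThreadPoolExecutor(max_workers=capped_workers(num_cpus)) as pool:
        return list(pool.map(lambda x: x * x, tasks))

results = run_capped([1, 2, 3], num_cpus=2)
```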
Fixes
Fixed an issue where certain custom metrics were not automatically cast to the appropriate data type.
Fixed an issue where certain features were not saved after a pretraining failure.
Fixed an issue where the most frequently interacting entities could be partially ignored when calculating validation metrics in recommendation tasks.
Fixed an issue where repeating column names across joined data sources might result in conflict.
Fixed an issue where the percentage of NaN values was incorrectly calculated for columns containing both NaN values and empty strings.
Fixed an issue where the CPU cap set with the num_cpus argument was ignored.
Fixed an issue where a .csv suffix was expected instead of .tsv for the predictions file.
Fixed an issue where a file lock set during event grouping resulted in a FileExistsError in case of slow storage.
Fixed an issue where interpret() resulted in an error for data sources with shared entities.
Fixed an issue where interpret() resulted in an error in case of empty quantiles for groups with no events.
Updated packages to improve performance, security, and compatibility with the latest features.
Grouped Decimal Features in Interpretability
Introduced the ability to handle and analyze grouped decimal features, enhancing model interpretability.
Event Attributions to interpret recommendation models
Users can now trace back and understand how specific events influence model outputs and predictions.
Prediction Storage in Snowflake Database
Added functionality to save predictions directly into a Snowflake database.
Data Source Name in Minimum Group Size Logs
Added logging of the data source name when enforcing minimum group size requirements.
Join Functionality for Attribute Data Sources (enhanced)
Expanded support to allow joining attribute data sources with multiple data sources.
Filtering on Extra Columns in Data Source Definition
Users can now filter, group, and leverage extra columns passed in the data source definition.
New Parameter in DataParams: training_end_date
Introduced the training_end_date parameter, providing more flexibility and control over model training timelines.
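As an illustration of the semantics (the parameter itself lives in DataParams; the event records below are made up), only events on or before training_end_date fall inside the training window:

```python
from datetime import date

# Hypothetical cut-off: events after this date are excluded from training.
training_end_date = date(2023, 6, 30)

events = [
    {"id": 1, "ts": date(2023, 5, 1)},
    {"id": 2, "ts": date(2023, 7, 15)},  # after the cut-off, excluded
    {"id": 3, "ts": date(2023, 6, 30)},  # on the cut-off, included
]

training_events = [e for e in events if e["ts"] <= training_end_date]
```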
New Parameters in TestingParams: local_save_location, remote_save_location
Introduced local_save_location and remote_save_location as parameters within TestingParams.
Note: Please adapt your configuration file to reflect this syntax change.
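A minimal sketch of the new parameters, using a hypothetical stand-in dataclass (the field names match the release note, but the class definition and paths are illustrative, not the library's own):

```python
from dataclasses import dataclass

# Illustrative stand-in for TestingParams -- only the two new fields shown.
@dataclass
class TestingParams:
    local_save_location: str
    remote_save_location: str

params = TestingParams(
    local_save_location="/tmp/predictions",       # written to local disk
    remote_save_location="s3://bucket/predictions",  # mirrored to remote storage
)
```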
Extended Group Max Retries
Default values for group computation retries and the retry interval have been increased: GROUPS_N_RETRIES now defaults to 20 and GROUPS_RETRY_INTERVAL to 60. This reduces the likelihood of failures due to transient issues and improves overall robustness. For more information, refer to the Dividing event tables section.
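The retry semantics can be sketched generically; the defaults below mirror the new GROUPS_N_RETRIES=20 and GROUPS_RETRY_INTERVAL=60 values, while the helper and the demo function are illustrative:

```python
import time

# Call fn, retrying up to n_retries times with retry_interval seconds
# between attempts; re-raise the last error if every attempt fails.
def with_retries(fn, n_retries=20, retry_interval=60):
    last_exc = None
    for attempt in range(n_retries):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt < n_retries - 1:
                time.sleep(retry_interval)
    raise last_exc

# Demo: a call that fails twice with a transient error, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient storage error")
    return "group computed"

result = with_retries(flaky, n_retries=5, retry_interval=0)  # zero interval for the demo
```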
Entity Number Limit for Target Function Validation
The number of entities that can be used when validating target functions is now capped to ensure efficiency and prevent overload during the validation process.
Enhanced Debug Messages for Target Function Validation
More comprehensive debug messages have been added during target function validation to assist in troubleshooting and increase transparency in the validation process.
Fixes
Fixed issues with None values in grouping.
Fixed regression loss calculation and logging.
Fixed errors in pandas query parsing.
Improved Neptune alerter logging.
Removed unused validations and loss functions.
Optimized memory usage in interpretability.
Fixed handling of missing metrics in Neptune.
Reduced memory consumption.
Improved directory creation based on cache path.
Enhanced schema selection in Hive builder.
Handled potential NaN values in decimal calculations.
Docs
Updated the documentation navigation to be more readable and user-friendly.
Added Recipes section for easy reference when building target functions.