Change Log


Release 1.70

Unreleased — targets monad 1.7.0

This release adds offline batch prediction from the command line and Databricks as a prediction output, gives finer control over distributed-training timeouts and query chunking, makes seeding reproducible down to modality dropout, and improves GradientSHAP attribution quality — alongside reliability fixes for checkpoint resume and progress reporting.

New Features

  • Offline batch prediction from the command line
    A new python -m monad.run --predict stage scores new data with an already-trained checkpoint, without custom code. It takes a --checkpoint-path and a --testing-params-path YAML (prediction_date, output_type, entity_ids, and a local and/or remote save location). It runs on a single GPU, and local output is written as TSV. See Inference and Training Execution.

  • Write predictions to Databricks
    remote_save_location now supports Databricks in addition to Snowflake. The target table is created on demand and rows are appended in batches, with the batch size tunable through the DATABRICKS_WRITE_BATCH_SIZE environment variable (default 1000). See Writing Predictions.

  • Configurable distributed-training timeouts
    Two new TrainingParams fields help long multi-GPU runs avoid spurious timeouts: nccl_timeout extends the timeout for NCCL collective operations, and rank_sync_timeout adds a dedicated per-step barrier that absorbs data-loading skew between ranks independently of the gradient synchronization. Both default to off and are ignored on a single device. See Distributed Training.
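
The batched append used for Databricks output can be pictured with a small sketch. Only the DATABRICKS_WRITE_BATCH_SIZE variable and its default of 1000 come from the notes above; the helper names are hypothetical and this is not monad's internal writer.

```python
import os
from typing import Iterable, Iterator, List, Mapping, Optional


def resolve_batch_size(env: Optional[Mapping[str, str]] = None, default: int = 1000) -> int:
    """Read DATABRICKS_WRITE_BATCH_SIZE, falling back to the documented default."""
    env = os.environ if env is None else env
    raw = env.get("DATABRICKS_WRITE_BATCH_SIZE")
    if raw is None or not raw.strip():
        return default
    size = int(raw)
    if size <= 0:
        raise ValueError("DATABRICKS_WRITE_BATCH_SIZE must be a positive integer")
    return size


def iter_batches(rows: Iterable[tuple], batch_size: int) -> Iterator[List[tuple]]:
    """Yield consecutive lists of at most batch_size rows (hypothetical helper)."""
    batch: List[tuple] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch
```

Each yielded batch would correspond to one append to the target table. Smaller batches mean more round trips, while larger ones mean bigger individual statements.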

Improvements

  • Separate query-chunking controls for fit and data loading
    The single num_query_chunks setting is replaced by two: cleora_num_query_chunks for the fit (embedding) phase and data_loading_num_query_chunks for the train, validation, test and predict queries, so database memory pressure can be tuned for each phase independently. This is a breaking change — configurations that set num_query_chunks must be migrated to the new fields. Note that mid-epoch resume is not supported when data_loading_num_query_chunks is greater than 1. See query_optimization.

  • Reproducible modality dropout
    The seed parameter now also governs sketch-dropping randomness inside modalities, which was previously fixed internally. Seeded runs are now reproducible end to end.

  • More meaningful GradientSHAP attributions
    GradientSHAP now draws its baseline from a real background distribution sampled across the predict set, instead of an all-zero baseline. Attributions are more representative of the data — and, as a consequence, are now stochastic and seed-dependent, so they will differ from earlier releases. See GradientShapInterpreter.

  • Resilient caching
    Cache-write failures are now retried (up to three attempts) and then fall back to streaming data directly from the database, instead of failing the run.

  • Smaller checkpoints and faster mid-epoch resume
    Buffer state is serialized in a compact form, making checkpoints smaller and resuming training mid-epoch faster and lighter on memory.

  • More accurate Cleora resource estimation
    Per-column basket statistics are now computed in a single SQL pass, improving the accuracy of memory and worker sizing estimates.
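
For the num_query_chunks migration, a minimal sketch of a one-off config rewrite might look like this. The helper is hypothetical and works on an already-parsed config (nested dicts and lists). It assumes the old value is a reasonable starting point for both new fields, which may not hold for every workload.

```python
from typing import Any


def migrate_query_chunks(config: Any) -> Any:
    """Return a copy of config with num_query_chunks replaced by the two new fields.

    Assumption: the old value is copied to both phases as a starting point.
    New fields that are already set explicitly are never overwritten.
    """
    if isinstance(config, list):
        return [migrate_query_chunks(item) for item in config]
    if not isinstance(config, dict):
        return config
    migrated = {key: migrate_query_chunks(value) for key, value in config.items()}
    if "num_query_chunks" in migrated:
        old_value = migrated.pop("num_query_chunks")
        migrated.setdefault("cleora_num_query_chunks", old_value)
        migrated.setdefault("data_loading_num_query_chunks", old_value)
    return migrated
```

Because the helper searches every nesting level, it does not depend on where the setting lives in a particular config. After migrating, keep in mind that mid-epoch resume is not supported when data_loading_num_query_chunks is greater than 1.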

Fixes

  • Checkpoint resume with time-series features
    Fixed a failure when loading or resuming from a checkpoint for models that use time-series features.

  • Progress bar respects the entity_ids filter
    When a run is restricted with entity_ids, the progress bar's total entity count now reflects the filtered subset instead of the full entity set.


Release 1.50

May 2026

This release expands interpretability with a second attribution method and a client-ready visual report, adds composable pattern-matching utilities for target function authoring, makes the Cleora embedding dimension configurable, and brings reliability and performance improvements across Databricks, Cleora, fasttext, and DuckDB.

New Features

  • GradientSHAP attribution method
    interpret() now supports a second attribution method, GradientSHAP, selected with method="gradient_shap". It can be faster than the default Integrated Gradients on many workloads and produces denser per-feature attributions, while Integrated Gradients remains the default for deterministic and audit-friendly workflows. See Attribution Methods for the full comparison and a switch example.

  • SHAP-library-style report for client hand-off
    A new save_shap_plots=True flag on interpret() renders a parallel shap/ directory containing beeswarm, global bar, heatmap, top-N waterfall, and interactive force plots, plus a single static shap_report.html index that links everything together. The report is designed for client hand-off and requires the optional shap extra (poetry install -E interpretability). See the new SHAP Report guide.

  • Direct interpreter access for scripted attribution
    Four new task-specific interpreter classes — GradientShapInterpreter, ClassificationGradientShapInterpreter, RecommendationGradientShapInterpreter, and RegressionGradientShapInterpreter — are exported from monad.interpretability for users who batch attribution outside the standard interpret() pipeline or want to tune knobs like n_samples, stdevs, or seed.

  • Standalone SHAP report helpers
    New attributions_to_shap_explanation() and save_shap_report() helpers are exposed at the package root, so the SHAP-style report can be rendered from already-computed attributions without re-running interpret(). This is useful when integrating reports into custom batched pipelines.

  • Pattern utilities for target function authoring
    The Pattern API for matching events in target functions gains six new composable utilities — first and last properties, not_followed_by, to_pattern, elapsed_time, and count_within — reducing repetitive timestamp-handling code for negation patterns, occurrence selection, time-to-event, and frequency counting.

  • Configurable Cleora embedding dimension
    A new cleora_dim field controls the output dimension of Cleora graph embeddings, letting you steer model input size and memory footprint per dataset instead of relying on the previously hard-coded default.

  • Suppress selected foundation model features during fine-tuning
    Selected foundation model modalities can now be ignored when fine-tuning a downstream model. This is useful when adapting a pretrained foundation model to a task that does not benefit from the full FM feature set.

  • Pre-fit uniqueness check on the main entity column
    BaseModel now verifies that the main entity attribute column contains unique values before training begins, catching schema mistakes early instead of mid-run.

  • Skip already-computed features on resume
    Resumed fits no longer re-run data sample analysis for features that were already computed in a previous run, shortening warm-restart cycles on large datasets.
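
The idea behind GradientSHAP can be illustrated without monad at all. The sketch below is a plain-Python Monte Carlo estimate in one common formulation, not the code behind interpret() or the interpreter classes. The function name and signature are hypothetical, with n_samples, stdev, and seed chosen only to mirror the knobs mentioned above.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def gradient_shap(
    grad_fn: Callable[[Vector], Vector],
    x: Sequence[float],
    background: Sequence[Sequence[float]],
    n_samples: int = 50,
    stdev: float = 0.0,
    seed: int = 0,
) -> Vector:
    """Monte Carlo GradientSHAP estimate for a single input x.

    grad_fn returns the gradient of the model output with respect to its input.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(n_samples):
        baseline = rng.choice(background)                  # random background row
        noisy = [xi + rng.gauss(0.0, stdev) for xi in x]   # optional input noise
        alpha = rng.random()                               # position on the path
        point = [b + alpha * (n - b) for b, n in zip(baseline, noisy)]
        grad = grad_fn(point)
        for i, g in enumerate(grad):
            totals[i] += g * (noisy[i] - baseline[i])
    return [t / n_samples for t in totals]
```

With stdev=0 and a linear model, the estimate reduces to each weight times the difference between the input and the average sampled baseline, which is a handy sanity check. Because baselines and path positions are sampled, results depend on the seed.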

Improvements

  • Resource estimation is now opt-in
    The resource estimation step introduced in Release 1.30 is no longer enabled by default, reducing time-to-first-run for users who do not need an estimate on every fit. It can still be enabled explicitly when needed.

  • Columnar streaming from Databricks
    Data is now streamed from Databricks in columnar format, reducing transfer overhead and improving throughput on large reads.

  • Databricks reliability and extensibility
    The Databricks integration now retries transient failures automatically, and extra parameters can be passed through to the Databricks SQL connector for advanced configuration.

  • Faster Cleora aggregation on large graphs
    Cleora query planning and aggregation have been optimized to reduce runtime on large graphs.

  • Lower fasttext memory and CPU footprint
    fasttext training now caps CPU usage and limits the volume of text it consumes per run, reducing RAM pressure and contention on machines with many cores.

  • Lower DuckDB memory footprint
    DuckDB-backed workloads now use less RAM, making large fits more feasible on smaller machines.

Fixes

  • Correct NULL handling in Cleora subqueries
    Fixed an issue where NULL values in Cleora subqueries were not handled correctly, which could produce inaccurate downstream aggregation results. Users who ran Cleora-based pipelines on datasets containing NULLs in graph inputs are advised to re-run affected fits to ensure result correctness.

  • Faster query planning via duplicate CTE elimination
    Fixed a planner issue where common table expressions could be emitted twice during query concatenation, causing the same work to be executed redundantly. Affected queries now run faster with no change in output.

  • Third-party dependency refresh
    Upstream dependencies have been updated to their latest compatible versions for improved security and stability. No breaking changes are expected.


Release 1.30

March 2026

This release introduces real-time training progress visibility, pre-training resource estimation, a quick validation mode, and richer data profiling reports — alongside inference performance improvements and expanded database support.

New Features

  • Live training progress tracking
    Training now displays a live progress bar showing entity counts, so you can monitor long-running jobs without checking logs. Progress is reported for both foundation model training and inference.

  • Resource estimation before training
    You can now estimate memory and compute requirements before launching a full training run. This helps you choose the right hardware configuration and avoid out-of-memory failures on large datasets.

  • Quick check mode
    A new fast validation mode lets you run a quick sanity check on your configuration and data pipeline before committing to a full training run. Quick check applies data limits automatically so you get rapid feedback on configuration errors or data issues.

  • Enriched data profiling reports
    The fit report now includes inferred column types, lists of skipped and special columns, and actionable recommendations. This makes it easier to validate your data setup and catch misconfigured columns before training.

  • Automatic redundancy detection for categorical columns
    BaseModel now detects redundant categorical columns (e.g. two columns that are exact mappings of each other) during the fit phase and reports them, helping you simplify your data schema.

  • Suggested config generation
    After the fit stage, BaseModel now generates a suggested_config.yaml file that applies the column report findings to your original config. Detected time-series columns are added as sql_lambdas with column_type_overrides, and redundant bijection columns are added to disallowed_columns. Once reviewed, the file can be used directly as your new config.yaml.

  • Automatic DataLoader calibration
    BaseModel can now automatically find the optimal num_workers and prefetch_factor for the DataLoader before foundation model training. When enabled, the system benchmarks multiple configurations and selects the most efficient one — improving data loading throughput without manual tuning.
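
The DataLoader calibration described above amounts to a small benchmark-driven search. The sketch below shows that general shape only. The function, its default grids, and the measure_throughput callback are hypothetical, and the search monad actually performs may differ.

```python
import itertools
from typing import Callable, Iterable, Tuple


def calibrate_loader(
    measure_throughput: Callable[[int, int], float],
    num_workers_grid: Iterable[int] = (1, 2, 4, 8),
    prefetch_grid: Iterable[int] = (2, 4),
) -> Tuple[int, int]:
    """Return the (num_workers, prefetch_factor) pair with the highest throughput.

    measure_throughput is a caller-supplied benchmark (hypothetical) that loads a
    few batches with the given settings and returns samples per second.
    """
    best_pair, best_score = None, float("-inf")
    for workers, prefetch in itertools.product(num_workers_grid, prefetch_grid):
        score = measure_throughput(workers, prefetch)
        if score > best_score:
            best_pair, best_score = (workers, prefetch), score
    if best_pair is None:
        raise ValueError("both grids must contain at least one value")
    return best_pair
```

In practice the callback would time a short warm-up pass over the real dataset, so the grid should stay small enough that calibration costs much less than the training run it precedes.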

Improvements

  • Faster and more efficient inference pipeline
    The inference pipeline has been restructured for better throughput and lower latency, with improved memory handling during data decoding.

  • Flexible entity split percentages
    Entity split ratios for training, validation, and test sets now accept decimal values (e.g. 0.7, 0.15, 0.15), giving you finer control over data partitioning.

  • Configurable validation batch size
    A new val_batch_size parameter allows you to set a separate batch size for validation, useful when validation data has different memory characteristics than training data.

  • Parquet data source entity ID support
    Parquet data sources now support entity ID filtering, reaching full feature parity with other supported database connectors.

  • Clearer error messages
    Error messages across training and inference have been consolidated and improved, providing more context and actionable guidance when something goes wrong.

  • Improved text feature handling
    Text feature processing is now more robust, with better normalization of text embeddings for more consistent model performance.

  • Horizontal scaling and model quantization for inference
    Inference deployments now support num_replicas for horizontal scaling. Inference can also load quantized models for reduced memory footprint.

  • Automatic GPU device selection
    training_params.devices now defaults to "auto", which automatically selects the least-occupied GPU and falls back to CPU if no GPUs are available. Existing configs with explicit device values are unaffected.
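
The "auto" device behaviour can be pictured as a simple selection rule. This sketch assumes that "least-occupied" means the least memory in use, which the notes do not specify, and pick_device is a hypothetical helper rather than part of monad.

```python
from typing import Sequence


def pick_device(used_memory_mib: Sequence[float]) -> str:
    """Return the device string for the GPU with the least memory in use.

    used_memory_mib[i] is the memory currently used on GPU i. An empty
    sequence means no GPU is visible, so the CPU is chosen.
    """
    if not used_memory_mib:
        return "cpu"
    best = min(range(len(used_memory_mib)), key=lambda i: used_memory_mib[i])
    return f"cuda:{best}"
```

Ties resolve to the lowest GPU index here, which keeps the choice deterministic across runs.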

Fixes

  • Dataset seed consistency
    Fixed an issue where the random seed was not incremented correctly between epochs, which could lead to less varied sampling across training runs.

  • Batch limit enforcement
    Fixed an issue where dataset element limits were not always enforced at epoch boundaries, potentially causing longer-than-expected training epochs.

  • Third-party package updates
    Updated packages to improve performance, strengthen security, and ensure compatibility with the latest features.
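
The dataset seed fix above concerns how per-epoch randomness is derived from the base seed. As a rough illustration, assuming a simple additive derivation (the actual scheme is not described in these notes), each epoch gets its own reproducible shuffle:

```python
import random
from typing import List, Sequence


def epoch_order(items: Sequence[int], base_seed: int, epoch: int) -> List[int]:
    """Shuffle items with a seed derived from the base seed and the epoch index."""
    rng = random.Random(base_seed + epoch)  # assumption: additive seed derivation
    order = list(items)
    rng.shuffle(order)
    return order
```

Re-running the same epoch with the same base seed reproduces the order, while moving to the next epoch changes it, which is the property the fix restores.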


Release 1.20

November 17, 2025

This release introduces more flexible time-window handling, improved checkpointing and resume behavior, richer metric support, and continued enhancements to image-based modeling.

New Features

  • Mid-epoch resume support when using multiple GPUs Training can now also be resumed mid-epoch when running in parallel on multiple GPUs. By default, checkpoints are created between training and validation. Users can also configure checkpointing every n steps, enabling faster recovery from interruptions without restarting an entire epoch.

  • Flexible time window selection for history and future This release replaces next_n_days and next_n_hours with interval_from, a more general time-window utility that lets users define arbitrary time intervals using precise durations. This enables selecting both historical and future periods with higher precision and clearer semantics, especially for short or irregular time horizons. Existing target functions that rely on the old names require refactoring; please refer to the documentation and recipes.

  • Ranking metrics support (MRR, NDCG) Added built-in support for common ranking metrics such as Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG), simplifying evaluation of recommendation and ranking scenarios.
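
For readers unfamiliar with the two ranking metrics, the following is a minimal reference sketch using their standard textbook definitions with binary relevance. It is independent of monad's built-in implementations, whose exact options (cut-offs, graded relevance) are not covered here.

```python
import math
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]) -> float:
    """Average reciprocal rank over all queries (or entities)."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k with a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

MRR only rewards the position of the first hit, whereas NDCG credits every relevant item and discounts it by rank, so the two can disagree when several items are relevant.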

Improvements

  • NaN-robust aggregations by default Aggregation functions (sum, mean, min, max) now ignore NaN values by default, leading to more stable and predictable results when working with incomplete or noisy data.

  • Flexible expressions across grouping and filtering Grouping and filtering operations, as well as aggregation methods, now accept Python callables everywhere a column name was previously required. This allows users to compute values dynamically — such as trans['price'] * trans['quantity'] — and use them directly for grouping, filtering, or aggregation.

  • Image features in shared entities Shared entities now support image embeddings, allowing visual features to be unified across multiple data sources just like text or categorical attributes.

  • More precise training window validation The has_incomplete_training_window function now supports finer-grained time units, allowing checks in minutes, hours, or days. Previously used target functions require refactoring — please refer to documentation and recipes.
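
The NaN-robust aggregations and callable expressions described above can be mimicked in plain Python to show the intended semantics. This is an illustration on a list of dicts, not the target-function API. The aggregate helper is hypothetical, and returning NaN when nothing is left to aggregate is an assumption.

```python
import math
from typing import Callable, Iterable, List, Mapping


def aggregate(
    events: Iterable[Mapping[str, float]],
    value: Callable[[Mapping[str, float]], float],
    reducer: Callable[[List[float]], float],
) -> float:
    """Reduce value(event) over all events, ignoring NaN results."""
    kept = [v for v in (value(event) for event in events) if not math.isnan(v)]
    # Assumption: an aggregation with nothing left to aggregate yields NaN.
    return reducer(kept) if kept else math.nan


trans = [
    {"price": 2.0, "quantity": 3},
    {"price": math.nan, "quantity": 1},  # incomplete row, skipped
    {"price": 1.5, "quantity": 2},
]
revenue = lambda t: t["price"] * t["quantity"]  # computed on the fly
total = aggregate(trans, revenue, sum)  # 9.0
```

Without the NaN filter, a single incomplete row would turn the whole sum into NaN, which is the instability the new default avoids.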

Fixes

  • Databricks timestamp handling Fixed an issue where timestamps could be misinterpreted when reading data from Databricks sources.

  • Third-party package updates Updated packages to improve performance, strengthen security, and ensure compatibility with the latest features.


Release 1.00

November 3, 2025

This major release introduces image embedding support, improved data streaming efficiency, and enhanced caching performance monitoring, alongside multiple stability and documentation updates.

With the introduction of image embeddings, multiple core refactors, and the publication of the API Reference, BaseModel officially reaches version 1.00.

New Features

  • Image embedding support Users can now add images, and BaseModel automatically generates image embeddings that integrate seamlessly with behavioral, text, and tabular data for complete multimodal modeling.

  • Unix timestamp support Users can now use Unix timestamps directly in time-related functions for greater flexibility in data processing.

Improvements

  • Improved data streaming efficiency Reduced memory usage and increased performance for large datasets, resulting in smoother and faster data handling.

  • Revamped timezone handling Enhanced timestamp alignment across multiple data sources for consistent temporal comparisons.

  • Robust handling of missing numerical data More stable aggregation and event computations when numerical values are partially missing.

  • Optimized data transformations Improved efficiency when processing large data structures within pipelines.

  • Simplified async stream handling Streamlined background data operations for greater reliability and maintainability.

  • Improved query consistency More predictable and stable query behavior across data modules, enhancing reliability in data access.

  • Additional caching performance benchmarks Improved cache performance benchmarking across supported databases, enabling further optimization.

Fixes

  • Trainer loss logging Fixed an issue where train_loss_epoch could log as NaN during certain training configurations.

  • Time series count handling Corrected how the time-series model handles series of counts, ensuring accurate scaling and alignment.

  • Checkpoint reliability Fixed a checkpointing issue that could prevent model state from saving correctly during long training sessions.

  • Minor bugs and code maintenance Fixed various small issues and improved overall code stability.

Documentation

  • Comprehensive API Reference Users can now access a complete API Reference section describing all classes and functions available in BaseModel.

  • New guides and FAQ Added a new FAQ section and a detailed guide on Data Types & Features.


Release 0.20

October 9, 2025

This technical release focuses on stability and model robustness improvements.

New Features

  • Custom metrics support Users can now define and register custom evaluation metrics within downstream models. This includes full compatibility with torchmetrics, duplicate-name checks, and consistent integration across training, validation, and monitoring phases.

Improvements

  • Improved stability and speed of real-time inference Optimized the inference server interface and pipeline for more stable initialization, better resource utilization, and faster CI execution.

  • Improved numerical stability of training on highly multimodal data Enhanced buffer sampling and shuffling to ensure better coverage of training examples, smoother convergence, and improved overall training stability.

  • Improved regression and classification predictions Revised the random splitting strategy and applied a uniform mixture to raw scores, leading to more balanced score distributions and reduced training bias.

Fixes

  • Sketch width and depth alignment Prevented potential crashes caused by conflicting sketch dimensions when handling certain class counts.

  • Date parsing for time series Fixed an issue where date columns were not parsed or sanitized in time-series data when a date format was provided.

  • One-hot recommendation metrics for low candidate counts Prevented crashes on OneHotRecommendationTask models when the candidate pool was smaller than k.

  • Text embedding stability Fixed occasional crashes during text model training under specific edge conditions.

  • Short time-series handling Resolved a crash that occurred when the time-series length was shorter than the kernel size.

  • Insufficient training data handling Introduced a graceful exit with a clear message when the available data is insufficient to fill a full batch across all selected devices.


Release 0.19

September 1, 2025

New Features

  • Time window slicing for event data Event data sources can now be restricted to a defined start and end timestamp with the new slice_time_window function. This allows analyses and training runs to focus on specific periods without extra preprocessing.

  • Direct access to date columns Event data sources now expose a date column (timestamps), allowing time-based filtering and grouping with filter() and groupBy() functions.

Improvements

  • Simplified interpretation output The interpret function now produces cleaner results by removing non-essential fields.

  • Improved window shuffling defaults The buffer size has been changed to 100k, enhancing randomness in shuffled windows and improving generalization during training.

Fixes

  • Loss computation Refined loss calculations in downstream models.

  • Module visibility Ensured consistent access to essential modules.

  • Date handling Added support for numeric date formats and fixed timestamp edge cases.

  • Split point generation Restricted to explicitly defined data sources.

Documentation

  • Expanded sample target functions A new set of ready-to-use target functions has been added to help prototype and compare approaches more quickly.

Release 0.18

August 7, 2025

New Features

  • New recommendation task for limited item pool When fine-tuning for recommendation problems, users can now choose from two specialized classes: OneHotRecommendationTask (fixed-size vector for the total number of recommendable entities) and RecommendationTask (probabilistic sketch representation). The one-hot variant is especially beneficial when the number of recommendable entities is relatively low (e.g. <5,000).

  • Loss weighting for One-Hot Recommendation and Multilabel Classification Users can now optionally return weights in the target function to control the relative importance of individual target elements.

  • Entity filtering Users can now define which targets should be included or excluded from predictions and ground truth using predictions_to_include_fn and predictions_to_exclude_fn parameters.

  • Adjustable number of split points for random sampling strategy The maximum number of split points per observation used during a single epoch has been increased from 1 to the square root of the number of event timestamps when target_sampling_strategy="random".

  • Flexible entity split Entity split parameters such as training, validation, test, and training_validation_end can now be changed during scenario model training.

  • Sketch merging support Sketches derived from shared entity columns can now be added to other sketches, enabling hybrid representations that combine different behavioral signals.

  • Retention policy support during training Users can now use the entity_history_limit parameter to define the maximum history time range for a single observation per data source during training.

  • Extended logging in ML experiment tracking tools The number of validation batches is now logged in lifecycle management tools such as Neptune and MLflow.
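
To make the new split-point cap for the random sampling strategy concrete, here is a small sketch. Rounding the square root down, counting unique timestamps, and sampling uniformly are all assumptions; the notes only state that the maximum grows from 1 to the square root of the number of event timestamps.

```python
import math
import random
from typing import Iterable, List


def max_split_points(n_timestamps: int) -> int:
    """Upper bound on split points: floor(sqrt(n)), but never below 1."""
    return max(1, math.isqrt(n_timestamps))


def sample_split_points(timestamps: Iterable[int], seed: int = 0) -> List[int]:
    """Draw up to max_split_points distinct timestamps (hypothetical helper)."""
    unique = sorted(set(timestamps))
    k = min(len(unique), max_split_points(len(unique)))
    return sorted(random.Random(seed).sample(unique, k))
```

Under these assumptions an entity with 100 event timestamps can contribute up to 10 split points per epoch instead of a single one.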

Improvements

  • Consistent entity split Entity split into training, validation, and test is now consistent between fit and train_foundation_model phases.

Fixes

  • Timezone in date columns Fixed an issue where date columns containing time zone information caused errors during the fit phase.

  • Snowflake token authorization Resolved an issue where Snowflake token authorization was used unconditionally whenever a token was present.

  • Multi-GPU checkpoint overwrite Fixed an issue where the overwrite setting removed checkpoint files during multi-GPU training.

  • Cache creation Restored cache creation when enabled in configuration.


Release 0.17

July 4, 2025

New Features

  • Hybrid train/test split Users can now combine entity-based training and validation splits with a time-based test set, mirroring production scenarios more closely.

  • Limited end date of training and validation Introduced the training_validation_end parameter to limit the latest date included in training and validation splits.

  • Flexible training validation interval Introduced check_val_every_n_steps and check_val_every_n_epochs parameters for more granular control over validation frequency.

  • Reproducible results A seed parameter has been added to key methods (fit_behavioral_representation, train_foundation_model, pretrain, fit, evaluate, predict, and test) to ensure consistent outputs across runs.

  • Flexible Kerberos configuration A separate realm for Kerberos can now be defined with the kinit_realm parameter, while the realm for the connection string can be defined in the ini file.

Improvements

  • Refactored run continuation logic The overwrite and resume parameters must now be passed directly to the fit method rather than being read from TrainingParams.

  • Refined interpretability date specification The target date for interpretation should now be provided via the prediction_date parameter.

  • Normalized feature importance in interpretability Feature importance scores are now normalized based on each feature's input size, enabling fairer comparisons.

  • Improved Parquet cache behavior Cache is automatically refreshed if the source Parquet file has changed — no manual deletion required.

  • Accelerated training Multiple internal optimizations have led to 2–3× faster performance on benchmark datasets, including accelerated data loading, more efficient handling of time-based features, and streamlined validation logic.

Fixes

  • Complex column types Resolved an issue where complex column types (e.g. lists of strings) caused errors during preprocessing.

  • Prediction memory usage Fixed excessive memory usage during the prediction phase.


Release 0.15

May 7, 2025

New Features

  • Entity-based splits Users can now separate training, validation, and testing sets based on time range or entity ID.

  • New method for selecting entity IDs A new entity_ids parameter can be added to YAML configuration to act as a global filter during foundation model training. The same parameter is available in TrainingParams and TestingParams for scenario-level control. The previously used entities_ids_subquery parameter has been removed.

Improvements

  • Refactored feature loading pipeline Improvements to ensure stability and clarity of error messages. Key changes: use_recency_sketches and use_last_basket_sketches must now be passed directly to pretrain() or train_foundation_model; features_path can no longer be modified at scenario training stage; the data_loading_params block has been removed from the configuration file.

  • Enhanced BigQuery connector Users can now specify one project as the computation engine and a different one as the data location.

  • Standardized dependency error messages All optional dependency checks now generate standardized error messages with clear instructions.

Fixes

  • Validation set date Fixed an error where the validation set's starting date could result in an empty history.

  • Test set date handling Fixed an error where the starting date of the testing set could be treated as history.

  • Low-cardinality categorical columns Fixed an error where a column with fewer than two values that was overridden to categorical type caused preprocessing to fail.

  • Interpretability duplicated modalities Fixed an error where interpretability returned duplicated modality names.

  • Classification threshold requirement Fixed an error where classification tasks required thresholds even when output type was set to SEMANTIC.

  • Package updates Updated packages to improve performance, security, and compatibility.


Release 0.14

April 10, 2025

New Features — Core BaseModel Repository

  • Modular foundation model training The two components of the pretrain stage — data preprocessing & representation fitting, and FM training — can now be run independently via fit_behavioral_representation and train_foundation_model.

  • Flexible prediction outputs Introduced different types of prediction output, selected with the mandatory output_type parameter. Added the readout_sketch and read_target_entity_ids functions to map recommendation outputs to feature values.

  • Enhanced model training with early stopping Introduced the early_stopping parameter to prevent overfitting.

  • Expanded model interpretability Introduced the interpret_entity function to compute event-level attributions for a single main entity.

  • Automated model testing Introduced the test method to compute metrics based on predictions and ground truth.

  • Flexible BigQuery connection Added the project_id parameter to specify a project different from the one in the service account.

New Features — GUI Application (Snowflake Native)

  • Cascading run execution Enables dependent jobs/runs to trigger in sequence.

  • Run and job status tracking Added detailed status tracking for better monitoring.

  • New table designs Updated tables with improved layout and readability.

  • Validation improvements Enhanced input and data validation across the platform.

  • Multi-GPU training Enabled distributed model training across multiple GPUs.

  • Listing state restoration Automatically restores UI state when returning to listings.

Fixes

  • Distributed training entity loading Fixed an issue where all main entities were loaded on one GPU during distributed training.

  • Multiclass default metric Fixed an issue where the default multiclass metric returned an error.

  • Time-series interpretability Fixed an issue where interpretability attributions for time-series features were empty.

Dependencies

  • The dask library is no longer a dependency.

Release 0.13

March 17, 2025

New Features

  • Expanded model scalability Added support for FSDP2 to enable distributed training across multiple GPUs.

  • Flexible inference customization Introduced targets_to_include and targets_to_exclude parameters for more control over inference outputs.

  • Enhanced diagnostics and issue tracking Improved exception handling and logging during inference for better troubleshooting.

  • Advanced interpretability for time series Enabled support for interpreting time-series variables in model explanations.

Fixes

  • Time-series model resume Fixed an error that prevented models with time-series features from resuming properly.

  • Foundation model head parameter Fixed an issue where the parameter controlling the use of the foundation model head was not properly passed.

  • Package updates Updated packages to improve performance, security, and compatibility.


Release 0.12

February 13, 2025

New Features

  • Enhancements to Foundation Model training Improvements to the foundation model training process, leading to better performance in downstream applications. Key changes include simplified configuration by removing certain time-based parameters, an updated optimizer that eliminates manual learning rate scheduling, improved feature representation through automated parameter tuning, and performance gains from optimized data handling.

  • Improved Parquet file support Faster and more scalable data processing with an upgraded backend engine, enhanced memory management, and expanded support for advanced query operations.

  • Interpretability Added support for event-level interpretability in classification and regression models.

Fixes

  • Column fit resume flag Fixed an issue where the _FINISHED flag for column fit tasks was occasionally set incorrectly, resulting in an unstable resume option.

  • Training log output Fixed an issue where the log in main.log was empty, incomplete, or incorrectly written during model training.

  • Next-n-hours timestamp exclusion Fixed an issue where the next_n_hours parameter used in the target function was excluding the starting timestamp.

  • Package updates Updated packages to improve performance, security, and compatibility.


Release 0.11

February 13, 2025

New Features

  • Improved handling of time series (BETA) Users can now enable improved handling of time series by declaring selected numeric columns as time-series. This feature provides superior representation of event sequences and intervals.

  • Automated sanitization and qualification of column names in where_condition The resolve() function can now be used in where_condition to enhance consistency and reduce the risk of errors.

  • Optimized memory utilization for Parquet data sources More stable handling of parquet files, including filtering data at an early stage and reading parquet files in chunks to reduce peak memory usage.

  • Enhanced history / future splitting An additional sampling strategy (existing) supports more modeling scenarios, such as basket context for next purchase prediction. Regular timestamps are now used for split points instead of day timestamps.

  • Enhanced interpretability of time-based features Provides deeper insight into the impact of time-based features by separating out periodical counts, sums, and means.

  • Event aggregations without grouping Users can now perform aggregation operations such as sum(), count(), mean(), min(), and max() in the target function without needing to group events.

  • Capping number of CPU resources at fit stage Users can now limit the utilization of computation resources during the fit stage with the num_cpus argument.

Fixes

  • Custom metric casting Fixed an issue where certain custom metrics were not automatically cast to the appropriate data type.

  • Feature saving after pretraining failure Fixed an issue where certain features were not saved after a pretraining failure.

  • Recommendation validation metrics Fixed an issue where the most frequently interacting entities could be partially ignored when calculating validation metrics in recommendation tasks.

  • Duplicate column names across data sources Fixed an issue where repeated column names across joined data sources could result in a conflict.

  • NaN percentage calculation Fixed an issue where the percentage of NaN values was incorrectly calculated for columns containing both NaN values and empty strings.

  • CPU cap enforcement Fixed an issue where the CPU cap set with the num_cpus argument was ignored.

  • Predictions file suffix Fixed an issue where a .csv suffix was expected instead of .tsv for the predictions file.

  • File lock during event grouping Fixed an issue where a file lock set during event grouping resulted in a FileExistsError in case of slow storage.

  • Interpretability with shared entities Fixed an issue where interpret() resulted in an error for data sources with shared entities.

  • Interpretability with empty quantiles Fixed an issue where interpret() resulted in an error in case of empty quantiles for groups with no events.

  • Package updates Updated packages to improve performance, security, and compatibility.