Example: H&M Fashion Dataset on Kaggle

In this tutorial, we will guide you through making propensity predictions on one of the Kaggle datasets: H&M Personalized Fashion Recommendations.

Step 1 - Foundation Model and Data Sources Configuration

The first step is to correctly configure the data source connection and the foundation model. In this example, we will be using a Snowflake data warehouse, but it is possible to use any other supported connection type listed in the documentation.

datasources:
  - type: main_entity_attribute
    main_entity_column: customer_id
    name: customers
    data_location:
      source: snowflake
      connection_params:
        user: ${SNOWFLAKE_USER}
        password: ${SNOWFLAKE_PASSWORD}
        account: ${SNOWFLAKE_ACCOUNT}
        warehouse: ${SNOWFLAKE_WAREHOUSE}
        role: ${SNOWFLAKE_ROLE}
        database: HM_KAGGLE
        db_schema: PRIVATE
      table_name: customers
  - type: event
    main_entity_column: customer_id
    name: transactions
    date_column: t_dat
    text_columns:
      - prod_name
      - detail_desc
    data_location:
      source: snowflake
      connection_params:
        user: ${SNOWFLAKE_USER}
        password: ${SNOWFLAKE_PASSWORD}
        account: ${SNOWFLAKE_ACCOUNT}
        warehouse: ${SNOWFLAKE_WAREHOUSE}
        role: ${SNOWFLAKE_ROLE}
        database: HM_KAGGLE
        db_schema: PRIVATE
      table_name: transactions
      

data_params:
  data_start_date: 2018-09-20 00:00:00
  validation_start_date: 2020-09-01 00:00:00
  check_target_for_next_N_days: 21

loader_params:
  batch_size: 256
  num_workers: 10

training_params:
  learning_rate: 0.0001
  epochs: 3

hidden_dim: 2048

In this case, we have already joined the data from the ARTICLES table into the TRANSACTIONS table, so we only provide two data sources: one of the event type and one of the attribute type.

For the purpose of this tutorial, we leave the rest of the parameters unchanged. More details about them can be found in the documentation.

Naturally, under connection_params users need to provide their credentials, for example for Snowflake or any other supported database.
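If, as the ${...} syntax suggests, these placeholders are resolved from environment variables, a small pre-flight check can save a failed run. The sketch below is our own illustration, not part of BaseModel:

import os

# Assumption: the ${...} placeholders in the config above are filled in
# from environment variables with the same names.
required = [
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_ROLE",
]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing Snowflake credentials: {', '.join(missing)}")

With the credentials in place, we can move on to the next stage: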

Data Preprocessing and Foundation Model Training

At this point, the only thing we need to do is run BaseModel on the previously prepared configuration file.

Let's run the following CLI command:

python -m pretrain --config <path/to/config.yml> --features-path <path/to/store/pretrain/artifacts>

At this stage, the user can monitor what is happening by reviewing the logs output to the console.

After a while, data preprocessing will be done and the foundation model will be ready. All necessary files will be stored in the artifacts folder defined by the --features-path argument above:

  • fm - foundation model folder with model checkpoint
  • features - folder with features transformations/embeddings
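As a quick sanity check, you can verify that both folders were created. This is a minimal sketch; the artifact path below is only a placeholder for whatever you passed as --features-path, and the exact contents of each folder may differ between versions:

from pathlib import Path

# Placeholder path: use the directory you passed as --features-path.
artifacts = Path("path/to/store/pretrain/artifacts")
for subfolder in ("fm", "features"):
    print(subfolder, "present:", (artifacts / subfolder).is_dir())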

That's it! The foundation model is ready, and we can move on to the next stage.

Step 2 - Downstream Model Training

At this stage, we need to prepare the configuration for the downstream task training phase. This is done by writing your own Python script or by modifying one of the existing templates.

In this case, let's say we want to calculate the propensity to buy specific categories per user.

Let's have a look at the training script used in this tutorial:

from typing import Dict

import numpy as np

from monad.ui.config import MonadTrainingParams
from monad.ui.module import MultilabelClassificationTask, load_from_foundation_model
from monad.ui.target_function import Attributes, Events

TARGET_NAMES = [
    "Denim Trousers",
    "Swimwear",
    "Trousers",
    "Jersey Basic",
    "Ladies Sport Bottoms",
    "Basic 1",
    "Jersey fancy",
    "Blouse",
    "Shorts",
    "Trouser",
    "Ladies Sport Bras",
    "Casual Lingerie",
    "Expressive Lingerie",
    "Dress",
    "Dresses",
    "Tops Knitwear",
    "Skirt",
    "Nightwear",
    "Knitwear",
]
TARGET_ENTITY = "department_name"


def target_fn(_history: Events, future: Events, _entity: Attributes, _ctx: Dict) -> np.ndarray:

    purchase_target, _ = future["transactions"].groupBy(TARGET_ENTITY).exists(groups=TARGET_NAMES)
    return purchase_target


task = MultilabelClassificationTask()
fm_path = "/data1/monad/inference/new_features_hm/fm"
num_outputs = len(TARGET_NAMES)
training_params = MonadTrainingParams(
    learning_rate=5e-5,
    checkpoint_dir="checkpoints/hm-propensity",
    epochs=1,
    devices=[1],
)

if __name__ == "__main__":
    trainer = load_from_foundation_model(
        checkpoint_path=fm_path, downstream_task=task, target_fn=target_fn, num_outputs=num_outputs
    )
    trainer.fit(training_params=training_params)

TARGET_ENTITY - in this case, it is the target column that contains the categories of interest, which we define in TARGET_NAMES.

Next, we define the target_fn in a way that describes users' purchase behaviour - in this case, making future transactions in specific departments.
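To make the pattern concrete, here is a hypothetical single-category variant that reuses only the calls already shown in the script above (same imports and TARGET_ENTITY; the function name and the choice of "Swimwear" are ours for illustration). Using it would also mean setting num_outputs to 1:

def swimwear_target_fn(_history: Events, future: Events, _entity: Attributes, _ctx: Dict) -> np.ndarray:
    # Same groupBy/exists pattern as above, restricted to one department:
    # the label is 1 if the customer buys anything from "Swimwear" within
    # the prediction window, 0 otherwise.
    target, _ = future["transactions"].groupBy(TARGET_ENTITY).exists(groups=["Swimwear"])
    return target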

task = MultilabelClassificationTask() defines the downstream learning task that will be used in training.

The rest of the script defines the path to the foundation model, the output path for model checkpoints, and the training parameters.

Once this file is properly prepared, we run it via python train.py. After it is done, it will create a new checkpoint folder at the location defined by checkpoint_dir (checkpoints/hm-propensity in this example).

Step 3 - Predictions

Now the final step - running predictions. You run it the same way as before, by preparing and running a Python script. In our case it will look like this (note that checkpoint_dir should point to the checkpoint directory produced in Step 2):

from monad.ui.module import load_from_checkpoint
from monad.ui.config import MonadTestingParams

if __name__ == "__main__":
    checkpoint_dir = "checkpoint"
    testing_module = load_from_checkpoint(checkpoint_dir)
    testing_params = MonadTestingParams(
        save_path="checkpoint/preds.csv",
        limit_test_batches=100,
    )
    testing_module.predict(testing_params=testing_params)
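Note that limit_test_batches appears to cap the number of batches that are scored, which is convenient for a quick test run. After the run completes, the predictions are written to the save_path defined above. A simple way to inspect them, assuming a plain CSV (the exact column names depend on your BaseModel version and the target definition):

import pandas as pd

# Load the saved propensity scores; we only assume a regular CSV at the
# save_path used above.
preds = pd.read_csv("checkpoint/preds.csv")
print(preds.shape)
print(preds.head())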