Joining additional attributes

Enriching event tables by joining with additional attributes

⚠️

Check This First!

This article refers to BaseModel accessed via a Docker container. If you are using BaseModel as a Snowflake GUI application, please refer to the Snowflake Native App section.


The role of the attributes

You can further enrich your training data by adding information about other entities, such as products, stores, or services. This brings two benefits:

  • the foundation model can take this information into account and accumulate it during training,
  • downstream models can then predict behavior in the context of the information available in the attribute table
    (e.g. propensity to purchase a category, which is a fact available in the product attributes table).

📘

Learn More

Please refer to the relevant section of Model Target Function to learn how to access columns from joined data sources in target functions of downstream models.


Joining the attributes - working example

For BaseModel to make use of this additional information about input entities, you need to:

  • define data sources with attribute-type tables as described here,
  • join them with your event data using the joined_data_sources field.

Let's look at the following scenario:

  • your event data source contains the transactions of your customers (the main entity), identified by customer_id,
  • your customer characteristics are provided in the customers main_entity_attribute table,
  • you also have other tables providing context information for products and stores that you want the model to use; they can be linked to transactions by product_id and store_id respectively.

In this case, use joined_data_sources to add product and store context to the events processed in training. This is demonstrated in the configuration below:

  • there is a customers main_entity_attribute table, followed by the products and stores attribute tables,
  • the transactions event table is then joined with the attribute tables:
    • the products table by a single product_id column,
    • the stores table by two columns: store_id and format.

data_sources:
  
  - type: main_entity_attribute
    name: customers
    main_entity_column: customer_id
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: user_attributes
   
  - type: attribute
    name: products
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: products
  
  - type: attribute
    name: stores
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: stores
  
  - type: event
    name: transactions
    main_entity_column: customer_id
    date_column:
      name: event_time
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: user_trans
    joined_data_sources:
      - name: products
        join_on: 
          - [product_id, product_id]
      - name: stores
        join_on: 
          - [store_id, store_id]
          - [format, format]

🚧

Note

The whole table will be joined to the event table unless the definition of the attribute data source is further modified by allowed / disallowed columns, lambdas, etc., as described here.
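
To picture what joining the whole table means for a single event, here is a conceptual sketch in plain Python. It does not show BaseModel's implementation. It assumes left-join-like behavior (an event without a matching attribute row is kept unchanged), and the join_attributes helper and the sample rows are hypothetical, used only for illustration.

```python
# Conceptual illustration only; BaseModel's actual join logic may differ.

def join_attributes(events, attributes, join_on):
    """Attach every non-key attribute column to each matching event row.

    join_on is a list of [event_column, attribute_column] pairs,
    mirroring the join_on entries in the configuration.
    """
    event_cols = [pair[0] for pair in join_on]
    attr_cols = [pair[1] for pair in join_on]
    # Index attribute rows by their join key for quick lookup.
    index = {tuple(row[c] for c in attr_cols): row for row in attributes}
    joined = []
    for event in events:
        key = tuple(event[c] for c in event_cols)
        match = index.get(key, {})  # assumption: unmatched events are kept
        extra = {k: v for k, v in match.items() if k not in attr_cols}
        joined.append({**event, **extra})
    return joined


# Hypothetical sample rows.
transactions = [
    {"customer_id": 1, "product_id": "p1", "price": 9.5},
    {"customer_id": 2, "product_id": "p9", "price": 3.0},  # no matching product
]
products = [
    {"product_id": "p1", "category": "dairy", "brand": "Acme"},
]

for row in join_attributes(transactions, products, [["product_id", "product_id"]]):
    print(row)
```

In this sketch, the first transaction gains both category and brand because no column restriction is applied, which is the situation the note describes.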