Joining additional attributes
Enriching event tables by joining with additional attributes
Check This First!
This article refers to BaseModel accessed via a Docker container. Please refer to the Snowflake Native App section if you are using BaseModel as a Snowflake GUI application.
The role of the attributes
You can further enrich your training data by adding information about other entities, such as products, stores, or services. This brings two benefits:
- the foundation model can take this information into account and accumulate it during training,
- downstream models can predict behavior in the context of information available in the attribute table (e.g. propensity to purchase a given category, where the category is a fact stored in the product attributes table).
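To illustrate the second benefit, here is a conceptual sketch of why a joined attribute matters for a downstream target: once each transaction carries its product's category, a "did the customer buy from category X" label becomes computable. This is plain Python over in-memory rows, not BaseModel's target function API; the category column, the sample rows, and category_propensity_target are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions after joining the products attribute table:
# each event already carries the product's "category" column.
enriched_events = [
    {"customer_id": "c1", "product_id": "p1", "category": "shoes"},
    {"customer_id": "c1", "product_id": "p2", "category": "hats"},
    {"customer_id": "c2", "product_id": "p2", "category": "hats"},
]


def category_propensity_target(events, category):
    """Return {customer_id: 1 or 0}: did the customer buy from `category`?"""
    target = defaultdict(int)
    for event in events:
        # Every customer gets an entry; it becomes 1 once a matching event is seen.
        target[event["customer_id"]] |= int(event.get("category") == category)
    return dict(target)


print(category_propensity_target(enriched_events, "shoes"))
```

Without the joined category column, the events would only contain product_id, and this label could not be derived from the event data alone.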
Learn More
Please refer to the relevant section of Model Target Function to learn how to access columns from joined data sources in target functions of downstream models.
Joining the attributes - working example
For BaseModel to make use of this additional information about input entities, you need to:
- define the tables as data sources of type attribute, as described here,
- join them with your event data using the joined_data_sources field.
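The two requirements above can be checked mechanically before training. The sketch below is a hypothetical pre-flight check, not part of BaseModel: it works on the configuration loaded as plain Python dicts and assumes that every name listed under joined_data_sources must match a data source declared with type attribute, and that each join_on entry pairs exactly two columns.

```python
def check_joins(data_sources):
    """Return a list of problems found in joined_data_sources entries."""
    # Names of all sources declared with type "attribute".
    attribute_names = {
        ds["name"] for ds in data_sources if ds.get("type") == "attribute"
    }
    problems = []
    for ds in data_sources:
        for joined in ds.get("joined_data_sources", []):
            # Every joined source must be a declared attribute source.
            if joined["name"] not in attribute_names:
                problems.append(
                    f"{ds['name']}: '{joined['name']}' is not a declared attribute source"
                )
            # Every join_on entry pairs one event column with one attribute column.
            for pair in joined.get("join_on", []):
                if len(pair) != 2:
                    problems.append(
                        f"{ds['name']}: join_on entry {pair} must be a pair of columns"
                    )
    return problems


# Hypothetical, trimmed-down configuration with two deliberate mistakes.
sources = [
    {"type": "attribute", "name": "products"},
    {
        "type": "event",
        "name": "transactions",
        "joined_data_sources": [
            {"name": "products", "join_on": [["product_id", "product_id"]]},
            {"name": "shops", "join_on": [["store_id"]]},
        ],
    },
]

for problem in check_joins(sources):
    print(problem)
```

Here the check reports that shops was never declared as an attribute source and that its join_on entry names only one column.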
Let's look at the following scenario:
- your event data source contains the transactions of your customers (the main entity), identified with customer_id,
- your customer characteristics are provided in the customers main_entity_attribute table,
- you also have other tables with context information about products and stores that you want the model to make use of; they can be linked to transactions with product_id and store_id respectively.
In this case, we should use joined_data_sources to add product and store context to the events processed in training. This is demonstrated in the code below:
- customers is a main_entity_attribute table, while products and stores are attribute tables,
- the transactions event table is joined with the attribute tables:
  - the products table by a single column, product_id,
  - the stores table by two columns, store_id and format.
```yaml
data_sources:
  - type: main_entity_attribute
    name: customers
    main_entity_column: customer_id
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: user_attributes
  - type: attribute
    name: products
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: products
  - type: attribute
    name: stores
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: stores
  - type: event
    name: transactions
    main_entity_column: customer_id
    date_column:
      name: event_time
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: user_trans
    joined_data_sources:
      - name: products
        join_on:
          - [product_id, product_id]
      - name: stores
        join_on:
          - [store_id, store_id]
          - [format, format]
```
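To make the effect of the two-column join on stores concrete, here is a plain-Python sketch of the enrichment the configuration describes. It assumes left-join semantics (events without a matching attribute row are kept), that each join_on pair is ordered as [event_column, attribute_column], and that the composite key is unique in the attribute table; how BaseModel actually names or prefixes joined columns is not covered here. join_attributes and the sample rows are hypothetical.

```python
def join_attributes(events, attributes, join_on):
    """Left-join attribute rows onto event rows.

    join_on is a list of [event_column, attribute_column] pairs, mirroring
    the join_on entries in the configuration (the pair order is assumed).
    """
    # Index attribute rows by their composite key; assumes the key is unique.
    index = {
        tuple(row[attr_col] for _, attr_col in join_on): row for row in attributes
    }
    enriched = []
    for event in events:
        key = tuple(event[event_col] for event_col, _ in join_on)
        match = index.get(key, {})  # no match: the event is kept unchanged
        # Attribute columns are added; on a name clash the event value wins.
        enriched.append({**match, **event})
    return enriched


stores = [
    {"store_id": "s1", "format": "mall", "region": "north"},
    {"store_id": "s1", "format": "outlet", "region": "south"},
]
transactions = [
    {"customer_id": "c1", "store_id": "s1", "format": "outlet", "event_time": "2024-01-05"},
    {"customer_id": "c2", "store_id": "s9", "format": "mall", "event_time": "2024-01-06"},
]

joined = join_attributes(
    transactions, stores, [["store_id", "store_id"], ["format", "format"]]
)
print([row.get("region") for row in joined])  # ['south', None]
```

Joining on store_id alone would be ambiguous in this sample, because store s1 appears in two formats; adding format as a second key column selects exactly one row.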
Note
All columns of the attribute table are joined to the event table, unless the attribute data source definition is further restricted with allowed / disallowed columns, lambdas, etc., as described here.
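The default "whole table" behavior and its restriction by allow / disallow lists can be sketched as a simple column filter. This is an illustration under assumptions: the parameter names allowed and disallowed and the function select_attribute_columns are hypothetical, not BaseModel's configuration keys, and lambdas are not modeled.

```python
def select_attribute_columns(row, allowed=None, disallowed=()):
    """Return the attribute columns that would be joined onto an event.

    allowed=None means "no allow-list", i.e. every column is kept unless it
    appears in disallowed. Parameter names are hypothetical.
    """
    return {
        column: value
        for column, value in row.items()
        if (allowed is None or column in allowed) and column not in disallowed
    }


product = {"product_id": "p1", "category": "shoes", "brand": "acme", "cost_price": 12.5}

print(select_attribute_columns(product))  # no restriction: the whole row
print(select_attribute_columns(product, disallowed={"cost_price"}))
print(select_attribute_columns(product, allowed={"product_id", "category"}))
```

With no lists given, every column passes through, which matches the note above; either list narrows what reaches the events.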