Joining additional attributes
Enriching event tables by joining with additional attributes
Check This First!
This article refers to BaseModel accessed via a Docker container. Please refer to the Snowflake Native App section if you are using BaseModel as a Snowflake GUI application.
The role of the attributes
You can further enrich your training data by adding information about other entities, such as products, stores, or services. This brings two benefits:
- the foundation model can take this information into account and accumulate it during training,
- scenario models can then predict behavior in the context of information available in the attribute table (e.g. propensity to purchase a category, where the category is a fact available in the product attributes table).
Learn More
Please refer to the relevant section of Model Target Function to learn how to access columns from joined data sources in the target functions of downstream models.
Joining the attributes - working example
For BaseModel to make use of this additional information about input entities, you need to:
- define data sources with attribute type tables as described here,
- join them with your event data using the joined_data_sources field.
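The two requirements above can be checked mechanically before training starts. The following is a minimal sketch, not part of BaseModel: `check_joined_sources` is a hypothetical helper, and it assumes the YAML configuration has already been loaded into a plain Python dictionary with the same keys as the file.

```python
def check_joined_sources(config):
    """Return a list of problems found in the joined_data_sources definitions."""
    sources = config.get("data_sources", [])
    # Names of all data sources declared with type "attribute".
    attribute_names = {s["name"] for s in sources if s.get("type") == "attribute"}
    problems = []
    for source in sources:
        for join in source.get("joined_data_sources", []):
            if join["name"] not in attribute_names:
                problems.append(
                    f"{source['name']}: '{join['name']}' is not a defined attribute data source"
                )
            for pair in join.get("join_on", []):
                # Each join_on entry is expected to be a pair of column names.
                if len(pair) != 2:
                    problems.append(
                        f"{source['name']} -> {join['name']}: join_on entry {pair} is not a pair"
                    )
    return problems


# Hypothetical, trimmed-down config (data_location and other fields omitted).
config = {
    "data_sources": [
        {"type": "main_entity_attribute", "name": "customers"},
        {"type": "attribute", "name": "products"},
        {"type": "attribute", "name": "stores"},
        {
            "type": "event",
            "name": "transactions",
            "joined_data_sources": [
                {"name": "products", "join_on": [["product_id", "product_id"]]},
                {"name": "stores", "join_on": [["store_ID", "store_ID"], ["format", "format"]]},
            ],
        },
    ]
}

problems = check_joined_sources(config)
print(problems or "join definitions look consistent")
```

A check like this only catches naming and shape mistakes in the configuration; it does not verify that the referenced columns exist in the underlying tables.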
Let's look at the following scenario:
- your event data source contains the transactions of your customers (main entity) identified with customer_id,
- you also have your customer characteristics provided in the customers main_entity_attribute table,
- however, you also have other tables providing context information for products and stores that you want the model to make use of, which can be linked to transactions with product_id and store_ID respectively.
In this case, we should use joined_data_sources to add product and store context to events to process in training. This is demonstrated in the code below:
- there is a customers main_entity_attribute table, and then products and stores attribute tables,
- the transactions event table is then joined with the attribute tables:
  - products table by a single product_id column,
  - stores table by 2 columns: store_ID and format.
```yaml
data_sources:
  - type: main_entity_attribute
    name: customers
    main_entity_column: customer_id
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: user_attributes
  - type: attribute
    name: products
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: products
  - type: attribute
    name: stores
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: stores
  - type: event
    name: transactions
    main_entity_column: customer_id
    date_column:
      name: event_time
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/location.json
      schema_name: yourdataset
      table_name: user_trans
    joined_data_sources:
      - name: products
        join_on:
          - [product_id, product_id]
      - name: stores
        join_on:
          - [store_ID, store_ID]
          - [format, format]
```
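To make the effect of the two-column join on stores more concrete, here is a small illustrative sketch in plain Python. It is not BaseModel's implementation: `join_attributes` is a hypothetical function, the sample rows are invented, left-join behavior is assumed purely for illustration, and each join_on pair is assumed to be ordered as [event column, attribute column].

```python
def join_attributes(events, attributes, join_on):
    """Left-join attribute rows onto event rows using [event_col, attr_col] pairs."""
    event_cols = [pair[0] for pair in join_on]
    attr_cols = [pair[1] for pair in join_on]
    # Index attribute rows by the tuple of their join-key values.
    index = {tuple(row[c] for c in attr_cols): row for row in attributes}
    joined = []
    for event in events:
        key = tuple(event[c] for c in event_cols)
        match = index.get(key, {})  # no match: the event keeps only its own columns
        # Add the non-key attribute columns; event values win on name clashes.
        extra = {k: v for k, v in match.items() if k not in attr_cols}
        joined.append({**extra, **event})
    return joined


# Invented sample rows: the same store_ID appears with two different formats,
# so store_ID alone would not identify a unique row in the stores table.
events = [
    {"customer_id": 1, "store_ID": "s1", "format": "express"},
    {"customer_id": 2, "store_ID": "s1", "format": "hyper"},
]
stores = [
    {"store_ID": "s1", "format": "express", "city": "A"},
    {"store_ID": "s1", "format": "hyper", "city": "B"},
]

joined = join_attributes(events, stores, [["store_ID", "store_ID"], ["format", "format"]])
for row in joined:
    print(row)
```

In this toy data each event picks up the `city` of the store row that matches on both columns, which is the reason a composite key is listed under join_on when a single column does not uniquely identify an attribute row.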
Note
The whole table will be joined to the event table, unless the definition of the attribute data source is further modified by allowed / disallowed columns, lambdas, etc., as described here.
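The effect of restricting columns before a join can be pictured with a short sketch. This is only an illustration of the idea in the note: `select_columns` and its `allowed`, `disallowed`, and `keep` parameters are hypothetical names, not BaseModel configuration keys, and the assumption that join keys are always retained is made here for the example only.

```python
def select_columns(rows, allowed=None, disallowed=None, keep=()):
    """Filter the columns of each row; columns listed in `keep` always survive."""
    selected = []
    for row in rows:
        kept = {}
        for name, value in row.items():
            if name in keep:
                kept[name] = value  # e.g. join keys needed later
            elif allowed is not None and name not in allowed:
                continue  # an allow-list is set and this column is not on it
            elif disallowed is not None and name in disallowed:
                continue  # this column is explicitly excluded
            else:
                kept[name] = value
        selected.append(kept)
    return selected


# Invented sample attribute rows.
products = [
    {"product_id": "p1", "category": "dairy", "supplier_code": "x9"},
]

by_disallow = select_columns(products, disallowed={"supplier_code"}, keep={"product_id"})
by_allow = select_columns(products, allowed={"category"}, keep={"product_id"})
print(by_disallow)
print(by_allow)
```

Both calls leave only `product_id` and `category`, so only those columns would be carried onto the event rows in a subsequent join.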
