Data Model and Sources

Overall design of the data for your model

⚠️

Check This First!

This article refers to BaseModel accessed via a Docker container. Please refer to the Snowflake Native App section if you are using BaseModel as a Snowflake GUI application.


At the heart of BaseModel is a behavioral foundation model that requires data sources capturing behavioral information. This article explains the key terms and data structures, and gives an overview of how the data fits together.

Explanation of Key Terms

Entities

Entities is a collective name for the subjects and objects linked by interactions, i.e.:

  • Human beings whose behavior we want to model, e.g. users, customers, employees etc.
  • The objects they interact with: e.g. products, services, stores etc.

Each entity must have its own unique ID.

Main Entity

The main entity is the entity whose behavior we focus on and want to predict. Most typically this would be your client, user, employee etc. Its identifier is called main_entity_column.

📘

Did you know?

The main entity does not have to be a human being! Although it is a less common scenario, objects such as ATMs, cell towers, points of sale, bank branches, etc. can also play the role of the main entity.


Example Scenarios

  • Main Entity: User
    We want to predict the users' favorite brand based on products they viewed.
    In this scenario, the main entity will be the user, the events will be page views, and the main_entity_column recorded for each event will be the unique identifier of the user, e.g. UserID.

  • Main Entity: ATM
    We want to predict the amount of cash that needs to be deposited in particular ATMs.
    In this scenario, the ATM will be the main entity, the events will be withdrawals over time, and the main_entity_column recorded for each event will be the ATM's unique identifier, e.g., ATM_serial_no.
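
To make the ATM scenario more concrete, here is a minimal Python sketch of how withdrawal events could be grouped into one time-ordered history per main entity. Only `ATM_serial_no` comes from the scenario above; the `timestamp` and `amount` fields, the sample values, and the helper `history_per_entity` are illustrative assumptions and not part of BaseModel.

```python
from collections import defaultdict
from datetime import datetime

# The identifier of the main entity, recorded on every event.
MAIN_ENTITY_COLUMN = "ATM_serial_no"

# Hypothetical withdrawal events (field names other than the ID are assumed).
withdrawals = [
    {"ATM_serial_no": "ATM-002", "timestamp": datetime(2024, 1, 5, 9, 30), "amount": 200},
    {"ATM_serial_no": "ATM-001", "timestamp": datetime(2024, 1, 5, 8, 15), "amount": 50},
    {"ATM_serial_no": "ATM-001", "timestamp": datetime(2024, 1, 4, 17, 0), "amount": 120},
]


def history_per_entity(events, main_entity_column):
    """Group events by main entity and sort each group by time."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event[main_entity_column]].append(event)
    for entity_events in grouped.values():
        entity_events.sort(key=lambda e: e["timestamp"])
    return dict(grouped)


histories = history_per_entity(withdrawals, MAIN_ENTITY_COLUMN)
for atm_id, events in sorted(histories.items()):
    print(atm_id, [e["amount"] for e in events])
```

The same grouping applies to the user scenario by swapping the identifier for `UserID` and the withdrawals for page views.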

Events

Events are observations recording interactions between entities over time, e.g. transactions, likes, complaints etc.
Below are a few examples of event data, although it is not an exhaustive list: the available events strongly depend on industry, organisation profile, environment and types of interacting entities.

| Interaction type | Example events |
| --- | --- |
| web interactions | page views, searches, transactions, product returns, support queries |
| offline interactions | transactions, contracts signed, customer support calls |
| mobile app interactions | clicks, scrolls, push events, location, sensor data |

To be used by BaseModel, events need to be stored in tables that contain the IDs of the interacting entities and the timestamp of each event, are organized by event, and are assigned the event type.
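
As a rough illustration of these requirements, the following Python sketch checks that every event row carries the entity IDs and a timestamp before the table is handed over. It is a pre-flight check under stated assumptions, not a BaseModel API: the function `find_invalid_events` and the column names `UserID`, `product_id`, and `event_time` are hypothetical.

```python
from datetime import datetime


def find_invalid_events(rows, id_columns, timestamp_column):
    """Return (row_index, problem) pairs for rows missing an ID or timestamp."""
    problems = []
    for index, row in enumerate(rows):
        for column in id_columns:
            if row.get(column) in (None, ""):
                problems.append((index, f"missing {column}"))
        if not isinstance(row.get(timestamp_column), datetime):
            problems.append((index, f"missing or invalid {timestamp_column}"))
    return problems


# Hypothetical page-view events linking a user to a product.
page_views = [
    {"UserID": "u1", "product_id": "p9", "event_time": datetime(2024, 3, 1, 12, 0)},
    {"UserID": None, "product_id": "p9", "event_time": datetime(2024, 3, 1, 12, 5)},
    {"UserID": "u2", "product_id": "p3", "event_time": None},
]

problems = find_invalid_events(page_views, ["UserID", "product_id"], "event_time")
print(problems)
```

Here the second row is flagged for a missing `UserID` and the third for a missing timestamp; the first row passes.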

Attributes

Attributes are the characteristics of entities that BaseModel can consume to enrich what it learns.
Below are a few examples of possible attributes, although it is not an exhaustive list: the available attributes strongly depend on industry, organisation profile, environment and types of interacting entities.

| Entity | Example attributes |
| --- | --- |
| customer | sociodemographic, location, loyalty program, subscriptions |
| employee | seniority, specialty, skills |
| supplier | name, location, industry |

To be used by BaseModel, attributes need to be stored in tables that contain the entity ID and are assigned the attribute type.
Specifically, the characteristics of the main entity are called Main Entity Attributes and are stored in a table of the main_entity_attribute type.
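
The sketch below shows one way to think about an attribute table in Python: a set of rows keyed by the entity ID. It assumes one row per entity, which is an interpretation of the unique-ID requirement rather than a documented BaseModel rule; the column names and the helpers `duplicated_ids` and `index_by_id` are hypothetical.

```python
from collections import Counter

# Hypothetical main entity attribute table: ideally one row per customer.
customer_attributes = [
    {"customer_id": "c1", "city": "Warsaw", "loyalty_tier": "gold"},
    {"customer_id": "c2", "city": "Krakow", "loyalty_tier": "silver"},
    {"customer_id": "c1", "city": "Gdansk", "loyalty_tier": "gold"},
]


def duplicated_ids(rows, id_column):
    """Return entity IDs that appear in more than one attribute row."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(entity_id for entity_id, n in counts.items() if n > 1)


def index_by_id(rows, id_column):
    """Build an ID -> attributes lookup, keeping the first row per ID."""
    lookup = {}
    for row in rows:
        lookup.setdefault(row[id_column], row)
    return lookup


print(duplicated_ids(customer_attributes, "customer_id"))
print(index_by_id(customer_attributes, "customer_id")["c2"]["loyalty_tier"])
```

Spotting duplicated IDs early makes it easier to decide which row should represent the entity before the table is used as a data source.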

Behavior Data Model Example

BaseModel data feeds come in three types of tables:

  • event, i.e. any of the tables containing the time-stamped interactions linked to the entities,
  • main_entity_attribute, i.e. the characteristics of the main entity,
  • attribute, i.e. the properties of any other entity that the main entity interacts with.

To train the foundation model you need at least one event table, but it is recommended to add some attribute data to enrich the events. In practice, there may be multiple event and attribute tables.

Below is a simple example of the data that could be used to train BaseModel:


In the diagram above there are:

  • Two event tables, Transactions and Page Views; each record (event) must have:

    • The main_entity_column storing the unique identifier of the main entity (here: client_ID),
    • The date_column storing the timestamp of the event.
  • The main entity attribute table, with client_ID as the main_entity_column (the unique identifier),

  • Two other attribute tables, each of which should be joined to at least one of the event tables:

    • stores, with shop as the unique identifier of the store entity,
    • products, with sku as the unique identifier of the product entity.
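
The relationships in this example can be sketched in plain Python as follows. This is only an illustration of how the tables link through `client_ID`, `shop`, and `sku`; it is not how BaseModel is configured. The `date` column name, the attribute fields (`segment`, `city`, `brand`), the sample values, and the helpers `enrich` and `unjoinable_tables` are all assumptions.

```python
# Hypothetical rows mirroring the tables named in the example above.
transactions = [
    {"client_ID": "c1", "date": "2024-02-01", "shop": "s1", "sku": "A10"},
    {"client_ID": "c2", "date": "2024-02-03", "shop": "s2", "sku": "B20"},
]
page_views = [
    {"client_ID": "c1", "date": "2024-01-30", "sku": "A10"},
]
clients = {"c1": {"segment": "premium"}, "c2": {"segment": "standard"}}
stores = {"s1": {"city": "Warsaw"}, "s2": {"city": "Berlin"}}
products = {"A10": {"brand": "Acme"}, "B20": {"brand": "Globex"}}

# Each attribute table is joined to events through its identifier column.
ATTRIBUTE_JOINS = {"client_ID": clients, "shop": stores, "sku": products}


def enrich(event):
    """Return a copy of the event with attributes of every linked entity."""
    enriched = dict(event)
    for key_column, table in ATTRIBUTE_JOINS.items():
        if key_column in event:
            for name, value in table.get(event[key_column], {}).items():
                enriched[f"{key_column}.{name}"] = value
    return enriched


def unjoinable_tables(event_tables):
    """List join keys that appear in none of the event tables."""
    present = {column for rows in event_tables for row in rows for column in row}
    return sorted(key for key in ATTRIBUTE_JOINS if key not in present)


print(enrich(transactions[0]))
print(unjoinable_tables([transactions, page_views]))
```

An empty result from `unjoinable_tables` means every attribute table can be reached from at least one event table, which is the condition described in the list above.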

The following articles explain how to define data sources in the YAML config file, and how to customize and enrich event data.