Data Model and Sources
Overall design of the data for your model
At the heart of BaseModel is a behavioral foundation model that requires data sources capturing behavioral information. This article explains the key terms and data structures, and gives an overview of how the data fits together.
Explanation of Key Terms
Entities
Entities is a collective name for the subjects and objects linked by interactions, i.e.:
- Human beings whose behavior we want to model, e.g. users, customers, employees etc.
- The objects they interact with: e.g. products, services, stores etc.
Each entity must have its own unique ID.
Main Entity
The main entity is the entity whose behavior we focus on and want to predict. Most typically this would be your client, user, employee etc. Its identifier is called main_entity_column.
Note
The main entity does not have to be a human being! Although this is a less common scenario, objects such as ATMs, cell towers, points of sale, bank branches, etc. can also play the role of the main entity.
Example Scenarios
- Main Entity: User
  We want to predict the users' favorite brand based on products they viewed.
  In this scenario, the main entity will be the user, the events will be page views, and the main_entity_column recorded for each event will be the unique identifier of the user, e.g. UserID.
- Main Entity: ATM
  We want to predict the amount of cash that needs to be deposited in particular ATMs.
  In this scenario, the ATM will be the main entity, the events will be withdrawals over time, and the main_entity_column recorded for each event will be the ATM's unique identifier, e.g. ATM_serial_no.
Events
Events are observations recording interactions between entities over time, e.g. transactions, likes, complaints etc.
Below are a few examples of event data, although it is not an exhaustive list: the available events strongly depend on industry, organisation profile, environment and types of interacting entities.
Interaction type | Example events |
---|---|
web interactions | page views, searches, transactions, product returns, support queries |
offline interactions | transactions, contracts signed, customer support calls |
mobile app interactions | clicks, scrolls, push events, location, sensor data |
To be used by BaseModel, events need to be stored in tables that contain the IDs of the interacting entities and the timestamp of each event, are organized by event, and are assigned the event type.
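As an illustration of the requirement above, here is a minimal sketch of a page-view event table and a check that each record carries an entity ID and a parsable timestamp. The column names (UserID, product_id, event_time) and the helper function are hypothetical and chosen only for this example; they are not part of BaseModel.

```python
from datetime import datetime

# Hypothetical column names, used for illustration only.
MAIN_ENTITY_COLUMN = "UserID"
DATE_COLUMN = "event_time"

# A tiny "page views" event table: one record per interaction.
page_views = [
    {"UserID": "u1", "product_id": "p10", "event_time": "2024-01-05T09:30:00"},
    {"UserID": "u2", "product_id": "p11", "event_time": "2024-01-05T09:31:12"},
    {"UserID": "u1", "product_id": "p12", "event_time": "2024-01-06T18:02:45"},
]


def invalid_event_rows(rows, entity_col, date_col):
    """Return the indexes of rows lacking an entity ID or a parsable timestamp."""
    bad = []
    for i, row in enumerate(rows):
        # An event without the ID of the interacting entity cannot be linked.
        if not row.get(entity_col):
            bad.append(i)
            continue
        # The timestamp must parse as an ISO 8601 date-time.
        try:
            datetime.fromisoformat(row.get(date_col, ""))
        except (TypeError, ValueError):
            bad.append(i)
    return bad


print(invalid_event_rows(page_views, MAIN_ENTITY_COLUMN, DATE_COLUMN))
```

A check like this could be run on a sample of the source data before it is connected, so that records with missing IDs or malformed timestamps are caught early.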
Attributes
Attributes are the characteristics of entities that BaseModel can consume to enrich what it learns.
Below are a few examples of possible attributes, although it is not an exhaustive list: the available attributes strongly depend on industry, organisation profile, environment and types of interacting entities.
Entity | Example attributes |
---|---|
customer | sociodemographic, location, loyalty program, subscriptions |
employee | seniority, specialty, skills |
supplier | name, location, industry |
To be used by BaseModel, attributes need to be stored in tables that contain the entity ID and are assigned the attribute type.
Specifically, the characteristics of the main entity are called main entity attributes and are stored in a table of the main_entity_attribute type.
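A minimal sketch of an attribute table for a customer entity is shown below. It assumes one row per entity, keyed by a unique ID, and checks for duplicated IDs. The column names (client_ID, age_group, loyalty_member) and the helper are hypothetical, and the one-row-per-entity layout is an assumption made for this example rather than a documented BaseModel rule.

```python
from collections import Counter

# Hypothetical ID column name, used for illustration only.
ENTITY_ID_COLUMN = "client_ID"

# A tiny main entity attribute table: assumed to hold one row per customer.
customers = [
    {"client_ID": "c1", "age_group": "25-34", "loyalty_member": True},
    {"client_ID": "c2", "age_group": "35-44", "loyalty_member": False},
    {"client_ID": "c3", "age_group": "25-34", "loyalty_member": True},
]


def duplicated_ids(rows, id_col):
    """Return the sorted entity IDs that appear in more than one row."""
    counts = Counter(row[id_col] for row in rows)
    return sorted(entity_id for entity_id, n in counts.items() if n > 1)


print(duplicated_ids(customers, ENTITY_ID_COLUMN))
```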
Behavior Data Model Example
BaseModel data feeds come in three types of tables:
- event, i.e. any of the tables containing the time-stamped interactions linked to the entities,
- main_entity_attribute, i.e. the characteristics of the main entity,
- attribute, i.e. properties of any other entity that the main entity interacts with.
To train the foundation model you need at least one event table, but it is recommended to add some attribute data to enrich the events. In practice, there may be multiple event and attribute tables.
Below is a simple example of the data that could be used to train BaseModel:
In the diagram above there are:
- Two event tables, Transactions and Page Views; each record (event) must have:
  - The main_entity_column storing the unique identifier of the main entity (here: client_ID),
  - The date_column storing the timestamp of the event.
- The main entity attribute table, with client_ID as main_entity_column (the unique identifier),
- Two other attribute tables, each of which should be joined to at least one of the event tables:
  - stores, with shop as the unique identifier of the stores entity,
  - products, with sku as the unique identifier of the product entity.
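To make the relationships between these tables concrete, the sketch below builds a tiny Transactions event table and looks up the matching products and stores attributes by sku and shop. It only illustrates how the keys link the tables; it is not how BaseModel performs joins internally. The timestamp column name, the attribute fields (category, city), the sample values, and the enrich helper are all assumptions made for this example.

```python
# Event table: each record has the main entity ID (client_ID), a timestamp,
# and the IDs of the other entities involved (sku, shop).
transactions = [
    {"client_ID": "c1", "timestamp": "2024-03-01T10:15:00", "sku": "A100", "shop": "S1"},
    {"client_ID": "c2", "timestamp": "2024-03-01T11:40:00", "sku": "B200", "shop": "S2"},
]

# Attribute tables, keyed by the unique identifier of each entity.
products = {"A100": {"category": "coffee"}, "B200": {"category": "tea"}}
stores = {"S1": {"city": "Warsaw"}, "S2": {"city": "Krakow"}}


def enrich(event, lookups):
    """Return a copy of the event with attributes from each lookup table added.

    `lookups` maps a key column in the event (e.g. "sku") to the attribute
    table keyed by that identifier. Unknown keys simply add no attributes.
    """
    enriched = dict(event)
    for key_col, table in lookups.items():
        attrs = table.get(event[key_col], {})
        for name, value in attrs.items():
            # Prefix with the key column so attribute names cannot collide.
            enriched[f"{key_col}.{name}"] = value
    return enriched


enriched = [enrich(t, {"sku": products, "shop": stores}) for t in transactions]
print(enriched[0])
```

Prefixing each attribute with its key column is just one way to keep fields from different attribute tables apart when they share a name.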
The following articles will explain how to define data sources in the YAML config file, and how to customize and enrich events' data.