Managing Data
BaseModel learns from behavioral data — time-stamped interactions between entities. This page explains the data model BaseModel expects, how it automatically turns your columns into features, and where to go for each configuration task.
Basic Configuration already sets up a working data_sources and data_params block. The guides below help you expand and refine that starting point.
In this section
| Guide | Description |
|---|---|
| This page | Understand the data model and automatic feature encoding |
| Connect Sources | Swap Parquet for the production backend you use |
| Select & Organize | Add more tables, filter columns and rows, configure joins, and fine-tune the split |
| Enrich & Transform | Layer on computed columns, type overrides, event grouping, and shared entities |
The Behavioral Data Model
The behavioral data model that powers every foundation model is built on three concepts:
- entities — the actors and objects
- events — their interactions over time
- attributes — their properties
Entities
Entities are the subjects and objects linked by interaction: users, customers, employees on one side; products, stores, services on the other. Each entity has a unique ID.
- Main entity — the one whose behavior you want to model and predict. Most often a customer or user, though it can be anything with a behavioral history (ATMs, cell towers, points of sale).
- Other entities — the objects the main entity interacts with: products, stores, services, content items.
Events
Events are time-stamped records of interactions between entities — transactions, page views, support calls, sensor readings. For BaseModel to learn, each event must include the main entity's ID and a timestamp. You need at least one event table to train a foundation model.
Attributes
Attributes are the characteristics of entities that enrich the model's understanding — customer demographics, product categories, store metadata.
- Main entity attributes: one row per main entity (e.g., a customer profile table). These are joined automatically on the entity ID.
- Other entity attributes: dimension tables for other entities (e.g., a product catalogue). These require explicit joins to event tables.
Attribute tables are optional but recommended — they give the model richer context about the entities involved in each event.
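To make the three concepts concrete, here is a minimal sketch in plain Python. It is not BaseModel code: the table contents, the column names (customer_id, product_id, timestamp, amount) and the helper functions are all hypothetical. It checks that every event carries the main entity's ID and a timestamp, then attaches main entity attributes by entity ID and product attributes through an explicitly named join column.

```python
from datetime import datetime

# Hypothetical sample data; table and column names are illustrative only.
events = [
    {"customer_id": "c1", "product_id": "p1",
     "timestamp": datetime(2024, 1, 5, 10, 30), "amount": 19.99},
    {"customer_id": "c1", "product_id": "p2",
     "timestamp": datetime(2024, 1, 7, 9, 0), "amount": 5.50},
    {"customer_id": "c2", "product_id": "p1",
     "timestamp": datetime(2024, 1, 8, 14, 15), "amount": 19.99},
]
customers = {"c1": {"age_group": "25-34"}, "c2": {"age_group": "45-54"}}  # main entity attributes
products = {"p1": {"category": "books"}, "p2": {"category": "snacks"}}    # other entity attributes


def validate_events(rows, entity_col, date_col):
    """Return the indices of rows missing the main entity ID or the timestamp."""
    return [
        i for i, row in enumerate(rows)
        if row.get(entity_col) is None or row.get(date_col) is None
    ]


def enrich(rows, entity_col, entity_attrs, join_col, other_attrs):
    """Attach main entity attributes by entity ID and other-entity
    attributes through an explicitly named join column."""
    enriched_rows = []
    for row in rows:
        merged = dict(row)
        merged.update(entity_attrs.get(row[entity_col], {}))
        merged.update(other_attrs.get(row[join_col], {}))
        enriched_rows.append(merged)
    return enriched_rows


bad_rows = validate_events(events, "customer_id", "timestamp")
enriched = enrich(events, "customer_id", customers, "product_id", products)
```

In this sketch the main entity join needs only the entity column, while the product join has to be told which event column carries the key, mirroring the automatic versus explicit joins described above.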
How BaseModel Encodes Your Data
BaseModel automatically infers feature types from your columns and encodes them for training. You don't need to do feature engineering — but understanding the mapping helps you validate the schema and decide when an override is worthwhile.
The inference pipeline works in this order:
- Declared columns: the columns you designate as date_column and main_entity_column are excluded from feature generation.
- User overrides: any column listed in column_type_overrides gets the assigned type; inference is skipped.
- Auto-detection: all remaining columns are classified based on schema metadata and value statistics.
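The precedence described above can be sketched as a small Python function. This is an illustration of the ordering only, not BaseModel's implementation: the config dictionary shape, the lowercase type labels, the dtype strings and the cardinality threshold of 100 are all assumptions made for the example.

```python
# Hypothetical cutoff between "low" and "high" cardinality; the real value is not documented here.
CARDINALITY_THRESHOLD = 100


def infer_feature_type(name, dtype, n_unique, config):
    """Resolve a column's feature type using the documented precedence.

    Returns None for declared columns, which are excluded from feature generation.
    """
    # 1. Declared columns are never turned into inferred features.
    if name in (config["date_column"], config["main_entity_column"]):
        return None

    # 2. User overrides win over auto-detection.
    overrides = config.get("column_type_overrides", {})
    if name in overrides:
        return overrides[name]

    # 3. Auto-detection from schema metadata (dtype) and value statistics (cardinality).
    if dtype == "float":
        return "decimal"
    if dtype in ("string", "int"):
        if n_unique <= CARDINALITY_THRESHOLD:
            return "categorical"
        return "categorical_compressed"
    return "unknown"


# Hypothetical configuration and column statistics.
config = {
    "date_column": "transaction_date",
    "main_entity_column": "customer_id",
    "column_type_overrides": {"quantity": "decimal"},
}

resolved = {
    "customer_id": infer_feature_type("customer_id", "string", 10_000, config),
    "quantity": infer_feature_type("quantity", "int", 12, config),
    "price": infer_feature_type("price", "float", 500, config),
    "status": infer_feature_type("status", "string", 4, config),
    "product_id": infer_feature_type("product_id", "string", 50_000, config),
}
```

Note that quantity would fall into the categorical branch on its own; the override in step 2 is what makes it numeric.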
Standard feature types
| Type | When inferred | Encoding | Example columns |
|---|---|---|---|
| Decimal | Float columns | Normalized numeric | price, amount, rating |
| Categorical | String / int with low cardinality | One-hot | status, country, channel |
| Categorical Compressed | String / int with high cardinality | Learned embedding | product_id, session_id |
| Timestamp | Declared date_column | Temporal decomposition | transaction_date |
Advanced feature types (require explicit override)
| Type (override value) | Use case |
|---|---|
| time_series | Numeric column with meaningful temporal patterns: price history, balances, sensor data |
| text | Free-form string with > ~5 tokens on average: descriptions, reviews, feedback |
| image | Path or URL to a visual asset: product photos, property listings |
Advanced overrides are configured via column_type_overrides and sometimes paired with sql_lambdas. See Enrich & Transform for details.
Handle integer columns correctly
Integers are treated as categorical by default. If a column should be numeric (e.g., quantity, count), either override it to decimal or cast it with an sql_lambda.
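A short Python sketch shows why the distinction matters. It encodes the same integer column both ways: one-hot, as a low-cardinality categorical would be, and as a normalized number. Min-max scaling is an assumption chosen for illustration; this page does not specify which normalization BaseModel applies.

```python
def one_hot(values):
    """Encode each value as a 0/1 vector over the sorted distinct values."""
    categories = sorted(set(values))
    return [[1 if value == category else 0 for category in categories] for value in values]


def min_max(values):
    """Scale values to the range [0, 1]; a constant column maps to 0.0."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    return [(value - lowest) / span if span else 0.0 for value in values]


quantity = [1, 2, 10]

# Treated as categorical: three unrelated labels, ordering and distance are lost.
as_categorical = one_hot(quantity)

# Treated as numeric: 2 stays close to 1 and far from 10.
as_numeric = min_max(quantity)
```

With the categorical encoding, a quantity of 2 is no closer to 1 than it is to 10, which is rarely what you want for counts or amounts.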