Requirements
Hardware
| Component | Minimum |
|---|---|
| GPU | Recommended: NVIDIA A100 or better. At minimum: a multi-GPU cluster of A10s / L40s. CUDA 12+ in both cases. |
| RAM | 240 GB |
| CPU | 32 cores |
| Disk | 1 TB |
| Runtime | Docker-capable environment |
Training and inference throughput both scale linearly with the number of GPUs.
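The CPU, RAM, and disk minimums in the table can be checked before installation. The following is a minimal sketch, not an official preflight tool: the function names `check_hardware` and `measure_host` are hypothetical, 1 TB is assumed to mean 1000 GB of total disk capacity, and the RAM probe relies on `os.sysconf` names that are available on Linux. GPU model and CUDA version are not checked here.

```python
import os
import shutil

# Minimums taken from the hardware table.
MIN_CPU_CORES = 32
MIN_RAM_GB = 240
MIN_DISK_GB = 1000  # assumption: 1 TB read as 1000 GB of total capacity


def check_hardware(cpu_cores, ram_gb, disk_gb):
    """Return a list of shortfalls; an empty list means all minimums are met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores < {MIN_CPU_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.0f} GB < {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.0f} GB < {MIN_DISK_GB} GB")
    return problems


def measure_host(path="/"):
    """Read core count, RAM, and disk size of the current host (assumes Linux)."""
    cpu_cores = os.cpu_count() or 0
    # These sysconf names exist on Linux; other platforms may raise an error.
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return cpu_cores, ram_bytes / 1e9, disk_bytes / 1e9


# Example with explicit values: an undersized host with enough disk.
print(check_hardware(cpu_cores=16, ram_gb=128, disk_gb=2000))
```

On a target machine, `check_hardware(*measure_host())` would run the same comparison against measured values.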
Data
Minimum Structure
You need at least one event data source with:
| Column | Description |
|---|---|
| Entity ID | Unique identifier, e.g. customer_id |
| Timestamp | When the event occurred |
| Event attributes (min. 1) | e.g. product_id, price, category |
Adding entity attributes is recommended but not required:
| Example table | Example columns |
|---|---|
| Customer attributes | customer_id, age, region, signup_date, segment |
| Item attributes | product_id, category, brand, price_tier, product_name |
| Store attributes | store_id, format, region, zip_code, city |
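To make the minimum structure concrete, here is a small sketch that checks an event table for an entity ID, a timestamp, and at least one event attribute. The helper `validate_event_columns` is hypothetical and only illustrates the rule; the column names reuse the examples from the tables above, and the sample rows are invented.

```python
from datetime import datetime


def validate_event_columns(columns, entity_id_col, timestamp_col):
    """Check the minimum structure: entity ID, timestamp, and >= 1 event attribute.

    Returns the list of event-attribute columns; raises ValueError otherwise.
    """
    columns = list(columns)
    missing = [c for c in (entity_id_col, timestamp_col) if c not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    attributes = [c for c in columns if c not in (entity_id_col, timestamp_col)]
    if not attributes:
        raise ValueError("at least one event attribute column is required")
    return attributes


# Invented purchase events, used only to illustrate the shape of the data.
events = [
    {"customer_id": "c1", "timestamp": datetime(2024, 1, 5, 10, 30),
     "product_id": "p9", "price": 19.99, "category": "shoes"},
    {"customer_id": "c2", "timestamp": datetime(2024, 1, 6, 8, 15),
     "product_id": "p3", "price": 4.50, "category": "snacks"},
]

attrs = validate_event_columns(events[0].keys(), "customer_id", "timestamp")
print(attrs)  # ['product_id', 'price', 'category']
```

Entity attribute tables (customer, item, store) would join to the events through the shared key, for example `customer_id` or `product_id`.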
Volume Guidelines
| Requirement | Threshold |
|---|---|
| Unique main entities | ≥ 10 000 (e.g. customers, users) |
| Event volume | ≥ 100 000 interactions per month |
| History — frequent interactions (banking, telco, FMCG, …) | ≥ 3 months |
| History — infrequent interactions (fashion, insurance, automotive, …) | ≥ 1 year |
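The volume guidelines can be checked with a short script before onboarding a dataset. The sketch below makes two assumptions that the table does not state: "per month" is read as the average over the calendar months spanned by the data, and history length is counted in calendar months, inclusive of the first and last. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

MIN_ENTITIES = 10_000
MIN_EVENTS_PER_MONTH = 100_000


def volume_report(events):
    """Summarise an iterable of (entity_id, datetime) pairs."""
    entities = set()
    per_month = Counter()
    for entity_id, ts in events:
        entities.add(entity_id)
        per_month[(ts.year, ts.month)] += 1
    if per_month:
        (y0, m0), (y1, m1) = min(per_month), max(per_month)
        history_months = (y1 - y0) * 12 + (m1 - m0) + 1
        avg_per_month = sum(per_month.values()) / history_months
    else:
        history_months, avg_per_month = 0, 0.0
    return {
        "unique_entities": len(entities),
        "avg_events_per_month": avg_per_month,
        "history_months": history_months,
    }


def meets_guidelines(report, frequent_interactions=True):
    """Apply the thresholds: 3 months of history if frequent, else 12."""
    required_months = 3 if frequent_interactions else 12
    return (
        report["unique_entities"] >= MIN_ENTITIES
        and report["avg_events_per_month"] >= MIN_EVENTS_PER_MONTH
        and report["history_months"] >= required_months
    )


# Tiny invented sample: far below the thresholds, so the check fails.
sample = [
    ("a", datetime(2024, 1, 1)),
    ("b", datetime(2024, 3, 15)),
    ("a", datetime(2024, 3, 20)),
]
report = volume_report(sample)
print(report, meets_guidelines(report))
```

For warehouse-scale data, the same three aggregates would normally be computed in SQL on the source system rather than in Python.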
Supported Data Sources
- Snowflake
- BigQuery
- Azure Synapse
- Parquet
- Databricks
- Hive
- ClickHouse
See Data Connect Sources for connection details.
Example Performance Profile
Real-world retail customer benchmark:
| Metric | Value |
|---|---|
| Events | ~8 billion |
| Unique clients | ~18 million |
| Unique products | ~1 million |
| Foundation model training | 12 h on 1× NVIDIA A100 |
| Scenario fine-tuning (16 000 brands) | 10 h on 1× NVIDIA A100 |
| Inference throughput | 2 718 clients/sec per GPU |
Additional optimizations (low-rank adapter tuning, model quantization) can further reduce cost, with a slight trade-off in quality.
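As a back-of-the-envelope illustration, the benchmark throughput can be turned into an estimated inference time. This sketch assumes the linear GPU scaling stated in the hardware section and that the 2 718 clients/sec figure carries over to other GPU counts; real workloads will differ with data size and model configuration. The function name `inference_hours` is hypothetical.

```python
CLIENTS = 18_000_000        # ~18 million unique clients (benchmark table)
THROUGHPUT_PER_GPU = 2_718  # clients/sec per GPU (benchmark table)


def inference_hours(clients, gpus, per_gpu_throughput=THROUGHPUT_PER_GPU):
    """Estimated wall-clock hours to score `clients`, assuming linear GPU scaling."""
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    seconds = clients / (per_gpu_throughput * gpus)
    return seconds / 3600


for gpus in (1, 2, 4, 8):
    print(f"{gpus} GPU(s): {inference_hours(CLIENTS, gpus):.2f} h")
```

Under these assumptions, scoring all ~18 million clients takes roughly 1.84 hours on a single GPU and about a quarter of an hour on eight.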