
Connect Sources

Every entry in the data_sources list needs a data_location block that tells BaseModel where to read from. This page covers the supported backends and the connection parameters for each.

Supported databases

| Database | database_type | Required fields |
| --- | --- | --- |
| Parquet | parquet | path |
| Snowflake | snowflake | user, password, account, warehouse, database, db_schema |
| BigQuery | bigquery | filename |
| Databricks | databricks | host, warehouse_id, (token or client_id + client_secret) |
| Hive | hive | hive_params |
| ClickHouse | clickhouse | host |
| Synapse | synapse | server_name, database_name, user |

Connection pattern

Every data_location block follows the same structure:

```yaml
data_location:
  database_type: ...        # see table above
  connection_params:
    ...                     # engine-specific; see reference
  table_name: my_table
```
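
As an illustration, here is a minimal sketch of the pattern filled in for the Parquet backend. It assumes the required `path` field is placed under `connection_params`; the path and table name are hypothetical placeholders, so check the reference for the exact layout.

```yaml
data_location:
  database_type: parquet
  connection_params:
    path: /data/exports/transactions.parquet   # hypothetical path; assumed to live under connection_params
  table_name: transactions                      # hypothetical table name
```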

Engine-specific details

Each engine has its own required fields, optional parameters, and authentication methods. See the full reference for YAML examples and parameter tables.
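
To make the idea of alternative authentication methods concrete, the sketch below shows a Databricks block using only the required fields listed in the table above. Placing these fields directly under `connection_params` is an assumption, and the environment variable names are hypothetical.

```yaml
data_location:
  database_type: databricks
  connection_params:
    host: ${DATABRICKS_HOST}                  # hypothetical variable names throughout
    warehouse_id: ${DATABRICKS_WAREHOUSE_ID}
    token: ${DATABRICKS_TOKEN}                # option 1: token authentication
    # Option 2: drop token and supply the client pair instead
    # client_id: ${DATABRICKS_CLIENT_ID}
    # client_secret: ${DATABRICKS_CLIENT_SECRET}
  table_name: my_table
```

Only one of the two options should be present, since the table lists them as alternatives.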

Best Practices

Never commit credentials to config files

Do not store passwords or tokens in YAML configuration files. All connection_params values support ${ENV_VAR} syntax, so reference environment variables instead of literal secrets.
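
A minimal sketch of what this can look like for Snowflake, using the required fields from the table above. The environment variable names and the table name are hypothetical, and the non-secret fields are also read from the environment here purely for consistency.

```yaml
data_location:
  database_type: snowflake
  connection_params:
    user: ${SNOWFLAKE_USER}            # hypothetical variable names
    password: ${SNOWFLAKE_PASSWORD}    # secret stays out of the file
    account: ${SNOWFLAKE_ACCOUNT}
    warehouse: ${SNOWFLAKE_WAREHOUSE}
    database: ${SNOWFLAKE_DATABASE}
    db_schema: ${SNOWFLAKE_SCHEMA}
  table_name: my_table
```

The variables need to be set in the environment of the process that loads the config.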

Stick to one database_type per config

All tables within a single foundation model config should use the same backend. If your data lives in multiple engines, either export the smaller tables into the larger one or export everything to Parquet.
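
As a rough sketch, a config that follows this advice after exporting everything to Parquet might contain two sources that share one `database_type`. The paths and table names are hypothetical, the placement of `path` under `connection_params` is an assumption, and any other per-source keys are omitted.

```yaml
data_sources:
  - data_location:                    # other per-source keys omitted
      database_type: parquet
      connection_params:
        path: /data/exports/customers.parquet      # hypothetical path
      table_name: customers
  - data_location:
      database_type: parquet                       # same backend as the first source
      connection_params:
        path: /data/exports/transactions.parquet   # hypothetical path
      table_name: transactions
```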

Test connectivity before training

Verify that your credentials, network access, and table names are correct using your database client before launching a training run.

For the full list of parameters per engine, see the Data Configuration Reference.