[OLD] Connection Parameters
Here, we present example connection parameters for each of supported databases, and database-specific configurations.
Connection to the data source is configured in the Pretrain phase.
Pretrain phase configuration example:
Snowflake
data_sources:
-type: attribute
main_entity_column: UserID
name: customers
data_location:
source: snowflake
connection_params:
user: username,
password: strongpassword123,
account: xy12345.west-europe.azure,
database: EXAMPLE_DB,
schema: EXAMPLE_SCHEMA,
role: ACCOUNT_ADMIN
table_name: customers
disallowed_columns: [CreatedAt]
-type: event
main_entity_column: UserID
name: purchases
date_column: Timestamp
data_location:
source: snowflake
connection_params:
user: username,
password: strongpassword123,
account: xy12345.west-europe.azure,
database: EXAMPLE_DB,
schema: EXAMPLE_SCHEMA,
role: ACCOUNT_ADMIN
table_name: purchases
where_condition: "Timestamp >= today() - 365"
sql_lambda: "TO_DOUBLE(price)"
clickhouse
data_sources:
- type: event
name: product_buy
data_location:
database_type: clickhouse
connection_params:
host: 'HOST_IP'
schema_name: SCHEMA
table_name: TABLE_NAME
where_condition: "action = 'product.buy'"
main_entity_column: client_id
date_column:
name: event_timestamp
column_type_overrides:
title: text
hive
data_sources:
- type: event
name: transactions
main_entity_column: customer_id
date_column:
name: t_dat
format: "yyyy-MM-dd"
disallowed_columns: ["price"]
sql_lambdas:
- alias: price
expression: CAST({{resolve_fn('price')}} as DOUBLE)
data_location:
source: hive
table_name: transactions
where_condition: "t_dat > '2019-11-01'"
connection_params:
# Query string parameters
hive_params:
DSN: SmokeTests
# Path to ini file (optional, can be set via env variable ODBCINI)
ini_file: "/PATH_TO_INI_FILE"
teradata
data_sources:
- type: attribute
main_entity_column: customer_id
name: customers
data_location:
source: teradata
connection_params:
host: "IP_ADDRESS"
user: "USER"
password: "PASS"
dataset_name: DB
table_name: Customers
- type: event
main_entity_column: customer_id
name: transactions
date_column: t_dat
data_location:
source: teradata
connection_params:
host: "IP_ADDRESS
user: "USER"
password: "PASS"
dataset_name: DB
table_name: Transactions
bigquery
{
type:
project_id:
private_key_id:
private_key:
client_email:
client_id:
auth_uri:
token_uri:
auth_provider_x509_cert_url:
client_x509_cert_url:
}
synapse
{
server_name:
database_name:
user:
password:
}
parquet
There is also a possibility to load data from parquet file in this case no connection parameters are needed.
In the Pretrain phase the path to parquet file needs to be provided in the datasource parameters under path
.
Example:
data_sources:
- type: main_entity_attribute
name: customers
data_location:
database_type: parquet
table_name: customers
connection_params:
path: PATH_TO_PARQUET
main_entity_column: customer_id
- type: event
name: transactions
data_location:
database_type: parquet
table_name: transactions
connection_params:
path: PATH_TO_PARQUET
main_entity_column: customer_id
date_column:
name: t_dat
Important:
Data source configuration differs for parquet source, as some operations are available only when working with database connection, e.g.,
sql_lambda
. Parquet data source also does not currently support joins.
Updated 5 months ago