[OLD] Connection Parameters

Here we present example connection parameters for each of the supported databases, together with database-specific configuration options.
The connection to the data source is configured in the Pretrain phase.

Pretrain phase configuration examples:

Snowflake

data_sources:
  - type: attribute
    main_entity_column: UserID
    name: customers
    data_location:
      source: snowflake
      connection_params:
        user: username
        password: strongpassword123
        account: xy12345.west-europe.azure
        database: EXAMPLE_DB
        schema: EXAMPLE_SCHEMA
        role: ACCOUNT_ADMIN
      table_name: customers
    disallowed_columns: [CreatedAt]
  - type: event
    main_entity_column: UserID
    name: purchases
    date_column: Timestamp
    data_location:
      source: snowflake
      connection_params:
        user: username
        password: strongpassword123
        account: xy12345.west-europe.azure
        database: EXAMPLE_DB
        schema: EXAMPLE_SCHEMA
        role: ACCOUNT_ADMIN
      table_name: purchases
      where_condition: "Timestamp >= today() - 365"
      sql_lambda: "TO_DOUBLE(price)"
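Before launching the Pretrain phase, it can be useful to sanity-check the connection parameters. A minimal sketch, assuming the configuration above has been loaded into a Python dict; the `validate_snowflake_params` helper is hypothetical and not part of the platform:

```python
# Hypothetical pre-flight check: verify that a Snowflake data source
# carries all required connection parameters before pretraining starts.
REQUIRED_SNOWFLAKE_KEYS = {"user", "password", "account", "database", "schema", "role"}

def validate_snowflake_params(data_source: dict) -> list:
    """Return the sorted list of missing connection parameter names (empty if valid)."""
    params = data_source.get("data_location", {}).get("connection_params", {})
    return sorted(REQUIRED_SNOWFLAKE_KEYS - params.keys())

# Example: the 'customers' attribute source from the configuration above.
source = {
    "type": "attribute",
    "name": "customers",
    "data_location": {
        "source": "snowflake",
        "connection_params": {
            "user": "username",
            "password": "strongpassword123",
            "account": "xy12345.west-europe.azure",
            "database": "EXAMPLE_DB",
            "schema": "EXAMPLE_SCHEMA",
            "role": "ACCOUNT_ADMIN",
        },
        "table_name": "customers",
    },
}
print(validate_snowflake_params(source))  # → []
```

The same shape of check applies to the other database sources; only the required key set differs.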

ClickHouse

data_sources:
  - type: event
    name: product_buy
    data_location:
      database_type: clickhouse
      connection_params:
        host: 'HOST_IP'
      schema_name: SCHEMA
      table_name: TABLE_NAME
    where_condition: "action = 'product.buy'"
    main_entity_column: client_id
    date_column:
      name: event_timestamp
    column_type_overrides:
       title: text

Hive

data_sources:
  - type: event
    name: transactions
    main_entity_column: customer_id
    date_column: 
      name: t_dat
      format: "yyyy-MM-dd"
    disallowed_columns: ["price"]
    sql_lambdas:  
      - alias: price
        expression: CAST({{resolve_fn('price')}} as DOUBLE)
    data_location:
      source: hive
      table_name: transactions
      where_condition: "t_dat > '2019-11-01'"
      connection_params:
        # Query string parameters
        hive_params:
          DSN: SmokeTests
        # Path to ini file (optional, can be set via env variable ODBCINI) 
        ini_file: "/PATH_TO_INI_FILE"
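The `{{resolve_fn('price')}}` placeholder in the `sql_lambdas` expression is resolved by the platform into a concrete column reference before the SQL is executed. A rough sketch of that substitution, assuming plain template expansion with table-qualified names (the actual resolver is internal to the platform and may qualify columns differently):

```python
import re

def render_sql_lambda(expression: str, table: str) -> str:
    """Illustrative only: replace {{resolve_fn('col')}} placeholders
    with a table-qualified column name."""
    return re.sub(
        r"\{\{\s*resolve_fn\('([^']+)'\)\s*\}\}",
        lambda m: f"{table}.{m.group(1)}",
        expression,
    )

print(render_sql_lambda("CAST({{resolve_fn('price')}} as DOUBLE)", "transactions"))
# → CAST(transactions.price as DOUBLE)
```

Note that the rendered expression is aliased back to `price`, which is why the raw column is listed under `disallowed_columns`: the cast value replaces it.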

Teradata

data_sources:
  - type: attribute
    main_entity_column: customer_id
    name: customers
    data_location:
      source: teradata
      connection_params:
        host: "IP_ADDRESS"
        user: "USER"
        password: "PASS"
      dataset_name: DB
      table_name: Customers
  - type: event
    main_entity_column: customer_id
    name: transactions
    date_column: t_dat
    data_location:
      source: teradata
      connection_params:
        host: "IP_ADDRESS"
        user: "USER"
        password: "PASS"
      dataset_name: DB
      table_name: Transactions

BigQuery

For BigQuery, the connection parameters mirror the fields of a Google Cloud service account key file:

{
    type:   
    project_id: 
    private_key_id: 
    private_key: 
    client_email: 
    client_id: 
    auth_uri: 
    token_uri: 
    auth_provider_x509_cert_url: 
    client_x509_cert_url:  
}

Synapse

{
    server_name: 
    database_name: 
    user: 
    password: 
}

Parquet

It is also possible to load data from a Parquet file; in this case, no database connection is required.

In the Pretrain phase, the path to the Parquet file must be provided in the data source parameters under path.
Example:

data_sources:
  - type: main_entity_attribute
    name: customers
    data_location:
      database_type: parquet
      table_name: customers
      connection_params:
        path: PATH_TO_PARQUET
    main_entity_column: customer_id
  - type: event
    name: transactions
    data_location:
      database_type: parquet
      table_name: transactions
      connection_params:
        path: PATH_TO_PARQUET
    main_entity_column: customer_id
    date_column: 
      name: t_dat
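Since no connection is involved, the one thing worth checking up front is that the configured path actually points at a readable Parquet file. A minimal stdlib sketch; the check relies on the fact that the Parquet format begins and ends with the magic bytes `PAR1` (the helper itself is hypothetical, not part of the platform):

```python
from pathlib import Path

def looks_like_parquet(path: str) -> bool:
    """Cheap pre-flight check: Parquet files start and end with the
    4-byte magic marker b'PAR1'."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size < 8:
        return False
    data = p.read_bytes()
    return data[:4] == b"PAR1" and data[-4:] == b"PAR1"
```

Running this against each configured `path` before pretraining catches typos in the path, truncated uploads, and files that are not Parquet at all.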

🚧

Important:

Data source configuration differs for the Parquet source: some operations, e.g. sql_lambda, are available only when working with a database connection. The Parquet data source also does not currently support joins.