Azure Synapse data sources

Connection parameters in YAML configuration file

⚠️

Check This First!

This article refers to BaseModel accessed via a Docker container. If you are using BaseModel as a Snowflake GUI application, please refer to the Snowflake Native App section.


Data sources are specified in the YAML file used by the pretrain function, and each one is configured by the entries in its data_location section. Below is an example for Synapse that should be adapted to your configuration.

data_location:
  database_type: synapse
  connection_params:
    server_name: "server_name"  # e.g. "tcp:your-server.sql.azuresynapse.net,1234"
    database_name: some_db_name
    user: username
    password: strongpassword123
  schema_name: example_schema
  table_name: some_table

Parameters
  • database_type : str, required
    No default value.
    Information about the database type or source file. All data tables should be stored in the same type.
    Set to: synapse.

  • connection_params : dict, required
    Configures the connection to the database.
    For Synapse, the required keyword arguments are:

    • server_name : str, required
      No default value.
      Your server name. The value can reference an environment variable.
      Examples: "{your_server}.sql.azuresynapse.net", "${SYNAPSE_SERVER_NAME}"

    • database_name : str, required
      No default value.
      The name of your specific database within Azure Synapse (the dedicated SQL pool).
      Example: some_db_name

    • user : str, required
      No default value.
      Specifies the login name of the user for the connection. The value can reference an environment variable.
      Examples: "firstnamelastname", "${SYNAPSE_USER}"

    • password : str, required
      No default value.
      Specifies the password for the specified user. The value can reference an environment variable.
      Examples: "strongpassword123", "${SYNAPSE_PASSWORD}"

  • schema_name : str, required
    No default value.
    Specifies the data schema to use to create features.
    Example: test_schema.

  • table_name : str, required
    No default value.
    Specifies the table to use to create features.
    Example: customers.
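
The required keys listed above can be checked before a run. The following is a minimal sketch, assuming the data_location block has already been loaded into a Python dict; the helper name missing_synapse_keys is hypothetical and is not part of BaseModel.

```python
from __future__ import annotations

# Required keys, taken from the parameter list above.
REQUIRED_TOP_LEVEL = ("database_type", "connection_params", "schema_name", "table_name")
REQUIRED_CONNECTION = ("server_name", "database_name", "user", "password")


def missing_synapse_keys(data_location: dict) -> list[str]:
    """Hypothetical helper: list required keys that are absent or empty."""
    problems = [key for key in REQUIRED_TOP_LEVEL if not data_location.get(key)]

    # Check the nested connection parameters, tolerating a missing block.
    params = data_location.get("connection_params") or {}
    problems += [
        f"connection_params.{key}"
        for key in REQUIRED_CONNECTION
        if not params.get(key)
    ]

    # For a Synapse source, database_type must be set to "synapse".
    db_type = data_location.get("database_type")
    if db_type and db_type != "synapse":
        problems.append("database_type (expected 'synapse')")
    return problems


if __name__ == "__main__":
    location = {
        "database_type": "synapse",
        "connection_params": {
            "server_name": "${SYNAPSE_SERVER_NAME}",
            "database_name": "some_db_name",
            "user": "${SYNAPSE_USER}",
            # "password" is intentionally left out to show a reported problem.
        },
        "schema_name": "example_schema",
        "table_name": "some_table",
    }
    print(missing_synapse_keys(location))
```

An empty list means that every required key is present; the check does not verify that the values are valid or that the connection succeeds.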

The connection_params should be set separately in each data_location block, for each data source.

🚧

Note

For security reasons, avoid putting tokens and Synapse connection values directly in the code; instead, set them as environment variables and reference them, as in the example below.
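
As a pre-flight check, it can help to confirm that every ${NAME} placeholder used in the configuration has a matching environment variable. The sketch below illustrates the idea in plain Python; the function expand_env is hypothetical, and it is not a description of how BaseModel itself resolves these placeholders.

```python
import os
import re

# Matches placeholders of the form ${NAME}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Hypothetical helper: replace each ${NAME} with env[NAME].

    Raises KeyError when a referenced variable is not set, so a missing
    credential is reported early instead of as a failed connection.
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    # A stand-in mapping is used here so the example does not depend on
    # the real environment.
    fake_env = {"SYNAPSE_USER": "firstnamelastname"}
    print(expand_env("${SYNAPSE_USER}", fake_env))
    try:
        expand_env("${SYNAPSE_PASSWORD}", fake_env)
    except KeyError as error:
        print("missing:", error)
```

Passing env explicitly keeps the check easy to test; in normal use the argument can be omitted so that os.environ is consulted.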

Example

The following example demonstrates the connection to Synapse in the context of a simple configuration with two data sources.

data_sources:
  - type: main_entity_attribute
    main_entity_column: UserID
    name: customers
    data_location:
      database_type: synapse
      connection_params:
        server_name: ${SYNAPSE_SERVER_NAME}
        user: ${SYNAPSE_USER}
        password: ${SYNAPSE_PASSWORD}
        database_name: example_name
      schema_name: test_schema
      table_name: customers
    disallowed_columns: [CreatedAt]
  - type: event
    main_entity_column: UserID
    name: purchases
    date_column: Timestamp
    data_location:
      database_type: synapse
      connection_params:
        server_name: ${SYNAPSE_SERVER_NAME}
        user: ${SYNAPSE_USER}
        password: ${SYNAPSE_PASSWORD}
        database_name: example_name
      schema_name: test_schema
      table_name: purchases
    where_condition: "Timestamp >= today() - 365"
    sql_lambdas:
      - alias: price_float
        expression: "TO_DOUBLE(price)"
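
If the configuration is generated programmatically, the repeated connection_params can be produced by a small helper so that each data_location block still carries its own copy. This is only a sketch under that assumption: the helper synapse_location is hypothetical, the dictionaries mirror the example above, and writing the result out as YAML is left out.

```python
import copy


def synapse_location(connection_params, schema_name, table_name):
    """Hypothetical helper: build one data_location block for Synapse."""
    return {
        "database_type": "synapse",
        # Deep copy so that every block has its own connection_params.
        "connection_params": copy.deepcopy(connection_params),
        "schema_name": schema_name,
        "table_name": table_name,
    }


# Placeholders are kept as strings; credentials stay in the environment.
shared_params = {
    "server_name": "${SYNAPSE_SERVER_NAME}",
    "user": "${SYNAPSE_USER}",
    "password": "${SYNAPSE_PASSWORD}",
    "database_name": "example_name",
}

data_sources = [
    {
        "type": "main_entity_attribute",
        "main_entity_column": "UserID",
        "name": "customers",
        "data_location": synapse_location(shared_params, "test_schema", "customers"),
        "disallowed_columns": ["CreatedAt"],
    },
    {
        "type": "event",
        "main_entity_column": "UserID",
        "name": "purchases",
        "date_column": "Timestamp",
        "data_location": synapse_location(shared_params, "test_schema", "purchases"),
    },
]

if __name__ == "__main__":
    for source in data_sources:
        print(source["name"], "->", source["data_location"]["table_name"])
```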

📘

Note

The detailed description of optional fields such as disallowed_columns, where_condition, sql_lambdas, and many others is provided here