Google BigQuery data sources
Connection parameters in YAML configuration file
Check This First!
This article refers to BaseModel accessed via a Docker container. Please refer to the Snowflake Native App section if you are using BaseModel as a Snowflake GUI application.
Data sources are specified in the YAML file used by the pretrain function and are configured by the entries in the data_location section. Below is an example configuration for BigQuery that should be adapted to your setup.
data_location:
  database_type: bigquery
  connection_params:
    filename: /path/to/your/file.json
  schema_name: your_schema_name
  table_name: your_table_name
Parameters
- database_type : str, required
  No default value.
  Information about the database type or source file. All data tables should be stored in the same database type.
  Set to: bigquery.
- connection_params : dict, required
  Configures the connection to the database.
  For BigQuery, the required keyword arguments are:
  - filename : str, required
    No default value.
    The path to the service account JSON file.
    Example: bigquery-user.json.
- schema_name : str, required
  No default value.
  Specifies the data schema to use to create features.
  Example: test_schema.
- table_name : str, required
  No default value.
  Specifies the table to use to create features.
  Example: customers.
The connection_params should be set separately in each data_location block, for each data source.
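As a rough illustration of the requirements above, the following Python sketch checks a data_location block that has already been parsed from YAML into a dict. The function name validate_bigquery_location is hypothetical and is not part of BaseModel; it only mirrors the required keys listed in this article.

```python
from typing import List


def validate_bigquery_location(data_location: dict) -> List[str]:
    """Return a list of problems found in a BigQuery data_location block.

    Hypothetical helper: it checks only the required keys described in
    this article and does not contact BigQuery.
    """
    problems = []

    # database_type must be set to the literal string "bigquery".
    if data_location.get("database_type") != "bigquery":
        problems.append("database_type must be 'bigquery'")

    # connection_params must be a dict holding the service account file path.
    params = data_location.get("connection_params")
    if not isinstance(params, dict) or not isinstance(params.get("filename"), str):
        problems.append("connection_params.filename must be a string path")

    # schema_name and table_name are required strings.
    for key in ("schema_name", "table_name"):
        if not isinstance(data_location.get(key), str):
            problems.append(f"{key} must be a string")

    return problems


example = {
    "database_type": "bigquery",
    "connection_params": {"filename": "/path/to/your/file.json"},
    "schema_name": "your_schema_name",
    "table_name": "your_table_name",
}
print(validate_bigquery_location(example))
```

An empty list means every required key is present with the expected type. Running such a check on each data source separately matches the rule that connection_params is set per data_location block.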
Example
The following example demonstrates the connection to BigQuery in the context of a simple configuration with two data sources.
data_sources:
  - type: main_entity_attribute
    main_entity_column: UserID
    name: customers
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/file.json
      schema_name: your_schema_name
      table_name: customers
    disallowed_columns: [CreatedAt]
  - type: event
    main_entity_column: UserID
    name: purchases
    date_column: Timestamp
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/file.json
      schema_name: your_schema_name
      table_name: purchases
    where_condition: "Timestamp >= today() - 365"
    sql_lambdas:
      - alias: price_float
        expression: "TO_DOUBLE(price)"
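Because each data source carries its own connection_params, it can help to verify the referenced key files before launching pretrain. The sketch below is a pre-flight check written under assumptions, not BaseModel functionality: key_file_paths and check_key_file are hypothetical helpers, the config is assumed to be already parsed into a dict, and the required field names reflect the usual layout of a Google service account key file.

```python
import json
from pathlib import Path
from typing import List, Set

# Fields normally present in a Google service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def key_file_paths(config: dict) -> Set[str]:
    """Collect the distinct connection_params.filename values across data sources."""
    paths = set()
    for source in config.get("data_sources", []):
        params = source.get("data_location", {}).get("connection_params", {})
        if "filename" in params:
            paths.add(params["filename"])
    return paths


def check_key_file(path: str) -> List[str]:
    """Return a list of problems with one key file (empty if none were found)."""
    file = Path(path)
    if not file.is_file():
        return [f"{path}: file not found"]
    try:
        key = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"{path}: not valid JSON"]
    if not isinstance(key, dict):
        return [f"{path}: expected a JSON object"]

    problems = [
        f"{path}: missing field '{name}'"
        for name in REQUIRED_KEY_FIELDS
        if name not in key
    ]
    # A present but different "type" suggests another kind of credential file.
    if key.get("type", "service_account") != "service_account":
        problems.append(f"{path}: type is not 'service_account'")
    return problems
```

A typical use would be to call check_key_file on every path returned by key_file_paths and stop if any problems are reported. The check is purely local: it does not confirm that the account has access to the schema or tables named in the configuration.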
Learn More
A detailed description of optional fields such as disallowed_columns, where_condition, sql_lambdas, and many others is provided here.