Google BigQuery data sources
Connection parameters in YAML configuration file
Check This First!
This article refers to BaseModel accessed via a Docker container. Please refer to the Snowflake Native App section if you are using BaseModel as a Snowflake GUI application.
Various data sources are specified in the YAML file used by the pretrain function and configured by the entries in the data_location section. Below is an example configuration for BigQuery, which should be adapted to your environment:
data_location:
  database_type: bigquery
  connection_params:
    filename: /path/to/your/file.json
    project_id: your_project_id  # optional; set only if different from the project in the service account file
  schema_name: your_schema_name
  table_name: your_table_name
Parameters

- database_type : str, required
  No default value.
  Specifies the type of database. All listed tables must come from the same source type. Set to: bigquery.
- connection_params : dict, required
  Connection configuration. For BigQuery, the following keys are supported:
  - filename : str, required
    No default value.
    Path to the service account JSON file (see example below). Example: bigquery-user.json.
  - project_id : str, optional
    Default value: None.
    Overrides the Google Cloud project ID defined in the service account JSON file. Use this if the BigQuery project you want to use for querying or computation is different from the one the service account belongs to.
- schema_name : str, required
  No default value.
  The schema to use when building features. You may also pass project_id.schema_name if the data resides in a different GCP project than the one used to execute the query. Example: test_schema.
- table_name : str, required
  No default value.
  Specifies the table to use to create features. Example: customers.
Note
The connection_params must be defined individually for each data_location block, even if the same file is reused across multiple sources.
To connect BaseModel to BigQuery, you need a Google Cloud service account key file in JSON format. This file contains credentials that authorize secure, programmatic access to your BigQuery project.
You can generate this file in the Google Cloud Console by creating a service account, assigning the necessary BigQuery roles, and downloading the key as a JSON file.
Below is a sample structure of the file:
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "your-private-key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\nYOUR-PRIVATE-KEY\n-----END PRIVATE KEY-----\n",
  "client_email": "[email protected]",
  "client_id": "your-client-id",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-service-account%40your-project-id.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com"
}
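Before wiring the key file into the YAML configuration, you can confirm that it actually grants BigQuery access with a short check using the official google-cloud-bigquery Python client (a minimal sketch; the file path below is a placeholder):

from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service account key downloaded from the Google Cloud Console
credentials = service_account.Credentials.from_service_account_file("/path/to/your/file.json")

# By default, jobs run in the project_id embedded in the key file
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# A trivial query confirms that the key authorizes BigQuery access
print(list(client.query("SELECT 1 AS ok").result()))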
Examples
The example below demonstrates how to configure multiple BigQuery-based data sources, including main entity and event data:
data_sources:
  - type: main_entity_attribute
    main_entity_column: UserID
    name: customers
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/file.json
      schema_name: your_schema_name
      table_name: customers
    disallowed_columns: [CreatedAt]
  - type: event
    main_entity_column: UserID
    name: purchases
    date_column:
      name: Timestamp
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/file.json
      schema_name: your_schema_name
      table_name: purchases
    where_condition: "Timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 365 DAY)"
    sql_lambdas:
      - alias: price_float
        expression: "CAST(price AS FLOAT64)"
If your data is stored in a project that differs from the one used to execute the query, you can specify both the compute and storage projects explicitly:
data_sources:
  - type: event
    main_entity_column: UserID
    name: purchases
    date_column:
      name: Timestamp
    data_location:
      database_type: bigquery
      connection_params:
        filename: /path/to/your/file.json
        project_id: compute_project_id
      schema_name: data_project_id.data_schema_name
      table_name: purchases
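To illustrate what this split means on the BigQuery side, the following sketch (using the same placeholder project names and the google-cloud-bigquery client; not part of BaseModel itself) runs a query job in the compute project against a table stored in the data project:

from google.cloud import bigquery
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file("/path/to/your/file.json")

# Query jobs run (and are billed) in the compute project, while the table itself
# lives in a different project referenced directly in the SQL.
client = bigquery.Client(credentials=creds, project="compute_project_id")
rows = client.query(
    "SELECT COUNT(*) AS n FROM `data_project_id.data_schema_name.purchases`"
).result()
print(next(iter(rows)).n)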
Learn More
The detailed description of optional fields such as disallowed_columns, where_condition, sql_lambdas, and many others is provided here.