Apache Hive data sources
Connection parameters in YAML configuration file
Check This First!
This article refers to BaseModel accessed via a Docker container. If you are using BaseModel as an SF GUI application, please refer to the Snowflake Native App section.
Data sources are specified in the YAML file used by the pretrain function and are configured by the entries in the data_location section. Below is an example that should be adapted to your configuration.
data_location:
  database_type: hive
  connection_params:
    # Query string parameters
    hive_params:
      DSN: SmokeTests
    # Path to ini file (optional, can be set via env variable ODBCINI)
    ini_file: "/PATH_TO_INI_FILE"
  table_name: some_table
Parameters |
---|
- database_type : str, required
  No default value.
  Information about the database type or source file. All data tables should be stored in the same type.
  Set to: hive.
- connection_params : dict, required
  Configures the connection to the database.
  If the environment variable ODBCINI or the ini_file field in connection_params is set, the connection is configured from the .ini file together with the matching DSN parameter set in the hive_params field.
  Additionally, all parameters specified in the hive_params field override the parameters set in the .ini file.
  Authenticating with Kerberos requires specifying the kerberos_params field in connection_params. Parameters specified in the Kerberos config override those set in the .ini file or in hive_params.
  Keyword arguments are:
  - hive_params : dict
    Specifies connection parameters to Hive. Defined with keywords:
    - DSN : str, optional
      DSN used for the connection.
    - Driver : str, optional
      Path to the connection driver.
    - Port : int, optional
      Connection port.
    - HiveServerType : int, optional
      Type of Hive server.
    - Additional parameters can be found in Apache Hive Configuration Properties. Each parameter should have the prefix SSP_. For example, to set hive.test.mode.samplefreq=100, add SSP_hive.test.mode.samplefreq: 100 to hive_params.
  - ini_file : str, optional
    Path to the .ini file. If not provided, the path is taken from the ODBCINI environment variable.
  - kerberos_params : dict, optional
    Specifies authentication with Kerberos if needed. Defined with keywords:
    - user : str
      Kerberos principal name.
    - realm : str
      Kerberos realm name.
    - kerberos_host : str
      Kerberos service host IP.
    - kerberos_service_name : str
      Kerberos service name, e.g. 'hive'.
    - kerberos_fqdn : str
      Fully qualified domain name.
    - keytab_path : str
      Path to the keytab file.
    - krb5_config_path : str
      Path to the krb5.conf file. Defaults to "/etc/krb5.conf".
    - password : str, optional
      A password in plain text or a path to a file containing the password. The password file works only with the Heimdal Kerberos client. Defaults to None.
    - kerberos_renewal_interval_minutes : int, optional
      Interval in minutes at which to renew the Kerberos ticket. Defaults to 540.
    - verbose : bool, optional
      Whether to print verbose output. Defaults to False.
- table_name : str
  Specifies the table to use to create features. Example: customers.
The connection_params should be set separately in each data_location block, for each data source.
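As a rough sketch of how the hive_params options described above might be combined, the fragment below takes the connection from a DSN in the .ini file, overrides the port, and passes one Hive configuration property through the SSP_ prefix. The port value and the table name are placeholders chosen for illustration, not values prescribed by this guide.

```yaml
data_location:
  database_type: hive
  connection_params:
    hive_params:
      # Must match a DSN entry defined in the .ini file
      DSN: SmokeTests
      # Placeholder value; entries in hive_params override the .ini file
      Port: 10000
      # Hive property hive.test.mode.samplefreq=100, passed with the SSP_ prefix
      SSP_hive.test.mode.samplefreq: 100
    # Optional; if omitted, the path is taken from the ODBCINI environment variable
    ini_file: "/PATH_TO_INI_FILE"
  table_name: some_table
```

Because hive_params takes precedence over the .ini file, a fragment like this should let a shared .ini file stay unchanged while individual data sources adjust only the settings they need.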
Example |
---|
The following example demonstrates the connection to Hive in the context of a simple configuration with two data sources.
data_sources:
  - type: main_entity_attribute
    main_entity_column: UserID
    name: customers
    data_location:
      database_type: hive
      connection_params:
        # Query string parameters
        hive_params:
          DSN: SmokeTests
        # Path to ini file (optional, can be set via env variable ODBCINI)
        ini_file: "/PATH_TO_INI_FILE"
      table_name: customers
    disallowed_columns: [CreatedAt]
  - type: event
    main_entity_column: UserID
    name: purchases
    date_column: Timestamp
    data_location:
      database_type: hive
      connection_params:
        # Query string parameters
        hive_params:
          DSN: SmokeTests
        # Path to ini file (optional, can be set via env variable ODBCINI)
        ini_file: "/PATH_TO_INI_FILE"
      table_name: purchases
    where_condition: "Timestamp >= today() - 365"
    sql_lambdas:
      - alias: price_float
        expression: "TO_DOUBLE(price)"
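The example above authenticates through the DSN alone. For a Kerberos-secured cluster, a data_location block might look like the sketch below. The principal, realm, host IP, FQDN, and keytab path are hypothetical placeholders, and the fragment assumes keytab-based authentication, so password is left out.

```yaml
data_location:
  database_type: hive
  connection_params:
    hive_params:
      DSN: SmokeTests
    ini_file: "/PATH_TO_INI_FILE"
    # Values here override those set in the .ini file or in hive_params
    kerberos_params:
      user: hive_user                          # hypothetical principal name
      realm: EXAMPLE.COM                       # hypothetical realm
      kerberos_host: 10.0.0.5                  # hypothetical service host IP
      kerberos_service_name: hive
      kerberos_fqdn: hive.example.com          # hypothetical FQDN
      keytab_path: "/PATH_TO_KEYTAB_FILE"      # placeholder path
      krb5_config_path: "/etc/krb5.conf"       # documented default
      kerberos_renewal_interval_minutes: 540   # documented default
      verbose: false                           # documented default
  table_name: purchases
```

As with the other connection settings, kerberos_params would need to be repeated in each data_location block that connects to the secured cluster.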
Learn More
A detailed description of optional fields such as disallowed_columns, where_condition, sql_lambdas, and many others is provided here.