Managing space and memory

The memory_constraining_params and query_optimization blocks in the YAML configuration file

⚠️

Check This First!

This article covers BaseModel accessed via a Docker container. If you use BaseModel as a Snowflake GUI application, refer to the Snowflake Native App section instead.


Controlling the model size

Parameters in the memory_constraining_params block determine the size of the model:

  • explicitly by setting model architecture,
  • implicitly by steering the size of the input.

Depending on your infrastructure and data, you may want to reduce the model size or, conversely, increase it.

Parameters
  • hidden_dim : int
    default: 2048
    The size of each hidden layer.

  • num_layers : int
    default: NUM_LAYERS_DEFAULT
    The number of hidden layers. The NUM_LAYERS_DEFAULT constant sets it to 4.

  • emde_quality : float
    default: 1.0
    The quality of the features' density estimation. Lower values produce smaller sketches and therefore a smaller input to the model.

Example
memory_constraining_params:
  hidden_dim: 1024
  emde_quality: 0.8
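
To go in the opposite direction and enlarge the model on hardware with memory to spare, the same block can raise the architecture parameters. The fragment below is only a sketch: the values are illustrative, not recommendations, and it assumes memory_constraining_params is a top-level block of the configuration file, as query_optimization is.

```yaml
memory_constraining_params:
  hidden_dim: 4096   # illustrative: twice the 2048 default
  num_layers: 6      # illustrative: two more than the default of 4
```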

Optimizing query

Parameters in the query_optimization block control how queries are chunked, how many CPUs are used, and how the data is sampled.

Parameters
  • num_query_chunks : int, optional
    default: 1
    The number of chunks a query is divided into. Splitting a query into smaller pieces reduces memory consumption on the database side, which is particularly useful for memory-intensive queries.

  • num_cpus : int, optional
    default: 0
    The number of CPUs used for the fit phase. If left at the default of 0, all available CPUs are used.

  • sampling_params : SamplingParams, optional
    default: SamplingParams
    Sampling parameters that control the size of the sample drawn from the dataset. The sample is used to train the model's features. Changing the default values is not recommended. Keyword arguments are:
    • num_entities : int, optional
      Maximum number of entities to sample. If not provided, the optimal number is calculated automatically.
    • history_limit : int, optional
      Maximum number of events per entity to sample. If not provided, the optimal limit is calculated automatically.

Example
query_optimization:
  num_query_chunks: 4
  num_cpus: 10
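
The sampling settings can presumably be set in the same block. This sketch assumes that sampling_params is written as a nested mapping under query_optimization, with its keyword arguments as keys. The nesting and the values are assumptions for illustration, and the defaults remain the recommended choice.

```yaml
query_optimization:
  num_query_chunks: 4
  sampling_params:            # assumed nesting, not confirmed by this article
    num_entities: 100000      # hypothetical cap on sampled entities
    history_limit: 500        # hypothetical cap on events per entity
```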