
Control of data loading process

data_loader_params blocks in YAML configuration file

⚠️

Check This First!

This article applies to BaseModel accessed via a Docker container. If you are using BaseModel as a Snowflake GUI application, please refer to the Snowflake Native App section instead.


The data_loader_params block lets you set constructor parameters for the PyTorch DataLoader.
These settings control how data is loaded, for example the batch size and the number of worker processes.

Parameters
  • batch_size : int
    default: 256
    The number of samples to load per batch.
  • num_workers : int
    default: 0
    The number of subprocesses to use for data loading. 0 means that the data is loaded in the main process. Increasing the number of workers splits queries into smaller pieces, which reduces memory consumption on the database side.
  • pin_memory : boolean
    default: False
    If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them.
  • drop_last : boolean
    default: False
    Set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size.
  • pin_memory_device : str
    default: None
    The device to pin memory to when pin_memory is True.
  • prefetch_factor : int
    default: 2
    Number of batches loaded in advance by each worker.
Example
data_loader_params:
  batch_size: 256
  num_workers: 5
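
A fuller variant that sets every parameter listed above might look like the sketch below. The values are illustrative assumptions for a GPU training setup, not recommendations, and the "cuda" device string assumes a CUDA-capable machine. In PyTorch, prefetching is performed by the worker processes, so prefetch_factor matters only when num_workers is greater than 0.

```yaml
data_loader_params:
  batch_size: 512            # illustrative value; the default is 256
  num_workers: 4             # illustrative value; 0 loads data in the main process
  pin_memory: true           # copy tensors into pinned memory before returning them
  pin_memory_device: "cuda"  # assumed device name; only relevant when pin_memory is true
  drop_last: true            # discard the final batch if it is smaller than batch_size
  prefetch_factor: 4         # batches loaded in advance by each worker
```

Any parameter left out of the block keeps the default listed in the Parameters section.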