Control of data loading process

⚠️
Check This First!
This article refers to BaseModel accessed via Docker container. Please refer to Snowflake Native App section if you are using BaseModel as SF GUI application.

data_loader_params block allows you to set constructor parameters for PyTorch DataLoader.
These settings modify how the data is loaded, such as batch sizes, workers etc.

Parameters

batch_size : int
default: 256
The size of the batch: how many samples per batch to load.
num_workers : int
default: 0
How many sub-processes to use for data loading. 0 means that the data will be loaded in the main process. Increasing number of workers results in splitting queries into smaller pieces which reduce memory consumption on the database end.
pin_memory : boolean
default: False
If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them.
drop_last : boolean
default: False
Set to True to drop the last incomplete batch if the dataset size is not divisible by.
pin_memory_device : str
default: None
The device memory should be pinned to, if pin_memory is True.
prefetch_factor : int
default: 2
Number of batches loaded in advance by each worker.

Example

data_loader_params:
  batch_size: 256
  num_workers: 5