Snowpark Container Services
Deploy and run BaseModel on Snowflake Snowpark Container Services (SPCS). This guide covers account setup, image registry login, environment provisioning, and job execution.
Prerequisites
- Snowflake account with Snowpark Container Services permissions (compute pools, image repos, services/jobs)
- SQL runner (Snowsight Worksheets, SnowSQL, or Snowflake VS Code extension)
- Docker CLI (for building/pushing images)
- Snowflake CLI (snow) if you want the simplest documented registry auth flow
Docker Login (Image Registry)
Snowflake provides an OCIv2-compliant image registry in your account for pushing/pulling images.
Option A (recommended) — Snowflake CLI login:
Uses your current Snowflake CLI connection and logs Docker into the account registry:
snow spcs image-registry login --connection <connection_name>
Option B — Registry token, then Docker login:
Snowflake CLI can return a registry auth token for your current connection.
snow spcs image-registry token --connection <connection_name> --format json | \
docker login <orgname>-<acctname>.registry.snowflakecomputing.com \
--username <your_snowflake_username> --password-stdin
Recommended: use Snowflake CLI for registry login
The exact username/password pattern depends on how you authenticate (SSO vs password vs PAT). The lowest-friction path is snow spcs image-registry login.
One-Time Environment Setup
Run once with an ACCOUNTADMIN-level role. This creates a workload role, a GPU compute pool, a database/schema/stage, and an image repository.
-- role
CREATE ROLE IF NOT EXISTS monad_role;
GRANT ROLE monad_role TO USER <your_admin_user>;
-- database + schema
CREATE DATABASE IF NOT EXISTS monad_db;
CREATE SCHEMA IF NOT EXISTS monad_db.data_schema;
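-- stage for scripts/configs/output
-- server-side encryption is set because SPCS stage volumes expect an SSE-encrypted internal stage
CREATE STAGE IF NOT EXISTS monad_db.data_schema.monad_stage
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');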
-- image repository (stores container images in Snowflake)
CREATE IMAGE REPOSITORY IF NOT EXISTS monad_db.data_schema.image_repository;
-- GPU compute pool
-- Pick a supported GPU instance family in your account/region.
-- List valid families with: SHOW COMPUTE POOL INSTANCE FAMILIES
CREATE COMPUTE POOL IF NOT EXISTS monad_compute_pool_gpu
MIN_NODES = 1
MAX_NODES = 1
INSTANCE_FAMILY = GPU_NV_S
INITIALLY_SUSPENDED = TRUE
AUTO_SUSPEND_SECS = 300
AUTO_RESUME = TRUE;
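-- grants (minimal baseline; adjust to your org's standards)
GRANT USAGE ON DATABASE monad_db TO ROLE monad_role;
GRANT USAGE ON SCHEMA monad_db.data_schema TO ROLE monad_role;
-- internal stages take READ/WRITE; USAGE applies to external stages only
GRANT READ, WRITE ON STAGE monad_db.data_schema.monad_stage TO ROLE monad_role;
-- the workload role pushes and pulls images and creates job services
GRANT READ, WRITE ON IMAGE REPOSITORY monad_db.data_schema.image_repository TO ROLE monad_role;
GRANT CREATE SERVICE ON SCHEMA monad_db.data_schema TO ROLE monad_role;
GRANT USAGE ON COMPUTE POOL monad_compute_pool_gpu TO ROLE monad_role;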
Push the BaseModel Image
After the repository exists, push your image to the account registry:
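The push commands themselves are not shown here, so the following is only a minimal sketch of how the fully qualified image reference could be assembled and which Docker commands would typically follow. The helper names (`registry_host`, `image_ref`, `push_commands`), the local image name `monad:latest`, and the `myorg`/`myacct` placeholders are assumptions for illustration. The repository path matches the one created in the setup SQL and the image path used in the job specs.

```python
def registry_host(orgname: str, acctname: str) -> str:
    """Account registry hostname, in the same form used for docker login."""
    return f"{orgname}-{acctname}.registry.snowflakecomputing.com".lower()


def image_ref(orgname: str, acctname: str, database: str, schema: str,
              repository: str, image: str, tag: str = "latest") -> str:
    """Build <host>/<db>/<schema>/<repo>/<image>:<tag>."""
    # Docker requires lowercase repository paths; the tag is kept as given.
    path = "/".join([database, schema, repository, image]).lower()
    return f"{registry_host(orgname, acctname)}/{path}:{tag}"


def push_commands(local_image: str, remote_ref: str) -> list[list[str]]:
    """Docker CLI invocations to tag and push an image (built here, not executed)."""
    return [
        ["docker", "tag", local_image, remote_ref],
        ["docker", "push", remote_ref],
    ]


if __name__ == "__main__":
    # "myorg" and "myacct" are placeholders; substitute your own identifiers.
    ref = image_ref("myorg", "myacct", "monad_db", "data_schema",
                    "image_repository", "monad")
    for cmd in push_commands("monad:latest", ref):
        print(" ".join(cmd))
```

Running the printed commands after a successful registry login should place the image at the path referenced by the job specs (`/monad_db/data_schema/image_repository/monad:latest`).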
Event Table (Logs / Telemetry)
SPCS logs and telemetry go to the active event table. If you haven't configured one, Snowflake uses SNOWFLAKE.TELEMETRY.EVENTS by default.
You do not need a custom event table to get started, but you should verify your telemetry configuration and where you want logs to land. See Snowflake docs: Event Table for details.
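To check where logs are landing, one option is to query the event table for a given job. The sketch below only renders the SQL text. The function name `job_log_query` and the job name `monad_pretrain_job` are hypothetical, and it assumes the job was created with an unquoted name (stored uppercase) and that the default event table is in use.

```python
import re

# Simple unquoted-identifier check so the job name is safe to embed in SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def job_log_query(job_name: str,
                  event_table: str = "SNOWFLAKE.TELEMETRY.EVENTS",
                  hours: int = 1,
                  limit: int = 100) -> str:
    """Render a query for recent container log records of one job service."""
    if not _IDENT.match(job_name):
        raise ValueError(f"not a simple identifier: {job_name!r}")
    # Assumption: the job name was unquoted at creation, so it is stored uppercase.
    service = job_name.upper()
    return (
        "SELECT timestamp, value\n"
        f"FROM {event_table}\n"
        f"WHERE timestamp > DATEADD(hour, -{int(hours)}, CURRENT_TIMESTAMP())\n"
        '  AND resource_attributes:"snow.service.name" = ' + f"'{service}'\n"
        "  AND record_type = 'LOG'\n"
        "ORDER BY timestamp DESC\n"
        f"LIMIT {int(limit)};"
    )


if __name__ == "__main__":
    print(job_log_query("monad_pretrain_job"))
```

Paste the printed statement into your SQL runner; if you have configured a custom event table, pass its name as `event_table`.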
Running Jobs
A typical job flow:
- Upload scripts and configs to a Snowflake stage
- Write a job spec (YAML)
- Execute the job via SQL (job service)
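The three steps above can be sketched as SQL statements rendered from Python. This is an illustration under assumptions, not a prescribed workflow: the helper names, the job name `monad_pretrain_job`, the local file paths, and the `specs/` stage directory are hypothetical. The stage and compute pool names come from the setup section.

```python
def put_statement(local_path: str, stage: str, stage_dir: str) -> str:
    """Render a PUT that uploads one local file to a stage directory."""
    # AUTO_COMPRESS=FALSE keeps scripts and YAML readable from the mounted stage.
    return (f"PUT file://{local_path} @{stage}/{stage_dir} "
            "AUTO_COMPRESS=FALSE OVERWRITE=TRUE;")


def execute_job_statement(job_name: str, compute_pool: str,
                          stage: str, spec_file: str) -> str:
    """Render EXECUTE JOB SERVICE for a spec file that lives on a stage."""
    return (
        "EXECUTE JOB SERVICE\n"
        f"  IN COMPUTE POOL {compute_pool}\n"
        f"  NAME = {job_name}\n"
        f"  FROM @{stage}\n"
        f"  SPECIFICATION_FILE = '{spec_file}';"
    )


if __name__ == "__main__":
    stage = "monad_db.data_schema.monad_stage"
    # Steps 1 and 2: upload the script and the job spec (hypothetical local paths).
    print(put_statement("/tmp/pretrain.py", stage, "scripts"))
    print(put_statement("/tmp/pretrain.yaml", stage, "specs"))
    # Step 3: run the job on the GPU pool created during setup.
    print(execute_job_statement("monad_pretrain_job", "monad_compute_pool_gpu",
                                stage, "specs/pretrain.yaml"))
```

Rendering the statements as text keeps the sketch independent of any particular client; you can paste them into SnowSQL or submit them through whichever connector you already use.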
Script Wrapper
Wrap BaseModel calls in a CLI so the job spec can pass arguments:
import argparse
from pathlib import Path

from monad.ui import pretrain


def parse_args() -> argparse.Namespace:
    """Parse the CLI arguments supplied by the job spec."""
    p = argparse.ArgumentParser()
    p.add_argument("--config", type=Path, required=True)
    p.add_argument("--features-path", type=Path, required=True)
    p.add_argument("--storage-config", type=Path, required=False)
    # --resume and --overwrite cannot be combined
    rerun = p.add_mutually_exclusive_group()
    rerun.add_argument("--resume", action="store_true", default=False)
    rerun.add_argument("--overwrite", action="store_true", default=False)
    return p.parse_args()


if __name__ == "__main__":
    params = parse_args()
    pretrain(
        config_path=params.config,
        output_path=params.features_path,
        storage_config_path=params.storage_config,
        resume=params.resume,
        overwrite=params.overwrite,
    )
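To see how the wrapper's arguments behave without importing BaseModel, the parser can be exercised on its own. The sketch below rebuilds the same parser in a hypothetical `build_parser` helper and feeds it an argument list like the one a job spec would pass. The `--resume` flag is added here purely for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Same arguments as the wrapper script, without the BaseModel import."""
    p = argparse.ArgumentParser()
    p.add_argument("--config", type=Path, required=True)
    p.add_argument("--features-path", type=Path, required=True)
    p.add_argument("--storage-config", type=Path, required=False)
    rerun = p.add_mutually_exclusive_group()
    rerun.add_argument("--resume", action="store_true", default=False)
    rerun.add_argument("--overwrite", action="store_true", default=False)
    return p


if __name__ == "__main__":
    job_args = [
        "--config", "/app/monad_stage/configs/fm_config.yaml",
        "--features-path", "/app/monad_stage/fm_output",
        "--resume",
    ]
    params = build_parser().parse_args(job_args)
    # argparse maps --features-path to the attribute features_path.
    print(params.features_path, params.resume, params.overwrite)

    # Passing both flags of the mutually exclusive group makes argparse exit.
    try:
        build_parser().parse_args(job_args + ["--overwrite"])
    except SystemExit as exc:
        print("rejected, exit code:", exc.code)
```

Because `--resume` and `--overwrite` share a mutually exclusive group, a job spec that passes both fails at argument parsing, before any training work starts.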
Job Specs
The spec defines containers, environment, command, resource requests, and stage-mounted volumes. All BaseModel jobs share the same structure — only args and resource limits change.
# Pretraining job (pretrain.py)
spec:
  containers:
  - name: main
    image: /monad_db/data_schema/image_repository/monad:latest
    env:
      SNOWFLAKE_WAREHOUSE: monad_warehouse
      SNOWFLAKE_DATABASE: HM_KAGGLE
      SNOWFLAKE_SCHEMA: PUBLIC
    command: ["python"]
    args:
    - /app/monad_stage/scripts/pretrain.py
    - --config
    - /app/monad_stage/configs/fm_config.yaml
    - --features-path
    - /app/monad_stage/fm_output
    resources:
      limits: { nvidia.com/gpu: 1 }
      requests: { nvidia.com/gpu: 1 }
    volumeMounts:
    - name: monad-stage
      mountPath: /app/monad_stage
    - name: dev-shm
      mountPath: /dev/shm
  volumes:
  - name: monad-stage
    source: stage
    stageConfig:
      name: "@monad_db.data_schema.monad_stage"
    uid: 1000
    gid: 1000
  - name: dev-shm
    source: memory
    size: 48Gi
# Training job (train.py)
spec:
  containers:
  - name: main
    image: /monad_db/data_schema/image_repository/monad:latest
    env:
      SNOWFLAKE_WAREHOUSE: monad_warehouse
      SNOWFLAKE_DATABASE: HM_KAGGLE
      SNOWFLAKE_SCHEMA: PUBLIC
    command: ["python"]
    args:
    - /app/monad_stage/scripts/train.py
    - --checkpoint-dir
    - /app/monad_stage/fm_output
    - --save-path
    - /app/monad_stage/scenario_output
    resources:
      limits: { nvidia.com/gpu: 1 }
      requests: { nvidia.com/gpu: 1 }
    volumeMounts:
    - name: monad-stage
      mountPath: /app/monad_stage
    - name: dev-shm
      mountPath: /dev/shm
  volumes:
  - name: monad-stage
    source: stage
    stageConfig:
      name: "@monad_db.data_schema.monad_stage"
    uid: 1000
    gid: 1000
  - name: dev-shm
    source: memory
    size: 48Gi
# Prediction job (predict.py)
spec:
  containers:
  - name: main
    image: /monad_db/data_schema/image_repository/monad:latest
    env:
      SNOWFLAKE_WAREHOUSE: monad_warehouse
      SNOWFLAKE_DATABASE: HM_KAGGLE
      SNOWFLAKE_SCHEMA: PUBLIC
    command: ["python"]
    args:
    - /app/monad_stage/scripts/predict.py
    - --save-path
    - /app/monad_stage/predictions
    - --checkpoint-dir
    - /app/monad_stage/scenario_output/checkpoints
    resources:
      limits: { nvidia.com/gpu: 1 }
      requests: { nvidia.com/gpu: 1 }
    volumeMounts:
    - name: monad-stage
      mountPath: /app/monad_stage
    - name: dev-shm
      mountPath: /dev/shm
  volumes:
  - name: monad-stage
    source: stage
    stageConfig:
      name: "@monad_db.data_schema.monad_stage"
    uid: 1000
    gid: 1000
  - name: dev-shm
    source: memory
    size: 48Gi
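Since the three specs differ only in their args, one way to avoid copy-paste drift is to render them from a single template. The sketch below is an illustration under assumptions: `render_job_spec` and its parameters are hypothetical names, and the placement of `uid`/`gid` at the volume level is an assumption about the spec layout. All other values are copied from the specs above.

```python
def render_job_spec(script: str, script_args: list[str],
                    gpus: int = 1, shm_size: str = "48Gi") -> str:
    """Render a job spec as YAML text; only args and resources vary per job."""
    # The script path is the first arg passed to the "python" command.
    arg_lines = "\n".join(f"    - {a}" for a in [script, *script_args])
    return f"""spec:
  containers:
  - name: main
    image: /monad_db/data_schema/image_repository/monad:latest
    env:
      SNOWFLAKE_WAREHOUSE: monad_warehouse
      SNOWFLAKE_DATABASE: HM_KAGGLE
      SNOWFLAKE_SCHEMA: PUBLIC
    command: ["python"]
    args:
{arg_lines}
    resources:
      limits: {{ nvidia.com/gpu: {gpus} }}
      requests: {{ nvidia.com/gpu: {gpus} }}
    volumeMounts:
    - name: monad-stage
      mountPath: /app/monad_stage
    - name: dev-shm
      mountPath: /dev/shm
  volumes:
  - name: monad-stage
    source: stage
    stageConfig:
      name: "@monad_db.data_schema.monad_stage"
    uid: 1000
    gid: 1000
  - name: dev-shm
    source: memory
    size: {shm_size}
"""


if __name__ == "__main__":
    print(render_job_spec(
        "/app/monad_stage/scripts/predict.py",
        ["--save-path", "/app/monad_stage/predictions",
         "--checkpoint-dir", "/app/monad_stage/scenario_output/checkpoints"],
    ))
```

A plain string template is used rather than a YAML library so the sketch has no third-party dependencies; the rendered text can be written to a file and uploaded to the stage alongside the scripts.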