
Snowpark Container Services

Deploy and run BaseModel on Snowflake Snowpark Container Services (SPCS). This guide covers account setup, image registry login, environment provisioning, and job execution.

Prerequisites

  • Snowflake account with Snowpark Container Services permissions (compute pools, image repos, services/jobs)
  • SQL runner (Snowsight Worksheets, SnowSQL, or Snowflake VS Code extension)
  • Docker CLI (for building/pushing images)
  • Snowflake CLI (snow), which provides the simplest documented registry login flow

Docker Login (Image Registry)

Snowflake provides an OCIv2-compliant image registry in your account for pushing/pulling images.

Option A (recommended) — Snowflake CLI login:

Uses your current Snowflake CLI connection and logs Docker into the account registry.

bash
snow spcs image-registry login --connection <connection_name>

Option B — Registry token, then Docker login:

Snowflake CLI can return a registry auth token for your current connection.

bash
snow spcs image-registry token --connection <connection_name> --format json | \
  docker login <orgname>-<acctname>.registry.snowflakecomputing.com \
    --username <your_snowflake_username> --password-stdin

Recommended: use Snowflake CLI for registry login

The username and password Docker expects depend on how you authenticate to Snowflake (SSO, password, or programmatic access token). snow spcs image-registry login handles this for you, so it is the lowest-friction path.
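If you script registry login (for example in CI), a thin Python wrapper around Option A could look like the following. It is a minimal sketch that only shells out to the snow command shown above; the connection name is your own, and registry_login_cmd and registry_login are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def registry_login_cmd(connection: str) -> List[str]:
    """Option A as an argument list: log Docker into the account registry
    through the named Snowflake CLI connection."""
    return ["snow", "spcs", "image-registry", "login", "--connection", connection]


def registry_login(connection: str) -> None:
    """Run Option A. Raises RuntimeError if snow is not installed and
    subprocess.CalledProcessError if the login itself fails."""
    if shutil.which("snow") is None:
        raise RuntimeError("Snowflake CLI (snow) not found on PATH")
    # check=True turns a non-zero exit code into an exception
    subprocess.run(registry_login_cmd(connection), check=True)
```

Keeping the command in its own function lets you log or test it without actually running the login.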

One-Time Environment Setup

Run once with an ACCOUNTADMIN-level role. This creates a workload role, a GPU compute pool, a database/schema/stage, and an image repository.

SQL
-- role
CREATE ROLE IF NOT EXISTS monad_role;
GRANT ROLE monad_role TO USER <your_admin_user>;

-- database + schema
CREATE DATABASE IF NOT EXISTS monad_db;
CREATE SCHEMA IF NOT EXISTS monad_db.data_schema;

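-- stage for scripts/configs/output
-- (stage volume mounts require an internal stage with server-side encryption)
CREATE STAGE IF NOT EXISTS monad_db.data_schema.monad_stage
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');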

-- image repository (stores container images in Snowflake)
CREATE IMAGE REPOSITORY IF NOT EXISTS monad_db.data_schema.image_repository;

-- GPU compute pool
-- Pick a supported GPU instance family in your account/region.
-- List valid families with: SHOW COMPUTE POOL INSTANCE FAMILIES
CREATE COMPUTE POOL IF NOT EXISTS monad_compute_pool_gpu
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = GPU_NV_S
  INITIALLY_SUSPENDED = TRUE
  AUTO_SUSPEND_SECS = 300
  AUTO_RESUME = TRUE;

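-- warehouse for SQL issued by the job (SNOWFLAKE_WAREHOUSE in the job specs)
CREATE WAREHOUSE IF NOT EXISTS monad_warehouse
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- grants (minimal baseline; adjust to your org's standards)
GRANT USAGE ON DATABASE monad_db TO ROLE monad_role;
GRANT USAGE ON SCHEMA monad_db.data_schema TO ROLE monad_role;
GRANT CREATE SERVICE ON SCHEMA monad_db.data_schema TO ROLE monad_role;
-- internal stages take READ/WRITE; USAGE applies only to external stages
GRANT READ, WRITE ON STAGE monad_db.data_schema.monad_stage TO ROLE monad_role;
GRANT READ ON IMAGE REPOSITORY monad_db.data_schema.image_repository TO ROLE monad_role;
GRANT USAGE, MONITOR ON COMPUTE POOL monad_compute_pool_gpu TO ROLE monad_role;
GRANT USAGE ON WAREHOUSE monad_warehouse TO ROLE monad_role;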
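Once the script has run, a few read-only statements confirm that the objects and grants exist. The sketch below runs them through any DB-API style cursor; it assumes you use snowflake-connector-python (whose cursor provides execute and fetchall) but only takes the cursor as a parameter, and SETUP_CHECKS and verify_setup are hypothetical names.

```python
from typing import Any, Dict, Protocol, Sequence


class Cursor(Protocol):
    """The two cursor methods this sketch needs; snowflake-connector-python's
    cursor (like other DB-API cursors) provides both."""

    def execute(self, command: str) -> Any: ...

    def fetchall(self) -> Sequence[Any]: ...


# Read-only statements covering the objects created by the setup script.
SETUP_CHECKS: Dict[str, str] = {
    "compute_pool": "DESCRIBE COMPUTE POOL monad_compute_pool_gpu",
    "stage": "SHOW STAGES IN SCHEMA monad_db.data_schema",
    "image_repository": "SHOW IMAGE REPOSITORIES IN SCHEMA monad_db.data_schema",
    "role_grants": "SHOW GRANTS TO ROLE monad_role",
}


def verify_setup(cur: Cursor) -> Dict[str, int]:
    """Run each check and return its row count. A count of 0 (or an error
    from DESCRIBE) suggests a missing object or a missing grant."""
    counts: Dict[str, int] = {}
    for label, statement in SETUP_CHECKS.items():
        cur.execute(statement)
        counts[label] = len(cur.fetchall())
    return counts
```

With the Snowflake connector, pass conn.cursor() from a snowflake.connector.connect() connection opened with your usual parameters.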

Push the BaseModel Image

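After the repository exists, tag your local image with the repository URL and push it to the account registry. SPCS runs linux/amd64 images, so build the image for that platform. The exact repository URL (in lowercase) is in the repository_url column of SHOW IMAGE REPOSITORIES IN SCHEMA monad_db.data_schema.

bash
docker tag <local_image>:<tag> <orgname>-<acctname>.registry.snowflakecomputing.com/monad_db/data_schema/image_repository/monad:<tag>
docker push <orgname>-<acctname>.registry.snowflakecomputing.com/monad_db/data_schema/image_repository/monad:<tag>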
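The push target and the image path used in the job specs come from the same names. This sketch derives both; it assumes the lowercase <orgname>-<acctname> host format shown above, and registry_host and image_refs are hypothetical helpers. If the result ever disagrees with the repository_url column of SHOW IMAGE REPOSITORIES, trust the Snowflake output.

```python
from typing import Tuple


def registry_host(org: str, account: str) -> str:
    """Registry hostname in the <orgname>-<acctname> form used above (lowercased)."""
    return f"{org}-{account}.registry.snowflakecomputing.com".lower()


def image_refs(org: str, account: str, database: str, schema: str,
               repository: str, image: str, tag: str) -> Tuple[str, str]:
    """Return (push_target, spec_image) for one image.

    push_target: full URL for docker tag / docker push.
    spec_image:  the /<db>/<schema>/<repo>/<image>:<tag> form used in job specs.
    """
    path = f"{database}/{schema}/{repository}".lower()
    push_target = f"{registry_host(org, account)}/{path}/{image}:{tag}"
    spec_image = f"/{path}/{image}:{tag}"
    return push_target, spec_image
```

With the names from the setup script and tag "latest", spec_image is /monad_db/data_schema/image_repository/monad:latest, the image path used in the job specs.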

Event Table (Logs / Telemetry)

SPCS logs and telemetry go to the active event table. If you haven't configured one, Snowflake uses SNOWFLAKE.TELEMETRY.EVENTS by default.

You do not need a custom event table to get started, but check which event table is active in your account and whether that is where you want job logs to land. See the Snowflake documentation on event tables for details.
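To read a job's output, you can query the event table or ask for recent container logs directly. This sketch only builds the SQL text. The RESOURCE_ATTRIBUTES key snow.service.name and RECORD_TYPE = 'LOG' follow the pattern in Snowflake's SPCS logging documentation and are assumptions to check against your own event table; the function names are hypothetical.

```python
import re

# Plain unquoted Snowflake identifier; names are spliced into SQL text,
# so anything else is rejected.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def _check_identifier(name: str) -> None:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")


def event_table_log_query(service_name: str,
                          event_table: str = "SNOWFLAKE.TELEMETRY.EVENTS",
                          hours: int = 1, limit: int = 100) -> str:
    """Recent LOG records for one service or job from the event table.
    Compares names case-insensitively (assumption about stored casing)."""
    _check_identifier(service_name)
    return (
        "SELECT timestamp, value\n"
        f"FROM {event_table}\n"
        f"WHERE timestamp > DATEADD(hour, -{int(hours)}, CURRENT_TIMESTAMP())\n"
        "  AND record_type = 'LOG'\n"
        f"  AND UPPER(resource_attributes:\"snow.service.name\"::STRING) = '{service_name.upper()}'\n"
        "ORDER BY timestamp DESC\n"
        f"LIMIT {int(limit)}"
    )


def live_log_query(job_name: str, container: str = "main", lines: int = 100) -> str:
    """Recent output read directly from the job's container via
    SYSTEM$GET_SERVICE_LOGS; instance 0 is a single-instance job."""
    _check_identifier(job_name)
    _check_identifier(container)
    return f"SELECT SYSTEM$GET_SERVICE_LOGS('{job_name}', 0, '{container}', {int(lines)})"
```

The container name "main" matches the name field in the job specs.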

Running Jobs

A typical job flow:

  1. Upload scripts and configs to a Snowflake stage
  2. Write a job spec (YAML)
  3. Execute the job with an EXECUTE JOB SERVICE statement, which runs the spec on the compute pool
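Steps 1 and 3 come down to two SQL statements: PUT to upload files to the stage and EXECUTE JOB SERVICE to run a spec on the compute pool. The sketch below builds them as strings for whichever SQL runner you use; put_statement and execute_job_statement are hypothetical helpers, and the stage and pool names match the setup script.

```python
import re

STAGE = "@monad_db.data_schema.monad_stage"
COMPUTE_POOL = "monad_compute_pool_gpu"
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def put_statement(local_file: str, stage_dir: str) -> str:
    """Upload one local file (pass an absolute path) to a stage folder.
    AUTO_COMPRESS = FALSE keeps scripts and configs uncompressed, so the
    container sees them under their original names."""
    return (
        f"PUT file://{local_file} {STAGE}/{stage_dir} "
        "AUTO_COMPRESS = FALSE OVERWRITE = TRUE"
    )


def execute_job_statement(job_name: str, spec_yaml: str) -> str:
    """Run a job service from an inline YAML spec wrapped in $$...$$."""
    if not _IDENT.fullmatch(job_name):
        raise ValueError(f"not a plain identifier: {job_name!r}")
    if "$$" in spec_yaml:
        raise ValueError("spec contains '$$'; upload it as a stage file instead")
    return (
        "EXECUTE JOB SERVICE\n"
        f"  IN COMPUTE POOL {COMPUTE_POOL}\n"
        f"  NAME = {job_name}\n"
        f"  FROM SPECIFICATION $$\n{spec_yaml}\n$$"
    )
```

Run PUT from a client that supports it, such as SnowSQL; Snowsight worksheets use a file-upload dialog instead. Without ASYNC = TRUE, EXECUTE JOB SERVICE waits for the job to finish. Job service names must be unique within the schema, so you may need DROP SERVICE before re-running a job under the same name.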

Script Wrapper

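Wrap each BaseModel call in a small command-line script so the job spec can pass paths and flags as arguments. The pretraining wrapper, uploaded to the stage as scripts/pretrain.py, looks like this: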

Python
import argparse
from pathlib import Path

from monad.ui import pretrain


def parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Run BaseModel pretraining")
    # Model/training configuration file (read from the mounted stage)
    p.add_argument("--config", type=Path, required=True)
    # Directory where pretraining writes its output (passed as output_path)
    p.add_argument("--features-path", type=Path, required=True)
    # Optional storage configuration file; defaults to None
    p.add_argument("--storage-config", type=Path, required=False)

    # --resume and --overwrite cannot be passed together
    rerun = p.add_mutually_exclusive_group()
    rerun.add_argument("--resume", action="store_true")
    rerun.add_argument("--overwrite", action="store_true")

    return p.parse_args()


if __name__ == "__main__":
    params = parse_args()
    pretrain(
        config_path=params.config,
        output_path=params.features_path,
        storage_config_path=params.storage_config,
        resume=params.resume,
        overwrite=params.overwrite,
    )
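To check that a spec's args line up with the wrapper before spending GPU time, you can feed the spec's argument list to the same parser locally. This sketch repeats the parser without the monad import so it runs anywhere; the argv parameter and SPEC_ARGS are additions for illustration.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Same options as the wrapper above; argv=None falls back to sys.argv.
    p = argparse.ArgumentParser()
    p.add_argument("--config", type=Path, required=True)
    p.add_argument("--features-path", type=Path, required=True)
    p.add_argument("--storage-config", type=Path, required=False)
    rerun = p.add_mutually_exclusive_group()
    rerun.add_argument("--resume", action="store_true")
    rerun.add_argument("--overwrite", action="store_true")
    return p.parse_args(argv)


# The args list from the pretraining spec, minus the script path itself.
SPEC_ARGS = [
    "--config", "/app/monad_stage/configs/fm_config.yaml",
    "--features-path", "/app/monad_stage/fm_output",
]

params = parse_args(SPEC_ARGS)
```

argparse turns dashes into underscores, so --features-path arrives as params.features_path. Passing both --resume and --overwrite makes argparse exit with status 2.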

Job Specs

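The spec defines the container image, environment variables, command and arguments, GPU resource requests, and the volumes mounted from the stage and from memory. All three BaseModel jobs share this structure; only the script and its arguments differ, and you can tune resource requests per job. The dev-shm volume is memory-backed, so keep its size within the memory of your compute pool's instance family.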

Pretraining job (scripts/pretrain.py), writing to fm_output:

YAML
spec:
  containers:
    - name: main
      image: /monad_db/data_schema/image_repository/monad:latest
      env:
        SNOWFLAKE_WAREHOUSE: monad_warehouse
        SNOWFLAKE_DATABASE: HM_KAGGLE
        SNOWFLAKE_SCHEMA: PUBLIC
      command: ["python"]
      args:
        - /app/monad_stage/scripts/pretrain.py
        - --config
        - /app/monad_stage/configs/fm_config.yaml
        - --features-path
        - /app/monad_stage/fm_output
      resources:
        limits:   { nvidia.com/gpu: 1 }
        requests: { nvidia.com/gpu: 1 }
      volumeMounts:
        - name: monad-stage
          mountPath: /app/monad_stage
        - name: dev-shm
          mountPath: /dev/shm

  volumes:
    - name: monad-stage
      source: stage
      stageConfig:
        name: "@monad_db.data_schema.monad_stage"
      uid: 1000
      gid: 1000
    - name: dev-shm
      source: memory
      size: 48Gi
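
Training job (scripts/train.py), reading the pretraining output from fm_output and writing to scenario_output:

YAML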
spec:
  containers:
    - name: main
      image: /monad_db/data_schema/image_repository/monad:latest
      env:
        SNOWFLAKE_WAREHOUSE: monad_warehouse
        SNOWFLAKE_DATABASE: HM_KAGGLE
        SNOWFLAKE_SCHEMA: PUBLIC
      command: ["python"]
      args:
        - /app/monad_stage/scripts/train.py
        - --checkpoint-dir
        - /app/monad_stage/fm_output
        - --save-path
        - /app/monad_stage/scenario_output
      resources:
        limits:   { nvidia.com/gpu: 1 }
        requests: { nvidia.com/gpu: 1 }
      volumeMounts:
        - name: monad-stage
          mountPath: /app/monad_stage
        - name: dev-shm
          mountPath: /dev/shm

  volumes:
    - name: monad-stage
      source: stage
      stageConfig:
        name: "@monad_db.data_schema.monad_stage"
      uid: 1000
      gid: 1000
    - name: dev-shm
      source: memory
      size: 48Gi
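
Prediction job (scripts/predict.py), reading checkpoints from scenario_output/checkpoints and writing to predictions:

YAML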
spec:
  containers:
    - name: main
      image: /monad_db/data_schema/image_repository/monad:latest
      env:
        SNOWFLAKE_WAREHOUSE: monad_warehouse
        SNOWFLAKE_DATABASE: HM_KAGGLE
        SNOWFLAKE_SCHEMA: PUBLIC
      command: ["python"]
      args:
        - /app/monad_stage/scripts/predict.py
        - --save-path
        - /app/monad_stage/predictions
        - --checkpoint-dir
        - /app/monad_stage/scenario_output/checkpoints
      resources:
        limits:   { nvidia.com/gpu: 1 }
        requests: { nvidia.com/gpu: 1 }
      volumeMounts:
        - name: monad-stage
          mountPath: /app/monad_stage
        - name: dev-shm
          mountPath: /dev/shm

  volumes:
    - name: monad-stage
      source: stage
      stageConfig:
        name: "@monad_db.data_schema.monad_stage"
      uid: 1000
      gid: 1000
    - name: dev-shm
      source: memory
      size: 48Gi
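Because the three specs differ only in the script and its arguments, you can generate them from one template instead of maintaining three copies. This minimal sketch returns plain dictionaries with the same keys as the YAML specs; build_job_spec and JOBS are hypothetical names, and serializing to YAML is left to a library such as PyYAML.

```python
from typing import Any, Dict, List

STAGE_MOUNT = "/app/monad_stage"
IMAGE = "/monad_db/data_schema/image_repository/monad:latest"


def build_job_spec(script: str, args: List[str], gpus: int = 1,
                   shm_size: str = "48Gi") -> Dict[str, Any]:
    """Return a job spec shaped like the YAML specs in this guide.
    Only the script, its arguments, the GPU count and the /dev/shm size vary."""
    return {
        "spec": {
            "containers": [{
                "name": "main",
                "image": IMAGE,
                "env": {
                    "SNOWFLAKE_WAREHOUSE": "monad_warehouse",
                    "SNOWFLAKE_DATABASE": "HM_KAGGLE",
                    "SNOWFLAKE_SCHEMA": "PUBLIC",
                },
                "command": ["python"],
                "args": [f"{STAGE_MOUNT}/scripts/{script}", *args],
                "resources": {
                    "limits": {"nvidia.com/gpu": gpus},
                    "requests": {"nvidia.com/gpu": gpus},
                },
                "volumeMounts": [
                    {"name": "monad-stage", "mountPath": STAGE_MOUNT},
                    {"name": "dev-shm", "mountPath": "/dev/shm"},
                ],
            }],
            "volumes": [
                {"name": "monad-stage", "source": "stage",
                 "stageConfig": {"name": "@monad_db.data_schema.monad_stage"},
                 "uid": 1000, "gid": 1000},
                {"name": "dev-shm", "source": "memory", "size": shm_size},
            ],
        }
    }


# The three jobs, chained through folders on the stage.
JOBS = {
    "pretrain": build_job_spec("pretrain.py", [
        "--config", f"{STAGE_MOUNT}/configs/fm_config.yaml",
        "--features-path", f"{STAGE_MOUNT}/fm_output"]),
    "train": build_job_spec("train.py", [
        "--checkpoint-dir", f"{STAGE_MOUNT}/fm_output",
        "--save-path", f"{STAGE_MOUNT}/scenario_output"]),
    "predict": build_job_spec("predict.py", [
        "--save-path", f"{STAGE_MOUNT}/predictions",
        "--checkpoint-dir", f"{STAGE_MOUNT}/scenario_output/checkpoints"]),
}
```

With PyYAML installed, yaml.safe_dump(JOBS["train"], sort_keys=False) produces spec text you can upload to the stage or inline in an EXECUTE JOB SERVICE statement.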