Target Function: Operations on Events

The input transformations allowed in target functions

⚠️

Check This First!

This article refers to BaseModel accessed via a Docker container. If you are using BaseModel as a Snowflake GUI application, please refer to the Snowflake Native App section instead.


Target functions transform events (history and future) and attributes into one or more labels for supervised learning, suited to a given business scenario.


You can build the following types of targets in BaseModel:

| target type | use cases | expected label shape & notes |
|---|---|---|
| binary | churn, conversion, fraud | single binary value (0/1), usually returned as a length-1 array (e.g. np.array([label], dtype=np.float32)) |
| multilabel | category / brand propensity | vector of binary indicators (0/1), one per label in a fixed list; multiple 1s allowed |
| multiclass | next category, dominant segment | vector representing exactly one class; typically a one-hot vector or a normalized probability distribution (sums to 1) |
| regression | spend, margin, frequency | single float value, typically returned as a length-1 array |
| recommendation | next item, next basket | sketch-based label derived from events rather than an explicit numeric vector |
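
To make the label shapes in the table concrete, here is a minimal NumPy sketch of the four numeric label types. The brand list and the values are hypothetical and only illustrate shape and dtype. The recommendation label is left out because it is derived from events rather than built by hand.

```python
import numpy as np

# Hypothetical fixed label list, used only for illustration.
BRANDS = ["Nike", "Adidas", "Puma"]

# binary: a single 0/1 value in a length-1 float array
binary_label = np.array([1], dtype=np.float32)

# multilabel: one 0/1 indicator per brand; several may be 1 at once
multilabel_label = np.array([1, 0, 1], dtype=np.float32)

# multiclass: exactly one class, expressed here as a one-hot vector
multiclass_label = np.zeros(len(BRANDS), dtype=np.float32)
multiclass_label[BRANDS.index("Adidas")] = 1.0

# regression: a single float value in a length-1 array
regression_label = np.array([129.99], dtype=np.float32)
```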

Operations On Events

The following operations can be called directly on event objects, for both history and future.

Link to API reference for class DataSourceEvents


count()

Calculates the number of events.

Signature

count() -> int

Example

# 1 = churned (no purchases in the future window), 0 = retained
churn = 0 if future["purchases"].count() > 0 else 1
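
As a sketch of how this expression could feed a binary label, the snippet below wraps it in a function that returns a length-1 float array. `StubEvents` and `churn_label` are hypothetical names introduced so the example runs on its own. The stub only mimics `count()` and is not the BaseModel events class, and a real target function may take additional arguments.

```python
import numpy as np


class StubEvents:
    """Hypothetical stand-in that exposes only count(); not the BaseModel class."""

    def __init__(self, n_events: int) -> None:
        self._n_events = n_events

    def count(self) -> int:
        return self._n_events


def churn_label(future) -> np.ndarray:
    # 1 = churned (no purchases in the future window), 0 = retained
    churn = 0 if future["purchases"].count() > 0 else 1
    return np.array([churn], dtype=np.float32)


active_customer = {"purchases": StubEvents(3)}
lapsed_customer = {"purchases": StubEvents(0)}
```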


sum()

Sums values from a column or a computed expression.

Signature

sum(column: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], ignore_nan: bool = True) -> float

Examples

# sum of a column
total_spend = future["transactions"].sum(column="purchase_value")

# sum of a computed expression (price * quantity)
total_spend = future["transactions"].sum(
    column=lambda data: data["price"] * data["quantity"],
)
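
The signature says the callable receives a mapping from column names to NumPy arrays and returns an array. The sketch below runs the same lambda on plain NumPy data to show what such an expression evaluates to. The column values are hypothetical, and the final `sum` here is plain NumPy rather than the BaseModel method.

```python
import numpy as np

# Hypothetical column data for three transaction lines.
data = {
    "price": np.array([10.0, 4.5, 2.0]),
    "quantity": np.array([2.0, 4.0, 1.0]),
}

# Same expression as in the example above.
line_value = lambda d: d["price"] * d["quantity"]

values = line_value(data)          # one value per event
total_spend = float(values.sum())  # plain NumPy sum of the expression
```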

mean()

Computes the mean from a column or a computed expression.

Signature

mean(column: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], ignore_nan: bool = True) -> float

Examples

# mean of a column
avg_price = future["transactions"].mean(column="price")

# mean of a computed expression (price * quantity)
avg_line_value = future["transactions"].mean(
    column=lambda data: data["price"] * data["quantity"],
)

min()

Returns the minimum from a column or a computed expression.

Signature

min(column: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], ignore_nan: bool = True) -> float

Example

min_line_value = future["transactions"].min(
    column=lambda data: data["price"] * data["quantity"],
)

max()

Returns the maximum from a column or a computed expression.

Signature

max(column: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], ignore_nan: bool = True) -> float

Example

max_line_value = future["transactions"].max(
    column=lambda data: data["price"] * data["quantity"],
)
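
The scalar aggregations above return a float, while a regression label is typically a length-1 array. One possible way to bridge the two is sketched below. `as_regression_label` is a hypothetical helper, and the fallback for non-finite values is a defensive design choice rather than documented behavior, since the source does not state what the aggregations return for an empty window.

```python
import math

import numpy as np


def as_regression_label(value: float, fallback: float = 0.0) -> np.ndarray:
    """Wrap a scalar in a length-1 float32 array, replacing NaN/inf with fallback."""
    if not math.isfinite(value):
        value = fallback
    return np.array([value], dtype=np.float32)


label = as_regression_label(12.5)
guarded = as_regression_label(float("nan"))
```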

apply()

Applies a function element-wise to a target column.

Signature

apply(func: Callable[[Any], Any], target: str) -> DataSourceEvents

Example

normalized = future["product_buy"].apply(
    func=lambda x: x.lower(),
    target="brand",
)
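
Conceptually, an element-wise apply calls the function once per value of the target column. The plain NumPy sketch below illustrates that idea on hypothetical brand values. It is not the BaseModel implementation.

```python
import numpy as np

# Hypothetical raw values of the "brand" column.
brand = np.array(["Nike", "NIKE", "adidas"])

func = lambda x: x.lower()

# Element-wise application: one call of func per value.
normalized_brand = np.array([func(x) for x in brand])

# After normalization, spelling variants collapse into one value.
distinct_brands = sorted(set(normalized_brand.tolist()))
```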

filter()

Filters events by applying condition to each value of by, where by is either:

  • a column name (str), or
  • a callable expression producing an array aligned with events.

Signature

filter(by: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], condition: Callable[[Any], bool]) -> DataSourceEvents

Examples

# filter using a column name
promo_only = future["transactions"].filter(
    by="campaign_id",
    condition=lambda x: x in campaigns_in_scope,
)

# filter using a computed expression (price * quantity)
large_lines = future["transactions"].filter(
    by=lambda data: data["price"] * data["quantity"],
    condition=lambda x: x >= 100.0,
)
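
A filter of this kind can be pictured as a boolean mask over the events. The sketch below reproduces the second example on plain NumPy arrays, assuming that events for which `condition` returns True are the ones kept. The data is hypothetical, and the real method returns a DataSourceEvents object rather than a dict.

```python
import numpy as np

# Hypothetical column data for three transaction lines.
data = {
    "price": np.array([60.0, 5.0, 25.0]),
    "quantity": np.array([2.0, 3.0, 4.0]),
}

by = lambda d: d["price"] * d["quantity"]
condition = lambda x: x >= 100.0

values = by(data)  # one value per event
mask = np.array([bool(condition(v)) for v in values])

# Keep only the rows where the condition holds (assumed semantics).
kept = {name: column[mask] for name, column in data.items()}
```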

groupBy()

Groups events by one or more columns and returns an EventsGroupBy proxy.

Signature

groupBy(by: str | list[str]) -> EventsGroupBy

⚠️

Note

groupBy only produces output when followed by a grouped operation (e.g. count, sum, exists, ...).
A full list of supported operations, along with examples, is provided in the subsection below.

Example

present, brands = (
    future["product_buy"]
    .groupBy("brand")
    .exists(groups=["Nike", "Adidas"])
)

⚠️

Note

All aggregations (sum, mean, min, max) ignore NaN values by default (ignore_nan=True).
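
To illustrate what ignoring NaN means in practice, the sketch below uses NumPy's NaN-aware functions as an analogue of `ignore_nan=True`. This is an illustration with hypothetical data, not a statement about how BaseModel computes the values internally. The last line assumes that a NaN-propagating aggregation behaves like plain `np.mean`.

```python
import numpy as np

# Hypothetical price column with one missing value.
prices = np.array([10.0, np.nan, 30.0])

# NaN-aware aggregations skip the missing value.
nan_aware = {
    "sum": float(np.nansum(prices)),
    "mean": float(np.nanmean(prices)),
    "min": float(np.nanmin(prices)),
    "max": float(np.nanmax(prices)),
}

# A regular aggregation propagates NaN instead.
nan_propagating_mean = float(np.mean(prices))
```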


Operations On Grouped Events

Grouped operations are applied to the result of groupBy(...) (an EventsGroupBy object).

Link to API reference for class EventsGroupBy

count()

Counts elements in each group.

Signature

count(normalize: bool = False, groups: list[Any] | None = None) -> tuple[np.ndarray, list[str]]

Example

counts, brands = (
    future["purchases"]
    .groupBy("brand")
    .count(normalize=True, groups=["Garmin", "Suunto"])
)
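
The returned array can be turned into a multiclass label, for example a one-hot vector marking the dominant brand. The sketch below uses a hypothetical stand-in array in place of the real `counts` result. `dominant_one_hot` is an illustrative helper, not part of BaseModel.

```python
import numpy as np


def dominant_one_hot(counts: np.ndarray) -> np.ndarray:
    """Return a one-hot vector for the largest entry of counts."""
    label = np.zeros(len(counts), dtype=np.float32)
    if counts.sum() > 0:
        label[int(np.argmax(counts))] = 1.0
    # When no group has events the vector stays all zeros;
    # the caller decides how to treat that case.
    return label


# Stand-in for the array returned for ["Garmin", "Suunto"].
counts = np.array([0.25, 0.75])
label = dominant_one_hot(counts)
```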

sum()

Sums a column or a computed expression within each group.

Signature

sum(target: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], groups: list[Any] | None = None, ignore_nan: bool = True) -> tuple[np.ndarray, list[str]]

Examples

# sum of a column per category
spend_by_cat, categories = (
    future["transactions"]
    .groupBy("category")
    .sum(target="purchase_value")
)

# sum of computed expression per category
spend_by_cat, categories = (
    future["transactions"]
    .groupBy("category")
    .sum(target=lambda data: data["price"] * data["quantity"])
)
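
Grouped operations return a pair: an array of values and the list of group names in matching order. The sketch below shows one way to pair them up and to scale per-category spend into a distribution that sums to 1. The arrays are hypothetical stand-ins for the returned values, and `to_distribution` is an illustrative helper.

```python
import numpy as np


def to_distribution(values: np.ndarray) -> np.ndarray:
    """Scale non-negative per-group totals so that they sum to 1."""
    total = float(values.sum())
    if total <= 0.0:
        # No spend at all: return zeros and let the caller handle it.
        return np.zeros_like(values, dtype=np.float32)
    return (values / total).astype(np.float32)


# Stand-ins for the (values, group names) pair returned by a grouped sum.
spend_by_cat = np.array([30.0, 10.0, 60.0])
categories = ["shoes", "bags", "apparel"]

share = to_distribution(spend_by_cat)
share_by_category = dict(zip(categories, share.tolist()))
```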

mean()

Computes the mean of a column or a computed expression within each group.

Signature

mean(target: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], groups: list[Any] | None = None, ignore_nan: bool = True) -> tuple[np.ndarray, list[str]]

Example

avg_line_value_by_brand, brands = (
    future["transactions"]
    .groupBy("brand")
    .mean(target=lambda data: data["price"] * data["quantity"])
)

min()

Computes the minimum of a column or a computed expression within each group.

Signature

min(target: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], groups: list[Any] | None = None, ignore_nan: bool = True) -> tuple[np.ndarray, list[str]]

Example

min_line_value_by_cat, categories = (
    future["transactions"]
    .groupBy("category")
    .min(target=lambda data: data["price"] * data["quantity"])
)

max()

Computes the maximum of a column or a computed expression within each group.

Signature

max(target: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], groups: list[Any] | None = None, ignore_nan: bool = True) -> tuple[np.ndarray, list[str]]

Example

max_line_value_by_store, stores = (
    future["transactions"]
    .groupBy("store")
    .max(target=lambda data: data["price"] * data["quantity"])
)

exists()

Checks whether each provided group has any events.

Signature

exists(groups: list[Any]) -> tuple[np.ndarray, list[str]]

Example

present, target_brands = (
    future["transactions"]
    .groupBy("brand")
    .exists(groups=TARGET_BRANDS)
)
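
The result of `exists` maps naturally onto a multilabel target with one indicator per brand. The sketch below assumes the returned array holds boolean (or 0/1) flags in the same order as the group names. Both the brand list and the `present` array are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical fixed label list.
TARGET_BRANDS = ["Nike", "Adidas", "Puma"]

# Stand-in for the array returned by exists(groups=TARGET_BRANDS).
present = np.array([True, False, True])

# Multilabel target: one 0/1 indicator per brand, several may be 1.
multilabel = present.astype(np.float32)
```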

apply()

Applies a custom reducer to each group’s values.

Signature

apply(func: Callable[[np.ndarray], Any], default_value: Any, target: str | Callable[[Mapping[str, np.ndarray]], np.ndarray], groups: list[Any] | None = None) -> tuple[Any, list[str]]

Example

import numpy as np

max_line_value, brands = (
    future["transactions"]
    .groupBy("brand")
    .apply(
        func=lambda x: float(np.nanmax(x)),
        default_value=0.0,
        target=lambda data: data["price"] * data["quantity"],
        groups=["Garmin", "Suunto"],
    )
)
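
The sketch below mimics this example in plain NumPy to show the role of each argument. It assumes that `default_value` is what a requested group receives when it has no events, which the signature suggests but the source does not state explicitly. The data is hypothetical.

```python
import numpy as np

# Hypothetical events: brand and computed line value per event.
brand = np.array(["Garmin", "Polar", "Garmin"])
line_value = np.array([300.0, 150.0, np.nan])

groups = ["Garmin", "Suunto"]
func = lambda x: float(np.nanmax(x))
default_value = 0.0

# Reduce each requested group; fall back to default_value when it is empty.
result = [
    func(line_value[brand == g]) if np.any(brand == g) else default_value
    for g in groups
]
```

Here "Garmin" is reduced over its two values with the NaN ignored by `np.nanmax`, while "Suunto" has no events and receives the default.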

⚠️

Note

All grouped aggregations (sum, mean, min, max) ignore NaN values by default (ignore_nan=True).