Binary Classification Target Function

⚠️
Check This First!
This article refers to BaseModel accessed via a Docker container. If you are using BaseModel as a Snowflake GUI application, please refer to the Snowflake Native App section instead.

On this page we will look at some examples of target functions for binary classification: a machine learning task in which a model assigns each observation (entity) to one of two classes, commonly designated as 0 or 1.

These models help to answer all sorts of yes/no questions by outputting the probabilities of the classes, e.g.:

Will the customer lapse ("churn") or not,
Will the patient show up for the appointment or not,
Is the message spam or not,
Will the shopper spend more than a certain amount or not, etc.

Standard Template for Binary Classification Target Functions

Each target function for a classification problem will:

accept history, future, entity and ctx as parameters, as described here,
specify the target time window and transform the inputs, as explained in this section,
output a one-dimensional numpy array of float32 data type with a size of 1.

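import numpy as np
from typing import Dict

# Events and Attributes are BaseModel types; import them as in your BaseModel setup.

def target_fn(_history: Events, future: Events, _entity: Attributes, _ctx: Dict) -> np.ndarray:

    # trim future to the desired time window

    # transform the events into the desired target, e.g. a single-element list [0] or [1]

    return np.array(target, dtype=np.float32)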
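The return contract can be checked with plain numpy. The snippet below is a minimal sketch, not part of BaseModel: the helper name to_binary_target is hypothetical and only illustrates that the value passed to np.array should be a one-element sequence, so that the result is one-dimensional with a size of 1.

```python
import numpy as np


def to_binary_target(flag: bool) -> np.ndarray:
    """Wrap a yes/no outcome in a one-element float32 array (hypothetical helper)."""
    return np.array([1.0 if flag else 0.0], dtype=np.float32)


positive = to_binary_target(True)
negative = to_binary_target(False)

# One-dimensional, size 1, float32: the shape the template asks for.
assert positive.shape == (1,) and positive.dtype == np.float32
assert negative[0] == 0.0

# A bare scalar would produce a zero-dimensional array instead.
assert np.array(1, dtype=np.float32).ndim == 0
```

Wrapping the label in a list before calling np.array is what keeps the output one-dimensional.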

We will now explore these transformations through two examples of target functions for binary classification problems.

Is a customer likely to churn?

An ideal model would return the label of 1 for every customer who would lapse, and 0 for all who would continue to engage. We should start by establishing a clear criterion related to events in history and future that would enable us to label customers like that.

The steps are as follows:

For efficiency, we first exclude customers with no transaction events in history, as they have already churned,
We then flag the remaining customers (who had transaction events in history) with:
- no transaction events in future as 1 (= "will churn"),
- some transaction events in future as 0 (= "will not churn").
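The labelling logic in these steps can be sketched in plain Python. This is an illustration under stated assumptions rather than BaseModel code: SimpleEvents is a hypothetical stand-in for the Events type (a dict mapping an event type to a list of timestamps in seconds), the "transactions" key, split_ts and window_days are illustrative names, and returning None to exclude a customer is an assumption about how exclusion is signalled.

```python
from typing import Dict, List, Optional

import numpy as np

# Hypothetical stand-in for BaseModel's Events: event type -> event timestamps (seconds).
SimpleEvents = Dict[str, List[float]]

SECONDS_PER_DAY = 86400


def churn_target(
    history: SimpleEvents,
    future: SimpleEvents,
    split_ts: float,
    window_days: int,
) -> Optional[np.ndarray]:
    # Step 1: exclude customers with no transactions in history (already churned).
    # Assumption: None marks the customer as excluded.
    if not history.get("transactions"):
        return None

    # Trim future to the target window [split_ts, split_ts + window_days).
    window_end = split_ts + window_days * SECONDS_PER_DAY
    in_window = [t for t in future.get("transactions", []) if split_ts <= t < window_end]

    # Step 2: no future transactions -> 1 ("will churn"), otherwise 0.
    churn = 0.0 if in_window else 1.0
    return np.array([churn], dtype=np.float32)


# Usage: a customer who bought before the split but not within the next 30 days.
label = churn_target({"transactions": [10.0]}, {"transactions": []}, split_ts=100.0, window_days=30)
assert label.tolist() == [1.0]
```

Checking history before touching future keeps the early exit cheap, which matches the efficiency motivation of the first step.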


Will a customer spend more than a certain amount on a category?

Let's look at a more complex use case: targeting customers most likely to purchase products from a list of categories and spend above a certain threshold. The steps here are as follows:

First, exclude all purchase events outside the focus product categories,
Sum the spend amount (the appropriate column of transaction events in future) by customer,
Compare the sum with the expected threshold and flag it as 0 if it is below the threshold, and as 1 if it is equal to or above it.
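The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not BaseModel code: each future transaction is represented by a hypothetical dict with "category" and "price" keys, and the function name spend_target and its parameters are assumptions made for the example.

```python
from typing import Dict, List, Set, Union

import numpy as np

# Hypothetical stand-in for one transaction event in future.
Transaction = Dict[str, Union[str, float]]


def spend_target(
    future_transactions: List[Transaction],
    target_categories: Set[str],
    threshold: float,
) -> np.ndarray:
    # Step 1: keep only purchases in the focus product categories.
    in_focus = [tx for tx in future_transactions if tx["category"] in target_categories]

    # Step 2: sum the spend amount over the remaining purchases.
    total_spend = sum(float(tx["price"]) for tx in in_focus)

    # Step 3: 1 if the total reaches the threshold, 0 if it stays below it.
    label = 1.0 if total_spend >= threshold else 0.0
    return np.array([label], dtype=np.float32)


# Usage: 40 + 60 spent on shoes meets a threshold of 100; the toys purchase is ignored.
purchases = [
    {"category": "shoes", "price": 40.0},
    {"category": "toys", "price": 100.0},
    {"category": "shoes", "price": 60.0},
]
assert spend_target(purchases, {"shoes"}, 100.0).tolist() == [1.0]
```

Filtering before summing ensures that spend in unrelated categories cannot push a customer over the threshold.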
