Regression Target Function

Predicting a numerical value


Check This First!

This article refers to BaseModel accessed via a Docker container. If you are using BaseModel as a Snowflake GUI application, please refer to the Snowflake Native App section instead.


On this subpage, we will look at some examples of target functions for regression, a statistical technique used to predict a continuous outcome variable from one or more independent variables (often referred to as predictors or features). Regression also helps in understanding the relationships between variables and in forecasting future trends.

This type of modeling is used extensively in fields such as finance, marketing, healthcare, and the social sciences for forecasting, trend analysis, and decision-making. Here are a few use cases where we may want to apply regression modeling to behavioral data:

  • In retail or gaming, assess Customer Lifetime Value based on past engagement and purchase behavior,
  • In social media / digital platforms, predict user engagement (number of posts, likes, comments etc.),
  • In finance, forecast retirement savings based on investment and spending behavior,
  • In insurance, forecast claim amounts based on past events.

In all these cases, we want to predict a particular numerical value rather than a class, a product, etc.

Standard Template for Regression Target Functions

Each target function for a regression problem will:

  • accept as parameters history, future, entity and ctx, as described here,
  • perform some transformation on these inputs, as explained in this section,
  • output a one-dimensional numpy array of float32 data type and a size of 1 (the value to predict).

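import numpy as np
from typing import Dict

# Events and Attributes are BaseModel's own data types.
def target_fn(_history: Events, future: Events, _entity: Attributes, _ctx: Dict) -> np.ndarray:

    # transformation of events into the desired target (a single number)

    # wrap the value in a list so the result is a 1-D array of size 1
    return np.array([target], dtype=np.float32)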
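The output contract (one-dimensional, float32, size 1) is easy to get subtly wrong: calling `np.array(value, dtype=np.float32)` on a bare scalar yields a zero-dimensional array of shape `()`, not shape `(1,)`. Below is a minimal sketch of a hypothetical helper, `check_regression_target` (it is not part of BaseModel), that you could call on a target function's output while developing it.

```python
import numpy as np


def check_regression_target(target: np.ndarray) -> None:
    """Hypothetical helper: raise ValueError if `target` is not a
    one-dimensional float32 numpy array holding exactly one value."""
    if not isinstance(target, np.ndarray):
        raise ValueError(f"expected np.ndarray, got {type(target).__name__}")
    if target.ndim != 1 or target.size != 1:
        raise ValueError(f"expected shape (1,), got {target.shape}")
    if target.dtype != np.float32:
        raise ValueError(f"expected float32, got {target.dtype}")


# A bare scalar produces a 0-d array, which fails the check...
try:
    check_regression_target(np.array(42.0, dtype=np.float32))
except ValueError as err:
    print(err)  # expected shape (1,), got ()

# ...while wrapping the value in a list produces the expected shape (1,).
check_regression_target(np.array([42.0], dtype=np.float32))
```

The dtype check matters too: `np.array([42.0])` without an explicit `dtype` defaults to float64 and would be rejected.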

We will now explore these transformations through three example functions for regression problems.

Predict customer spend value, number of purchased items or shopping visits

In regression problems, rather than 1s, 0s, or probabilities, we expect the model to output a particular value. Thus, the target we give it is either the specific column containing the value we seek or a way to calculate that value. This is how we would implement it in code in this case:

  • Assumption: we have trained a foundation model on an event data source containing customer purchases; it includes basket_id (identifying distinct visits), item quantity, and price.
  • For quantity, we can simply point the model to the column representing the number of items in future events,
  • Since we do not have total spend as a column, we need to multiply the price and quantity columns (cast as numpy arrays) and sum the result to get the total monetary amount,
  • Finally, we can count the distinct basket_id values with numpy's unique() to target the model on predicting the number of visits.
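The three bullets above can be sketched as follows. This is a minimal, framework-free sketch under stated assumptions: this article does not show BaseModel's `Events` interface, so `future` is replaced by a hypothetical dict of equal-length numpy arrays (one entry per event), and the column names `basket_id`, `quantity` and `price` as well as the function names are illustrative only.

```python
import numpy as np
from typing import Dict

# Hypothetical stand-in for BaseModel's `future` events: a dict of equal-length
# numpy arrays, one element per purchase event. The column names are assumed.
FutureEvents = Dict[str, np.ndarray]


def target_quantity(future: FutureEvents) -> np.ndarray:
    # Total number of items purchased in the future window.
    total = np.sum(np.asarray(future["quantity"], dtype=np.float64))
    return np.array([total], dtype=np.float32)


def target_spend(future: FutureEvents) -> np.ndarray:
    # No total-spend column: multiply price by quantity per event, then sum.
    price = np.asarray(future["price"], dtype=np.float64)
    quantity = np.asarray(future["quantity"], dtype=np.float64)
    return np.array([np.sum(price * quantity)], dtype=np.float32)


def target_visits(future: FutureEvents) -> np.ndarray:
    # Each distinct basket_id counts as one shopping visit.
    visits = np.unique(np.asarray(future["basket_id"])).size
    return np.array([visits], dtype=np.float32)


# Made-up example: three baskets (visits) and four purchase events.
future = {
    "basket_id": np.array([101, 101, 102, 103]),
    "quantity": np.array([2, 1, 3, 1]),
    "price": np.array([5.0, 10.0, 2.5, 20.0]),
}
print(target_quantity(future))  # [7.]
print(target_spend(future))     # [47.5]
print(target_visits(future))    # [3.]
```

To use one of these in BaseModel, keep the `target_fn(_history, future, _entity, _ctx)` signature from the template and replace the dict lookups with the corresponding column access on `future`. Summing in float64 before casting to float32 is a design choice that limits rounding error when many events are aggregated; an empty future window yields a target of 0 for all three functions.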

Check the detailed recipe for all three functions here:
