Skip to content

Predict In-Game Purchase for New Players

Task type: BinaryClassificationTask Industry: Gaming / Mobile Games

The first few sessions are critical for monetisation in free-to-play games. Identifying new players who are likely to make their first in-game purchase allows game designers to optimize onboarding flows, and marketing teams to deliver personalized offers at the moment of highest purchase intent — before the player loses interest.

What makes this advanced? Lifecycle and session counting — tracks player progression through tutorial and sessions, counts sessions to define "new player" boundary, filters future purchases within that session window.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): sessions, in_game_purchases, tutorial_completion

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument Type Description
history Events All events before the temporal split.
future Events All events after the temporal split.
attributes Attributes Static entity attributes.
ctx Dict Context dictionary containing SPLIT_TIMESTAMP, data mode, etc.

The function must return one of:

  • np.array([1], dtype=np.float32)positive case
  • np.array([0], dtype=np.float32)negative case
  • Noneexclude this entity from training

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window


# === Configuration ===
MAX_SESSIONS_NEW_PLAYER = 5
SESSIONS_DATA_SOURCE = "sessions"
PURCHASES_DATA_SOURCE = "in_game_purchases"
TUTORIAL_DATA_SOURCE = "tutorial_completion"

def in_game_purchase_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict if a new player makes an in-game purchase within first 5 sessions."""

    # 1. Exclude players who haven't completed the tutorial
    if (
        len(history[TUTORIAL_DATA_SOURCE]) + len(future[TUTORIAL_DATA_SOURCE]) == 0
    ):
        return None

    # 2. Only target new players (fewer than 5 sessions so far)
    sessions_so_far = len(history[SESSIONS_DATA_SOURCE])
    if sessions_so_far >= MAX_SESSIONS_NEW_PLAYER:
        return None

    # 3. If already purchased in history, label as positive
    if len(history[PURCHASES_DATA_SOURCE]) > 0:
        return np.array([1], dtype=np.float32)

    # 4. Ensure enough future sessions exist to reach 5 total
    remaining_sessions = MAX_SESSIONS_NEW_PLAYER - sessions_so_far
    if len(future[SESSIONS_DATA_SOURCE]) < remaining_sessions:
        return None

    # 5. Check for purchases between first and last qualifying session
    start = future[SESSIONS_DATA_SOURCE].timestamps[0]
    end = future[SESSIONS_DATA_SOURCE].timestamps[:remaining_sessions][-1]

    purchases_in_window = future[PURCHASES_DATA_SOURCE].filter(
        by="timestamps", condition=lambda ts: start < ts < end
    )

    return np.array([1 if len(purchases_in_window) > 0 else 0], dtype=np.float32)

Step-by-Step Breakdown

① Check tutorial completion

Python
if (
    len(history[TUTORIAL_DATA_SOURCE]) + len(future[TUTORIAL_DATA_SOURCE]) == 0
):
    return None

Players who never completed the tutorial are excluded from training. They did not experience the core game loop, so their purchase behavior is not representative of the target population.

② Filter to new players only

Python
sessions_so_far = len(history[SESSIONS_DATA_SOURCE])
if sessions_so_far >= MAX_SESSIONS_NEW_PLAYER:
    return None

Only players with fewer than 5 historical sessions qualify as "new." Experienced players are excluded because the model specifically targets early-lifecycle monetisation.

③ Handle early purchasers

Python
if len(history[PURCHASES_DATA_SOURCE]) > 0:
    return np.array([1], dtype=np.float32)

If the player has already made a purchase before the split timestamp, they are immediately labeled positive. This captures players who converted very early in their lifecycle.

④ Validate session count

Python
remaining_sessions = MAX_SESSIONS_NEW_PLAYER - sessions_so_far
if len(future[SESSIONS_DATA_SOURCE]) < remaining_sessions:
    return None

The function requires enough future sessions to reach the 5-session threshold. Players who churned before completing 5 sessions are excluded to avoid labeling dropouts as non-purchasers.

⑤ Check for purchases in session window

Python
start = future[SESSIONS_DATA_SOURCE].timestamps[0]
end = future[SESSIONS_DATA_SOURCE].timestamps[:remaining_sessions][-1]

purchases_in_window = future[PURCHASES_DATA_SOURCE].filter(
    by="timestamps", condition=lambda ts: start < ts < end
)

return np.array([1 if len(purchases_in_window) > 0 else 0], dtype=np.float32)

The purchase window spans from the first future session to the session that completes the 5-session threshold. Any in-game purchase within this window results in a positive label.


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=in_game_purchase_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="recall", metric_name="Recall"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Metric Why it matters
AUROC Measures overall ranking quality.
AUPRC More informative when the positive class is rare.
Recall Proportion of actual positives caught.
Precision Proportion of predicted positives that are correct.
F1 Score Harmonic mean of precision and recall.

Production Tips

  1. Tune the session threshold for your game genre. 5 sessions is a starting point, but casual games may see purchases within 2-3 sessions while complex RPGs may need 10+. Analyze your conversion funnel to find the optimal window.
  2. Consider session duration, not just count. A player who plays 5 sessions of 2 minutes each is very different from one who plays 5 sessions of 30 minutes. Adding session duration as a feature can improve prediction quality.
  3. Segment by acquisition channel. Players acquired through paid ads vs organic discovery have very different purchase propensities. Consider training separate models or adding the acquisition source as a feature.
  4. Monitor for tutorial completion rate changes. Game updates that change the tutorial can shift the population of tutorial completers, affecting model performance. Retrain after major tutorial changes.
  5. Validate against actual revenue data. Cross-reference predicted purchasers with actual revenue to ensure the model captures high-value conversions, not just low-value impulse buys.