Predict In-Game Purchase for New Players
Task type: BinaryClassificationTask
Industry: Gaming / Mobile Games
The first few sessions are critical for monetisation in free-to-play games. Identifying new players who are likely to make their first in-game purchase allows game designers to optimize onboarding flows, and marketing teams to deliver personalized offers at the moment of highest purchase intent — before the player loses interest.
What makes this advanced? Lifecycle and session counting — tracks player progression through tutorial and sessions, counts sessions to define "new player" boundary, filters future purchases within that session window.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data source(s):
sessions,in_game_purchases,tutorial_completion
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
history |
Events |
All events before the temporal split. |
future |
Events |
All events after the temporal split. |
attributes |
Attributes |
Static entity attributes. |
ctx |
Dict |
Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
The function must return one of:
np.array([1], dtype=np.float32)— positive casenp.array([0], dtype=np.float32)— negative caseNone— exclude this entity from training
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict
from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window
# === Configuration ===
MAX_SESSIONS_NEW_PLAYER = 5
SESSIONS_DATA_SOURCE = "sessions"
PURCHASES_DATA_SOURCE = "in_game_purchases"
TUTORIAL_DATA_SOURCE = "tutorial_completion"
def in_game_purchase_target_fn(
history: Events,
future: Events,
attributes: Attributes,
ctx: Dict,
) -> np.ndarray | None:
"""Predict if a new player makes an in-game purchase within first 5 sessions."""
# 1. Exclude players who haven't completed the tutorial
if (
len(history[TUTORIAL_DATA_SOURCE]) + len(future[TUTORIAL_DATA_SOURCE]) == 0
):
return None
# 2. Only target new players (fewer than 5 sessions so far)
sessions_so_far = len(history[SESSIONS_DATA_SOURCE])
if sessions_so_far >= MAX_SESSIONS_NEW_PLAYER:
return None
# 3. If already purchased in history, label as positive
if len(history[PURCHASES_DATA_SOURCE]) > 0:
return np.array([1], dtype=np.float32)
# 4. Ensure enough future sessions exist to reach 5 total
remaining_sessions = MAX_SESSIONS_NEW_PLAYER - sessions_so_far
if len(future[SESSIONS_DATA_SOURCE]) < remaining_sessions:
return None
# 5. Check for purchases between first and last qualifying session
start = future[SESSIONS_DATA_SOURCE].timestamps[0]
end = future[SESSIONS_DATA_SOURCE].timestamps[:remaining_sessions][-1]
purchases_in_window = future[PURCHASES_DATA_SOURCE].filter(
by="timestamps", condition=lambda ts: start < ts < end
)
return np.array([1 if len(purchases_in_window) > 0 else 0], dtype=np.float32)
Step-by-Step Breakdown
① Check tutorial completion
if (
len(history[TUTORIAL_DATA_SOURCE]) + len(future[TUTORIAL_DATA_SOURCE]) == 0
):
return None
Players who never completed the tutorial are excluded from training. They did not experience the core game loop, so their purchase behavior is not representative of the target population.
② Filter to new players only
sessions_so_far = len(history[SESSIONS_DATA_SOURCE])
if sessions_so_far >= MAX_SESSIONS_NEW_PLAYER:
return None
Only players with fewer than 5 historical sessions qualify as "new." Experienced players are excluded because the model specifically targets early-lifecycle monetisation.
③ Handle early purchasers
If the player has already made a purchase before the split timestamp, they are immediately labeled positive. This captures players who converted very early in their lifecycle.
④ Validate session count
remaining_sessions = MAX_SESSIONS_NEW_PLAYER - sessions_so_far
if len(future[SESSIONS_DATA_SOURCE]) < remaining_sessions:
return None
The function requires enough future sessions to reach the 5-session threshold. Players who churned before completing 5 sessions are excluded to avoid labeling dropouts as non-purchasers.
⑤ Check for purchases in session window
start = future[SESSIONS_DATA_SOURCE].timestamps[0]
end = future[SESSIONS_DATA_SOURCE].timestamps[:remaining_sessions][-1]
purchases_in_window = future[PURCHASES_DATA_SOURCE].filter(
by="timestamps", condition=lambda ts: start < ts < end
)
return np.array([1 if len(purchases_in_window) > 0 else 0], dtype=np.float32)
The purchase window spans from the first future session to the session that completes the 5-session threshold. Any in-game purchase within this window results in a positive label.
Training
Once the target function is defined, fine-tune a downstream model:
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, BinaryClassificationTask
module = load_from_foundation_model(
checkpoint_path=Path("./foundation_model"),
downstream_task=BinaryClassificationTask(),
target_fn=in_game_purchase_target_fn,
)
training_params = TrainingParams(
checkpoint_dir=Path("./<this_model>"),
learning_rate=1e-4,
epochs=20,
devices=[0],
metrics=[
MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
],
metric_to_monitor="val_auroc_0",
metric_monitoring_mode=MetricMonitoringMode.MAX,
early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)
module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType
module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
output_type=OutputType.DECODED,
devices=[0],
metrics=[
MetricParams(alias="auroc", metric_name="AUROC"),
MetricParams(alias="auprc", metric_name="AveragePrecision"),
MetricParams(alias="recall", metric_name="Recall"),
],
)
results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType
module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
local_save_location=Path("./predictions.tsv"),
output_type=OutputType.DECODED,
prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
devices=[0],
)
predictions = module.predict(testing_params)
Recommended Metrics
| Metric | Why it matters |
|---|---|
| AUROC | Measures overall ranking quality. |
| AUPRC | More informative when the positive class is rare. |
| Recall | Proportion of actual positives caught. |
| Precision | Proportion of predicted positives that are correct. |
| F1 Score | Harmonic mean of precision and recall. |
Production Tips
- Tune the session threshold for your game genre. 5 sessions is a starting point, but casual games may see purchases within 2-3 sessions while complex RPGs may need 10+. Analyze your conversion funnel to find the optimal window.
- Consider session duration, not just count. A player who plays 5 sessions of 2 minutes each is very different from one who plays 5 sessions of 30 minutes. Adding session duration as a feature can improve prediction quality.
- Segment by acquisition channel. Players acquired through paid ads vs organic discovery have very different purchase propensities. Consider training separate models or adding the acquisition source as a feature.
- Monitor for tutorial completion rate changes. Game updates that change the tutorial can shift the population of tutorial completers, affecting model performance. Retrain after major tutorial changes.
- Validate against actual revenue data. Cross-reference predicted purchasers with actual revenue to ensure the model captures high-value conversions, not just low-value impulse buys.