Predict Course Completion Without Premium
Task type: BinaryClassificationTask
Industry: EdTech / Online Learning
Understanding which learners can complete courses without premium features helps product teams see where the free tier is sufficient and where premium adds real value. These insights inform pricing strategy, feature-gating decisions, and targeted upgrade campaigns for users who would genuinely benefit from premium.
What makes this advanced? Multi-condition timestamp matching: the target combines course milestone tracking, subscription purchase timestamps, and time-window comparisons computed with numpy broadcasting.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data sources: milestones, subscriptions
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
The function must return one of:
- np.array([1], dtype=np.float32): positive case
- np.array([0], dtype=np.float32): negative case
- None: exclude this entity from training
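As a minimal illustration of this return contract, the sketch below maps a three-way outcome onto the allowed return values. The helper make_label is hypothetical and not part of monad; it only shows the expected shapes and dtype.

```python
from typing import Optional

import numpy as np


def make_label(outcome: Optional[bool]) -> Optional[np.ndarray]:
    """Hypothetical helper: map an outcome to the target-function return contract.

    True  -> positive case, np.array([1], dtype=np.float32)
    False -> negative case, np.array([0], dtype=np.float32)
    None  -> exclude the entity from training
    """
    if outcome is None:
        return None
    return np.array([1 if outcome else 0], dtype=np.float32)


print(make_label(True))   # [1.]
print(make_label(False))  # [0.]
print(make_label(None))   # None
```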
Full Example
```python
import numpy as np
from datetime import timedelta
from typing import Dict
from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window
from monad.constants import SECONDS_PER_DAY

# === Configuration ===
TARGET_WINDOW_DAYS = 90
PREMIUM_WINDOW_DAYS = 30
MILESTONES_DATA_SOURCE = "milestones"
SUBSCRIPTIONS_DATA_SOURCE = "subscriptions"


def course_completion_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict course completion without premium purchase."""
    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None
    split_ts = ctx[SPLIT_TIMESTAMP]

    # 1. Get premium subscription purchase timestamps
    history_premium_ts = (
        history[SUBSCRIPTIONS_DATA_SOURCE]
        .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
        .filter(by="status", condition=lambda v: v == "ACTIVE")
        .timestamps
    )
    future_premium_ts = (
        future[SUBSCRIPTIONS_DATA_SOURCE]
        .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
        .filter(by="status", condition=lambda v: v == "ACTIVE")
        .timestamps
    )

    # 2. Find courses completed within the target window
    completed = (
        future[MILESTONES_DATA_SOURCE]
        .interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS))
        .filter(by="milestone_type", condition=lambda v: v == "course_complete")
    )
    counts, course_ids = completed.groupBy(by="course_id").count()
    if counts.size == 0:
        return np.array([0], dtype=np.float32)

    # 3. For each completed course, check premium purchase timing
    all_not_applicable = True
    for cnt, c_id in zip(counts, course_ids):
        if history_premium_ts.size == 0 and future_premium_ts.size == 0:
            # No premium purchases at all: any completion is unassisted
            if cnt > 0:
                return np.array([1], dtype=np.float32)
        else:
            # Find level 1 completion timestamps for this course
            lvl1_ts = np.concatenate([
                history[MILESTONES_DATA_SOURCE]
                .filter(by="course_id", condition=lambda v: v == c_id)
                .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
                .timestamps,
                future[MILESTONES_DATA_SOURCE]
                .filter(by="course_id", condition=lambda v: v == c_id)
                .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
                .timestamps,
            ])
            if lvl1_ts.size > 0:
                # Check if premium was purchased within 30 days after level 1
                future_diffs = future_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
                history_diffs = history_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
                if np.any((future_diffs > 0) & (future_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
                    all_not_applicable = False  # premium-assisted completion
                elif np.any((history_diffs > 0) & (history_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
                    continue  # Premium was bought before the split: skip this course
                else:
                    if cnt > 0:
                        return np.array([1], dtype=np.float32)
                    all_not_applicable = False
            else:
                # No level 1 milestone, so no premium purchase can be linked to this course
                if cnt > 0:
                    return np.array([1], dtype=np.float32)
                all_not_applicable = False

    if all_not_applicable:
        return None
    return np.array([0], dtype=np.float32)
```
Step-by-Step Breakdown
① Get premium subscription timestamps
```python
history_premium_ts = (
    history[SUBSCRIPTIONS_DATA_SOURCE]
    .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
    .filter(by="status", condition=lambda v: v == "ACTIVE")
    .timestamps
)
future_premium_ts = (
    future[SUBSCRIPTIONS_DATA_SOURCE]
    .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
    .filter(by="status", condition=lambda v: v == "ACTIVE")
    .timestamps
)
```
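Premium subscription events are collected separately from history and from future, filtered by subscription type and active status. Keeping the two sets apart matters later: for the same course, a premium purchase before the split and one after it lead to different outcomes. Both sets are compared against the 30-day window that follows each level 1 completion.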
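To make the filtering semantics concrete without the monad library, here is a plain-numpy analog. It assumes a data source can be viewed as parallel arrays of column values and timestamps in seconds; the arrays and values are made up, and the history/future boundary at split_ts is an assumption rather than monad's documented behavior.

```python
import numpy as np

# Hypothetical subscription events for one entity, as parallel arrays.
timestamps = np.array([1_000.0, 2_000.0, 3_000.0, 4_000.0])  # seconds
subscription_type = np.array(["PREMIUM", "BASIC", "PREMIUM", "PREMIUM"])
status = np.array(["ACTIVE", "ACTIVE", "CANCELLED", "ACTIVE"])

# Two chained filters act like AND-ed boolean masks.
mask = (subscription_type == "PREMIUM") & (status == "ACTIVE")
premium_ts = timestamps[mask]
print(premium_ts.tolist())  # [1000.0, 4000.0]

# Assumption: events at or after split_ts belong to `future`; monad's exact boundary may differ.
split_ts = 2_500.0
history_premium_ts = premium_ts[premium_ts < split_ts]
future_premium_ts = premium_ts[premium_ts >= split_ts]
print(history_premium_ts.tolist(), future_premium_ts.tolist())  # [1000.0] [4000.0]
```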
② Find completed courses in window
```python
completed = (
    future[MILESTONES_DATA_SOURCE]
    .interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS))
    .filter(by="milestone_type", condition=lambda v: v == "course_complete")
)
counts, course_ids = completed.groupBy(by="course_id").count()
```
Course completion milestones within the 90-day target window are grouped by course ID. If no courses were completed, the user is labeled negative immediately.
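The windowing and grouping step can be sketched in plain numpy as follows. This is an analog, not monad's implementation: it assumes timestamps in seconds and a half-open window [split_ts, split_ts + 90 days), and it uses np.unique(..., return_counts=True), which returns (values, counts), the reverse of the (counts, course_ids) order unpacked above. The events are made up.

```python
import numpy as np

SECONDS_PER_DAY = 86_400  # assumed to match monad.constants.SECONDS_PER_DAY
TARGET_WINDOW_DAYS = 90

split_ts = 0.0
# Hypothetical future milestone events (timestamps in seconds).
ts = np.array([5, 10, 20, 100, 120], dtype=float) * SECONDS_PER_DAY
milestone_type = np.array(["course_complete", "level_1_complete", "course_complete",
                           "course_complete", "course_complete"])
course_id = np.array(["c1", "c2", "c2", "c1", "c3"])

# Keep course completions inside the (assumed half-open) target window.
window_end = split_ts + TARGET_WINDOW_DAYS * SECONDS_PER_DAY
in_window = (ts >= split_ts) & (ts < window_end)
completed = in_window & (milestone_type == "course_complete")

# Group by course_id and count completions per course.
course_ids, counts = np.unique(course_id[completed], return_counts=True)
print(course_ids.tolist(), counts.tolist())  # ['c1', 'c2'] [1, 1]
```

If no completion falls inside the window, counts is empty, which corresponds to the immediate negative label in the full example.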
③ For each course, check premium timing using numpy broadcasting
```python
# Runs inside the per-course loop: level 1 completions for course c_id, before and after the split
lvl1_ts = np.concatenate([
    history[MILESTONES_DATA_SOURCE]
    .filter(by="course_id", condition=lambda v: v == c_id)
    .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
    .timestamps,
    future[MILESTONES_DATA_SOURCE]
    .filter(by="course_id", condition=lambda v: v == c_id)
    .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
    .timestamps,
])
if lvl1_ts.size > 0:
    # Rows: premium purchases; columns: level 1 completions
    future_diffs = future_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
    history_diffs = history_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
```
For each completed course, the function gathers that course's level 1 completion timestamps from both history and future. Broadcasting a column of P premium timestamps against a row of L level 1 timestamps produces a P × L matrix of pairwise differences (premium minus level 1), so a single vectorized comparison checks whether any premium purchase fell within the 30 days after any level 1 completion, without an explicit double loop.
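A small, self-contained demonstration of the broadcasting comparison follows, using made-up timestamps in seconds. SECONDS_PER_DAY is defined locally on the assumption that it equals monad's constant of the same name.

```python
import numpy as np

SECONDS_PER_DAY = 86_400  # assumed to match monad.constants.SECONDS_PER_DAY
PREMIUM_WINDOW_DAYS = 30

# Made-up timestamps (seconds): 2 premium purchases and 3 level 1 completions.
premium_ts = np.array([12, 80], dtype=float) * SECONDS_PER_DAY
lvl1_ts = np.array([0, 5, 60], dtype=float) * SECONDS_PER_DAY

# Shapes (2, 1) and (1, 3) broadcast to (2, 3): diffs[i, j] = premium_ts[i] - lvl1_ts[j].
diffs = premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
print(diffs.shape)  # (2, 3)

# A purchase counts if it is strictly after a level 1 completion and inside the window.
window_s = PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY
hits = (diffs > 0) & (diffs < window_s)
print(hits.tolist())  # [[True, True, False], [False, False, True]]
print(bool(np.any(hits)))  # True

# With no purchases the matrix is simply empty, and np.any(...) is False.
no_purchases = np.array([], dtype=float)
empty_diffs = no_purchases[:, np.newaxis] - lvl1_ts[np.newaxis, :]
print(empty_diffs.shape, bool(np.any(empty_diffs > 0)))  # (0, 3) False
```

The empty case is why the full example needs no special handling when only one of the two premium arrays is empty.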
④ Apply multi-condition logic
```python
# Runs inside the per-course loop, within `if lvl1_ts.size > 0:`
if np.any((future_diffs > 0) & (future_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
    all_not_applicable = False  # (1) premium bought after the split: premium-assisted
elif np.any((history_diffs > 0) & (history_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
    continue  # (2) premium bought before the split: skip this course
else:
    if cnt > 0:
        return np.array([1], dtype=np.float32)  # (3) completed without premium
    all_not_applicable = False
```
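Three cases are handled per course: (1) if a premium purchase after the split falls within 30 days after a level 1 completion, the course is treated as premium-assisted and does not count toward a positive label; (2) if a premium purchase before the split falls within that window, the course is skipped, because the upgrade decision was already made before the prediction point; (3) if no premium purchase falls within the window (or the course has no level 1 milestone), the completion counts as unassisted and the entity is labeled positive. An entity with no premium purchases at all is labeled positive as soon as a completed course is found. The all_not_applicable flag excludes the entity from training (returns None) when every completed course falls into the skip category; otherwise, if no course qualified as unassisted, the entity is labeled negative.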
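To isolate the decision logic from monad's data access, the sketch below restates cases (1) to (3) and the all_not_applicable aggregation as plain functions over numpy arrays. The names any_within_window, classify_course and aggregate_label are hypothetical, timestamps are assumed to be in seconds, and every grouped completion count is assumed to be positive. Entities with no premium purchases at all land in case (3) here, matching the early positive return in the full example.

```python
from typing import List, Optional

import numpy as np

SECONDS_PER_DAY = 86_400  # assumed to match monad.constants.SECONDS_PER_DAY
PREMIUM_WINDOW_DAYS = 30


def any_within_window(premium_ts: np.ndarray, lvl1_ts: np.ndarray) -> bool:
    """True if any purchase is strictly after some level 1 completion and within the window."""
    diffs = premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
    window_s = PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY
    return bool(np.any((diffs > 0) & (diffs < window_s)))


def classify_course(
    history_premium_ts: np.ndarray,
    future_premium_ts: np.ndarray,
    lvl1_ts: np.ndarray,
) -> str:
    """Return 'assisted', 'skip' or 'unassisted' for one completed course."""
    if lvl1_ts.size > 0:
        if any_within_window(future_premium_ts, lvl1_ts):
            return "assisted"  # case (1): premium bought after the split
        if any_within_window(history_premium_ts, lvl1_ts):
            return "skip"  # case (2): premium bought before the split
    return "unassisted"  # case (3), including courses with no level 1 milestone


def aggregate_label(course_cases: List[str]) -> Optional[np.ndarray]:
    """Combine per-course cases into the target-function return value."""
    if not course_cases:
        return np.array([0], dtype=np.float32)  # no completed course in the window
    if "unassisted" in course_cases:
        return np.array([1], dtype=np.float32)  # some course finished without premium
    if all(case == "skip" for case in course_cases):
        return None  # every completed course is not applicable
    return np.array([0], dtype=np.float32)  # completions were premium-assisted


d = SECONDS_PER_DAY
no_ts = np.array([], dtype=float)
lvl1 = np.array([0.0])
cases = [
    classify_course(no_ts, np.array([10.0 * d]), lvl1),  # future purchase 10 days after level 1
    classify_course(np.array([5.0 * d]), no_ts, lvl1),   # history purchase 5 days after level 1
]
print(cases)  # ['assisted', 'skip']
print(aggregate_label(cases))  # [0.]
print(aggregate_label(["skip", "skip"]))  # None
```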
Training
Once the target function is defined, fine-tune a downstream model:
```python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=course_completion_target_fn,
)
training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)
module.fit(training_params, seed=42)
```
Evaluation
```python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        # Same binary-task kwargs as in the training configuration
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
    ],
)
results = module.test(testing_params)
```
Prediction
```python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)
predictions = module.predict(testing_params)
```
Recommended Metrics
| Metric | Why it matters |
|---|---|
| AUROC | Measures overall ranking quality. |
| AUPRC | More informative when the positive class is rare. |
| Recall | Proportion of actual positives caught. |
| Precision | Proportion of predicted positives that are correct. |
| F1 Score | Harmonic mean of precision and recall. |
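During training these metrics are computed through MetricParams. For an offline sanity check on exported labels and scores, the plain-numpy sketch below computes precision, recall, F1 and AUROC. It is an independent reimplementation, not monad's metric code; the 0.5 threshold is an assumption, AUPRC is omitted for brevity, and the pairwise AUROC uses O(P × N) memory, so it only suits small samples.

```python
import numpy as np


def precision_recall_f1(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5):
    """Precision, recall and F1 for 0/1 labels at a fixed score threshold."""
    y_pred = y_score >= threshold
    positives = y_true == 1
    tp = int(np.sum(y_pred & positives))
    fp = int(np.sum(y_pred & ~positives))
    fn = int(np.sum(~y_pred & positives))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auroc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Probability that a random positive outscores a random negative (ties count as 0.5)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, np.newaxis] - neg[np.newaxis, :]  # pairwise score differences
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


y_true = np.array([1, 0, 1, 0, 1])
y_score = np.array([0.9, 0.4, 0.35, 0.6, 0.8])
print([round(m, 4) for m in precision_recall_f1(y_true, y_score)])  # [0.6667, 0.6667, 0.6667]
print(round(auroc(y_true, y_score), 4))  # 0.6667
```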
Production Tips
- Adjust the premium window to match your conversion funnel. 30 days after level 1 is a starting point, but analyze your actual premium conversion timing to find the window that best captures upgrade decisions triggered by course difficulty.
- Consider course difficulty tiers. Easy courses may be completable without premium by most users, while advanced courses may require premium features. Segment predictions by course difficulty for more actionable insights.
- Monitor for free trial effects. Users on free premium trials may appear as non-premium completers. Filter out or separately handle trial periods to avoid contaminating the training signal.
- Validate the level 1 milestone assumption. Level 1 completion is used as the trigger point for premium purchase timing. If your platform has a different critical decision point, adjust the milestone type accordingly.
- Watch for curriculum changes. Course content updates can shift completion difficulty and premium necessity. Retrain the model after significant curriculum changes to maintain prediction accuracy.
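For the first tip, one way to study conversion timing is to measure, for each level 1 completion, the delay until the next premium purchase, then check what fraction of observed conversions a candidate window would capture. This is a sketch under assumptions: timestamps are in seconds, the arrays come from your own export for a single learner, and days_to_first_premium is a hypothetical helper, not part of monad.

```python
import numpy as np

SECONDS_PER_DAY = 86_400  # assumed to match monad.constants.SECONDS_PER_DAY


def days_to_first_premium(lvl1_ts: np.ndarray, premium_ts: np.ndarray) -> np.ndarray:
    """For each level 1 completion, days until the first later premium purchase (NaN if none)."""
    diffs = premium_ts[np.newaxis, :] - lvl1_ts[:, np.newaxis]  # shape (L, P)
    diffs = np.where(diffs > 0, diffs, np.inf)  # ignore purchases at or before level 1
    first = diffs.min(axis=1) if premium_ts.size else np.full(lvl1_ts.size, np.inf)
    return np.where(np.isfinite(first), first / SECONDS_PER_DAY, np.nan)


# Made-up data for one learner.
lvl1 = np.array([0, 10, 50], dtype=float) * SECONDS_PER_DAY
premium = np.array([3, 70], dtype=float) * SECONDS_PER_DAY
gaps = days_to_first_premium(lvl1, premium)
print(gaps.tolist())  # [3.0, 60.0, 20.0]

# Fraction of observed conversions that a 30-day window would capture.
observed = gaps[~np.isnan(gaps)]
print(np.mean(observed < 30))
```

Across many learners, the per-learner gaps can be pooled with np.concatenate before looking at percentiles or capture rates for several candidate windows.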