Predict Course Completion Without Premium

Task type: BinaryClassificationTask
Industry: EdTech / Online Learning

Understanding which learners can complete courses without premium features helps product teams identify where the free tier is sufficient and where premium adds real value. This insight drives pricing strategy, feature gating decisions, and targeted upgrade campaigns for users who would benefit from premium.

What makes this advanced? Multi-condition timestamp matching: the target function combines course milestone tracking, subscription purchase timestamps, and time-window comparisons computed with numpy broadcasting.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): milestones, subscriptions

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |

The function must return one of:

  • np.array([1], dtype=np.float32): positive case
  • np.array([0], dtype=np.float32): negative case
  • None: exclude this entity from training
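
The three return values can be wrapped in a small helper so that the labeling logic reads as a tri-state decision. This is only a sketch: `to_label` is a hypothetical name introduced here for illustration and is not part of monad.

```python
from typing import Optional

import numpy as np


def to_label(outcome: Optional[bool]) -> Optional[np.ndarray]:
    """Map a tri-state outcome to the value a target function returns.

    Hypothetical helper for illustration; not a monad API.
    """
    if outcome is None:
        return None  # exclude this entity from training
    # True -> positive case, False -> negative case
    return np.array([1.0 if outcome else 0.0], dtype=np.float32)


positive = to_label(True)
negative = to_label(False)
excluded = to_label(None)
```

The full example below returns the arrays directly, which is equally valid; a helper like this mainly helps when a target function has many exit points.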

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

from monad.constants import SECONDS_PER_DAY

# === Configuration ===
TARGET_WINDOW_DAYS = 90
PREMIUM_WINDOW_DAYS = 30
MILESTONES_DATA_SOURCE = "milestones"
SUBSCRIPTIONS_DATA_SOURCE = "subscriptions"

def course_completion_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict course completion without premium purchase."""

    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    split_ts = ctx[SPLIT_TIMESTAMP]

    # 1. Get premium subscription purchase timestamps
    history_premium_ts = (
        history[SUBSCRIPTIONS_DATA_SOURCE]
        .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
        .filter(by="status", condition=lambda v: v == "ACTIVE")
        .timestamps
    )
    future_premium_ts = (
        future[SUBSCRIPTIONS_DATA_SOURCE]
        .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
        .filter(by="status", condition=lambda v: v == "ACTIVE")
        .timestamps
    )

    # 2. Find courses completed within the target window
    completed = (
        future[MILESTONES_DATA_SOURCE]
        .interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS))
        .filter(by="milestone_type", condition=lambda v: v == "course_complete")
    )
    counts, course_ids = completed.groupBy(by="course_id").count()

    if counts.size == 0:
        return np.array([0], dtype=np.float32)

    # 3. For each completed course, check premium purchase timing
    all_not_applicable = True
    for cnt, c_id in zip(counts, course_ids):
        if history_premium_ts.size == 0 and future_premium_ts.size == 0:
            if cnt > 0:
                return np.array([1], dtype=np.float32)
        else:

            # Find level 1 completion timestamps for this course
            lvl1_ts = np.concatenate([
                history[MILESTONES_DATA_SOURCE]
                .filter(by="course_id", condition=lambda v: v == c_id)
                .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
                .timestamps,
                future[MILESTONES_DATA_SOURCE]
                .filter(by="course_id", condition=lambda v: v == c_id)
                .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
                .timestamps,
            ])

            if lvl1_ts.size > 0:

                # Check if premium was purchased within 30 days of level 1
                future_diffs = future_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
                history_diffs = history_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]

                if np.any((future_diffs > 0) & (future_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
                    all_not_applicable = False
                elif np.any((history_diffs > 0) & (history_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
                    continue  # Premium was bought before split — skip this course
                else:
                    if cnt > 0:
                        return np.array([1], dtype=np.float32)
                    all_not_applicable = False
            else:
                if cnt > 0:
                    return np.array([1], dtype=np.float32)
                all_not_applicable = False

    if all_not_applicable:
        return None
    return np.array([0], dtype=np.float32)

Step-by-Step Breakdown

① Get premium subscription timestamps

Python
history_premium_ts = (
    history[SUBSCRIPTIONS_DATA_SOURCE]
    .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
    .filter(by="status", condition=lambda v: v == "ACTIVE")
    .timestamps
)
future_premium_ts = (
    future[SUBSCRIPTIONS_DATA_SOURCE]
    .filter(by="subscription_type", condition=lambda v: v == "PREMIUM")
    .filter(by="status", condition=lambda v: v == "ACTIVE")
    .timestamps
)

Premium subscription events are collected from both history and future, keeping only events whose subscription_type is "PREMIUM" and whose status is "ACTIVE". These timestamps are needed later to determine whether a premium purchase occurred within 30 days after a level 1 completion.

② Find completed courses in window

Python
completed = (
    future[MILESTONES_DATA_SOURCE]
    .interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS))
    .filter(by="milestone_type", condition=lambda v: v == "course_complete")
)
counts, course_ids = completed.groupBy(by="course_id").count()

Course completion milestones within the 90-day target window are grouped by course ID. If no courses were completed, the user is labeled negative immediately.
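
To make the grouping step concrete, here is a plain-numpy analogue. It assumes that `count()` yields one count per distinct course ID, as the unpacking into `counts, course_ids` suggests. The course IDs are made up, and `np.unique` stands in for monad's `groupBy`, whose internals are not shown in this guide.

```python
import numpy as np

# Hypothetical course IDs attached to "course_complete" events in the window.
completed_course_ids = np.array(["sql-101", "py-201", "sql-101"])

# np.unique returns the sorted distinct values and how often each occurs.
course_ids, counts = np.unique(completed_course_ids, return_counts=True)

for c_id, cnt in zip(course_ids, counts):
    print(f"{c_id}: {cnt} completion event(s)")

# An entity with no completion events yields empty arrays; this corresponds
# to the `counts.size == 0` case that is labeled negative.
empty_ids, empty_counts = np.unique(np.array([], dtype=str), return_counts=True)
```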

③ For each course, check premium timing using numpy broadcasting

Python
lvl1_ts = np.concatenate([
    history[MILESTONES_DATA_SOURCE]
    .filter(by="course_id", condition=lambda v: v == c_id)
    .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
    .timestamps,
    future[MILESTONES_DATA_SOURCE]
    .filter(by="course_id", condition=lambda v: v == c_id)
    .filter(by="milestone_type", condition=lambda v: v == "level_1_complete")
    .timestamps,
])

if lvl1_ts.size > 0:
    future_diffs = future_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
    history_diffs = history_premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]

For each completed course, the function finds the level 1 completion timestamps and uses numpy broadcasting to compute pairwise time differences between all premium purchase timestamps and all level 1 milestones. This efficiently checks whether any premium purchase fell within the 30-day window after any level 1 completion.
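
The broadcasting step is easier to follow with small arrays. The sketch below uses made-up timestamps and assumes they are Unix seconds, which is consistent with the window being multiplied by SECONDS_PER_DAY; the constant is redefined locally as 86,400 so the snippet runs without monad.

```python
import numpy as np

SECONDS_PER_DAY = 86_400  # assumed to match monad.constants.SECONDS_PER_DAY
PREMIUM_WINDOW_DAYS = 30

# Hypothetical timestamps, written in days and converted to seconds.
premium_ts = np.array([10.0, 100.0]) * SECONDS_PER_DAY
lvl1_ts = np.array([5.0, 50.0, 95.0]) * SECONDS_PER_DAY

# Shape (2, 1) minus shape (1, 3) broadcasts to shape (2, 3):
# diffs[i, j] = premium_ts[i] - lvl1_ts[j]
diffs = premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]

# True where a purchase came after a level 1 completion and
# less than 30 days later.
in_window = (diffs > 0) & (diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)

print(diffs / SECONDS_PER_DAY)  # pairwise differences in days
print(in_window)

# An empty purchase array broadcasts to shape (0, 3); np.any over it is False.
no_purchases = np.array([])[:, np.newaxis] - lvl1_ts[np.newaxis, :]
```

Because an empty array produces an empty difference matrix, `np.any` returns False without any special-casing, which is why the full example can subtract history and future purchase arrays unconditionally.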

④ Apply multi-condition logic

Python
if np.any((future_diffs > 0) & (future_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
    all_not_applicable = False
elif np.any((history_diffs > 0) & (history_diffs < PREMIUM_WINDOW_DAYS * SECONDS_PER_DAY)):
    continue  # Premium was bought before split — skip this course
else:
    if cnt > 0:
        return np.array([1], dtype=np.float32)
    all_not_applicable = False

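Three cases are handled, each using strict inequalities (a purchase more than 0 seconds and less than 30 days after a level 1 completion): (1) if premium was purchased after the split within the window, the course is treated as premium-assisted, so it does not produce a positive label but keeps the entity in training; (2) if premium was purchased before the split within the window, the course is skipped since it cannot be predicted; (3) if no premium purchase is linked to the course, the completion counts as a positive label and the function returns immediately. The all_not_applicable flag starts as True and is cleared when a course is found to be premium-assisted. Entities whose completed courses all fall into the skip category therefore return None and are excluded from training, while entities with at least one premium-assisted course and no free completion are labeled negative.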
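
The per-course decision can be isolated from the monad event objects and exercised on plain arrays. The following is a sketch under the same assumptions as the full example (timestamps in seconds, a 30-day window); `classify_course` and its outcome names are hypothetical and exist only to illustrate the branching.

```python
import numpy as np

SECONDS_PER_DAY = 86_400
WINDOW_SECONDS = 30 * SECONDS_PER_DAY


def in_window(premium_ts: np.ndarray, lvl1_ts: np.ndarray) -> bool:
    """True if any purchase falls strictly inside the window after any level 1 completion."""
    diffs = premium_ts[:, np.newaxis] - lvl1_ts[np.newaxis, :]
    return bool(np.any((diffs > 0) & (diffs < WINDOW_SECONDS)))


def classify_course(history_premium_ts, future_premium_ts, lvl1_ts) -> str:
    """Return an illustrative outcome name for one completed course."""
    if in_window(future_premium_ts, lvl1_ts):
        return "premium_assisted"  # case 1: no positive label from this course
    if in_window(history_premium_ts, lvl1_ts):
        return "skip"              # case 2: purchase happened before the split
    return "free_completion"       # case 3: counts as a positive label


day = SECONDS_PER_DAY
lvl1 = np.array([100.0 * day])
no_purchase = np.array([])

case_1 = classify_course(no_purchase, np.array([110.0 * day]), lvl1)
case_2 = classify_course(np.array([105.0 * day]), no_purchase, lvl1)
case_3 = classify_course(np.array([10.0 * day]), np.array([200.0 * day]), lvl1)
print(case_1, case_2, case_3)  # premium_assisted skip free_completion
```

In the third call the user did buy premium, but neither purchase falls within 30 days after the level 1 completion, so the course still counts as completed without premium.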


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=course_completion_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="recall", metric_name="Recall"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

| Metric | Why it matters |
|---|---|
| AUROC | Measures overall ranking quality. |
| AUPRC | More informative when the positive class is rare. |
| Recall | Proportion of actual positives caught. |
| Precision | Proportion of predicted positives that are correct. |
| F1 Score | Harmonic mean of precision and recall. |
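
The threshold-based metrics in the table are related through the confusion counts. The sketch below computes them from made-up labels using the textbook definitions; it is not monad's metric implementation, and `precision_recall_f1` is an illustrative name.

```python
from typing import Tuple

import numpy as np


def precision_recall_f1(y_true: np.ndarray, y_pred: np.ndarray) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels in {0, 1}."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground truth and thresholded predictions for eight learners.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

AUROC and AUPRC, by contrast, are computed from the predicted scores across all thresholds, so they do not depend on a single cutoff.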

Production Tips

  1. Adjust the premium window to match your conversion funnel. 30 days after level 1 is a starting point, but analyze your actual premium conversion timing to find the window that best captures upgrade decisions triggered by course difficulty.
  2. Consider course difficulty tiers. Easy courses may be completable without premium by most users, while advanced courses may require premium features. Segment predictions by course difficulty for more actionable insights.
  3. Monitor for free trial effects. Users on free premium trials may appear as non-premium completers. Filter out or separately handle trial periods to avoid contaminating the training signal.
  4. Validate the level 1 milestone assumption. Level 1 completion is used as the trigger point for premium purchase timing. If your platform has a different critical decision point, adjust the milestone type accordingly.
  5. Watch for curriculum changes. Course content updates can shift completion difficulty and premium necessity. Retrain the model after significant curriculum changes to maintain prediction accuracy.
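
For the first tip, one way to choose the premium window is to measure how long users typically take to buy premium after completing level 1. The sketch below works on plain timestamp arrays for a single user and assumes Unix seconds; `days_to_next_purchase` is a hypothetical helper, and the sample values are invented.

```python
import numpy as np

SECONDS_PER_DAY = 86_400


def days_to_next_purchase(lvl1_ts: np.ndarray, premium_ts: np.ndarray) -> np.ndarray:
    """Days from each level 1 completion to the first later premium purchase.

    Completions that are not followed by any purchase are dropped.
    """
    premium_sorted = np.sort(premium_ts)
    # side="right" gives the index of the first purchase strictly after each completion.
    idx = np.searchsorted(premium_sorted, lvl1_ts, side="right")
    has_next = idx < premium_sorted.size
    delays = premium_sorted[idx[has_next]] - lvl1_ts[has_next]
    return delays / SECONDS_PER_DAY


# Hypothetical user: level 1 completions on days 0, 40, 100; purchases on days 12, 45.
lvl1 = np.array([0.0, 40.0, 100.0]) * SECONDS_PER_DAY
premium = np.array([12.0, 45.0]) * SECONDS_PER_DAY

delays = days_to_next_purchase(lvl1, premium)
print(delays)  # delays in days for completions that were followed by a purchase
```

Pooling these delays across users and inspecting a high percentile (for example with `np.percentile`) gives a data-driven candidate for PREMIUM_WINDOW_DAYS in place of the default 30.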