
Predict New Lesson Duration

Task type: RegressionTask
Industry: EdTech / Online Learning

New lesson engagement is a leading indicator of learning progression. Users who are actively consuming new content (rather than revisiting old lessons) are advancing through the curriculum and are less likely to churn. By predicting the total duration of new lessons a user will take, product teams can identify learners who are stalling and intervene with personalized content suggestions or study reminders.

What makes this advanced? Set-based tracking with repeat filtering — maintains a rolling set of seen lesson IDs, skips lessons from history and repeated lessons.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): lesson_logs with lesson_id, duration, and is_repeat columns

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument | Type | Description
history | Events | All events before the temporal split.
future | Events | All events after the temporal split.
attributes | Attributes | Static entity attributes.
ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc.

For regression tasks, the function must return one of:

  • np.array([value], dtype=np.float32): the continuous target value for the entity (here, total new-lesson duration).
  • None: exclude this entity (e.g., incomplete data).
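
As a minimal sketch of this return contract, the snippet below wraps a scalar into a one-element float32 array or returns None to exclude the entity. The helper name to_regression_label and its has_usable_data flag are hypothetical and are not part of monad.

```python
import numpy as np


def to_regression_label(total_duration, has_usable_data=True):
    """Hypothetical helper: build a one-element float32 label, or None to exclude."""
    if not has_usable_data:
        return None  # the entity is left out of training
    return np.array([total_duration], dtype=np.float32)


label = to_regression_label(22.5)
excluded = to_regression_label(0.0, has_usable_data=False)

print(label.shape, label.dtype)  # (1,) float32
print(excluded)                  # None
```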

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window


# === Configuration ===
LESSON_DATA_SOURCE = "lesson_logs"

def new_lesson_duration_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Return the total duration of new, non-repeated lessons after the split."""

    lessons = future[LESSON_DATA_SOURCE]
    # Lesson IDs the user already took before the temporal split.
    seen_lessons = set(
        np.unique(history[LESSON_DATA_SOURCE]["lesson_id"].events).tolist()
    )

    total_duration = 0.0
    for lesson_id, duration, is_repeat in zip(
        lessons["lesson_id"].events,
        lessons["duration"].events,
        lessons["is_repeat"].events,
    ):
        if lesson_id not in seen_lessons:
            # Count only the first occurrence, and only if it is not flagged as a repeat.
            if not is_repeat:
                total_duration += duration
            # Mark as seen either way so later occurrences are skipped.
            seen_lessons.add(lesson_id)

    return np.array([total_duration], dtype=np.float32)

Step-by-Step Breakdown

① Build the set of historically seen lessons

Python
seen_lessons = set(
    np.unique(history[LESSON_DATA_SOURCE]["lesson_id"].events).tolist()
)

All lesson IDs from the user's history are collected into a set. np.unique deduplicates, and .tolist() converts to Python types for efficient set operations. Any lesson already in this set is considered "not new".
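
The effect of np.unique followed by .tolist() can be checked on a toy array. The array below is an assumed stand-in for history[LESSON_DATA_SOURCE]["lesson_id"].events, and the integer IDs are made up for illustration.

```python
import numpy as np

# Toy stand-in for the historical lesson_id column (assumed integer IDs).
history_lesson_ids = np.array([101, 102, 101, 103, 102])

unique_ids = np.unique(history_lesson_ids)  # sorted, deduplicated ndarray
seen_lessons = set(unique_ids.tolist())     # elements are native Python ints

print(sorted(seen_lessons))                        # [101, 102, 103]
print(all(type(x) is int for x in seen_lessons))   # True
```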

② Iterate through future lessons with dual filtering

Python
for lesson_id, duration, is_repeat in zip(
    lessons["lesson_id"].events,
    lessons["duration"].events,
    lessons["is_repeat"].events,
):
    if lesson_id not in seen_lessons:
        if not is_repeat:
            total_duration += duration
        seen_lessons.add(lesson_id)

The loop applies two filters simultaneously:

  • Not in history: The lesson_id not in seen_lessons check excludes lessons the user has taken before.
  • Not a repeat: The is_repeat flag (from the data source) catches lessons that are marked as review or practice sessions even if the lesson ID is technically new.

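Critically, seen_lessons.add(lesson_id) runs for every lesson that passes the history check, regardless of the is_repeat flag. This prevents the same lesson from being counted more than once if it appears several times in the future data. It also means that a lesson whose first future occurrence is flagged as a repeat is never counted, even if a later occurrence is not flagged.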
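
To see how the two filters and the rolling set interact, here is the same loop run on plain Python tuples instead of monad Events. The function name total_new_lesson_duration and the sample rows are made up for illustration.

```python
def total_new_lesson_duration(history_ids, future_rows):
    """Sum durations of future lessons that are not in history and not flagged as repeats."""
    seen = set(history_ids)
    total = 0.0
    for lesson_id, duration, is_repeat in future_rows:
        if lesson_id not in seen:
            if not is_repeat:
                total += duration
            seen.add(lesson_id)  # runs whether or not the lesson was counted
    return total


history_ids = ["L1", "L2"]
future_rows = [
    ("L2", 10.0, False),  # skipped: already in history
    ("L3", 15.0, False),  # counted: new and not a repeat
    ("L3", 20.0, False),  # skipped: L3 was added to the set by the previous row
    ("L4", 30.0, True),   # skipped: flagged as a repeat, but still marked as seen
    ("L4", 5.0, False),   # skipped: L4 is already in the set
    ("L5", 7.5, False),   # counted: new and not a repeat
]

total = total_new_lesson_duration(history_ids, future_rows)
print(total)  # 22.5
```

Only the first L3 row and the L5 row contribute, giving 15.0 + 7.5 = 22.5.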

③ Return the total duration

Python
return np.array([total_duration], dtype=np.float32)

The accumulated duration of all qualifying new lessons is returned. A value of 0.0 is valid (the user took no new lessons). The function never returns None in this design — even zero engagement is a meaningful regression target.

④ Note: no explicit window validation

This target function does not limit the future window. All future lesson data is considered. If you need a bounded prediction horizon, add has_incomplete_training_window and interval_from as shown in other regression recipes.
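
The idea of a bounded horizon can be sketched independently of monad. The snippet below does not use has_incomplete_training_window or interval_from, whose signatures are not shown on this page. It only illustrates the filtering logic on plain (timestamp, lesson_id, duration) tuples, and rows_within_window is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def rows_within_window(rows, split_time, window):
    """Keep rows whose timestamp satisfies split_time <= timestamp < split_time + window."""
    end = split_time + window
    return [row for row in rows if split_time <= row[0] < end]


split_time = datetime(2024, 5, 1, tzinfo=timezone.utc)
rows = [
    (datetime(2024, 5, 3, tzinfo=timezone.utc), "L3", 15.0),
    (datetime(2024, 5, 30, tzinfo=timezone.utc), "L4", 20.0),
    (datetime(2024, 6, 15, tzinfo=timezone.utc), "L5", 40.0),  # outside the 30-day window
]

kept = rows_within_window(rows, split_time, timedelta(days=30))
print([lesson_id for _, lesson_id, _ in kept])  # ['L3', 'L4']
```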


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=new_lesson_duration_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

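Metric | Why it matters
MAE | Average absolute error; intuitive and less sensitive to outliers than squared-error metrics.
RMSE | Penalises large errors more heavily than MAE.
R² | Proportion of variance explained by the model.
MAPE | Percentage-based error; useful for comparing across scales, but unstable when the true value is 0 or close to it, which this target allows.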
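
For reference, the textbook definitions of these metrics can be written in a few lines of plain Python. This is a sketch of the formulas, not the implementation monad uses, and the sample values are made up. It assumes MAPE is averaged over non-zero targets only. Other conventions exist.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R^2 and MAPE from their textbook definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Assumption: zero targets are skipped, since the ratio is undefined there.
    ratios = [abs((p - t) / t) for t, p in zip(y_true, y_pred) if t != 0]
    mape = sum(ratios) / len(ratios) if ratios else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}


y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics["mae"])  # 2.5
```

On this sample the absolute errors are 2, 2, 3 and 3, so MAE is 2.5, RMSE is the square root of 6.5, R² is 1 - 26/500 = 0.948, and MAPE is 0.11875.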

Production Tips

  1. Define "new" consistently across the platform. Ensure the is_repeat flag is populated reliably. If some lessons lack this flag, default to treating them as non-repeats and rely solely on the seen_lessons set.
  2. Add a target window for more actionable predictions. Without a window, the target sums over however much future data each user happens to have, so values are hard to compare across users. A 30-day or 60-day window makes predictions comparable and easier to act on.
  3. Segment by course or difficulty level. New lessons in an advanced course are a stronger engagement signal than introductory content. Consider weighting durations by course difficulty.
  4. Use predictions to personalize learning paths. Users predicted to consume high new-lesson durations can be recommended advanced content, while users with low predicted engagement may need motivational nudges or easier material.
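
For tip 1, one way to treat missing is_repeat values as non-repeats is a small normalizing helper. How monad represents a missing flag is not stated here, so the None and NaN cases below are assumptions, and normalize_is_repeat is a hypothetical name.

```python
import math


def normalize_is_repeat(flag):
    """Treat a missing flag (assumed to arrive as None or NaN) as 'not a repeat'."""
    if flag is None:
        return False
    if isinstance(flag, float) and math.isnan(flag):
        return False
    return bool(flag)


raw_flags = [True, False, None, float("nan"), 1, 0]
print([normalize_is_repeat(f) for f in raw_flags])
# [True, False, False, False, True, False]
```

Inside the target function, the check would then read `if not normalize_is_repeat(is_repeat):`, leaving the seen_lessons set as the only filter for rows without a flag.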