Predict New Lesson Duration
Task type: RegressionTask
Industry: EdTech / Online Learning
New lesson engagement is a leading indicator of learning progression. Users who are actively consuming new content (rather than revisiting old lessons) are advancing through the curriculum and are less likely to churn. By predicting the total duration of new lessons a user will take, product teams can identify learners who are stalling and intervene with personalized content suggestions or study reminders.
What makes this advanced? Set-based tracking with repeat filtering: the function maintains a growing set of seen lesson IDs and skips both lessons already in the user's history and lessons flagged as repeats.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data source(s):
  - lesson_logs with lesson_id, duration, and is_repeat columns
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
For regression tasks, the function must return one of:
- np.array([value], dtype=np.float32): the continuous target value for this entity (here, the total new-lesson duration).
- None: exclude this entity (e.g., incomplete data).
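The return contract can be captured in a small wrapper. This is a minimal sketch: the helper name as_regression_target is hypothetical and not part of monad; it only packages a scalar in the one-element float32 array described above and passes None through unchanged.

```python
import numpy as np


def as_regression_target(value):
    """Wrap a scalar as a one-element float32 array, or pass None through.

    Hypothetical helper for illustration; not part of the monad library.
    """
    if value is None:
        return None  # signals that the entity should be excluded
    return np.array([value], dtype=np.float32)


target = as_regression_target(42.5)
print(target.shape, target.dtype)
```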
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

# === Configuration ===
LESSON_DATA_SOURCE = "lesson_logs"


def new_lesson_duration_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict total duration of new, non-repeated lessons."""
    lessons = future[LESSON_DATA_SOURCE]

    # Lesson IDs the user has already taken before the split.
    seen_lessons = set(
        np.unique(history[LESSON_DATA_SOURCE]["lesson_id"].events).tolist()
    )

    total_duration = 0.0
    for lesson_id, duration, is_repeat in zip(
        lessons["lesson_id"].events,
        lessons["duration"].events,
        lessons["is_repeat"].events,
    ):
        if lesson_id not in seen_lessons:
            if not is_repeat:
                total_duration += duration
            # Mark the lesson as seen even when it is flagged as a repeat,
            # so later occurrences of the same ID are not counted.
            seen_lessons.add(lesson_id)

    return np.array([total_duration], dtype=np.float32)
Step-by-Step Breakdown
① Build the set of historically seen lessons
All lesson IDs from the user's history are collected into a set. np.unique deduplicates, and .tolist() converts to Python types for efficient set operations. Any lesson already in this set is considered "not new".
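The conversion in this step can be checked in isolation. The sketch below assumes the lesson_id column behaves like a one-dimensional NumPy array; the sample IDs are made up for illustration.

```python
import numpy as np

# Hypothetical lesson IDs from a user's history, including duplicates.
history_lesson_ids = np.array([101, 102, 101, 103, 102])

# np.unique deduplicates (and sorts); .tolist() yields native Python ints
# rather than NumPy scalar objects.
seen_lessons = set(np.unique(history_lesson_ids).tolist())

print(seen_lessons)
print(all(type(x) is int for x in seen_lessons))
```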
② Iterate through future lessons with dual filtering
for lesson_id, duration, is_repeat in zip(
    lessons["lesson_id"].events,
    lessons["duration"].events,
    lessons["is_repeat"].events,
):
    if lesson_id not in seen_lessons:
        if not is_repeat:
            total_duration += duration
        seen_lessons.add(lesson_id)
The loop applies two filters simultaneously:
- Not in history: The lesson_id not in seen_lessons check excludes lessons the user has taken before.
- Not a repeat: The is_repeat flag (from the data source) catches lessons that are marked as review or practice sessions even if the lesson ID is technically new.
Critically, seen_lessons.add(lesson_id) is called regardless of the is_repeat flag — this prevents the same lesson from being counted multiple times if it appears in the future data more than once.
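To see how the two filters interact, the same loop can be run on plain Python tuples. This is an illustrative sketch with made-up data: the function total_new_lesson_duration and the sample events are hypothetical and stand in for the Events columns used in the recipe.

```python
def total_new_lesson_duration(history_ids, future_events):
    """Sum durations of future lessons that are neither seen nor flagged as repeats.

    history_ids: iterable of lesson IDs taken before the split.
    future_events: iterable of (lesson_id, duration, is_repeat) in time order.
    """
    seen = set(history_ids)
    total = 0.0
    for lesson_id, duration, is_repeat in future_events:
        if lesson_id not in seen:
            if not is_repeat:
                total += duration
            # Marked as seen whether or not it was counted.
            seen.add(lesson_id)
    return total


future_events = [
    ("L1", 10.0, False),  # already in history: skipped
    ("L2", 20.0, False),  # new and not a repeat: counted
    ("L2", 25.0, False),  # second appearance of L2: skipped
    ("L3", 15.0, True),   # new ID but flagged as a repeat: skipped, then marked seen
    ("L3", 30.0, False),  # L3 is now in the set: skipped
]
result = total_new_lesson_duration(["L1"], future_events)
print(result)  # 20.0
```

One consequence worth noting: a lesson whose first future appearance is flagged as a repeat never contributes to the total, even if a later appearance is not flagged.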
③ Return the total duration
The accumulated duration of all qualifying new lessons is returned. A value of 0.0 is valid (the user took no new lessons). The function never returns None in this design — even zero engagement is a meaningful regression target.
④ Note: no explicit window validation
This target function does not limit the future window. All future lesson data is considered. If you need a bounded prediction horizon, add has_incomplete_training_window and interval_from as shown in other regression recipes.
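The idea behind a bounded horizon can be sketched without any library calls. The example below uses plain datetime values and deliberately does not call has_incomplete_training_window or interval_from, because their exact signatures are not shown in this recipe; events_in_window and window_is_complete are hypothetical helpers, and the timestamps are made up.

```python
from datetime import datetime, timedelta, timezone


def events_in_window(timestamps, values, split_time, window):
    """Keep values whose timestamp lies in [split_time, split_time + window)."""
    end = split_time + window
    return [v for t, v in zip(timestamps, values) if split_time <= t < end]


def window_is_complete(split_time, window, data_end):
    """True when the available data covers the full window after the split."""
    return split_time + window <= data_end


split = datetime(2024, 5, 1, tzinfo=timezone.utc)
window = timedelta(days=30)

# Hypothetical lesson timestamps (days after the split) and durations.
timestamps = [split + timedelta(days=d) for d in (1, 10, 29, 30, 45)]
durations = [5.0, 7.0, 3.0, 11.0, 13.0]

in_window = events_in_window(timestamps, durations, split, window)
complete = window_is_complete(
    split, window, data_end=datetime(2024, 6, 15, tzinfo=timezone.utc)
)
print(in_window, complete)
```

When the window is incomplete, returning None excludes the entity rather than training on a truncated target.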
Training
Once the target function is defined, fine-tune a downstream model:
from pathlib import Path

from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=new_lesson_duration_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)
Recommended Metrics
| Metric | Why it matters |
|---|---|
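| MAE | Average absolute error, in the same units as duration; intuitive and less sensitive to outliers than RMSE. |
| RMSE | Square root of the MSE configured above; penalises large errors more heavily than MAE. |
| R² | Proportion of variance explained by the model. |
| MAPE | Percentage-based error, useful for comparing across scales; undefined when the true value is 0, which is a valid target here. |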
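For reference, the four metrics can be computed directly with NumPy from arrays of true and predicted values. This is a minimal sketch independent of monad's metric implementation; the function name regression_metrics and the sample values are hypothetical. MAPE is computed only over entries with a non-zero true value, which is one common way to handle zero targets.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R² and MAPE (in percent) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    err = y_pred - y_true

    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))

    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # MAPE is undefined for zero targets, so restrict it to non-zero ones.
    nonzero = y_true != 0
    if nonzero.any():
        mape = float(np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100)
    else:
        mape = float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}


# Made-up targets (one user with zero new-lesson duration) and predictions.
y_true = [0.0, 10.0, 20.0, 40.0]
y_pred = [2.0, 12.0, 18.0, 36.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```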
Production Tips
- Define "new" consistently across the platform. Ensure the is_repeat flag is populated reliably. If some lessons lack this flag, default to treating them as non-repeats and rely solely on the seen_lessons set.
- Add a target window for more actionable predictions. Without a window, the target can vary wildly. Adding a 30-day or 60-day window makes predictions more comparable and actionable.
- Segment by course or difficulty level. New lessons in an advanced course are a stronger engagement signal than introductory content. Consider weighting durations by course difficulty.
- Use predictions to personalize learning paths. Users predicted to consume high new-lesson durations can be recommended advanced content, while users with low predicted engagement may need motivational nudges or easier material.
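Two of the tips above, defaulting a missing is_repeat flag and weighting durations by difficulty, can be sketched as small helpers applied inside the target function's loop. Everything here is an assumption for illustration: the helper names, the difficulty labels, and the weights are hypothetical, and the recipe's data source does not define a difficulty column.

```python
import math


def normalize_is_repeat(flag):
    """Treat a missing flag (None or NaN) as a non-repeat."""
    if flag is None:
        return False
    if isinstance(flag, float) and math.isnan(flag):
        return False
    return bool(flag)


# Hypothetical difficulty weights; tune them for your own catalogue.
DIFFICULTY_WEIGHTS = {"intro": 1.0, "intermediate": 1.5, "advanced": 2.0}


def weighted_duration(duration, difficulty):
    """Scale a lesson duration by its course difficulty (unknown levels get 1.0)."""
    return duration * DIFFICULTY_WEIGHTS.get(difficulty, 1.0)


print(normalize_is_repeat(None), weighted_duration(10.0, "advanced"))
```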