Predict Training Duration Excluding Short Sessions
Task type: RegressionTask
Industry: Fitness / Wellness
Short workout sessions (under 10 minutes) are often accidental app opens, warm-ups logged separately, or data noise. By filtering them out and predicting only meaningful training volume, fitness platforms can forecast user engagement more accurately, identify users at risk of dropping off, and tailor workout recommendations to match predicted activity levels.
What makes this advanced? Duration filtering + sum aggregation — filters out sessions below a threshold, then sums remaining durations within the target window.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data source(s): workout_logs with a duration_minutes column
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
For regression tasks, the function must return one of:
- np.array([value], dtype=np.float32): the continuous target value for this entity (total training minutes in the target window).
- None: exclude this entity (e.g., incomplete data).
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

# === Configuration ===
TARGET_WINDOW_DAYS = 30
ACTIVITY_DATA_SOURCE = "workout_logs"
MIN_SESSION_MINUTES = 10


def training_duration_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict total training time (min), excluding sessions < 10 min."""
    # Skip entities that lack a full target window of future data.
    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    # Keep only sessions at or above the minimum duration.
    filtered = future[ACTIVITY_DATA_SOURCE].filter(
        by="duration_minutes", condition=lambda v: v >= MIN_SESSION_MINUTES
    )
    # Restrict to the target window that starts at the split timestamp.
    filtered = filtered.interval_from(
        ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
    )
    # Return total qualifying minutes as a single-element float32 array.
    return np.array([filtered.sum(column="duration_minutes")], dtype=np.float32)
Step-by-Step Breakdown
① Validate the training window
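The has_incomplete_training_window check ensures that a full 30 days of future data are available. If they are not, the function returns None and the entity is excluded, so every label covers a window of the same length.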
② Filter out short sessions
filtered = future[ACTIVITY_DATA_SOURCE].filter(
    by="duration_minutes", condition=lambda v: v >= MIN_SESSION_MINUTES
)
Sessions shorter than 10 minutes are removed. This eliminates noise from accidental tracking, brief stretching logged as workouts, and other non-meaningful entries that would inflate the training volume estimate.
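To make the effect of the threshold concrete, here is a minimal sketch using plain NumPy instead of monad's Events API. The durations are made-up values for illustration only. It applies the same condition as the lambda above, so a session of exactly 10 minutes is kept.

```python
import numpy as np

MIN_SESSION_MINUTES = 10

# Hypothetical session durations in minutes (illustrative, not real data).
durations = np.array([2.0, 45.0, 9.5, 10.0, 30.0], dtype=np.float32)

# Same condition as the target function: keep sessions >= the threshold.
keep = durations >= MIN_SESSION_MINUTES
kept = durations[keep]

# Share of sessions removed by the filter.
dropped_share = 1.0 - keep.mean()

print(kept.tolist(), dropped_share)
```

Running the same check on your own duration column is a quick way to see how many sessions a given threshold would discard.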
③ Trim to the target window
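After duration filtering, the events are restricted to the 30-day window that starts at the split timestamp. The duration threshold and the window are independent per-session conditions, so applying them in either order should select the same sessions. Filtering first simply keeps the threshold logic next to the data source lookup.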
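The windowing step can be illustrated with plain Python tuples standing in for events. This is a toy model, not monad's implementation: the half-open window [split, split + 30 days) is an assumption about how interval_from treats its boundaries, and the helpers long_enough and in_window are hypothetical names. In this model, both orderings of the two conditions select the same sessions.

```python
from datetime import datetime, timedelta, timezone

MIN_SESSION_MINUTES = 10
TARGET_WINDOW_DAYS = 30

split = datetime(2024, 5, 1, tzinfo=timezone.utc)
window_end = split + timedelta(days=TARGET_WINDOW_DAYS)

# (timestamp, duration_minutes) pairs standing in for future workout_logs events.
sessions = [
    (split + timedelta(days=1), 45.0),
    (split + timedelta(days=2), 4.0),    # below the duration threshold
    (split + timedelta(days=29), 30.0),
    (split + timedelta(days=31), 60.0),  # outside the 30-day window
]


def long_enough(session):
    """Duration condition, mirroring the lambda in the target function."""
    return session[1] >= MIN_SESSION_MINUTES


def in_window(session):
    """Window condition; half-open bounds are an assumption."""
    return split <= session[0] < window_end


filter_then_trim = [s for s in sessions if long_enough(s)]
filter_then_trim = [s for s in filter_then_trim if in_window(s)]

trim_then_filter = [s for s in sessions if in_window(s)]
trim_then_filter = [s for s in trim_then_filter if long_enough(s)]

print([minutes for _, minutes in filter_then_trim])
```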
④ Sum and return
The .sum() method aggregates the duration_minutes column across all qualifying sessions. A user with three 30-minute workouts would have a target of 90.0. A user with no qualifying sessions would have a target of 0.0 — this is a valid regression value, not excluded.
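The two cases above can be reproduced with a small stand-alone sketch. The helper label_from_durations is a hypothetical name used only for illustration. It takes the durations that survived filtering and windowing and builds the label in the shape the target function returns.

```python
import numpy as np


def label_from_durations(qualifying_minutes):
    """Sum qualifying session durations into a single-element float32 label."""
    total = float(sum(qualifying_minutes))
    return np.array([total], dtype=np.float32)


three_workouts = label_from_durations([30.0, 30.0, 30.0])
no_sessions = label_from_durations([])  # an empty sum is 0.0, not None

print(three_workouts, no_sessions)
```

Returning 0.0 rather than None keeps inactive users in the training set, which matters if the model is later used to spot users at risk of dropping off.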
Training
Once the target function is defined, fine-tune a downstream model:
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, RegressionTask
module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=training_duration_target_fn,
)
training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)
module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType
module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)
results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType
module = load_from_checkpoint(Path("./<this_model>"))
testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)
predictions = module.predict(testing_params)
Recommended Metrics
| Metric | Why it matters |
|---|---|
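| MAE | Average absolute error in minutes. Intuitive, and less sensitive to outliers than RMSE. |
| RMSE | Square root of the MSE. Penalises large errors more heavily than MAE and is expressed in minutes. |
| R² | Proportion of variance explained by the model. |
| MAPE | Percentage-based error, useful for comparing across scales. Undefined when the actual value is 0, which this target allows. |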
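For reference, the four metrics can be computed by hand with NumPy. This is a minimal sketch, not monad's metric implementation. The function name regression_metrics and the sample values are illustrative. Skipping rows where the actual value is 0 when computing MAPE is one possible convention, chosen here because users with no qualifying sessions have a target of 0.0.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R2 and MAPE for 1-D arrays of minutes."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    err = y_pred - y_true

    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))

    # R2 compares residual error with the variance around the mean.
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # MAPE is undefined where the actual value is 0, so those rows are skipped.
    nonzero = y_true != 0
    if nonzero.any():
        mape = float(np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100.0)
    else:
        mape = float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}


# Illustrative values only: actual vs. predicted total minutes for three users.
metrics = regression_metrics([0.0, 100.0, 200.0], [10.0, 90.0, 220.0])
print(metrics)
```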
Production Tips
- Calibrate the minimum session threshold. 10 minutes is a reasonable default, but analyze your session duration distribution. If your app tracks stretching or cool-downs separately, a lower threshold may be appropriate.
- Segment by activity type. Running, weight training, and yoga have very different typical durations. Consider building activity-specific models for more accurate predictions.
- Use predictions for engagement scoring. Combine predicted training volume with actual volume to create an engagement score: users consistently below their predicted volume may be disengaging.
- Watch for seasonal patterns. Gym attendance spikes in January and drops in summer. Retrain quarterly to capture these cycles.
- Consider log-transforming the target. Training duration distributions are typically right-skewed. Applying np.log1p can stabilise training and improve prediction accuracy. Remember to invert predictions with np.expm1 before reporting them in minutes.
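The log-transform tip can be sketched as a pair of helpers: one applied inside the target function, one applied to model outputs. The names to_log_target and from_log_prediction are hypothetical, and clipping negative results to zero is an assumption about what a sensible minutes value looks like, not something monad does for you.

```python
import numpy as np


def to_log_target(total_minutes):
    """Label in log space: log(1 + minutes), so 0 minutes maps to 0."""
    return np.array([np.log1p(total_minutes)], dtype=np.float32)


def from_log_prediction(pred):
    """Map log-space predictions back to minutes, clipped at zero."""
    minutes = np.expm1(np.asarray(pred, dtype=np.float64))
    return np.clip(minutes, 0.0, None)


label = to_log_target(90.0)
recovered = from_log_prediction(label)
zero_label = to_log_target(0.0)

print(label, recovered, zero_label)
```

With a log-space target, the metrics reported during training and evaluation are measured on the log scale. Recompute them on the inverted predictions if you need errors in minutes.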