
Predict Training Duration Excluding Short Sessions

Task type: RegressionTask
Industry: Fitness / Wellness

Short workout sessions (under 10 minutes) are often accidental app opens, warm-ups logged separately, or data noise. By filtering them out and predicting only meaningful training volume, fitness platforms can forecast user engagement more accurately, identify users at risk of dropping off, and tailor workout recommendations to match predicted activity levels.

What makes this advanced? It combines duration filtering with sum aggregation: sessions below a duration threshold are dropped first, and the durations of the remaining sessions within the target window are then summed.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): workout_logs with a duration_minutes column

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument    Type        Description
history     Events      All events before the temporal split.
future      Events      All events after the temporal split.
attributes  Attributes  Static entity attributes.
ctx         Dict        Context dictionary containing SPLIT_TIMESTAMP, the data mode, and other run metadata.

For regression tasks, the function must return one of:

  • np.array([value], dtype=np.float32): the target value for this entity (total qualifying training minutes in the window).
  • None: exclude this entity from training (e.g., when the future window is incomplete).
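
The return contract can be checked locally before training. The helper below, check_regression_target, is hypothetical and not part of monad; it encodes the two allowed return forms listed above, plus a finiteness check that is our own assumption rather than a documented requirement.

```python
import numpy as np


def check_regression_target(label):
    """Hypothetical helper: validate a target-function return value.

    Accepts None (entity excluded) or a 1-element float32 array.
    The finiteness check is an extra assumption, not a monad rule.
    Returns True for a valid label and raises ValueError otherwise.
    """
    if label is None:
        return True  # entity is excluded from training
    if not isinstance(label, np.ndarray):
        raise ValueError(f"expected np.ndarray or None, got {type(label).__name__}")
    if label.shape != (1,):
        # num_targets=1, so exactly one value per entity
        raise ValueError(f"expected shape (1,), got {label.shape}")
    if label.dtype != np.float32:
        raise ValueError(f"expected float32, got {label.dtype}")
    if not np.isfinite(label[0]):
        raise ValueError("label must be finite")
    return True
```

Calling it on the output of a target function in a quick local test catches dtype slips such as returning a float64 array.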

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window


# === Configuration ===
TARGET_WINDOW_DAYS = 30
ACTIVITY_DATA_SOURCE = "workout_logs"
MIN_SESSION_MINUTES = 10

def training_duration_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict total training time (min), excluding sessions < 10 min."""

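    # Skip entities whose 30-day future window is not fully observed.
    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    # Drop short sessions (accidental opens, warm-ups, noise).
    filtered = future[ACTIVITY_DATA_SOURCE].filter(
        by="duration_minutes", condition=lambda v: v >= MIN_SESSION_MINUTES
    )
    # Keep only sessions inside the target window after the split.
    filtered = filtered.interval_from(
        ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
    )

    # Total qualifying minutes as a single float32 regression target.
    return np.array([filtered.sum(column="duration_minutes")], dtype=np.float32)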
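
To unit-test the labeling logic without a foundation model, it can help to restate it in plain Python. The sketch below is not monad code: Session, reference_training_duration, and the data_end argument are hypothetical stand-ins, and it assumes a half-open window [split, split + 30 days) and that an entity is incomplete when the data ends before the window does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

import numpy as np

TARGET_WINDOW_DAYS = 30
MIN_SESSION_MINUTES = 10


@dataclass
class Session:
    """Hypothetical stand-in for one workout_logs event."""
    timestamp: datetime
    duration_minutes: float


def reference_training_duration(sessions, split, data_end):
    """Plain-Python sketch of the target logic for offline tests.

    Assumes the entity is excluded when data_end falls before
    split + window, and that the window is half-open.
    """
    window = timedelta(days=TARGET_WINDOW_DAYS)
    if data_end < split + window:
        return None
    # Sum sessions that start inside the window and meet the threshold.
    total = sum(
        s.duration_minutes
        for s in sessions
        if split <= s.timestamp < split + window
        and s.duration_minutes >= MIN_SESSION_MINUTES
    )
    return np.array([total], dtype=np.float32)


split = datetime(2024, 5, 1, tzinfo=timezone.utc)
sessions = [
    Session(split + timedelta(days=1), 30),   # counted
    Session(split + timedelta(days=2), 5),    # too short
    Session(split + timedelta(days=40), 60),  # outside the window
]
label = reference_training_duration(sessions, split, split + timedelta(days=60))
```

Here only the 30-minute session on day 1 contributes, so the label is 30.0.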

Step-by-Step Breakdown

① Validate the training window

Python
if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
    return None

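Returns None, excluding the entity, when fewer than TARGET_WINDOW_DAYS (30) days of future data are available. Every label therefore covers a full 30-day window; without this check, entities near the end of the dataset would receive artificially low totals simply because their window is cut short.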
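
The completeness rule can be illustrated with plain datetime arithmetic. This is a hypothetical restatement, not how has_incomplete_training_window is implemented: it assumes the check compares the last observed timestamp (data_end, a name we introduce) with split + window, and monad's exact boundary rule may differ.

```python
from datetime import datetime, timedelta, timezone


def window_is_incomplete(split: datetime, data_end: datetime, window: timedelta) -> bool:
    """Hypothetical rule: the window is incomplete if data ends before it does."""
    return data_end < split + window


def latest_valid_split(data_end: datetime, window: timedelta) -> datetime:
    """Latest split timestamp that still has a full window under the rule above."""
    return data_end - window


window = timedelta(days=30)
split = datetime(2024, 5, 1, tzinfo=timezone.utc)
short_data = window_is_incomplete(split, datetime(2024, 5, 20, tzinfo=timezone.utc), window)
full_data = window_is_incomplete(split, datetime(2024, 6, 15, tzinfo=timezone.utc), window)
```

With only 19 days of data after 1 May the entity is excluded; with data through 15 June it is kept, and any split after 16 May would be excluded.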

② Filter out short sessions

Python
filtered = future[ACTIVITY_DATA_SOURCE].filter(
    by="duration_minutes", condition=lambda v: v >= MIN_SESSION_MINUTES
)

Sessions shorter than 10 minutes are removed. This eliminates noise from accidental tracking, brief stretching logged as workouts, and other non-meaningful entries that would inflate the training volume estimate.
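
Because the condition uses >=, a session of exactly 10 minutes is kept. A minimal NumPy sketch on made-up durations (not monad's filter; split_by_threshold is a hypothetical helper) shows which sessions survive the same comparison.

```python
import numpy as np

MIN_SESSION_MINUTES = 10


def split_by_threshold(durations, min_minutes=MIN_SESSION_MINUTES):
    """Return (kept, dropped) using the same >= comparison as the filter condition."""
    durations = np.asarray(durations, dtype=np.float32)
    keep = durations >= min_minutes
    return durations[keep], durations[~keep]


# Illustrative durations only; 10.0 sits exactly on the threshold and is kept.
kept, dropped = split_by_threshold([3.0, 9.5, 10.0, 25.0, 60.0])
```

The 3.0 and 9.5 minute entries are dropped and the remaining three sum to 95 minutes.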

③ Trim to the target window

Python
filtered = filtered.interval_from(
    ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
)

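After duration filtering, the events are restricted to the 30-day window starting at SPLIT_TIMESTAMP; future can contain events beyond that window, so this step is required. Both steps keep or drop whole events, so their order does not change which sessions are counted: a session at the window boundary is tested against the duration threshold either way.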
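
Both steps are per-event predicates, so as long as interval_from keeps or drops whole events (our assumption), applying them in either order selects the same sessions. The sketch checks this on synthetic (timestamp, duration) tuples, assuming a half-open window; by_duration and in_window are illustrative helpers, not monad functions.

```python
from datetime import datetime, timedelta, timezone

MIN_SESSION_MINUTES = 10
WINDOW = timedelta(days=30)
SPLIT = datetime(2024, 5, 1, tzinfo=timezone.utc)

# Synthetic (timestamp, duration_minutes) events; values are illustrative only.
events = [
    (SPLIT, 8.0),                                   # short, at the window start
    (SPLIT + timedelta(hours=2), 45.0),             # long, inside the window
    (SPLIT + timedelta(days=29, hours=23), 12.0),   # long, just inside the window end
    (SPLIT + timedelta(days=30), 50.0),             # long, just outside a half-open window
]


def by_duration(evts):
    """Keep events that meet the minimum session length."""
    return [e for e in evts if e[1] >= MIN_SESSION_MINUTES]


def in_window(evts):
    """Keep events in [SPLIT, SPLIT + WINDOW); monad's boundary rule may differ."""
    return [e for e in evts if SPLIT <= e[0] < SPLIT + WINDOW]


filter_then_window = in_window(by_duration(events))
window_then_filter = by_duration(in_window(events))
```

Both orders keep the 45- and 12-minute sessions, for a total of 57 minutes.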

④ Sum and return

Python
return np.array([filtered.sum(column="duration_minutes")], dtype=np.float32)

The .sum() method aggregates the duration_minutes column across all qualifying sessions. A user with three 30-minute workouts would have a target of 90.0. A user with no qualifying sessions would have a target of 0.0; this is a valid regression value, so the entity is kept rather than excluded.
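
The two cases above can be reproduced with NumPy arrays standing in for the filtered duration_minutes column. This only illustrates the arithmetic and the label shape; it does not call monad's .sum().

```python
import numpy as np

three_sessions = np.array([30.0, 30.0, 30.0])  # three qualifying 30-minute workouts
no_sessions = np.array([], dtype=np.float64)   # nothing survived the filter

label_active = np.array([three_sessions.sum()], dtype=np.float32)
label_idle = np.array([no_sessions.sum()], dtype=np.float32)  # sum of an empty array is 0.0
```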


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=training_duration_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Metric  Why it matters
MAE     Average absolute error in minutes; intuitive and less sensitive to outliers than RMSE.
RMSE    Square root of MSE; penalises large errors more heavily than MAE.
R²      Proportion of variance in the target explained by the model.
MAPE    Percentage-based error, useful for comparing across scales; undefined for users whose actual target is 0.0.
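
For offline checks on exported predictions, these metrics can be computed directly with NumPy. This is a sketch, not monad's metric implementation. Because users with no qualifying sessions have a target of exactly 0.0, the sketch computes MAPE only over non-zero targets; that masking is our own choice, and regression_metrics is a hypothetical helper.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R² and MAPE (over non-zero targets) with NumPy."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))  # square root of MSE
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    # Zero targets make the percentage undefined, so mask them out.
    nonzero = y_true != 0
    mape = (
        np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100.0
        if nonzero.any()
        else float("nan")
    )
    return {"mae": float(mae), "rmse": float(rmse), "r2": float(r2), "mape": float(mape)}


metrics = regression_metrics([0.0, 90.0, 60.0, 120.0], [10.0, 80.0, 60.0, 150.0])
```

In this example the first user has a 0.0 target, so they count toward MAE, RMSE, and R² but not MAPE.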

Production Tips

  1. Calibrate the minimum session threshold. 10 minutes is a reasonable default, but analyze your session duration distribution. If your app tracks stretching or cool-downs separately, a lower threshold may be appropriate.
  2. Segment by activity type. Running, weight training, and yoga have very different typical durations. Consider building activity-specific models for more accurate predictions.
  3. Use predictions for engagement scoring. Combine predicted training volume with actual volume to create an engagement score: users consistently below their predicted volume may be disengaging.
  4. Watch for seasonal patterns. Gym attendance spikes in January and drops in summer. Retrain quarterly to capture these cycles.
  5. Consider log-transforming the target. Training duration distributions are typically right-skewed. Applying np.log1p in the target function can stabilise training and may improve accuracy; apply np.expm1 to the model's predictions to convert them back to minutes.
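
Tips 1 and 5 can be prototyped offline before changing the target function. The helpers below are hypothetical: share_below_threshold reports how many sessions a candidate threshold would drop, and to_log_target / from_log_target wrap the np.log1p / np.expm1 round trip. The clip at 0 minutes is our own assumption, added because a model trained in log space can output small negative values.

```python
import numpy as np


def share_below_threshold(durations, min_minutes):
    """Fraction of sessions that a given minimum-length threshold would drop."""
    durations = np.asarray(durations, dtype=np.float64)
    return float(np.mean(durations < min_minutes)) if durations.size else 0.0


def to_log_target(minutes):
    """Forward transform for training: log(1 + minutes); 0.0 maps to 0.0."""
    return np.log1p(np.asarray(minutes, dtype=np.float32)).astype(np.float32)


def from_log_target(log_minutes):
    """Inverse transform for predictions: exp(x) - 1, clipped at 0 minutes."""
    return np.clip(np.expm1(np.asarray(log_minutes, dtype=np.float64)), 0.0, None)


dropped_share = share_below_threshold([3.0, 9.5, 10.0, 25.0], 10)
round_trip = from_log_target(to_log_target([0.0, 90.0, 600.0]))
```

Inside a target function, the transformed label would be np.array([np.log1p(total)], dtype=np.float32). Note that metrics reported during training are then in log space, not minutes.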