Predict Peak Daily Mobile Data Usage

Task type: RegressionTask
Industry: Telecom

Network congestion is driven by peak usage, not average usage. By predicting each subscriber's peak daily data consumption, network planning teams can anticipate capacity bottlenecks, trigger proactive data-plan upgrade offers, and identify subscribers at risk of exceeding their plan limits — all before the spike actually happens.

What makes this advanced? Daily aggregation with pandas: the target function converts Unix timestamps to datetimes, sums usage per calendar day, and takes the maximum daily total.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): data_usage with an mb_used column.

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument      Type         Description
history       Events       All events before the temporal split.
future        Events       All events after the temporal split.
attributes    Attributes   Static entity attributes.
ctx           Dict         Context dictionary containing SPLIT_TIMESTAMP, data mode, etc.

For regression tasks, the function must return one of:

  • np.array([value], dtype=np.float32): the continuous target value (here, peak daily MB usage).
  • None: exclude this entity (e.g., incomplete data).
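
As a rough illustration of that return contract, the sketch below wraps a plain number into a one-element float32 array and passes None through unchanged. The helper name to_regression_label is hypothetical and not part of monad; it only shows the shape and dtype described above.

```python
import numpy as np


def to_regression_label(value):
    """Hypothetical helper: wrap a scalar as a one-element float32 array.

    None is passed through, which signals that the entity is excluded.
    """
    if value is None:
        return None
    return np.array([value], dtype=np.float32)


label = to_regression_label(512.5)    # array of shape (1,), dtype float32
excluded = to_regression_label(None)  # None, entity is skipped
```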

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

import pandas as pd

# === Configuration ===
TARGET_WINDOW_DAYS = 30
DATA_USAGE_SOURCE = "data_usage"

def peak_daily_usage_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict highest daily mobile data usage in 30 days."""

    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    usage = future[DATA_USAGE_SOURCE].interval_from(
        ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
    )

    if len(usage) == 0:
        return np.array([0], dtype=np.float32)

    df = pd.DataFrame({
        "timestamp": pd.to_datetime(usage.timestamps, unit="s"),
        "mb_used": usage["mb_used"].events,
    })
    daily = df.groupby(pd.Grouper(key="timestamp", freq="D")).sum()

    return np.array([daily["mb_used"].max()], dtype=np.float32)

Step-by-Step Breakdown

① Validate the training window

Python
if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
    return None

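Ensures a full 30 days of future data are available; entities without it are excluded by returning None. A shorter window can only miss high-usage days, so it would tend to underestimate the true peak.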
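
To see why a truncated window biases the label downward, consider the running maximum over some made-up daily totals: the peak observed after k days can never exceed the peak over the full window. The numbers below are purely illustrative and do not come from any real data set.

```python
import numpy as np

# Hypothetical daily totals (MB) for the first 7 days of a target window.
daily_mb = np.array([120, 90, 300, 150, 80, 950, 200], dtype=np.float32)

# running_peak[k] is the peak that would be observed if the window
# ended after day k + 1.
running_peak = np.maximum.accumulate(daily_mb)

# A window cut off after 3 days reports a peak of 300 MB,
# while the full 7 days reveal a peak of 950 MB.
peak_after_3_days = running_peak[2]
peak_full_window = running_peak[-1]
```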

② Extract usage events in the target window

Python
usage = future[DATA_USAGE_SOURCE].interval_from(
    ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
)

Restricts data usage events to the 30-day observation window.

③ Build a pandas DataFrame for daily aggregation

Python
df = pd.DataFrame({
    "timestamp": pd.to_datetime(usage.timestamps, unit="s"),
    "mb_used": usage["mb_used"].events,
})
daily = df.groupby(pd.Grouper(key="timestamp", freq="D")).sum()

Unix timestamps are converted to pandas datetimes for calendar-aware grouping. pd.Grouper(freq="D") groups events by calendar day, and .sum() totals the MB used per day, so multiple usage sessions on the same day are added together. Days inside the observed range that have no events appear with a total of 0.
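
The grouping step can be tried in isolation with plain pandas. The sketch below uses hand-written timestamps and MB values in place of the monad event objects, so the lists are stand-ins for usage.timestamps and usage["mb_used"].events, not real data.

```python
import pandas as pd

# Made-up sessions: Unix timestamps in seconds (UTC) and MB used per session.
timestamps = [
    1714521600,  # 2024-05-01 00:00 UTC
    1714564800,  # 2024-05-01 12:00 UTC
    1714608000,  # 2024-05-02 00:00 UTC
    1714780800,  # 2024-05-04 00:00 UTC
]
mb_used = [100.0, 250.0, 80.0, 500.0]

df = pd.DataFrame({
    "timestamp": pd.to_datetime(timestamps, unit="s"),
    "mb_used": mb_used,
})

# One row per calendar day between the first and last event.
daily = df.groupby(pd.Grouper(key="timestamp", freq="D"))["mb_used"].sum()

# May 1 -> 350.0 (two sessions), May 2 -> 80.0, May 3 -> 0.0, May 4 -> 500.0
peak = daily.max()
```

Here the two sessions on May 1 are summed before the maximum is taken, which is what separates a peak daily total from a peak single session.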

④ Return the peak daily value

Python
return np.array([daily["mb_used"].max()], dtype=np.float32)

.max() across the daily totals yields the single highest-usage day in the window. This is the regression target — the peak day, not the average.


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=peak_daily_usage_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Metric   Why it matters
MAE      Average absolute error; intuitive and less sensitive to outliers than RMSE.
RMSE     Penalises large errors more heavily than MAE.
R²       Proportion of variance explained by the model.
MAPE     Percentage-based error; useful for comparing across scales.
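
For reference, the four metrics in the table can be computed by hand with NumPy. This is a minimal sketch on made-up values, independent of how monad computes its metrics internally.

```python
import numpy as np

# Made-up true and predicted peak daily usage (MB) for three subscribers.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

err = y_pred - y_true

mae = np.mean(np.abs(err))             # mean absolute error
rmse = np.sqrt(np.mean(err ** 2))      # root of the mean squared error
ss_res = np.sum(err ** 2)              # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot             # share of variance explained
mape = np.mean(np.abs(err / y_true)) * 100.0  # undefined if any y_true is 0
```

Note that MAPE divides by the true value, so it is undefined for subscribers whose target is 0, which the target function above returns when there are no usage events in the window.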

Production Tips

  1. Distinguish Wi-Fi from cellular usage. If your data includes a connection-type column, filter to cellular-only data for network capacity planning. Wi-Fi traffic does not consume cellular network capacity.
  2. Consider percentile-based targets. Instead of the absolute peak (which may be an outlier), use the 95th percentile daily usage for a more stable regression target.
  3. Use predictions for proactive plan upgrades. Subscribers predicted to hit a high peak can receive targeted data-plan upgrade offers before they experience throttling or overage charges.
  4. Account for time-zone differences. "Daily" aggregation depends on the time zone. Use the subscriber's local time zone if available, or default to a consistent reference.
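
Tips 2 and 4 can be sketched with plain pandas and NumPy. The helper daily_totals and the sample values below are hypothetical, and the sketch assumes timestamps are Unix seconds in UTC; it is one possible approach rather than a monad feature.

```python
import numpy as np
import pandas as pd


def daily_totals(timestamps_s, mb_used, tz="UTC"):
    """Hypothetical helper: sum MB per calendar day in the given time zone."""
    ts = pd.to_datetime(timestamps_s, unit="s", utc=True).tz_convert(tz)
    series = pd.Series(mb_used, index=ts).sort_index()
    return series.resample("D").sum()


# Two made-up sessions that fall on different UTC days
# but on the same calendar day in Tokyo (UTC+9).
timestamps = [
    1714593600,  # 2024-05-01 20:00 UTC = 2024-05-02 05:00 JST
    1714644000,  # 2024-05-02 10:00 UTC = 2024-05-02 19:00 JST
]
mb_used = [300.0, 400.0]

utc_daily = daily_totals(timestamps, mb_used, tz="UTC")
tokyo_daily = daily_totals(timestamps, mb_used, tz="Asia/Tokyo")

utc_peak = utc_daily.max()      # 400.0: the sessions land on separate days
tokyo_peak = tokyo_daily.max()  # 700.0: both sessions land on May 2 local time

# Percentile-based alternative to the absolute peak (tip 2).
p95_label = np.array([np.percentile(utc_daily.to_numpy(), 95)], dtype=np.float32)
```

The same events give a peak of 400 MB or 700 MB depending on the time zone, so the choice of reference zone should be made once and applied consistently at training and prediction time.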