Predict Peak Daily Mobile Data Usage
Task type: RegressionTask
Industry: Telecom
Network congestion is driven by peak usage, not average usage. By predicting each subscriber's peak daily data consumption, network planning teams can anticipate capacity bottlenecks, trigger proactive data-plan upgrade offers, and identify subscribers at risk of exceeding their plan limits — all before the spike actually happens.
What makes this advanced? The target requires a pandas daily groupby: event timestamps are converted to datetimes, usage is summed per calendar day, and the maximum daily total becomes the label.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data source(s):
  - data_usage with an mb_used column
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
For regression tasks, the function must return one of:
- np.array([value], dtype=np.float32): the continuous target value (here, peak daily usage in MB).
- None: exclude this entity (e.g., incomplete data).
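The return contract above can be checked in a unit test before any training run. The following is a minimal sketch that assumes nothing about monad itself; `check_regression_target` is a hypothetical helper, not part of the library.

```python
import numpy as np


def check_regression_target(result, num_targets=1):
    """Return True if `result` is None or a finite float32 array of the right shape.

    Hypothetical helper for local testing; not a monad function.
    """
    if result is None:  # entity is excluded from training
        return True
    return (
        isinstance(result, np.ndarray)
        and result.dtype == np.float32
        and result.shape == (num_targets,)
        and bool(np.isfinite(result).all())
    )


ok = check_regression_target(np.array([812.5], dtype=np.float32))
bad = check_regression_target(np.array([812.5]))  # float64 by default, so rejected
```

Running the target function on a handful of entities and passing each result through a check like this can surface dtype or NaN problems early.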
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict
from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window
import pandas as pd

# === Configuration ===
TARGET_WINDOW_DAYS = 30
DATA_USAGE_SOURCE = "data_usage"


def peak_daily_usage_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict highest daily mobile data usage in 30 days."""
    # Skip entities that lack a full target window after the split.
    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    # Keep only usage events inside the 30-day target window.
    usage = future[DATA_USAGE_SOURCE].interval_from(
        ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
    )
    # No usage in the window means a peak of 0 MB.
    if len(usage) == 0:
        return np.array([0], dtype=np.float32)

    df = pd.DataFrame({
        "timestamp": pd.to_datetime(usage.timestamps, unit="s"),
        "mb_used": usage["mb_used"].events,
    })
    # Sum MB per calendar day, then return the busiest day's total.
    daily = df.groupby(pd.Grouper(key="timestamp", freq="D")).sum()
    return np.array([daily["mb_used"].max()], dtype=np.float32)
Step-by-Step Breakdown
① Validate the training window
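Ensures that a full 30 days of future data are available after the split. A shorter window could miss the true peak day and bias the target downward, so such entities are excluded by returning None.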
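Conceptually, a check of this kind compares the split timestamp plus the target window against the end of the available data. The sketch below only illustrates that idea and is not monad's implementation; `window_is_incomplete` and `data_end` are hypothetical names introduced for the example.

```python
from datetime import datetime, timedelta, timezone


def window_is_incomplete(split_time, data_end, window):
    """True when less than `window` of data exists after `split_time`.

    Hypothetical illustration; monad's has_incomplete_training_window
    may use different inputs and logic.
    """
    return split_time + window > data_end


data_end = datetime(2024, 5, 31, tzinfo=timezone.utc)
window = timedelta(days=30)

# A split on 15 May leaves only 16 days of future data: incomplete.
late_split = window_is_incomplete(
    datetime(2024, 5, 15, tzinfo=timezone.utc), data_end, window
)
# A split on 1 April leaves 60 days of future data: complete.
early_split = window_is_incomplete(
    datetime(2024, 4, 1, tzinfo=timezone.utc), data_end, window
)
```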
② Extract usage events in the target window
usage = future[DATA_USAGE_SOURCE].interval_from(
    ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
)
Restricts data usage events to the 30-day observation window.
③ Build a pandas DataFrame for daily aggregation
df = pd.DataFrame({
    "timestamp": pd.to_datetime(usage.timestamps, unit="s"),
    "mb_used": usage["mb_used"].events,
})
daily = df.groupby(pd.Grouper(key="timestamp", freq="D")).sum()
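Unix timestamps are converted to pandas datetimes for calendar-aware grouping. pd.Grouper(freq="D") buckets events by calendar day, and .sum() totals the MB used per day, so multiple usage sessions on the same day are combined into a single daily figure. Because the converted timestamps carry no time zone, day boundaries fall at midnight UTC.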
④ Return the peak daily value
.max() across the daily totals yields the single highest-usage day in the window. This is the regression target — the peak day, not the average.
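To see steps ③ and ④ in isolation, the same aggregation can be run on a few hand-made sessions without monad. The timestamps and MB values below are made up for illustration.

```python
import numpy as np
import pandas as pd

BASE = 1714521600  # 2024-05-01 00:00:00 UTC as a Unix timestamp (seconds)

# Six sessions, given as hours after BASE: two on day 1, one on day 2,
# none on day 3, three on day 4.
hours = [10, 20, 33, 73, 85, 95]
mb = [300.0, 200.0, 150.0, 400.0, 350.0, 100.0]

df = pd.DataFrame({
    "timestamp": pd.to_datetime([BASE + h * 3600 for h in hours], unit="s"),
    "mb_used": mb,
})

# One row per calendar day between the first and last event;
# the day without sessions gets a total of 0.
daily = df.groupby(pd.Grouper(key="timestamp", freq="D")).sum()
# daily["mb_used"] -> 500.0, 150.0, 0.0, 850.0

peak = np.array([daily["mb_used"].max()], dtype=np.float32)
# peak -> [850.]
```

The largest single session (400 MB) is not the target; the label is the busiest day's total of 850 MB.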
Training
Once the target function is defined, fine-tune a downstream model:
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=peak_daily_usage_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)
Recommended Metrics
| Metric | Why it matters |
|---|---|
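| MAE | Average absolute error in MB; easy to interpret and less sensitive to outliers than RMSE. |
| RMSE | Square root of MSE; penalises large errors more heavily than MAE. |
| R² | Proportion of variance explained by the model. |
| MAPE | Percentage-based error, useful for comparing across scales; undefined when the true peak is 0, which this target function returns for subscribers with no usage in the window. |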
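The snippets above configure MAE, MSE, and R² but not RMSE or MAPE. If true and predicted peaks are available as arrays, all four metrics can be computed directly with NumPy. This is a minimal sketch, independent of monad; `regression_report` is a hypothetical helper, and dropping zero targets from MAPE is one possible convention among several.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Compute MAE, RMSE, R² and MAPE (zero targets excluded from MAPE)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    err = y_pred - y_true

    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))

    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # MAPE divides by the true value, so rows with a true peak of 0 are skipped.
    nonzero = y_true != 0
    if nonzero.any():
        mape = float(np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100)
    else:
        mape = float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}


# Made-up values: one subscriber with no usage (true peak 0) and three others.
report = regression_report([0, 100, 200, 400], [10, 110, 180, 400])
```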
Production Tips
- Distinguish Wi-Fi from cellular usage. If your data includes a connection-type column, filter to cellular-only data for network capacity planning. Wi-Fi traffic does not load the cellular network.
- Consider percentile-based targets. Instead of the absolute peak (which may be an outlier), use the 95th percentile daily usage for a more stable regression target.
- Use predictions for proactive plan upgrades. Subscribers predicted to hit a high peak can receive targeted data-plan upgrade offers before they experience throttling or overage charges.
- Account for time-zone differences. "Daily" aggregation depends on the time zone. Use the subscriber's local time zone if available, or default to a consistent reference.
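The percentile and time-zone tips above can be sketched with plain pandas. This is an illustration under stated assumptions, not part of monad: `daily_totals` and `p95_daily_usage` are hypothetical helpers, timestamps are assumed to be Unix seconds, and the time zone name would have to come from your own subscriber data.

```python
import numpy as np
import pandas as pd


def daily_totals(timestamps_s, mb_used, tz="UTC"):
    """Sum MB per calendar day, with day boundaries at midnight in `tz`."""
    local = pd.to_datetime(
        np.asarray(timestamps_s), unit="s", utc=True
    ).tz_convert(tz)
    df = pd.DataFrame({"timestamp": local, "mb_used": mb_used})
    return df.groupby(pd.Grouper(key="timestamp", freq="D"))["mb_used"].sum()


def p95_daily_usage(daily):
    """95th percentile of daily totals as a float32 regression label."""
    return np.array([daily.quantile(0.95)], dtype=np.float32)


BASE = 1714521600  # 2024-05-01 00:00:00 UTC
# Two sessions: 22:00 UTC on 1 May and 02:00 UTC on 2 May.
ts = [BASE + 22 * 3600, BASE + 26 * 3600]
mb = [300.0, 400.0]

peak_utc = daily_totals(ts, mb).max()                        # two UTC days
peak_local = daily_totals(ts, mb, "America/New_York").max()  # one local day

p95 = p95_daily_usage(pd.Series([100.0, 200.0, 300.0, 400.0, 500.0]))
```

In UTC the two sessions fall on different days, so the peak is 400 MB; in New York local time both fall on the evening of 1 May, so the peak is 700 MB. Note that days without events between the first and last event count as 0 in the percentile, while days outside that span are not represented at all.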