Count Days with Long Calls
Task type: RegressionTask
Industry: Telecom
Long phone calls are a strong engagement signal in telecom — they indicate active voice usage and high plan utilisation. Predicting how many days a subscriber will make long calls helps retention teams identify heavy voice users for premium plan offers, and flags subscribers whose long-call frequency is declining as potential churn risks.
What makes this advanced? It combines duration filtering with unique-day counting: events are first filtered by a duration threshold, then reduced to the distinct calendar days on which they occurred.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data source(s):
  - call_logs with a duration_minutes column
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
For regression tasks, the function must return one of:
- np.array([value], dtype=np.float32): the continuous target value for the entity (here, the number of days with long calls).
- None: exclude this entity (e.g., incomplete data).
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window
import pandas as pd

# === Configuration ===
TARGET_WINDOW_DAYS = 30
CALL_LOGS_SOURCE = "call_logs"
MIN_CALL_DURATION = 20  # minutes


def days_with_long_calls_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Count days with at least one call longer than MIN_CALL_DURATION minutes."""
    # Exclude entities without a full target window of future data.
    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    # Restrict call events to the target window that starts at the split.
    calls = future[CALL_LOGS_SOURCE].interval_from(
        ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
    )

    # Keep only calls strictly longer than the threshold.
    long_calls = calls.filter(
        by="duration_minutes", condition=lambda v: v > MIN_CALL_DURATION
    )

    # No long calls is a valid label of 0, not a reason to exclude the entity.
    if len(long_calls) == 0:
        return np.array([0], dtype=np.float32)

    # Truncate each timestamp (Unix seconds) to its calendar day, then count distinct days.
    days = pd.to_datetime(long_calls.timestamps, unit="s").normalize()
    unique_days = days.unique()
    return np.array([len(unique_days)], dtype=np.float32)
Step-by-Step Breakdown
① Validate the training window
Entities without a full 30 days of future data after the split are excluded (the function returns None), so every label is measured over a window of the same length.
② Filter to long calls in the target window
calls = future[CALL_LOGS_SOURCE].interval_from(
    ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
)
long_calls = calls.filter(
    by="duration_minutes", condition=lambda v: v > MIN_CALL_DURATION
)
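First, call events are trimmed to the 30-day window that starts at the split timestamp. Then the .filter() method retains only calls longer than 20 minutes; the comparison is strict, so a call of exactly 20 minutes does not count. This two-step approach keeps the logic clear and composable.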
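The two filtering steps can be illustrated without the library. The following is a minimal sketch on plain tuples, not monad's implementation: the helper long_calls_in_window and the sample records are hypothetical, and the half-open window [split, split + 30 days) is an assumption about how interval_from treats its boundaries.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for call events: (unix_seconds, duration_minutes).
def long_calls_in_window(calls, split_ts, window_days=30, min_duration=20):
    """Keep calls inside the window that last strictly longer than min_duration."""
    window_end = split_ts + timedelta(days=window_days).total_seconds()
    # Assumption: the window is half-open, [split_ts, window_end).
    in_window = [c for c in calls if split_ts <= c[0] < window_end]
    # Strict comparison: a call of exactly min_duration minutes is dropped.
    return [c for c in in_window if c[1] > min_duration]

split = datetime(2024, 5, 1, tzinfo=timezone.utc).timestamp()
day = 86_400  # seconds per day
calls = [
    (split + 1 * day, 25.0),   # in window, long: kept
    (split + 2 * day, 20.0),   # in window, exactly at threshold: dropped
    (split + 3 * day, 5.0),    # in window, short: dropped
    (split + 45 * day, 40.0),  # long, but outside the 30-day window: dropped
]
kept = long_calls_in_window(calls, split)
print(len(kept))  # 1
```

Applying the window first and the threshold second mirrors the order in the target function; swapping the two would give the same result, since both are independent filters.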
③ Handle zero long calls
Subscribers with no long calls in the window receive a target of 0 rather than being excluded. This is intentional — zero-day counts are valid and informative regression targets.
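A small worked example shows why keeping the zeros matters. The labels below are made up for illustration; the point is only that dropping zero-day subscribers shifts the label distribution the model learns from.

```python
# Hypothetical labels for six subscribers: days with long calls in the window.
labels_with_zeros = [0, 0, 0, 2, 4, 6]

# What the training set would contain if zero-day subscribers returned None.
labels_without_zeros = [v for v in labels_with_zeros if v > 0]

mean_all = sum(labels_with_zeros) / len(labels_with_zeros)
mean_nonzero = sum(labels_without_zeros) / len(labels_without_zeros)

print(mean_all)      # 2.0
print(mean_nonzero)  # 4.0
```

In this toy case, excluding the zeros doubles the average label, so a model trained that way would tend to overestimate long-call days for light voice users.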
④ Count unique calendar days
days = pd.to_datetime(long_calls.timestamps, unit="s").normalize()
unique_days = days.unique()
return np.array([len(unique_days)], dtype=np.float32)
Timestamps, interpreted as Unix seconds (unit="s"), are converted to pandas datetimes and normalised to midnight, which removes the time component. .unique() returns the distinct calendar days and len() counts them. A subscriber who made 5 long calls on the same day contributes only 1 to the count: the target measures how many days had long calls, not how many long calls were made in total.
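The same counting logic can be cross-checked with the standard library alone. This is a sketch with a hypothetical helper, count_unique_days, and made-up timestamps. It treats days as UTC calendar days, which matches how pandas interprets epoch seconds with unit="s"; if local calendar days matter for your market, the timestamps would need to be shifted to the local timezone first.

```python
from datetime import datetime, timezone

def count_unique_days(unix_seconds):
    """Count distinct UTC calendar days among Unix-second timestamps."""
    days = {datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in unix_seconds}
    return len(days)

def ts(year, month, day, hour):
    """Build a Unix timestamp for a UTC date and hour (illustrative helper)."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()

same_day = [ts(2024, 5, 3, h) for h in (8, 10, 13, 18, 22)]  # five calls, one day
across_midnight = [ts(2024, 5, 3, 23), ts(2024, 5, 4, 1)]    # two hours apart, two days

print(count_unique_days(same_day))                    # 1
print(count_unique_days(same_day + across_midnight))  # 2
print(count_unique_days([]))                          # 0
```

The second case shows that the count depends on calendar boundaries rather than elapsed time: two calls two hours apart fall on different days when they straddle midnight.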
Training
Once the target function is defined, fine-tune a downstream model:
from pathlib import Path

from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=days_with_long_calls_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)
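The target is a count of days, but a regression model may return fractional or slightly out-of-range values. If downstream consumers expect a whole number of days, a post-processing step along these lines could be applied. This is a sketch only: the helper to_day_count and the raw values are hypothetical, and it does not assume anything about the layout of predictions.tsv.

```python
def to_day_count(prediction, max_days):
    """Clamp a raw regression output to [0, max_days] and round to whole days."""
    clamped = min(max(prediction, 0.0), float(max_days))
    return int(round(clamped))

raw = [-0.4, 3.6, 12.2, 33.9]  # hypothetical raw model outputs
print([to_day_count(p, max_days=30) for p in raw])  # [0, 4, 12, 30]
```

The upper bound is passed in rather than hard-coded because a 30-day window that starts at midnight covers 30 calendar days, whereas one that starts mid-day touches 31.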
Recommended Metrics
| Metric | Why it matters |
|---|---|
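| MAE | Average absolute error, in days. Intuitive and less sensitive to outliers than MSE or RMSE. |
| RMSE | Square root of MSE. Penalises large errors more heavily than MAE. |
| R² | Proportion of variance explained by the model. |
| MAPE | Percentage-based error, useful for comparing across scales. Undefined when the true value is 0, which this target produces for subscribers with no long calls, so compute it only over non-zero targets. |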
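For reference, the four metrics in the table can be computed by hand as follows. This is an illustrative pure-Python sketch, not the implementation monad uses; the function regression_metrics and the sample values are hypothetical. MAPE is restricted to non-zero targets here, which is one possible convention for handling zero-day labels.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R² and a zero-safe MAPE for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R² = 1 - residual sum of squares / total sum of squares.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # MAPE divides by the true value, so zero targets are skipped.
    nonzero = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    mape = (
        sum(abs(p - t) / abs(t) for t, p in nonzero) / len(nonzero)
        if nonzero
        else float("nan")
    )
    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}

y_true = [0.0, 2.0, 4.0, 10.0]  # hypothetical day counts
y_pred = [1.0, 2.0, 3.0, 6.0]   # hypothetical predictions
metrics = regression_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in metrics.items()})
# {'mae': 1.5, 'rmse': 2.121, 'r2': 0.679, 'mape': 0.217}
```

The single 4-day miss on the heaviest user pulls RMSE well above MAE, which is the behaviour the table describes.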
Production Tips
- Tune the duration threshold for your market. 20 minutes is a reasonable default, but call behavior varies by region and demographic. Analyze your call duration distribution to find a meaningful threshold.
- Distinguish between call types. If your data includes call categories (personal, business, conference), consider filtering to specific types for more targeted predictions.
- Use predictions for plan recommendations. Subscribers predicted to have many long-call days may benefit from unlimited voice plans, while those predicted to have few may prefer data-heavy plans.
- Normalise by plan type. Unlimited-plan subscribers naturally have more long calls. Consider training separate models or adding plan type as a feature.
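The first tip, analysing the call duration distribution, can be sketched as a comparison of how many calls each candidate threshold would classify as long. The durations below are made up for illustration, and share_above is a hypothetical helper; in practice the input would be the duration_minutes column of your call logs.

```python
def share_above(durations, threshold):
    """Fraction of calls strictly longer than the threshold (in minutes)."""
    return sum(d > threshold for d in durations) / len(durations)

# Hypothetical call durations in minutes.
durations = [1, 2, 2, 3, 4, 5, 6, 8, 10, 12, 15, 18, 22, 25, 31, 40, 45, 52, 60, 75]

shares = {t: share_above(durations, t) for t in (10, 20, 30, 45)}
print(shares)  # {10: 0.55, 20: 0.4, 30: 0.3, 45: 0.15}
```

A threshold that marks nearly all calls as long, or almost none, leaves little signal in the target, so a value somewhere in between is usually a more useful starting point for MIN_CALL_DURATION.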