Count Days with Long Calls

Task type: RegressionTask
Industry: Telecom

Long phone calls are a strong engagement signal in telecom — they indicate active voice usage and high plan utilisation. Predicting how many days a subscriber will make long calls helps retention teams identify heavy voice users for premium plan offers, and flags subscribers whose long-call frequency is declining as potential churn risks.

What makes this advanced? Duration filtering + unique day counting — filters events by duration threshold, extracts unique calendar days.

Prerequisites

Before writing a target function you need:

A trained foundation model built on event data that includes the relevant data sources.
The monad library installed in your environment.
Data source(s): call_logs with a duration_minutes column

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument	Type	Description
`history`	`Events`	All events before the temporal split.
`future`	`Events`	All events after the temporal split.
`attributes`	`Attributes`	Static entity attributes.
`ctx`	`Dict`	Context dictionary containing `SPLIT_TIMESTAMP`, data mode, etc.

For regression tasks, the function must return one of:

np.array([value], dtype=np.float32): the target value for the entity (here, the number of days with long calls).
None: exclude this entity (e.g., incomplete data).
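As a minimal sketch of this return contract, the helper below wraps a plain count in a one-element float32 array. The name `as_regression_target` is hypothetical and introduced only for illustration; it is not part of monad.

```python
import numpy as np


def as_regression_target(value: float) -> np.ndarray:
    """Wrap a scalar label in a one-element float32 array.

    Hypothetical helper for illustration only; not part of the monad API.
    """
    return np.array([value], dtype=np.float32)


label = as_regression_target(7)
print(label.shape, label.dtype)
```

Returning the same shape and dtype on every code path (including the zero case) keeps the labels consistent across entities.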

Full Example

Python

import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

import pandas as pd

# === Configuration ===
TARGET_WINDOW_DAYS = 30  # length of the target window, in days
CALL_LOGS_SOURCE = "call_logs"
MIN_CALL_DURATION = 20  # minutes; a call counts as long only if strictly longer

def days_with_long_calls_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Count distinct calendar days with at least one call longer than MIN_CALL_DURATION minutes."""

    if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    calls = future[CALL_LOGS_SOURCE].interval_from(
        ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
    )
    long_calls = calls.filter(
        by="duration_minutes", condition=lambda v: v > MIN_CALL_DURATION
    )

    if len(long_calls) == 0:
        return np.array([0], dtype=np.float32)

    # Timestamps are interpreted as epoch seconds, so days are UTC calendar days.
    days = pd.to_datetime(long_calls.timestamps, unit="s").normalize()
    unique_days = days.unique()

    return np.array([len(unique_days)], dtype=np.float32)

Step-by-Step Breakdown

① Validate the training window

Python

if has_incomplete_training_window(ctx, timedelta(days=TARGET_WINDOW_DAYS)):
    return None

If fewer than 30 days of data follow the split, the entity is excluded, so every label is computed over a full 30-day window.
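Conceptually, the check asks whether the data extends far enough past the split to cover the whole target window. The sketch below illustrates that idea with plain datetimes. `window_is_incomplete` and `data_end` are hypothetical names, and the real `has_incomplete_training_window` may be implemented differently.

```python
from datetime import datetime, timedelta, timezone


def window_is_incomplete(split: datetime, data_end: datetime, window: timedelta) -> bool:
    """Return True when the data ends before the target window does.

    Hypothetical illustration; not the monad implementation.
    """
    return split + window > data_end


data_end = datetime(2024, 6, 30, tzinfo=timezone.utc)
window = timedelta(days=30)

# Split on 1 May: the window ends on 31 May, before the data ends, so it is complete.
early_split = datetime(2024, 5, 1, tzinfo=timezone.utc)
# Split on 15 June: the window would end on 15 July, past the data, so it is incomplete.
late_split = datetime(2024, 6, 15, tzinfo=timezone.utc)

print(window_is_incomplete(early_split, data_end, window))
print(window_is_incomplete(late_split, data_end, window))
```

Without such a check, entities with a late split would receive systematically lower counts simply because part of their window has no data yet.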

② Filter to long calls in the target window

Python

calls = future[CALL_LOGS_SOURCE].interval_from(
    ctx[SPLIT_TIMESTAMP], timedelta(days=TARGET_WINDOW_DAYS)
)
long_calls = calls.filter(
    by="duration_minutes", condition=lambda v: v > MIN_CALL_DURATION
)

First, call events are trimmed to the 30-day window. Then the .filter() method retains only calls strictly longer than 20 minutes (v > MIN_CALL_DURATION), so a call of exactly 20 minutes does not count. This two-step approach keeps the logic clear and composable.
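To make the boundary behaviour concrete, the sketch below applies the same condition to a made-up array of durations using a NumPy boolean mask. It does not use the monad `.filter()` API; it only mirrors the comparison, and the data is invented for illustration.

```python
import numpy as np

MIN_CALL_DURATION = 20  # minutes, same value as in the example above

# Made-up call durations in minutes, for illustration only.
durations = np.array([3.5, 20.0, 20.5, 45.0, 12.0])

# Strict comparison, as in the target function: exactly 20 minutes is excluded.
strictly_longer = durations[durations > MIN_CALL_DURATION]
# Inclusive alternative: exactly 20 minutes is kept.
at_least = durations[durations >= MIN_CALL_DURATION]

print(strictly_longer)
print(at_least)
```

If your definition of a long call is "20 minutes or more", switch the lambda to `v >= MIN_CALL_DURATION` and keep the docstring in sync.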

③ Handle zero long calls

Python

if len(long_calls) == 0:
    return np.array([0], dtype=np.float32)

Subscribers with no long calls in the window receive a target of 0 rather than being excluded. This is intentional — zero-day counts are valid and informative regression targets.

④ Count unique calendar days

Python

days = pd.to_datetime(long_calls.timestamps, unit="s").normalize()
unique_days = days.unique()
return np.array([len(unique_days)], dtype=np.float32)

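Timestamps are converted to pandas datetimes and normalised to midnight, which removes the time component. .unique() then returns the distinct calendar days, and len() counts them. Because the timestamps are parsed as epoch seconds, day boundaries fall at midnight UTC; if subscribers' local days matter, convert to the local time zone before normalising. A subscriber who made 5 long calls on the same day contributes only 1 to the count: the target measures how many days had long calls, not how many long calls were made in total.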
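The day-counting step can be tried in isolation with pandas alone. The sketch below assumes epoch-second timestamps, as the `unit="s"` argument implies. The `count_unique_days` and `epoch` helpers and the `tz` parameter are hypothetical additions for illustration, not part of monad.

```python
import pandas as pd


def count_unique_days(epoch_seconds, tz=None):
    """Count distinct calendar days among epoch-second timestamps.

    With tz=None, days are UTC days, matching the target function above.
    The tz parameter is a hypothetical extension for local-day counting.
    """
    stamps = pd.to_datetime(list(epoch_seconds), unit="s", utc=True)
    if tz is not None:
        stamps = stamps.tz_convert(tz)
    return len(stamps.normalize().unique())


def epoch(text):
    """Epoch seconds for a UTC wall-clock string (illustration helper)."""
    return int(pd.Timestamp(text, tz="UTC").timestamp())


same_day = [epoch("2024-03-01 09:00"), epoch("2024-03-01 13:00"), epoch("2024-03-01 21:00")]
across_midnight = [epoch("2024-03-01 23:30"), epoch("2024-03-02 00:30")]

print(count_unique_days(same_day))                                # three calls, one day
print(count_unique_days(across_midnight))                         # two UTC days
print(count_unique_days(across_midnight, tz="America/New_York"))  # one local day
```

The last two calls show why the time zone matters: the same pair of evening calls spans two UTC days but falls on a single local day in New York.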

Training

Once the target function is defined, fine-tune a downstream model:

Python

from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, RegressionTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=RegressionTask(num_targets=1),
    target_fn=days_with_long_calls_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
    metric_to_monitor="val_mae_0",
    metric_monitoring_mode=MetricMonitoringMode.MIN,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python

from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="mae", metric_name="MeanAbsoluteError"),
        MetricParams(alias="mse", metric_name="MeanSquaredError"),
        MetricParams(alias="r2", metric_name="R2Score"),
    ],
)

results = module.test(testing_params)

Prediction

Python

from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Recommended Metrics

Metric	Why it matters
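MAE	Average absolute error, in days. Intuitive and less sensitive to outliers than squared-error metrics.
RMSE	Square root of MSE. Penalises large errors more heavily than MAE and is expressed in days.
R²	Proportion of variance explained by the model.
MAPE	Percentage-based error, useful for comparing across scales. Undefined when the true count is 0, which this target allows, so compute it on non-zero targets only or omit it.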
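For reference, the four metrics can be computed by hand with NumPy. The sketch below uses made-up true and predicted day counts and is independent of the metric implementations monad uses internally. It masks out zero targets before computing MAPE, which is one possible convention rather than a prescribed one.

```python
import numpy as np

# Made-up true and predicted day counts, for illustration only.
y_true = np.array([0.0, 3.0, 10.0, 5.0])
y_pred = np.array([1.0, 2.0, 8.0, 5.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# MAPE divides by the true value, so zero-day subscribers are masked out here.
nonzero = y_true != 0
mape = np.mean(np.abs(errors[nonzero]) / y_true[nonzero])

print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f} MAPE(nonzero)={mape:.3f}")
```

Because many subscribers may have zero long-call days, MAE and RMSE are usually the safer headline metrics for this target.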

Production Tips

Tune the duration threshold for your market. 20 minutes is a reasonable default, but call behavior varies by region and demographic. Analyze your call duration distribution to find a meaningful threshold.
Distinguish between call types. If your data includes call categories (personal, business, conference), consider filtering to specific types for more targeted predictions.
Use predictions for plan recommendations. Subscribers predicted to have many long-call days may benefit from unlimited voice plans, while those predicted to have few may prefer data-heavy plans.
Normalise by plan type. Unlimited-plan subscribers naturally have more long calls. Consider training separate models or adding plan type as a feature.
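The first tip above, tuning the duration threshold, can start from the percentiles of your call-duration distribution. The sketch below uses synthetic, seeded log-normal durations as a stand-in for real call logs; the distribution parameters are arbitrary assumptions, not telecom benchmarks.

```python
import numpy as np

# Synthetic durations in minutes (log-normal, seeded); a stand-in for real call logs.
rng = np.random.default_rng(0)
durations = rng.lognormal(mean=1.5, sigma=1.0, size=10_000)

percentiles = [50, 75, 90, 95]
cutoffs = np.percentile(durations, percentiles)

# For each candidate threshold, report the share of calls that would count as long.
for p, cutoff in zip(percentiles, cutoffs):
    share_long = np.mean(durations > cutoff)
    print(f"p{p}: {cutoff:.1f} min, {share_long:.1%} of calls are longer")
```

A threshold near the 90th or 95th percentile keeps "long call" a rare event, while a lower one produces a denser target. Which is preferable depends on how the predictions will be used.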