Sensor Offline Detection

Task type: BinaryClassificationTask
Industry: IoT / Manufacturing

Unplanned sensor downtime in manufacturing environments leads to blind spots in process monitoring, delayed fault detection, and potential safety incidents. By predicting which sensors are likely to experience extended offline periods, maintenance teams can schedule proactive replacements or firmware updates before critical monitoring gaps occur.

What makes this advanced? Sorted event gap detection — the target function sorts sensor signal events by timestamp, then iterates through consecutive pairs checking both the status value and the time gap between events to identify sustained offline periods.


Prerequisites

Before writing a target function you need:

  • A trained foundation model built on event data that includes the relevant data sources.
  • The monad library installed in your environment.
  • Data source(s): sensor_signal

Target Function

The target function tells monad how to label each entity for training. It receives four arguments:

Argument     Type         Description
history      Events       All events before the temporal split.
future       Events       All events after the temporal split.
attributes   Attributes   Static entity attributes.
ctx          Dict         Context dictionary containing SPLIT_TIMESTAMP, data mode, etc.

The function must return one of:

  • np.array([1], dtype=np.float32): positive case
  • np.array([0], dtype=np.float32): negative case
  • None: exclude this entity from training

Full Example

Python
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window


# === Configuration ===
MAX_OFFLINE_HOURS = 12
TARGET_WINDOW_DAYS = 30
SENSOR_DATA_SOURCE = "sensor_signal"
STATUS_COLUMN = "status"

def sensor_offline_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict if sensor stays offline for 12+ continuous hours."""

    max_offline = timedelta(hours=MAX_OFFLINE_HOURS).total_seconds()
    split_ts = ctx[SPLIT_TIMESTAMP]

    if has_incomplete_training_window(ctx, required_length=timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    # 1. Trim future signals to the target window
    future = future.interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS - 1, hours=12))
    signals = future[SENSOR_DATA_SOURCE]

    # 2. No signals at all = assume offline
    if signals.count() == 0:
        return np.array([1], dtype=np.float32)

    # 3. Sort events by timestamp with their status
    timestamps = signals.timestamps
    status = signals[STATUS_COLUMN]
    events = sorted(zip(timestamps, status))

    # 4. Check for offline gaps exceeding threshold
    for i in range(len(events) - 1):
        ts1, st1 = events[i]
        ts2, _ = events[i + 1]
        if st1 == "offline" and (ts2 - ts1) >= max_offline:
            return np.array([1], dtype=np.float32)

    return np.array([0], dtype=np.float32)

Step-by-Step Breakdown

① Trim future signals to the target window

Python
future = future.interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS - 1, hours=12))
signals = future[SENSOR_DATA_SOURCE]

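The future events are trimmed to the target window starting from the split timestamp. Note that the interval length passed here is TARGET_WINDOW_DAYS - 1 days plus 12 hours (29 days 12 hours), slightly shorter than the nominal 30 days. Trimming ensures that the label reflects only near-term sensor behavior.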
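To make the window arithmetic concrete, here is a minimal sketch in plain Python, independent of monad. It assumes the split timestamp is a timezone-aware datetime and that the interval is half-open (start inclusive, end exclusive). Both are assumptions for illustration, and window_bounds and in_window are hypothetical helpers, not monad functions.

```python
from datetime import datetime, timedelta, timezone

TARGET_WINDOW_DAYS = 30


def window_bounds(split_ts: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) using the same interval length as the target function."""
    length = timedelta(days=TARGET_WINDOW_DAYS - 1, hours=12)  # 29 days 12 hours
    return split_ts, split_ts + length


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """Assumed half-open interval: start <= ts < end."""
    return start <= ts < end


split = datetime(2024, 5, 1, tzinfo=timezone.utc)
start, end = window_bounds(split)
print(end)  # 2024-05-30 12:00:00+00:00
```

How interval_from treats its boundaries is not stated here, so verify the inclusive or exclusive behavior against your monad version before relying on events that fall exactly on the window edge.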

② Handle no-signal case

Python
if signals.count() == 0:
    return np.array([1], dtype=np.float32)

If no signals are received at all during the target window, the sensor is assumed to be offline for the entire period — a clear positive case.

③ Sort events by timestamp with their status

Python
timestamps = signals.timestamps
status = signals[STATUS_COLUMN]
events = sorted(zip(timestamps, status))

Events are paired with their status values and sorted chronologically. This is necessary because events may not arrive in strict timestamp order across distributed IoT systems.
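One detail worth knowing: sorting the zipped tuples directly orders by timestamp first and breaks ties by comparing the status strings. The sketch below uses made-up epoch-second timestamps to show this, next to a keyed sort that keeps arrival order for equal timestamps. Whether ties occur in your data, and which order is preferable, is something to check against your own event stream.

```python
# Illustrative data only: epoch-second timestamps arriving out of order.
timestamps = [300.0, 100.0, 200.0, 100.0]
status = ["online", "online", "offline", "offline"]

# Same call as in the target function: ties on timestamp fall back to
# comparing the status strings ("offline" sorts before "online").
by_tuple = sorted(zip(timestamps, status))

# Alternative: sort on the timestamp only. Python's sort is stable, so
# events with equal timestamps keep their original arrival order.
by_time = sorted(zip(timestamps, status), key=lambda event: event[0])

print(by_tuple[:2])  # [(100.0, 'offline'), (100.0, 'online')]
print(by_time[:2])   # [(100.0, 'online'), (100.0, 'offline')]
```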

④ Iterate and detect offline gaps

Python
for i in range(len(events) - 1):
    ts1, st1 = events[i]
    ts2, _ = events[i + 1]
    if st1 == "offline" and (ts2 - ts1) >= max_offline:
        return np.array([1], dtype=np.float32)

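The function walks through consecutive event pairs. When an event has status "offline" and the next event arrives 12 or more hours later, the sensor experienced a sustained offline period. Only the first event's status matters, because the gap represents how long the sensor stayed in that state. Two caveats follow from the code: max_offline is expressed in seconds, so the comparison assumes that timestamp differences are numeric seconds; and because only consecutive pairs are compared, a final "offline" event with no later signal in the window does not produce a positive label.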
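The pair-wise scan can be tried outside monad with plain tuples. The sketch below assumes timestamps are epoch seconds (an assumption; check what signals.timestamps returns in your setup) and uses a hypothetical helper, has_sustained_offline. The optional window_end argument is an extension that is not part of the target function: it also counts a final "offline" event that is followed by no further signal before the window closes.

```python
from typing import Optional, Sequence, Tuple

Event = Tuple[float, str]  # (epoch seconds, status)


def has_sustained_offline(
    events: Sequence[Event],
    max_offline_seconds: float,
    window_end: Optional[float] = None,
) -> bool:
    """Return True if an "offline" event is followed by a long enough gap."""
    ordered = sorted(events, key=lambda event: event[0])

    # Consecutive-pair scan, mirroring the loop in the target function.
    for (ts1, st1), (ts2, _) in zip(ordered, ordered[1:]):
        if st1 == "offline" and (ts2 - ts1) >= max_offline_seconds:
            return True

    # Optional extension (not in the original example): treat the end of the
    # window as the close of a trailing offline period.
    if window_end is not None and ordered:
        last_ts, last_status = ordered[-1]
        if last_status == "offline" and (window_end - last_ts) >= max_offline_seconds:
            return True

    return False


HOUR = 3600.0
events = [(0 * HOUR, "online"), (2 * HOUR, "offline"), (15 * HOUR, "online")]
print(has_sustained_offline(events, 12 * HOUR))  # True: 13-hour offline gap
```

Leaving window_end as None reproduces the behavior of the loop in the target function; passing the window's end timestamp is a design choice you would need to validate against how your sensors report status.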


Training

Once the target function is defined, fine-tune a downstream model:

Python
from pathlib import Path
from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping

from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=sensor_offline_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)

Evaluation

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="recall", metric_name="Recall"),
    ],
)

results = module.test(testing_params)

Prediction

Python
from pathlib import Path
from datetime import datetime, timezone
from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)

Metric      Why it matters
AUROC       Measures overall ranking quality.
AUPRC       More informative when the positive class is rare.
Recall      Proportion of actual positives caught.
Precision   Proportion of predicted positives that are correct.
F1 Score    Harmonic mean of precision and recall.
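As a quick reference for how the threshold-based metrics in the table relate to each other, the following sketch computes precision, recall, and F1 from binary labels in plain Python. The labels are made up for illustration, and precision_recall_f1 is a hypothetical helper rather than part of monad.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Made-up labels: 4 sensors truly went offline, the model flagged 4 sensors.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```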

Production Tips

  1. Calibrate the offline threshold to your SLA. 12 hours is a reasonable default, but critical sensors in safety-relevant processes may need a much shorter threshold (e.g., 1-2 hours).
  2. Account for expected maintenance windows. Scheduled downtime should not be labeled as unexpected offline events. Filter out known maintenance periods before labeling, or add a maintenance calendar as an attribute.
  3. Handle clock drift in edge devices. IoT sensors often have imprecise clocks. Ensure timestamps are NTP-synchronised or add a tolerance margin to the gap threshold.
  4. Monitor class balance across sensor types. Battery-powered sensors naturally go offline more often than wired ones. Consider training separate models or adding sensor type as a feature.
  5. Validate with incident logs. Cross-reference predicted offline events against actual maintenance tickets to ensure the model is capturing genuine failures, not just signal noise.
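Tips 2 and 3 can be combined into a small labeling helper. The sketch below is one possible approach, not part of monad: it subtracts any overlap with known maintenance windows from an offline gap and requires the remainder to reach the threshold plus a clock-drift tolerance. All names are hypothetical, timestamps are assumed to be epoch seconds, and maintenance windows are assumed not to overlap each other.

```python
from typing import Sequence, Tuple

Window = Tuple[float, float]  # (start, end) in epoch seconds


def unplanned_offline_seconds(
    gap_start: float,
    gap_end: float,
    maintenance_windows: Sequence[Window],
) -> float:
    """Length of the gap after removing time covered by planned maintenance."""
    planned = 0.0
    for m_start, m_end in maintenance_windows:
        overlap = min(gap_end, m_end) - max(gap_start, m_start)
        if overlap > 0:
            planned += overlap
    return (gap_end - gap_start) - planned


def is_unplanned_outage(
    gap_start: float,
    gap_end: float,
    maintenance_windows: Sequence[Window],
    threshold_seconds: float,
    drift_tolerance_seconds: float = 0.0,
) -> bool:
    """True if the unplanned part of the gap reaches threshold plus tolerance."""
    unplanned = unplanned_offline_seconds(gap_start, gap_end, maintenance_windows)
    return unplanned >= threshold_seconds + drift_tolerance_seconds


HOUR = 3600.0
# A 13-hour gap counts on its own, but not once 4 hours of maintenance are removed.
print(is_unplanned_outage(0.0, 13 * HOUR, [], 12 * HOUR))                      # True
print(is_unplanned_outage(0.0, 13 * HOUR, [(1 * HOUR, 5 * HOUR)], 12 * HOUR))  # False
```

A check like this would replace the plain gap comparison inside the target function's loop; where the maintenance calendar comes from (an entity attribute or an external table) depends on your data model.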