Sensor Offline Detection
Task type: BinaryClassificationTask
Industry: IoT / Manufacturing
Unplanned sensor downtime in manufacturing environments leads to blind spots in process monitoring, delayed fault detection, and potential safety incidents. By predicting which sensors are likely to experience extended offline periods, maintenance teams can schedule proactive replacements or firmware updates before critical monitoring gaps occur.
What makes this advanced? Sorted event gap detection — the target function sorts sensor signal events by timestamp, then iterates through consecutive pairs checking both the status value and the time gap between events to identify sustained offline periods.
Prerequisites
Before writing a target function you need:
- A trained foundation model built on event data that includes the relevant data sources.
- The monad library installed in your environment.
- Data source(s): sensor_signal
Target Function
The target function tells monad how to label each entity for training. It receives four arguments:
| Argument | Type | Description |
|---|---|---|
| history | Events | All events before the temporal split. |
| future | Events | All events after the temporal split. |
| attributes | Attributes | Static entity attributes. |
| ctx | Dict | Context dictionary containing SPLIT_TIMESTAMP, data mode, etc. |
The function must return one of:
- np.array([1], dtype=np.float32): positive case
- np.array([0], dtype=np.float32): negative case
- None: exclude this entity from training
Full Example
import numpy as np
from datetime import timedelta
from typing import Dict

from monad.ui.target_function import Events, Attributes
from monad.ui.target_function import SPLIT_TIMESTAMP
from monad.ui.target_function import has_incomplete_training_window

# === Configuration ===
MAX_OFFLINE_HOURS = 12
TARGET_WINDOW_DAYS = 30
SENSOR_DATA_SOURCE = "sensor_signal"
STATUS_COLUMN = "status"


def sensor_offline_target_fn(
    history: Events,
    future: Events,
    attributes: Attributes,
    ctx: Dict,
) -> np.ndarray | None:
    """Predict if sensor stays offline for 12+ continuous hours."""
    max_offline = timedelta(hours=MAX_OFFLINE_HOURS).total_seconds()
    split_ts = ctx[SPLIT_TIMESTAMP]

    if has_incomplete_training_window(ctx, required_length=timedelta(days=TARGET_WINDOW_DAYS)):
        return None

    # 1. Trim future signals to the target window
    future = future.interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS - 1, hours=12))
    signals = future[SENSOR_DATA_SOURCE]

    # 2. No signals at all = assume offline
    if signals.count() == 0:
        return np.array([1], dtype=np.float32)

    # 3. Sort events by timestamp with their status
    timestamps = signals.timestamps
    status = signals[STATUS_COLUMN]
    events = sorted(zip(timestamps, status))

    # 4. Check for offline gaps exceeding threshold
    for i in range(len(events) - 1):
        ts1, st1 = events[i]
        ts2, _ = events[i + 1]
        if st1 == "offline" and (ts2 - ts1) >= max_offline:
            return np.array([1], dtype=np.float32)

    return np.array([0], dtype=np.float32)
Step-by-Step Breakdown
① Trim future signals to the target window
future = future.interval_from(split_ts, timedelta(days=TARGET_WINDOW_DAYS - 1, hours=12))
signals = future[SENSOR_DATA_SOURCE]
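The future events are trimmed to a window starting at the split timestamp. Note that the interval passed to interval_from is 29 days and 12 hours (TARGET_WINDOW_DAYS - 1 days plus 12 hours), slightly shorter than the 30 days required by has_incomplete_training_window. Trimming ensures the label reflects only near-term sensor behavior.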
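As a quick check of the window arithmetic, the sketch below evaluates the same timedelta expression using only the standard library. It says nothing about how interval_from treats its boundaries, which is not covered here.

```python
from datetime import timedelta

TARGET_WINDOW_DAYS = 30

# Same expression as the one passed to interval_from in step 1.
window = timedelta(days=TARGET_WINDOW_DAYS - 1, hours=12)

# Length of the target window in hours, and how far it falls
# short of a full TARGET_WINDOW_DAYS period.
window_hours = window.total_seconds() / 3600
shortfall = timedelta(days=TARGET_WINDOW_DAYS) - window

print(window_hours)  # 708.0
print(shortfall)     # 12:00:00
```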
② Handle no-signal case
If no signals are received at all during the target window, the sensor is assumed to be offline for the entire period — a clear positive case.
③ Sort events by timestamp with their status
timestamps = signals.timestamps
status = signals[STATUS_COLUMN]
events = sorted(zip(timestamps, status))
Events are paired with their status values and sorted chronologically. This is necessary because events may not arrive in strict timestamp order across distributed IoT systems.
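A minimal standalone illustration of the sort, using plain Python lists in place of monad's event objects. The sample values are made up, and representing timestamps as Unix seconds is an assumption of this sketch.

```python
# Hypothetical out-of-order arrivals: timestamps in Unix seconds.
timestamps = [1_700_003_600, 1_700_000_000, 1_700_007_200]
status = ["online", "offline", "offline"]

# zip pairs each timestamp with its status; sorted orders the pairs by
# timestamp first and compares status only when timestamps are equal.
events = sorted(zip(timestamps, status))

for ts, st in events:
    print(ts, st)
```

After sorting, the earliest event (the "offline" reading that arrived second) comes first, so consecutive pairs in the list correspond to consecutive moments in time.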
④ Iterate and detect offline gaps
for i in range(len(events) - 1):
    ts1, st1 = events[i]
    ts2, _ = events[i + 1]
    if st1 == "offline" and (ts2 - ts1) >= max_offline:
        return np.array([1], dtype=np.float32)
The function walks through consecutive event pairs. When an event has status "offline" and the next event arrives 12+ hours later, the sensor experienced a sustained offline period. Only the first event's status matters — the gap represents the duration the sensor stayed in that state.
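The pair-walking logic can be tried in isolation. The sketch below is a monad-free version that assumes timestamps are numeric seconds and that events are already sorted; the helper name has_sustained_offline_gap and the sample data are hypothetical.

```python
from datetime import timedelta

MAX_OFFLINE_SECONDS = timedelta(hours=12).total_seconds()


def has_sustained_offline_gap(events, max_offline=MAX_OFFLINE_SECONDS):
    """Return True if an "offline" event is followed by a gap >= max_offline.

    `events` is a chronologically sorted list of (timestamp_seconds, status).
    """
    # zip(events, events[1:]) yields each consecutive pair exactly once.
    for (ts1, st1), (ts2, _) in zip(events, events[1:]):
        if st1 == "offline" and (ts2 - ts1) >= max_offline:
            return True
    return False


HOUR = 3600
# Offline at t=0, next signal 13 hours later: sustained offline period.
long_gap = [(0, "offline"), (13 * HOUR, "online")]
# Same 13-hour gap, but the sensor was online at its start: not counted.
online_gap = [(0, "online"), (13 * HOUR, "offline")]
# Offline, but the next signal arrives after only 2 hours.
short_gap = [(0, "offline"), (2 * HOUR, "online")]

print(has_sustained_offline_gap(long_gap))    # True
print(has_sustained_offline_gap(online_gap))  # False
print(has_sustained_offline_gap(short_gap))   # False
```

One consequence of comparing only consecutive pairs: a final "offline" event with no later event in the window is never measured against anything, so on its own it does not produce a positive label.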
Training
Once the target function is defined, fine-tune a downstream model:
from pathlib import Path

from monad.ui.config import TrainingParams, MetricParams, MetricMonitoringMode
from monad.config.early_stopping import EarlyStopping
from monad.ui.module import load_from_foundation_model, BinaryClassificationTask

module = load_from_foundation_model(
    checkpoint_path=Path("./foundation_model"),
    downstream_task=BinaryClassificationTask(),
    target_fn=sensor_offline_target_fn,
)

training_params = TrainingParams(
    checkpoint_dir=Path("./<this_model>"),
    learning_rate=1e-4,
    epochs=20,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC", kwargs={"task": "binary"}),
        MetricParams(alias="auprc", metric_name="AveragePrecision", kwargs={"task": "binary"}),
        MetricParams(alias="recall", metric_name="Recall", kwargs={"task": "binary"}),
        MetricParams(alias="precision", metric_name="Precision", kwargs={"task": "binary"}),
    ],
    metric_to_monitor="val_auroc_0",
    metric_monitoring_mode=MetricMonitoringMode.MAX,
    early_stopping=EarlyStopping(min_delta=1e-4, patience=5),
)

module.fit(training_params, seed=42)
Evaluation
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, MetricParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    prediction_date=datetime(2024, 5, 1, tzinfo=timezone.utc),
    output_type=OutputType.DECODED,
    devices=[0],
    metrics=[
        MetricParams(alias="auroc", metric_name="AUROC"),
        MetricParams(alias="auprc", metric_name="AveragePrecision"),
        MetricParams(alias="recall", metric_name="Recall"),
    ],
)

results = module.test(testing_params)
Prediction
from pathlib import Path
from datetime import datetime, timezone

from monad.ui.module import load_from_checkpoint
from monad.ui.config import TestingParams, OutputType

module = load_from_checkpoint(Path("./<this_model>"))

testing_params = TestingParams(
    local_save_location=Path("./predictions.tsv"),
    output_type=OutputType.DECODED,
    prediction_date=datetime(2024, 6, 1, tzinfo=timezone.utc),
    devices=[0],
)

predictions = module.predict(testing_params)
Recommended Metrics
| Metric | Why it matters |
|---|---|
| AUROC | Measures overall ranking quality. |
| AUPRC | More informative when the positive class is rare. |
| Recall | Proportion of actual positives caught. |
| Precision | Proportion of predicted positives that are correct. |
| F1 Score | Harmonic mean of precision and recall. |
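To make the last three rows concrete, here is a small sketch that computes precision, recall, and F1 from confusion-matrix counts in plain Python. The function name and the counts are hypothetical and independent of how monad computes its metrics.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 40 sensors correctly flagged as going offline,
# 10 false alarms, and 20 offline sensors the model missed.
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.667 f1=0.727
```

In a maintenance setting, a missed offline sensor (false negative) and an unnecessary site visit (false positive) usually carry different costs, which is why recall and precision are worth tracking separately rather than through F1 alone.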
Production Tips
- Calibrate the offline threshold to your SLA. 12 hours is a reasonable default, but critical sensors in safety-relevant processes may need a much shorter threshold (e.g., 1-2 hours).
- Account for expected maintenance windows. Scheduled downtime should not be labeled as unexpected offline events. Filter out known maintenance periods before labeling, or add a maintenance calendar as an attribute.
- Handle clock drift in edge devices. IoT sensors often have imprecise clocks. Ensure timestamps are NTP-synchronised or add a tolerance margin to the gap threshold.
- Monitor class balance across sensor types. Battery-powered sensors naturally go offline more often than wired ones. Consider training separate models or adding sensor type as a feature.
- Validate with incident logs. Cross-reference predicted offline events against actual maintenance tickets to ensure the model is capturing genuine failures, not just signal noise.
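The first three tips can be combined in the labeling logic. The following is only a sketch under stated assumptions: timestamps are numeric seconds, maintenance windows do not overlap each other, and the names THRESHOLDS, CLOCK_TOLERANCE, overlap_seconds, and is_unplanned_offline, along with all values, are hypothetical rather than part of monad.

```python
HOUR = 3600

# Hypothetical per-criticality offline thresholds, in seconds.
THRESHOLDS = {"critical": 2 * HOUR, "standard": 12 * HOUR}
# Hypothetical tolerance margin for clock drift, in seconds.
CLOCK_TOLERANCE = 5 * 60


def overlap_seconds(start, end, windows):
    """Seconds of [start, end] covered by non-overlapping (w_start, w_end) windows."""
    total = 0
    for w_start, w_end in windows:
        total += max(0, min(end, w_end) - max(start, w_start))
    return total


def is_unplanned_offline(gap_start, gap_end, criticality, maintenance_windows):
    """True if the gap, minus scheduled maintenance, reaches the threshold."""
    planned = overlap_seconds(gap_start, gap_end, maintenance_windows)
    unplanned = (gap_end - gap_start) - planned
    return unplanned >= THRESHOLDS[criticality] + CLOCK_TOLERANCE


# A 14-hour offline gap that includes 4 hours of scheduled maintenance.
windows = [(1 * HOUR, 5 * HOUR)]
print(is_unplanned_offline(0, 14 * HOUR, "standard", windows))  # False
print(is_unplanned_offline(0, 14 * HOUR, "critical", windows))  # True
print(is_unplanned_offline(0, 14 * HOUR, "standard", []))       # True
```

Subtracting planned downtime from the gap, rather than discarding any gap that touches a maintenance window, keeps long outages that merely begin during maintenance from being silently dropped.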