An Alert Nobody Reads Is Worse Than No Alert At All
We audited a plant's Monitor configuration last year. They had 847 active alert rules. Eight hundred and forty-seven.
On a typical Monday morning, operators received 200+ alerts before lunch. Know how many they investigated? Three. Maybe four.
The rest? Dismissed. In bulk. Without reading them.
"I get so many alerts that I stopped looking. If something is really wrong, someone will call me."
That is alert fatigue. And it is the single most common failure mode of IoT monitoring deployments. Not bad sensors. Not wrong data. Too many alerts that cry wolf until nobody listens.
This post is about building an alerting system people trust. One that fires when it matters, stays quiet when it does not, and -- when it does fire -- drives immediate action all the way to a work order.
Who this is for: Maximo administrators configuring alert rules, maintenance managers designing escalation procedures, reliability engineers tuning thresholds, and operations teams drowning in notifications they have learned to ignore.
The Alert Processing Pipeline
Every alert in Monitor follows this path:
Data Stream ──► Rule Evaluation ──► Alert Generation ──► Notification ──► Action
                      │                   │                   │              │
                 Thresholds           Severity            Email/SMS     Work Order
                 Patterns             Grouping            Webhook       Automation
                 Anomalies            Correlation         Dashboard     Escalation

Alert Components
Every alert has six parts. Miss one and your alerting system has a gap.
Component — What It Is — Why It Matters
Trigger — The condition that fires the alert — Wrong trigger = false positives
Severity — Critical / High / Medium / Low — Drives response urgency
Message — What the operator reads — Must be actionable in 5 seconds
Context — Related data and metadata — Enables fast diagnosis
Action — What happens next — Without action, alerts are noise
Lifecycle — New -> Acknowledged -> Resolved — Tracks accountability
Alert Rule Types
Basic Threshold Alerts
The simplest and most common. A metric exceeds a value.
{
"name": "High Bearing Temperature",
"deviceType": "Motor",
"condition": {
"type": "threshold",
"metric": "bearing_temp",
"operator": "greaterThan",
"value": 85
},
"severity": "high",
"message": "Bearing temp ${value} C exceeds 85 C limit on ${deviceId}"
}

Duration-Based Alerts
Only fire if the condition persists. This single feature eliminates more false positives than any other.
{
"name": "Sustained High Temperature",
"deviceType": "Motor",
"condition": {
"type": "duration",
"metric": "bearing_temp",
"operator": "greaterThan",
"value": 75,
"duration": "10m"
},
"severity": "high",
"message": "Bearing temp has exceeded 75 C for 10+ minutes on ${deviceId}"
}

A temperature spike to 80 C for 30 seconds during startup? Not an alert. Sustained 80 C for 10 minutes under normal load? Absolutely an alert.
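Under the hood, duration-based evaluation simply asks whether every reading in the trailing window breached the threshold. A minimal sketch of that logic (the function and data shapes are illustrative, not Monitor's internal API):

```python
from datetime import datetime, timedelta

def sustained_breach(readings, threshold, window_minutes):
    """Return True only if every reading in the trailing window exceeds
    the threshold -- a single spike never fires.
    `readings` is a list of (timestamp, value) tuples, oldest first."""
    if not readings:
        return False
    cutoff = readings[-1][0] - timedelta(minutes=window_minutes)
    if readings[0][0] > cutoff:
        return False  # not enough history to cover the full window yet
    window = [(t, v) for t, v in readings if t >= cutoff]
    return all(v > threshold for _, v in window)

# A 30-second spike at startup does not fire; sustained heat does.
base = datetime(2026, 3, 1, 8, 0)
spike = [(base + timedelta(minutes=m), 80 if m == 5 else 60) for m in range(12)]
hot   = [(base + timedelta(minutes=m), 80) for m in range(12)]
print(sustained_breach(spike, 75, 10))  # False
print(sustained_breach(hot, 75, 10))    # True
```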
Rate of Change Alerts
Detect rapid changes that indicate something went wrong, even if the absolute value is not alarming yet.
{
"name": "Rapid Temperature Rise",
"deviceType": "Motor",
"condition": {
"type": "rateOfChange",
"metric": "bearing_temp",
"rate": 5,
"timeWindow": "5m",
"direction": "increasing"
},
"severity": "critical",
"message": "Temperature rising ${rate} C/min on ${deviceId} - potential runaway"
}

Compound Alerts
Combine multiple conditions. This is where you catch specific failure signatures.
{
"name": "Bearing Failure Pattern",
"deviceType": "Motor",
"condition": {
"type": "compound",
"operator": "AND",
"conditions": [
{"metric": "vibration_rms", "operator": "greaterThan", "value": 4.5},
{"metric": "bearing_temp", "operator": "greaterThan", "value": 70},
{"metric": "current_draw", "operator": "greaterThan", "value": 15}
]
},
"severity": "critical",
"message": "Bearing failure pattern: vibration ${vibration_rms} mm/s, temp ${bearing_temp} C, current ${current_draw} A on ${deviceId}"
}

High vibration alone? Could be a loose mounting bolt. High temperature alone? Could be ambient heat. High current alone? Could be a heavy load. All three together? That is a bearing on its way out.
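Evaluating a compound rule is just a boolean fold over its conditions. A sketch of how an AND/OR compound like the one above could be evaluated (the rule schema mirrors the JSON; the evaluator itself is illustrative):

```python
import operator

OPS = {
    "greaterThan": operator.gt,
    "lessThan": operator.lt,
    "equals": operator.eq,
}

def evaluate_compound(rule, telemetry):
    """`telemetry` is a dict of the latest metric values for one device."""
    results = (
        OPS[c["operator"]](telemetry[c["metric"]], c["value"])
        for c in rule["conditions"]
    )
    return all(results) if rule["operator"] == "AND" else any(results)

rule = {
    "operator": "AND",
    "conditions": [
        {"metric": "vibration_rms", "operator": "greaterThan", "value": 4.5},
        {"metric": "bearing_temp", "operator": "greaterThan", "value": 70},
        {"metric": "current_draw", "operator": "greaterThan", "value": 15},
    ],
}
# High vibration alone: no alert. All three together: bearing failure pattern.
print(evaluate_compound(rule, {"vibration_rms": 5.1, "bearing_temp": 65, "current_draw": 12}))  # False
print(evaluate_compound(rule, {"vibration_rms": 5.1, "bearing_temp": 74, "current_draw": 16}))  # True
```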
Anomaly-Based Alerts
Trigger when the anomaly detection engine from Part 5 flags something:
{
"name": "Anomalous Behavior Detected",
"deviceType": "Motor",
"condition": {
"type": "threshold",
"metric": "anomaly_score",
"operator": "greaterThan",
"value": 0.8
},
"severity": "high",
"message": "Anomaly detected (score: ${value}) on ${deviceId} - investigate"
}

Severity Classification
Not all alerts are created equal. Your severity framework determines who gets woken up at 2 AM.
The Severity Matrix
Level — Color — Response Time — Example
Critical — Red — Immediate — Safety hazard, imminent failure, equipment damage
High — Orange — Under 1 hour — Performance degradation, approaching limits
Medium — Yellow — Under 4 hours — Maintenance needed, efficiency loss
Low — Blue — Under 24 hours — Informational, scheduled attention
Prioritization: Severity x Asset Criticality
A high-severity alert on a backup pump is less urgent than a medium-severity alert on the only pump feeding the production line.
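That intuition is just a two-key lookup. A sketch encoding the severity-by-criticality matrix as code (the helper and SLA strings are illustrative, not a Monitor API):

```python
# Rows: alert severity; columns: asset criticality.
PRIORITY_MATRIX = {
    "critical": {"low": "P2", "medium": "P1", "high": "P1"},
    "high":     {"low": "P3", "medium": "P2", "high": "P1"},
    "medium":   {"low": "P4", "medium": "P3", "high": "P2"},
    "low":      {"low": "P5", "medium": "P4", "high": "P3"},
}

RESPONSE_SLA = {"P1": "15 minutes", "P2": "1 hour", "P3": "4 hours",
                "P4": "24 hours", "P5": "next scheduled review"}

def prioritize(severity, asset_criticality):
    """Map (severity, criticality) to a priority and its response SLA."""
    p = PRIORITY_MATRIX[severity][asset_criticality]
    return p, RESPONSE_SLA[p]

# Medium severity on the sole production-line pump outranks
# high severity on a backup pump:
print(prioritize("medium", "high"))  # ('P2', '1 hour')
print(prioritize("high", "low"))     # ('P3', '4 hours')
```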
Asset Criticality
Low Medium High
┌──────────┬──────────┬──────────┐
Critical │ P2 │ P1 │ P1 │
├──────────┼──────────┼──────────┤
High │ P3 │ P2 │ P1 │
Severity ├──────────┼──────────┼──────────┤
Medium │ P4 │ P3 │ P2 │
├──────────┼──────────┼──────────┤
Low │ P5 │ P4 │ P3 │
└──────────┴──────────┴──────────┘
P1 = Respond within 15 minutes
P2 = Respond within 1 hour
P3 = Respond within 4 hours
P4 = Respond within 24 hours
P5 = Next scheduled review

Notification Channels
Email

Good for: medium and low severity, documentation, shift handoff.
{
"channel": "email",
"recipients": ["maintenance-team@company.com"],
"subject": "[${severity}] ${alertName} - ${deviceId}",
"body": "Device: ${deviceId}\nLocation: ${location}\nValue: ${value}\nTime: ${timestamp}\n\nPlease investigate."
}

SMS
Good for: critical and high severity, on-call notifications.
{
"channel": "sms",
"recipients": ["+1234567890"],
"message": "CRITICAL: ${alertName} on ${deviceId}. Value: ${value}. Respond now.",
"severityFilter": ["critical"]
}

Microsoft Teams
{
"channel": "webhook",
"url": "https://outlook.office.com/webhook/...",
"body": {
"@type": "MessageCard",
"themeColor": "${severityColor}",
"summary": "${alertName}",
"sections": [{
"activityTitle": "${alertName}",
"facts": [
{"name": "Device", "value": "${deviceId}"},
{"name": "Severity", "value": "${severity}"},
{"name": "Value", "value": "${value}"}
]
}],
"potentialAction": [{
"@type": "OpenUri",
"name": "View in Monitor",
"targets": [{"os": "default", "uri": "${dashboardUrl}"}]
}]
}
}

PagerDuty
For organizations with on-call rotation:
{
"channel": "webhook",
"url": "https://events.pagerduty.com/v2/enqueue",
"body": {
"routing_key": "${PAGERDUTY_KEY}",
"event_action": "trigger",
"dedup_key": "${alertId}",
"payload": {
"summary": "${alertName} on ${deviceId}",
"severity": "${pagerduty_severity}",
"source": "Maximo Monitor"
}
}
}

Escalation Procedures
Alerts without escalation are alerts without accountability.
Time-Based Escalation
{
"escalationPolicy": {
"levels": [
{
"delayMinutes": 0,
"channel": "email",
"recipients": ["operator-on-duty@company.com"]
},
{
"delayMinutes": 15,
"channel": "sms",
"recipients": ["+1234567890"],
"condition": "not_acknowledged"
},
{
"delayMinutes": 30,
"channel": "email",
"recipients": ["maintenance-supervisor@company.com"],
"condition": "not_acknowledged"
},
{
"delayMinutes": 60,
"channel": "sms",
"recipients": ["+0987654321"],
"condition": "not_resolved"
}
]
}
}

The rule: If nobody acknowledges a critical alert within 15 minutes, it escalates. If nobody resolves it within 60 minutes, it escalates again. No exceptions.
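An escalation policy like the one above can be driven by a small scheduler pass that runs periodically and decides which levels are now due. A sketch of that evaluation (field names follow the JSON; the surrounding engine is hypothetical):

```python
from datetime import datetime, timedelta

def due_escalations(alert, policy, now):
    """Return the escalation levels that should fire now.
    `alert` carries created_at, acknowledged, resolved, and the set of
    level indexes already notified."""
    due = []
    age = now - alert["created_at"]
    for i, level in enumerate(policy["levels"]):
        if i in alert["notified_levels"]:
            continue  # this level already fired
        if age < timedelta(minutes=level["delayMinutes"]):
            continue  # not yet due
        cond = level.get("condition")
        if cond == "not_acknowledged" and alert["acknowledged"]:
            continue
        if cond == "not_resolved" and alert["resolved"]:
            continue
        due.append(level)
    return due

policy = {"levels": [
    {"delayMinutes": 0,  "channel": "email"},
    {"delayMinutes": 15, "channel": "sms",   "condition": "not_acknowledged"},
    {"delayMinutes": 60, "channel": "sms",   "condition": "not_resolved"},
]}
alert = {"created_at": datetime(2026, 3, 1, 2, 0), "acknowledged": False,
         "resolved": False, "notified_levels": {0}}
# 20 minutes in, still unacknowledged: the 15-minute SMS level is due.
print([l["channel"] for l in due_escalations(alert, policy, datetime(2026, 3, 1, 2, 20))])  # ['sms']
```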
Automated Actions
Work Order Creation in Maximo Manage
This is the closed loop. Sensor detects anomaly. Analytics confirm pattern. Alert fires. Work order lands in Manage. Technician gets dispatched.
No human had to copy data between systems. No one had to remember to create a work order. No failure fell through the cracks because someone was busy.
{
"action": "createWorkOrder",
"trigger": "alert_created",
"severityFilter": ["critical", "high"],
"workOrderTemplate": {
"description": "Alert: ${alertName}",
"longDescription": "Automated from Monitor.\n\nDevice: ${deviceId}\nMetric: ${metric}\nValue: ${value}\nThreshold: ${threshold}\nTime: ${timestamp}",
"workType": "CM",
"priority": "${mapSeverityToPriority}",
"assetNum": "${assetId}",
"location": "${location}",
"reportedBy": "MONITOR_SYSTEM",
"classificationId": "IOT_ALERT"
}
}

Equipment Shutdown Commands
For safety-critical conditions, Monitor can send commands back to devices:
{
"action": "sendCommand",
"condition": {
"alertName": "Critical Overtemperature",
"severity": "critical"
},
"command": {
"deviceId": "${deviceId}",
"commandType": "emergency_stop",
"parameters": {
"reason": "Automated safety shutdown - temperature critical",
"operator": "MONITOR_SYSTEM"
}
}
}

Alert Lifecycle Management
The Lifecycle
┌─────────┐    ┌──────────────┐    ┌──────────┐    ┌──────────┐
│   New   │───►│ Acknowledged │───►│ In Work  │───►│ Resolved │
└─────────┘    └──────────────┘    └──────────┘    └──────────┘
     │                                                   │
     └───────────────────────────────────────────────────┘
              (Auto-resolve when condition clears)

Every alert state transition should be tracked. Who acknowledged it? When? What was their assessment? Who resolved it? What was the root cause?
This data is gold for continuous improvement. Without it, you cannot measure MTTA (Mean Time to Acknowledge) or MTTR (Mean Time to Resolve).
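Tracking those transitions needs nothing more than a timestamped, append-only audit trail per alert; MTTA and MTTR then fall straight out of it. A minimal sketch (the data model is illustrative, not Monitor's):

```python
from datetime import datetime

class AlertRecord:
    """Append-only state history: (state, timestamp, user, note)."""
    def __init__(self, alert_id, created_at):
        self.alert_id = alert_id
        self.history = [("New", created_at, "MONITOR_SYSTEM", "")]

    def transition(self, state, when, user, note=""):
        self.history.append((state, when, user, note))

    def _when(self, state):
        return next((t for s, t, _, _ in self.history if s == state), None)

    def time_to_acknowledge(self):
        ack = self._when("Acknowledged")
        return ack - self.history[0][1] if ack else None

    def time_to_resolve(self):
        res = self._when("Resolved")
        return res - self.history[0][1] if res else None

a = AlertRecord("ALT-1042", datetime(2026, 3, 1, 8, 0))
a.transition("Acknowledged", datetime(2026, 3, 1, 8, 12), "jsmith", "checking bearing")
a.transition("Resolved", datetime(2026, 3, 1, 10, 30), "jsmith", "root cause: failed lubricator")
print(a.time_to_acknowledge())  # 0:12:00
print(a.time_to_resolve())      # 2:30:00
```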
Alert Suppression
During planned maintenance, suppress alerts for affected assets:
{
"suppression": {
"name": "Quarterly PM - Line 3",
"devices": ["MOTOR-031", "MOTOR-032", "PUMP-015"],
"startTime": "2026-03-15T22:00:00Z",
"endTime": "2026-03-16T06:00:00Z",
"reason": "Scheduled quarterly maintenance"
}
}

Cascading Alert Correlation
When the power goes out, every device in the building goes offline. You do not need 500 "Device Offline" alerts. You need one "Power Loss" alert with the others suppressed.
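Mechanically, that means: when a root-cause alert has fired, tag matching downstream alerts inside the correlation window as suppressed instead of notifying on them. A sketch of that check (illustrative, not Monitor's internal logic); the corresponding rule configuration follows.

```python
from datetime import datetime, timedelta

SUPPRESSED_BY_POWER_LOSS = {"Device Offline", "Communication Error", "Data Gap"}

def should_suppress(new_alert, active_alerts, window_minutes=5):
    """Suppress a downstream alert if a 'Power Loss Detected' root
    alert fired within the correlation window."""
    if new_alert["name"] not in SUPPRESSED_BY_POWER_LOSS:
        return False
    cutoff = new_alert["time"] - timedelta(minutes=window_minutes)
    return any(a["name"] == "Power Loss Detected" and a["time"] >= cutoff
               for a in active_alerts)

root = {"name": "Power Loss Detected", "time": datetime(2026, 3, 1, 3, 0)}
offline = {"name": "Device Offline", "time": datetime(2026, 3, 1, 3, 2)}
print(should_suppress(offline, [root]))  # True
```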
{
"correlation": {
"rootAlert": "Power Loss Detected",
"suppressAlerts": ["Device Offline", "Communication Error", "Data Gap"],
"timeWindow": "5m",
"message": "Related alerts suppressed - upstream power loss"
}
}

Eliminating Alert Fatigue
The Measurement
Track these metrics monthly:
- MTTA -- Mean Time to Acknowledge. Are operators engaging with alerts?
- MTTR -- Mean Time to Resolve. Are issues getting fixed?
- False positive rate -- What percentage of alerts were dismissed without action?
- Escalation rate -- What percentage of alerts had to escalate?
- Alert volume -- How many alerts per shift?
If your false positive rate is above 20%, your thresholds are wrong. If your MTTA is increasing month over month, operators are tuning out.
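All of these metrics fall out of the same alert audit trail. A sketch computing the first three (the record fields are illustrative):

```python
from datetime import datetime, timedelta

def alerting_kpis(alerts):
    """Each alert record: created, acknowledged (or None),
    resolved (or None), dismissed_without_action (bool)."""
    acked = [a["acknowledged"] - a["created"] for a in alerts if a["acknowledged"]]
    resolved = [a["resolved"] - a["created"] for a in alerts if a["resolved"]]
    mtta = sum(acked, timedelta()) / len(acked) if acked else None
    mttr = sum(resolved, timedelta()) / len(resolved) if resolved else None
    fp_rate = sum(a["dismissed_without_action"] for a in alerts) / len(alerts)
    return {"MTTA": mtta, "MTTR": mttr, "false_positive_rate": fp_rate}

t0 = datetime(2026, 3, 1, 8, 0)
alerts = [
    {"created": t0, "acknowledged": t0 + timedelta(minutes=10),
     "resolved": t0 + timedelta(hours=2), "dismissed_without_action": False},
    {"created": t0, "acknowledged": None, "resolved": None,
     "dismissed_without_action": True},
]
kpis = alerting_kpis(alerts)
print(kpis["false_positive_rate"])  # 0.5
```

A false positive rate of 0.5, as here, is well past the 20% line: the rule set needs tuning.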
The Fixes
- Require duration. If a condition must persist for 10 minutes to be real, enforce that in the rule.
- Implement hysteresis. If the threshold is 85 C, do not clear the alert until the reading drops to 80 C. Otherwise you get alert-clear-alert-clear oscillation.
- Review monthly. Pull the alert analytics. Find the rules with the highest false positive rates. Tune or retire them.
- Differentiate by time of day. A temperature that is alarming during idle time might be normal during startup. Context-aware thresholds reduce noise.
- Correlate aggressively. One root cause alert with 10 suppressed downstream alerts is better than 11 independent alerts.
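Hysteresis in particular is worth seeing in code: fire above 85 C, but do not clear until the reading drops back below 80 C. A sketch using the thresholds from the fix above (the class itself is illustrative):

```python
class HysteresisAlert:
    """Fires above `high`, clears only below `low` -- the dead band
    between them prevents alert-clear-alert-clear oscillation."""
    def __init__(self, high=85.0, low=80.0):
        self.high, self.low = high, low
        self.active = False

    def update(self, value):
        if not self.active and value > self.high:
            self.active = True
        elif self.active and value < self.low:
            self.active = False
        return self.active

alarm = HysteresisAlert()
# A reading oscillating around 85 C would make a naive threshold flap;
# with hysteresis the alert stays latched until the temperature truly drops.
readings = [84, 86, 84, 86, 83, 79]
print([alarm.update(v) for v in readings])  # [False, True, True, True, True, False]
```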
The 6 Commandments of Effective Alerting
- Every alert must have a clear response action. If the operator cannot do anything about it, it should not be an alert.
- Duration-based filtering is mandatory. Never alert on a single data point unless it is a safety-critical extreme.
- Severity must map to response time. If everything is critical, nothing is critical.
- Automate the handoff. Work order creation should not require copy-paste between Monitor and Manage.
- Measure your alerting system. MTTA, MTTR, and false positive rate are non-negotiable KPIs.
- Tune quarterly. Equipment ages, processes change, seasons shift. Your thresholds must keep pace.
What Comes Next
Your alerting system fires intelligently, routes to the right people, and creates work orders automatically. But Monitor does not exist in isolation.
In Part 7: Integration and APIs, we connect Monitor to the rest of your enterprise:
- REST API reference for device management and data queries
- Python SDK for programmatic access
- Webhook patterns for bidirectional integration
- Maximo Manage, data lake, and ERP integration patterns
- Real-time data streaming to external systems
Series Navigation
Part — Title
1 — Introduction to IBM Maximo Monitor
2 — Getting Started with Maximo Monitor
3 — Data Ingestion and Device Management
4 — Dashboards and Visualization
5 — Analytics and AI Integration
6 — Alerts and Automation (You are here)
7 — Integration and APIs
8 — Best Practices and Case Studies
Built by practitioners. For practitioners. No fluff.
TheMaximoGuys -- Maximo expertise, delivered different.



