Escalation Policies
Policy Structure
Escalation policies define how incidents are escalated when they are not acknowledged or resolved within specified timeframes. Each policy consists of multiple layers, and each layer has a target (user, group, or on-call schedule), a notification method, and a timeout. If a layer's target does not respond before its timeout expires, the escalation automatically advances to the next layer.
- DynamoDB: StackFlow_EscalationPolicy and StackFlow_OnCallSchedule tables provisioned
- DynamoDB: At least one active on-call schedule with rotation members in StackFlow_OnCallSchedule
- SNS: stackflow-escalation-alerts topic with SMS/email subscriptions for on-call engineers
- EventBridge: Escalation timeout check rule on 10-minute schedule
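The layered structure described above can be sketched as a small data model. This is a minimal illustration, not the StackFlow implementation: the class names, field names, and the `active_layer` helper are assumptions introduced here, and the sketch assumes each layer's timeout starts when the previous layer's timeout ends.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    target: str           # hypothetical ID of a user, group, or on-call schedule
    method: str           # e.g. "email", "sms", "webhook", "portal"
    timeout_minutes: int  # how long this layer has to respond


@dataclass
class EscalationPolicy:
    name: str
    layers: List[Layer]


def active_layer(policy: EscalationPolicy, minutes_unacknowledged: int) -> Optional[Layer]:
    """Return the layer that is currently responsible, or None if every layer timed out."""
    elapsed = 0
    for layer in policy.layers:
        elapsed += layer.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return layer
    return None


# Illustrative policy; the IDs and timeouts are made up for this example.
policy = EscalationPolicy(
    name="Example policy",
    layers=[
        Layer(target="usr_alice_johnson", method="sms", timeout_minutes=15),
        Layer(target="grp_platform_leads", method="email", timeout_minutes=30),
    ],
)

layer = active_layer(policy, minutes_unacknowledged=20)
print(layer.target if layer else "all layers exhausted")
```

With these example values, an incident that has gone 20 minutes without a response has moved past the first layer and sits with the second.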
On-Call Schedules
On-call schedules are configured in Workflows → Escalation → On-Call Schedules. Schedules support weekly rotations, override slots, and multiple simultaneous on-call assignees (primary + secondary). Schedule data is consumed by the escalation engine to determine who is currently on-call when an escalation fires.
{
  "on_call_schedule": {
    "name": "Platform Engineering On-Call",
    "timezone": "America/New_York",
    "rotation": {
      "type": "weekly",
      "start_day": "Monday",
      "start_time": "09:00",
      "participants": [
        {"user_id": "usr_alice_johnson", "weeks": [1, 3, 5]},
        {"user_id": "usr_bob_chen", "weeks": [2, 4]},
        {"user_id": "usr_carol_davis", "weeks": [6, 7, 8]}
      ]
    }
  }
}
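As a rough illustration of how a consumer might read this schedule, the sketch below looks up the participant assigned to a given rotation week. It assumes that the `weeks` values are 1-based week indices within the rotation cycle; how a calendar date maps to such an index is not specified here, so the caller supplies the index directly. The `on_call_user` function is hypothetical.

```python
from typing import Optional

schedule = {
    "on_call_schedule": {
        "name": "Platform Engineering On-Call",
        "timezone": "America/New_York",
        "rotation": {
            "type": "weekly",
            "start_day": "Monday",
            "start_time": "09:00",
            "participants": [
                {"user_id": "usr_alice_johnson", "weeks": [1, 3, 5]},
                {"user_id": "usr_bob_chen", "weeks": [2, 4]},
                {"user_id": "usr_carol_davis", "weeks": [6, 7, 8]},
            ],
        },
    }
}


def on_call_user(schedule: dict, rotation_week: int) -> Optional[str]:
    """Return the user_id assigned to rotation_week, or None if no one is assigned.

    Assumption: rotation_week is a 1-based index into the rotation cycle.
    """
    participants = schedule["on_call_schedule"]["rotation"]["participants"]
    for participant in participants:
        if rotation_week in participant["weeks"]:
            return participant["user_id"]
    return None


print(on_call_user(schedule, 4))
```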
Escalation Triggers
Escalations can be triggered by an unacknowledged P1 incident (default threshold: 15 minutes), an SLA breach, manual escalation by an agent, or automatic escalation by a workflow node. Triggers are evaluated every 60 seconds by the escalation timer job running within the StackFlowAPI Lambda.
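The four triggers could be checked along the following lines on each evaluation pass. This is only a sketch: the incident field names (`manual_escalation`, `workflow_escalation`, `sla_breached`, `priority`, `acknowledged`, `created_at`), the returned reason strings, and the order of the checks are assumptions, not the actual StackFlowAPI schema. It also assumes the 15-minute window is measured from incident creation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

P1_ACK_TIMEOUT = timedelta(minutes=15)  # default threshold from the text above


def escalation_reason(incident: dict, now: datetime) -> Optional[str]:
    """Return why an incident should escalate, or None if no trigger applies.

    All incident field names here are hypothetical.
    """
    if incident.get("manual_escalation"):
        return "manual"
    if incident.get("workflow_escalation"):
        return "workflow"
    if incident.get("sla_breached"):
        return "sla_breach"
    if (
        incident.get("priority") == "P1"
        and not incident.get("acknowledged")
        and now - incident["created_at"] >= P1_ACK_TIMEOUT
    ):
        return "unacknowledged_p1"
    return None


# Example: a P1 incident opened 16 minutes ago and never acknowledged.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
incident = {
    "priority": "P1",
    "acknowledged": False,
    "created_at": now - timedelta(minutes=16),
}
print(escalation_reason(incident, now))
```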
Notification Methods
| Method | Delivery | Acknowledgment |
|---|---|---|
| Email | SES | Reply with "ACK" or click portal link |
| SMS | SNS (mobile push) | Reply with "ACK" |
| Webhook | HTTP POST to PagerDuty/OpsGenie | Via PagerDuty/OpsGenie UI |
| Portal | In-app notification | Click Acknowledge in portal |
Testing Escalation
Use the policy test tool at Workflows → Escalation → Test Policy to test escalation policies without creating real incidents. Provide a policy ID and a test incident (or use a dummy incident). The test runner then simulates the escalation timeline and shows which user would be notified at each stage, without actually sending any notifications.
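Conceptually, a dry run of this kind walks the policy layers and records who would be notified and when, instead of delivering anything. The sketch below shows that idea only; it is not the test runner's code. The `simulate_policy` function, the layer dictionary keys, and the `on_call` lookup are hypothetical, and the timeline assumes each stage begins when the previous stage's timeout ends.

```python
from typing import Callable, Dict, List


def simulate_policy(layers: List[Dict], resolve_user: Callable[[str], str]) -> List[Dict]:
    """Build the escalation timeline without sending any notifications."""
    timeline = []
    offset = 0  # minutes since the escalation started
    for stage, layer in enumerate(layers, start=1):
        timeline.append({
            "stage": stage,
            "notify_at_minute": offset,
            "user": resolve_user(layer["target"]),
            "method": layer["method"],
        })
        offset += layer["timeout_minutes"]
    return timeline


# Hypothetical lookup standing in for on-call schedule resolution.
on_call = {
    "sched_platform": "usr_alice_johnson",
    "usr_bob_chen": "usr_bob_chen",
}

layers = [
    {"target": "sched_platform", "method": "sms", "timeout_minutes": 15},
    {"target": "usr_bob_chen", "method": "email", "timeout_minutes": 30},
]

for step in simulate_policy(layers, on_call.__getitem__):
    print(step)
```

Passing the resolver in as a function keeps the simulation free of side effects, which is the property a dry run depends on.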