SLA Alerts
SLA Alert Types
StackFlow generates three types of SLA alerts: warning alerts, sent when the elapsed SLA time exceeds the configured threshold (75% by default); breach alerts, sent when the SLA time expires before the record is resolved; and recovery alerts, sent when a previously breached SLA is resolved, which support trend analysis. Each alert type can be routed independently to different recipients and channels.
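A minimal sketch of how the three alert types relate to one another, assuming an SLA instance can be described by its elapsed time, its target duration, whether the record is resolved, and whether it breached earlier. The function name `classify_alert` and its parameters are hypothetical and not part of StackFlow's API; only the 75% default threshold comes from the text above.

```python
from typing import Optional

DEFAULT_WARNING_THRESHOLD = 0.75  # default warning threshold (75%) from the docs


def classify_alert(
    elapsed_s: float,
    target_s: float,
    resolved: bool,
    was_breached: bool,
    warning_threshold: float = DEFAULT_WARNING_THRESHOLD,
) -> Optional[str]:
    """Return "warning", "breach", "recovery", or None for one SLA instance.

    Hypothetical helper; StackFlow's real evaluation logic may differ.
    """
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    if resolved:
        # A resolved SLA only alerts if it breached earlier (recovery alert).
        return "recovery" if was_breached else None
    fraction = elapsed_s / target_s
    if fraction >= 1.0:
        # The SLA time has expired without resolution.
        return "breach"
    if fraction > warning_threshold:
        # Elapsed time has passed the configured warning threshold.
        return "warning"
    return None


# A 4-hour SLA with 3.5 hours elapsed is past 75% but not yet expired.
print(classify_alert(3.5 * 3600, 4 * 3600, resolved=False, was_breached=False))
```

Because the checker re-evaluates every instance on a schedule, a real implementation would also need to remember which alert it last sent for each record so that the same warning is not republished on every run; that bookkeeping is left out of this sketch.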
SLA alerting depends on the following AWS resources:
- SNS Topic: stackflow-sla-alerts with at least one email/SMS subscription active
- EventBridge Rule: stackflow-sla-check on a 5-minute schedule targeting the StackFlowSLAChecker Lambda
- DynamoDB: StackFlow_SLAInstance table with a GSI on status for efficient breach-detection queries
- Lambda: StackFlowSLAChecker with sns:Publish on the stackflow-sla-alerts ARN
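The GSI on status lets the checker read only the SLA instances it needs to evaluate instead of scanning the whole table. Below is a minimal sketch of paginated reads against that index using the low-level DynamoDB `Query` request shape. The index name `status-index` and the status value `ACTIVE` are hypothetical placeholders, and `query_fn` stands in for a function such as boto3's `client.query`. Because `status` is a DynamoDB reserved word, the key condition refers to it through `ExpressionAttributeNames`.

```python
from typing import Any, Callable, Dict, Iterator, Optional

TABLE_NAME = "StackFlow_SLAInstance"  # table name from the docs
INDEX_NAME = "status-index"  # hypothetical GSI name; substitute the real one


def build_status_query(
    status: str, start_key: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build low-level DynamoDB Query parameters for the status GSI."""
    params: Dict[str, Any] = {
        "TableName": TABLE_NAME,
        "IndexName": INDEX_NAME,
        # "status" is a DynamoDB reserved word, so alias it as "#s".
        "KeyConditionExpression": "#s = :s",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":s": {"S": status}},
    }
    if start_key is not None:
        # Resume where the previous page stopped.
        params["ExclusiveStartKey"] = start_key
    return params


def iter_instances(
    query_fn: Callable[..., Dict[str, Any]], status: str = "ACTIVE"
) -> Iterator[Dict[str, Any]]:
    """Yield every item with the given status, following LastEvaluatedKey."""
    start_key: Optional[Dict[str, Any]] = None
    while True:
        page = query_fn(**build_status_query(status, start_key))
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break
```

Inside the checker Lambda this could be driven as `iter_instances(boto3.client("dynamodb").query)`. Injecting the query function keeps the pagination logic testable without AWS credentials.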
SNS Configuration
SLA alerts are published to two SNS topics: stackflow-sla-alerts (warning alerts) and stackflow-breach-notifications (breach alerts). Subscribe your alerting endpoints to these topics in the AWS SNS console or through the StackFlow admin console. For example, the following commands subscribe an on-call email address and a PagerDuty integration endpoint to the breach topic:
aws sns subscribe --topic-arn arn:aws:sns:us-east-1:373544523367:stackflow-breach-notifications --protocol email --notification-endpoint oncall-itsm@your-org.com --region us-east-1
aws sns subscribe --topic-arn arn:aws:sns:us-east-1:373544523367:stackflow-breach-notifications --protocol https --notification-endpoint https://events.pagerduty.com/integration/{key}/enqueue --region us-east-1
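A minimal sketch of how a checker might route each alert type to its topic and build the keyword arguments for boto3's `sns.publish`, assuming the account ID and region used in the commands above. Warning and breach routing follow the two-topic split described in this section; no topic is named for recovery alerts, so sending them to `stackflow-sla-alerts` here is an assumption. The message fields (`record_id`, `sla_name`) and the `alert_type` message attribute are hypothetical.

```python
import json
from typing import Any, Dict

ARN_PREFIX = "arn:aws:sns:us-east-1:373544523367"

TOPIC_BY_ALERT = {
    "warning": f"{ARN_PREFIX}:stackflow-sla-alerts",
    "breach": f"{ARN_PREFIX}:stackflow-breach-notifications",
    # Assumption: no topic is documented for recovery alerts.
    "recovery": f"{ARN_PREFIX}:stackflow-sla-alerts",
}


def build_publish_args(alert_type: str, record_id: str, sla_name: str) -> Dict[str, Any]:
    """Return keyword arguments for boto3's sns.publish (hypothetical payload)."""
    topic_arn = TOPIC_BY_ALERT[alert_type]  # raises KeyError for unknown types
    subject = f"[StackFlow SLA {alert_type}] {record_id}: {sla_name}"
    return {
        "TopicArn": topic_arn,
        # SNS requires subjects shorter than 100 characters.
        "Subject": subject[:99],
        "Message": json.dumps(
            {"alert_type": alert_type, "record_id": record_id, "sla_name": sla_name}
        ),
        # A message attribute lets individual subscriptions filter by alert type.
        "MessageAttributes": {
            "alert_type": {"DataType": "String", "StringValue": alert_type}
        },
    }
```

A caller would pass the result straight through, for example `boto3.client("sns", region_name="us-east-1").publish(**build_publish_args("breach", record_id, sla_name))`. Attaching `alert_type` as a message attribute means a subscription filter policy could narrow delivery further without adding topics.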
Alert Suppression
Alert suppression prevents alert storms during known maintenance windows or mass incidents. Configure suppression rules in Admin → Notifications → SLA Alert Suppression. Suppression rules specify the time window, affected CI(s) or categories, and a justification. Suppressed alerts are logged but not delivered to external channels.
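A minimal sketch of suppression matching, assuming a rule applies when the alert time falls inside the rule's window and the alert's CI or category is listed in the rule. The `SuppressionRule` fields and the `deliver` helper are hypothetical; they mirror the rule contents described above (time window, CIs or categories, justification), and a suppressed alert is logged instead of being sent.

```python
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, FrozenSet, Iterable, Optional

log = logging.getLogger("sla_suppression")


@dataclass(frozen=True)
class SuppressionRule:
    start: datetime
    end: datetime
    justification: str
    cis: FrozenSet[str] = frozenset()
    categories: FrozenSet[str] = frozenset()

    def matches(self, at: datetime, ci: Optional[str], category: Optional[str]) -> bool:
        # The alert must fall inside the half-open window [start, end).
        if not (self.start <= at < self.end):
            return False
        return (ci is not None and ci in self.cis) or (
            category is not None and category in self.categories
        )


def deliver(
    alert: dict,
    rules: Iterable[SuppressionRule],
    send: Callable[[dict], None],
) -> bool:
    """Send `alert` unless a rule suppresses it; return True if it was sent."""
    for rule in rules:
        if rule.matches(alert["at"], alert.get("ci"), alert.get("category")):
            # Suppressed alerts are logged but not delivered externally.
            log.info("suppressed %s: %s", alert.get("record_id"), rule.justification)
            return False
    send(alert)
    return True
```

Keeping the justification on the rule means every suppressed alert can be logged with the reason it was held back, which matches the audit purpose of requiring a justification.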
Escalation Chains
SLA breach escalation chains define who gets notified at each stage after an SLA breach. Stage 1 (immediate breach): assignee and group manager. Stage 2 (breach + 30 min): ITSM manager. Stage 3 (breach + 2 hours): IT leadership. Each stage sends notifications via all configured channels and adds a priority escalation work note to the record.
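The stage timing above can be expressed as offsets from the breach time. A minimal sketch, assuming the checker records which stages it has already sent for each record; the `STAGES` table and the `due_stages` function are hypothetical names, while the offsets and recipients come from the chain described above.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple

# (stage number, delay after breach, recipients) from the escalation chain.
STAGES: List[Tuple[int, timedelta, Tuple[str, ...]]] = [
    (1, timedelta(0), ("assignee", "group manager")),
    (2, timedelta(minutes=30), ("ITSM manager",)),
    (3, timedelta(hours=2), ("IT leadership",)),
]


def due_stages(breached_at: datetime, now: datetime, sent: Set[int]) -> List[int]:
    """Return stage numbers whose delay has elapsed and that were not yet sent."""
    elapsed = now - breached_at
    return [num for num, delay, _ in STAGES if elapsed >= delay and num not in sent]
```

If this runs on the 5-minute checker schedule, each stage fires on the first run after its offset, so a notification can lag its nominal time by up to one interval. If runs are missed, several stages can become due at once, and the function returns all of them.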
Dashboard
The SLA Alert Dashboard at Dashboards → SRE Metrics → SLA Alerts provides a real-time view of active SLA warnings and breaches. The dashboard includes a heat map by assignment group showing which groups have the most SLA risk, and a trend chart showing SLA compliance rate over the past 30 days. This data feeds the Executive Dashboard's IT performance KPIs.
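A minimal sketch of the two dashboard aggregates. It assumes that the compliance rate is the share of SLAs completed in the past 30 days that did not breach, and that the heat map counts active warnings and breaches per assignment group. Both definitions and the record fields (`group`, `state`, `completed_at`, `breached`) are assumptions; the dashboard's actual formulas may differ.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional


def compliance_rate(
    records: Iterable[dict], now: datetime, days: int = 30
) -> Optional[float]:
    """Share of SLAs completed in the last `days` days that were not breached."""
    cutoff = now - timedelta(days=days)
    recent = [
        r for r in records
        if r["completed_at"] is not None and r["completed_at"] >= cutoff
    ]
    if not recent:
        return None  # no completed SLAs in the window
    met = sum(1 for r in recent if not r["breached"])
    return met / len(recent)


def risk_by_group(active: Iterable[dict]) -> Dict[str, int]:
    """Count active warnings and breaches per assignment group (heat map cells)."""
    return dict(
        Counter(r["group"] for r in active if r["state"] in ("warning", "breach"))
    )
```

Returning `None` for an empty window, rather than 0.0 or 1.0, keeps a group with no completed SLAs from appearing as either fully compliant or fully non-compliant on the trend chart.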