SLA Alerts

SLA Alert Types

StackFlow generates three types of SLA alerts: warning alerts (when elapsed SLA time exceeds the configured threshold, 75% by default), breach alerts (when the SLA time expires without resolution), and recovery alerts (when a previously breached SLA is resolved; these are recorded for trend analysis). Each alert type can be routed independently to different recipients and channels.
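The three alert types can be read as a simple decision over an SLA's start time, due time, and resolution state. The following is a minimal sketch of that decision, not StackFlow's actual checker: the `SlaInstance` fields and the `classify_alert` function are hypothetical names introduced for illustration, and only the 75% default comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

WARNING_THRESHOLD = 0.75  # default from the text; configurable in StackFlow


@dataclass
class SlaInstance:
    # Hypothetical record shape; real StackFlow_SLAInstance attributes may differ.
    started_at: datetime
    due_at: datetime
    resolved_at: Optional[datetime] = None
    breached: bool = False  # True once a breach has been recorded for this SLA


def classify_alert(sla: SlaInstance, now: datetime,
                   threshold: float = WARNING_THRESHOLD) -> Optional[str]:
    """Return 'warning', 'breach', 'recovery', or None if no alert applies."""
    if sla.resolved_at is not None:
        # Resolved after a recorded breach is a recovery; resolved in time is silent.
        return "recovery" if sla.breached else None
    if now >= sla.due_at:
        # SLA time expired without resolution.
        return "breach"
    total = (sla.due_at - sla.started_at).total_seconds()
    elapsed = (now - sla.started_at).total_seconds()
    if total > 0 and elapsed / total > threshold:
        return "warning"
    return None


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
four_hour_sla = SlaInstance(started_at=start, due_at=start + timedelta(hours=4))
print(classify_alert(four_hour_sla, start + timedelta(hours=3, minutes=30)))
```

With a 4-hour SLA, the warning threshold falls at the 3-hour mark, so a check at 3 h 30 min classifies the instance as a warning.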

⚙️ Minimum Requirements
  • SNS Topic: stackflow-sla-alerts with at least one email/SMS subscription active
  • EventBridge Rule: stackflow-sla-check on 5-minute schedule targeting StackFlowSLAChecker Lambda
  • DynamoDB: StackFlow_SLAInstance GSI on status for efficient breach detection queries
  • Lambda: StackFlowSLAChecker with sns:Publish on stackflow-sla-alerts ARN
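As a rough illustration of the last two requirements, the sketch below builds an IAM policy document granting `sns:Publish` on the alerts topic, plus the EventBridge schedule expression for a 5-minute rate. The region and account ID are assumptions copied from the subscribe commands in the SNS Configuration section, the topic ARN is inferred from the topic name, and the helper names are hypothetical.

```python
import json

# Assumed values: region and account ID are taken from the subscribe commands
# on this page; the full topic ARN is inferred, not confirmed by the source.
REGION = "us-east-1"
ACCOUNT_ID = "373544523367"
TOPIC_NAME = "stackflow-sla-alerts"

# EventBridge rate expression for a 5-minute schedule.
SCHEDULE_EXPRESSION = "rate(5 minutes)"


def sns_topic_arn(region: str, account_id: str, topic_name: str) -> str:
    """Build a standard SNS topic ARN."""
    return f"arn:aws:sns:{region}:{account_id}:{topic_name}"


def publish_policy(topic_arn: str) -> dict:
    """Least-privilege policy document allowing sns:Publish on one topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


policy = publish_policy(sns_topic_arn(REGION, ACCOUNT_ID, TOPIC_NAME))
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to a single topic ARN rather than `*` keeps the Lambda's permissions limited to the topic it actually publishes to.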

SNS Configuration

SLA alerts are published to two SNS topics: stackflow-sla-alerts (for warning-level alerts) and stackflow-breach-notifications (for breach-level alerts). Subscribe your preferred alerting endpoints to these topics in the AWS SNS console or via the StackFlow admin console.

aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:373544523367:stackflow-breach-notifications \
  --protocol email \
  --notification-endpoint oncall-itsm@your-org.com \
  --region us-east-1

aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:373544523367:stackflow-breach-notifications \
  --protocol https \
  --notification-endpoint https://events.pagerduty.com/integration/{key}/enqueue \
  --region us-east-1
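On the publishing side, the split between the two topics amounts to a lookup from alert type to topic ARN. The sketch below shows one way that mapping might look. It is an assumption-laden illustration: the `stackflow-sla-alerts` ARN is inferred from the breach topic's ARN in the commands above, the function names and message format are hypothetical, and recovery alerts are left unmapped because this page does not say which topic carries them.

```python
from typing import Dict, Optional

# ARN prefix taken from the subscribe commands; the warning topic ARN is inferred.
ARN_PREFIX = "arn:aws:sns:us-east-1:373544523367"

TOPIC_BY_ALERT_TYPE: Dict[str, str] = {
    "warning": f"{ARN_PREFIX}:stackflow-sla-alerts",
    "breach": f"{ARN_PREFIX}:stackflow-breach-notifications",
}


def topic_for(alert_type: str) -> Optional[str]:
    """Return the topic ARN for an alert type, or None if it has no mapping."""
    return TOPIC_BY_ALERT_TYPE.get(alert_type)


def build_publish_args(alert_type: str, record_id: str) -> Optional[Dict[str, str]]:
    """Build keyword arguments for an SNS publish call (hypothetical message format)."""
    topic_arn = topic_for(alert_type)
    if topic_arn is None:
        return None
    return {
        "TopicArn": topic_arn,
        "Subject": f"SLA {alert_type}: {record_id}",
        "Message": f"SLA {alert_type} raised for record {record_id}.",
    }


print(build_publish_args("breach", "INC0001"))
```

The returned dictionary uses the parameter names of the boto3 SNS client's `publish` method, so it could be passed as `sns_client.publish(**args)` in an environment where boto3 and credentials are available.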

Alert Suppression

Alert suppression prevents alert storms during known maintenance windows or mass incidents. Configure suppression rules in Admin → Notifications → SLA Alert Suppression. Suppression rules specify the time window, affected CI(s) or categories, and a justification. Suppressed alerts are logged but not delivered to external channels.
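A suppression rule as described here has three parts: a time window, a scope (CIs or categories), and a justification. The sketch below models that shape and the matching check. It is a hypothetical illustration; `SuppressionRule`, `is_suppressed`, and the field names are not taken from StackFlow.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SuppressionRule:
    # Hypothetical rule shape mirroring the fields named in the text.
    starts_at: datetime
    ends_at: datetime
    justification: str
    ci_ids: FrozenSet[str] = frozenset()
    categories: FrozenSet[str] = frozenset()

    def matches(self, ci_id: str, category: str, at: datetime) -> bool:
        """True if the alert falls inside the window and touches a scoped CI or category."""
        in_window = self.starts_at <= at < self.ends_at
        in_scope = ci_id in self.ci_ids or category in self.categories
        return in_window and in_scope


def is_suppressed(rules: Iterable[SuppressionRule], ci_id: str,
                  category: str, at: datetime) -> bool:
    """A suppressed alert would still be logged, just not sent to external channels."""
    return any(rule.matches(ci_id, category, at) for rule in rules)


maintenance = SuppressionRule(
    starts_at=datetime(2026, 1, 10, 1, 0, tzinfo=timezone.utc),
    ends_at=datetime(2026, 1, 10, 3, 0, tzinfo=timezone.utc),
    justification="Planned database patching",
    ci_ids=frozenset({"db-prod-01"}),
)
check_time = datetime(2026, 1, 10, 2, 0, tzinfo=timezone.utc)
print(is_suppressed([maintenance], "db-prod-01", "database", check_time))
```

Treating the end of the window as exclusive means an alert raised exactly at `ends_at` is delivered normally.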

Use Suppression Carefully: Alert suppression silences notifications for real SLA breaches. Always document the business reason and set an end time for each suppression window. Review any permanent suppression rules quarterly.

Escalation Chains

SLA breach escalation chains define who is notified at each stage after an SLA breach:
  • Stage 1 (immediate breach): assignee and group manager
  • Stage 2 (breach + 30 min): ITSM manager
  • Stage 3 (breach + 2 hours): IT leadership

Each stage sends notifications via all configured channels and adds a priority escalation work note to the record.
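The stage timings above can be expressed as a small table of delays measured from the breach time. The following sketch works out which stages are due at a given moment. The stage delays and recipients come from the text, while `ESCALATION_STAGES` and `due_stages` are hypothetical names, and a real implementation would also need to track which stages have already been notified.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# (stage number, delay after breach, recipients), as described in the text.
ESCALATION_STAGES: List[Tuple[int, timedelta, List[str]]] = [
    (1, timedelta(0), ["assignee", "group manager"]),
    (2, timedelta(minutes=30), ["ITSM manager"]),
    (3, timedelta(hours=2), ["IT leadership"]),
]


def due_stages(breached_at: datetime, now: datetime) -> List[Tuple[int, List[str]]]:
    """Return every stage whose delay has elapsed since the breach."""
    elapsed = now - breached_at
    return [
        (stage, recipients)
        for stage, delay, recipients in ESCALATION_STAGES
        if elapsed >= delay
    ]


breach_time = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
for stage, recipients in due_stages(breach_time, breach_time + timedelta(minutes=45)):
    print(stage, recipients)
```

At 45 minutes after the breach, stages 1 and 2 are due and stage 3 is not.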

Dashboard

The SLA Alert Dashboard at Dashboards → SRE Metrics → SLA Alerts provides a real-time view of active SLA warnings and breaches. The dashboard includes a heat map by assignment group showing which groups have the most SLA risk, and a trend chart showing SLA compliance rate over the past 30 days. This data feeds the Executive Dashboard's IT performance KPIs.