Major Incident Management
Major Incident Definition
A Major Incident (MI) is a high-impact, urgent P1 incident that affects a significant portion of the user base or a critical business service. StackFlow automatically escalates an incident to Major Incident status based on configurable criteria: a P1 incident with no acknowledgment within 15 minutes, or a P1 incident with more than 50 affected users as detected by CMDB dependency analysis.
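The two escalation criteria above can be sketched as a small predicate. This is only an illustration of the rule as described, not StackFlow's implementation; the `Incident` fields and the function name are hypothetical.

```python
from dataclasses import dataclass

ACK_TIMEOUT_MINUTES = 15      # threshold taken from the criteria above
AFFECTED_USER_THRESHOLD = 50  # threshold taken from the criteria above


@dataclass
class Incident:
    # Hypothetical shape; the real incident record may differ.
    priority: str                  # e.g. "P1"
    minutes_since_creation: float
    acknowledged: bool
    affected_users: int            # assumed to come from CMDB dependency analysis


def should_escalate_to_major(incident: Incident) -> bool:
    """Return True when either escalation criterion is met for a P1 incident."""
    if incident.priority != "P1":
        return False
    unacknowledged_too_long = (
        not incident.acknowledged
        and incident.minutes_since_creation >= ACK_TIMEOUT_MINUTES
    )
    too_many_users = incident.affected_users > AFFECTED_USER_THRESHOLD
    return unacknowledged_too_long or too_many_users
```

Because the criteria are configurable, the two constants would normally be read from configuration rather than hard-coded.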
- DynamoDB: `StackFlow_MajorIncident` table with `warRoomId` attribute and GSI on `status`
- SNS Topic: `stackflow-major-incident-alerts` with subscriptions for all stakeholder groups
- SES: Major incident communications require `major-incidents@stackflow-tech.com` verified in SES
- Role: Declaring a major incident requires `itsm_manager` or `super_admin` JWT claim
- Lambda: `StackFlowMajorIncidentNotifier` deployed with SES `SendEmail` permission
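The role requirement listed above could be checked against decoded JWT claims roughly as follows. The claim name `roles` and its shape (a string or a list of strings) are assumptions; the source only states that an `itsm_manager` or `super_admin` claim is required. The token's signature must already have been verified before this check runs.

```python
from typing import Any, Mapping

# Roles named in the requirements list above.
ALLOWED_ROLES = {"itsm_manager", "super_admin"}


def can_declare_major_incident(claims: Mapping[str, Any]) -> bool:
    """Check already-verified JWT claims for a role allowed to declare an MI.

    Assumption: roles are stored under a "roles" claim, either as a single
    string or as a list of strings.
    """
    roles = claims.get("roles", [])
    if isinstance(roles, str):
        roles = [roles]
    return any(role in ALLOWED_ROLES for role in roles)
```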
War Room
When a Major Incident is declared, StackFlow creates a War Room — a dedicated collaboration space within the portal that consolidates all communication, technical updates, and action items in one place. The War Room includes a timeline of all events, a shared scratchpad for technical notes, and integration with your organization's chat platform (Slack/Teams).
curl -X POST https://your-instance.stackflow-tech.com/prod/api/major-incidents -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
"incident_id": "INC0001234",
"major_incident_manager": "usr_alice_johnson",
"bridge_link": "https://meet.google.com/abc-defg-hij",
"initial_impact_statement": "Production API gateway returning 503 for all users"
}'
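For scripted use, the same request can be built in Python with the standard library. This is a sketch that mirrors the curl call above; the function name and parameter names are illustrative, and the response format is not covered here.

```python
import json
import urllib.request


def build_declare_request(
    base_url: str,
    token: str,
    incident_id: str,
    manager: str,
    bridge_link: str,
    impact_statement: str,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that declares a Major Incident."""
    payload = {
        "incident_id": incident_id,
        "major_incident_manager": manager,
        "bridge_link": bridge_link,
        "initial_impact_statement": impact_statement,
    }
    return urllib.request.Request(
        url=f"{base_url}/prod/api/major-incidents",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Keeping construction separate from sending makes the payload easy to inspect or log first.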
Communication Bridge
The Communication Bridge section of the War Room tracks all conference bridge details and attendees. StackFlow can automatically post War Room updates to a designated Slack channel or Microsoft Teams channel, ensuring stakeholders receive real-time updates without needing to monitor the portal.
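To give a sense of what a posted update might contain, the sketch below formats a War Room update as a Slack incoming-webhook payload, which is a JSON object with a `text` field. Whether StackFlow uses incoming webhooks is an assumption, and the function and its parameters are hypothetical. Microsoft Teams expects a different payload shape.

```python
from datetime import datetime, timezone


def format_war_room_update(incident_id: str, posted_at: datetime, message: str) -> dict:
    """Return a Slack incoming-webhook payload for one War Room update."""
    stamp = posted_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return {"text": f"[{incident_id}] {stamp}: {message}"}
```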
Stakeholder Updates
Pre-configured stakeholder communication templates ensure consistent messaging during a Major Incident. Templates are available for the initial notification, 30-minute update, resolution notification, and post-incident summary. The AI Copilot can draft these communications based on the current incident state and timeline.
| Update Type | Trigger | Audience |
|---|---|---|
| Initial Notification | MI declared | IT leadership, affected dept heads |
| 30-Minute Update | Every 30 min | IT leadership |
| Status Page Update | On state change | All users (via status page) |
| Resolution Notification | Incident resolved | All stakeholders |
| PIR Summary | PIR complete (T+48h) | IT leadership, ITSM manager |
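The recurring 30-minute update in the table can be illustrated by computing when each update falls due. This sketch assumes the interval is measured from the moment the Major Incident is declared and that no further interval update is due once the incident is resolved; neither detail is stated in the source, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator

UPDATE_INTERVAL = timedelta(minutes=30)  # cadence from the table above


def update_due_times(declared_at: datetime, resolved_at: datetime) -> Iterator[datetime]:
    """Yield each 30-minute update time that falls strictly before resolution."""
    due = declared_at + UPDATE_INTERVAL
    while due < resolved_at:
        yield due
        due += UPDATE_INTERVAL
```

For an incident declared at 10:00 and resolved at 11:45, this yields 10:30, 11:00, and 11:30. The resolution notification then replaces the next interval update.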
Post-Incident Review
A blameless Post-Incident Review (PIR) is automatically scheduled 48 hours after a Major Incident is resolved. The PIR template in StackFlow follows the Google SRE postmortem format and includes a pre-populated timeline built from the War Room activity log. AI-assisted PIR generation can create an initial draft PIR document within minutes of the incident being resolved.
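The scheduling rule and the pre-populated timeline can be sketched as follows. The activity-log entry shape (`timestamp` and `event` keys) is hypothetical, and this is not StackFlow's actual PIR generator.

```python
from datetime import datetime, timedelta, timezone

PIR_DELAY = timedelta(hours=48)  # the PIR is scheduled 48 hours after resolution


def pir_scheduled_at(resolved_at: datetime) -> datetime:
    """Return the time at which the PIR should be scheduled."""
    return resolved_at + PIR_DELAY


def build_timeline(activity_log: list[dict]) -> list[str]:
    """Turn War Room activity entries into chronologically ordered timeline lines.

    Assumption: each entry has a "timestamp" (datetime) and an "event" (str).
    """
    ordered = sorted(activity_log, key=lambda entry: entry["timestamp"])
    return [f"{e['timestamp'].isoformat()} {e['event']}" for e in ordered]
```

Sorting by timestamp keeps the timeline correct even when War Room entries were recorded out of order.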