
Problem Management

Problem vs Incident

In ITIL terminology, an Incident is a disruption to normal service that needs immediate resolution. A Problem is the underlying cause of one or more incidents. StackFlow maintains separate lifecycle workflows for each, with bidirectional linking that allows you to track which incidents were caused by a known problem.

⚙️ Minimum Requirements
  • DynamoDB: StackFlow_Problem table and StackFlow_KnownError table (KEDB) provisioned
  • IAM: StackFlowAPIRole with dynamodb:PutItem, dynamodb:UpdateItem on both tables
  • Aurora: stackflow.problem_incident_links join table migrated
  • Bedrock KB: BXJGG7PIPS active for AI-assisted root cause analysis suggestions
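
The IAM requirement above could be expressed as a policy document along the following lines. This is a minimal sketch, not a StackFlow-supplied policy: the function name `render_stackflow_policy` is hypothetical, the region and account ID are placeholders you would supply, and your deployment may need additional actions beyond the two listed.

```shell
# Hypothetical helper: prints an IAM policy granting the two DynamoDB actions
# named in the requirements on both tables.
# Arguments: AWS region, AWS account ID (placeholders; supply your own).
render_stackflow_policy() {
  local region="$1"
  local account="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem"],
      "Resource": [
        "arn:aws:dynamodb:${region}:${account}:table/StackFlow_Problem",
        "arn:aws:dynamodb:${region}:${account}:table/StackFlow_KnownError"
      ]
    }
  ]
}
EOF
}

# Example: write the policy to a file for review before attaching it to StackFlowAPIRole.
render_stackflow_policy "us-east-1" "123456789012" > stackflow-dynamodb-policy.json
```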

Problems are typically created reactively (from recurring incidents) or proactively (through trend analysis). The AI engine can automatically suggest creating a Problem record when it detects three or more incidents with similar root cause indicators within a 72-hour window.
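
To illustrate the threshold described above, the following sketch checks whether any three incident timestamps fall within 72 hours of each other. It is not the AI engine's actual logic: the function name `has_incident_cluster` is hypothetical, timestamps are assumed to be Unix epoch seconds, and the incidents are assumed to have already been grouped by similar root cause indicators.

```shell
# Hypothetical helper: exits 0 if at least `min_count` timestamps fall within
# `window_hours` of each other, 1 otherwise.
# Usage: has_incident_cluster WINDOW_HOURS MIN_COUNT EPOCH_SECONDS...
has_incident_cluster() {
  local window_hours="$1"
  local min_count="$2"
  shift 2
  printf '%s\n' "$@" | sort -n | awk -v w="$((window_hours * 3600))" -v n="$min_count" '
    { t[NR] = $1 }
    END {
      # Sliding window over sorted timestamps: compare each entry with the
      # one n-1 positions later.
      for (i = 1; i + n - 1 <= NR; i++)
        if (t[i + n - 1] - t[i] <= w) { found = 1; break }
      exit (found ? 0 : 1)
    }'
}

# Incidents at 0h, 24h, and 70h all fall inside one 72-hour window.
if has_incident_cluster 72 3 0 86400 252000; then
  echo "suggest a Problem record"
fi
```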

Problem Lifecycle

State | Description
--- | ---
New | Problem identified, root cause unknown
Under Investigation | RCA in progress
Known Error | Root cause identified, workaround or fix documented in KEDB
Fix in Progress | Permanent fix being developed (linked to a Change)
Resolved | Permanent fix deployed, no further incidents expected
Closed | Confirmed resolved after monitoring period
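
The lifecycle states can be modeled as a simple forward-only state machine. The sketch below assumes a strictly linear progression in the order listed; the documentation does not state which transitions StackFlow actually permits (for example, whether a Problem can skip Known Error or be reopened), and both function names are hypothetical.

```shell
# Hypothetical helper: prints the state that follows the given one,
# ASSUMING strictly linear progression through the lifecycle.
next_problem_state() {
  case "$1" in
    "New")                 echo "Under Investigation" ;;
    "Under Investigation") echo "Known Error" ;;
    "Known Error")         echo "Fix in Progress" ;;
    "Fix in Progress")     echo "Resolved" ;;
    "Resolved")            echo "Closed" ;;
    "Closed")              return 1 ;;  # terminal state, nothing follows
    *)                     echo "unknown state: $1" >&2; return 2 ;;
  esac
}

# Hypothetical helper: exits 0 only if moving from $1 to $2 is the next step.
can_transition() {
  local expected
  expected=$(next_problem_state "$1") && [ "$expected" = "$2" ]
}

can_transition "Known Error" "Fix in Progress" && echo "transition allowed"
```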

Root Cause Analysis

StackFlow provides a structured RCA workspace accessible from the Problem record. The RCA workspace includes a 5-Whys analysis tool, fishbone (Ishikawa) diagram builder, and a timeline view of all related incidents. The AI Copilot can assist with RCA by analyzing incident descriptions and suggesting potential root causes based on the knowledge base.

curl -X GET \
  "https://your-instance.stackflow-tech.com/prod/api/problems/PRB0000123/rca" \
  -H "Authorization: Bearer $TOKEN"

Neptune Integration: The RCA timeline uses the Neptune knowledge graph to display the dependency chain of affected CIs, helping identify whether the problem originated in infrastructure, application, or external dependencies.

Known Error Database

The Known Error Database (KEDB) stores documented problems with their workarounds. When a newly created incident matches a known error, the agent handling it is immediately shown the workaround steps, which reduces mean time to resolution (MTTR). KEDB entries are surfaced in the AI triage results and the AI Copilot sidebar.

KEDB Field | Description
--- | ---
Error Code | Unique identifier (e.g., KE-DB-001)
Summary | Brief description of the known error
Symptoms | Observable symptoms for matching
Workaround | Steps to restore service without a permanent fix
Fix | Permanent resolution steps (if available)
Linked Problem | Parent PRB record
Linked Change | CHG record implementing the fix (if in progress)

Creating a Problem

curl -X POST "https://your-instance.stackflow-tech.com/prod/api/problems" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "short_description": "Aurora connection pool exhaustion under peak load",
    "related_incidents": ["INC0001234", "INC0001189", "INC0001045"],
    "category": "database",
    "assignment_group": "Platform Engineering",
    "priority": "P2"
  }'
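
When the incident list comes from a script rather than being typed by hand, the `related_incidents` array can be assembled before the request is sent. The helper below is a sketch only: the name `incident_ids_json` is hypothetical, and the validation assumes incident IDs are `INC` followed by digits, which is inferred from the examples above rather than from a documented format.

```shell
# Hypothetical helper: turns incident IDs into a JSON array string.
# ASSUMES IDs look like INC followed by digits; anything else is rejected.
incident_ids_json() {
  local id
  local out=""
  for id in "$@"; do
    case "$id" in
      INC | INC*[!0-9]*) echo "invalid incident ID: $id" >&2; return 1 ;;
      INC*) ;;
      *) echo "invalid incident ID: $id" >&2; return 1 ;;
    esac
    # Add a comma separator before every element except the first.
    out="${out:+$out, }\"$id\""
  done
  printf '[%s]\n' "$out"
}

# Prints a value suitable for the "related_incidents" field of the request body.
incident_ids_json INC0001234 INC0001189 INC0001045
```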