Agentic Graph RAG -- Architecture Overview

What is Agentic Cognitive Graph RAG?

StackFlow's Agentic Cognitive Graph RAG is the core intelligence engine that powers every AI feature on the platform. It combines five distinct knowledge sources -- a Neptune knowledge graph, Bedrock vector search, Redis semantic cache, DynamoDB procedural memory, and pattern clusters -- into a unified retrieval-augmented generation pipeline that provides contextually accurate, grounded answers to complex IT operations queries.

⚙️ Minimum Requirements
  • Neptune: stackflow-knowledge-graph cluster on engine 1.4.7.0, port 8182, IAM auth enabled, Serverless 1-8 NCU
  • Bedrock KB: BXJGG7PIPS status ACTIVE with at least one completed ingestion job and documents in S3
  • OpenSearch Serverless: Collection q3oso7unldm9p4xsqez4 in ACTIVE state with index stackflow-kb-index
  • ElastiCache Redis: stackflow-redis-prod with TLS, auth token at stackflow/redis/auth-token in Secrets Manager
  • DynamoDB Tables: StackFlow_ProceduralMemory, StackFlow_PatternCluster, StackFlow_AIExemplar all provisioned
  • DynamoDB: stackflow-ai-audit-log table with TTL on expiresAt for audit trail
  • IAM: StackFlowAPIRole with bedrock:InvokeModel, bedrock:RetrieveAndGenerate (the IAM action for the bedrock-agent-runtime RetrieveAndGenerate API), neptune-db:ReadDataViaQuery

Unlike traditional RAG systems that retrieve from a single vector store, StackFlow's cognitive pipeline routes each query through specialized retrieval paths based on intent classification. A CMDB topology question triggers a Neptune Gremlin traversal; a runbook lookup hits Bedrock KB; a repeated query is served from the Redis cache in under 5 ms. The orchestrator retrieves context from the relevant sources in parallel, then synthesizes a response using Claude via AWS Bedrock.

Why Graph + Vector? Vector search excels at semantic similarity ("find documents about Aurora connection issues") but cannot reason about relationships ("which services are affected if aurora-main-prod fails?"). Neptune's graph traversal fills this gap by encoding CI dependencies, service maps, and change relationships as first-class graph edges.
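
As a rough illustration of the relationship question above, the sketch below answers "which services are affected if aurora-main-prod fails?" over a small in-memory edge list. It is not the Neptune implementation: in the platform this would be a Gremlin traversal, and the edge list, the service names other than aurora-main-prod and StackFlowAPI, and the blastRadius helper are all hypothetical.

```typescript
// Hypothetical stand-in for the CI dependency graph.
// Each edge reads: `from` depends on `to`.
type Edge = { from: string; to: string };

const edges: Edge[] = [
  { from: "StackFlowAPI", to: "aurora-main-prod" },
  { from: "billing-service", to: "StackFlowAPI" },      // hypothetical service
  { from: "reporting-service", to: "aurora-main-prod" }, // hypothetical service
  { from: "search-service", to: "opensearch-prod" },     // hypothetical service and CI
];

// Breadth-first walk over reversed edges: collects every CI that
// directly or transitively depends on the failed CI.
function blastRadius(failedCi: string, graph: Edge[]): string[] {
  const dependents = new Map<string, string[]>();
  for (const edge of graph) {
    const list = dependents.get(edge.to) ?? [];
    list.push(edge.from);
    dependents.set(edge.to, list);
  }

  const affected = new Set<string>();
  const queue: string[] = [failedCi];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (dependent !== failedCi && !affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return Array.from(affected).sort();
}

console.log(blastRadius("aurora-main-prod", edges));
```

A pure similarity search has no equivalent of the transitive step here (billing-service is affected only through StackFlowAPI), which is the gap the graph is meant to fill.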

Pipeline Architecture

User Query / Alert / Incident
        │
        ▼
┌──────────────────────────────────────────────────────────┐
│                 AGENTIC ORCHESTRATOR                      │
│   (StackFlowAPI Lambda -- index.handler)                   │
│                                                          │
│  1. Intent Classification  →  Route to specialist agent  │
│  2. Entity Extraction      →  CI names, services, errors │
│  3. Context Assembly       →  Multi-source retrieval     │
│  4. Response Synthesis     →  LLM generation             │
│  5. Memory Update          →  Feedback loop              │
└───┬─────────────┬──────────────┬────────────────┬────────┘
    │             │              │                │
    ▼             ▼              ▼                ▼
Neptune       Bedrock KB      Redis           DynamoDB
Graph         Vector          Semantic        Memory
Traversal     Search          Cache           Stores
    │             │              │                │
(Gremlin)   (Titan Embed    (384-dim         (ProceduralMemory
    │         v2 1024d)      key hash)         PatternCluster
    │             │              │              AIExemplar)
    │             │              │                │
    └─────────────┴──────────────┴────────────────┘
                          │
                    CONTEXT WINDOW
                          │
                    Claude 3.x via
                    AWS Bedrock
                          │
                    Structured Response
                          │
              ┌───────────┴────────────┐
              │   Cache + Log          │
              │   Redis + ai-audit-log │
              └────────────────────────┘

Data Sources

Source | Technology | Data | Query Method | Latency
Knowledge Graph | Neptune 1.4.7 (Gremlin) | CI relationships, service maps, topology | Gremlin traversal | 10–50 ms
Vector Search | Bedrock KB BXJGG7PIPS + OpenSearch Serverless | Runbooks, KB articles, incident patterns | Semantic similarity (hybrid) | 50–200 ms
Semantic Cache | Redis (TLS, auth) on cache.t4g.micro | Top-500 pre-embedded queries | SHA256 key hash | 1–5 ms
Procedural Memory | DynamoDB StackFlow_ProceduralMemory | Step-by-step remediation procedures | PK lookup + GSI | 5–15 ms
Pattern Clusters | DynamoDB StackFlow_PatternCluster | Historical incident classifications | K-means cluster lookup | 5–20 ms
Exemplars | DynamoDB StackFlow_AIExemplar | Human-approved resolution examples | Intent GSI + quality score | 20–80 ms
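
The SHA256 key hash used for the semantic cache can be sketched as follows, producing keys in the sf:cache:{hash} form that the cache-check step uses. The normalization rules (trim, lowercase, collapse whitespace) and the function names are assumptions for illustration; the document does not specify how queries are normalized or whether the tenant is part of the key.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: trim, lowercase, collapse runs of whitespace.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

// Builds a Redis key of the form sf:cache:{sha256-hex-of-normalized-query}.
function cacheKey(query: string): string {
  const hash = createHash("sha256")
    .update(normalizeQuery(query), "utf8")
    .digest("hex");
  return `sf:cache:${hash}`;
}

// Two spellings of the same question map to the same key.
console.log(cacheKey("Which services depend on aurora-main-prod?"));
console.log(cacheKey("  which services depend on   AURORA-MAIN-PROD? "));
```

Because the key is an exact hash, only queries that normalize to the same string hit the cache; near-duplicates with different wording would miss under this sketch.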

Agent Roles

The orchestrator delegates to specialist agents based on the classified intent. Each agent has a specific retrieval strategy and prompt template optimised for its domain:

Agent | Intent Types | Primary Source | Prompt Template
Triage Agent | incident_triage, alert_classify | PatternCluster + AIExemplar | incident-triage-v1
Remediation Agent | fix_suggestion, auto_remediate | ProceduralMemory + KB | remediation-suggest-v1
Knowledge Agent | kb_search, how_to, policy_lookup | Bedrock KB (hybrid) | kb-rag-answer-v1
CMDB Agent | topology_query, blast_radius, dependency | Neptune graph traversal | cmdb-graph-answer-v1
Compliance Agent | policy_check, change_risk | KB (policies/) + PatternCluster | change-risk-assessment-v1
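
The delegation in the table above amounts to a lookup from intent type to agent and prompt template. The sketch below encodes the table rows directly; the type names, the ROUTES map, and the routeIntent helper are illustrative, and returning undefined for an unlisted intent is an assumption, since the fallback behaviour is not described here.

```typescript
type AgentName = "triage" | "remediation" | "knowledge" | "cmdb" | "compliance";

interface AgentRoute {
  agent: AgentName;
  promptTemplate: string;
}

// Intent types and prompt templates copied from the Agent Roles table.
const ROUTES = new Map<string, AgentRoute>([
  ["incident_triage", { agent: "triage", promptTemplate: "incident-triage-v1" }],
  ["alert_classify", { agent: "triage", promptTemplate: "incident-triage-v1" }],
  ["fix_suggestion", { agent: "remediation", promptTemplate: "remediation-suggest-v1" }],
  ["auto_remediate", { agent: "remediation", promptTemplate: "remediation-suggest-v1" }],
  ["kb_search", { agent: "knowledge", promptTemplate: "kb-rag-answer-v1" }],
  ["how_to", { agent: "knowledge", promptTemplate: "kb-rag-answer-v1" }],
  ["policy_lookup", { agent: "knowledge", promptTemplate: "kb-rag-answer-v1" }],
  ["topology_query", { agent: "cmdb", promptTemplate: "cmdb-graph-answer-v1" }],
  ["blast_radius", { agent: "cmdb", promptTemplate: "cmdb-graph-answer-v1" }],
  ["dependency", { agent: "cmdb", promptTemplate: "cmdb-graph-answer-v1" }],
  ["policy_check", { agent: "compliance", promptTemplate: "change-risk-assessment-v1" }],
  ["change_risk", { agent: "compliance", promptTemplate: "change-risk-assessment-v1" }],
]);

// Returns the route for a classified intent, or undefined when the intent
// is not listed (assumed behaviour; the real fallback is not documented here).
function routeIntent(intent: string): AgentRoute | undefined {
  return ROUTES.get(intent);
}

console.log(routeIntent("blast_radius"));
```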

Query Flow -- Step by Step

  1. Request Received: StackFlowAPI Lambda receives the query with tenantId, userId, and optional context (incidentId, sessionId).
  2. Cache Check: SHA256 hash of the normalized query is checked against Redis sf:cache:{hash}. Cache hit returns response in <5ms with fromCache: true.
  3. Intent Classification: Claude Haiku (fast, cheap) classifies the query into one of 15 intent types using prompt template intent-classify-v1.
  4. Entity Extraction: Named entities are extracted -- CI names (aurora-main-prod), error codes (ECONNREFUSED), service names (StackFlowAPI).
  5. Parallel Retrieval: Based on intent, the orchestrator fires parallel requests to relevant sources using Promise.allSettled(). Failures in individual sources do not abort the pipeline.
  6. Neptune Traversal (if CMDB intent): Gremlin query executes against stackflow-knowledge-graph with IAM auth signing. Returns CI relationships and service topology subgraph.
  7. Bedrock KB Retrieval (if knowledge intent): bedrock-agent-runtime:RetrieveAndGenerate with hybrid search (semantic + keyword) against KB BXJGG7PIPS. Returns ranked passages.
  8. Memory Lookup: ProceduralMemory and AIExemplar are queried for matching procedures and historical resolutions relevant to the current context.
  9. Context Assembly: Retrieved content is assembled into the LLM context window. Priority: exemplars > procedural memory > KB passages > graph context. Total context kept under 8,000 tokens.
  10. Response Generation: Claude 3.5 Sonnet (or router-selected model) generates the final response. The result is cached in Redis, logged to stackflow-ai-audit-log, and returned to the caller.
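
Steps 5 and 9 can be sketched together: fan out to the sources with Promise.allSettled() so that one failing source does not abort the others, then assemble the results in priority order under a token budget. This is a minimal sketch, not the orchestrator's code. The Retriever type, the function names, and the 4-characters-per-token estimate are assumptions; the real pipeline presumably uses a proper tokenizer.

```typescript
type SourceName = "exemplars" | "proceduralMemory" | "kbPassages" | "graphContext";

interface Snippet {
  source: SourceName;
  text: string;
}

// Hypothetical retriever signature: each source returns text snippets.
type Retriever = () => Promise<string[]>;

// Priority order from step 9: exemplars > procedural memory > KB passages > graph context.
const PRIORITY: SourceName[] = ["exemplars", "proceduralMemory", "kbPassages", "graphContext"];

// Rough token estimate (assumption: about 4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Runs all supplied retrievers in parallel; rejected sources are skipped.
async function retrieveAll(
  retrievers: Partial<Record<SourceName, Retriever>>,
): Promise<Snippet[]> {
  const names = PRIORITY.filter((name) => retrievers[name] !== undefined);
  const settled = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections too.
    names.map(async (name) => (retrievers[name] as Retriever)()),
  );
  const snippets: Snippet[] = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      for (const text of result.value) {
        snippets.push({ source: names[i], text });
      }
    }
  });
  return snippets;
}

// Keeps snippets in priority order while the estimated total fits the budget.
function assembleContext(snippets: Snippet[], maxTokens: number): Snippet[] {
  const ordered = snippets
    .slice()
    .sort((a, b) => PRIORITY.indexOf(a.source) - PRIORITY.indexOf(b.source));
  const kept: Snippet[] = [];
  let used = 0;
  for (const snippet of ordered) {
    const cost = estimateTokens(snippet.text);
    if (used + cost > maxTokens) continue; // too large; a later, smaller snippet may still fit
    kept.push(snippet);
    used += cost;
  }
  return kept;
}

// Usage: the KB source fails, yet the other sources still contribute.
retrieveAll({
  exemplars: async () => ["Restart the connection pool after failover."],
  kbPassages: async () => {
    throw new Error("KB unavailable");
  },
  graphContext: async () => ["StackFlowAPI depends on aurora-main-prod."],
}).then((snippets) => console.log(assembleContext(snippets, 8000)));
```

Skipping an oversized snippet rather than stopping at it is one possible policy; truncating the snippet or stopping at the first overflow would be equally reasonable readings of the 8,000-token limit.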

Configuration

The pipeline is configured via two DynamoDB tables. The item below is an example of the per-tenant settings:

{
  "tenantId": "tenant_001",
  "copilotEnabled": true,
  "triageEnabled": true,
  "remediationEnabled": true,
  "graphRagEnabled": true,
  "semanticCacheEnabled": true,
  "exemplarLearningEnabled": true,
  "maxContextTokens": 8000,
  "defaultModelId": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "fallbackModelId": "anthropic.claude-3-haiku-20240307-v1:0"
}

Tip: Set graphRagEnabled to true only after the Neptune graph is populated with at least 100 CI vertices. Querying an empty graph adds latency with no benefit.
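
The configuration item and the tip above can be expressed as a typed shape plus a small guard. This is only a sketch: the TenantAiConfig interface mirrors the fields of the example item, while MIN_CI_VERTICES, useGraphRag, and the idea of passing the CI vertex count in from the caller are assumptions for illustration.

```typescript
// Shape of the example configuration item shown above.
interface TenantAiConfig {
  tenantId: string;
  copilotEnabled: boolean;
  triageEnabled: boolean;
  remediationEnabled: boolean;
  graphRagEnabled: boolean;
  semanticCacheEnabled: boolean;
  exemplarLearningEnabled: boolean;
  maxContextTokens: number;
  defaultModelId: string;
  fallbackModelId: string;
}

// Threshold taken from the tip: at least 100 CI vertices.
const MIN_CI_VERTICES = 100;

// Hypothetical guard: use graph retrieval only when the flag is set and
// the graph holds enough CI vertices. The caller supplies the vertex count.
function useGraphRag(config: TenantAiConfig, ciVertexCount: number): boolean {
  return config.graphRagEnabled && ciVertexCount >= MIN_CI_VERTICES;
}

const exampleConfig: TenantAiConfig = {
  tenantId: "tenant_001",
  copilotEnabled: true,
  triageEnabled: true,
  remediationEnabled: true,
  graphRagEnabled: true,
  semanticCacheEnabled: true,
  exemplarLearningEnabled: true,
  maxContextTokens: 8000,
  defaultModelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  fallbackModelId: "anthropic.claude-3-haiku-20240307-v1:0",
};

console.log(useGraphRag(exampleConfig, 42));  // sparsely populated graph
console.log(useGraphRag(exampleConfig, 250)); // populated graph
```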