Agentic Graph RAG -- Architecture Overview
What is Agentic Cognitive Graph RAG?
StackFlow's Agentic Cognitive Graph RAG is the core intelligence engine that powers every AI feature on the platform. It combines five distinct knowledge sources -- a Neptune knowledge graph, Bedrock vector search, Redis semantic cache, DynamoDB procedural memory, and pattern clusters -- into a unified retrieval-augmented generation pipeline that provides contextually accurate, grounded answers to complex IT operations queries.
- Neptune: stackflow-knowledge-graph cluster on engine 1.4.7.0, port 8182, IAM auth enabled, Serverless 1-8 NCU
- Bedrock KB: BXJGG7PIPS status ACTIVE with at least one completed ingestion job and documents in S3
- OpenSearch Serverless: Collection q3oso7unldm9p4xsqez4 in ACTIVE state with index stackflow-kb-index
- ElastiCache Redis: stackflow-redis-prod with TLS, auth token at stackflow/redis/auth-token in Secrets Manager
- DynamoDB Tables: StackFlow_ProceduralMemory, StackFlow_PatternCluster, StackFlow_AIExemplar all provisioned
- DynamoDB: stackflow-ai-audit-log table with TTL on expiresAt for audit trail
- IAM: StackFlowAPIRole with bedrock:InvokeModel, bedrock-agent-runtime:RetrieveAndGenerate, neptune-db:ReadDataViaQuery
Unlike traditional RAG systems that retrieve from a single vector store, StackFlow's cognitive pipeline routes each query through specialized retrieval paths based on intent classification. A CMDB topology question triggers a Neptune Gremlin traversal; a runbook lookup hits Bedrock KB; a repeated query is served instantly from Redis. The orchestrator assembles context from multiple sources simultaneously, then synthesizes a response using Claude via AWS Bedrock.
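The concurrent, multi-source assembly described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual code: the Retriever type, the retrieveAll helper, and the stub retrievers are hypothetical. The only detail taken from this page is that Promise.allSettled() is used so that a failing source does not abort the pipeline.

```typescript
// Hypothetical types for illustration; not part of the StackFlow codebase.
type Retriever = {
  source: string;
  fetch: (query: string) => Promise<string[]>;
};

interface RetrievalResult {
  context: Record<string, string[]>; // passages keyed by source name
  failed: string[];                  // sources whose request rejected
}

// Fire every retriever concurrently and keep whatever succeeds.
async function retrieveAll(
  query: string,
  retrievers: Retriever[],
): Promise<RetrievalResult> {
  const settled = await Promise.allSettled(
    retrievers.map((r) => r.fetch(query)),
  );
  const result: RetrievalResult = { context: {}, failed: [] };
  settled.forEach((outcome, i) => {
    const source = retrievers[i].source;
    if (outcome.status === "fulfilled") {
      result.context[source] = outcome.value;
    } else {
      result.failed.push(source); // record the failure, do not abort
    }
  });
  return result;
}

// Stub retrievers standing in for real Neptune, Bedrock KB, and DynamoDB calls.
const demoRetrievers: Retriever[] = [
  { source: "neptune", fetch: async () => ["aurora-main-prod -> StackFlowAPI"] },
  { source: "bedrockKb", fetch: async () => { throw new Error("timeout"); } },
  { source: "proceduralMemory", fetch: async () => ["restart procedure"] },
];
```

Calling retrieveAll(query, demoRetrievers) resolves with context from the two healthy stubs and lists bedrockKb under failed, so the caller can still synthesize a response from partial context.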
Pipeline Architecture
               User Query / Alert / Incident
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│                   AGENTIC ORCHESTRATOR                   │
│          (StackFlowAPI Lambda -- index.handler)          │
│                                                          │
│  1. Intent Classification → Route to specialist agent    │
│  2. Entity Extraction     → CI names, services, errors   │
│  3. Context Assembly      → Multi-source retrieval       │
│  4. Response Synthesis    → LLM generation               │
│  5. Memory Update         → Feedback loop                │
└───┬─────────────┬──────────────┬────────────────┬────────┘
    │             │              │                │
    ▼             ▼              ▼                ▼
 Neptune     Bedrock KB        Redis          DynamoDB
  Graph        Vector        Semantic          Memory
Traversal      Search          Cache           Stores
    │             │              │                │
(Gremlin)   (Titan Embed     (384-dim     (ProceduralMemory
    │         v2 1024d)      key hash)     PatternCluster
    │             │              │           AIExemplar)
    │             │              │                │
    └─────────────┴──────────────┴────────────────┘
                           │
                    CONTEXT WINDOW
                           │
                    Claude 3.x via
                      AWS Bedrock
                           │
                  Structured Response
                           │
               ┌───────────┴────────────┐
               │      Cache + Log       │
               │  Redis + ai-audit-log  │
               └────────────────────────┘
Data Sources
| Source | Technology | Data | Query Method | Latency |
|---|---|---|---|---|
| Knowledge Graph | Neptune 1.4.7 (Gremlin) | CI relationships, service maps, topology | Gremlin traversal | 10–50ms |
| Vector Search | Bedrock KB BXJGG7PIPS + OpenSearch Serverless | Runbooks, KB articles, incident patterns | Semantic similarity (hybrid) | 50–200ms |
| Semantic Cache | Redis (TLS, auth) on cache.t4g.micro | Top-500 pre-embedded queries | SHA256 key hash | 1–5ms |
| Procedural Memory | DynamoDB StackFlow_ProceduralMemory | Step-by-step remediation procedures | PK lookup + GSI | 5–15ms |
| Pattern Clusters | DynamoDB StackFlow_PatternCluster | Historical incident classifications | K-means cluster lookup | 5–20ms |
| Exemplars | DynamoDB StackFlow_AIExemplar | Human-approved resolution examples | Intent GSI + quality score | 20–80ms |
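The Semantic Cache row keys entries by a SHA256 hash of the query. A minimal sketch of deriving such a key is shown here, using the sf:cache:{hash} format described in the query flow. The normalization rules (trim, lowercase, collapse whitespace) and the function names are assumptions made for illustration; this page does not specify how queries are normalized.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: trim, lowercase, and collapse runs of whitespace.
// The real pipeline's normalization rules are not documented on this page.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

// Build a Redis key in the sf:cache:{hash} format from the normalized query.
function cacheKey(query: string): string {
  const hash = createHash("sha256")
    .update(normalizeQuery(query), "utf8")
    .digest("hex"); // 64 lowercase hex characters
  return `sf:cache:${hash}`;
}
```

Under these assumptions, two queries that differ only in casing or spacing map to the same key, which is what lets a repeated question be served from Redis.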
Agent Roles
The orchestrator delegates to specialist agents based on the classified intent. Each agent has a specific retrieval strategy and prompt template optimised for its domain:
| Agent | Intent Types | Primary Source | Prompt Template |
|---|---|---|---|
| Triage Agent | incident_triage, alert_classify | PatternCluster + AIExemplar | incident-triage-v1 |
| Remediation Agent | fix_suggestion, auto_remediate | ProceduralMemory + KB | remediation-suggest-v1 |
| Knowledge Agent | kb_search, how_to, policy_lookup | Bedrock KB (hybrid) | kb-rag-answer-v1 |
| CMDB Agent | topology_query, blast_radius, dependency | Neptune graph traversal | cmdb-graph-answer-v1 |
| Compliance Agent | policy_check, change_risk | KB (policies/) + PatternCluster | change-risk-assessment-v1 |
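The table above can be read as a lookup from classified intent to agent. A minimal sketch of that lookup follows; the AgentRoute type and routeIntent helper are hypothetical names, while the agent names, intent strings, and prompt template IDs are copied from the table. The table lists 12 intents, so the sketch returns undefined for anything else; how the real orchestrator handles unlisted intents is not covered here.

```typescript
// Hypothetical shape for one row of the Agent Roles table.
interface AgentRoute {
  agent: string;
  intents: string[];
  promptTemplate: string;
}

// Rows copied from the Agent Roles table.
const AGENT_ROUTES: AgentRoute[] = [
  { agent: "Triage Agent", intents: ["incident_triage", "alert_classify"], promptTemplate: "incident-triage-v1" },
  { agent: "Remediation Agent", intents: ["fix_suggestion", "auto_remediate"], promptTemplate: "remediation-suggest-v1" },
  { agent: "Knowledge Agent", intents: ["kb_search", "how_to", "policy_lookup"], promptTemplate: "kb-rag-answer-v1" },
  { agent: "CMDB Agent", intents: ["topology_query", "blast_radius", "dependency"], promptTemplate: "cmdb-graph-answer-v1" },
  { agent: "Compliance Agent", intents: ["policy_check", "change_risk"], promptTemplate: "change-risk-assessment-v1" },
];

// Invert the table once so each lookup is a single Map read.
const ROUTE_BY_INTENT = new Map<string, AgentRoute>(
  AGENT_ROUTES.flatMap((route) =>
    route.intents.map((intent) => [intent, route] as [string, AgentRoute]),
  ),
);

// Returns undefined for intents that do not appear in the table.
function routeIntent(intent: string): AgentRoute | undefined {
  return ROUTE_BY_INTENT.get(intent);
}
```

For example, routeIntent("blast_radius") resolves to the CMDB Agent row with the cmdb-graph-answer-v1 template.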
Query Flow -- Step by Step
- Request Received: StackFlowAPI Lambda receives the query with tenantId, userId, and optional context (incidentId, sessionId).
- Cache Check: SHA256 hash of the normalized query is checked against Redis sf:cache:{hash}. Cache hit returns response in <5ms with fromCache: true.
- Intent Classification: Claude Haiku (fast, cheap) classifies the query into one of 15 intent types using prompt template intent-classify-v1.
- Entity Extraction: Named entities are extracted -- CI names (aurora-main-prod), error codes (ECONNREFUSED), service names (StackFlowAPI).
- Parallel Retrieval: Based on intent, the orchestrator fires parallel requests to relevant sources using Promise.allSettled(). Failures in individual sources do not abort the pipeline.
- Neptune Traversal (if CMDB intent): Gremlin query executes against stackflow-knowledge-graph with IAM auth signing. Returns CI relationships and service topology subgraph.
- Bedrock KB Retrieval (if knowledge intent): bedrock-agent-runtime:RetrieveAndGenerate with hybrid search (semantic + keyword) against KB BXJGG7PIPS. Returns ranked passages.
- Memory Lookup: ProceduralMemory and AIExemplar are queried for matching procedures and historical resolutions relevant to the current context.
- Context Assembly: Retrieved content is assembled into the LLM context window. Priority: exemplars > procedural memory > KB passages > graph context. Total context kept under 8,000 tokens.
- Response Generation: Claude 3.5 Sonnet (or router-selected model) generates the final response. The result is cached in Redis, logged to stackflow-ai-audit-log, and returned to the caller.
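The Context Assembly step above gives a priority order and a token ceiling but not the packing strategy. One plausible reading is a greedy pack in priority order, sketched below. The ContextItem type, the assembleContext helper, and the 4-characters-per-token estimate are all assumptions made for illustration; a real implementation would use the model's tokenizer and may truncate items rather than skip them.

```typescript
type SourceKind = "exemplar" | "proceduralMemory" | "kbPassage" | "graphContext";

interface ContextItem {
  kind: SourceKind;
  text: string;
}

// Priority from the Context Assembly step: lower number is packed first.
const PRIORITY: Record<SourceKind, number> = {
  exemplar: 0,
  proceduralMemory: 1,
  kbPassage: 2,
  graphContext: 3,
};

// Assumption: a rough 4-characters-per-token estimate, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily pack items in priority order, skipping any that would exceed the budget.
function assembleContext(items: ContextItem[], maxTokens = 8000): ContextItem[] {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...items].sort((a, b) => PRIORITY[a.kind] - PRIORITY[b.kind]);
  const selected: ContextItem[] = [];
  let used = 0;
  for (const item of ordered) {
    const cost = estimateTokens(item.text);
    if (used + cost > maxTokens) continue; // does not fit; a smaller item still might
    selected.push(item);
    used += cost;
  }
  return selected;
}
```

Skipping an oversized item instead of stopping lets lower-priority but smaller items, such as a short graph summary, still reach the context window.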
Configuration
The pipeline is configured via two DynamoDB tables:
- StackFlow_TenantAIConfig -- per-tenant settings including enabled agents, model preferences, and feature flags
- StackFlow_AIModelRouter -- routing rules mapping intent types to specific Bedrock models with token limits and temperature overrides
{
"tenantId": "tenant_001",
"copilotEnabled": true,
"triageEnabled": true,
"remediationEnabled": true,
"graphRagEnabled": true,
"semanticCacheEnabled": true,
"exemplarLearningEnabled": true,
"maxContextTokens": 8000,
"defaultModelId": "anthropic.claude-3-5-sonnet-20241022-v2:0",
"fallbackModelId": "anthropic.claude-3-haiku-20240307-v1:0"
}
Enable graphRagEnabled only after the Neptune graph is populated with at least 100 CI vertices. Querying an empty or sparsely populated graph adds latency with no benefit.
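The note above can be turned into a simple pre-flight check. The sketch below types the example StackFlow_TenantAIConfig item and warns when graphRagEnabled is set on a graph below the 100-vertex threshold. The configWarnings helper is hypothetical, and the vertex count is assumed to be supplied by the caller (for instance from a Gremlin count query); neither is described on this page.

```typescript
// Fields mirror the example StackFlow_TenantAIConfig item shown above.
interface TenantAIConfig {
  tenantId: string;
  copilotEnabled: boolean;
  triageEnabled: boolean;
  remediationEnabled: boolean;
  graphRagEnabled: boolean;
  semanticCacheEnabled: boolean;
  exemplarLearningEnabled: boolean;
  maxContextTokens: number;
  defaultModelId: string;
  fallbackModelId: string;
}

// Threshold taken from the note above.
const MIN_CI_VERTICES = 100;

// Hypothetical helper: returns human-readable warnings, empty when the config looks sane.
function configWarnings(config: TenantAIConfig, ciVertexCount: number): string[] {
  const warnings: string[] = [];
  if (config.graphRagEnabled && ciVertexCount < MIN_CI_VERTICES) {
    warnings.push(
      `graphRagEnabled is on but the graph has ${ciVertexCount} CI vertices ` +
        `(minimum ${MIN_CI_VERTICES}); graph queries will add latency with little benefit`,
    );
  }
  return warnings;
}

// The example item from this section, typed.
const exampleConfig: TenantAIConfig = {
  tenantId: "tenant_001",
  copilotEnabled: true,
  triageEnabled: true,
  remediationEnabled: true,
  graphRagEnabled: true,
  semanticCacheEnabled: true,
  exemplarLearningEnabled: true,
  maxContextTokens: 8000,
  defaultModelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  fallbackModelId: "anthropic.claude-3-haiku-20240307-v1:0",
};
```

A check like this could run when a tenant's configuration is saved, so the flag is not switched on before the graph has been populated.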