Agentic Graph RAG -- Architecture Overview

What is Agentic Cognitive Graph RAG?

StackFlow's Agentic Cognitive Graph RAG is the core intelligence engine that powers every AI feature on the platform. It combines five distinct knowledge sources -- a Neptune knowledge graph, Bedrock vector search, Redis semantic cache, DynamoDB procedural memory, and pattern clusters -- into a unified retrieval-augmented generation pipeline that provides contextually accurate, grounded answers to complex IT operations queries.

⚙️ Minimum Requirements

Neptune: stackflow-knowledge-graph cluster on engine 1.4.7.0, port 8182, IAM auth enabled, Serverless 1-8 NCU
Bedrock KB: BXJGG7PIPS status ACTIVE with at least one completed ingestion job and documents in S3
OpenSearch Serverless: Collection q3oso7unldm9p4xsqez4 in ACTIVE state with index stackflow-kb-index
ElastiCache Redis: stackflow-redis-prod with TLS, auth token at stackflow/redis/auth-token in Secrets Manager
DynamoDB Tables: StackFlow_ProceduralMemory, StackFlow_PatternCluster, StackFlow_AIExemplar all provisioned
DynamoDB: stackflow-ai-audit-log table with TTL on expiresAt for audit trail
IAM: StackFlowAPIRole with bedrock:InvokeModel, bedrock:RetrieveAndGenerate, neptune-db:ReadDataViaQuery

Unlike traditional RAG systems that retrieve from a single vector store, StackFlow's cognitive pipeline routes each query through specialized retrieval paths based on intent classification. A CMDB topology question triggers a Neptune Gremlin traversal; a runbook lookup hits Bedrock KB; a repeated query is served from the Redis cache in under 5 ms. The orchestrator queries the relevant sources in parallel, assembles the retrieved context, and synthesizes a response using Claude via AWS Bedrock.

Why Graph + Vector? Vector search excels at semantic similarity ("find documents about Aurora connection issues") but cannot reason about relationships ("which services are affected if aurora-main-prod fails?"). Neptune's graph traversal fills this gap by encoding CI dependencies, service maps, and change relationships as first-class graph edges.
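To make the relationship question concrete, here is a minimal in-memory sketch of the kind of traversal a graph store answers: start from a failed CI and follow dependency edges in reverse to collect everything affected. The edge list, every CI name other than aurora-main-prod, and the blastRadius function are illustrative assumptions. The production path is a Gremlin traversal against Neptune, not this code.

```typescript
// Hypothetical dependency edges as [dependent, dependsOn] pairs. Only
// aurora-main-prod comes from the text above; the other names are made up.
type Edge = readonly [string, string];

const edges: Edge[] = [
  ["orders-api", "aurora-main-prod"],
  ["billing-api", "aurora-main-prod"],
  ["checkout-web", "orders-api"],
  ["reports-batch", "billing-api"],
  ["status-page", "static-cdn"],
];

// Breadth-first walk over reverse dependency edges: anything that directly or
// transitively depends on `failedCi` is part of its blast radius.
function blastRadius(failedCi: string, graph: readonly Edge[]): string[] {
  // Index edges by the CI being depended on, so each hop is a map lookup.
  const dependentsOf = new Map<string, string[]>();
  for (const [dependent, dependsOn] of graph) {
    const list = dependentsOf.get(dependsOn) ?? [];
    list.push(dependent);
    dependentsOf.set(dependsOn, list);
  }

  const affected: string[] = [];
  const seen = new Set<string>([failedCi]);
  const queue: string[] = [failedCi];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependentsOf.get(current) ?? []) {
      if (!seen.has(dependent)) {
        seen.add(dependent); // guards against cycles in the dependency graph
        affected.push(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}

const impacted = blastRadius("aurora-main-prod", edges);
```

With the sample edges, `impacted` contains the two services that depend on the database directly plus the two that depend on those services, while status-page is left out because it has no path to aurora-main-prod. A vector index has no notion of such multi-hop paths, which is the gap the graph layer fills.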

Pipeline Architecture

User Query / Alert / Incident
        │
        ▼
┌──────────────────────────────────────────────────────────┐
│                   AGENTIC ORCHESTRATOR                   │
│   (StackFlowAPI Lambda -- index.handler)                 │
│                                                          │
│  1. Intent Classification  →  Route to specialist agent  │
│  2. Entity Extraction      →  CI names, services, errors │
│  3. Context Assembly       →  Multi-source retrieval     │
│  4. Response Synthesis     →  LLM generation             │
│  5. Memory Update          →  Feedback loop              │
└───┬─────────────┬──────────────┬────────────────┬────────┘
    │             │              │                │
    ▼             ▼              ▼                ▼
Neptune       Bedrock KB      Redis           DynamoDB
Graph         Vector          Semantic        Memory
Traversal     Search          Cache           Stores
    │             │              │                │
(Gremlin)   (Titan Embed    (384-dim         (ProceduralMemory
    │         v2 1024d)      key hash)         PatternCluster
    │             │              │              AIExemplar)
    │             │              │                │
    └─────────────┴──────────────┴────────────────┘
                          │
                    CONTEXT WINDOW
                          │
                    Claude 3.x via
                    AWS Bedrock
                          │
                    Structured Response
                          │
              ┌───────────┴────────────┐
              │   Cache + Log          │
              │   Redis + ai-audit-log │
              └────────────────────────┘

Data Sources

Source	Technology	Data	Query Method	Latency
Knowledge Graph	Neptune 1.4.7 (Gremlin)	CI relationships, service maps, topology	Gremlin traversal	10–50ms
Vector Search	Bedrock KB BXJGG7PIPS + OpenSearch Serverless	Runbooks, KB articles, incident patterns	Semantic similarity (hybrid)	50–200ms
Semantic Cache	Redis (TLS, auth) on cache.t4g.micro	Top-500 pre-embedded queries	SHA256 key hash	1–5ms
Procedural Memory	DynamoDB StackFlow_ProceduralMemory	Step-by-step remediation procedures	PK lookup + GSI	5–15ms
Pattern Clusters	DynamoDB StackFlow_PatternCluster	Historical incident classifications	K-means cluster lookup	5–20ms
Exemplars	DynamoDB StackFlow_AIExemplar	Human-approved resolution examples	Intent GSI + quality score	20–80ms

Agent Roles

The orchestrator delegates to specialist agents based on the classified intent. Each agent has a specific retrieval strategy and prompt template optimized for its domain:

Agent	Intent Types	Primary Source	Prompt Template
Triage Agent	incident_triage, alert_classify	PatternCluster + AIExemplar	incident-triage-v1
Remediation Agent	fix_suggestion, auto_remediate	ProceduralMemory + KB	remediation-suggest-v1
Knowledge Agent	kb_search, how_to, policy_lookup	Bedrock KB (hybrid)	kb-rag-answer-v1
CMDB Agent	topology_query, blast_radius, dependency	Neptune graph traversal	cmdb-graph-answer-v1
Compliance Agent	policy_check, change_risk	KB (policies/) + PatternCluster	change-risk-assessment-v1
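
The routing in the table above can be sketched as a lookup from intent to agent. The agent names, intents, sources, and template names are transcribed from the table; the AgentSpec shape, the routeIntent function, and the fallback behavior are assumptions made for illustration. The table names 12 intents while the classifier is described as producing 15, so some handling for unmapped intents is needed, and the fallback chosen here is only a guess.

```typescript
// Hypothetical shape for one row of the agent table.
interface AgentSpec {
  name: string;
  intents: readonly string[];
  primarySource: string;
  promptTemplate: string;
}

const KNOWLEDGE_AGENT: AgentSpec = {
  name: "Knowledge Agent",
  intents: ["kb_search", "how_to", "policy_lookup"],
  primarySource: "Bedrock KB (hybrid)",
  promptTemplate: "kb-rag-answer-v1",
};

const AGENTS: readonly AgentSpec[] = [
  { name: "Triage Agent", intents: ["incident_triage", "alert_classify"],
    primarySource: "PatternCluster + AIExemplar", promptTemplate: "incident-triage-v1" },
  { name: "Remediation Agent", intents: ["fix_suggestion", "auto_remediate"],
    primarySource: "ProceduralMemory + KB", promptTemplate: "remediation-suggest-v1" },
  KNOWLEDGE_AGENT,
  { name: "CMDB Agent", intents: ["topology_query", "blast_radius", "dependency"],
    primarySource: "Neptune graph traversal", promptTemplate: "cmdb-graph-answer-v1" },
  { name: "Compliance Agent", intents: ["policy_check", "change_risk"],
    primarySource: "KB (policies/) + PatternCluster", promptTemplate: "change-risk-assessment-v1" },
];

// Build the intent -> agent index once so routing is a single map lookup.
const AGENT_BY_INTENT = new Map<string, AgentSpec>();
for (const agent of AGENTS) {
  for (const intent of agent.intents) {
    AGENT_BY_INTENT.set(intent, agent);
  }
}

// Assumption: unmapped intents fall back to the Knowledge Agent. The source
// does not say what the real orchestrator does in that case.
function routeIntent(intent: string): AgentSpec {
  return AGENT_BY_INTENT.get(intent) ?? KNOWLEDGE_AGENT;
}
```

For example, `routeIntent("blast_radius")` resolves to the CMDB Agent and its cmdb-graph-answer-v1 template.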

Query Flow -- Step by Step

1. Request Received: StackFlowAPI Lambda receives the query with tenantId, userId, and optional context (incidentId, sessionId).
2. Cache Check: The SHA256 hash of the normalized query is checked against the Redis key sf:cache:{hash}. A cache hit returns the response in under 5 ms with fromCache: true.
3. Intent Classification: Claude Haiku (fast, cheap) classifies the query into one of 15 intent types using prompt template intent-classify-v1.
4. Entity Extraction: Named entities are extracted, such as CI names (aurora-main-prod), error codes (ECONNREFUSED), and service names (StackFlowAPI).
5. Parallel Retrieval: Based on intent, the orchestrator fires parallel requests to the relevant sources using Promise.allSettled(). A failure in an individual source does not abort the pipeline.
6. Neptune Traversal (if CMDB intent): A Gremlin query executes against stackflow-knowledge-graph with IAM auth signing. Returns CI relationships and the service topology subgraph.
7. Bedrock KB Retrieval (if knowledge intent): The orchestrator calls RetrieveAndGenerate on the bedrock-agent-runtime API with hybrid search (semantic + keyword) against KB BXJGG7PIPS. Returns ranked passages.
8. Memory Lookup: ProceduralMemory and AIExemplar are queried for procedures and historical resolutions that match the current context.
9. Context Assembly: Retrieved content is assembled into the LLM context window. Priority: exemplars > procedural memory > KB passages > graph context. Total context is kept under 8,000 tokens.
10. Response Generation: Claude 3.5 Sonnet (or the router-selected model) generates the final response. The result is cached in Redis, logged to stackflow-ai-audit-log, and returned to the caller.
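
The parallel retrieval step above can be sketched as follows. Promise.allSettled() waits for every request and never rejects, which is what lets one failing source be recorded without aborting the others. The Retriever type, the RetrievalResult shape, the retrieveAll function, and the stub sources are all hypothetical stand-ins, not StackFlow's actual interfaces.

```typescript
// Hypothetical: a retriever is any async function returning text snippets.
type Retriever = (query: string) => Promise<string[]>;

interface RetrievalResult {
  snippets: Record<string, string[]>; // source name -> retrieved snippets
  failures: Record<string, string>;   // source name -> error message
}

async function retrieveAll(
  query: string,
  sources: Record<string, Retriever>,
): Promise<RetrievalResult> {
  const entries = Object.entries(sources);
  // Start every retrieval at once. The async wrapper turns synchronous throws
  // into rejections, so allSettled sees every failure as a settled outcome.
  const settled = await Promise.allSettled(
    entries.map(async ([, retrieve]) => retrieve(query)),
  );

  const result: RetrievalResult = { snippets: {}, failures: {} };
  settled.forEach((outcome, i) => {
    const [name] = entries[i];
    if (outcome.status === "fulfilled") {
      result.snippets[name] = outcome.value;
    } else {
      result.failures[name] = String(outcome.reason);
    }
  });
  return result;
}

// Stub retrievers standing in for the real Neptune, Bedrock KB, and DynamoDB
// clients. The "kb" stub fails on purpose to show partial results surviving.
const stubSources: Record<string, Retriever> = {
  graph: async () => ["aurora-main-prod <- orders-api"],
  kb: async () => { throw new Error("KB timeout"); },
  memory: async () => ["restart connection pool"],
};
```

Calling `retrieveAll(query, stubSources)` resolves with snippets from graph and memory and a single recorded failure for kb. A caller can log the failures and continue to context assembly with whatever arrived.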

Configuration

The pipeline is configured via two DynamoDB tables:

StackFlow_TenantAIConfig -- per-tenant settings including enabled agents, model preferences, and feature flags
StackFlow_AIModelRouter -- routing rules mapping intent types to specific Bedrock models with token limits and temperature overrides

{
  "tenantId": "tenant_001",
  "copilotEnabled": true,
  "triageEnabled": true,
  "remediationEnabled": true,
  "graphRagEnabled": true,
  "semanticCacheEnabled": true,
  "exemplarLearningEnabled": true,
  "maxContextTokens": 8000,
  "defaultModelId": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "fallbackModelId": "anthropic.claude-3-haiku-20240307-v1:0"
}

Tip: Enable graphRagEnabled only after the Neptune graph is populated with at least 100 CI vertices. Querying an empty graph adds latency with no benefit.
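
The maxContextTokens setting caps what context assembly may pass to the model. A minimal sketch of priority-ordered assembly under that budget follows, using the order given in the query flow (exemplars > procedural memory > KB passages > graph context). The Snippet type, the assembleContext function, the greedy packing strategy, and the four-characters-per-token estimate are assumptions for illustration; a real pipeline would count tokens with the model's tokenizer.

```typescript
// Priority order taken from the query-flow description, highest first.
const PRIORITY = ["exemplar", "procedural", "kb", "graph"] as const;
type SourceKind = (typeof PRIORITY)[number];

// Hypothetical shape for one retrieved piece of context.
interface Snippet {
  kind: SourceKind;
  text: string;
}

// Assumption: roughly 4 characters per token. Crude, for illustration only.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily pack snippets in priority order. A snippet that does not fit is
// skipped, so a smaller lower-priority snippet can still use the leftover room.
function assembleContext(
  snippets: readonly Snippet[],
  maxContextTokens: number,
): Snippet[] {
  const ordered = [...snippets].sort(
    (a, b) => PRIORITY.indexOf(a.kind) - PRIORITY.indexOf(b.kind),
  );
  const selected: Snippet[] = [];
  let used = 0;
  for (const snippet of ordered) {
    const cost = estimateTokens(snippet.text);
    if (used + cost <= maxContextTokens) {
      selected.push(snippet);
      used += cost;
    }
  }
  return selected;
}
```

Called with `maxContextTokens` set to 8000, as in the sample tenant config, the function keeps everything that fits and drops lower-priority material first when the budget is tight.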
