Neptune Graph Layer
Neptune Cluster Details
The StackFlow knowledge graph runs on Amazon Neptune in us-east-1. The cluster is configured for serverless scaling, allowing it to handle both the steady-state queries from the agentic pipeline and burst loads during major incident discovery.
- Neptune Cluster: stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com reachable on port 8182 from the Lambda VPC
- IAM Auth: Neptune IAM authentication enabled; StackFlowNeptuneRole attached to the Lambda execution role
- KMS: Neptune cluster encrypted with key 98d8c3a0-280f-4c1e-b1ff-0d4029120bdb
- Graph Data: At least 100 CI vertices and 50 relationship edges for meaningful query results
- Seeder Lambda: StackFlowNeptuneCMDBSeeder has run at least once to populate the graph from Aurora CI data
| Property | Value |
|---|---|
| Cluster Endpoint | stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com |
| Reader Endpoint | stackflow-knowledge-graph.cluster-ro-c6pq0smgmlri.us-east-1.neptune.amazonaws.com |
| Port | 8182 |
| Engine Version | Neptune 1.4.7.0 |
| Capacity | Serverless 1–8 NCU (auto-scaling) |
| Multi-AZ | Yes (us-east-1a, us-east-1b) |
| IAM Auth | Enabled (SigV4 signing required) |
| KMS Key | 98d8c3a0-280f-4c1e-b1ff-0d4029120bdb |
| TLS | Required (HTTPS on port 8182) |
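The table lists separate cluster (writer) and reader endpoints. As a minimal sketch of how a client might send read-only traversals to the reader endpoint, the snippet below builds the HTTPS Gremlin URL for either one. The endpoints, port, and `/gremlin` path come from this page; the names `ENDPOINTS`, `AccessMode`, and `neptuneUrl` are illustrative assumptions, not existing StackFlow code.

```typescript
// Endpoints and port copied from the cluster table; all identifiers here
// are hypothetical names chosen for illustration.
const NEPTUNE_PORT = 8182;
const ENDPOINTS = {
  writer: 'stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com',
  reader: 'stackflow-knowledge-graph.cluster-ro-c6pq0smgmlri.us-east-1.neptune.amazonaws.com',
} as const;

type AccessMode = keyof typeof ENDPOINTS;

// Build the HTTPS Gremlin URL for the requested access mode.
function neptuneUrl(mode: AccessMode): string {
  return `https://${ENDPOINTS[mode]}:${NEPTUNE_PORT}/gremlin`;
}

console.log(neptuneUrl('reader'));
```

Routing reads this way keeps query load off the writer, at the cost that a read replica may briefly lag behind recent writes.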
Graph Schema
The Neptune graph uses a property graph model with the following vertex and edge types:
| Vertex Label | Key Properties | Description |
|---|---|---|
| CI | ciId, name, type, status, criticality, tenantId | Configuration Item (server, DB, network device) |
| Service | serviceId, name, tier, owner, slaTarget | Business service composed of multiple CIs |
| Incident | incidentId, priority, status, createdAt | Active or historical incident records |
| Change | changeId, type, status, scheduledAt | Change requests affecting CIs |
| Application | appId, name, version, deployedAt | Application deployed on infrastructure |
| User | userId, name, role, teamId | Engineers responsible for CIs |

| Edge Label | From → To | Description |
|---|---|---|
| DEPENDS_ON | CI → CI | Upstream/downstream service dependency |
| HOSTS | CI → Application | Server hosts an application |
| TRIGGERS | CI → Incident | CI state change triggered an incident |
| AFFECTS | Incident → Service | Incident impacts a business service |
| RUNS_ON | Service → CI | Service runs on infrastructure CIs |
| OWNS | User → CI | Engineer owns/is responsible for CI |
| PART_OF | CI → Service | CI is a component of a service |
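Edge direction matters when writing traversals (`out()` follows an edge from its source, `in()` from its target), so it can help to encode the edge table in code. The following is a sketch only: the labels and directions are transcribed from the tables above, while the type names, `EDGE_SCHEMA`, and `isValidEdge` are hypothetical helpers.

```typescript
// Vertex and edge labels transcribed from the schema tables.
type VertexLabel = 'CI' | 'Service' | 'Incident' | 'Change' | 'Application' | 'User';
type EdgeLabel =
  | 'DEPENDS_ON' | 'HOSTS' | 'TRIGGERS' | 'AFFECTS' | 'RUNS_ON' | 'OWNS' | 'PART_OF';

// Source and target vertex label for each edge label.
const EDGE_SCHEMA: Record<EdgeLabel, { from: VertexLabel; to: VertexLabel }> = {
  DEPENDS_ON: { from: 'CI', to: 'CI' },
  HOSTS: { from: 'CI', to: 'Application' },
  TRIGGERS: { from: 'CI', to: 'Incident' },
  AFFECTS: { from: 'Incident', to: 'Service' },
  RUNS_ON: { from: 'Service', to: 'CI' },
  OWNS: { from: 'User', to: 'CI' },
  PART_OF: { from: 'CI', to: 'Service' },
};

// Returns true when an edge with this label may connect the two vertex labels.
function isValidEdge(label: EdgeLabel, from: VertexLabel, to: VertexLabel): boolean {
  const rule = EDGE_SCHEMA[label];
  return rule.from === from && rule.to === to;
}

console.log(isValidEdge('RUNS_ON', 'Service', 'CI')); // true
console.log(isValidEdge('RUNS_ON', 'CI', 'Service')); // false
```

A check like this could run in a seeder or in tests to catch edges written in the wrong direction before they reach the graph.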
Gremlin Query Examples
// Find all CIs directly depended on by a specific CI (blast radius level 1)
g.V().has('CI', 'ciId', 'ci-prod-api-001')
.out('DEPENDS_ON')
.values('name', 'type', 'status')
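// Find the blast radius for an incident (up to 3 dependency hops)
// Incident -> Service (AFFECTS) -> CI (RUNS_ON), then follow DEPENDS_ON between CIs.
// emit() before repeat() returns the starting CIs and every hop, not only hop 3.
g.V().has('Incident', 'incidentId', 'INC0012345')
.out('AFFECTS')
.out('RUNS_ON')
.emit().repeat(out('DEPENDS_ON')).times(3)
.dedup()
.project('id','name','type','criticality')
.by('ciId').by('name').by('type').by('criticality')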
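// Find services with the most upstream dependencies (risk scoring)
// DEPENDS_ON only connects CIs, so count the distinct dependencies of the CIs each service runs on.
g.V().hasLabel('Service')
.project('service','dep_count')
.by('name')
.by(out('RUNS_ON').out('DEPENDS_ON').dedup().count())
.order().by(select('dep_count'), desc)
.limit(10)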
// Shortest path between two CIs (impact chain)
g.V().has('CI','name','web-frontend')
.repeat(out().simplePath())
.until(has('name','aurora-main-prod'))
.path()
.limit(1)
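// Find changes from the last 24 hours that touch CIs a service runs on
// Assumes Change vertices link to CIs through AFFECTS edges, which the edge table does not list.
// CUTOFF_EPOCH_MS is a placeholder: compute (now - 86400000) in the client and substitute the
// number, because Neptune is not expected to evaluate Groovy calls such as System.currentTimeMillis().
g.V().has('Service','name','StackFlowAPI')
.out('RUNS_ON')
.in('AFFECTS').hasLabel('Change')
.has('scheduledAt', gte(CUTOFF_EPOCH_MS))
.project('changeId','type','status','scheduledAt')
.by('changeId').by('type').by('status').by('scheduledAt')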
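// Get the owners of all CIs that are part of a service
// PART_OF runs CI -> Service and OWNS runs User -> CI, so both are followed with in().
g.V().has('Service','name','StackFlowAPI')
.in('PART_OF')
.in('OWNS').hasLabel('User')
.dedup()
.project('name','role','teamId')
.by('name').by('role').by('teamId')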
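// Count incidents per CI in the last 30 days
// TRIGGERS runs CI -> Incident, so incidents are reached with out().
// CUTOFF_EPOCH_MS is a placeholder: compute (now - 2592000000) in the client and substitute the number.
g.V().hasLabel('CI')
.project('name','incidentCount')
.by('name')
.by(out('TRIGGERS').has('createdAt', gte(CUTOFF_EPOCH_MS)).count())
.order().by(select('incidentCount'), desc)
.limit(20)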
// Find all CIs in DEGRADED status and their dependent services
g.V().has('CI','status','degraded')
.project('ci','affectedServices')
.by('name')
.by(out('PART_OF').hasLabel('Service').values('name').fold())
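Several of the queries above embed literal identifiers and filter on timestamps. When a Lambda builds such queries as text, values need escaping and time cutoffs are safest computed on the client. The sketch below shows one way to do that; `gremlinString`, `cutoffMillis`, and `recentIncidentsQuery` are hypothetical helpers, and it assumes `createdAt` is stored as epoch milliseconds, as the comparisons above imply.

```typescript
// Escape a value for use inside a single-quoted Gremlin string literal.
function gremlinString(value: string): string {
  const escaped = value.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return `'${escaped}'`;
}

// Epoch-millisecond cutoff for "the last N hours", computed on the client.
function cutoffMillis(hours: number, now: number = Date.now()): number {
  return now - hours * 60 * 60 * 1000;
}

// Hypothetical builder: count incidents triggered by one CI within a time window.
function recentIncidentsQuery(ciId: string, hours: number, now?: number): string {
  return [
    `g.V().has('CI', 'ciId', ${gremlinString(ciId)})`,
    `.out('TRIGGERS')`,
    `.has('createdAt', gte(${cutoffMillis(hours, now)}))`,
    `.count()`,
  ].join('');
}

console.log(recentIncidentsQuery('ci-prod-api-001', 24 * 30));
```

Keeping the escaping in a single helper means every query that interpolates user- or CMDB-supplied values goes through the same code path.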
Lambda Integration
StackFlow Lambdas connect to Neptune over HTTPS with AWS SigV4 request signing. Because the cluster has IAM authentication enabled, unsigned requests, including plain Gremlin WebSocket connections, are rejected.
import { SignatureV4 } from '@aws-sdk/signature-v4';
import { Sha256 } from '@aws-crypto/sha256-js';
import { defaultProvider } from '@aws-sdk/credential-provider-node';

const NEPTUNE_ENDPOINT = process.env.NEPTUNE_ENDPOINT!;
const REGION = 'us-east-1';

// Send a Gremlin query to Neptune over HTTPS, signed with SigV4.
export async function queryNeptune(gremlinQuery: string): Promise<unknown[]> {
  const url = `https://${NEPTUNE_ENDPOINT}:8182/gremlin`;
  const body = JSON.stringify({ gremlin: gremlinQuery });

  const signer = new SignatureV4({
    credentials: defaultProvider(),
    region: REGION,
    service: 'neptune-db',
    sha256: Sha256,
  });

  // The signed request must describe exactly what fetch sends (host, path, body).
  const request = {
    method: 'POST',
    hostname: NEPTUNE_ENDPOINT,
    port: 8182,
    path: '/gremlin',
    protocol: 'https:',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body).toString(),
      Host: `${NEPTUNE_ENDPOINT}:8182`,
    },
    body,
  };
  const signedRequest = await signer.sign(request);

  const response = await fetch(url, {
    method: 'POST',
    headers: signedRequest.headers as Record<string, string>,
    body,
    signal: AbortSignal.timeout(30000), // 30-second client-side timeout
  });
  if (!response.ok) {
    const err = await response.text();
    throw new Error(`Neptune query failed: ${response.status} -- ${err}`);
  }

  // The result list arrives wrapped as { '@type': ..., '@value': [...] }.
  const result = await response.json();
  return result?.result?.data?.['@value'] ?? [];
}
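`queryNeptune` returns the contents of the outer `@value` wrapper, but the items inside are still typed values (for example a count arrives as an object with `@type` and `@value` fields). The following is a minimal sketch of flattening them into plain JavaScript values, assuming the endpoint responds with GraphSON 3.0; `Json` and `unwrapGraphSON` are hypothetical names and only the common types are handled.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively convert GraphSON 3.0 typed values into plain JavaScript values.
function unwrapGraphSON(value: Json): unknown {
  if (Array.isArray(value)) return value.map(unwrapGraphSON);
  if (value === null || typeof value !== 'object') return value;

  const type = value['@type'];
  if (typeof type !== 'string' || !('@value' in value)) {
    // Plain object without type tags: unwrap each field.
    const plain: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) plain[k] = unwrapGraphSON(v);
    return plain;
  }

  const inner = value['@value'];
  switch (type) {
    case 'g:List':
    case 'g:Set':
      return (inner as Json[]).map(unwrapGraphSON);
    case 'g:Map': {
      // g:Map stores entries as a flat [key, value, key, value, ...] array.
      const flat = inner as Json[];
      const map: Record<string, unknown> = {};
      for (let i = 0; i + 1 < flat.length; i += 2) {
        map[String(unwrapGraphSON(flat[i]))] = unwrapGraphSON(flat[i + 1]);
      }
      return map;
    }
    default:
      // Scalars such as g:Int32, g:Int64, and g:Double carry the raw value.
      return unwrapGraphSON(inner);
  }
}

// Example shaped like one row of a project('name','incidentCount') result.
const sample: Json = [
  { '@type': 'g:Map', '@value': ['name', 'web-frontend', 'incidentCount', { '@type': 'g:Int64', '@value': 3 }] },
];
console.log(JSON.stringify(unwrapGraphSON(sample)));
// prints [{"name":"web-frontend","incidentCount":3}]
```

Note that `g:Int64` values larger than `Number.MAX_SAFE_INTEGER` would lose precision in this sketch.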
Seeder Reference
The StackFlowNeptuneCMDBSeeder Lambda populates the Neptune graph by reading CI records from Aurora PostgreSQL and upserting them as graph vertices and edges. It runs after each cloud discovery cycle and on a daily full-sync schedule.
| Property | Value |
|---|---|
| Function Name | StackFlowNeptuneCMDBSeeder |
| Runtime | nodejs22.x, arm64, 1024MB, 120s timeout |
| Trigger | EventBridge rule stackflow-neptune-sync-daily (03:00 UTC) |
| Env: NEPTUNE_ENDPOINT | stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com |
| Env: PG_HOST | stackflow-main-prod.cluster-c6pq0smgmlri.us-east-1.rds.amazonaws.com |
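The seeder's implementation is not shown here, but the upsert it performs can be illustrated with the common Gremlin `fold().coalesce(unfold(), addV(...))` idiom, which creates a vertex only when no match exists. The sketch below builds such a query as text; `CiRecord`, `quoteGremlin`, and `upsertCiQuery` are hypothetical names, and the property set is a subset of the CI properties listed in the schema. `property(single, ...)` is used so that a re-run overwrites a value rather than adding a second one.

```typescript
// Hypothetical shape of a CI row read from Aurora.
interface CiRecord {
  ciId: string;
  name: string;
  type: string;
  status: string;
}

// Escape a value for a single-quoted Gremlin string literal.
function quoteGremlin(value: string): string {
  return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
}

// Build an idempotent upsert for one CI vertex keyed on ciId.
function upsertCiQuery(ci: CiRecord): string {
  const id = quoteGremlin(ci.ciId);
  return (
    `g.V().has('CI', 'ciId', ${id})` +
    `.fold().coalesce(unfold(), addV('CI').property('ciId', ${id}))` +
    `.property(single, 'name', ${quoteGremlin(ci.name)})` +
    `.property(single, 'type', ${quoteGremlin(ci.type)})` +
    `.property(single, 'status', ${quoteGremlin(ci.status)})`
  );
}

console.log(
  upsertCiQuery({ ciId: 'ci-prod-api-001', name: 'prod-api', type: 'server', status: 'active' })
);
```

Because the query is keyed on `ciId`, running it again for the same record updates the existing vertex instead of creating a duplicate, which suits both the post-discovery and the daily full-sync runs.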
# Manually trigger the Neptune seeder
# (AWS CLI v2 needs --cli-binary-format raw-in-base64-out to accept a raw JSON payload)
aws lambda invoke \
  --function-name StackFlowNeptuneCMDBSeeder \
  --cli-binary-format raw-in-base64-out \
  --payload '{"mode": "full_sync"}' \
  --region us-east-1 \
  /tmp/seeder-output.json && cat /tmp/seeder-output.json
# Check seeder logs from the last 24 hours (date -d requires GNU date)
aws logs filter-log-events \
  --log-group-name /aws/lambda/StackFlowNeptuneCMDBSeeder \
  --start-time "$(date -d '24 hours ago' +%s000)" \
  --filter-pattern '"vertices_upserted"' \
  --query 'events[*].message' \
  --output text