Neptune Graph Layer

Neptune Cluster Details

The StackFlow knowledge graph runs on Amazon Neptune in us-east-1. The cluster is configured for serverless scaling, allowing it to handle both the steady-state queries from the agentic pipeline and burst loads during major incident discovery.

⚙️ Minimum Requirements
  • Neptune Cluster: stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com reachable on port 8182 from Lambda VPC
  • IAM Auth: Neptune IAM authentication enabled; StackFlowNeptuneRole attached to Lambda execution role
  • KMS: Neptune cluster encrypted with key 98d8c3a0-280f-4c1e-b1ff-0d4029120bdb
  • Graph Data: At least 100 CI vertices and 50 relationship edges for meaningful query results
  • Seeder Lambda: StackFlowNeptuneCMDBSeeder has run at least once to populate the graph from Aurora CI data

Property          Value
Cluster Endpoint  stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com
Reader Endpoint   stackflow-knowledge-graph.cluster-ro-c6pq0smgmlri.us-east-1.neptune.amazonaws.com
Port              8182
Engine Version    Neptune 1.4.7.0
Capacity          Serverless 1–8 NCU (auto-scaling)
Multi-AZ          Yes (us-east-1a, us-east-1b)
IAM Auth          Enabled (SigV4 signing required)
KMS Key           98d8c3a0-280f-4c1e-b1ff-0d4029120bdb
TLS               Required (HTTPS on port 8182)
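
The two endpoints in the table can be wrapped in a small helper so that read-only traversals go to the reader endpoint and mutations go to the cluster (writer) endpoint. This is a minimal sketch: `gremlinUrl` and `AccessMode` are hypothetical names, and routing reads to the reader endpoint is an assumption about how a caller might use the cluster, not a documented StackFlow behavior.

```typescript
// Endpoint values are copied from the cluster table above.
const NEPTUNE_WRITER =
  'stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com';
const NEPTUNE_READER =
  'stackflow-knowledge-graph.cluster-ro-c6pq0smgmlri.us-east-1.neptune.amazonaws.com';
const NEPTUNE_PORT = 8182;

type AccessMode = 'read' | 'write';

// Hypothetical helper: returns the HTTPS Gremlin URL for the given access mode.
function gremlinUrl(mode: AccessMode): string {
  const host = mode === 'read' ? NEPTUNE_READER : NEPTUNE_WRITER;
  return `https://${host}:${NEPTUNE_PORT}/gremlin`;
}
```

Requests to either URL still need SigV4 signing, since IAM authentication is enabled on the cluster.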

Graph Schema

The Neptune graph uses a property graph model with the following vertex and edge types:

Vertex Label  Key Properties                                   Description
CI            ciId, name, type, status, criticality, tenantId  Configuration Item (server, DB, network device)
Service       serviceId, name, tier, owner, slaTarget          Business service composed of multiple CIs
Incident      incidentId, priority, status, createdAt          Active or historical incident records
Change        changeId, type, status, scheduledAt              Change requests affecting CIs
Application   appId, name, version, deployedAt                 Application deployed on infrastructure
User          userId, name, role, teamId                       Engineers responsible for CIs

Edge Label  From → To           Description
DEPENDS_ON  CI → CI             Upstream/downstream service dependency
HOSTS       CI → Application    Server hosts an application
TRIGGERS    CI → Incident       CI state change triggered an incident
AFFECTS     Incident → Service  Incident impacts a business service
RUNS_ON     Service → CI        Service runs on infrastructure CIs
OWNS        User → CI           Engineer owns/is responsible for CI
PART_OF     CI → Service        CI is a component of a service
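
Edge direction matters when writing traversals (`out()` follows an edge from its source, `in()` from its target), so it can help to encode the edge table as data. The sketch below mirrors the labels and directions listed above; `EDGE_SCHEMA` and `isValidEdge` are hypothetical names introduced for illustration only.

```typescript
type VertexLabel = 'CI' | 'Service' | 'Incident' | 'Change' | 'Application' | 'User';
type EdgeLabel =
  | 'DEPENDS_ON' | 'HOSTS' | 'TRIGGERS' | 'AFFECTS' | 'RUNS_ON' | 'OWNS' | 'PART_OF';

// Source and target vertex label for each edge, as listed in the edge table.
const EDGE_SCHEMA: Record<EdgeLabel, { from: VertexLabel; to: VertexLabel }> = {
  DEPENDS_ON: { from: 'CI', to: 'CI' },
  HOSTS: { from: 'CI', to: 'Application' },
  TRIGGERS: { from: 'CI', to: 'Incident' },
  AFFECTS: { from: 'Incident', to: 'Service' },
  RUNS_ON: { from: 'Service', to: 'CI' },
  OWNS: { from: 'User', to: 'CI' },
  PART_OF: { from: 'CI', to: 'Service' },
};

// Returns true when an edge with this label may connect `from` to `to`.
function isValidEdge(label: EdgeLabel, from: VertexLabel, to: VertexLabel): boolean {
  const rule = EDGE_SCHEMA[label];
  return rule.from === from && rule.to === to;
}
```

For example, `isValidEdge('RUNS_ON', 'Service', 'CI')` is true, so a traversal starting at a Service reaches its CIs with `out('RUNS_ON')`, not `in('RUNS_ON')`.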

Gremlin Query Examples

// Find all CIs directly depended on by a specific CI (blast radius level 1)
// valueMap keeps each CI's properties grouped; values() would flatten them into one stream
g.V().has('CI', 'ciId', 'ci-prod-api-001')
  .out('DEPENDS_ON')
  .valueMap('name', 'type', 'status')

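// Find the blast radius for an incident (up to 3 dependency hops)
// Path: Incident -AFFECTS-> Service -RUNS_ON-> CI, then DEPENDS_ON between CIs.
// emit() returns CIs at every hop; without it only CIs exactly 3 hops away are returned.
g.V().has('Incident', 'incidentId', 'INC0012345')
  .out('AFFECTS')
  .out('RUNS_ON')
  .emit().repeat(out('DEPENDS_ON')).times(3)
  .dedup()
  .project('id','name','type','criticality')
    .by('ciId').by('name').by('type').by('criticality')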

// Find services with most upstream dependencies (risk scoring)
// DEPENDS_ON connects CIs, so dependencies are counted through each service's RUNS_ON CIs
g.V().hasLabel('Service')
  .project('service','dep_count')
    .by('name')
    .by(out('RUNS_ON').out('DEPENDS_ON').dedup().count())
  .order().by(select('dep_count'), desc)
  .limit(10)

// Shortest path between two CIs (impact chain)
g.V().has('CI','name','web-frontend')
  .repeat(out().simplePath())
  .until(has('name','aurora-main-prod'))
  .path()
  .limit(1)

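// Find changes scheduled in the last 24 hours that touch CIs behind a service
// Assumes Change vertices link to CIs through AFFECTS edges, which the edge table does not list.
// CUTOFF_MS is a placeholder: inline the epoch-millisecond cutoff (now minus 24 h) computed by
// the caller, since Neptune does not evaluate Groovy such as System.currentTimeMillis().
g.V().has('Service','name','StackFlowAPI')
  .out('RUNS_ON')
  .in('AFFECTS').hasLabel('Change')
  .has('scheduledAt', gte(CUTOFF_MS))
  .dedup()
  .project('changeId','type','status','scheduledAt')
    .by('changeId').by('type').by('status').by('scheduledAt')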

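// Get the owners of all CIs in a service
// CIs relate to a service through RUNS_ON (Service → CI) or PART_OF (CI → Service)
g.V().has('Service','name','StackFlowAPI')
  .union(out('RUNS_ON'), __.in('PART_OF'))
  .dedup()
  .in('OWNS').hasLabel('User')
  .dedup()
  .valueMap('name','role','teamId')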

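// Count incidents per CI in last 30 days
// TRIGGERS edges run CI → Incident, so incidents are reached with out().
// CUTOFF_MS is a placeholder: inline the epoch-millisecond cutoff (now minus 30 days) computed
// by the caller, since Neptune does not evaluate Groovy such as System.currentTimeMillis().
g.V().hasLabel('CI')
  .project('name','incidentCount')
    .by('name')
    .by(out('TRIGGERS').has('createdAt', gte(CUTOFF_MS)).count())
  .order().by(select('incidentCount'), desc)
  .limit(20)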

// Find all CIs in DEGRADED status and their dependent services
g.V().has('CI','status','degraded')
  .project('ci','affectedServices')
    .by('name')
    .by(out('PART_OF').hasLabel('Service').values('name').fold())
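
The time-window queries above need an epoch-millisecond cutoff. Neptune's Gremlin support is documented as excluding Groovy system calls and arithmetic in query strings, so one option is to compute the cutoff in the caller and inline it as a literal. The sketch below does that for the incident-count query; `cutoffMillis` and `recentIncidentCountQuery` are hypothetical helper names, and the traversal follows the TRIGGERS direction (CI → Incident) from the schema table.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Epoch-millisecond timestamp `windowHours` before `nowMs`.
function cutoffMillis(windowHours: number, nowMs: number = Date.now()): number {
  return nowMs - windowHours * HOUR_MS;
}

// Builds the incident-count query with the cutoff inlined as a numeric literal.
function recentIncidentCountQuery(windowHours: number, nowMs: number = Date.now()): string {
  const cutoff = cutoffMillis(windowHours, nowMs);
  return [
    "g.V().hasLabel('CI')",
    ".project('name','incidentCount')",
    ".by('name')",
    `.by(out('TRIGGERS').has('createdAt', gte(${cutoff})).count())`,
    ".order().by(select('incidentCount'), desc)",
    '.limit(20)',
  ].join('');
}
```

Only a number is interpolated here, so no string escaping is needed. Values that come from user input should not be concatenated into a query without escaping.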

Lambda Integration

StackFlow Lambdas connect to Neptune over HTTPS and sign every request with AWS SigV4. Because the cluster requires IAM authentication, unsigned requests, including plain Gremlin WebSocket connections, are rejected.

import { SignatureV4 } from '@aws-sdk/signature-v4';
import { Sha256 } from '@aws-crypto/sha256-js';
import { defaultProvider } from '@aws-sdk/credential-provider-node';

const NEPTUNE_ENDPOINT = process.env.NEPTUNE_ENDPOINT!;
const REGION = 'us-east-1';

// Sends a SigV4-signed Gremlin query over HTTPS and returns the GraphSON result list.
export async function queryNeptune(gremlinQuery: string): Promise<unknown[]> {
  const url = `https://${NEPTUNE_ENDPOINT}:8182/gremlin`;
  const body = JSON.stringify({ gremlin: gremlinQuery });

  const signer = new SignatureV4({
    credentials: defaultProvider(),
    region: REGION,
    service: 'neptune-db',
    sha256: Sha256,
  });

  const request = {
    method: 'POST',
    hostname: NEPTUNE_ENDPOINT,
    port: 8182,
    path: '/gremlin',
    protocol: 'https:',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body).toString(),
      Host: `${NEPTUNE_ENDPOINT}:8182`,
    },
    body,
  };

  const signedRequest = await signer.sign(request);

  const response = await fetch(url, {
    method: 'POST',
    headers: signedRequest.headers as Record<string, string>,
    body,
    signal: AbortSignal.timeout(30000),
  });

  if (!response.ok) {
    const err = await response.text();
    throw new Error(`Neptune query failed: ${response.status} -- ${err}`);
  }

  const result = await response.json();
  return result?.result?.data?.['@value'] ?? [];
}
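
The `@value` lookup at the end of the function suggests the endpoint returns GraphSON 3, where numbers, maps, and lists are wrapped as `{ "@type": ..., "@value": ... }` objects. A caller may want plain JavaScript values instead. The following is a minimal sketch of such an unwrapping step; `unwrapGraphSON` is a hypothetical helper, it handles only the generic wrapper shape plus `g:Map`, and it does not protect 64-bit integers from precision loss during JSON parsing.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively strips GraphSON 3 type wrappers from a parsed JSON value.
function unwrapGraphSON(node: Json): unknown {
  if (Array.isArray(node)) return node.map(unwrapGraphSON);
  if (node === null || typeof node !== 'object') return node;

  const type = node['@type'];
  const value = node['@value'];

  // Not a typed wrapper: unwrap each property of the plain object.
  if (typeof type !== 'string' || value === undefined) {
    const plain: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(node)) plain[key] = unwrapGraphSON(child);
    return plain;
  }

  // g:Map stores entries as a flat array: [key1, value1, key2, value2, ...].
  if (type === 'g:Map' && Array.isArray(value)) {
    const map: Record<string, unknown> = {};
    for (let i = 0; i + 1 < value.length; i += 2) {
      map[String(unwrapGraphSON(value[i]))] = unwrapGraphSON(value[i + 1]);
    }
    return map;
  }

  // Other wrappers (g:Int64, g:List, g:Vertex, ...): unwrap the inner value.
  return unwrapGraphSON(value);
}
```

Applied to the rows of a `project('name','incidentCount')` query, each `g:Map` row becomes an object such as `{ name: ..., incidentCount: ... }`.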

Seeder Reference

The StackFlowNeptuneCMDBSeeder Lambda populates the Neptune graph by reading CI records from Aurora PostgreSQL and upserting them as graph vertices and edges. It runs after each cloud discovery cycle and on a daily full-sync schedule.
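
The seeder's source is not shown here, so the following is only a sketch of one common way to upsert a vertex in Gremlin: the `fold().coalesce(unfold(), addV(...))` pattern, which creates the vertex only when no match exists, followed by `property(single, ...)` to overwrite values. The names `gremlinString` and `upsertCiQuery` and the choice of properties are assumptions for illustration, not the seeder's actual code.

```typescript
// Quotes a value as a Gremlin string literal, escaping backslashes and single quotes.
function gremlinString(value: string): string {
  return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
}

// Builds a get-or-create query for a CI vertex keyed on ciId (hypothetical helper).
function upsertCiQuery(ci: { ciId: string; name: string; status: string }): string {
  const id = gremlinString(ci.ciId);
  return (
    `g.V().has('CI','ciId',${id})` +
    // Reuse the existing vertex if found, otherwise add a new one.
    `.fold().coalesce(unfold(), addV('CI').property('ciId',${id}))` +
    // single cardinality replaces the old value instead of adding another one.
    `.property(single,'name',${gremlinString(ci.name)})` +
    `.property(single,'status',${gremlinString(ci.status)})`
  );
}
```

Running the same query twice leaves one vertex with the latest property values, which is what makes a daily full sync safe to repeat.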

Property               Value
Function Name          StackFlowNeptuneCMDBSeeder
Runtime                nodejs22.x, arm64, 1024MB, 120s timeout
Trigger                EventBridge rule stackflow-neptune-sync-daily (03:00 UTC)
Env: NEPTUNE_ENDPOINT  stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com
Env: PG_HOST           stackflow-main-prod.cluster-c6pq0smgmlri.us-east-1.rds.amazonaws.com

# Manually trigger the Neptune seeder
# (--cli-binary-format lets AWS CLI v2 accept the raw JSON payload)
aws lambda invoke \
  --function-name StackFlowNeptuneCMDBSeeder \
  --cli-binary-format raw-in-base64-out \
  --payload '{"mode": "full_sync"}' \
  --region us-east-1 \
  /tmp/seeder-output.json && cat /tmp/seeder-output.json

# Check last seeder run logs (date -d requires GNU date)
aws logs filter-log-events \
  --log-group-name /aws/lambda/StackFlowNeptuneCMDBSeeder \
  --start-time "$(date -d '24 hours ago' +%s000)" \
  --filter-pattern '"vertices_upserted"' \
  --region us-east-1 \
  --query 'events[*].message' --output text