AWS Infrastructure

Infrastructure Summary

StackFlow's complete AWS infrastructure is deployed in us-east-1 under account 373544523367. All components run within a private VPC with no public IP exposure except through CloudFront and API Gateway endpoints. Every data store is encrypted with CMK mrk-bd842691514c4d74a02992b8dc11fe16.

⚙️ Minimum Requirements
  • Lambda: StackFlowAPI (nodejs22.x, arm64, 1792MB, 300s) in VPC vpc-0c4e3c18734dee8f7
  • Aurora PostgreSQL 16: stackflow-main-prod Multi-AZ cluster with IAM auth and encryption enabled
  • Neptune 1.4.7: stackflow-knowledge-graph cluster with IAM auth, Serverless 1-8 NCU
  • ElastiCache: stackflow-redis-prod (cache.t4g.micro) with TLS, auth token, Multi-AZ
  • KMS CMK: mrk-bd842691514c4d74a02992b8dc11fe16 enabled, with a key policy that allows all StackFlow roles

Resource       | Identifier               | Purpose
VPC            | vpc-0c4e3c18734dee8f7    | Network isolation for all StackFlow resources
Subnet 1a      | subnet-05eae5f255dec054f | us-east-1a private subnet
Subnet 1b      | subnet-03ab773ce82d704d1 | us-east-1b private subnet
Security Group | sg-0ada825cda6a75ed6     | StackFlow application tier SG
API Gateway    | uazcuhdus2               | REST API entry point
CloudFront     | E1UTZ9SVSR2WGV           | Docs site CDN
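
The StackFlowAPI entry in the minimum requirements above can be checked mechanically. The following is a minimal sketch, not the project's own tooling: it assumes the input dict has the shape of a Lambda GetFunctionConfiguration response, and the names EXPECTED_STACKFLOW_API and check_lambda_config are hypothetical.

```python
# Expected values copied from the Minimum Requirements list above.
# The constant and function names are illustrative, not part of StackFlow.
EXPECTED_STACKFLOW_API = {
    "Runtime": "nodejs22.x",
    "Architectures": ["arm64"],
    "MemorySize": 1792,  # MB
    "Timeout": 300,      # seconds
    "VpcId": "vpc-0c4e3c18734dee8f7",
}


def check_lambda_config(config, expected=EXPECTED_STACKFLOW_API):
    """Return a list of mismatch descriptions; an empty list means the config matches.

    `config` is assumed to look like a Lambda GetFunctionConfiguration response.
    """
    actual = {
        "Runtime": config.get("Runtime"),
        "Architectures": config.get("Architectures"),
        "MemorySize": config.get("MemorySize"),
        "Timeout": config.get("Timeout"),
        # The VPC ID is nested under VpcConfig in the response.
        "VpcId": config.get("VpcConfig", {}).get("VpcId"),
    }
    return [
        f"{key}: expected {want!r}, got {actual[key]!r}"
        for key, want in expected.items()
        if actual[key] != want
    ]
```

One way to use it is to pass in the parsed JSON output of `aws lambda get-function-configuration --function-name StackFlowAPI` and treat any non-empty result as a failed requirement.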

Lambda Functions

Function Name                  | Runtime           | Memory  | Purpose
StackFlowAPI                   | nodejs22.x, arm64 | 1792 MB | Main API handler (300s timeout)
StackFlowCacheWarmer           | python3.12, arm64 | 512 MB  | Redis + Bedrock cache pre-warming
StackFlowNeptuneCMDBSeeder     | nodejs22.x        | 512 MB  | Sync CMDB to Neptune graph
StackFlowGitHubSync            | nodejs22.x        | 512 MB  | GitHub webhook handler
StackFlowSecretsRotation       | python3.12        | 256 MB  | Rotate StackFlow-specific secrets
StackFlowGenericSecretRotation | python3.12        | 256 MB  | Rotate external API keys
StackFlowFieldKeyRotator       | python3.12        | 512 MB  | Rotate field-level encryption keys
StackFlowPatcher               | python3.12        | 512 MB  | Apply schema and data patches

Database Layer

Aurora PostgreSQL 16 (Main)
  Endpoint: stackflow-main-prod.cluster-c6pq0smgmlri.us-east-1.rds.amazonaws.com
  Database: stackflow | Port: 5432

Aurora PostgreSQL 17 (Requirements)
  Endpoint: stackflow-req-prod.cluster-c6pq0smgmlri.us-east-1.rds.amazonaws.com
  Port: 5432

Neptune (Knowledge Graph)
  Endpoint: stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com
  Port: 8182 (WebSocket/Gremlin)

ElastiCache Redis (Cache)
  Endpoint: master.stackflow-redis-prod.mnzfvx.use1.cache.amazonaws.com
  Port: 6379 | TLS: Yes | Auth: Token
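
Before debugging credentials or TLS, it can help to confirm that each endpoint above is reachable at the TCP level from inside the VPC. This is a rough sketch using only the standard library. The hosts and ports come from the listing above, while the dictionary keys and function names are made up for illustration. A successful TCP connect says nothing about authentication or TLS.

```python
import socket

# Hosts and ports copied from the Database Layer listing; the keys are illustrative labels.
ENDPOINTS = {
    "aurora-main": ("stackflow-main-prod.cluster-c6pq0smgmlri.us-east-1.rds.amazonaws.com", 5432),
    "aurora-req": ("stackflow-req-prod.cluster-c6pq0smgmlri.us-east-1.rds.amazonaws.com", 5432),
    "neptune": ("stackflow-knowledge-graph.cluster-c6pq0smgmlri.us-east-1.neptune.amazonaws.com", 8182),
    "redis": ("master.stackflow-redis-prod.mnzfvx.use1.cache.amazonaws.com", 6379),
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_all(endpoints=ENDPOINTS, timeout=3.0):
    """Map each endpoint label to True/False reachability."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Running check_all() from a Lambda function or a host in the same subnets and security group gives a quick picture of which data stores are reachable; run anywhere else, it is expected to report False for every entry.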

Networking

All StackFlow traffic flows through CloudFront → API Gateway → Lambda. The Lambda functions use VPC configuration to reach Aurora, Neptune, and Redis in the private subnets. No data leaves the VPC except through the NAT Gateway (for external API calls) and VPC Interface Endpoints (for AWS service API calls without internet exposure).

VPC Endpoints: StackFlow uses VPC Interface Endpoints for Bedrock, Secrets Manager, KMS, S3, SQS, and SNS to ensure all AWS API calls stay within the AWS network and do not traverse the public internet.
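
The endpoint list above can be compared against what is actually deployed. The sketch below assumes a dict shaped like the EC2 DescribeVpcEndpoints response and the usual com.amazonaws.<region>.<service> naming. The choice of "bedrock-runtime" is an assumption, since Bedrock exposes several endpoint services and the text does not say which ones StackFlow uses. All function and constant names are hypothetical.

```python
REGION = "us-east-1"
VPC_ID = "vpc-0c4e3c18734dee8f7"

# Service suffixes for the endpoints named in the text.
# "bedrock-runtime" is an assumption; other Bedrock endpoint services may apply.
SERVICES = ["bedrock-runtime", "secretsmanager", "kms", "s3", "sqs", "sns"]


def expected_service_names(region=REGION, services=SERVICES):
    """Build the full endpoint service names, e.g. com.amazonaws.us-east-1.kms."""
    return {f"com.amazonaws.{region}.{service}" for service in services}


def missing_interface_endpoints(response, vpc_id=VPC_ID):
    """Return the sorted expected service names with no Interface endpoint in the VPC.

    `response` is assumed to look like an EC2 DescribeVpcEndpoints response.
    """
    present = {
        endpoint["ServiceName"]
        for endpoint in response.get("VpcEndpoints", [])
        if endpoint.get("VpcId") == vpc_id
        and endpoint.get("VpcEndpointType") == "Interface"
    }
    return sorted(expected_service_names() - present)
```

A Gateway-type S3 endpoint is deliberately reported as missing here, because the text specifies Interface Endpoints for every listed service.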

Queue Architecture

Four SQS FIFO queues handle different priority tiers of event processing:
  • StackFlow-Events-Ingestion.fifo: general ITSM events (email-to-ticket, webhook ingestion)
  • StackFlow-Remediation-P1.fifo: critical auto-remediation actions
  • StackFlow-Remediation-Standard.fifo: non-critical remediation
  • StackFlow-Remediation-DLQ.fifo: failed messages requiring manual review
All queues use server-side encryption with the CMK.
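
To illustrate how a producer might target these queues, here is a sketch that builds the arguments for an SQS SendMessage call. FIFO queues require a MessageGroupId, and a MessageDeduplicationId unless content-based deduplication is enabled on the queue. The routing rule, the event fields (type, priority, resource_id), and the function names are all assumptions made for the example; the source does not describe how StackFlow routes events.

```python
import hashlib
import json

ACCOUNT_ID = "373544523367"
REGION = "us-east-1"

# Queue names from the text. The DLQ is omitted because failed messages are
# normally moved there by a redrive policy rather than sent directly.
QUEUES = {
    "ingestion": "StackFlow-Events-Ingestion.fifo",
    "remediation_p1": "StackFlow-Remediation-P1.fifo",
    "remediation_standard": "StackFlow-Remediation-Standard.fifo",
}


def queue_url(queue_name):
    """Build the standard SQS queue URL for this account and region."""
    return f"https://sqs.{REGION}.amazonaws.com/{ACCOUNT_ID}/{queue_name}"


def pick_queue(event):
    """Hypothetical routing rule: remediation events split on priority."""
    if event.get("type") == "remediation":
        if event.get("priority") == "P1":
            return QUEUES["remediation_p1"]
        return QUEUES["remediation_standard"]
    return QUEUES["ingestion"]


def build_send_args(event):
    """Return keyword arguments suitable for an SQS send_message call."""
    body = json.dumps(event, sort_keys=True)
    return {
        "QueueUrl": queue_url(pick_queue(event)),
        "MessageBody": body,
        # Messages sharing a group ID are delivered in order.
        "MessageGroupId": str(event.get("resource_id", "default")),
        # A content hash makes retries of the same event idempotent.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
```

With boto3 available, the result could be passed as `sqs.send_message(**build_send_args(event))`. Grouping by resource ID is one possible design: it keeps actions against the same resource ordered while letting unrelated resources proceed in parallel.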

Diagnostic Scripts

#!/usr/bin/env python3
"""Test Aurora PostgreSQL connectivity from within the VPC."""
import json

import boto3
import psycopg2

AURORA_HOST = 'stackflow-main-prod.cluster-c6pq0smgmlri.us-east-1.rds.amazonaws.com'


def get_secret(secret_id):
    """Fetch a JSON secret from Secrets Manager and return it as a dict."""
    sm = boto3.client('secretsmanager', region_name='us-east-1')
    return json.loads(sm.get_secret_value(SecretId=secret_id)['SecretString'])


creds = get_secret('stackflow/aurora-db-credentials')
conn = None
try:
    conn = psycopg2.connect(
        host=AURORA_HOST,
        dbname='stackflow', user=creds['username'], password=creds['password'],
        port=5432, connect_timeout=10
    )
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        print("Aurora version:", cur.fetchone()[0])
        # Group sessions by state (active, idle, ...) to spot connection exhaustion
        cur.execute("SELECT count(*), state FROM pg_stat_activity GROUP BY state;")
        print("\nConnections by state:")
        for count, state in cur.fetchall():
            print(f"  {state}: {count} connections")
    print("\nAurora connectivity: OK")
except Exception as e:
    print(f"Aurora connectivity FAILED: {e}")
finally:
    # Close the connection even if a query fails
    if conn is not None:
        conn.close()

# Invoke StackFlowAPI Lambda with a test health check payload.
# With --cli-binary-format raw-in-base64-out (AWS CLI v2), the payload is passed as raw JSON, not base64.
aws lambda invoke \
  --function-name StackFlowAPI \
  --payload '{"path":"/prod/api/health","httpMethod":"GET","headers":{"Authorization":"Bearer test"}}' \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  /tmp/lambda-response.json
python3 -m json.tool /tmp/lambda-response.json

# Check Lambda configuration
aws lambda get-function-configuration \
  --function-name StackFlowAPI \
  --query '{Runtime:Runtime,MemorySize:MemorySize,Timeout:Timeout,Arch:Architectures}' \
  --region us-east-1

# CloudWatch Logs Insights: API errors by path (set the time range to the last hour)
# Run in the CloudWatch Logs Insights console against /aws/lambda/StackFlowAPI

fields @timestamp, @message
| filter @message like /ERROR|statusCode.*[45][0-9][0-9]/
| parse @message '"path":"*"' as path
| parse @message '"statusCode":*,' as statusCode
| stats count() as errorCount by path, statusCode
| sort errorCount desc
| limit 20

# Lambda cold starts (last 24 hours)
fields @timestamp, @initDuration
| filter @type = "REPORT" and ispresent(@initDuration)
| stats avg(@initDuration) as avgColdStart, max(@initDuration) as maxColdStart, count() as coldStartCount
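
The same kind of query can also be run outside the console. The sketch below only builds the arguments for the CloudWatch Logs StartQuery API, which takes its time range as epoch seconds; the helper name build_start_query_args is hypothetical, and the boto3 calls are shown as comments because they need credentials and network access.

```python
from datetime import datetime, timedelta, timezone

LOG_GROUP = "/aws/lambda/StackFlowAPI"

# A cold-start query in the style of the one above.
COLD_START_QUERY = """
fields @timestamp, @initDuration
| filter @type = "REPORT" and ispresent(@initDuration)
| stats avg(@initDuration) as avgColdStart, max(@initDuration) as maxColdStart, count() as coldStartCount
""".strip()


def build_start_query_args(query, hours, now=None):
    """Build keyword arguments for a Logs Insights StartQuery call.

    The window covers the `hours` before `now` (defaults to the current UTC time).
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "logGroupName": LOG_GROUP,
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(end.timestamp()),
        "queryString": query,
    }


# Usage sketch (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   logs = boto3.client("logs", region_name="us-east-1")
#   query_id = logs.start_query(**build_start_query_args(COLD_START_QUERY, hours=24))["queryId"]
#   results = logs.get_query_results(queryId=query_id)  # poll until results["status"] == "Complete"
```

Queries run asynchronously, so a caller has to poll get_query_results until the status leaves Scheduled or Running.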