API & Lambda Errors
Error Classification
StackFlow API errors fall into client errors (4xx) and server errors (5xx). Client errors indicate issues with the request itself (bad parameters, missing auth, not found). Server errors indicate internal platform issues and require investigation. All 5xx errors generate a structured error response with a request_id that can be used to find the corresponding CloudWatch log entry.
- CloudWatch Logs: `/aws/lambda/StackFlowAPI` log group; set retention to 30 days minimum
- X-Ray: AWS X-Ray tracing enabled on `StackFlowAPI` Lambda and API Gateway
- CloudWatch Alarms: `StackFlowAPI-Errors` and `StackFlowAPI-Throttles` alarms active with SNS notification
- Lambda Concurrency: Reserved concurrency of at least 50 on `StackFlowAPI` to prevent throttling spikes
```json
{
  "error": {
    "code": "INTERNAL_SERVER_ERROR",
    "message": "An unexpected error occurred",
    "request_id": "7f3b2c1d-8e4a-4f2b-9c1d-2e3f4a5b6c7d",
    "timestamp": "2026-05-18T14:23:11Z"
  }
}
```
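
As an illustration of the classification described above, the following minimal sketch sorts a response into client or server error and extracts the `request_id` for a CloudWatch search. The function name `describe_error` and the shape of its return value are assumptions made for this example; only the `error.code` and `error.request_id` fields come from the response format shown above.

```python
import json


def describe_error(status_code: int, body: str) -> dict:
    """Classify a response by status code and extract the request_id, if present.

    Hypothetical helper for illustration; not part of the StackFlow API.
    """
    if 400 <= status_code < 500:
        kind = "client"
    elif 500 <= status_code < 600:
        kind = "server"
    else:
        kind = "none"

    error = {}
    try:
        parsed = json.loads(body)
        # Only trust the body if it has the documented {"error": {...}} shape.
        if isinstance(parsed, dict) and isinstance(parsed.get("error"), dict):
            error = parsed["error"]
    except json.JSONDecodeError:
        # Non-JSON bodies (for example an upstream HTML error page) carry no request_id.
        pass

    return {
        "kind": kind,
        "code": error.get("code"),
        "request_id": error.get("request_id"),
    }
```

When `kind` is `"server"` and `request_id` is set, that value is the search term to use against the `/aws/lambda/StackFlowAPI` log group.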
500 Internal Server Errors
| Symptom | Likely Cause | Diagnostic Step | Resolution |
|---|---|---|---|
| Consistent 500 on specific endpoint | Code bug or unhandled exception | Search CloudWatch logs by `request_id` | Check Lambda logs for stack trace, deploy fix |
| 500 with "Connection timeout" in logs | Aurora `max_connections` reached | Run `SELECT count(*), state FROM pg_stat_activity GROUP BY state;` on Aurora | Reduce Lambda concurrency or increase `max_connections` |
| 500 with "Secret not found" in logs | Secrets Manager secret deleted or rotated incorrectly | Check secret ARN exists in Secrets Manager | Restore secret from backup or re-create with correct ARN |
| 500 with KMS `AccessDeniedException` | Lambda execution role missing KMS permission | Check IAM role policy for `kms:Decrypt` | Add `kms:Decrypt` permission for CMK to Lambda role |
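
The table above can be read as a lookup from log signature to likely cause. The sketch below encodes that lookup so it can be applied to a log message pulled from CloudWatch. The substrings and causes are taken from the table; the function name `likely_cause` and the case-insensitive substring matching are assumptions for illustration, and real log lines may be worded differently.

```python
# Each entry: (substrings that must all appear in the log message, likely cause).
# Signatures and causes mirror the 500 troubleshooting table; matching is a simplification.
SIGNATURES = [
    (("connection timeout",), "Aurora max_connections reached"),
    (("secret not found",), "Secrets Manager secret deleted or rotated incorrectly"),
    (("kms", "accessdeniedexception"), "Lambda execution role missing KMS permission"),
]

# Fallback row of the table: no known signature, so look for a stack trace.
DEFAULT_CAUSE = "Code bug or unhandled exception"


def likely_cause(log_message: str) -> str:
    """Return the likely cause for a 500 based on a single log message."""
    lowered = log_message.lower()
    for needles, cause in SIGNATURES:
        if all(needle in lowered for needle in needles):
            return cause
    return DEFAULT_CAUSE
```

A match only narrows the search; the diagnostic step in the corresponding table row still needs to be run to confirm the cause.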
503 Service Unavailable
| Symptom | Likely Cause | Diagnostic Step | Resolution |
|---|---|---|---|
| 503 from API Gateway | Lambda throttled (concurrent execution limit hit) | Check Lambda Throttles metric in CloudWatch | Request concurrency limit increase from AWS Support |
| 503 from CloudFront | API Gateway 5XX error rate high | Check CloudFront distribution error rate | Investigate underlying API Gateway/Lambda errors |
| 503 with "Circuit breaker open" | Too many consecutive errors tripped the breaker | Check StackFlow circuit breaker state in Redis | Wait for reset interval or manually reset via admin API |
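
To make the circuit breaker row concrete, here is a minimal in-memory sketch of the general pattern: consecutive failures open the breaker, and it closes again after a reset interval or a manual reset. This is not StackFlow's implementation, which keeps its state in Redis. The class name, the threshold of 5, and the 30-second interval are all hypothetical values chosen for the example, and the sketch omits the half-open state that many breakers use.

```python
import time


class CircuitBreaker:
    """Illustrative consecutive-failure circuit breaker (in memory, no half-open state)."""

    def __init__(self, failure_threshold=5, reset_interval=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_interval = reset_interval        # seconds; hypothetical default
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow_request(self) -> bool:
        """Return True if a request may proceed; callers would answer 503 otherwise."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.reset_interval:
            # Reset interval elapsed: close the breaker and let traffic through.
            self.reset()
            return True
        return False

    def record_success(self) -> None:
        # Any success breaks the run of consecutive failures.
        self._consecutive_failures = 0

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._opened_at is None and self._consecutive_failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def reset(self) -> None:
        """Manual reset, analogous to resetting via an admin API."""
        self._consecutive_failures = 0
        self._opened_at = None
```

The injectable `clock` argument exists only so the behavior can be exercised in tests without sleeping.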
Cold Start Issues
Lambda cold starts add 1-4 seconds of latency to the first request after a Lambda instance is created. StackFlow mitigates cold starts via the StackFlowCacheWarmer Lambda which pings the API every 4 minutes to keep instances warm. However, traffic spikes can cause new instances to spin up with cold starts.
```shell
# Percentile statistics must be requested with --extended-statistics, not --statistics
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name InitDuration \
  --dimensions Name=FunctionName,Value=StackFlowAPI \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --extended-statistics p99 \
  --region us-east-1
```
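
If the metric query returns no datapoints, cold-start latency can also be estimated from the function's logs: Lambda appends an `Init Duration` field to the `REPORT` line of invocations that included a cold start. The sketch below parses such lines and computes a nearest-rank percentile. It assumes the log messages have already been fetched from `/aws/lambda/StackFlowAPI`; the function names are hypothetical, and the regular expression reflects the usual `REPORT` line layout rather than a guaranteed format.

```python
import math
import re

# Matches the cold-start field in a Lambda REPORT line, e.g. "Init Duration: 1850.40 ms".
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def init_durations_ms(log_lines):
    """Collect Init Duration values (ms) from REPORT lines; warm invocations have none."""
    durations = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue
        match = INIT_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Comparing the number of `REPORT` lines that carry `Init Duration` with the total number of `REPORT` lines gives a rough cold-start rate for the sampled window.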
Timeout Errors
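The `StackFlowAPI` Lambda has a 300-second timeout. Requests that exceed it surface as a 504 Gateway Timeout from API Gateway. API Gateway also applies its own integration timeout (29 seconds by default for REST APIs), so a client can receive a 504 well before the Lambda timeout is reached unless that limit has been raised. Common causes of timeouts include slow Bedrock API responses, Neptune graph traversals on very large graphs (10k+ nodes without proper indexing), and Aurora queries missing indexes on commonly filtered columns.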
```shell
aws logs filter-log-events \
  --log-group-name /aws/lambda/StackFlowAPI \
  --filter-pattern "Task timed out" \
  --start-time $(date -d '1 hour ago' +%s000) \
  --region us-east-1
```
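
Once the matching events are retrieved, the request IDs can be extracted and correlated with client-side errors or X-Ray traces. The sketch below assumes the common Lambda timeout message layout (`<timestamp> <request id> Task timed out after <n> seconds`) and that the `message` fields of the returned events have been collected into a list; the function name `parse_timeouts` is hypothetical.

```python
import re

# Request ID (UUID form) followed by Lambda's timeout message and the elapsed seconds.
TIMEOUT_RE = re.compile(
    r"(?P<request_id>[0-9a-f-]{36}) Task timed out after (?P<seconds>[\d.]+) seconds"
)


def parse_timeouts(messages):
    """Return (request_id, seconds) tuples for every timeout message found."""
    results = []
    for message in messages:
        match = TIMEOUT_RE.search(message)
        if match:
            results.append((match.group("request_id"), float(match.group("seconds"))))
    return results
```

Grouping the resulting request IDs by endpoint (from the surrounding log lines) helps distinguish a single slow dependency from a platform-wide slowdown.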