CMDB & Discovery

Discovery Failures

CMDB discovery failures leave CI data inaccurate and can degrade impact analysis, change risk scoring, and AI triage accuracy. Most discovery failures are caused by network connectivity issues, credential problems, or cloud API rate limiting. For error details, check the Discovery Logs in CMDB → Discovery → Discovery Logs.

⚙️ Minimum Requirements

CloudWatch Logs: /aws/lambda/StackFlowCloudDiscovery and /aws/lambda/StackFlowNeptuneCMDBSeeder log groups
DynamoDB: StackFlow_DiscoveryJob table accessible to check last discovery run status and errors
Neptune: Graph query endpoint accessible from diagnostic tooling within VPC
IAM: StackFlowDiscoveryRole cross-account trust active in all connected accounts
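
The first two requirements can be checked from a terminal before starting a diagnosis. The following is a minimal sketch, assuming the AWS CLI is installed and the current credentials can read CloudWatch Logs and DynamoDB in the deployment region. The function names (log_group_exists, table_active, preflight) are illustrative and not part of the product; the Neptune and IAM requirements are not covered here.

```shell
# Preflight sketch for the log group and DynamoDB requirements.
# Function names are hypothetical helpers, not product tooling.

# Succeeds only if a log group with exactly this name exists.
log_group_exists() {
  local name="$1" found
  found=$(aws logs describe-log-groups \
    --log-group-name-prefix "$name" \
    --query "logGroups[?logGroupName=='${name}'].logGroupName" \
    --output text) || return 1
  [ "$found" = "$name" ]
}

# Succeeds only if the table exists and its status is ACTIVE.
table_active() {
  local status
  status=$(aws dynamodb describe-table \
    --table-name "$1" \
    --query 'Table.TableStatus' \
    --output text 2>/dev/null) || return 1
  [ "$status" = "ACTIVE" ]
}

# Prints one OK/FAIL line per requirement; returns non-zero if any failed.
preflight() {
  local rc=0 lg
  for lg in /aws/lambda/StackFlowCloudDiscovery /aws/lambda/StackFlowNeptuneCMDBSeeder; do
    if log_group_exists "$lg"; then echo "OK   $lg"; else echo "FAIL $lg"; rc=1; fi
  done
  if table_active StackFlow_DiscoveryJob; then
    echo "OK   StackFlow_DiscoveryJob"
  else
    echo "FAIL StackFlow_DiscoveryJob"
    rc=1
  fi
  return "$rc"
}
```

Run `preflight` after sourcing the functions; any FAIL line points at a missing resource or missing read permission for the current credentials.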

Agent Issues

Symptom	Likely Cause	Diagnostic Step	Resolution
Agent last heartbeat > 30 min	Agent service stopped or network issue	`systemctl status stackflow-agent` on target host	Restart agent service; check firewall rules for HTTPS outbound
Agent showing wrong hostname	Hostname changed after agent install	Check `/etc/stackflow-agent/config.yaml` for override	Update hostname_override in config; restart agent
Agent token expired	Token revoked or rotated	Check token status in Admin → Discovery → Agent Tokens	Generate new token, update config.yaml on target host
Hardware inventory incomplete	Agent lacks root/admin access	Check agent logs for "Permission denied" errors	Run agent as root (Linux) or SYSTEM (Windows)
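
The first and last rows of the table can be combined into a quick first-pass check on a Linux host. This is a sketch only: it assumes the agent runs as the systemd unit stackflow-agent (as in the table) and writes its logs to the systemd journal. If the agent logs to a file instead, replace the journalctl call accordingly. The function name agent_triage is illustrative.

```shell
# First-pass agent triage on a Linux host (hypothetical helper).
# Assumption: agent logs are available through journalctl.
agent_triage() {
  local unit="${1:-stackflow-agent}" denied
  # A stopped or failed service explains a stale heartbeat.
  if ! systemctl is-active --quiet "$unit"; then
    echo "service-down"
    return 1
  fi
  # Count "Permission denied" lines from the last hour; these suggest
  # the agent lacks the root access needed for hardware inventory.
  denied=$(journalctl -u "$unit" --since "1 hour ago" --no-pager 2>/dev/null \
    | grep -c "Permission denied" || true)
  if [ "$denied" -gt 0 ]; then
    echo "permission-errors:$denied"
    return 2
  fi
  echo "ok"
}
```

If the service is running and no permission errors appear, the remaining causes in the table (outbound HTTPS blocked, expired token, hostname override) still need to be checked by hand.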

Cloud Discovery Issues

Symptom	Likely Cause	Diagnostic Step	Resolution
AWS account shows 0 CIs	Cross-account role trust policy wrong	Test role assumption: `aws sts assume-role --role-arn arn:aws:iam::{account}:role/StackFlowDiscoveryRole --role-session-name test`	Fix trust policy external ID condition
Azure discovery fails with 401	Azure service principal secret expired	Check expiry in Azure Portal → App Registrations → Certificates & Secrets	Rotate client secret, update Secrets Manager entry
Partial AWS discovery (some services missing)	Missing IAM permissions for specific services	Check Lambda CloudWatch logs for AccessDenied during discovery	Add missing read permissions to StackFlowDiscoveryPolicy
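
Because the resolution for the first row mentions an external ID condition, a role-assumption test that omits the external ID may fail even when the trust policy is correct. The sketch below wraps the same `aws sts assume-role` call and passes `--external-id` when one is supplied. The function name test_discovery_role is illustrative, and the external ID value is whatever your deployment configured; it is not documented here.

```shell
# Test cross-account role assumption for one account (hypothetical helper).
# Usage: test_discovery_role <account-id> [external-id]
test_discovery_role() {
  local account="$1" external_id="${2:-}"
  set -- sts assume-role \
    --role-arn "arn:aws:iam::${account}:role/StackFlowDiscoveryRole" \
    --role-session-name discovery-trust-test \
    --query 'AssumedRoleUser.Arn' \
    --output text
  # Only pass --external-id when the caller supplied one.
  if [ -n "$external_id" ]; then
    set -- "$@" --external-id "$external_id"
  fi
  aws "$@"
}
```

On success the command prints the assumed-role ARN. An AccessDenied error generally means the trust policy does not allow the calling principal, the external ID does not match, or the caller lacks sts:AssumeRole permission.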

Discovery Logs: Detailed discovery logs are in CloudWatch under the /aws/lambda/StackFlowAPI log group, filtered by discovery. Each discovery run logs its start, per-service counts, any errors, and the total CI count at completion.
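
To look for AccessDenied entries without opening the console, the logs can be searched with `aws logs filter-log-events`. The following is a sketch: the function name discovery_errors is illustrative, the default log group is the discovery Lambda group listed under Minimum Requirements, and the group can be overridden with the second argument (for example /aws/lambda/StackFlowAPI). Adjust the filter pattern if the error text in your logs differs.

```shell
# Search recent discovery logs for AccessDenied (hypothetical helper).
# Usage: discovery_errors [hours-back] [log-group]
discovery_errors() {
  local hours="${1:-24}" group="${2:-/aws/lambda/StackFlowCloudDiscovery}" start_ms
  # filter-log-events expects the start time in milliseconds since the epoch.
  start_ms=$(( ( $(date +%s) - hours * 3600 ) * 1000 ))
  aws logs filter-log-events \
    --log-group-name "$group" \
    --filter-pattern "AccessDenied" \
    --start-time "$start_ms" \
    --query 'events[].message' \
    --output text
}
```

The service named in each matching message indicates which read permission is missing from StackFlowDiscoveryPolicy.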

Neptune Sync Issues

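To check the Neptune sync without applying changes, invoke the seeder Lambda with a delta dry run:

# AWS CLI v2 needs --cli-binary-format raw-in-base64-out to accept a raw
# JSON payload; omit that option on AWS CLI v1.
aws lambda invoke \
  --function-name StackFlowNeptuneCMDBSeeder \
  --cli-binary-format raw-in-base64-out \
  --payload '{"mode": "delta", "dry_run": true}' \
  --region us-east-1 \
  output.json
# The function's response is written to output.json
cat output.json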

If the seeder Lambda is failing, check its CloudWatch logs. Common issues include Neptune connection timeouts (check Neptune cluster status), Gremlin serialization errors on malformed CI data, and Lambda timeout when syncing very large CI sets (reduce batch size in seeder configuration).
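
For the connection-timeout case, the Neptune cluster status can be checked from the CLI. This sketch lists any Neptune cluster whose status is not "available"; the function name neptune_unavailable is illustrative, and it assumes the current credentials and region match the deployment.

```shell
# List Neptune clusters that are not in the "available" state
# (hypothetical helper). Prints nothing when every cluster is healthy.
neptune_unavailable() {
  aws neptune describe-db-clusters \
    --filters Name=engine,Values=neptune \
    --query 'DBClusters[].[DBClusterIdentifier,Status]' \
    --output text \
    | awk '$2 != "available" { print $1 ": " $2 }'
}
```

Empty output with continued timeouts suggests looking at network reachability from the Lambda to the cluster rather than at the cluster itself.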

CMDB Data Quality

Data quality issues manifest as incorrect impact analysis results, wrong CI counts in dashboards, or duplicate CI records. The CMDB health dashboard at CMDB → Health shows: duplicate detection results, orphaned CIs (no relationships), stale CIs (no update in 30+ days), and missing mandatory attributes. Address these proactively to maintain AI analysis quality.
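
The stale-CI rule (no update in 30+ days) can also be applied offline to exported data. The sketch below assumes a hypothetical tab-separated export with the CI identifier in the first column and the last-update time as Unix epoch seconds in the second; this file layout is an assumption for illustration, not a documented export format. The function name stale_cis is likewise illustrative.

```shell
# Print CI identifiers not updated within the threshold (hypothetical helper).
# Usage: stale_cis <tsv-file> [days]
# Assumed input columns: <ci-id> TAB <last-update-epoch-seconds>
stale_cis() {
  local file="$1" days="${2:-30}" now
  now=$(date +%s)
  awk -F '\t' -v now="$now" -v days="$days" \
    '(now - $2) > days * 86400 { print $1 }' "$file"
}
```

The threshold is a parameter so the same check can be run with a tighter window when preparing for an audit.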
