v2026.1

AI & Bedrock Errors

Error Taxonomy

AI and Bedrock errors in StackFlow fall into three categories: model invocation errors (API-level failures from Bedrock), retrieval errors (failures in the RAG pipeline when fetching context), and generation quality issues (the model returns a response, but its quality is poor). Only the first two are system errors; quality issues are addressed by tuning the prompt template.
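As a rough illustration of the taxonomy, the sketch below maps the error codes named on this page to the two system-error categories. The enum, the mapping, and the function name are hypothetical and not part of StackFlow. The assignment of ResourceNotFoundException to retrieval is an assumption based on its use for a missing knowledge base here. Quality issues do not raise exceptions, so they do not appear in the map.

```python
from enum import Enum


class ErrorCategory(Enum):
    MODEL_INVOCATION = "model_invocation"  # API-level failure from Bedrock
    RETRIEVAL = "retrieval"                # RAG pipeline failure
    UNCLASSIFIED = "unclassified"          # needs manual triage


# Assumed mapping, built only from the error codes mentioned on this page.
_CODE_TO_CATEGORY = {
    "ValidationException": ErrorCategory.MODEL_INVOCATION,
    "AccessDeniedException": ErrorCategory.MODEL_INVOCATION,
    "ModelNotReadyException": ErrorCategory.MODEL_INVOCATION,
    "ThrottlingException": ErrorCategory.MODEL_INVOCATION,
    "ResourceNotFoundException": ErrorCategory.RETRIEVAL,
}


def classify_error(error_code: str) -> ErrorCategory:
    """Return the taxonomy category for an error code, or UNCLASSIFIED."""
    return _CODE_TO_CATEGORY.get(error_code, ErrorCategory.UNCLASSIFIED)
```

Unknown codes fall through to UNCLASSIFIED rather than being guessed, which keeps triage decisions explicit.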

⚙️ Minimum Requirements
  • CloudWatch Logs: AI errors visible in /aws/lambda/StackFlowAPI with filter ERROR.*(Bedrock|Neptune|Redis)
  • DynamoDB: stackflow-ai-audit-log table accessible for querying failed AI interactions
  • Bedrock: Service quotas visible in AWS console; check model invocation limits for anthropic.claude-3-5-sonnet-20241022-v2:0
  • CloudWatch Alarms: BedrockThrottling alarm active on metric StackFlow/AI:ModelThrottleCount
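When working with exported log lines locally, the same filter can be sketched with Python's `re` module. This assumes the intent is "ERROR lines that mention Bedrock, Neptune, or Redis", so the service names are grouped in parentheses. Without the grouping, the alternation would also match any line that merely mentions Neptune or Redis. The sample log lines are invented for illustration.

```python
import re

# "ERROR", then anything, then one of the three service names.
AI_ERROR_PATTERN = re.compile(r"ERROR.*(Bedrock|Neptune|Redis)")


def filter_ai_errors(lines):
    """Keep only the lines that look like AI-related errors."""
    return [line for line in lines if AI_ERROR_PATTERN.search(line)]


# Hypothetical log lines, not real StackFlow output.
sample = [
    "2026-01-10T12:00:00Z ERROR Bedrock ThrottlingException on InvokeModel",
    "2026-01-10T12:00:01Z INFO Redis cache warm-up complete",
    "2026-01-10T12:00:02Z ERROR Neptune connection reset",
]
matches = filter_ai_errors(sample)
```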

Model Errors

| Symptom | Likely Cause | Diagnostic Step | Resolution |
|---|---|---|---|
| ValidationException from Bedrock | Input exceeds model context window | Log input token count before API call | Reduce context size: limit exemplars to 2, truncate description |
| AccessDeniedException from Bedrock | Lambda role not granted model access | Check Lambda IAM role for bedrock:InvokeModel permission | Add bedrock:InvokeModel for specific model ARN to Lambda role |
| ModelNotReadyException | Model is being updated or reloaded by AWS | Wait 60s, check AWS Service Health dashboard | Retry request; configure automatic fallback to alternative model |
| ResourceNotFoundException for KB | KB ID misconfigured or KB deleted | Verify KB BXJGG7PIPS exists: aws bedrock-agent get-knowledge-base --knowledge-base-id BXJGG7PIPS | Correct KB ID in system properties |
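The ValidationException row in the table suggests logging the input size and trimming context before the call. A minimal sketch of that step follows. The function names, the logger name, and the four-characters-per-token estimate are assumptions; a real tokenizer will give different counts, so treat the estimate as a rough guard only.

```python
import logging

logger = logging.getLogger("stackflow.ai")  # hypothetical logger name

MAX_EXEMPLARS = 2    # limit suggested in the table
CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count by rounding up len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def build_prompt_inputs(description, exemplars, max_description_tokens=1000):
    """Trim the description and exemplars, then log the estimated size."""
    kept = list(exemplars[:MAX_EXEMPLARS])
    trimmed = description[: max_description_tokens * CHARS_PER_TOKEN]
    total = estimate_tokens(trimmed) + sum(estimate_tokens(e) for e in kept)
    logger.info("estimated input tokens before Bedrock call: %d", total)
    return trimmed, kept, total
```

Logging the estimate on every call makes it easier to correlate a later ValidationException with the input that caused it.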

Throttling & Quotas

Bedrock enforces model-specific invocation rate limits, expressed in requests per minute (RPM). A ThrottlingException indicates that the RPM limit has been exceeded. Check the AI Observability dashboard for request rate trends, and review the model router's routing rules to ensure that expensive models are reserved for complex tasks.

To check throttled invocations for a model over the last hour (substitute the model ID under investigation; the date commands use GNU date syntax):

aws cloudwatch get-metric-statistics \
  --namespace AWS/Bedrock \
  --metric-name InvocationThrottles \
  --dimensions Name=ModelId,Value=anthropic.claude-3-sonnet-20240229-v1:0 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 --statistics Sum \
  --region us-east-1

Quota Increases: Request Bedrock quota increases via the AWS Service Quotas console. Specify your expected peak RPM and the business use case. AWS typically responds within 2-3 business days for Bedrock quota increases.
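While a quota increase is pending, callers can absorb a ThrottlingException by retrying with exponential backoff and jitter. This is a general technique, not something this page confirms StackFlow implements. In the sketch below, `BedrockCallError`, `invoke_with_backoff`, and the default delays are hypothetical. With boto3 the code would instead be read from `ClientError.response["Error"]["Code"]`.

```python
import random
import time

# Codes this page describes as transient; everything else is re-raised at once.
RETRYABLE_CODES = {"ThrottlingException", "ModelNotReadyException"}


class BedrockCallError(Exception):
    """Hypothetical wrapper that carries the Bedrock error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def backoff_delay(attempt, base=0.5, cap=20.0, rng=random.random):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))


def invoke_with_backoff(call, max_attempts=5, sleep=time.sleep):
    """Run call(), retrying retryable errors up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except BedrockCallError as err:
            is_last = attempt == max_attempts - 1
            if err.code not in RETRYABLE_CODES or is_last:
                raise
            sleep(backoff_delay(attempt))
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without waiting.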

RAG Failures

| Symptom | Likely Cause | Diagnostic Step | Resolution |
|---|---|---|---|
| AI returns "I don't have information" for known articles | KB not synced recently | Check last ingestion job status in Bedrock console | Trigger manual sync via bedrock-agent start-ingestion-job |
| KB returns irrelevant results | Similarity threshold too low | Test retrieval directly using bedrock-agent-runtime retrieve API | Increase minimum similarity threshold in RAG config |
| New articles not appearing in search | S3 sync not triggering Bedrock ingestion | Check S3 event notification on knowledge base bucket | Verify S3 event notification is configured to trigger KB sync |
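To see how a minimum similarity threshold affects results, the retrieve API can be called directly and its results filtered by score. The sketch below assumes boto3 is installed and AWS credentials are configured for the live call. The helper names and the sample scores are hypothetical, and how StackFlow's RAG config applies its threshold is not shown on this page.

```python
def filter_by_score(retrieval_results, min_score):
    """Keep only the results whose similarity score meets the threshold."""
    return [r for r in retrieval_results if r.get("score", 0.0) >= min_score]


def retrieve_chunks(knowledge_base_id, query, top_k=5, region="us-east-1"):
    """Query a Bedrock knowledge base and return its raw retrieval results."""
    import boto3  # imported lazily so filter_by_score works without AWS access

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return response["retrievalResults"]


# Invented results in the shape the retrieve API returns.
sample_results = [
    {"content": {"text": "Reset a password"}, "score": 0.82},
    {"content": {"text": "Office holiday schedule"}, "score": 0.41},
]
relevant = filter_by_score(sample_results, min_score=0.6)
```

Comparing the raw scores for a known-good query against a known-bad one gives a starting point for choosing the threshold.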

Semantic Cache Issues

Count the cached AI responses for a tenant (replace TENANT_ID):

redis-cli -h master.stackflow-redis-prod.mnzfvx.use1.cache.amazonaws.com \
  -p 6379 -a "$REDIS_AUTH_TOKEN" --tls \
  --scan --pattern "t:TENANT_ID:ai:cache:*" | wc -l

Check the cache hit and miss counters:

redis-cli -h master.stackflow-redis-prod.mnzfvx.use1.cache.amazonaws.com \
  -p 6379 -a "$REDIS_AUTH_TOKEN" --tls \
  INFO stats | grep -E 'keyspace_(hits|misses)'

The cache hit rate is keyspace_hits / (keyspace_hits + keyspace_misses). If it is unexpectedly low (below 20%), check whether the similarity threshold is set too high (above 0.95), whether TTLs are too short, or whether Redis memory pressure is causing premature eviction. Increase Redis node size or reduce TTLs for less critical cache entries to free memory for AI response caching.
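The hit rate can be computed from the `INFO stats` output with a few lines of Python. This is a minimal sketch; the function names and sample values are illustrative. Note that keyspace_hits and keyspace_misses are instance-wide counters, so the result covers every key on the node, not only AI cache entries.

```python
LOW_HIT_RATE = 0.20  # threshold used on this page


def parse_info_stats(info_text):
    """Parse 'key:value' lines from redis-cli INFO output into a dict."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        stats[key] = value
    return stats


def cache_hit_rate(info_text):
    """Return hits / (hits + misses), or None if there were no lookups."""
    stats = parse_info_stats(info_text)
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Invented counters for illustration.
sample_info = "# Stats\r\nkeyspace_hits:150\r\nkeyspace_misses:850\r\n"
rate = cache_hit_rate(sample_info)
is_low = rate is not None and rate < LOW_HIT_RATE
```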