AI Observability
Metrics Overview
The AI Observability module provides a unified view of all AI model interactions across the StackFlow platform. Every Bedrock API call is instrumented with CloudWatch metrics and structured logs, enabling cost attribution, performance analysis, and anomaly detection at the model, module, and tenant level.
- DynamoDB: `stackflow-ai-audit-log` table with TTL on `expiresAt` attribute (90-day retention)
- CloudWatch: Custom namespace `StackFlow/AI` with metrics: `ModelInvocations`, `TokensUsed`, `CacheHitRate`, `AvgLatencyMs`
- IAM: `StackFlowAPIRole` with `cloudwatch:PutMetricData` on namespace `StackFlow/AI`
- Feature Flag: `ai_observability` enabled in `StackFlow_FeatureFlag` for the tenant
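As a rough illustration of how the resources listed above fit together, the sketch below computes a 90-day `expiresAt` value in epoch seconds (the format DynamoDB TTL expects) and builds a metric payload for the `StackFlow/AI` namespace. The function names are hypothetical, and the `Model` and `Module` dimension names are assumed from the CLI query shown later on this page; this is not StackFlow's actual instrumentation code.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90          # retention period stated in the list above
NAMESPACE = "StackFlow/AI"   # custom namespace stated in the list above


def expires_at(created: datetime, retention_days: int = RETENTION_DAYS) -> int:
    """Return the TTL attribute value as epoch seconds (hypothetical helper)."""
    return int((created + timedelta(days=retention_days)).timestamp())


def build_metric_data(model: str, module: str, tokens_used: int, latency_ms: float) -> list[dict]:
    """Build a MetricData list shaped for CloudWatch PutMetricData (hypothetical helper)."""
    # Dimension names are an assumption taken from the CLI example on this page.
    dimensions = [
        {"Name": "Model", "Value": model},
        {"Name": "Module", "Value": module},
    ]
    return [
        {"MetricName": "ModelInvocations", "Dimensions": dimensions, "Value": 1, "Unit": "Count"},
        {"MetricName": "TokensUsed", "Dimensions": dimensions, "Value": tokens_used, "Unit": "Count"},
        {"MetricName": "AvgLatencyMs", "Dimensions": dimensions, "Value": latency_ms, "Unit": "Milliseconds"},
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("expiresAt:", expires_at(now))
    # With boto3 installed and credentials configured, the payload could be sent with:
    #   boto3.client("cloudwatch").put_metric_data(Namespace=NAMESPACE, MetricData=payload)
    payload = build_metric_data("claude-3-sonnet", "incident_triage", 1200, 850.0)
    print(len(payload), "metric data points for", NAMESPACE)
```

The payload is kept separate from the API call so that it can be inspected or unit-tested without AWS credentials.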
The Observability dashboard is accessible at AI → Observability and refreshes every 60 seconds. Historical data is retained for 90 days in CloudWatch and indefinitely in S3 for compliance archives.
Cost Tracking
| Metric | Unit | Description |
|---|---|---|
| Input tokens | Tokens/request | Tokens in prompt + context |
| Output tokens | Tokens/request | Tokens in model response |
| Cost per request | USD | Calculated from model pricing table |
| Daily spend | USD | Rolling 24h aggregate per tenant |
| Cache savings | USD | Cost avoided via semantic cache hits |
| Model efficiency ratio | % | Useful output tokens / total tokens billed |
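
The derived metrics in the table can be sketched as follows. The prices are placeholder values rather than real model pricing, the `RequestUsage` type is hypothetical, and "total tokens billed" is assumed here to mean input plus output tokens; how "useful" output tokens are counted is left to the caller.

```python
from dataclasses import dataclass

# Placeholder USD prices per 1,000 tokens; NOT real pricing, for illustration only.
PRICING = {
    "example-model": {"input": 0.003, "output": 0.015},
}


@dataclass
class RequestUsage:
    model: str
    input_tokens: int          # tokens in prompt + context
    output_tokens: int         # tokens in model response
    useful_output_tokens: int  # assumption: supplied by the caller


def cost_per_request(usage: RequestUsage) -> float:
    """Cost in USD, looked up from the (placeholder) pricing table."""
    price = PRICING[usage.model]
    return (usage.input_tokens / 1000) * price["input"] + (usage.output_tokens / 1000) * price["output"]


def efficiency_ratio(usage: RequestUsage) -> float:
    """Useful output tokens as a percentage of total tokens billed."""
    total = usage.input_tokens + usage.output_tokens
    return 100.0 * usage.useful_output_tokens / total if total else 0.0


def cache_savings(cache_hits: list[RequestUsage]) -> float:
    """Cost avoided: what the cached requests would have cost if sent to the model."""
    return sum(cost_per_request(u) for u in cache_hits)


if __name__ == "__main__":
    u = RequestUsage("example-model", input_tokens=2000, output_tokens=500, useful_output_tokens=400)
    print(f"cost per request: ${cost_per_request(u):.4f}")
    print(f"efficiency ratio: {efficiency_ratio(u):.1f}%")
    print(f"cache savings for 2 hits: ${cache_savings([u, u]):.4f}")
```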
Model Performance
Model performance metrics track the quality and speed of AI responses. Time to First Token (TTFT) measures perceived latency. Total generation time tracks throughput. Response quality is measured via user feedback (thumbs up/down on AI Copilot responses) and downstream outcome tracking, for example whether an AI-suggested assignment group turned out to be correct.
```bash
# Query AI usage metrics via CloudWatch
aws cloudwatch get-metric-statistics \
  --namespace StackFlow/AI \
  --metric-name InputTokens \
  --dimensions Name=Model,Value=claude-3-sonnet Name=Module,Value=incident_triage \
  --start-time 2026-05-11T00:00:00Z \
  --end-time 2026-05-18T00:00:00Z \
  --period 86400 \
  --statistics Sum \
  --region us-east-1
```
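
To make the two latency metrics in this section concrete, here is a minimal sketch of timing a streamed response. It assumes the response arrives as an iterable of text chunks; `measure_stream` is a hypothetical helper, not part of StackFlow or any AWS SDK, and the clock is injectable so the logic can be checked without a real model call.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(chunks: Iterable[str], clock: Callable[[], float] = time.monotonic) -> dict:
    """Consume a chunk stream and report TTFT and total generation time in milliseconds."""
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            first = clock()  # timestamp of the first chunk = time to first token
        count += 1
    end = clock()
    return {
        "ttft_ms": None if first is None else (first - start) * 1000.0,
        "total_ms": (end - start) * 1000.0,
        "chunks": count,
    }


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a streamed model response.
        for piece in ["Hello", ", ", "world"]:
            time.sleep(0.01)
            yield piece

    print(measure_stream(fake_stream()))
```

A monotonic clock is used by default because wall-clock time can jump during a measurement.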
Usage by Module
The Usage by Module breakdown shows token consumption and cost attributed to each StackFlow module. This helps identify which features are driving AI costs and where optimization efforts should be focused. Typical distribution: AI Copilot (40%), Incident Triage (25%), KB Generation (20%), Other (15%).
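
A per-module breakdown of this kind can be derived from per-request cost records. The sketch below assumes each record is a `(module, cost_usd)` pair; the record shape and the module identifiers other than `incident_triage` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def usage_share(records: Iterable[Tuple[str, float]]) -> dict:
    """Return each module's share of total cost in percent, largest first."""
    totals = defaultdict(float)
    for module, cost_usd in records:
        totals[module] += cost_usd
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {module: round(100 * cost / grand_total, 1) for module, cost in ranked}


if __name__ == "__main__":
    # Illustrative numbers only, chosen to match the typical distribution quoted above.
    sample = [
        ("ai_copilot", 30.0), ("incident_triage", 25.0),
        ("kb_generation", 20.0), ("other", 15.0), ("ai_copilot", 10.0),
    ]
    for module, pct in usage_share(sample).items():
        print(f"{module}: {pct}%")
```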
Alerting
CloudWatch Alarms are configured for: daily spend exceeding 90% of budget, P95 latency above 5 seconds (Sonnet) or 10 seconds (Opus), error rate above 1%, and cache hit rate below 20%. Alerts are routed to the stackflow-security-findings SNS topic, which fans out to the on-call team and the StackFlow admin console notification center.
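
The thresholds above can be expressed as a simple check, which may be useful for testing alert logic locally before relying on the alarms themselves. This is a sketch only: the `Snapshot` type and alarm labels are hypothetical, rates are assumed to be fractions between 0 and 1, and the real evaluation is performed by CloudWatch Alarms, not by this code.

```python
from dataclasses import dataclass

# P95 latency limits in seconds per model family, from the thresholds stated above.
P95_LIMIT_S = {"sonnet": 5.0, "opus": 10.0}


@dataclass
class Snapshot:
    daily_spend_usd: float
    daily_budget_usd: float
    p95_latency_s: float
    model_family: str      # "sonnet" or "opus"
    error_rate: float      # fraction, e.g. 0.02 for 2%
    cache_hit_rate: float  # fraction, e.g. 0.35 for 35%


def breached_alarms(s: Snapshot) -> list[str]:
    """Return labels of the thresholds that the snapshot breaches."""
    alarms = []
    if s.daily_spend_usd > 0.90 * s.daily_budget_usd:
        alarms.append("daily_spend")
    if s.p95_latency_s > P95_LIMIT_S[s.model_family]:
        alarms.append("p95_latency")
    if s.error_rate > 0.01:
        alarms.append("error_rate")
    if s.cache_hit_rate < 0.20:
        alarms.append("cache_hit_rate")
    return alarms


if __name__ == "__main__":
    snapshot = Snapshot(95.0, 100.0, 6.0, "sonnet", 0.02, 0.10)
    print(breached_alarms(snapshot))
```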