AI Observability

Metrics Overview

The AI Observability module provides a unified view of all AI model interactions across the StackFlow platform. Every Bedrock API call is instrumented with CloudWatch metrics and structured logs, enabling cost attribution, performance analysis, and anomaly detection at the model, module, and tenant level.
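
As a rough illustration of the instrumentation described above, the sketch below builds the request body for a single CloudWatch PutMetricData call. The metric names and the Model and Module dimensions appear elsewhere on this page; the Tenant dimension name and the build_metric_data helper are assumptions made for the example, not part of the StackFlow API.

```python
from datetime import datetime, timezone

NAMESPACE = "StackFlow/AI"


def build_metric_data(model, module, tenant, tokens_used, latency_ms, now=None):
    """Return keyword arguments shaped for CloudWatch PutMetricData (one model call)."""
    timestamp = now or datetime.now(timezone.utc)
    dimensions = [
        {"Name": "Model", "Value": model},
        {"Name": "Module", "Value": module},
        {"Name": "Tenant", "Value": tenant},  # assumed dimension name
    ]

    def datum(name, value, unit):
        return {
            "MetricName": name,
            "Dimensions": dimensions,
            "Timestamp": timestamp,
            "Value": float(value),
            "Unit": unit,
        }

    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            datum("ModelInvocations", 1, "Count"),
            datum("TokensUsed", tokens_used, "Count"),
            datum("AvgLatencyMs", latency_ms, "Milliseconds"),
        ],
    }
```

With boto3 installed and credentials configured, the result could be sent with boto3.client("cloudwatch").put_metric_data(**payload).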

⚙️ Minimum Requirements

DynamoDB: stackflow-ai-audit-log table with TTL on expiresAt attribute (90-day retention)
CloudWatch: Custom namespace StackFlow/AI with metrics: ModelInvocations, TokensUsed, CacheHitRate, AvgLatencyMs
IAM: StackFlowAPIRole with cloudwatch:PutMetricData on namespace StackFlow/AI
Feature Flag: ai_observability enabled in StackFlow_FeatureFlag for the tenant
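
Two of the requirements above can be sketched in a few lines. DynamoDB TTL expects the expiresAt attribute to be a Number holding a Unix epoch time in seconds, and PutMetricData is usually scoped to a namespace through the cloudwatch:namespace condition key rather than a resource ARN. The attribute names other than expiresAt, and the audit_log_item helper, are hypothetical.

```python
import json
import time

RETENTION_DAYS = 90


def audit_log_item(request_id, tenant, model, now_epoch=None):
    """Build an audit-log item whose expiresAt is 90 days after creation."""
    now_epoch = int(time.time()) if now_epoch is None else int(now_epoch)
    return {
        "requestId": request_id,  # hypothetical attribute names, except expiresAt
        "tenant": tenant,
        "model": model,
        "createdAt": now_epoch,
        "expiresAt": now_epoch + RETENTION_DAYS * 86400,  # epoch seconds
    }


# One possible policy statement for StackFlowAPIRole (a sketch, not the shipped policy).
PUT_METRIC_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"cloudwatch:namespace": "StackFlow/AI"}
            },
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(PUT_METRIC_POLICY, indent=2))
```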

The Observability dashboard is accessible at AI → Observability and refreshes every 60 seconds. Historical data is retained for 90 days in CloudWatch and indefinitely in S3 for compliance archives.

Cost Tracking

Metric	Unit	Description
Input tokens	Tokens/request	Tokens in prompt + context
Output tokens	Tokens/request	Tokens in model response
Cost per request	USD	Calculated from model pricing table
Daily spend	USD	Rolling 24h aggregate per tenant
Cache savings	USD	Cost avoided via semantic cache hits
Model efficiency ratio	%	Useful output tokens / total tokens billed
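
The derived metrics in the table can be approximated as follows. This is a minimal sketch: the prices in PRICING_PER_1K are illustrative placeholders rather than StackFlow's model pricing table, and the function names are hypothetical.

```python
# Illustrative USD prices per 1,000 tokens; placeholders, not the real pricing table.
PRICING_PER_1K = {
    "claude-3-sonnet": {"input": 0.003, "output": 0.015},
}


def cost_per_request(model, input_tokens, output_tokens, pricing=PRICING_PER_1K):
    """Cost in USD for one request, from input and output token counts."""
    price = pricing[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def cache_savings(model, cached_requests, pricing=PRICING_PER_1K):
    """Cost avoided by cache hits.

    cached_requests is an iterable of (input_tokens, output_tokens) pairs that
    would have been billed if the cache had missed.
    """
    return sum(cost_per_request(model, i, o, pricing) for i, o in cached_requests)


def efficiency_ratio(useful_output_tokens, total_billed_tokens):
    """Useful output tokens as a percentage of all tokens billed."""
    if total_billed_tokens == 0:
        return 0.0
    return 100.0 * useful_output_tokens / total_billed_tokens
```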

Model Performance

Model performance metrics track the quality and speed of AI responses. Time to First Token (TTFT) measures perceived latency, and total generation time tracks throughput. Response quality is measured through user feedback (thumbs up/down on AI Copilot responses) and downstream outcome tracking (for example, whether an AI-suggested assignment group turned out to be correct).
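
To make the two timing metrics concrete, here is a small sketch that measures TTFT and total generation time over any iterable of streamed text chunks. The measure_stream helper and its return keys are hypothetical. The injectable clock exists only so the timing can be tested deterministically.

```python
import time


def measure_stream(chunks, clock=time.perf_counter):
    """Consume streamed text chunks and record TTFT and total generation time."""
    start = clock()
    first_chunk_at = None
    pieces = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = clock()  # moment the first chunk arrived
        pieces.append(chunk)
    end = clock()
    return {
        "ttft_s": None if first_chunk_at is None else first_chunk_at - start,
        "total_s": end - start,
        "text": "".join(pieces),
    }
```

An empty stream yields a ttft_s of None, which keeps failed or empty generations from being averaged in as zero-latency responses.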

# Query AI usage metrics via CloudWatch
aws cloudwatch get-metric-statistics \
  --namespace StackFlow/AI \
  --metric-name InputTokens \
  --dimensions Name=Model,Value=claude-3-sonnet Name=Module,Value=incident_triage \
  --start-time 2026-05-11T00:00:00Z \
  --end-time 2026-05-18T00:00:00Z \
  --period 86400 \
  --statistics Sum \
  --region us-east-1
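
The command returns JSON containing a Datapoints array, and CloudWatch does not guarantee that the datapoints arrive in chronological order. The sketch below, which assumes the CLI's JSON output has been captured as a string, sorts the datapoints and reduces them to per-day sums. The daily_sums helper is hypothetical.

```python
import json


def daily_sums(cli_output):
    """Return (date, sum) pairs in chronological order from get-metric-statistics JSON."""
    datapoints = json.loads(cli_output)["Datapoints"]
    # ISO 8601 timestamps in a single format sort correctly as plain strings.
    ordered = sorted(datapoints, key=lambda d: d["Timestamp"])
    return [(d["Timestamp"][:10], d["Sum"]) for d in ordered]


def total(cli_output):
    """Total of the Sum statistic across the whole queried window."""
    return sum(value for _, value in daily_sums(cli_output))
```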

Usage by Module

The Usage by Module breakdown shows token consumption and cost attributed to each StackFlow module. This helps identify which features are driving AI costs and where optimization efforts should be focused. Typical distribution: AI Copilot (40%), Incident Triage (25%), KB Generation (20%), Other (15%).
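
The percentage breakdown can be derived from raw per-module token counts. The following is a minimal sketch with a hypothetical usage_share helper and made-up module keys.

```python
def usage_share(tokens_by_module):
    """Return each module's share of total tokens (percent), largest first."""
    total = sum(tokens_by_module.values())
    if total == 0:
        return {module: 0.0 for module in tokens_by_module}
    ranked = sorted(tokens_by_module.items(), key=lambda kv: kv[1], reverse=True)
    return {module: round(100.0 * tokens / total, 1) for module, tokens in ranked}
```

Sorting by share puts the most expensive module first, which is usually where optimization effort pays off soonest.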

Cache Impact: After a 24-48 hour warm-up period, the semantic cache typically reduces AI Copilot token usage by 40-60%. Monitor the cache hit rate in the Observability dashboard. Rates below 30% may indicate that the cache TTL is too short or that query diversity is too high.
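
The hit-rate guidance above could be checked programmatically along these lines. The function names and the choice to suppress the check during warm-up are assumptions made for illustration.

```python
def cache_hit_rate(hits, misses):
    """Cache hit rate as a percentage of all lookups."""
    lookups = hits + misses
    return 0.0 if lookups == 0 else 100.0 * hits / lookups


def needs_cache_review(hits, misses, hours_since_deploy,
                       warmup_hours=48, floor_pct=30.0):
    """Flag a low hit rate, but only once the warm-up window has passed."""
    if hours_since_deploy < warmup_hours:
        return False  # the cache is still warming up, so low rates are expected
    return cache_hit_rate(hits, misses) < floor_pct
```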

Alerting

CloudWatch Alarms are configured for: daily spend exceeding 90% of budget, P95 latency above 5 seconds (Sonnet) or 10 seconds (Opus), error rate above 1%, and cache hit rate below 20%. Alerts are routed to the stackflow-security-findings SNS topic, which fans out to the on-call team and the StackFlow admin console notification center.
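
The alarm conditions can be expressed as a plain threshold check, which is handy for unit-testing alert logic outside CloudWatch. This sketch mirrors the thresholds listed above. The function name, argument names, and alarm labels are hypothetical.

```python
# P95 latency limits in seconds, per model family, as listed above.
P95_LIMIT_S = {"sonnet": 5.0, "opus": 10.0}


def breached_alarms(daily_spend, daily_budget, p95_latency_s,
                    model_family, error_rate_pct, cache_hit_rate_pct):
    """Return the names of alarm conditions that the given readings breach."""
    alarms = []
    if daily_spend > 0.9 * daily_budget:
        alarms.append("daily_spend")
    if p95_latency_s > P95_LIMIT_S[model_family]:
        alarms.append("p95_latency")
    if error_rate_pct > 1.0:
        alarms.append("error_rate")
    if cache_hit_rate_pct < 20.0:
        alarms.append("cache_hit_rate")
    return alarms
```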
