Model Router
Routing Architecture
The StackFlow Model Router is a middleware layer between the application and AI providers that selects a model for each request. Routing decisions consider task type, required quality level, remaining cost budget, provider health, and current latency. The router runs synchronously within the StackFlowAPI Lambda before each Bedrock API call.
- DynamoDB: StackFlow_AIModelRouter table with routing rules; PK routerId, GSI on intentType
- DynamoDB: StackFlow_AIProvider records referenced by router rules must be status: active
- Redis: Router decisions cached under sf:router:{intentHash} with TTL 600s
- Lambda Env Var: MODEL_ROUTER_TABLE=StackFlow_AIModelRouter set in StackFlowAPI
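The decision cache listed above can be illustrated with a small sketch. The key prefix sf:router: and the 600-second TTL come from the list; everything else is an assumption made for illustration: how intentHash is derived (here, a truncated SHA-256 of the canonical JSON of the routing inputs), the helper names, and the in-memory class that stands in for Redis so the example runs without a server.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 600  # TTL stated in the configuration list


def intent_hash(intent: dict) -> str:
    """Hypothetical derivation: SHA-256 of the canonical JSON of the routing inputs."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_key(intent: dict) -> str:
    """Build the key in the documented sf:router:{intentHash} shape."""
    return f"sf:router:{intent_hash(intent)}"


class InMemoryTTLCache:
    """Stand-in for Redis (assumption): stores values with an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries are dropped lazily on read
            return None
        return value


def route_with_cache(intent, cache, decide):
    """Return a cached decision if present, otherwise compute and cache it."""
    key = cache_key(intent)
    cached = cache.get(key)
    if cached is not None:
        return cached
    decision = decide(intent)
    cache.set(key, decision, CACHE_TTL_SECONDS)
    return decision
```

One consequence worth noting: a cached decision is reused for up to 600 seconds, so any input that should change the outcome (for example budget state or provider health) would need to be part of the hashed intent or be checked after the cache lookup.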
Routing Rules
Routing rules are evaluated in priority order. The first matching rule determines the model selection. Rules can match on request metadata (task_type, priority, source_module), context (user role, tenant plan), and system state (budget usage, provider health).
{
  "routing_rules": [
    {
      "priority": 1,
      "name": "P1 Incident Triage - High Quality",
      "condition": {"task_type": "incident_triage", "incident_priority": "P1"},
      "model": "claude-3-sonnet",
      "provider": "bedrock",
      "max_tokens": 4096
    },
    {
      "priority": 2,
      "name": "Classification Tasks - Fast & Cheap",
      "condition": {"task_type": "classification"},
      "model": "claude-3-haiku",
      "provider": "bedrock",
      "max_tokens": 512
    },
    {
      "priority": 3,
      "name": "Complex Analysis - Opus",
      "condition": {"task_type": "rca_analysis", "complexity": "high"},
      "model": "claude-3-opus",
      "provider": "bedrock",
      "max_tokens": 8192
    },
    {
      "priority": 999,
      "name": "Default",
      "condition": {},
      "model": "claude-3-haiku",
      "provider": "bedrock",
      "max_tokens": 2048
    }
  ]
}
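As a rough illustration of first-match evaluation, the sketch below sorts rules by priority and returns the first one whose condition is satisfied. It assumes that every key in a condition must equal the corresponding request field and that an empty condition matches any request; the function names matches and select_rule are hypothetical, and the rules are abbreviated copies of the example configuration.

```python
# Abbreviated copies of the example rules (name and provider fields omitted).
RULES = [
    {"priority": 999, "condition": {}, "model": "claude-3-haiku", "max_tokens": 2048},
    {"priority": 1,
     "condition": {"task_type": "incident_triage", "incident_priority": "P1"},
     "model": "claude-3-sonnet", "max_tokens": 4096},
    {"priority": 2, "condition": {"task_type": "classification"},
     "model": "claude-3-haiku", "max_tokens": 512},
    {"priority": 3,
     "condition": {"task_type": "rca_analysis", "complexity": "high"},
     "model": "claude-3-opus", "max_tokens": 8192},
]


def matches(condition: dict, request: dict) -> bool:
    """Assumed semantics: every condition key must equal the request value.

    An empty condition has nothing to check, so it matches every request.
    """
    return all(request.get(key) == value for key, value in condition.items())


def select_rule(rules: list, request: dict) -> dict:
    """Return the lowest-priority-number rule whose condition matches."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if matches(rule["condition"], request):
            return rule
    raise LookupError("no routing rule matched; a default rule is expected")


if __name__ == "__main__":
    request = {"task_type": "rca_analysis", "complexity": "low"}
    print(select_rule(RULES, request)["model"])
```

Under these assumptions, an rca_analysis request with low complexity skips the Opus rule and falls through to the default rule, which is why the catch-all carries the largest priority number.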
Cost Optimization
The model router tracks daily and monthly token spend per tenant. When a tenant's daily spend reaches 80% of its daily budget, the router automatically downgrades requests to cheaper models (Haiku instead of Sonnet) and logs each downgrade for review. At 95% of the daily budget, only critical requests (P1 triage, active copilot sessions) are processed; non-critical tasks are queued.
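The two thresholds can be expressed as a small policy function. This is a sketch under stated assumptions: the names BudgetDecision and apply_budget_policy are hypothetical, spend and budget are plain numbers in the same unit, and the downgrade is applied to critical requests as well, because the text names no exemption. Only the Sonnet to Haiku downgrade and the 80% and 95% thresholds come from the description above.

```python
from dataclasses import dataclass

DOWNGRADE_THRESHOLD = 0.80       # from the text: downgrade to cheaper models
CRITICAL_ONLY_THRESHOLD = 0.95   # from the text: queue non-critical tasks

# Only the Sonnet -> Haiku pair is named in the text.
DOWNGRADES = {"claude-3-sonnet": "claude-3-haiku"}


@dataclass(frozen=True)
class BudgetDecision:
    action: str  # "process", "downgrade", or "queue"
    model: str


def apply_budget_policy(model: str, spent: float, daily_budget: float,
                        is_critical: bool) -> BudgetDecision:
    """Apply the daily-budget thresholds to one request (illustrative only)."""
    if daily_budget <= 0:
        raise ValueError("daily_budget must be positive")
    usage = spent / daily_budget
    # Above 95%: non-critical work is queued rather than processed.
    if usage >= CRITICAL_ONLY_THRESHOLD and not is_critical:
        return BudgetDecision("queue", model)
    # Above 80%: swap in the cheaper model when a downgrade is defined.
    # Assumption: critical requests are downgraded too.
    if usage >= DOWNGRADE_THRESHOLD and model in DOWNGRADES:
        return BudgetDecision("downgrade", DOWNGRADES[model])
    return BudgetDecision("process", model)


if __name__ == "__main__":
    print(apply_budget_policy("claude-3-sonnet", 85.0, 100.0, is_critical=False))
```

A caller would log every "downgrade" result for later review and hand "queue" results to whatever deferred-processing mechanism the deployment uses.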
Latency vs Quality Tradeoffs
Different use cases have different latency tolerances. The AI Copilot requires low latency (under 2 seconds to first token) to maintain a conversational feel. Background tasks such as article generation can tolerate 10-30 seconds. The router uses streaming for interactive use cases and batch processing for background tasks, and selects models accordingly.
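One way to picture latency-aware selection is to pick the most capable model whose observed latency fits the use case's tolerance. The sketch below is hypothetical throughout: the profile names, the pick_model function, and the latency figures are invented for illustration, and the latency passed in is assumed to be measured the same way as the limit (time to first token for the copilot, total time for background tasks). The 2-second and 30-second limits and the streaming/batch split come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyProfile:
    mode: str          # "streaming" or "batch"
    max_seconds: float  # tolerance for this use case


# Limits taken from the text; the dictionary keys are hypothetical labels.
PROFILES = {
    "copilot": LatencyProfile(mode="streaming", max_seconds=2.0),
    "article_generation": LatencyProfile(mode="batch", max_seconds=30.0),
}

# Most capable first, fastest last.
QUALITY_ORDER = ["claude-3-opus", "claude-3-sonnet", "claude-3-haiku"]


def pick_model(use_case: str, observed_latency: dict) -> tuple:
    """Return (model, mode): the most capable model within the latency limit."""
    profile = PROFILES[use_case]
    for model in QUALITY_ORDER:
        latency = observed_latency.get(model)
        if latency is not None and latency <= profile.max_seconds:
            return model, profile.mode
    # Nothing fits: fall back to the fastest tier rather than failing.
    return QUALITY_ORDER[-1], profile.mode


if __name__ == "__main__":
    # Invented latency figures, in seconds, for demonstration only.
    latencies = {"claude-3-opus": 6.0, "claude-3-sonnet": 1.5, "claude-3-haiku": 0.6}
    print(pick_model("copilot", latencies))
    print(pick_model("article_generation", latencies))
```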
Router Metrics
Router metrics are visible in AI → Observability → Model Router. Key metrics include: requests per model, cache hit rate, downgrade frequency, fallback activation count, and cost per task type. These metrics help optimize routing rules and identify opportunities for cost reduction.