AI Engine Overview
AI Architecture
StackFlow's cognitive AI engine is built on Amazon Bedrock, providing managed access to foundation models including Anthropic Claude (default), Amazon Titan, and Llama 3. The model router selects the optimal model for each request based on task type, latency requirements, and cost budget. All model interactions are logged to the AI Observability module for cost tracking and performance analysis.
- DynamoDB: StackFlow_TenantAIConfig with at least one active AI configuration record
- DynamoDB: StackFlow_AIModelRouter with routing rules for at least one intent type
- Bedrock: At least one model enabled in account 373544523367 (recommended: anthropic.claude-3-5-sonnet-20241022-v2:0)
- IAM: StackFlowAPIRole with bedrock:InvokeModel and bedrock-agent-runtime:RetrieveAndGenerate
- Redis: stackflow-redis-prod accessible for semantic cache; auth token in Secrets Manager at stackflow/redis/auth-token
The AI engine integrates with every module in StackFlow, from incident triage to article generation to workflow automation. It uses the Bedrock Knowledge Base (BXJGG7PIPS) for grounded responses and the exemplar learning system for few-shot context injection.
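A grounded query against the Knowledge Base might look roughly like the sketch below. It assumes the boto3 bedrock-agent-runtime client and its retrieve_and_generate call; the helper names, the us-east-1 region in the model ARN, and the choice of model are illustrative assumptions, not StackFlow's actual code.

```python
from typing import Any

KNOWLEDGE_BASE_ID = "BXJGG7PIPS"  # Knowledge Base ID named in the text above

# Assumption: the region is a guess; the model ID is the recommended one
# from the prerequisites list.
MODEL_ARN = (
    "arn:aws:bedrock:us-east-1::foundation-model/"
    "anthropic.claude-3-5-sonnet-20241022-v2:0"
)


def build_rag_request(
    question: str,
    kb_id: str = KNOWLEDGE_BASE_ID,
    model_arn: str = MODEL_ARN,
) -> dict[str, Any]:
    """Build the keyword arguments for a RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(client: Any, question: str) -> str:
    """Send the question and return the generated, KB-grounded answer text.

    `client` is expected to behave like boto3.client("bedrock-agent-runtime").
    """
    response = client.retrieve_and_generate(**build_rag_request(question))
    return response["output"]["text"]
```

Passing the client in keeps the helper easy to test with a stub; in a real deployment it would be created once with boto3.client("bedrock-agent-runtime") under the StackFlowAPIRole credentials.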
Model Selection
| Model | Use Case | Latency | Cost |
|---|---|---|---|
| Claude 3 Haiku | Classification, simple triage, cache-warmer | Fast (<1s) | Low |
| Claude 3 Sonnet | General AI tasks, copilot, article generation | Medium (1-3s) | Medium |
| Claude 3 Opus | Complex RCA, major incident analysis, code gen | Slow (3-8s) | High |
| Titan Embeddings v2 | Vector embedding generation | Very fast | Very low |
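To make the selection logic concrete, here is a minimal sketch of how a router could pick a tier from the table above. The task-type names, the numeric cost ranks, the Sonnet fallback, and the step-down strategy are all assumptions for illustration; the actual routing rules live in StackFlow_AIModelRouter and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_latency_s: float  # upper end of the latency range in the table
    cost_rank: int        # assumed ordering: 1 = low, 2 = medium, 3 = high


HAIKU = ModelTier("Claude 3 Haiku", 1.0, 1)
SONNET = ModelTier("Claude 3 Sonnet", 3.0, 2)
OPUS = ModelTier("Claude 3 Opus", 8.0, 3)

# Hypothetical task-type keys mapped to the tier the table suggests.
PREFERRED = {
    "classification": HAIKU,
    "triage": HAIKU,
    "copilot": SONNET,
    "article_generation": SONNET,
    "rca": OPUS,
    "code_generation": OPUS,
}


def select_model(task_type: str, max_latency_s: float, max_cost_rank: int) -> ModelTier:
    """Start at the preferred tier and step down until the constraints fit."""
    preferred = PREFERRED.get(task_type, SONNET)  # assumption: Sonnet as fallback
    # Only consider the preferred tier and cheaper ones, most capable first.
    candidates = [t for t in (OPUS, SONNET, HAIKU) if t.cost_rank <= preferred.cost_rank]
    for tier in candidates:
        if tier.max_latency_s <= max_latency_s and tier.cost_rank <= max_cost_rank:
            return tier
    return HAIKU  # cheapest, fastest tier when no candidate satisfies both limits
```

For example, select_model("rca", 5.0, 3) steps down from Opus to Sonnet because the Opus latency bound of 8 seconds exceeds the 5-second limit.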
AI Use Cases
StackFlow deploys AI across the entire platform. In ITSM, the AI engine classifies incoming incidents, suggests assignments, drafts resolution notes, and generates PIR summaries. In the Knowledge Base, it generates articles from incident history and improves existing content. In Cloud Management, it analyzes cost anomalies and generates optimization recommendations. In workflows, it acts as a dynamic router and decision node.
Token Budgets
Token budgets control AI spending per tenant and per request type. Budgets are configured in AI → Settings → Token Budgets and enforced by the model router. When usage approaches a budget, the router automatically downgrades to a cheaper model. When a budget is exceeded, requests are queued and processed once the budget resets. Budgets can reset hourly, daily, or monthly.
{
  "token_budgets": {
    "copilot_per_session": 50000,
    "incident_triage_per_incident": 2000,
    "article_generation_per_article": 10000,
    "daily_total_tenant": 1000000,
    "monthly_total_tenant": 20000000
  }
}
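The downgrade-then-queue behavior described above could be sketched as follows. The 80% threshold for "approaching" a budget is an assumption (no figure is given here), and the function and enum names are hypothetical.

```python
from enum import Enum


class BudgetAction(Enum):
    PROCEED = "proceed"      # enough headroom: use the requested model
    DOWNGRADE = "downgrade"  # budget is close: switch to a cheaper model
    QUEUE = "queue"          # budget exceeded: hold until the budget resets


# Assumption: "approached" means 80% of the budget would be consumed.
APPROACH_THRESHOLD = 0.8


def check_budget(
    used_tokens: int,
    requested_tokens: int,
    budget: int,
    threshold: float = APPROACH_THRESHOLD,
) -> BudgetAction:
    """Decide how to handle a request given current usage and its token cost."""
    projected = used_tokens + requested_tokens
    if projected > budget:
        return BudgetAction.QUEUE
    if projected >= threshold * budget:
        return BudgetAction.DOWNGRADE
    return BudgetAction.PROCEED


# Budget value taken from the configuration example above.
token_budgets = {"incident_triage_per_incident": 2000}
action = check_budget(1500, 200, token_budgets["incident_triage_per_incident"])
```

Checking the projected total rather than current usage alone means a single large request cannot push a tenant past the limit before the check takes effect.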
Guardrails
StackFlow enforces AI guardrails to prevent prompt injection, data leakage, and inappropriate responses. All user-provided input is sanitized before inclusion in prompts. System prompts include strict instructions to refuse requests outside the ITSM domain. Bedrock Guardrails are configured to filter harmful content, PII, and sensitive information from model outputs.
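As one illustration of input sanitization, the sketch below strips control characters, caps input length, and wraps user text in delimiters so the model can tell it apart from system instructions. This is a generic technique under assumed names and limits, not StackFlow's actual sanitizer, and it complements rather than replaces Bedrock Guardrails.

```python
import re

# Hypothetical delimiters marking where untrusted text begins and ends.
USER_OPEN = "<user_input>"
USER_CLOSE = "</user_input>"

MAX_INPUT_CHARS = 4000  # assumed cap; tune to the request type's token budget

# Control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_user_input(text: str, max_chars: int = MAX_INPUT_CHARS) -> str:
    """Remove control characters and delimiter look-alikes, then truncate."""
    cleaned = _CONTROL_CHARS.sub("", text)
    # Repeat until stable so nested fragments cannot reassemble a delimiter.
    previous = None
    while previous != cleaned:
        previous = cleaned
        cleaned = cleaned.replace(USER_OPEN, "").replace(USER_CLOSE, "")
    return cleaned.strip()[:max_chars]


def build_prompt(system_prompt: str, user_text: str) -> str:
    """Place sanitized user text inside delimiters after the system prompt."""
    safe = sanitize_user_input(user_text)
    return f"{system_prompt}\n\n{USER_OPEN}\n{safe}\n{USER_CLOSE}"
```

Removing the delimiters from user text prevents an input from closing its own wrapper and smuggling in text that appears to come from the system prompt.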