v2026.1 Open Portal ↗
On this page

AI Engine Overview

AI Architecture

StackFlow's cognitive AI engine is built on Amazon Bedrock, providing managed access to foundation models including Anthropic Claude (default), Amazon Titan, and Llama 3. The model router selects the optimal model for each request based on task type, latency requirements, and cost budget. All model interactions are logged to the AI Observability module for cost tracking and performance analysis.

⚙️ Minimum Requirements
  • DynamoDB: StackFlow_TenantAIConfig with at least one active AI configuration record
  • DynamoDB: StackFlow_AIModelRouter with routing rules for at least one intent type
  • Bedrock: At least one model enabled in account 373544523367 (recommended: anthropic.claude-3-5-sonnet-20241022-v2:0)
  • IAM: StackFlowAPIRole with bedrock:InvokeModel and bedrock-agent-runtime:RetrieveAndGenerate
  • Redis: stackflow-redis-prod accessible for semantic cache; auth token in Secrets Manager stackflow/redis/auth-token

The AI engine integrates with every module in StackFlow — from incident triage to article generation to workflow automation. It uses the Bedrock Knowledge Base (BXJGG7PIPS) for grounded responses and the exemplar learning system for few-shot context injection.

Model Selection

ModelUse CaseLatencyCost
Claude 3 HaikuClassification, simple triage, cache-warmerFast (<1s)Low
Claude 3 SonnetGeneral AI tasks, copilot, article generationMedium (1-3s)Medium
Claude 3 OpusComplex RCA, major incident analysis, code genSlow (3-8s)High
Titan Embeddings v2Vector embedding generationVery fastVery low

AI Use Cases

StackFlow deploys AI across the entire platform. In ITSM, the AI engine classifies incoming incidents, suggests assignments, drafts resolution notes, and generates PIR summaries. In the Knowledge Base, it generates articles from incident history and improves existing content. In Cloud Management, it analyzes cost anomalies and generates optimization recommendations. In workflows, it acts as a dynamic router and decision node.

Token Budgets

Token budgets control AI spending per tenant and per request type. Budgets are configured in AI → Settings → Token Budgets and enforced by the model router. When a budget is approached, the router automatically downgrades to a cheaper model. When a budget is exceeded, requests are queued and processed when the budget resets (hourly, daily, or monthly budgets are supported).

{
  "token_budgets": {
    "copilot_per_session": 50000,
    "incident_triage_per_incident": 2000,
    "article_generation_per_article": 10000,
    "daily_total_tenant": 1000000,
    "monthly_total_tenant": 20000000
  }
}
Semantic Cache: The semantic cache significantly reduces token consumption. When a semantically similar query has been answered recently (within the cache TTL), the cached response is returned without calling Bedrock. Cache hit rates typically reach 40-60% after 24 hours of operation.

Guardrails

StackFlow enforces AI guardrails to prevent prompt injection, data leakage, and inappropriate responses. All user-provided input is sanitized before inclusion in prompts. System prompts include strict instructions to refuse requests outside the ITSM domain. Bedrock Guardrails are configured to filter harmful content, PII, and sensitive information from model outputs.