AI Engine Overview

AI Architecture

StackFlow's cognitive AI engine is built on Amazon Bedrock, providing managed access to foundation models including Anthropic Claude (default), Amazon Titan, and Llama 3. The model router selects the optimal model for each request based on task type, latency requirements, and cost budget. All model interactions are logged to the AI Observability module for cost tracking and performance analysis.

⚙️ Minimum Requirements

DynamoDB: StackFlow_TenantAIConfig with at least one active AI configuration record
DynamoDB: StackFlow_AIModelRouter with routing rules for at least one intent type
Bedrock: At least one model enabled in account 373544523367 (recommended: anthropic.claude-3-5-sonnet-20241022-v2:0)
IAM: StackFlowAPIRole with bedrock:InvokeModel and bedrock-agent-runtime:RetrieveAndGenerate
Redis: stackflow-redis-prod accessible for semantic cache; auth token in Secrets Manager stackflow/redis/auth-token

The AI engine integrates with every module in StackFlow — from incident triage to article generation to workflow automation. It uses the Bedrock Knowledge Base (BXJGG7PIPS) for grounded responses and the exemplar learning system for few-shot context injection.

Model Selection

Model	Use Case	Latency	Cost
Claude 3 Haiku	Classification, simple triage, cache-warmer	Fast (<1s)	Low
Claude 3 Sonnet	General AI tasks, copilot, article generation	Medium (1-3s)	Medium
Claude 3 Opus	Complex RCA, major incident analysis, code gen	Slow (3-8s)	High
Titan Embeddings v2	Vector embedding generation	Very fast	Very low

AI Use Cases

StackFlow deploys AI across the entire platform. In ITSM, the AI engine classifies incoming incidents, suggests assignments, drafts resolution notes, and generates PIR summaries. In the Knowledge Base, it generates articles from incident history and improves existing content. In Cloud Management, it analyzes cost anomalies and generates optimization recommendations. In workflows, it acts as a dynamic router and decision node.

Token Budgets

Token budgets control AI spending per tenant and per request type. Budgets are configured in AI → Settings → Token Budgets and enforced by the model router. When a budget is approached, the router automatically downgrades to a cheaper model. When a budget is exceeded, requests are queued and processed when the budget resets (hourly, daily, or monthly budgets are supported).

{
  "token_budgets": {
    "copilot_per_session": 50000,
    "incident_triage_per_incident": 2000,
    "article_generation_per_article": 10000,
    "daily_total_tenant": 1000000,
    "monthly_total_tenant": 20000000
  }
}

Semantic Cache: The semantic cache significantly reduces token consumption. When a semantically similar query has been answered recently (within the cache TTL), the cached response is returned without calling Bedrock. Cache hit rates typically reach 40-60% after 24 hours of operation.

Guardrails

StackFlow enforces AI guardrails to prevent prompt injection, data leakage, and inappropriate responses. All user-provided input is sanitized before inclusion in prompts. System prompts include strict instructions to refuse requests outside the ITSM domain. Bedrock Guardrails are configured to filter harmful content, PII, and sensitive information from model outputs.

← Previous

RAG Configuration

Bedrock KB and vector search setup

AI Provider Configuration

Model providers and API keys