AI Endpoints

Overview

The StackFlow AI endpoints expose the cognitive engine over REST. They are powered by Amazon Bedrock and the StackFlow Knowledge Base (BXJGG7PIPS). All AI responses include model_used, tokens_used, and latency_ms for observability. Responses are semantically cached in Redis to reduce cost; cache hits return in under 50 ms.

Semantic Cache: AI endpoint responses are cached in ElastiCache Redis using vector similarity. A query whose cosine similarity to a cached query is at least 0.95 returns the cached response. Cache TTL is 1 hour for /ask and 30 minutes for /analyze.
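
The lookup rule above can be sketched in a few lines. This is a minimal in-memory illustration, not the service's implementation: `SemanticCache`, its methods, and the idea of passing embeddings as plain float lists are all assumptions made for the example, and the real cache lives in ElastiCache Redis.

```python
import math
import time


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory stand-in for the Redis-backed semantic cache."""

    def __init__(self, threshold=0.95, ttl_seconds=3600, clock=time.monotonic):
        self.threshold = threshold      # minimum similarity for a hit
        self.ttl_seconds = ttl_seconds  # 3600 for /ask, 1800 for /analyze
        self.clock = clock              # injectable so expiry can be tested
        self.entries = []               # (embedding, response, expires_at)

    def put(self, embedding, response):
        expires_at = self.clock() + self.ttl_seconds
        self.entries.append((embedding, response, expires_at))

    def get(self, embedding):
        """Return the most similar unexpired response, or None on a miss."""
        now = self.clock()
        self.entries = [e for e in self.entries if e[2] > now]
        best, best_score = None, self.threshold
        for cached_embedding, response, _ in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score >= best_score:
                best, best_score = response, score
        return best
```

A linear scan is used here only for clarity; a production cache would typically rely on a vector index rather than comparing against every entry.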

POST /api/ai/ask

Ask a natural language question against the Knowledge Base using RAG. Returns a cited answer grounded in your KB articles.

curl -X POST \
  https://your-instance.stackflow-tech.com/prod/api/ai/ask \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the procedure for a P1 major incident?",
    "context_record": {"type": "incident", "id": "INC0001234"},
    "max_kb_results": 5,
    "model": "claude-3-sonnet"
  }'

{
  "answer": "For a P1 major incident, follow these steps:\n1. Declare a major incident in StackFlow...\n2. Assemble the war room via Slack #incident-p1 channel...",
  "citations": [
    {"article_id": "KB0000456", "title": "Major Incident Procedure", "relevance": 0.97}
  ],
  "model_used": "anthropic.claude-3-sonnet-20240229-v1:0",
  "cache_hit": false,
  "tokens_used": 1240,
  "latency_ms": 1850
}
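
For callers that prefer Python over curl, the same request might look like the sketch below. It uses only the standard library; `build_ask_request` and `ask` are hypothetical helper names, and the base URL and token are supplied by the caller. The request fields and their defaults follow the example above.

```python
import json
import urllib.request


def build_ask_request(base_url, token, question, context_record=None,
                      max_kb_results=5, model="claude-3-sonnet"):
    """Build (but do not send) a POST request for /api/ai/ask."""
    payload = {
        "question": question,
        "max_kb_results": max_kb_results,
        "model": model,
    }
    if context_record is not None:
        payload["context_record"] = context_record
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ai/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(base_url, token, question, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_ask_request(base_url, token, question, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, assuming `TOKEN` holds a valid bearer token: `ask("https://your-instance.stackflow-tech.com/prod", TOKEN, "What is the procedure for a P1 major incident?")`. The returned dict can then be inspected for `answer`, `citations`, and `cache_hit`.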

POST /api/ai/analyze

Analyze a specific ITSM record and return AI insights: suggested root cause, resolution steps, related records, and risk assessment.

curl -X POST \
  https://your-instance.stackflow-tech.com/prod/api/ai/analyze \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "record_type": "incident",
    "record_id": "INC0001234",
    "analysis_type": "root_cause"
  }'

{
  "record_id": "INC0001234",
  "analysis_type": "root_cause",
  "findings": {
    "probable_root_cause": "Aurora connection pool exhaustion due to missing MAX_POOL_SIZE env var",
    "confidence": 0.87,
    "resolution_steps": [
      "Set MAX_POOL_SIZE=10 in StackFlowAPI Lambda environment variables",
      "Deploy Lambda update and monitor connection count for 15 minutes"
    ],
    "related_problem": "PRB0000123",
    "similar_past_incidents": ["INC0001180", "INC0001195"],
    "estimated_resolution_minutes": 30
  },
  "model_used": "anthropic.claude-3-sonnet-20240229-v1:0",
  "tokens_used": 2140,
  "latency_ms": 2300
}
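
One way a client might consume this response is to turn the findings into a work note, but only when the model's confidence is high enough to be worth surfacing. The helper name `format_work_note` and the 0.8 cutoff are assumptions for illustration; the API itself does not prescribe a threshold.

```python
def format_work_note(analysis, min_confidence=0.8):
    """Render /analyze findings as plain text, or None if confidence is low.

    `analysis` is the decoded JSON response; `min_confidence` is an
    illustrative cutoff, not a value defined by the API.
    """
    findings = analysis["findings"]
    if findings["confidence"] < min_confidence:
        return None
    lines = [
        f"AI {analysis['analysis_type']} analysis for {analysis['record_id']}",
        f"Probable root cause: {findings['probable_root_cause']} "
        f"(confidence {findings['confidence']:.2f})",
        "Suggested steps:",
    ]
    lines += [
        f"{number}. {step}"
        for number, step in enumerate(findings["resolution_steps"], start=1)
    ]
    return "\n".join(lines)
```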

POST /api/ai/exemplars

Retrieve few-shot exemplars (past resolved cases similar to the current record) for agent training and AI context injection.

curl -X POST \
  https://your-instance.stackflow-tech.com/prod/api/ai/exemplars \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "record_type": "incident",
    "description": "Aurora database connection timeouts under load",
    "category": "database",
    "top_k": 5
  }'

{
  "exemplars": [
    {
      "incident_id": "INC0001180",
      "description": "Aurora connection pool exhausted after Lambda concurrency spike",
      "resolution": "Increased MAX_POOL_SIZE, deployed Lambda update",
      "resolution_time_min": 25,
      "similarity_score": 0.93
    }
  ],
  "count": 1
}
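
To illustrate the "AI context injection" use case, the sketch below assembles returned exemplars into a few-shot prompt. The function name `build_few_shot_prompt`, the prompt layout, and the 0.8 similarity cutoff are assumptions; the document does not specify how StackFlow formats injected context.

```python
def build_few_shot_prompt(exemplars, new_description, min_similarity=0.8):
    """Format resolved cases as few-shot examples, most similar first.

    `exemplars` is the "exemplars" list from the response. Entries below
    `min_similarity` (an illustrative cutoff) are left out.
    """
    selected = sorted(
        (e for e in exemplars if e["similarity_score"] >= min_similarity),
        key=lambda e: e["similarity_score"],
        reverse=True,
    )
    blocks = [
        f"Incident: {e['description']}\nResolution: {e['resolution']}"
        for e in selected
    ]
    # The open-ended final block is what the model is asked to complete.
    blocks.append(f"Incident: {new_description}\nResolution:")
    return "\n\n".join(blocks)
```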

POST /api/ai/suggest

Get AI suggestions for incident categorization, assignment, and priority based on the short description.

curl -X POST \
  https://your-instance.stackflow-tech.com/prod/api/ai/suggest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Redis cache hit rate dropped to 5% - all app requests slow",
    "suggest_for": ["category", "priority", "assignment_group"]
  }'

{
  "suggestions": {
    "category": {"value": "cache", "confidence": 0.95},
    "subcategory": {"value": "elasticache", "confidence": 0.91},
    "priority": {"value": "P1", "confidence": 0.88, "reasoning": "Cache miss affects all users"},
    "assignment_group": {"value": "Platform Engineering", "confidence": 0.92}
  }
}
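
Because each suggestion carries its own confidence, a client can auto-apply only the fields it trusts and leave the rest for a human. The sketch below shows one such policy; `accept_suggestions` and the 0.9 cutoff are hypothetical and not part of the API.

```python
def accept_suggestions(response, min_confidence=0.9):
    """Return {field: value} for suggestions at or above the cutoff.

    `response` is the decoded /suggest response; `min_confidence` is an
    illustrative client-side policy, not an API parameter.
    """
    return {
        field: suggestion["value"]
        for field, suggestion in response["suggestions"].items()
        if suggestion["confidence"] >= min_confidence
    }
```

Applied to the response above with the default cutoff, this keeps `category`, `subcategory`, and `assignment_group` and drops `priority` (confidence 0.88).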

Model Selection

Model Alias	Bedrock Model ID	Best For	Relative Cost
`claude-3-haiku`	anthropic.claude-3-haiku-20240307-v1:0	Simple Q&A, classification	Low
`claude-3-sonnet`	anthropic.claude-3-sonnet-20240229-v1:0	Balanced — default for most tasks	Medium
`claude-3-opus`	anthropic.claude-3-opus-20240229-v1:0	Complex analysis, long documents	High
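
Client code that needs to correlate the `model` alias it sent with the `model_used` value it receives can keep the table above as a lookup. The names `MODEL_ALIASES`, `DEFAULT_ALIAS`, and `resolve_model` are illustrative; the alias-to-ID pairs are copied from the table.

```python
MODEL_ALIASES = {
    "claude-3-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
    "claude-3-sonnet": "anthropic.claude-3-sonnet-20240229-v1:0",
    "claude-3-opus": "anthropic.claude-3-opus-20240229-v1:0",
}
DEFAULT_ALIAS = "claude-3-sonnet"


def resolve_model(alias=None):
    """Map a model alias to its Bedrock model ID (default when alias is None)."""
    if alias is None:
        alias = DEFAULT_ALIAS
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        known = ", ".join(sorted(MODEL_ALIASES))
        raise ValueError(f"unknown model alias {alias!r}; expected one of: {known}")
```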

Rate Limits & Cache TTLs

Endpoint	Rate Limit	Cache TTL
`/api/ai/ask`	60 req/min per user	1 hour (semantic)
`/api/ai/analyze`	30 req/min per user	30 min (record-keyed)
`/api/ai/exemplars`	120 req/min per user	15 min
`/api/ai/suggest`	120 req/min per user	5 min
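
A client can stay under these limits by throttling itself before sending. The sketch below is a simple sliding-window limiter; it assumes the limits are counted over a rolling 60-second window, which this document does not confirm, and `SlidingWindowLimiter` is a hypothetical client-side helper rather than anything StackFlow provides.

```python
import collections
import time


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_requests` per rolling window."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                     # injectable for testing
        self.timestamps = collections.deque()  # send times inside the window

    def try_acquire(self):
        """Record a send and return True if allowed, otherwise return False."""
        now = self.clock()
        # Forget sends that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False


# One limiter per endpoint, using the per-user limits from the table above.
LIMITERS = {
    "/api/ai/ask": SlidingWindowLimiter(60),
    "/api/ai/analyze": SlidingWindowLimiter(30),
    "/api/ai/exemplars": SlidingWindowLimiter(120),
    "/api/ai/suggest": SlidingWindowLimiter(120),
}
```

A caller would check `LIMITERS[path].try_acquire()` before each request and wait or queue the call when it returns False.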

Field Reference

Field	Type	Description
`model`	string	Optional: claude-3-haiku, claude-3-sonnet (default), claude-3-opus
`max_kb_results`	integer	/ask only: number of KB chunks to retrieve (default 5, max 20)
`context_record`	object	Optional: {type, id} of related ITSM record for context enrichment
`tokens_used`	integer	Response: total Bedrock tokens consumed
`cache_hit`	boolean	Response: true if served from semantic cache
`latency_ms`	integer	Response: total API latency in milliseconds
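
The request-side constraints in the table can be checked before a call is made. The sketch below validates an /ask payload; `validate_ask_payload` is a hypothetical helper, and two of its rules are assumptions not stated in the table: that `question` is required and that `max_kb_results` must be at least 1.

```python
ALLOWED_MODELS = ("claude-3-haiku", "claude-3-sonnet", "claude-3-opus")


def validate_ask_payload(payload):
    """Return a list of problems with an /ask request body (empty if valid)."""
    errors = []

    # Assumption: a non-empty question is required.
    question = payload.get("question")
    if not isinstance(question, str) or not question.strip():
        errors.append("question must be a non-empty string")

    model = payload.get("model", "claude-3-sonnet")
    if model not in ALLOWED_MODELS:
        errors.append("model must be one of " + ", ".join(ALLOWED_MODELS))

    # The table gives default 5 and max 20; the lower bound of 1 is assumed.
    max_kb = payload.get("max_kb_results", 5)
    if isinstance(max_kb, bool) or not isinstance(max_kb, int) or not 1 <= max_kb <= 20:
        errors.append("max_kb_results must be an integer from 1 to 20")

    context = payload.get("context_record")
    if context is not None:
        if not isinstance(context, dict) or not set(context) >= {"type", "id"}:
            errors.append("context_record must be an object with type and id")

    return errors
```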
