AI Endpoints
Overview
The StackFlow AI endpoints expose the cognitive engine over REST. They are backed by Amazon Bedrock and the StackFlow Knowledge Base (BXJGG7PIPS). All AI responses include model_used, tokens_used, and latency_ms for observability. Responses are semantically cached in Redis to reduce cost; cache hits return in under 50ms.
POST /api/ai/ask
Ask a natural-language question against the Knowledge Base using retrieval-augmented generation (RAG). Returns an answer grounded in your KB articles, with citations to the articles used.
curl -X POST https://your-instance.stackflow-tech.com/prod/api/ai/ask -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
"question": "What is the procedure for a P1 major incident?",
"context_record": {"type": "incident", "id": "INC0001234"},
"max_kb_results": 5,
"model": "claude-3-sonnet"
}'
{
"answer": "For a P1 major incident, follow these steps:\n1. Declare a major incident in StackFlow...\n2. Assemble the war room via Slack #incident-p1 channel...",
"citations": [
{"article_id": "KB0000456", "title": "Major Incident Procedure", "relevance": 0.97}
],
"model_used": "anthropic.claude-3-sonnet-20240229-v1:0",
"cache_hit": false,
"tokens_used": 1240,
"latency_ms": 1850
}
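The same call can be made from a client script. What follows is a minimal Python sketch, not an official SDK: it builds the request with the standard library and formats the citations from a response shaped like the one above. The helper names `build_ask_request` and `format_citations` are hypothetical, and the base URL is the placeholder host from the curl example.

```python
import json
import urllib.request

# Placeholder host taken from the curl example; replace with your instance.
BASE_URL = "https://your-instance.stackflow-tech.com/prod"


def build_ask_request(token, question, context_record=None,
                      max_kb_results=5, model="claude-3-sonnet"):
    """Build (but do not send) a POST /api/ai/ask request."""
    payload = {
        "question": question,
        "max_kb_results": max_kb_results,
        "model": model,
    }
    # context_record is optional, so only include it when provided.
    if context_record is not None:
        payload["context_record"] = context_record
    return urllib.request.Request(
        BASE_URL + "/api/ai/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def format_citations(response):
    """Render each citation as 'article_id: title (relevance)'."""
    return [
        "{}: {} ({:.2f})".format(c["article_id"], c["title"], c["relevance"])
        for c in response.get("citations", [])
    ]
```

To send the request, pass the result to `urllib.request.urlopen` and decode the body with `json.load`. Keeping construction separate from sending makes the payload easy to inspect or unit-test without network access.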
POST /api/ai/analyze
Analyze a specific ITSM record and return AI insights: suggested root cause, resolution steps, related records, and risk assessment.
curl -X POST https://your-instance.stackflow-tech.com/prod/api/ai/analyze -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
"record_type": "incident",
"record_id": "INC0001234",
"analysis_type": "root_cause"
}'
{
"record_id": "INC0001234",
"analysis_type": "root_cause",
"findings": {
"probable_root_cause": "Aurora connection pool exhaustion due to missing MAX_POOL_SIZE env var",
"confidence": 0.87,
"resolution_steps": [
"Set MAX_POOL_SIZE=10 in StackFlowAPI Lambda environment variables",
"Deploy Lambda update and monitor connection count for 15 minutes"
],
"related_problem": "PRB0000123",
"similar_past_incidents": ["INC0001180", "INC0001195"],
"estimated_resolution_minutes": 30
},
"model_used": "anthropic.claude-3-sonnet-20240229-v1:0",
"tokens_used": 2140,
"latency_ms": 2300
}
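A client will often want to act on the analysis only when the model is reasonably confident. The Python sketch below shows one way to turn the `findings` object into a work note. The function name `summarize_findings` and the 0.8 default threshold are illustrative assumptions, not part of the API.

```python
def summarize_findings(response, min_confidence=0.8):
    """Return a work-note string, or None when confidence is below the threshold.

    `min_confidence` is a client-side choice (an assumption for this sketch);
    the API itself only reports the confidence value.
    """
    findings = response["findings"]
    if findings["confidence"] < min_confidence:
        return None
    lines = [
        "Probable root cause ({:.0%} confidence): {}".format(
            findings["confidence"], findings["probable_root_cause"]
        )
    ]
    # Number the resolution steps in the order the API returned them.
    for number, step in enumerate(findings.get("resolution_steps", []), start=1):
        lines.append("{}. {}".format(number, step))
    return "\n".join(lines)
```

Returning `None` for low-confidence results lets the caller fall back to manual triage instead of posting a weak suggestion to the record.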
POST /api/ai/exemplars
Retrieve few-shot exemplars (past resolved cases similar to the current record) for agent training and AI context injection.
curl -X POST https://your-instance.stackflow-tech.com/prod/api/ai/exemplars -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
"record_type": "incident",
"description": "Aurora database connection timeouts under load",
"category": "database",
"top_k": 5
}'
{
"exemplars": [
{
"incident_id": "INC0001180",
"description": "Aurora connection pool exhausted after Lambda concurrency spike",
"resolution": "Increased MAX_POOL_SIZE, deployed Lambda update",
"resolution_time_min": 25,
"similarity_score": 0.93
}
],
"count": 1
}
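For AI context injection, the returned exemplars usually need to be flattened into text. The sketch below is one possible approach in Python: it orders exemplars by `similarity_score` and renders each as a short problem/resolution block. The function name `build_few_shot_context`, the `min_similarity` parameter, and the output layout are assumptions made for illustration.

```python
def build_few_shot_context(exemplars, min_similarity=0.0):
    """Render exemplar records as a plain-text few-shot block, most similar first."""
    kept = [e for e in exemplars if e["similarity_score"] >= min_similarity]
    kept.sort(key=lambda e: e["similarity_score"], reverse=True)
    blocks = []
    for e in kept:
        # Field names match the example response above.
        blocks.append(
            "Past incident {incident_id} (resolved in {resolution_time_min} min)\n"
            "Problem: {description}\n"
            "Resolution: {resolution}".format(**e)
        )
    # Blank line between exemplars; empty string when nothing qualifies.
    return "\n\n".join(blocks)
```

The resulting string can be prepended to a prompt or shown to an agent as reference material.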
POST /api/ai/suggest
Get AI suggestions for incident categorization, assignment, and priority based on the short description.
curl -X POST https://your-instance.stackflow-tech.com/prod/api/ai/suggest -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
"text": "Redis cache hit rate dropped to 5% - all app requests slow",
"suggest_for": ["category", "priority", "assignment_group"]
}'
{
"suggestions": {
"category": {"value": "cache", "confidence": 0.95},
"subcategory": {"value": "elasticache", "confidence": 0.91},
"priority": {"value": "P1", "confidence": 0.88, "reasoning": "Cache miss affects all users"},
"assignment_group": {"value": "Platform Engineering", "confidence": 0.92}
}
}
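Because every suggestion carries its own confidence, a client can auto-apply the strong ones and leave the rest for a human. The following Python sketch illustrates that split. The function name `accept_suggestions` and the 0.9 default threshold are assumptions, not API behavior.

```python
def accept_suggestions(response, threshold=0.9):
    """Split suggestions into auto-applied values and fields needing review.

    Returns (accepted, needs_review), where `accepted` maps field name to the
    suggested value and `needs_review` is a sorted list of field names.
    """
    accepted = {}
    needs_review = []
    for field, suggestion in response["suggestions"].items():
        if suggestion["confidence"] >= threshold:
            accepted[field] = suggestion["value"]
        else:
            needs_review.append(field)
    return accepted, sorted(needs_review)
```

Applied to the example response above with the default threshold, `category`, `subcategory`, and `assignment_group` would be accepted, while `priority` (confidence 0.88) would be left for review.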
Model Selection
| Model Alias | Bedrock Model ID | Best For | Relative Cost |
|---|---|---|---|
| claude-3-haiku | anthropic.claude-3-haiku-20240307-v1:0 | Simple Q&A, classification | Low |
| claude-3-sonnet | anthropic.claude-3-sonnet-20240229-v1:0 | Balanced; default for most tasks | Medium |
| claude-3-opus | anthropic.claude-3-opus-20240229-v1:0 | Complex analysis, long documents | High |
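A client can encode the table as a lookup and choose an alias per task. In the Python sketch below, the alias-to-ID mapping comes straight from the table, while the task labels (`simple_qa`, `classification`, `complex_analysis`, `long_document`) and the `pick_model` helper are hypothetical names chosen for illustration.

```python
# Alias -> Bedrock model ID, copied from the Model Selection table.
MODEL_IDS = {
    "claude-3-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
    "claude-3-sonnet": "anthropic.claude-3-sonnet-20240229-v1:0",
    "claude-3-opus": "anthropic.claude-3-opus-20240229-v1:0",
}

DEFAULT_MODEL = "claude-3-sonnet"


def pick_model(task):
    """Return the alias suggested by the 'Best For' column for a coarse task label.

    The task labels are assumptions made for this sketch; unknown labels
    fall back to the default alias.
    """
    if task in ("simple_qa", "classification"):
        return "claude-3-haiku"
    if task in ("complex_analysis", "long_document"):
        return "claude-3-opus"
    return DEFAULT_MODEL
```

The alias returned by `pick_model` is what goes in the request's `model` field; `MODEL_IDS` is useful for matching it against the `model_used` value in responses.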
Rate Limits & Caching
| Endpoint | Rate Limit | Cache TTL |
|---|---|---|
| /api/ai/ask | 60 req/min per user | 1 hour (semantic) |
| /api/ai/analyze | 30 req/min per user | 30 min (record-keyed) |
| /api/ai/exemplars | 120 req/min per user | 15 min |
| /api/ai/suggest | 120 req/min per user | 5 min |
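To stay under these limits, a client can throttle itself before sending. The Python sketch below uses a sliding 60-second window per endpoint. How the server actually measures its window is not stated here, so the rolling-window behavior, the class name `SlidingWindowLimiter`, and the injectable `clock` parameter are all assumptions of this example.

```python
import time
from collections import deque

# Requests per minute per user, copied from the table above.
RATE_LIMITS = {
    "/api/ai/ask": 60,
    "/api/ai/analyze": 30,
    "/api/ai/exemplars": 120,
    "/api/ai/suggest": 120,
}


class SlidingWindowLimiter:
    """Client-side guard that tracks send times per endpoint."""

    def __init__(self, limits=RATE_LIMITS, window_seconds=60.0, clock=time.monotonic):
        self.limits = limits
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = {path: deque() for path in limits}

    def try_acquire(self, path):
        """Record a send and return True if `path` is under its limit, else False."""
        now = self.clock()
        timestamps = self.sent[path]
        # Drop sends that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limits[path]:
            return False
        timestamps.append(now)
        return True
```

Call `try_acquire(path)` before each request and back off when it returns `False`. This does not replace handling of server-side rejections, since other clients using the same user credentials count toward the same limit.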
Field Reference
| Field | Type | Description |
|---|---|---|
| model | string | Optional: claude-3-haiku, claude-3-sonnet (default), claude-3-opus |
| max_kb_results | integer | /ask only: number of KB chunks to retrieve (default 5, max 20) |
| context_record | object | Optional: {type, id} of related ITSM record for context enrichment |
| tokens_used | integer | Response: total Bedrock tokens consumed |
| cache_hit | boolean | Response: true if served from semantic cache |
| latency_ms | integer | Response: total API latency in milliseconds |
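The request-side constraints in this table can be checked on the client before a call is made. The Python sketch below validates an /api/ai/ask payload against them. The function name `validate_ask_payload` is hypothetical, and the lower bound of 1 for `max_kb_results` is an assumption, since the table only states a default and a maximum.

```python
ALLOWED_MODELS = ("claude-3-haiku", "claude-3-sonnet", "claude-3-opus")


def validate_ask_payload(payload):
    """Return a list of problems found in an /api/ai/ask payload (empty if none)."""
    problems = []

    # model is optional and defaults to claude-3-sonnet.
    model = payload.get("model", "claude-3-sonnet")
    if model not in ALLOWED_MODELS:
        problems.append("model must be one of: " + ", ".join(ALLOWED_MODELS))

    # max_kb_results defaults to 5 with a documented maximum of 20.
    # The minimum of 1 is an assumption made for this sketch.
    k = payload.get("max_kb_results", 5)
    if isinstance(k, bool) or not isinstance(k, int) or not 1 <= k <= 20:
        problems.append("max_kb_results must be an integer between 1 and 20")

    # context_record is optional; when present it needs both type and id.
    ctx = payload.get("context_record")
    if ctx is not None:
        if not isinstance(ctx, dict) or not all(key in ctx for key in ("type", "id")):
            problems.append("context_record must be an object with type and id")

    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem with the payload at once.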