RAG Configuration

Bedrock KB Overview

StackFlow's Retrieval-Augmented Generation (RAG) pipeline is built on Amazon Bedrock Knowledge Bases. The Knowledge Base ID is BXJGG7PIPS (named StackFlow-KnowledgeBase). Documents are embedded using Amazon Titan Embeddings v2 at 1024 dimensions and stored in an OpenSearch Serverless collection for sub-100ms semantic search.

⚙️ Minimum Requirements
  • Bedrock KB: BXJGG7PIPS with status ACTIVE and at least one completed ingestion job
  • S3 Bucket: stackflow-kb-documents-373544523367 with at least one document in runbooks/, policies/, or sla/ prefix
  • OpenSearch: Serverless collection q3oso7unldm9p4xsqez4 in ACTIVE state
  • Embedding Model: amazon.titan-embed-text-v2:0 enabled in Bedrock for account 373544523367
  • IAM: StackFlowBedrockKBRole with s3:GetObject on the KB documents bucket
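
The first two requirements above can be checked with a short preflight script. This is a minimal sketch, not an official StackFlow tool: the function names are hypothetical, and it assumes the caller's credentials can call `bedrock-agent:GetKnowledgeBase` and `s3:ListBucket`.

```python
KB_ID = "BXJGG7PIPS"
BUCKET = "stackflow-kb-documents-373544523367"
PREFIXES = ("runbooks/", "policies/", "sla/")


def kb_is_active(bedrock_agent, kb_id=KB_ID):
    """Return True when the knowledge base reports status ACTIVE."""
    kb = bedrock_agent.get_knowledge_base(knowledgeBaseId=kb_id)
    return kb["knowledgeBase"]["status"] == "ACTIVE"


def populated_prefixes(s3, bucket=BUCKET, prefixes=PREFIXES):
    """Return the prefixes that hold at least one object.

    Note: a zero-byte "folder" placeholder object also counts as an object.
    """
    found = []
    for prefix in prefixes:
        page = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        if page.get("KeyCount", 0) > 0:
            found.append(prefix)
    return found


def run_preflight():
    """Hypothetical entry point: builds real clients and prints the checks."""
    import boto3  # imported here so the helpers above stay testable offline

    agent = boto3.client("bedrock-agent", region_name="us-east-1")
    s3 = boto3.client("s3")
    print("KB active:", kb_is_active(agent))
    print("Populated prefixes:", populated_prefixes(s3))
```

Calling `run_preflight()` with valid credentials prints both checks. The OpenSearch, embedding model, and IAM requirements are not covered by this sketch.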

The RAG pipeline operates in two phases: ingestion (documents → embeddings → OpenSearch) and retrieval (query → embedding → vector search → context assembly → LLM generation). Both phases are managed by Bedrock and require no custom infrastructure.
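
For the retrieval phase, the full query-to-answer path can be exercised with the `retrieve_and_generate` call on the `bedrock-agent-runtime` client. The sketch below is illustrative only: the helper names are hypothetical, and the model ARN is a placeholder because this page does not state which generation model StackFlow uses.

```python
def build_rag_request(query, kb_id, model_arn):
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,  # placeholder: supply your own model ARN
            },
        },
    }


def answer(bedrock_rt, query, kb_id, model_arn):
    """Run retrieval plus generation and return only the generated text."""
    request = build_rag_request(query, kb_id, model_arn)
    response = bedrock_rt.retrieve_and_generate(**request)
    return response["output"]["text"]
```

With a client created as `boto3.client('bedrock-agent-runtime', region_name='us-east-1')`, a call such as `answer(client, "What is the P1 SLA?", "BXJGG7PIPS", "YOUR_MODEL_ARN")` returns the generated answer as a string.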

Embedding Configuration

Setting           | Value                              | Notes
Embedding Model   | Amazon Titan Embeddings v2         | 1024-dimensional vectors
Vector Database   | OpenSearch Serverless              | Managed by Bedrock
KB ID             | BXJGG7PIPS                         | us-east-1 region
Sync Frequency    | Automatic (S3 event-driven)        | New articles indexed within 60s
Maximum Documents | Unlimited (OpenSearch Serverless)  | Storage costs scale with corpus size

Chunking Strategy

Documents are chunked before embedding so that each piece fits within the embedding model's input limit. StackFlow uses a hierarchical chunking strategy: documents are split first at H2 heading boundaries and then by paragraph, with a maximum chunk size of 1024 tokens and a 128-token overlap between consecutive chunks. This preserves semantic coherence while ensuring each chunk can be retrieved independently.
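
The strategy described above can be approximated in a few lines of Python. This is a simplified sketch rather than the chunker Bedrock or StackFlow actually runs: it counts whitespace-separated words as a stand-in for model tokens, assumes Markdown-style `## ` headings, and uses hypothetical function names.

```python
import re


def split_sections(text):
    """Split text at H2 headings ('## '), keeping each heading with its body."""
    parts = re.split(r"(?m)^(?=## )", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_section(section, max_tokens=1024, overlap=128):
    """Pack paragraphs into chunks of at most max_tokens words.

    Each new chunk starts with the last `overlap` words of the previous one.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    chunks, current, fresh = [], [], 0  # fresh = words not yet emitted
    for para in re.split(r"\n\s*\n", section):
        tokens = para.split()
        if not tokens:
            continue
        # Close the current chunk if this paragraph would overflow it.
        if current and len(current) + len(tokens) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
            fresh = 0
        current = current + tokens
        fresh += len(tokens)
        # A single oversized paragraph is cut with a sliding window.
        while len(current) > max_tokens:
            chunks.append(" ".join(current[:max_tokens]))
            current = current[max_tokens - overlap:]
            fresh = len(current) - overlap
    if fresh > 0:
        chunks.append(" ".join(current))
    return chunks


def chunk_document(text, max_tokens=1024, overlap=128):
    """Chunk every H2 section independently so chunks never span sections."""
    chunks = []
    for section in split_sections(text):
        chunks.extend(chunk_section(section, max_tokens, overlap))
    return chunks
```

Chunking each section separately means overlap is only carried between chunks of the same section, which keeps unrelated headings out of a chunk at the cost of some very short chunks for short sections.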

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Trigger a sync of the knowledge base data source
response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId='BXJGG7PIPS',
    dataSourceId='YOUR_DATA_SOURCE_ID',
    description='Manual sync after bulk article update'
)
print(f"Ingestion job ID: {response['ingestionJob']['ingestionJobId']}")
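
An ingestion job runs asynchronously, so a manual sync is usually followed by a status check. The sketch below polls `get_ingestion_job` until the job reaches a terminal state. The function name, polling interval, and timeout are assumptions chosen for illustration, not StackFlow defaults.

```python
import time

# States after which the job will not change any further.
TERMINAL_STATES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(bedrock_agent, kb_id, data_source_id, job_id,
                       poll_seconds=10, timeout_seconds=900, sleep=time.sleep):
    """Poll an ingestion job and return its final status string.

    Raises TimeoutError if the job is still running after timeout_seconds.
    """
    waited = 0
    while True:
        job = bedrock_agent.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATES:
            return job["status"]
        if waited >= timeout_seconds:
            raise TimeoutError(f"Ingestion job {job_id} still {job['status']}")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so the loop can be tested without real delays. In normal use, pass the `bedrock-agent` client and the job ID returned by `start_ingestion_job`.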

Retrieval Settings

The retrieval configuration controls how many chunks are returned per query and the minimum similarity score a chunk must reach to be included. StackFlow defaults to retrieving the top 5 chunks with a minimum similarity score of 0.7. Both values can be tuned per use case: the AI Copilot retrieves the top 10 chunks, while the auto-triage engine retrieves the top 3 for faster response times.

Score Threshold: Setting the minimum similarity score too high (above 0.85) may result in no results being returned for novel queries. Setting it too low (below 0.6) may introduce irrelevant context that confuses the LLM. The default 0.7 threshold is calibrated for StackFlow's typical knowledge base content.
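
One way to apply these settings is to keep a small table of per-use-case profiles and filter retrieved chunks by score in application code. This is a sketch under assumptions: the profile names are hypothetical, the threshold is assumed to be applied client-side to the `retrievalResults` list, and the Copilot and auto-triage profiles are assumed to keep the default 0.7 threshold because this page only states their chunk counts.

```python
# Hypothetical profiles; only the top_k values and the 0.7 default come from this page.
RETRIEVAL_PROFILES = {
    "default": {"top_k": 5, "min_score": 0.7},
    "copilot": {"top_k": 10, "min_score": 0.7},
    "auto_triage": {"top_k": 3, "min_score": 0.7},
}


def filter_results(results, profile="default"):
    """Drop chunks below the profile's threshold and keep the best top_k."""
    cfg = RETRIEVAL_PROFILES[profile]
    # Results without a score are treated as 0.0 and therefore dropped.
    kept = [r for r in results if r.get("score", 0.0) >= cfg["min_score"]]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[: cfg["top_k"]]
```

When using this pattern, request at least `top_k` results from the knowledge base (`numberOfResults`) so the filter has enough candidates to choose from.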

Testing RAG Quality

import boto3

bedrock_rt = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Test retrieval quality directly
response = bedrock_rt.retrieve(
    knowledgeBaseId='BXJGG7PIPS',
    retrievalQuery={'text': 'How do I reset a user password in Cognito?'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID'
        }
    }
)

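for result in response['retrievalResults']:
    # 'score' is optional in the response, so fall back to 0.0 instead of failing on formatting
    score = result.get('score', 0.0)
    print(f"Score: {score:.3f} | {result['content']['text'][:100]}...")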
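
Beyond spot-checking single queries, retrieval quality can be tracked with a small set of queries whose expected source document is known. The sketch below computes a simple hit rate. The helper names and the test cases are hypothetical, and it assumes each result carries its source under `location.s3Location.uri`, as S3-backed knowledge bases return.

```python
def source_uris(retrieval_results):
    """Collect the S3 URIs of the documents behind each retrieved chunk."""
    uris = []
    for result in retrieval_results:
        uri = result.get("location", {}).get("s3Location", {}).get("uri")
        if uri:
            uris.append(uri)
    return uris


def hit_rate(cases, retrieve_fn):
    """Fraction of cases where the expected document appears in the results.

    cases: list of (query, expected_uri_substring) pairs.
    retrieve_fn: callable taking a query and returning a retrievalResults list.
    """
    if not cases:
        return 0.0
    hits = 0
    for query, expected in cases:
        if any(expected in uri for uri in source_uris(retrieve_fn(query))):
            hits += 1
    return hits / len(cases)
```

Here `retrieve_fn` would typically wrap the `retrieve` call and return `response['retrievalResults']`. A case list might pair a password-reset question with a path fragment such as `runbooks/`, though the right pairs depend on the documents actually present in the bucket.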