RAG Configuration
Bedrock KB Overview
StackFlow's Retrieval-Augmented Generation (RAG) pipeline is built on Amazon Bedrock Knowledge Bases. The Knowledge Base ID is BXJGG7PIPS (named StackFlow-KnowledgeBase). Documents are embedded using Amazon Titan Embeddings v2 at 1024 dimensions and stored in an OpenSearch Serverless collection for sub-100ms semantic search.
- Bedrock KB:
BXJGG7PIPSwith statusACTIVEand at least one completed ingestion job - S3 Bucket:
stackflow-kb-documents-373544523367with at least one document inrunbooks/,policies/, orsla/prefix - OpenSearch: Serverless collection
q3oso7unldm9p4xsqez4inACTIVEstate - Embedding Model:
amazon.titan-embed-text-v2:0enabled in Bedrock for account373544523367 - IAM:
StackFlowBedrockKBRolewiths3:GetObjecton the KB documents bucket
The RAG pipeline operates in two phases: ingestion (documents → embeddings → OpenSearch) and retrieval (query → embedding → vector search → context assembly → LLM generation). Both phases are managed by Bedrock and require no custom infrastructure.
Embedding Configuration
| Setting | Value | Notes |
|---|---|---|
| Embedding Model | Amazon Titan Embeddings v2 | 1024-dimensional vectors |
| Vector Database | OpenSearch Serverless | Managed by Bedrock |
| KB ID | BXJGG7PIPS | us-east-1 region |
| Sync Frequency | Automatic (S3 event-driven) | New articles indexed within 60s |
| Maximum Documents | Unlimited (OpenSearch Serverless) | Storage costs scale with corpus size |
Chunking Strategy
Documents are chunked before embedding to fit within the model's context window. StackFlow uses a hierarchical chunking strategy: first splitting by H2 heading boundaries, then by paragraph, with a maximum chunk size of 1024 tokens and a 128-token overlap between chunks. This preserves semantic coherence while ensuring each chunk can be retrieved independently.
import boto3
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')
# Trigger a sync of the knowledge base data source
response = bedrock_agent.start_ingestion_job(
knowledgeBaseId='BXJGG7PIPS',
dataSourceId='YOUR_DATA_SOURCE_ID',
description='Manual sync after bulk article update'
)
print(f"Ingestion job ID: {response['ingestionJob']['ingestionJobId']}")
Retrieval Settings
The retrieval configuration controls how many chunks are returned per query and the minimum relevance score threshold. StackFlow defaults to retrieving the top 5 chunks with a minimum similarity score of 0.7. These can be tuned per use case — the AI Copilot uses top 10 chunks, while the auto-triage engine uses top 3 for faster response times.
Testing RAG Quality
import boto3
bedrock_rt = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
# Test retrieval quality directly
response = bedrock_rt.retrieve(
knowledgeBaseId='BXJGG7PIPS',
retrievalQuery={'text': 'How do I reset a user password in Cognito?'},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 5,
'overrideSearchType': 'HYBRID'
}
}
)
for result in response['retrievalResults']:
print(f"Score: {result['score']:.3f} | {result['content']['text'][:100]}...")