RAG Configuration

Bedrock KB Overview

StackFlow's Retrieval-Augmented Generation (RAG) pipeline is built on Amazon Bedrock Knowledge Bases. The Knowledge Base ID is BXJGG7PIPS (named StackFlow-KnowledgeBase). Documents are embedded using Amazon Titan Embeddings v2 at 1024 dimensions and stored in an OpenSearch Serverless collection for sub-100ms semantic search.

⚙️ Minimum Requirements

Bedrock KB: BXJGG7PIPS with status ACTIVE and at least one completed ingestion job
S3 Bucket: stackflow-kb-documents-373544523367 with at least one document under the runbooks/, policies/, or sla/ prefix
OpenSearch: Serverless collection q3oso7unldm9p4xsqez4 in ACTIVE state
Embedding Model: amazon.titan-embed-text-v2:0 enabled in Bedrock for account 373544523367
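IAM: StackFlowBedrockKBRole with s3:ListBucket and s3:GetObject on the KB documents bucket, plus bedrock:InvokeModel on the embedding model and access to the OpenSearch Serverless collection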
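Most of the requirements above can be checked with read-only API calls before debugging retrieval. The sketch below is a hypothetical preflight helper, not part of StackFlow: it takes already-constructed clients so it can be exercised without AWS access, inspects only the first page of each list call, and does not verify the embedding-model or IAM requirements. A zero-byte "folder" object under a prefix would also count as a document here.

```python
"""Hypothetical preflight check for the RAG minimum requirements (sketch)."""
from typing import Any

KB_ID = "BXJGG7PIPS"
BUCKET = "stackflow-kb-documents-373544523367"
COLLECTION_ID = "q3oso7unldm9p4xsqez4"
PREFIXES = ("runbooks/", "policies/", "sla/")


def check_requirements(agent: Any, s3: Any, aoss: Any) -> dict[str, bool]:
    """Return a pass/fail flag for each requirement this sketch can check."""
    results: dict[str, bool] = {}

    # Knowledge Base status must be ACTIVE.
    kb = agent.get_knowledge_base(knowledgeBaseId=KB_ID)["knowledgeBase"]
    results["kb_active"] = kb["status"] == "ACTIVE"

    # At least one COMPLETE ingestion job on any data source (first page only).
    completed = False
    sources = agent.list_data_sources(knowledgeBaseId=KB_ID)["dataSourceSummaries"]
    for ds in sources:
        jobs = agent.list_ingestion_jobs(
            knowledgeBaseId=KB_ID, dataSourceId=ds["dataSourceId"]
        )["ingestionJobSummaries"]
        if any(job["status"] == "COMPLETE" for job in jobs):
            completed = True
            break
    results["ingestion_completed"] = completed

    # At least one object under one of the expected prefixes.
    results["documents_present"] = any(
        s3.list_objects_v2(Bucket=BUCKET, Prefix=p, MaxKeys=1).get("KeyCount", 0) > 0
        for p in PREFIXES
    )

    # OpenSearch Serverless collection must be ACTIVE.
    details = aoss.batch_get_collection(ids=[COLLECTION_ID]).get("collectionDetails", [])
    results["collection_active"] = bool(details) and details[0]["status"] == "ACTIVE"

    return results


if __name__ == "__main__":
    import boto3

    region = "us-east-1"
    report = check_requirements(
        boto3.client("bedrock-agent", region_name=region),
        boto3.client("s3", region_name=region),
        boto3.client("opensearchserverless", region_name=region),
    )
    for name, ok in report.items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
```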

The RAG pipeline operates in two phases: ingestion (documents → embeddings → OpenSearch) and retrieval (query → embedding → vector search → context assembly → LLM generation). Both phases are managed by Bedrock and require no custom infrastructure.
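The retrieval phase (query → embedding → vector search → context assembly → LLM generation) can be exercised end to end with the `retrieve_and_generate` operation of the `bedrock-agent-runtime` client. This is a minimal sketch: the model ARN is a placeholder assumption, since this page does not say which LLM StackFlow uses for generation, and the request is built by a separate function so its shape can be checked offline.

```python
from typing import Any

KB_ID = "BXJGG7PIPS"
# Hypothetical generation model; substitute the model StackFlow actually uses.
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"


def build_rag_request(question: str, top_k: int = 5) -> dict[str, Any]:
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KB_ID,
                "modelArn": MODEL_ARN,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": top_k}
                },
            },
        },
    }


def ask(client: Any, question: str, top_k: int = 5) -> str:
    """Run retrieval and generation in a single Bedrock call; return the answer text."""
    response = client.retrieve_and_generate(**build_rag_request(question, top_k))
    return response["output"]["text"]


if __name__ == "__main__":
    import boto3

    runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    print(ask(runtime, "How do I reset a user password in Cognito?"))
```

The response also carries a `citations` list that links generated text back to the retrieved chunks, which is useful when checking whether an answer is grounded in the knowledge base.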

Embedding Configuration

Setting	Value	Notes
Embedding Model	Amazon Titan Embeddings v2	1024-dimensional vectors
Vector Database	OpenSearch Serverless	Managed by Bedrock
KB ID	`BXJGG7PIPS`	us-east-1 region
Sync Frequency	Automatic (S3 event-driven)	New articles indexed within 60s
Maximum Documents	Unlimited (OpenSearch Serverless)	Storage costs scale with corpus size
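To see what a 1024-dimensional Titan v2 embedding looks like outside the Knowledge Base, the model can be invoked directly through the `bedrock-runtime` client. This sketch assumes `normalize=True` (how the KB itself is configured is not stated here) and adds a cosine-similarity helper for spot-checking how close two texts are. The scores that `retrieve` returns are computed by the vector store and may not match this value exactly.

```python
import json
import math
from typing import Any

MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed(client: Any, text: str, dimensions: int = 1024) -> list[float]:
    """Embed text with Titan Embeddings v2 via bedrock-runtime invoke_model."""
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True}),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; equals the dot product when both vectors are unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    import boto3

    runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
    query = embed(runtime, "How do I reset a user password in Cognito?")
    doc = embed(runtime, "Runbook: resetting Cognito user passwords")
    print(len(query), round(cosine_similarity(query, doc), 3))
```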

Chunking Strategy

Documents are split into chunks before embedding so that each chunk stays within the embedding model's input limit and is focused enough to be retrieved on its own. StackFlow uses a hierarchical chunking strategy: documents are first split at H2 heading boundaries and then by paragraph, with a maximum chunk size of 1024 tokens and a 128-token overlap between consecutive chunks. This preserves semantic coherence while ensuring each chunk can be retrieved independently.
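The splitting rules above can be illustrated locally. This is a sketch of the described logic, not Bedrock's implementation: it assumes Markdown `## ` lines mark H2 boundaries, approximates tokens as whitespace-separated words (Titan's tokenizer counts differently), keeps overlap within a section, and collapses paragraph breaks inside a chunk to single spaces.

```python
import re

H2_HEADING = re.compile(r"^## ", re.MULTILINE)
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def split_sections(text: str) -> list[str]:
    """First level: split a Markdown document at H2 headings."""
    starts = [m.start() for m in H2_HEADING.finditer(text)]
    bounds = sorted({0, *starts, len(text)})
    sections = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [s for s in sections if s]


def paragraph_units(section: str, limit: int) -> list[list[str]]:
    """Second level: split into paragraphs, cutting any paragraph longer than limit words."""
    units: list[list[str]] = []
    for para in PARAGRAPH_BREAK.split(section):
        words = para.split()
        for i in range(0, len(words), limit):
            units.append(words[i : i + limit])
    return units


def chunk_document(text: str, max_tokens: int = 1024, overlap: int = 128) -> list[str]:
    """Pack paragraphs into chunks of at most max_tokens words, carrying overlap words forward."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    chunks: list[str] = []
    for section in split_sections(text):
        current: list[str] = []
        fresh = 0  # words in `current` not carried over from the previous chunk
        # Units are capped at max_tokens - overlap so carry-over plus a unit always fits.
        for unit in paragraph_units(section, max_tokens - overlap):
            if fresh and len(current) + len(unit) > max_tokens:
                chunks.append(" ".join(current))
                current = current[-overlap:] if overlap else []
                fresh = 0
            current.extend(unit)
            fresh += len(unit)
        if fresh:
            chunks.append(" ".join(current))
    return chunks
```

Capping each paragraph unit at `max_tokens - overlap` words is the design choice that keeps every chunk within the limit even after the overlap is prepended.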

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Trigger a sync of the knowledge base data source
response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId='BXJGG7PIPS',
    dataSourceId='YOUR_DATA_SOURCE_ID',  # placeholder: find it with list_data_sources(knowledgeBaseId='BXJGG7PIPS')
    description='Manual sync after bulk article update'
)
print(f"Ingestion job ID: {response['ingestionJob']['ingestionJobId']}")
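`start_ingestion_job` returns immediately, so a script that needs fresh content (for example after a bulk article update) should wait for the job to finish. The sketch below polls `get_ingestion_job` until a terminal status; the poll interval and timeout are arbitrary assumptions, and the sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(
    client: Any,
    kb_id: str,
    data_source_id: str,
    job_id: str,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll get_ingestion_job until the job reaches a terminal status; return the job."""
    waited = 0.0
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id, dataSourceId=data_source_id, ingestionJobId=job_id
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"ingestion job {job_id} still {job['status']}")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    import boto3

    agent = boto3.client("bedrock-agent", region_name="us-east-1")
    kb_id, ds_id = "BXJGG7PIPS", "YOUR_DATA_SOURCE_ID"
    started = agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
    job = wait_for_ingestion(agent, kb_id, ds_id, started["ingestionJob"]["ingestionJobId"])
    stats = job.get("statistics", {})
    print(job["status"], stats.get("numberOfDocumentsScanned"), stats.get("numberOfDocumentsFailed"))
```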

Retrieval Settings

The retrieval configuration controls how many chunks are returned per query and the minimum relevance score a chunk must reach to be used. StackFlow defaults to the top 5 chunks with a minimum similarity score of 0.7. Both can be tuned per use case: the AI Copilot retrieves the top 10 chunks, while the auto-triage engine retrieves the top 3 for faster response times.
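Per-use-case settings like these can be kept in one table and applied at query time. In this sketch the `top_k` values come from this page, while the profile names and the assumption that every profile shares the 0.7 default threshold are illustrative. The threshold is applied client-side to the `score` field of each result.

```python
from typing import Any

KB_ID = "BXJGG7PIPS"

# top_k values from this page; profile names and shared min_score are assumptions.
RETRIEVAL_PROFILES: dict[str, dict[str, float]] = {
    "default": {"top_k": 5, "min_score": 0.7},
    "copilot": {"top_k": 10, "min_score": 0.7},
    "auto_triage": {"top_k": 3, "min_score": 0.7},
}


def retrieve_chunks(client: Any, query: str, profile: str = "default") -> list[dict[str, Any]]:
    """Retrieve top_k chunks for a profile, then drop results scoring below min_score."""
    settings = RETRIEVAL_PROFILES[profile]
    response = client.retrieve(
        knowledgeBaseId=KB_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": int(settings["top_k"])}
        },
    )
    return [r for r in response["retrievalResults"] if r.get("score", 0.0) >= settings["min_score"]]


if __name__ == "__main__":
    import boto3

    runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    for chunk in retrieve_chunks(runtime, "How do I reset a user password in Cognito?", "copilot"):
        print(f"{chunk['score']:.3f} {chunk['content']['text'][:80]}")
```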

Score Threshold: Setting the minimum similarity score too high (above 0.85) may result in no results being returned for novel queries. Setting it too low (below 0.6) may introduce irrelevant context that confuses the LLM. The default 0.7 threshold is calibrated for StackFlow's typical knowledge base content.
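One way to sanity-check a threshold against this trade-off is to collect the scores returned for a representative set of queries and measure how many queries would be left with no context at each candidate threshold. The helper below is a hypothetical calibration aid, and the sample scores in the usage block are made up for illustration.

```python
def empty_result_rate(score_lists: list[list[float]], thresholds: list[float]) -> dict[float, float]:
    """For each candidate threshold, the fraction of queries left with no chunks."""
    if not score_lists:
        raise ValueError("need scores for at least one query")
    rates: dict[float, float] = {}
    for threshold in thresholds:
        # A query is "empty" when none of its chunks reaches the threshold.
        empty = sum(1 for scores in score_lists if all(s < threshold for s in scores))
        rates[threshold] = empty / len(score_lists)
    return rates


if __name__ == "__main__":
    # Hypothetical scores; in practice gather result["score"] from retrieve calls.
    sample = [[0.91, 0.84, 0.72], [0.78, 0.66], [0.63, 0.58]]
    for t, rate in empty_result_rate(sample, [0.6, 0.7, 0.85]).items():
        print(f"threshold {t:.2f}: {rate:.0%} of queries return nothing")
```

A rising empty rate as the threshold climbs shows the "no results for novel queries" risk directly; the opposite risk (irrelevant context at low thresholds) still needs a manual look at the chunks that pass.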

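Testing RAG Quality

To check retrieval quality in isolation, before any LLM generation, call the retrieve API directly and inspect each returned chunk's score and opening text: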

import boto3

bedrock_rt = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Test retrieval quality directly
response = bedrock_rt.retrieve(
    knowledgeBaseId='BXJGG7PIPS',
    retrievalQuery={'text': 'How do I reset a user password in Cognito?'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID'
        }
    }
)

for result in response['retrievalResults']:
    print(f"Score: {result['score']:.3f} | {result['content']['text'][:100]}...")
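Eyeballing scores works for a single query; for regression checks after re-ingestion it can help to track a hit rate over a fixed set of queries whose correct source document is known. This is a hypothetical harness: the test cases and S3 URIs are placeholders to be replaced with real StackFlow articles, and only S3-located results are matched.

```python
from typing import Any, Optional

KB_ID = "BXJGG7PIPS"


def source_uri(result: dict[str, Any]) -> Optional[str]:
    """Return the S3 URI of a retrieved chunk, or None for non-S3 locations."""
    return result.get("location", {}).get("s3Location", {}).get("uri")


def hit_rate(client: Any, cases: dict[str, str], top_k: int = 5) -> float:
    """Fraction of queries whose expected S3 document appears in the top_k results."""
    if not cases:
        return 0.0
    hits = 0
    for query, expected_uri in cases.items():
        response = client.retrieve(
            knowledgeBaseId=KB_ID,
            retrievalQuery={"text": query},
            retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
        )
        uris = {source_uri(r) for r in response["retrievalResults"]}
        hits += expected_uri in uris
    return hits / len(cases)


if __name__ == "__main__":
    import boto3

    runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    # Hypothetical test case; the object key is a placeholder.
    cases = {
        "How do I reset a user password in Cognito?":
            "s3://stackflow-kb-documents-373544523367/runbooks/cognito-password-reset.md",
    }
    print(f"hit rate @5: {hit_rate(runtime, cases):.0%}")
```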
