Building AI-Powered Search with RAG and Vector Databases
Introduction
Traditional keyword-based search often falls short of user expectations for intelligent, context-aware results. Retrieval-Augmented Generation (RAG) combined with vector databases offers a powerful way to build AI-powered search systems that understand meaning, not just keywords.
In this guide, we'll build a practical RAG system that can search through documents using semantic similarity and generate contextual responses.
Understanding RAG Architecture
RAG works by combining two key components:
- Retrieval: Finding relevant documents using vector similarity search
- Generation: Using an LLM to generate responses based on retrieved context
The process involves converting text into embeddings (numerical representations), storing them in a vector database, and then using similarity search to find relevant content before generating responses.
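To make similarity concrete, here's a minimal sketch that embeds two short texts and compares them with cosine similarity, using the same OpenAI embeddings the rest of this guide relies on (the helper function and example strings are illustrative):

import { OpenAIEmbeddings } from 'langchain/embeddings/openai';

// Cosine similarity: ~1 means the vectors point the same way, ~0 means unrelated
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const embedder = new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_API_KEY! });
const [queryVec, docVec] = await embedder.embedDocuments([
  'How do I reset my password?',
  'Open Settings and choose Security to change your password.',
]);
console.log(cosineSimilarity(queryVec, docVec)); // High score despite few shared keywords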
Setting Up the Vector Database
We'll use Pinecone as our vector database, though you can swap in alternatives such as Weaviate or Chroma:
import { PineconeClient } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
// Initialize Pinecone
const pinecone = new PineconeClient();
await pinecone.init({
apiKey: process.env.PINECONE_API_KEY!,
environment: process.env.PINECONE_ENVIRONMENT!,
});
// Get index
const index = pinecone.Index('knowledge-base');
// Initialize embeddings
const embeddings = new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY!,
});
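A note on versions: the snippet above targets the older Pinecone client, which takes an environment at init time. Newer releases of @pinecone-database/pinecone initialize differently (roughly new Pinecone({ apiKey })), so check the docs for the SDK version you have installed.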
Document Processing and Ingestion
Before searching, we need to process and store our documents. Here's how to chunk and embed documents:
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { Document } from 'langchain/document';
class DocumentProcessor {
private textSplitter: RecursiveCharacterTextSplitter;
constructor() {
this.textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
}
async processDocuments(documents: string[]): Promise<Document[]> {
const chunks: Document[] = [];
for (const [index, doc] of documents.entries()) {
const splits = await this.textSplitter.splitText(doc);
splits.forEach((chunk, chunkIndex) => {
chunks.push(new Document({
pageContent: chunk,
metadata: {
docId: index,
chunkId: chunkIndex,
source: `doc_${index}_chunk_${chunkIndex}`
}
}));
});
}
return chunks;
}
async ingestToVectorStore(documents: Document[]) {
await PineconeStore.fromDocuments(
documents,
embeddings,
{
pineconeIndex: index,
namespace: 'knowledge-base',
}
);
}
}
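With the processor in place, ingestion is a two-step call: chunk the raw documents, then push the chunks into the index. A minimal sketch, where rawDocs stands in for your own content:

// Hypothetical raw documents; in practice, load these from files or a database
const rawDocs = [
  'Our refund policy allows returns within 30 days of purchase...',
  'To configure SSO, open the admin console and...',
];

const processor = new DocumentProcessor();
const chunks = await processor.processDocuments(rawDocs);
await processor.ingestToVectorStore(chunks);
console.log(`Ingested ${chunks.length} chunks`);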
Implementing the RAG Search Engine
Now let's create the core RAG functionality that retrieves relevant documents and generates responses:
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { PromptTemplate } from 'langchain/prompts';
class RAGSearchEngine {
protected vectorStore: PineconeStore; // protected so the subclasses below can reuse it
private llm: ChatOpenAI;
private promptTemplate: PromptTemplate;
constructor() {
this.vectorStore = new PineconeStore(embeddings, {
pineconeIndex: index,
namespace: 'knowledge-base',
});
this.llm = new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY!,
modelName: 'gpt-3.5-turbo',
temperature: 0.2,
});
this.promptTemplate = PromptTemplate.fromTemplate(`
Use the following context to answer the user's question. If you cannot find the answer in the context, say so clearly.
Context:
{context}
Question: {question}
Answer:
`);
}
async search(query: string, topK: number = 5) {
  // Retrieve relevant documents along with their similarity scores
  const resultsWithScores = await this.vectorStore.similaritySearchWithScore(query, topK);
  const relevantDocs = resultsWithScores.map(([doc]) => doc);
  // Combine context from retrieved documents
  const context = relevantDocs
    .map(doc => doc.pageContent)
    .join('\n\n');
  // Generate response using LLM
  const prompt = await this.promptTemplate.format({
    context,
    question: query,
  });
  const response = await this.llm.predict(prompt);
  return {
    answer: response,
    sources: relevantDocs.map(doc => doc.metadata),
    relevanceScores: resultsWithScores.map(([, score]) => score),
  };
}
}
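Querying the engine then looks like this (a minimal sketch with an illustrative question):

const engine = new RAGSearchEngine();
const result = await engine.search('What is our refund window?');
console.log(result.answer);
console.log('Sources:', result.sources); // Metadata for source attribution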
Adding Advanced Features
Enhance your RAG system with these advanced capabilities:
Hybrid Search
Combine semantic and keyword search for better results:
class HybridSearchEngine extends RAGSearchEngine {
async hybridSearch(query: string, alpha: number = 0.7) {
// Semantic search
const semanticResults = await this.vectorStore.similaritySearch(query, 10);
// Keyword search (simplified - use proper full-text search in production)
const keywordResults = semanticResults.filter(doc =>
doc.pageContent.toLowerCase().includes(query.toLowerCase())
);
// Combine results with weighting
const hybridResults = this.combineResults(
semanticResults,
keywordResults,
alpha
);
return hybridResults.slice(0, 5);
}
private combineResults(semantic: Document[], keyword: Document[], alpha: number) {
  // Score each semantic result by rank, then boost documents that also matched keywords
  const keywordSources = new Set(keyword.map(doc => doc.metadata.source));
  return semantic
    .map((doc, rank) => ({
      doc,
      score:
        alpha * ((semantic.length - rank) / semantic.length) +
        (1 - alpha) * (keywordSources.has(doc.metadata.source) ? 1 : 0),
    }))
    .sort((a, b) => b.score - a.score)
    .map(({ doc }) => doc);
}
}
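Note that the keyword pass above only filters the semantic results, as a stand-in. In production, you would typically run the keyword side against a dedicated full-text index (for example, BM25 in Elasticsearch or OpenSearch) and merge the two ranked lists, e.g. with reciprocal rank fusion.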
Performance Optimization
Implement caching and optimization strategies:
import NodeCache from 'node-cache';
class OptimizedRAGEngine extends RAGSearchEngine {
private cache: NodeCache;
constructor() {
super();
this.cache = new NodeCache({ stdTTL: 3600 }); // 1 hour cache
}
async search(query: string, topK: number = 5) {
const cacheKey = `search:${query}:${topK}`;
const cached = this.cache.get(cacheKey);
if (cached) {
return cached;
}
const result = await super.search(query, topK);
this.cache.set(cacheKey, result);
return result;
}
}
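One caveat: the cache key is built from the raw query string, so queries that differ only in casing or whitespace will miss the cache. Normalizing the query (trimming and lowercasing) before building the key is a cheap way to raise the hit rate.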
Best Practices and Considerations
- Chunk Size: Experiment with different chunk sizes (500-1500 tokens) based on your content
- Overlap: Use 10-20% overlap between chunks to maintain context
- Metadata: Store rich metadata for better filtering and source attribution (see the sketch after this list)
- Monitoring: Track query performance and user satisfaction
- Cost Management: Cache embeddings and implement rate limiting
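To illustrate the metadata point: LangChain's PineconeStore accepts a metadata filter alongside the query, so retrieval can be scoped to a subset of chunks. A minimal sketch (the docId filter value is illustrative):

const store = new PineconeStore(embeddings, {
  pineconeIndex: index,
  namespace: 'knowledge-base',
});
// Restrict similarity search to chunks from one source document
const filteredDocs = await store.similaritySearch(
  'What is the refund policy?',
  5,
  { docId: { $eq: 0 } } // Pinecone metadata filter: only chunks whose docId is 0
);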
Conclusion
RAG with vector databases enables powerful semantic search capabilities that go far beyond traditional keyword matching. By combining retrieval with generation, you can build systems that understand context and provide intelligent, source-backed responses.
Start with a simple implementation and gradually add features like hybrid search, caching, and advanced filtering based on your specific use case requirements.