Building AI-Powered Search with RAG and Vector Databases
Introduction
Traditional keyword-based search often falls short of user expectations for intelligent, context-aware results. Retrieval-Augmented Generation (RAG) combined with vector databases offers a powerful way to build AI-powered search systems that understand meaning, not just keywords.
In this guide, we'll build a practical RAG system that can search through documents using semantic similarity and generate contextual responses.
Understanding RAG Architecture
RAG works by combining two key components:
- Retrieval: Finding relevant documents using vector similarity search
- Generation: Using an LLM to generate responses based on retrieved context
The process involves converting text into embeddings (numerical representations), storing them in a vector database, and then using similarity search to find relevant content before generating responses.
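To make similarity concrete, here's a minimal sketch that embeds two short texts and compares them with cosine similarity, using the same OpenAI embeddings the rest of this guide relies on (the helper function and example strings are illustrative):

import { OpenAIEmbeddings } from 'langchain/embeddings/openai';

// Cosine similarity: ~1 means the vectors point the same way, ~0 means unrelated
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const embedder = new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_API_KEY! });
const [queryVec, docVec] = await embedder.embedDocuments([
  'How do I reset my password?',
  'Open Settings and choose Security to change your password.',
]);
console.log(cosineSimilarity(queryVec, docVec)); // High score despite few shared keywords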
Setting Up the Vector Database
We'll use Pinecone as our vector database, though you can swap in alternatives such as Weaviate or Chroma:
import { PineconeClient } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
// Initialize Pinecone
const pinecone = new PineconeClient();
await pinecone.init({
apiKey: process.env.PINECONE_API_KEY!,
environment: process.env.PINECONE_ENVIRONMENT!,
});
// Get index
const index = pinecone.Index('knowledge-base');
// Initialize embeddings
const embeddings = new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY!,
});
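A note on versions: the snippet above targets the older Pinecone client, which takes an environment at init time. Newer releases of @pinecone-database/pinecone initialize differently (roughly new Pinecone({ apiKey })), so check the docs for the SDK version you have installed.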
Document Processing and Ingestion
Before searching, we need to process and store our documents. Here's how to chunk and embed documents:
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { Document } from 'langchain/document';
class DocumentProcessor {
private textSplitter: RecursiveCharacterTextSplitter;
constructor() {
this.textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
}
async processDocuments(documents: string[]): Promise<Document[]> {
const chunks: Document[] = [];
for (const [index, doc] of documents.entries()) {
const splits = await this.textSplitter.splitText(doc);
splits.forEach((chunk, chunkIndex) => {
chunks.push(new Document({
pageContent: chunk,
metadata: {
docId: index,
chunkId: chunkIndex,
source: `doc_${index}_chunk_${chunkIndex}`
}
}));
});
}
return chunks;
}
async ingestToVectorStore(documents: Document[]) {
await PineconeStore.fromDocuments(
documents,
embeddings,
{
pineconeIndex: index,
namespace: 'knowledge-base',
}
);
}
}
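With the processor in place, ingestion is a two-step call: chunk the raw documents, then push the chunks into the index. A minimal sketch, where rawDocs stands in for your own content:

// Hypothetical raw documents; in practice, load these from files or a database
const rawDocs = [
  'Our refund policy allows returns within 30 days of purchase...',
  'To configure SSO, open the admin console and...',
];

const processor = new DocumentProcessor();
const chunks = await processor.processDocuments(rawDocs);
await processor.ingestToVectorStore(chunks);
console.log(`Ingested ${chunks.length} chunks`);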
Implementing the RAG Search Engine
Now let's create the core RAG functionality that retrieves relevant documents and generates responses:
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { PromptTemplate } from 'langchain/prompts';
class RAGSearchEngine {
protected vectorStore: PineconeStore; // protected so the subclasses below can reuse it
private llm: ChatOpenAI;
private promptTemplate: PromptTemplate;
constructor() {
this.vectorStore = new PineconeStore(embeddings, {
pineconeIndex: index,
namespace: 'knowledge-base',
});
this.llm = new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY!,
modelName: 'gpt-3.5-turbo',
temperature: 0.2,
});
this.promptTemplate = PromptTemplate.fromTemplate(`
Use the following context to answer the user's question. If you cannot find the answer in the context, say so clearly.
Context:
{context}
Question: {question}
Answer:
`);
}
async search(query: string, topK: number = 5) {
  // Retrieve relevant documents along with their similarity scores
  const resultsWithScores = await this.vectorStore.similaritySearchWithScore(query, topK);
  const relevantDocs = resultsWithScores.map(([doc]) => doc);
  // Combine context from retrieved documents
  const context = relevantDocs
    .map(doc => doc.pageContent)
    .join('\n\n');
  // Generate response using LLM
  const prompt = await this.promptTemplate.format({
    context,
    question: query,
  });
  const response = await this.llm.predict(prompt);
  return {
    answer: response,
    sources: relevantDocs.map(doc => doc.metadata),
    relevanceScores: resultsWithScores.map(([, score]) => score),
  };
}
}
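Querying the engine then looks like this (a minimal sketch with an illustrative question):

const engine = new RAGSearchEngine();
const result = await engine.search('What is our refund window?');
console.log(result.answer);
console.log('Sources:', result.sources); // Metadata for source attribution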
Adding Advanced Features
Enhance your RAG system with these advanced capabilities:
Hybrid Search
Combine semantic and keyword search for better results:
class HybridSearchEngine extends RAGSearchEngine {
async hybridSearch(query: string, alpha: number = 0.7) {
// Semantic search
const semanticResults = await this.vectorStore.similaritySearch(query, 10);
// Keyword search (simplified - use proper full-text search in production)
const keywordResults = semanticResults.filter(doc =>
doc.pageContent.toLowerCase().includes(query.toLowerCase())
);
// Combine results with weighting
const hybridResults = this.combineResults(
semanticResults,
keywordResults,
alpha
);
return hybridResults.slice(0, 5);
}
private combineResults(semantic: Document[], keyword: Document[], alpha: number) {
  // Score each semantic result by rank, then boost documents that also matched keywords
  const keywordSources = new Set(keyword.map(doc => doc.metadata.source));
  return semantic
    .map((doc, rank) => ({
      doc,
      score:
        alpha * ((semantic.length - rank) / semantic.length) +
        (1 - alpha) * (keywordSources.has(doc.metadata.source) ? 1 : 0),
    }))
    .sort((a, b) => b.score - a.score)
    .map(({ doc }) => doc);
}
}
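Note that the keyword pass above only filters the semantic results, as a stand-in. In production, you would typically run the keyword side against a dedicated full-text index (for example, BM25 in Elasticsearch or OpenSearch) and merge the two ranked lists, e.g. with reciprocal rank fusion.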
Performance Optimization
Implement caching and optimization strategies:
import NodeCache from 'node-cache';
class OptimizedRAGEngine extends RAGSearchEngine {
private cache: NodeCache;
constructor() {
super();
this.cache = new NodeCache({ stdTTL: 3600 }); // 1 hour cache
}
async search(query: string, topK: number = 5) {
const cacheKey = `search:${query}:${topK}`;
const cached = this.cache.get(cacheKey);
if (cached) {
return cached;
}
const result = await super.search(query, topK);
this.cache.set(cacheKey, result);
return result;
}
}
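One caveat: the cache key is built from the raw query string, so queries that differ only in casing or whitespace will miss the cache. Normalizing the query (trimming and lowercasing) before building the key is a cheap way to raise the hit rate.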
Best Practices and Considerations
- Chunk Size: Experiment with different chunk sizes (500-1500 tokens) based on your content
- Overlap: Use 10-20% overlap between chunks to maintain context
- Metadata: Store rich metadata for better filtering and source attribution (see the sketch after this list)
- Monitoring: Track query performance and user satisfaction
- Cost Management: Cache embeddings and implement rate limiting
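To illustrate the metadata point: LangChain's PineconeStore accepts a metadata filter alongside the query, so retrieval can be scoped to a subset of chunks. A minimal sketch (the docId filter value is illustrative):

const store = new PineconeStore(embeddings, {
  pineconeIndex: index,
  namespace: 'knowledge-base',
});
// Restrict similarity search to chunks from one source document
const filteredDocs = await store.similaritySearch(
  'What is the refund policy?',
  5,
  { docId: { $eq: 0 } } // Pinecone metadata filter: only chunks whose docId is 0
);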
Conclusion
RAG with vector databases enables powerful semantic search capabilities that go far beyond traditional keyword matching. By combining retrieval with generation, you can build systems that understand context and provide intelligent, source-backed responses.
Start with a simple implementation and gradually add features like hybrid search, caching, and advanced filtering based on your specific use case requirements.