Building AI-Powered Search with RAG and Vector Databases
Introduction
Traditional keyword-based search is increasingly giving way to AI-powered semantic search that understands context and meaning. Retrieval-Augmented Generation (RAG) combined with a vector database lets an application answer questions with relevant, contextual responses grounded in your own data.
In this guide, we'll build a practical RAG system that can search through documentation and provide AI-generated answers with proper source attribution.
Understanding RAG Architecture
RAG works by combining two key components:
- Retrieval: Finding relevant documents using vector similarity search
- Generation: Using an LLM to generate answers based on retrieved context
The process flows like this:
- Convert documents into vector embeddings and store them
- Convert user queries into embeddings
- Find similar document chunks using vector search
- Pass relevant chunks to an LLM for answer generation
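The retrieval step in this flow hinges on vector similarity. As a toy illustration of what "find similar document chunks" means — using made-up 3-dimensional vectors, where real embedding models produce hundreds or thousands of dimensions — chunks can be ranked by cosine similarity to the query:

```javascript
// Toy illustration of the retrieval step: embeddings are arrays of
// numbers, and "similar" means a high cosine similarity score.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical 3-dimensional "embeddings" for two chunks and a query
const chunks = [
  { text: 'How to reset your password', vector: [0.9, 0.1, 0.0] },
  { text: 'Quarterly revenue report',   vector: [0.0, 0.2, 0.9] },
];
const queryVector = [0.8, 0.2, 0.1]; // query: "password reset steps"

// Rank chunks by similarity to the query, highest first
const ranked = chunks
  .map(c => ({ ...c, score: cosineSimilarity(c.vector, queryVector) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked[0].text); // the password chunk ranks first
```

In the real system below, Pinecone performs this ranking server-side over the stored embeddings, so the application only ever sends a query vector and receives the top matches.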
Setting Up the Tech Stack
We'll use:
- Pinecone: Vector database for storing embeddings
- OpenAI: For embeddings and text generation
- LangChain: To orchestrate the RAG pipeline
- Node.js: Backend implementation
First, install the required packages:
npm install @pinecone-database/pinecone openai langchain @langchain/openai @langchain/pinecone dotenv

Document Processing and Embedding
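Both services require API keys, which the code below reads from environment variables (loaded with dotenv). A typical `.env` file, with placeholder values:

```
OPENAI_API_KEY=your-openai-key
PINECONE_API_KEY=your-pinecone-key
```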
Start by processing documents and converting them to embeddings:
import { PineconeStore } from '@langchain/pinecone';
import { OpenAIEmbeddings } from '@langchain/openai';
import { Pinecone } from '@pinecone-database/pinecone';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import fs from 'fs';

export class DocumentProcessor {
  constructor() {
    this.pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY,
    });
    this.embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
  }

  async processDocuments(filePath, indexName) {
    // Read the file and split it into overlapping chunks
    const text = fs.readFileSync(filePath, 'utf8');
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const docs = await splitter.createDocuments([text]);

    // Get a handle to the existing Pinecone index
    const index = this.pinecone.Index(indexName);

    // Embed the chunks and store them in the vector database
    await PineconeStore.fromDocuments(docs, this.embeddings, {
      pineconeIndex: index,
      maxConcurrency: 5,
    });

    console.log(`Processed ${docs.length} document chunks`);
  }
}

Building the RAG Query System
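With the processor in place, the index has to be populated once before anything can be queried. A hypothetical one-off ingestion script — the file path is a placeholder, and the `documentation-index` Pinecone index is assumed to already exist with your embedding model's dimension (1536 for OpenAI's default):

```javascript
import 'dotenv/config';
import { DocumentProcessor } from './document-processor.js';

// One-off ingestion: chunk, embed, and upsert the documentation.
// Assumes the 'documentation-index' Pinecone index already exists.
const processor = new DocumentProcessor();
await processor.processDocuments('./docs/getting-started.md', 'documentation-index');
```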
Now implement the retrieval and generation logic:
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';
import { RetrievalQAChain } from 'langchain/chains';
import { PromptTemplate } from '@langchain/core/prompts';

export class RAGSearchEngine {
  constructor(indexName) {
    this.llm = new ChatOpenAI({
      modelName: 'gpt-3.5-turbo',
      temperature: 0.2,
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
    this.embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
    const pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY,
    });
    this.vectorStore = new PineconeStore(this.embeddings, {
      pineconeIndex: pinecone.Index(indexName),
    });
    this.setupChain();
  }

  setupChain() {
    const prompt = PromptTemplate.fromTemplate(`
Use the following context to answer the question. If you cannot find the answer in the context, say "I don't have enough information to answer that question."

Context: {context}

Question: {question}

Answer:`);

    this.chain = RetrievalQAChain.fromLLM(
      this.llm,
      this.vectorStore.asRetriever({
        k: 4, // retrieve the top 4 most similar chunks
        searchType: 'similarity',
      }),
      {
        prompt,
        returnSourceDocuments: true,
      }
    );
  }

  async search(query) {
    try {
      const result = await this.chain.call({ query });
      return {
        answer: result.text,
        sources: result.sourceDocuments.map(doc => ({
          content: doc.pageContent,
          metadata: doc.metadata,
        })),
      };
    } catch (error) {
      console.error('Search error:', error);
      throw new Error('Failed to process search query');
    }
  }
}

Creating the API Endpoint
Build an Express.js API to serve the RAG functionality:
import 'dotenv/config';
import express from 'express';
import cors from 'cors';
import { RAGSearchEngine } from './rag-engine.js';

const app = express();
const searchEngine = new RAGSearchEngine('documentation-index');

app.use(cors());
app.use(express.json());

app.post('/api/search', async (req, res) => {
  try {
    const { query } = req.body;
    if (!query) {
      return res.status(400).json({ error: 'Query is required' });
    }

    const result = await searchEngine.search(query);
    res.json({
      success: true,
      data: {
        answer: result.answer,
        sources: result.sources,
        timestamp: new Date().toISOString(),
      },
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error.message,
    });
  }
});

app.listen(3000, () => {
  console.log('RAG API server running on port 3000');
});

Optimization Strategies
To improve your RAG system's performance:
- Chunk Size Optimization: Experiment with different chunk sizes (500-1500 tokens) based on your content
- Embedding Models: Consider using specialized embedding models for domain-specific content
- Hybrid Search: Combine vector search with traditional keyword search for better coverage
- Response Caching: Cache frequent queries to reduce API costs and improve response times
- Metadata Filtering: Add metadata filters to narrow search scope by document type, date, or category
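As a concrete example of the response-caching idea above, a minimal in-memory cache can wrap any async search function. This is a sketch — `withCache` is a hypothetical helper, and a production system would more likely use Redis with TTL-based eviction:

```javascript
// Wrap a search function with a simple in-memory cache keyed on the
// normalized query, so repeated questions skip the LLM and vector store.
function withCache(searchFn, maxEntries = 100) {
  const cache = new Map();
  return (query) => {
    const key = query.trim().toLowerCase();
    if (cache.has(key)) return cache.get(key);
    if (cache.size >= maxEntries) {
      cache.delete(cache.keys().next().value); // evict the oldest entry
    }
    const resultPromise = Promise.resolve(searchFn(query));
    cache.set(key, resultPromise); // cache the promise itself
    return resultPromise;
  };
}

// Demo with a stand-in search function (counting calls to show caching)
let calls = 0;
const fakeSearch = (q) => { calls++; return Promise.resolve(`answer for: ${q}`); };
const cachedSearch = withCache(fakeSearch);

cachedSearch('What is RAG?');
cachedSearch('what is rag?'); // normalized key: served from cache
console.log(calls); // 1 — the underlying search ran only once
```

Caching the promise rather than the resolved value also deduplicates concurrent identical requests: a second caller gets the same in-flight promise instead of triggering another LLM call.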
Monitoring and Evaluation
Implement basic evaluation metrics:
class RAGEvaluator {
  async evaluateResponse(query, answer, sources) {
    return {
      relevanceScore: this.calculateRelevance(query, sources),
      sourceCount: sources.length,
      responseLength: answer.length,
      hasValidSources: sources.length > 0,
    };
  }

  calculateRelevance(query, sources) {
    // Simple keyword-overlap metric: the fraction of query words
    // that appear somewhere in the retrieved source text
    const queryWords = query.toLowerCase().split(/\s+/).filter(Boolean);
    const sourceText = sources.map(s => s.content).join(' ').toLowerCase();
    const overlap = queryWords.filter(word => sourceText.includes(word)).length;
    return queryWords.length ? overlap / queryWords.length : 0;
  }
}

Conclusion
RAG with vector databases transforms how applications handle search and information retrieval. This implementation provides a solid foundation that you can extend with features like multi-modal search, real-time document updates, and advanced filtering capabilities.
The key to successful RAG systems lies in proper document chunking, relevant context retrieval, and continuous optimization based on user feedback and evaluation metrics.