Building AI-Powered Search with RAG and Vector Databases
Introduction
Traditional keyword-based search is increasingly giving way to AI-powered semantic search that understands context and meaning. Retrieval-Augmented Generation (RAG) combined with a vector database lets an application answer questions with relevant, contextual responses grounded in your own data.
In this guide, we'll build a practical RAG system that can search through documentation and provide AI-generated answers with proper source attribution.
Understanding RAG Architecture
RAG works by combining two key components:
- Retrieval: Finding relevant documents using vector similarity search
- Generation: Using an LLM to generate answers based on retrieved context
The process flows like this:
- Convert documents into vector embeddings and store them
- Convert user queries into embeddings
- Find similar document chunks using vector search
- Pass relevant chunks to an LLM for answer generation
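The retrieval step in this flow hinges on vector similarity. As a toy illustration of what "find similar document chunks" means — using made-up 3-dimensional vectors, where real embedding models produce hundreds or thousands of dimensions — chunks can be ranked by cosine similarity to the query:

```javascript
// Toy illustration of the retrieval step: embeddings are arrays of
// numbers, and "similar" means a high cosine similarity score.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical 3-dimensional "embeddings" for two chunks and a query
const chunks = [
  { text: 'How to reset your password', vector: [0.9, 0.1, 0.0] },
  { text: 'Quarterly revenue report',   vector: [0.0, 0.2, 0.9] },
];
const queryVector = [0.8, 0.2, 0.1]; // query: "password reset steps"

// Rank chunks by similarity to the query, highest first
const ranked = chunks
  .map(c => ({ ...c, score: cosineSimilarity(c.vector, queryVector) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked[0].text); // the password chunk ranks first
```

In the real system below, Pinecone performs this ranking server-side over the stored embeddings, so the application only ever sends a query vector and receives the top matches.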
Setting Up the Tech Stack
We'll use:
- Pinecone: Vector database for storing embeddings
- OpenAI: For embeddings and text generation
- LangChain: To orchestrate the RAG pipeline
- Node.js: Backend implementation
First, install the required packages:
npm install @pinecone-database/pinecone openai langchain @langchain/openai @langchain/pinecone dotenv

Document Processing and Embedding
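Both services require API keys, which the code below reads from environment variables (loaded with dotenv). A typical `.env` file, with placeholder values:

```
OPENAI_API_KEY=your-openai-key
PINECONE_API_KEY=your-pinecone-key
```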
Start by processing documents and converting them to embeddings:
import { PineconeStore } from '@langchain/pinecone';
import { OpenAIEmbeddings } from '@langchain/openai';
import { Pinecone } from '@pinecone-database/pinecone';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import fs from 'fs';

export class DocumentProcessor {
  constructor() {
    this.pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY,
    });
    this.embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
  }

  async processDocuments(filePath, indexName) {
    // Read the file and split it into overlapping chunks
    const text = fs.readFileSync(filePath, 'utf8');
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const docs = await splitter.createDocuments([text]);

    // Get a handle to the existing Pinecone index
    const index = this.pinecone.Index(indexName);

    // Embed the chunks and store them in the vector database
    await PineconeStore.fromDocuments(docs, this.embeddings, {
      pineconeIndex: index,
      maxConcurrency: 5,
    });

    console.log(`Processed ${docs.length} document chunks`);
  }
}

Building the RAG Query System
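With the processor in place, the index has to be populated once before anything can be queried. A hypothetical one-off ingestion script — the file path is a placeholder, and the `documentation-index` Pinecone index is assumed to already exist with your embedding model's dimension (1536 for OpenAI's default):

```javascript
import 'dotenv/config';
import { DocumentProcessor } from './document-processor.js';

// One-off ingestion: chunk, embed, and upsert the documentation.
// Assumes the 'documentation-index' Pinecone index already exists.
const processor = new DocumentProcessor();
await processor.processDocuments('./docs/getting-started.md', 'documentation-index');
```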
Now implement the retrieval and generation logic:
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';
import { RetrievalQAChain } from 'langchain/chains';
import { PromptTemplate } from '@langchain/core/prompts';

export class RAGSearchEngine {
  constructor(indexName) {
    this.llm = new ChatOpenAI({
      modelName: 'gpt-3.5-turbo',
      temperature: 0.2,
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
    this.embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
    const pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY,
    });
    this.vectorStore = new PineconeStore(this.embeddings, {
      pineconeIndex: pinecone.Index(indexName),
    });
    this.setupChain();
  }

  setupChain() {
    const prompt = PromptTemplate.fromTemplate(`
Use the following context to answer the question. If you cannot find the answer in the context, say "I don't have enough information to answer that question."

Context: {context}

Question: {question}

Answer:`);

    this.chain = RetrievalQAChain.fromLLM(
      this.llm,
      this.vectorStore.asRetriever({
        k: 4, // retrieve the top 4 most similar chunks
        searchType: 'similarity',
      }),
      {
        prompt,
        returnSourceDocuments: true,
      }
    );
  }

  async search(query) {
    try {
      const result = await this.chain.call({ query });
      return {
        answer: result.text,
        sources: result.sourceDocuments.map(doc => ({
          content: doc.pageContent,
          metadata: doc.metadata,
        })),
      };
    } catch (error) {
      console.error('Search error:', error);
      throw new Error('Failed to process search query');
    }
  }
}

Creating the API Endpoint
Build an Express.js API to serve the RAG functionality:
import 'dotenv/config';
import express from 'express';
import cors from 'cors';
import { RAGSearchEngine } from './rag-engine.js';

const app = express();
const searchEngine = new RAGSearchEngine('documentation-index');

app.use(cors());
app.use(express.json());

app.post('/api/search', async (req, res) => {
  try {
    const { query } = req.body;
    if (!query) {
      return res.status(400).json({ error: 'Query is required' });
    }

    const result = await searchEngine.search(query);
    res.json({
      success: true,
      data: {
        answer: result.answer,
        sources: result.sources,
        timestamp: new Date().toISOString(),
      },
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error.message,
    });
  }
});

app.listen(3000, () => {
  console.log('RAG API server running on port 3000');
});

Optimization Strategies
To improve your RAG system's performance:
- Chunk Size Optimization: Experiment with different chunk sizes (500-1500 tokens) based on your content
- Embedding Models: Consider using specialized embedding models for domain-specific content
- Hybrid Search: Combine vector search with traditional keyword search for better coverage
- Response Caching: Cache frequent queries to reduce API costs and improve response times
- Metadata Filtering: Add metadata filters to narrow search scope by document type, date, or category
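As a concrete example of the response-caching idea above, a minimal in-memory cache can wrap any async search function. This is a sketch — `withCache` is a hypothetical helper, and a production system would more likely use Redis with TTL-based eviction:

```javascript
// Wrap a search function with a simple in-memory cache keyed on the
// normalized query, so repeated questions skip the LLM and vector store.
function withCache(searchFn, maxEntries = 100) {
  const cache = new Map();
  return (query) => {
    const key = query.trim().toLowerCase();
    if (cache.has(key)) return cache.get(key);
    if (cache.size >= maxEntries) {
      cache.delete(cache.keys().next().value); // evict the oldest entry
    }
    const resultPromise = Promise.resolve(searchFn(query));
    cache.set(key, resultPromise); // cache the promise itself
    return resultPromise;
  };
}

// Demo with a stand-in search function (counting calls to show caching)
let calls = 0;
const fakeSearch = (q) => { calls++; return Promise.resolve(`answer for: ${q}`); };
const cachedSearch = withCache(fakeSearch);

cachedSearch('What is RAG?');
cachedSearch('what is rag?'); // normalized key: served from cache
console.log(calls); // 1 — the underlying search ran only once
```

Caching the promise rather than the resolved value also deduplicates concurrent identical requests: a second caller gets the same in-flight promise instead of triggering another LLM call.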
Monitoring and Evaluation
Implement basic evaluation metrics:
class RAGEvaluator {
  async evaluateResponse(query, answer, sources) {
    return {
      relevanceScore: this.calculateRelevance(query, sources),
      sourceCount: sources.length,
      responseLength: answer.length,
      hasValidSources: sources.length > 0,
    };
  }

  calculateRelevance(query, sources) {
    // Simple keyword-overlap metric: the fraction of query words
    // that appear somewhere in the retrieved source text
    const queryWords = query.toLowerCase().split(/\s+/).filter(Boolean);
    const sourceText = sources.map(s => s.content).join(' ').toLowerCase();
    const overlap = queryWords.filter(word => sourceText.includes(word)).length;
    return queryWords.length ? overlap / queryWords.length : 0;
  }
}

Conclusion
RAG with vector databases transforms how applications handle search and information retrieval. This implementation provides a solid foundation that you can extend with features like multi-modal search, real-time document updates, and advanced filtering capabilities.
The key to successful RAG systems lies in proper document chunking, relevant context retrieval, and continuous optimization based on user feedback and evaluation metrics.