Building AI-Powered Search with RAG and Vector Databases: A Practical Guide
Introduction
Retrieval-Augmented Generation (RAG) has become a standard pattern for building search systems on top of large language models. By pairing an LLM with a custom knowledge base stored in a vector database, we can build applications that answer questions from our own data rather than from the model's training set alone. In this guide, we'll build a practical RAG system that you can integrate into your web applications.
Understanding RAG Architecture
RAG works in two stages: it first retrieves the documents most semantically similar to the query from a vector database, then passes those documents to an LLM as context for generating the response. This works around the LLM's training cutoff and lets the model answer from your domain-specific knowledge.
The typical RAG workflow consists of:
- Document ingestion and chunking
- Embedding generation and storage
- Query processing and similarity search
- Context augmentation and LLM generation
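The similarity search step in this workflow can be illustrated without any external service. The sketch below ranks a few stored chunks against a query vector using cosine similarity, one of the common similarity metrics for embeddings. The StoredChunk type, the topK helper, and the three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database typically uses an approximate nearest-neighbor index rather than the full scan shown here.

```typescript
// Toy illustration of similarity search over in-memory vectors.
// StoredChunk and topK are hypothetical names, not part of any library.
type StoredChunk = { id: string; text: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the k best matches
function topK(query: number[], chunks: StoredChunk[], k: number) {
  return chunks
    .map(chunk => ({
      id: chunk.id,
      text: chunk.text,
      score: cosineSimilarity(query, chunk.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical 3-dimensional embeddings, made up for the example
const chunks: StoredChunk[] = [
  { id: 'a', text: 'Resetting your password', vector: [1, 0, 0] },
  { id: 'b', text: 'Billing and invoices', vector: [0, 1, 0] },
  { id: 'c', text: 'Changing account credentials', vector: [0.9, 0.1, 0] },
];

const results = topK([1, 0, 0], chunks, 2);
console.log(results.map(r => r.id)); // [ 'a', 'c' ]
```

Chunk 'c' ranks second even though it shares no exact wording with chunk 'a', because its vector points in nearly the same direction as the query. That is the property semantic search relies on.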
Setting Up the Vector Database
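We'll use Pinecone as our vector database for this implementation. The snippets in this guide use the legacy PineconeClient class (0.x releases of @pinecone-database/pinecone) and the Configuration and OpenAIApi classes from version 3 of the openai package, so pin those major versions if a newer install fails to compile. First, install the dependencies: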
npm install @pinecone-database/pinecone openai langchain tiktoken
Create a Pinecone client and initialize the index:
import { PineconeClient } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';

const pinecone = new PineconeClient();

const initializePinecone = async () => {
  await pinecone.init({
    environment: process.env.PINECONE_ENVIRONMENT!,
    apiKey: process.env.PINECONE_API_KEY!,
  });
  const indexName = 'rag-search-index';
  const index = pinecone.Index(indexName);
  return index;
};

Document Processing and Embedding
The quality of your RAG system depends heavily on how you split documents into chunks before embedding them. The processor in this section splits raw text into overlapping chunks and embeds each one.
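To see what the chunk size and overlap settings control, here is a simplified, dependency-free sketch of a fixed-window chunker. The chunkWithOverlap function is a hypothetical helper written for this illustration. It is not how RecursiveCharacterTextSplitter works internally, since that splitter also tries to break on paragraph and sentence boundaries; the sketch only shows the size and overlap arithmetic.

```typescript
// Simplified fixed-window chunker (hypothetical helper for illustration).
// It ignores paragraph and sentence boundaries and only shows how
// chunkSize and chunkOverlap interact.
function chunkWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error('chunkSize must be positive');
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be at least 0 and smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const sample = 'abcdefghijklmnopqrstuvwxyz';
console.log(chunkWithOverlap(sample, 10, 4));
// [ 'abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz' ]
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. With a size of 1000 and an overlap of 200, this simplified window would advance 800 characters at a time. The LangChain-based processor used in this guide follows.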
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { Document } from 'langchain/document';

class DocumentProcessor {
  private textSplitter: RecursiveCharacterTextSplitter;
  private embeddings: OpenAIEmbeddings;

  constructor() {
    // chunkSize and chunkOverlap are measured in characters by default
    this.textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    this.embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
  }

  async processDocument(content: string, metadata: any) {
    const docs = await this.textSplitter.createDocuments(
      [content],
      [metadata]
    );
    // Embed all chunks in one batched call instead of one request per chunk
    const embeddings = await this.embeddings.embedDocuments(
      docs.map(doc => doc.pageContent)
    );
    // metadata.id must be unique per source document so chunk ids do not collide
    return docs.map((doc, i) => {
      // Drop the nested loc object the splitter adds; Pinecone metadata values must be flat
      const { loc, ...flatMetadata } = doc.metadata;
      return {
        id: `${metadata.id}-chunk-${i}`,
        values: embeddings[i],
        metadata: {
          ...flatMetadata,
          text: doc.pageContent,
          chunk: i
        }
      };
    });
  }
}

Implementing Semantic Search
Now let's create the search functionality that retrieves relevant context based on user queries:
// Shape of a single search hit returned by RAGSearchEngine.search
interface SearchResult {
  content: string;
  score: number;
  metadata: Record<string, any>;
}

class RAGSearchEngine {
  private index: any;
  private embeddings: OpenAIEmbeddings;
  private processor: DocumentProcessor;

  constructor(index: any) {
    this.index = index;
    this.embeddings = new OpenAIEmbeddings();
    this.processor = new DocumentProcessor();
  }

  async addDocuments(documents: { content: string; metadata: any }[]) {
    for (const doc of documents) {
      const vectors = await this.processor.processDocument(
        doc.content,
        doc.metadata
      );
      await this.index.upsert({
        upsertRequest: {
          vectors
        }
      });
    }
  }

  async search(query: string, topK: number = 5): Promise<SearchResult[]> {
    const queryEmbedding = await this.embeddings.embedQuery(query);
    const searchResponse = await this.index.query({
      queryRequest: {
        vector: queryEmbedding,
        topK,
        includeMetadata: true,
        includeValues: false
      }
    });
    // Fall back to empty values so callers never see undefined fields
    return (searchResponse.matches ?? []).map((match: any) => ({
      content: match.metadata?.text ?? '',
      score: match.score ?? 0,
      metadata: match.metadata ?? {}
    }));
  }
}

Integrating with OpenAI for Response Generation
The final step is combining our retrieved context with OpenAI's API to generate intelligent responses:
import { Configuration, OpenAIApi } from 'openai';

class RAGResponseGenerator {
  private openai: OpenAIApi;
  private searchEngine: RAGSearchEngine;

  constructor(searchEngine: RAGSearchEngine) {
    const configuration = new Configuration({
      apiKey: process.env.OPENAI_API_KEY,
    });
    this.openai = new OpenAIApi(configuration);
    this.searchEngine = searchEngine;
  }

  async generateResponse(query: string) {
    // Retrieve relevant context
    const searchResults = await this.searchEngine.search(query, 3);
    // Prepare context from search results
    const context = searchResults
      .map(result => result.content)
      .join('\n\n');
    const prompt = `
Context information is below.
---------------------
${context}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: ${query}
Answer:`;
    const response = await this.openai.createCompletion({
      model: 'gpt-3.5-turbo-instruct',
      prompt,
      max_tokens: 500,
      temperature: 0.1
    });
    const scores = searchResults.map(r => r.score);
    return {
      answer: response.data.choices[0]?.text?.trim(),
      sources: searchResults.map(r => r.metadata),
      // Lowest retrieval score among the chunks used; 0 when nothing was retrieved
      // (Math.min() with no arguments would return Infinity)
      confidence: scores.length > 0 ? Math.min(...scores) : 0
    };
  }
}

Best Practices and Optimization
When implementing RAG systems in production, consider these optimization strategies:
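- Chunk Size Optimization: Experiment with different chunk sizes based on your content type. Technical documentation might need smaller chunks (500-800 tokens), while narrative content often works better with larger chunks (1000-1500 tokens). Note that RecursiveCharacterTextSplitter measures chunkSize in characters by default, so the 1000 used earlier in this guide is 1000 characters, not 1000 tokens.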
- Metadata Enrichment: Include relevant metadata like document type, creation date, and topic tags to improve retrieval accuracy.
- Hybrid Search: Combine vector search with traditional keyword search for better results across different query types.
- Query Preprocessing: Implement query expansion and reformulation to handle ambiguous or poorly formed user queries.
- Response Caching: Cache frequently asked questions and their responses to reduce API costs and improve response times.
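As an illustration of the hybrid search item above, one common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). This guide does not prescribe a fusion method, so treat the sketch below as one option among several. The reciprocalRankFusion function, the document ids, and the constant k = 60 are assumptions chosen for the example; the keyword ranking would come from whatever full-text search you already run.

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) per document.
// k = 60 is a commonly used default; treat it as a tunable assumption.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks start at 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result ids from a vector search and a keyword search
const vectorHits = ['doc-3', 'doc-1', 'doc-7'];
const keywordHits = ['doc-1', 'doc-9', 'doc-3'];

const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map(r => r.id)); // [ 'doc-1', 'doc-3', 'doc-9', 'doc-7' ]
```

RRF uses only rank positions, so it sidesteps the problem that vector similarity scores and keyword relevance scores are on different scales. Documents that appear in both lists (doc-1 and doc-3 here) rise above documents found by only one method.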
Monitoring and Evaluation
To ensure your RAG system performs well, implement these monitoring practices:
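// Simple evaluation metrics
class RAGEvaluator {
  static evaluateRetrievalRelevance(query: string, results: any[], threshold: number = 0.8) {
    // Avoid dividing by zero when the search returned nothing
    if (results.length === 0) return 0;
    const relevantResults = results.filter(r => r.score >= threshold);
    return relevantResults.length / results.length;
  }

  static async evaluateAnswerQuality(
    openai: OpenAIApi,
    question: string,
    answer: string,
    context: string
  ): Promise<number | null> {
    // Use a separate LLM call to evaluate answer quality
    // This is a simplified version - consider using dedicated evaluation frameworks
    const evaluationPrompt = `
Rate the quality of this answer (1-5):
Question: ${question}
Context: ${context}
Answer: ${answer}
Rating:`;
    const response = await openai.createCompletion({
      model: 'gpt-3.5-turbo-instruct',
      prompt: evaluationPrompt,
      max_tokens: 5,
      temperature: 0
    });
    // Take the first digit from 1 to 5 in the reply; null if there is none
    const rating = response.data.choices[0]?.text?.match(/[1-5]/)?.[0];
    return rating ? Number(rating) : null;
  }
}

Conclusion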
RAG systems represent a powerful approach to building intelligent search and Q&A systems that leverage your custom data. By combining vector databases with large language models, you can create applications that provide accurate, contextually relevant responses while maintaining transparency through source attribution. Start with this foundation and iterate based on your specific use case requirements and user feedback.