// Glossary · technical

RAG (Retrieval Augmented Generation)

Also: retrieval-augmented generation

AI technique where the system retrieves relevant context from a document store before the model generates, so answers are grounded in your data rather than in the model's training memory alone.

Retrieval Augmented Generation is the architecture that lets a language model answer questions from your private data without retraining the model itself. The flow is straightforward. Ahead of time, your documents are split into chunks, and each chunk is embedded into a vector and stored in a vector database. At query time, a user asks a question. The system embeds the question with the same embedding model and searches the vector database for the most semantically relevant chunks. The retrieved chunks get passed to the language model as context, along with the original question. The model generates an answer grounded in the retrieved material, often with citations back to source documents. The result is a model that answers questions about your company knowledge base, support history, or product documentation without ever having seen the data during training.
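The flow above can be sketched in a few lines. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `CHUNKS` list stands in for a vector database, and the sample documents and file names are invented for the example. The final generation call is left out, since it depends on whichever model you use.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]

def build_prompt(question, retrieved):
    """Pack retrieved chunks and the question into one prompt for the model."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the context below and cite sources in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical indexed chunks; a real system would store vectors, not raw text.
CHUNKS = [
    {"source": "billing.md", "text": "Refunds are issued within 5 business days of approval."},
    {"source": "sso.md", "text": "Single sign-on is configured under Settings, Security."},
    {"source": "api.md", "text": "API keys can be rotated from the developer dashboard."},
]

question = "How long do refunds take?"
top = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top)  # this string is what gets sent to the language model
```

Swapping `embed` for a real embedding model and `CHUNKS` for a vector database changes the quality of retrieval, not the shape of the pipeline.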

RAG matters because it addresses the two biggest problems with raw language models in production. First, the hallucination problem: an ungrounded model generates plausible-sounding but wrong answers when it does not know the actual fact. With RAG, the model sees the actual fact in the retrieved context and can answer from it, which reduces hallucinations substantially without eliminating them. Second, the freshness problem: a model trained on 2023 data does not know your product launched a new feature last week. With RAG, you index the new documentation and the system can answer from it as soon as indexing finishes. This is the foundation of most internal AI copilots and KB-trained AI that funded teams deploy for support and ops workflows.

A production RAG system involves more than vector search. Document chunking strategy decides what gets retrieved. Embedding model choice affects retrieval quality. Reranking sits between retrieval and generation to prioritize the best chunks. Evaluation infrastructure measures whether answers stay grounded over time. For sensitive data, the entire stack runs on infrastructure you control, often paired with a local LLM so customer data never leaves your perimeter. The AI Ops Department and AI Support Department both ship RAG-backed copilots as part of standard delivery, against your knowledge base rather than a generic public corpus.
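To make the reranking step concrete, here is a minimal sketch of a second-stage pass over first-stage candidates. Production rerankers typically use a trained model to score each question and chunk pair; the term-coverage score below is only a simple stand-in, and the function names and sample chunks are assumptions made for the example.

```python
import re

def terms(text):
    """Lowercased word set, used by the stand-in scorer below."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rerank(question, candidates, keep=2):
    """Re-score first-stage candidates and keep the best few for generation."""
    q = terms(question)

    def score(chunk):
        # Stand-in relevance score: fraction of question terms the chunk covers.
        return len(q & terms(chunk)) / len(q) if q else 0.0

    return sorted(candidates, key=score, reverse=True)[:keep]

# Hypothetical output of the vector search stage, in retrieval order.
candidates = [
    "Password policy: rotate passwords every 90 days.",
    "To reset your password, open Settings and choose Reset password.",
    "Settings also controls notification preferences.",
]
best = rerank("How do I reset my password?", candidates, keep=1)
```

The design point is the two-stage shape: a cheap, broad retrieval step followed by a more careful scoring step over a short candidate list.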

// Examples
  • An internal copilot indexes 4,200 Notion pages and answers employee questions with citations back to the source, removing 60% of repeated questions to the ops team.
  • A support deflection layer runs RAG against the help center and handles 41% of tier-1 tickets without human handoff, all answers traceable to source articles.
  • A sales engineering copilot indexes 380 past RFP responses and drafts new RFP answers in 8 minutes that previously took 4 hours of manual research.

// Common questions
How is RAG different from fine-tuning?
Fine-tuning trains a model further on your data so it learns the patterns. RAG keeps the model unchanged and retrieves your data at query time. RAG is faster to set up, easier to update (just re-index), and better for factual recall. Fine-tuning is better for style transfer and domain-specific reasoning. Many production systems use both: fine-tune for style, RAG for facts.
What chunk size works best for RAG?
Depends on the document type and the questions you expect. 200 to 500 token chunks work well for FAQ and policy documents. 800 to 1,200 token chunks work better for long-form technical documentation where context matters. Overlapping chunks (50 to 100 tokens of overlap) prevent answers from being split across boundaries.
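A minimal sketch of fixed-size chunking with overlap, assuming the text has already been tokenized into a list. The function name and defaults are illustrative; the sizes should be tuned to your documents, and a real pipeline would count tokens with the embedding model's own tokenizer.

```python
def chunk_tokens(tokens, size=300, overlap=50):
    """Split a token list into chunks of `size` that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks

# Placeholder tokens standing in for a 700-token document.
tokens = [f"t{i}" for i in range(700)]
chunks = chunk_tokens(tokens, size=300, overlap=50)
```

With these settings each chunk begins 250 tokens after the previous one, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.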
Do I need a vector database for RAG?
For anything beyond a small prototype, yes. Pinecone, Weaviate, Qdrant, and pgvector (a Postgres extension) are common choices. Below roughly 10,000 documents you can run vector search in memory with a library such as FAISS, or inside a Postgres instance you already run by adding pgvector. Above that, a dedicated vector database gives you the latency and concurrency you need for production traffic.
Can RAG eliminate hallucinations entirely?
No, but it reduces them substantially. The model can still misinterpret retrieved context, ignore it in favor of training data, or answer over-confidently when retrieval returns weak matches. Good RAG systems include retrieval quality scoring, citation requirements, and evaluation infrastructure to catch failures, rather than trusting the model to behave.
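Two of these safeguards can be sketched as simple checks: a gate that declines to answer when retrieval is weak, and a scan for sentences that carry no citation. The threshold value, the bracketed `[source]` citation format, and the function names are all assumptions for illustration; real systems tune the threshold against evaluation data.

```python
import re

def should_answer(scores, min_score=0.35):
    """Retrieval gate: only generate if the best match clears the threshold."""
    return bool(scores) and max(scores) >= min_score

def uncited_sentences(answer):
    """Return the sentences in an answer that carry no [source] marker."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not re.search(r"\[[^\]]+\]", s)]

# Hypothetical model output: the first claim is cited, the second is not.
answer = "Refunds take 5 business days [billing.md]. They are always approved."
flagged = uncited_sentences(answer)
```

A flagged sentence is not necessarily wrong, but it is a claim the system cannot trace to a source, which makes it a good candidate for review or removal.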