Twenty RAG techniques
⚠️ Work in progress
Retrieval-Augmented Generation (RAG) blends retrieval systems with LLMs to extend language models beyond their training knowledge. By leveraging a knowledge base or external dataset, RAG enhances the relevance and accuracy of generated outputs, making it invaluable for applications such as customer support, content creation, and research assistance.
But like any technology, the effectiveness of RAG depends on how it’s implemented. Small tweaks and strategic optimizations can transform a functional RAG system into an exceptional one. This article explores twenty practical and advanced techniques to refine a RAG pipeline.
Concepts and dimensions
The ingredients of a RAG solution are fairly simple; if you take a step back, you can observe the following elements:
- documents
- chunks
- metadata (doc, chunk, node…)
- entities (links, graphs…)
Each of these elements can be tuned or improved via diverse techniques. Together they form a structural dimension along which you can experiment.
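As a rough illustration, these structural elements map onto plain data types. The sketch below is purely illustrative; the names and fields are assumptions, not taken from any particular framework:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A retrievable unit cut from a document."""
    chunk_id: str
    doc_id: str                                    # back-reference to the source document
    text: str
    embedding: list[float] | None = None           # filled in at ingestion time
    metadata: dict = field(default_factory=dict)   # page, section, author...

@dataclass
class Entity:
    """A node in an optional knowledge-graph layer (links, graphs)."""
    name: str
    entity_type: str                                    # person, organization, concept...
    chunk_ids: list[str] = field(default_factory=list)  # chunks where it appears

@dataclass
class Document:
    doc_id: str
    source: str                                    # path or URL
    metadata: dict = field(default_factory=dict)
    chunks: list[Chunk] = field(default_factory=list)
```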
There are also three processing dimensions in a RAG pipeline:
- ingestion
- query
- feedback (optional)
Finally, you can also play with more technical aspects of the pipeline:
- chunking strategy (semantic, size…)
- vector comparisons, embedding models
- indexing techniques and (re)ranking (BM25, neural reranking…)
The twenty techniques highlighted below can all be placed within these three dimensions: structural, processing, and technical. I call them dimensions because each carries its own level of complexity, sometimes incremental. Whether you need one technique or another is more art than science; many factors can play a role in deciding what to use:
- the business context
- the corpus or repository
- the budget
- the accuracy of the end result
- the type of user experience
- the time allowed for processing queries
- how dynamic the corpus is and the necessity for updates
- whether knowledge has to be structured (hierarchical, graph, visualizations…)
You don’t necessarily need a knowledge graph, an ontology, or agentic frameworks to be successful. Many RAG projects can do without a feedback pipeline or explainable AI. What matters is to understand the business case and the options at your disposal.
I Document compressors
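Compressors shrink retrieved chunks so that only query-relevant content reaches the generator, saving context window and reducing noise. Below is a minimal sketch of an LLM-based compressor; the `llm` helper is a hypothetical stand-in for any chat-completion call:

```python
def llm(prompt: str) -> str:
    ...  # hypothetical helper: plug in your LLM client here

def compress(chunks: list[str], query: str) -> list[str]:
    """Keep only the parts of each retrieved chunk that help answer the query."""
    compressed = []
    for chunk in chunks:
        extract = llm(
            "Extract, verbatim, only the parts of the following text that are "
            "relevant to the question. Return NONE if nothing is relevant.\n\n"
            f"Question: {query}\n\nText: {chunk}"
        )
        if extract.strip() != "NONE":
            compressed.append(extract)
    return compressed
```

LangChain packages this idea as `ContextualCompressionRetriever` combined with a compressor such as `LLMChainExtractor`.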
II Header augmentation (aka out-of-context chunks, contextual chunk headers, or CCH)
- LangChain implementation
- Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation
- Searching for Best Practices in Retrieval-Augmented Generation
- Financial Report Chunking for Effective Retrieval Augmented Generation
- Introducing a new hyper-parameter for RAG: Context Window Utilization
- RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
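The idea behind contextual chunk headers is to prepend document- and section-level context to each chunk before embedding it, so a chunk pulled out of its document remains matchable and understandable. A minimal sketch; the header format is an assumption, not a standard:

```python
def with_header(chunk_text: str, doc_title: str, section: str) -> str:
    """Prepend a contextual header so the chunk is self-describing out of context."""
    header = f"Document: {doc_title}\nSection: {section}\n\n"
    return header + chunk_text

# The augmented text is what gets embedded and indexed; at generation time
# the header also tells the LLM where the chunk came from.
augmented = with_header(
    "Revenue grew 12% quarter over quarter.",
    doc_title="ACME Corp Q3 2024 Financial Report",
    section="Results of Operations",
)
```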
III Query augmentation (aka query expansion)
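Query augmentation generates variants of the user query (synonyms, rephrasings, related terms), retrieves for each variant, and merges the results, which compensates for vocabulary mismatch between query and corpus. A hedged sketch; `llm` and `search` are hypothetical stand-ins for your model client and retriever:

```python
def llm(prompt: str) -> str: ...          # hypothetical chat-completion helper
def search(query: str) -> list[str]: ...  # hypothetical retriever returning chunk ids

def augmented_search(query: str, n_variants: int = 3) -> list[str]:
    """Retrieve with the original query plus LLM-generated rewrites, then merge."""
    variants = llm(
        f"Rewrite the following search query in {n_variants} different ways, "
        f"one per line, using synonyms and related terms:\n{query}"
    ).splitlines()
    results: list[str] = []
    for q in [query, *variants]:
        results.extend(search(q))
    return list(dict.fromkeys(results))   # deduplicate, preserving order
```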
IV Hypothetical document embedding
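HyDE flips retrieval around: instead of embedding the question, it asks the LLM to write a hypothetical answer document and embeds that, since a plausible fake answer usually lands closer in embedding space to real answer passages than the question does. A sketch with hypothetical `llm`, `embed`, and `vector_search` helpers:

```python
def llm(prompt: str) -> str: ...                                 # hypothetical helpers
def embed(text: str) -> list[float]: ...
def vector_search(v: list[float], k: int = 5) -> list[str]: ...

def hyde_retrieve(question: str) -> list[str]:
    # 1. Let the LLM hallucinate a plausible answer passage.
    hypothetical_doc = llm(f"Write a short passage that answers: {question}")
    # 2. Embed the fake answer rather than the question itself.
    v = embed(hypothetical_doc)
    # 3. Real passages near the fake answer are likely to contain the true answer.
    return vector_search(v)
```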
V Graph RAG
- An extensive compilation of graph RAG research papers
- Microsoft Graph RAG
- Neo4j GraphRAG for Python (formerly Neo4j GenAI)
- Nano Graph RAG
- LightRAG
- Fast Graph RAG
- TrustGraph
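Graph RAG variants differ, but the common core is: extract entities and relations at ingestion time, store them as a graph, and at query time expand retrieval through the neighborhood of matched entities. A minimal sketch using networkx; the `llm` triple-extraction helper and the `subject|relation|object` line format are assumptions:

```python
import networkx as nx

def llm(prompt: str) -> str: ...   # hypothetical triple-extraction helper

def ingest(graph: nx.MultiDiGraph, chunk_id: str, text: str) -> None:
    """Extract triples from a chunk and add them to the knowledge graph."""
    triples = llm(
        "Extract factual triples from the text as 'subject|relation|object', "
        f"one per line:\n{text}"
    )
    for line in triples.splitlines():
        if line.count("|") != 2:
            continue                      # skip malformed extraction lines
        subj, rel, obj = (part.strip() for part in line.split("|"))
        graph.add_edge(subj, obj, relation=rel, chunk_id=chunk_id)

def related_chunks(graph: nx.MultiDiGraph, entity: str, hops: int = 1) -> set[str]:
    """Collect chunk ids attached to edges within `hops` of a matched entity."""
    nodes = set(nx.ego_graph(graph.to_undirected(), entity, radius=hops))
    return {data["chunk_id"] for u, v, data in graph.edges(data=True)
            if u in nodes and v in nodes}
```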
VI Adaptive RAG
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
- Self-adaptive Multimodal Retrieval-Augmented Generation (SAM-RAG)
- MBA-RAG: A Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
- CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control
- SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
- RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering
- LangGraph implementation
- LlamaIndex implementation
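The common thread in these papers is routing: a lightweight judge estimates question complexity and picks a strategy, from no retrieval at all up to iterative multi-hop retrieval. A simplified sketch in the spirit of Adaptive-RAG, with hypothetical `llm` and `retrieve` helpers:

```python
def llm(prompt: str) -> str: ...              # hypothetical helpers
def retrieve(query: str) -> list[str]: ...

def adaptive_answer(question: str) -> str:
    label = llm(
        "Classify this question as SIMPLE (answerable from model knowledge), "
        "SINGLE (needs one retrieval step) or MULTI (needs multi-hop retrieval). "
        f"Answer with one word.\nQuestion: {question}"
    ).strip().upper()
    if label == "SIMPLE":
        return llm(question)                   # skip retrieval entirely
    context = retrieve(question)
    if label == "MULTI":
        # one extra hop: derive a follow-up query from the first round of context
        follow_up = llm(
            f"Given this context:\n{context}\n"
            f"What else must be looked up to answer: {question}?"
        )
        context += retrieve(follow_up)
    return llm(f"Answer using this context:\n{context}\n\nQuestion: {question}")
```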
VII Context Enrichment
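Context enrichment pads a retrieved chunk with its neighbors from the source document (sometimes called a sentence window or chunk window), because the chunk that matches best is often too short to be understood on its own. A minimal sketch, assuming chunks are kept in document order:

```python
def enrich(chunks: list[str], hit_index: int, window: int = 1) -> str:
    """Return the retrieved chunk padded with its neighbours in the source document."""
    lo = max(0, hit_index - window)
    hi = min(len(chunks), hit_index + window + 1)
    return " ".join(chunks[lo:hi])

# retrieval matched chunk 2; the LLM receives chunks 1-3 as one passage
chunks = ["Intro...", "Setup...", "The key result...", "Caveats...", "Outro..."]
print(enrich(chunks, hit_index=2))
```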
VIII Corrective RAG (aka CRAG)
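Corrective RAG grades what the retriever returned before generation: if the documents look correct they are used as-is, if they look wrong the system falls back to another source (typically web search), and if the verdict is ambiguous both are combined. A simplified sketch; `llm`, `retrieve`, and `web_search` are hypothetical stand-ins:

```python
def llm(prompt: str) -> str: ...
def retrieve(query: str) -> list[str]: ...
def web_search(query: str) -> list[str]: ...   # hypothetical fallback source

def corrective_retrieve(question: str) -> list[str]:
    docs = retrieve(question)
    grade = llm(
        "Grade the retrieved documents for this question as CORRECT, AMBIGUOUS "
        f"or INCORRECT.\nQuestion: {question}\nDocuments: {docs}"
    ).strip().upper()
    if grade == "CORRECT":
        return docs
    if grade == "INCORRECT":
        return web_search(question)          # discard retrieval, go external
    return docs + web_search(question)       # ambiguous: combine both sources
```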
IX Explainable RAG
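Explainable RAG makes the answer auditable: the model is forced to cite the chunks it used, and the cited sources are returned alongside the answer so a user can verify every claim. One simple way to sketch it (the bracket-citation convention is an assumption):

```python
def llm(prompt: str) -> str: ...   # hypothetical chat-completion helper

def answer_with_citations(question: str, chunks: dict[str, str]) -> dict:
    """Answer from the given chunks and return the sources actually cited."""
    numbered = "\n".join(f"[{cid}] {text}" for cid, text in chunks.items())
    answer = llm(
        "Answer the question using only the sources below. Cite the id of every "
        f"source you use in square brackets.\n\n{numbered}\n\nQuestion: {question}"
    )
    cited = {cid: text for cid, text in chunks.items() if f"[{cid}]" in answer}
    return {"answer": answer, "sources": cited}
```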
X Fusion retrieval
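Fusion retrieval runs several retrievers over the same corpus, typically lexical BM25 plus dense vectors, and merges the ranked lists. Reciprocal Rank Fusion (RRF) is a common merge rule: each document scores the sum of 1/(k + rank) over the lists it appears in. A self-contained sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with RRF; k=60 is the constant from the original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse a BM25 ranking with an embedding-similarity ranking
fused = reciprocal_rank_fusion([
    ["doc3", "doc1", "doc7"],   # BM25 order
    ["doc1", "doc9", "doc3"],   # dense-vector order
])
print(fused)   # doc1 and doc3 rise to the top: they appear in both lists
```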
XI Hierarchical RAG
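A common hierarchical pattern is small-to-big (parent-child) retrieval: embed small child chunks for matching precision, but hand the LLM the larger parent section they belong to for context. A sketch; `vector_search` is a hypothetical retriever returning child chunk ids:

```python
def vector_search(query: str, k: int = 5) -> list[str]: ...   # returns child chunk ids

def small_to_big(query: str,
                 child_to_parent: dict[str, str],
                 parents: dict[str, str]) -> list[str]:
    """Match on small chunks, return their larger parent sections for the LLM."""
    child_ids = vector_search(query)
    parent_ids = dict.fromkeys(child_to_parent[c] for c in child_ids)  # ordered dedupe
    return [parents[p] for p in parent_ids]
```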
XII Propositional chunking
Propositional chunking breaks text down into atomic units called propositions: self-contained expressions that each encapsulate a distinct fact or idea. Dense retrieval performance is significantly impacted by the choice of retrieval unit, and fine-grained units such as propositions have been shown to outperform passage-level units in retrieval tasks and to improve downstream QA.
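In practice the decomposition is done by an LLM at ingestion time; each proposition is then embedded and indexed as its own retrieval unit, usually with a pointer back to the source passage. A sketch; `llm` is a hypothetical helper and the JSON-list output format is an assumption:

```python
import json

def llm(prompt: str) -> str: ...   # hypothetical chat-completion helper

def propositions(passage: str) -> list[str]:
    """Decompose a passage into self-contained, atomic factual statements."""
    raw = llm(
        "Decompose the following text into simple, self-contained propositions. "
        "Resolve pronouns so each proposition stands alone. "
        f"Return a JSON list of strings.\n\n{passage}"
    )
    return json.loads(raw)
```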
XIII Query rewriting
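Raw user queries are often conversational, underspecified, or full of references to earlier turns; query rewriting turns them into standalone, retrieval-friendly queries before searching. A sketch with a hypothetical `llm` helper:

```python
def llm(prompt: str) -> str: ...   # hypothetical chat-completion helper

def rewrite(query: str, chat_history: list[str]) -> str:
    """Turn a conversational follow-up into a standalone search query."""
    return llm(
        "Rewrite the last user message as a standalone search query, resolving "
        "any references to the conversation.\n\n"
        f"Conversation: {chat_history}\nLast message: {query}"
    )
```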
XIV RAPTOR
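RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) clusters chunk embeddings, summarizes each cluster, and repeats on the summaries, producing a tree whose every level is indexed so retrieval can match at the right level of abstraction. The sketch below uses KMeans for brevity; the paper itself uses soft clustering with Gaussian mixtures, and `llm`/`embed` are hypothetical helpers:

```python
import numpy as np
from sklearn.cluster import KMeans

def llm(prompt: str) -> str: ...                 # hypothetical helpers
def embed(texts: list[str]) -> np.ndarray: ...   # shape (n_texts, dim)

def build_level(chunks: list[str], n_clusters: int) -> list[str]:
    """One RAPTOR layer: cluster the chunks, summarize each cluster."""
    labels = KMeans(n_clusters=n_clusters).fit_predict(embed(chunks))
    summaries = []
    for c in range(n_clusters):
        members = [chunk for chunk, lab in zip(chunks, labels) if lab == c]
        summaries.append(llm("Summarize:\n" + "\n".join(members)))
    return summaries   # feed back in to build the next, more abstract layer

# index every level together: raw chunks + layer-1 summaries + layer-2 summaries...
```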
XV Relevant segment extraction
Relevant Segment Extraction (RSE) is an optional (but strongly recommended) post-processing step that takes clusters of relevant chunks and intelligently combines them into longer sections of text called segments. These segments provide better context to the LLM than any individual chunk can.
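A simplified way to sketch RSE: score every chunk of a document for relevance (for example with a reranker), then merge contiguous runs of relevant chunks into segments. Real implementations optimize segment boundaries more carefully, but the threshold version below shows the idea:

```python
def extract_segments(scores: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Merge contiguous runs of relevant chunks into (start, end) segments."""
    segments, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                        # a relevant run begins
        elif score < threshold and start is not None:
            segments.append((start, i))      # the run ends, close the segment
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments

# per-chunk relevance scores in document order; each (start, end) slice of
# chunks is then concatenated into one segment and passed to the LLM
print(extract_segments([0.1, 0.8, 0.9, 0.2, 0.7]))   # [(1, 3), (4, 5)]
```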
XVI Reliable RAG
The “Reliable-RAG” method improves RAG by incorporating layers of validation and refinement that enhance the accuracy and relevance of retrieved information. It checks documents for relevance, guards against hallucination, and highlights the exact segments used to generate the final response.
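Chained together, the layers look roughly like this: an LLM judge filters retrieved documents for relevance, the answer is generated from the survivors, a groundedness check catches hallucinations, and the supporting segments are surfaced with the answer. A simplified sketch with hypothetical `llm` and `retrieve` helpers:

```python
def llm(prompt: str) -> str: ...
def retrieve(query: str) -> list[str]: ...

def reliable_answer(question: str) -> dict:
    # layer 1: keep only documents an LLM judge deems relevant
    docs = [d for d in retrieve(question)
            if llm(f"Is this relevant to '{question}'? yes/no:\n{d}").strip().lower() == "yes"]
    answer = llm(f"Answer from these documents only:\n{docs}\n\nQuestion: {question}")
    # layer 2: groundedness check to catch hallucinations
    grounded = llm(f"Is every claim in '{answer}' supported by these documents? "
                   f"yes/no:\n{docs}")
    if grounded.strip().lower() != "yes":
        answer = "No fully grounded answer could be produced."
    # layer 3: surface the exact segments that support the answer
    segments = [d for d in docs
                if llm(f"Does this support '{answer}'? yes/no:\n{d}").strip().lower() == "yes"]
    return {"answer": answer, "supporting_segments": segments}
```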