Twenty RAG techniques

LLM
GraphAI
Approaches to improving RAG.
Published January 2, 2025

⚠️ Work in progress

Retrieval-Augmented Generation (RAG) blends retrieval systems with LLMs to extend a language model beyond its training knowledge. By leveraging a knowledge base or external dataset, RAG enhances the relevance and accuracy of generative AI outputs, making it invaluable for applications such as customer support, content creation, and research assistance.

But like any technology, the effectiveness of RAG depends on how it’s implemented. Small tweaks and strategic optimizations can transform a functional RAG system into an exceptional one. This article explores twenty practical and advanced techniques to refine a RAG pipeline.

Concepts and dimensions

The ingredients of a RAG solution are fairly simple: if you take a step back, you can identify the following elements:

  • documents
  • chunks
  • metadata (doc, chunk, node…)
  • entities (links, graphs…)

Each of these elements can be tuned or improved via diverse techniques. Together they form a structural dimension along which you can experiment.
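
To make this concrete, here is a minimal sketch of these elements as Python dataclasses. The field names are illustrative assumptions on my part, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    embedding: list[float] | None = None           # filled in at ingestion time
    metadata: dict = field(default_factory=dict)   # e.g. header, position, source page

@dataclass
class Document:
    doc_id: str
    chunks: list[Chunk] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)   # e.g. title, author, date

@dataclass
class Entity:
    name: str
    related: list[str] = field(default_factory=list)  # graph edges to other entities
```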

There are also three processing dimensions in a RAG pipeline:

  • ingestion
  • query
  • feedback (optional)
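
In code, these processing dimensions map onto a very simple skeleton. This is only a sketch: `store`, `chunk`, `embed`, and `generate` are placeholders for whatever vector store, chunking function, embedding model, and LLM you happen to use:

```python
def ingest(documents, store, chunk, embed):
    """Ingestion: split each document into chunks and index their embeddings."""
    for doc in documents:
        for piece in chunk(doc):                # the chunking strategy is a tuning knob
            store.add(piece, embed(piece))

def answer(question, store, embed, generate, k=5):
    """Query: retrieve the top-k chunks and let the LLM answer from them."""
    context = store.search(embed(question), k=k)
    return generate(question, context)

def record_feedback(store, question, answer, rating):
    """Feedback (optional): log user ratings to tune ranking or re-ingestion later."""
    store.log_feedback(question, answer, rating)
```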

Finally, you can also play with the more technical aspects of a pipeline:

  • chunking strategy (semantic, size…)
  • vector comparisons, embedding models
  • indexing techniques and (re)ranking (BM25, neural reranking…)
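
For example, the standard vector comparison is cosine similarity, and a simple form of score fusion is a weighted sum of a dense (embedding) score and a sparse (BM25) score. A minimal sketch, assuming both scores have already been normalized to [0, 1]:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_score(dense: float, bm25: float, alpha: float = 0.5) -> float:
    """Weighted fusion of a dense (vector) score and a sparse (BM25) score.
    Assumes both scores were normalized to [0, 1]; alpha is a tuning knob."""
    return alpha * dense + (1 - alpha) * bm25
```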

The twenty techniques highlighted below can all be placed within these dimensions. I call them dimensions because each carries its own level of complexity and can often be adopted incrementally. Whether you need one or the other is more art than science; many factors play a role in deciding what to use:

  • the business context
  • the corpus or repository
  • the budget
  • the required accuracy of the end result
  • the type of user experience
  • the time allowed for processing queries
  • how dynamic the corpus is and how often it needs updating
  • whether knowledge has to be structured (hierarchical, graph, visualizations…)

You don’t necessarily need a knowledge graph, an ontology, or an agentic framework to be successful. Many RAG projects can do without a feedback pipeline or explainable AI. What matters is understanding the business case and the options at your disposal.

I Document compressors

II Header augmentation (aka out-of-context chunks, contextual chunk headers or CCH)

III Query augmentation (aka query expansion)

IV Hypothetical document embedding

V Graph RAG

VI Adaptive RAG

VII Context Enrichment

VIII Corrective RAG (aka CRAG)

IX Explainable RAG

X Fusion retrieval

XI Hierarchical RAG

XII Propositional chunking

Propositional chunking breaks text down into atomic units called propositions: minimal expressions that each encapsulate a distinct factoid or idea. Dense retrieval performance is significantly affected by the choice of retrieval unit, and fine-grained units such as propositions outperform passage-level chunks in retrieval tasks and improve downstream QA.
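
A minimal sketch of how this could be implemented, using an LLM to do the decomposition. The prompt wording and the `complete` helper are my own placeholders, not a fixed recipe:

```python
PROPOSITION_PROMPT = """Decompose the following text into simple, self-contained
propositions. Each proposition must express exactly one fact, resolve pronouns
to the entities they refer to, and make sense on its own.
Return one proposition per line.

Text: {text}"""

def propositional_chunks(text: str, complete) -> list[str]:
    """Split a passage into atomic propositions via an LLM call.
    `complete` is a placeholder for any function that takes a prompt
    string and returns the model's text response."""
    response = complete(PROPOSITION_PROMPT.format(text=text))
    return [line.strip() for line in response.splitlines() if line.strip()]
```

Each proposition is then embedded and indexed as its own retrieval unit instead of the original passage.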

XIII Query rewriting

XIV RAPTOR

XV Relevant segment extraction

Relevant Segment Extraction (RSE) is an optional (but strongly recommended) post-processing step that takes clusters of relevant chunks and intelligently combines them into longer sections of text that we call segments. These segments provide better context to the LLM than any individual chunk can.
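
A minimal sketch of the merging step, assuming the retriever has already marked which chunk indices are relevant; the `max_gap` heuristic is an illustrative assumption:

```python
def extract_segments(relevant_indices: set[int], chunks: list[str],
                     max_gap: int = 1) -> list[str]:
    """Merge relevant chunks that sit close together in the original
    document into longer, contiguous segments. Chunks inside a small
    gap are included so each segment reads as continuous text."""
    indices = sorted(relevant_indices)
    if not indices:
        return []
    segments = []
    start = prev = indices[0]
    for i in indices[1:]:
        if i - prev > max_gap + 1:               # gap too large: close the segment
            segments.append(" ".join(chunks[start:prev + 1]))
            start = i
        prev = i
    segments.append(" ".join(chunks[start:prev + 1]))
    return segments
```

For example, if chunks 3, 4 and 6 are relevant, a `max_gap` of 1 pulls chunk 5 in as well and returns one continuous segment.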

XVI Reliable RAG

The “Reliable-RAG” method improves RAG by adding layers of validation and refinement that enhance the accuracy and relevance of the retrieved information: it checks documents for relevance, guards against hallucination, and highlights the exact segments used in generating the final response.
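
A minimal sketch of those validation layers, where `generate` and `llm_yes_no` stand in for your LLM calls (the prompts are illustrative, not the method's exact wording):

```python
def reliable_answer(question: str, retrieved: list[str], generate, llm_yes_no):
    """Sketch of Reliable-RAG-style validation around a plain RAG answer."""
    # 1. Relevance check: keep only the documents the LLM judges relevant.
    docs = [d for d in retrieved
            if llm_yes_no(f"Is this document relevant to: {question}?\n\n{d}")]
    answer = generate(question, docs)
    # 2. Hallucination check: is the answer grounded in the kept documents?
    grounded = llm_yes_no(
        "Is this answer fully supported by the documents?\n\n"
        f"Documents: {docs}\n\nAnswer: {answer}")
    if not grounded:
        return None, []   # or retry retrieval / answer "I don't know"
    # 3. Attribution: highlight the segments actually used in the answer.
    used = [d for d in docs
            if llm_yes_no(f"Was this snippet used to write the answer?\n\n"
                          f"Snippet: {d}\n\nAnswer: {answer}")]
    return answer, used
```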

XVII Reranking

XVIII Feedback loop

XIX Self RAG

XX Semantic chunking