Twenty RAG techniques
⚠️ Work in progress
Retrieval-Augmented Generation (RAG) blends retrieval systems with LLMs to extend language models beyond their training knowledge. By leveraging a knowledge base or external dataset, RAG enhances the relevance and accuracy of generated outputs, making it invaluable for applications such as customer support, content creation, and research assistance.
But like any technology, the effectiveness of RAG depends on how it’s implemented. Small tweaks and strategic optimizations can transform a functional RAG system into an exceptional one. This article explores twenty practical and advanced techniques to refine a RAG pipeline.
Concepts and dimensions
The ingredients of a RAG solution are fairly simple; if you take a step back, you can observe the following elements:
- documents
- chunks
- metadata (doc, chunk, node…)
- entities (links, graphs…)
Each of these elements can be tuned or improved via diverse techniques. Together they form a structural dimension along which you can experiment.
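As a rough illustration, these structural elements map onto plain data types. The sketch below is purely illustrative; the names and fields are assumptions, not taken from any particular framework:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A retrievable unit cut from a document."""
    chunk_id: str
    doc_id: str                                    # back-reference to the source document
    text: str
    embedding: list[float] | None = None           # filled in at ingestion time
    metadata: dict = field(default_factory=dict)   # page, section, author...

@dataclass
class Entity:
    """A node in an optional knowledge-graph layer (links, graphs)."""
    name: str
    entity_type: str                                    # person, organization, concept...
    chunk_ids: list[str] = field(default_factory=list)  # chunks where it appears

@dataclass
class Document:
    doc_id: str
    source: str                                    # path or URL
    metadata: dict = field(default_factory=dict)
    chunks: list[Chunk] = field(default_factory=list)
```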
There are also three processing dimensions in a RAG pipeline:
- ingestion
- query
- feedback (optional)
Finally, you can also play with more technical aspects of the pipeline:
- chunking strategy (semantic, size…)
- vector comparisons, embedding models
- indexing techniques and (re)ranking (BM25, neural reranking…)
The twenty techniques highlighted below can all be placed within these three dimensions: structural, processing, and technical. I call them dimensions because each carries its own level of complexity, sometimes incremental. Whether you need one technique or another is more art than science; many factors can play a role in deciding what to use:
- the business context
- the corpus or repository
- the budget
- the accuracy of the end result
- the type of user experience
- the time allowed for processing queries
- how dynamic the corpus is and the necessity for updates
- whether knowledge has to be structured (hierarchical, graph, visualizations…)
You don’t necessarily need a knowledge graph, an ontology, or agentic frameworks to be successful. Many RAG projects can do without a feedback pipeline or explainable AI. What matters is to understand the business case and the options at your disposal.
I Document compressors
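Compressors shrink retrieved chunks so that only query-relevant content reaches the generator, saving context window and reducing noise. Below is a minimal sketch of an LLM-based compressor; the `llm` helper is a hypothetical stand-in for any chat-completion call:

```python
def llm(prompt: str) -> str:
    ...  # hypothetical helper: plug in your LLM client here

def compress(chunks: list[str], query: str) -> list[str]:
    """Keep only the parts of each retrieved chunk that help answer the query."""
    compressed = []
    for chunk in chunks:
        extract = llm(
            "Extract, verbatim, only the parts of the following text that are "
            "relevant to the question. Return NONE if nothing is relevant.\n\n"
            f"Question: {query}\n\nText: {chunk}"
        )
        if extract.strip() != "NONE":
            compressed.append(extract)
    return compressed
```

LangChain packages this idea as `ContextualCompressionRetriever` combined with a compressor such as `LLMChainExtractor`.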
II Header augmentation (aka out-of-context chunks, contextual chunk headers, or CCH)
- LangChain implementation
- Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation
- Searching for Best Practices in Retrieval-Augmented Generation
- Financial Report Chunking for Effective Retrieval Augmented Generation
- Introducing a new hyper-parameter for RAG: Context Window Utilization
- RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
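The idea behind contextual chunk headers is to prepend document- and section-level context to each chunk before embedding it, so a chunk pulled out of its document remains matchable and understandable. A minimal sketch; the header format is an assumption, not a standard:

```python
def with_header(chunk_text: str, doc_title: str, section: str) -> str:
    """Prepend a contextual header so the chunk is self-describing out of context."""
    header = f"Document: {doc_title}\nSection: {section}\n\n"
    return header + chunk_text

# The augmented text is what gets embedded and indexed; at generation time
# the header also tells the LLM where the chunk came from.
augmented = with_header(
    "Revenue grew 12% quarter over quarter.",
    doc_title="ACME Corp Q3 2024 Financial Report",
    section="Results of Operations",
)
```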
III Query augmentation (aka query expansion)
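Query augmentation generates variants of the user query (synonyms, rephrasings, related terms), retrieves for each variant, and merges the results, which compensates for vocabulary mismatch between query and corpus. A hedged sketch; `llm` and `search` are hypothetical stand-ins for your model client and retriever:

```python
def llm(prompt: str) -> str: ...          # hypothetical chat-completion helper
def search(query: str) -> list[str]: ...  # hypothetical retriever returning chunk ids

def augmented_search(query: str, n_variants: int = 3) -> list[str]:
    """Retrieve with the original query plus LLM-generated rewrites, then merge."""
    variants = llm(
        f"Rewrite the following search query in {n_variants} different ways, "
        f"one per line, using synonyms and related terms:\n{query}"
    ).splitlines()
    results: list[str] = []
    for q in [query, *variants]:
        results.extend(search(q))
    return list(dict.fromkeys(results))   # deduplicate, preserving order
```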
IV Hypothetical document embedding
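HyDE flips retrieval around: instead of embedding the question, it asks the LLM to write a hypothetical answer document and embeds that, since a plausible fake answer usually lands closer in embedding space to real answer passages than the question does. A sketch with hypothetical `llm`, `embed`, and `vector_search` helpers:

```python
def llm(prompt: str) -> str: ...                                 # hypothetical helpers
def embed(text: str) -> list[float]: ...
def vector_search(v: list[float], k: int = 5) -> list[str]: ...

def hyde_retrieve(question: str) -> list[str]:
    # 1. Let the LLM hallucinate a plausible answer passage.
    hypothetical_doc = llm(f"Write a short passage that answers: {question}")
    # 2. Embed the fake answer rather than the question itself.
    v = embed(hypothetical_doc)
    # 3. Real passages near the fake answer are likely to contain the true answer.
    return vector_search(v)
```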
V Graph RAG
- An extensive compilation of graph RAG research papers
- Microsoft Graph RAG
- Neo4j GraphRAG for Python (formerly Neo4j GenAI)
- Nano Graph RAG
- LightRAG
- Fast Graph RAG
- TrustGraph
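Graph RAG variants differ, but the common core is: extract entities and relations at ingestion time, store them as a graph, and at query time expand retrieval through the neighborhood of matched entities. A minimal sketch using networkx; the `llm` triple-extraction helper and the `subject|relation|object` line format are assumptions:

```python
import networkx as nx

def llm(prompt: str) -> str: ...   # hypothetical triple-extraction helper

def ingest(graph: nx.MultiDiGraph, chunk_id: str, text: str) -> None:
    """Extract triples from a chunk and add them to the knowledge graph."""
    triples = llm(
        "Extract factual triples from the text as 'subject|relation|object', "
        f"one per line:\n{text}"
    )
    for line in triples.splitlines():
        if line.count("|") != 2:
            continue                      # skip malformed extraction lines
        subj, rel, obj = (part.strip() for part in line.split("|"))
        graph.add_edge(subj, obj, relation=rel, chunk_id=chunk_id)

def related_chunks(graph: nx.MultiDiGraph, entity: str, hops: int = 1) -> set[str]:
    """Collect chunk ids attached to edges within `hops` of a matched entity."""
    nodes = set(nx.ego_graph(graph.to_undirected(), entity, radius=hops))
    return {data["chunk_id"] for u, v, data in graph.edges(data=True)
            if u in nodes and v in nodes}
```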
VI Adaptive RAG
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
- Self-adaptive Multimodal Retrieval-Augmented Generation (SAM-RAG)
- MBA-RAG: A Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
- CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control
- SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
- RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering
- LangGraph implementation
- LlamaIndex implementation
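The common thread in these papers is routing: a lightweight judge estimates question complexity and picks a strategy, from no retrieval at all up to iterative multi-hop retrieval. A simplified sketch in the spirit of Adaptive-RAG, with hypothetical `llm` and `retrieve` helpers:

```python
def llm(prompt: str) -> str: ...              # hypothetical helpers
def retrieve(query: str) -> list[str]: ...

def adaptive_answer(question: str) -> str:
    label = llm(
        "Classify this question as SIMPLE (answerable from model knowledge), "
        "SINGLE (needs one retrieval step) or MULTI (needs multi-hop retrieval). "
        f"Answer with one word.\nQuestion: {question}"
    ).strip().upper()
    if label == "SIMPLE":
        return llm(question)                   # skip retrieval entirely
    context = retrieve(question)
    if label == "MULTI":
        # one extra hop: derive a follow-up query from the first round of context
        follow_up = llm(
            f"Given this context:\n{context}\n"
            f"What else must be looked up to answer: {question}?"
        )
        context += retrieve(follow_up)
    return llm(f"Answer using this context:\n{context}\n\nQuestion: {question}")
```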
VII Context Enrichment
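Context enrichment pads a retrieved chunk with its neighbors from the source document (sometimes called a sentence window or chunk window), because the chunk that matches best is often too short to be understood on its own. A minimal sketch, assuming chunks are kept in document order:

```python
def enrich(chunks: list[str], hit_index: int, window: int = 1) -> str:
    """Return the retrieved chunk padded with its neighbours in the source document."""
    lo = max(0, hit_index - window)
    hi = min(len(chunks), hit_index + window + 1)
    return " ".join(chunks[lo:hi])

# retrieval matched chunk 2; the LLM receives chunks 1-3 as one passage
chunks = ["Intro...", "Setup...", "The key result...", "Caveats...", "Outro..."]
print(enrich(chunks, hit_index=2))
```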
VIII Corrective RAG (aka CRAG)
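Corrective RAG grades what the retriever returned before generation: if the documents look correct they are used as-is, if they look wrong the system falls back to another source (typically web search), and if the verdict is ambiguous both are combined. A simplified sketch; `llm`, `retrieve`, and `web_search` are hypothetical stand-ins:

```python
def llm(prompt: str) -> str: ...
def retrieve(query: str) -> list[str]: ...
def web_search(query: str) -> list[str]: ...   # hypothetical fallback source

def corrective_retrieve(question: str) -> list[str]:
    docs = retrieve(question)
    grade = llm(
        "Grade the retrieved documents for this question as CORRECT, AMBIGUOUS "
        f"or INCORRECT.\nQuestion: {question}\nDocuments: {docs}"
    ).strip().upper()
    if grade == "CORRECT":
        return docs
    if grade == "INCORRECT":
        return web_search(question)          # discard retrieval, go external
    return docs + web_search(question)       # ambiguous: combine both sources
```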
IX Explainable RAG
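Explainable RAG makes the answer auditable: the model is forced to cite the chunks it used, and the cited sources are returned alongside the answer so a user can verify every claim. One simple way to sketch it (the bracket-citation convention is an assumption):

```python
def llm(prompt: str) -> str: ...   # hypothetical chat-completion helper

def answer_with_citations(question: str, chunks: dict[str, str]) -> dict:
    """Answer from the given chunks and return the sources actually cited."""
    numbered = "\n".join(f"[{cid}] {text}" for cid, text in chunks.items())
    answer = llm(
        "Answer the question using only the sources below. Cite the id of every "
        f"source you use in square brackets.\n\n{numbered}\n\nQuestion: {question}"
    )
    cited = {cid: text for cid, text in chunks.items() if f"[{cid}]" in answer}
    return {"answer": answer, "sources": cited}
```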
X Fusion retrieval
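Fusion retrieval runs several retrievers over the same corpus, typically lexical BM25 plus dense vectors, and merges the ranked lists. Reciprocal Rank Fusion (RRF) is a common merge rule: each document scores the sum of 1/(k + rank) over the lists it appears in. A self-contained sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with RRF; k=60 is the constant from the original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse a BM25 ranking with an embedding-similarity ranking
fused = reciprocal_rank_fusion([
    ["doc3", "doc1", "doc7"],   # BM25 order
    ["doc1", "doc9", "doc3"],   # dense-vector order
])
print(fused)   # doc1 and doc3 rise to the top: they appear in both lists
```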
XI Hierarchical RAG
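A common hierarchical pattern is small-to-big (parent-child) retrieval: embed small child chunks for matching precision, but hand the LLM the larger parent section they belong to for context. A sketch; `vector_search` is a hypothetical retriever returning child chunk ids:

```python
def vector_search(query: str, k: int = 5) -> list[str]: ...   # returns child chunk ids

def small_to_big(query: str,
                 child_to_parent: dict[str, str],
                 parents: dict[str, str]) -> list[str]:
    """Match on small chunks, return their larger parent sections for the LLM."""
    child_ids = vector_search(query)
    parent_ids = dict.fromkeys(child_to_parent[c] for c in child_ids)  # ordered dedupe
    return [parents[p] for p in parent_ids]
```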
XII Propositional chunking
Propositional chunking breaks text down into atomic units called propositions: self-contained expressions that each encapsulate a distinct fact or idea. Dense retrieval performance is significantly impacted by the choice of retrieval unit, and fine-grained units such as propositions have been shown to outperform passage-level units in retrieval tasks and to improve downstream QA.
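In practice the decomposition is done by an LLM at ingestion time; each proposition is then embedded and indexed as its own retrieval unit, usually with a pointer back to the source passage. A sketch; `llm` is a hypothetical helper and the JSON-list output format is an assumption:

```python
import json

def llm(prompt: str) -> str: ...   # hypothetical chat-completion helper

def propositions(passage: str) -> list[str]:
    """Decompose a passage into self-contained, atomic factual statements."""
    raw = llm(
        "Decompose the following text into simple, self-contained propositions. "
        "Resolve pronouns so each proposition stands alone. "
        f"Return a JSON list of strings.\n\n{passage}"
    )
    return json.loads(raw)
```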
XIII Query rewriting
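Raw user queries are often conversational, underspecified, or full of references to earlier turns; query rewriting turns them into standalone, retrieval-friendly queries before searching. A sketch with a hypothetical `llm` helper:

```python
def llm(prompt: str) -> str: ...   # hypothetical chat-completion helper

def rewrite(query: str, chat_history: list[str]) -> str:
    """Turn a conversational follow-up into a standalone search query."""
    return llm(
        "Rewrite the last user message as a standalone search query, resolving "
        "any references to the conversation.\n\n"
        f"Conversation: {chat_history}\nLast message: {query}"
    )
```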
XIV RAPTOR
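RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) clusters chunk embeddings, summarizes each cluster, and repeats on the summaries, producing a tree whose every level is indexed so retrieval can match at the right level of abstraction. The sketch below uses KMeans for brevity; the paper itself uses soft clustering with Gaussian mixtures, and `llm`/`embed` are hypothetical helpers:

```python
import numpy as np
from sklearn.cluster import KMeans

def llm(prompt: str) -> str: ...                 # hypothetical helpers
def embed(texts: list[str]) -> np.ndarray: ...   # shape (n_texts, dim)

def build_level(chunks: list[str], n_clusters: int) -> list[str]:
    """One RAPTOR layer: cluster the chunks, summarize each cluster."""
    labels = KMeans(n_clusters=n_clusters).fit_predict(embed(chunks))
    summaries = []
    for c in range(n_clusters):
        members = [chunk for chunk, lab in zip(chunks, labels) if lab == c]
        summaries.append(llm("Summarize:\n" + "\n".join(members)))
    return summaries   # feed back in to build the next, more abstract layer

# index every level together: raw chunks + layer-1 summaries + layer-2 summaries...
```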
XV Relevant segment extraction
Relevant Segment Extraction (RSE) is an optional (but strongly recommended) post-processing step that takes clusters of relevant chunks and intelligently combines them into longer sections of text called segments. These segments provide better context to the LLM than any individual chunk can.
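A simplified way to sketch RSE: score every chunk of a document for relevance (for example with a reranker), then merge contiguous runs of relevant chunks into segments. Real implementations optimize segment boundaries more carefully, but the threshold version below shows the idea:

```python
def extract_segments(scores: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Merge contiguous runs of relevant chunks into (start, end) segments."""
    segments, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                        # a relevant run begins
        elif score < threshold and start is not None:
            segments.append((start, i))      # the run ends, close the segment
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments

# per-chunk relevance scores in document order; each (start, end) slice of
# chunks is then concatenated into one segment and passed to the LLM
print(extract_segments([0.1, 0.8, 0.9, 0.2, 0.7]))   # [(1, 3), (4, 5)]
```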
XVI Reliable RAG
The “Reliable-RAG” method improves RAG by incorporating layers of validation and refinement that enhance the accuracy and relevance of retrieved information. It checks documents for relevance, guards against hallucination, and highlights the exact segments used to generate the final response.
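Chained together, the layers look roughly like this: an LLM judge filters retrieved documents for relevance, the answer is generated from the survivors, a groundedness check catches hallucinations, and the supporting segments are surfaced with the answer. A simplified sketch with hypothetical `llm` and `retrieve` helpers:

```python
def llm(prompt: str) -> str: ...
def retrieve(query: str) -> list[str]: ...

def reliable_answer(question: str) -> dict:
    # layer 1: keep only documents an LLM judge deems relevant
    docs = [d for d in retrieve(question)
            if llm(f"Is this relevant to '{question}'? yes/no:\n{d}").strip().lower() == "yes"]
    answer = llm(f"Answer from these documents only:\n{docs}\n\nQuestion: {question}")
    # layer 2: groundedness check to catch hallucinations
    grounded = llm(f"Is every claim in '{answer}' supported by these documents? "
                   f"yes/no:\n{docs}")
    if grounded.strip().lower() != "yes":
        answer = "No fully grounded answer could be produced."
    # layer 3: surface the exact segments that support the answer
    segments = [d for d in docs
                if llm(f"Does this support '{answer}'? yes/no:\n{d}").strip().lower() == "yes"]
    return {"answer": answer, "supporting_segments": segments}
```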