Legal RAG

Some insights from a legal AI project

The legal domain (in any language or culture) fits the RAG approach very well because the texts are fairly flat, i.e. with very few tables, graphics or formulas. As such, it has gained a lot of attention from tech people on the one hand and from a rather non-tech legal community in search of innovation on the other. From a tech perspective it’s the ideal knowledge base you can ingest without too much hassle. From the lawyer’s point of view it’s a way to speed up research and gather insights more quickly, thus giving firms a competitive advantage.

The Creyten AI project was developed in a span of around 8 months and used a mix of stable tools:

Nothing fancy. Maybe Atlas is somewhat unusual, but we could have taken Pinecone or Weaviate or any other vector DB really. Atlas proved sufficient, stable and affordable, considering that many vector database vendors at the time were in flux.
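To make the retrieval side concrete, here is a minimal sketch of the kind of similarity query Atlas Vector Search supports. The database, collection, index name and embedding field are assumptions for illustration, not the project’s actual schema.

```python
from pymongo import MongoClient
from openai import OpenAI

openai_client = OpenAI()
# Hypothetical database/collection holding the ingested legal chunks.
collection = MongoClient("mongodb+srv://...")["legal"]["chunks"]

def retrieve(question: str, k: int = 5) -> list[dict]:
    # Embed the question with the same embedding model assumed to be used at ingestion time.
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    # Atlas Vector Search runs as an aggregation stage against a pre-built search index.
    return list(collection.aggregate([
        {"$vectorSearch": {
            "index": "chunk_embeddings",   # assumed index name
            "path": "embedding",           # assumed field holding the vector
            "queryVector": vector,
            "numCandidates": 100,
            "limit": k,
        }},
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]))
```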

The biggest hurdles in this type of RAG project are related to ingestion:

In retrospect, the lesson learned here is to adopt a workflow engine early on rather than relying on notebooks and Python scripts. You inevitably need to experiment and combine things within notebooks, but as soon as something is stable it should be converted into repeatable workflows.
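As a rough illustration of what “repeatable” means here, a minimal Prefect flow wrapping the ingestion steps could look like the sketch below. The step functions are placeholders, not the project’s actual code.

```python
from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)
def convert_to_markdown(pdf_path: str) -> str:
    # Placeholder: call whatever PDF-to-markdown conversion the pipeline uses.
    return f"# {pdf_path}"

@task
def chunk_and_embed(markdown: str) -> list[dict]:
    # Placeholder: split into chunks and attach embedding vectors.
    return [{"text": markdown, "embedding": []}]

@task
def upsert_chunks(chunks: list[dict]) -> None:
    # Placeholder: write the chunks to the vector store.
    print(f"stored {len(chunks)} chunks")

@flow(log_prints=True)
def ingest(pdf_paths: list[str]):
    # Each document goes through the same retryable, observable steps.
    for path in pdf_paths:
        md = convert_to_markdown(path)
        chunks = chunk_and_embed(md)
        upsert_chunks(chunks)
```

The payoff is not the code itself but the retries, logging and run history you get for free once the steps live in a flow instead of a notebook cell.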

One thing I am proud of is that there was an emphasis on evaluation and metric-based comparison of models and tools, discarding gut feelings and personal preferences altogether. Just like workflows, this is an initial investment, but you get rewarded later on. The team also went into genetic algorithms because the parameter space quickly becomes too vast for grid search.
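To give an idea of what that looks like in practice, below is a toy genetic search over a few RAG parameters. The search space and the evaluate function are hypothetical stand-ins for the project’s real parameters and evaluation metric.

```python
import random

# Hypothetical search space over a handful of RAG parameters.
SPACE = {
    "chunk_size": [256, 512, 1024],
    "chunk_overlap": [0, 64, 128],
    "top_k": [3, 5, 10],
    "temperature": [0.0, 0.2, 0.5],
}

def evaluate(params: dict) -> float:
    # Stand-in for the real metric (e.g. answer accuracy on a labelled question set).
    return random.random()

def random_individual() -> dict:
    return {key: random.choice(values) for key, values in SPACE.items()}

def mutate(ind: dict, rate: float = 0.2) -> dict:
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v) for k, v in ind.items()}

def crossover(a: dict, b: dict) -> dict:
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def genetic_search(generations: int = 20, population_size: int = 12) -> dict:
    population = [random_individual() for _ in range(population_size)]
    for _ in range(generations):
        # Keep the fittest half, breed the rest from random parent pairs.
        parents = sorted(population, key=evaluate, reverse=True)[: population_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)

print(genetic_search())
```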

Since my core business is all about graphs and my personal interest is in knowledge graph extraction (see Knwl), you might wonder why we did not go there. Though I am convinced that graph RAG is better than classic RAG, I always keep my consulting advice as balanced as possible, with an understanding of cost and other factors:

We would have taken double the time to achieve things with graph RAG, and my view is supported by others in the field (just search a bit on YouTube). That said, we did venture somewhat into ontology and taxonomy. We assumed, incorrectly, that the end-user would benefit from selecting categories or refining chat interactions via segments. The reality is that users are lazy and chat has made things worse. They expect an answer with as little input as possible. To some extent, users overestimate AI’s ability to figure everything out. They don’t want to scroll and think, just get an answer and move on. On a higher level, AI makes people think even less than before. So the initial intention to extract and organize things within an ontology was abandoned. I witnessed the same in other consulting projects: the investment cost (into ontologies) is not worth it unless you are a big multinational with a very long-term vision. You need domain experts and taxonomists, you need time and money to make it happen, and I have seen many projects put on hold due to the costs involved.

The best LLM depends on the data at hand, and our metrics indicated OpenAI was the best choice. In fact, OpenAI was (at that point in time) also the best approach to convert PDFs to markdown. The batch processing is, however, where it gets a little annoying. Offline processing in batch is cost effective, but unless you have things in a workflow engine it’s a burden to manually keep track of what has been processed and refresh the view. If you do things with scripts and notebooks, this is where you wish you had things in Prefect, Airflow, Kestra or whatever platform.
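For reference, the submit-and-poll pattern of the OpenAI Batch API looks roughly like this; the file name, prompt and model are illustrative. The bookkeeping around which documents already went through a batch is exactly the part a workflow engine takes off your hands.

```python
import json
from openai import OpenAI

client = OpenAI()

# One JSONL line per document; custom_id lets you match results back to documents later.
requests = [
    {
        "custom_id": f"doc-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",  # illustrative model choice
            "messages": [{"role": "user", "content": f"Convert document {i} to markdown."}],
        },
    }
    for i in range(3)
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# Upload the request file and submit the batch (completes within 24 hours).
input_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Later (possibly much later): poll the batch and fetch the output file once it is done.
status = client.batches.retrieve(batch.id)
if status.status == "completed":
    results = client.files.content(status.output_file_id).text
```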

Once all is in place you discover the final challenge: scaling AI ain’t easy. Running a one-to-one chat is simple, but serving hundreds or thousands of users at the same time (preferably with streams) is another kettle of fish. A typical chat stream takes minutes, while FastAPI and web services in general are designed to return as quickly as possible. Things like WebSockets clash with load balancing, state management and whatnot. If you browse a bit (especially on LinkedIn) you will discover that many companies struggle with scale. There are for sure solutions, but the cost can be gigantic, and companies (correctly) don’t see a bot as business critical. So, the main insight here is that an AI project is not easily scalable, and if you wait until the end, you will discover that this aspect stretches your project schedule more than expected.
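To show where the tension sits, here is a minimal FastAPI endpoint streaming tokens back to the client; the token generator is a stand-in for the real LLM call. Each request like this holds a connection open for the full duration of the answer, which is exactly what load balancers, timeouts and worker pools are not tuned for.

```python
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def answer_stream(question: str):
    # Stand-in for the LLM call; a real answer can keep this generator alive for minutes.
    for token in ["The ", "court ", "held ", "that ", "..."]:
        yield token
        await asyncio.sleep(0.1)

@app.get("/chat")
async def chat(q: str) -> StreamingResponse:
    # The HTTP connection stays open until the generator is exhausted.
    return StreamingResponse(answer_stream(q), media_type="text/plain")
```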

Of course, there is a bit of a disclaimer here. By the time you read this, the AI world will have changed overnight. Google rolls out ready-to-use RAG solutions, and the Copilot vibe-coding flux means that a project like the one I describe here quickly becomes outdated. Technology is cheap, but like any art or craft, the wisdom you gain from hard work is priceless. Understanding and consulting advice can’t be gained from vibe coding or off-the-shelf solutions. In this sense, no matter how the giants roll out AI solutions, this project was invaluable for me as a consultant.