Grounded RAG for Technical Documentation

We built a RAG system over private technical documentation so people can ask normal questions instead of digging through menus, manuals, and internal pages. Things like “what changed in this version”, “where is this configured”, or “what does this object do”. It's still evolving, but the core behaviour is already useful: retrieve the right material, answer from that material, and cite where it came from.

The problem

The product has a lot of surface area. Documentation is spread across versions, sections, and a few different source types, and normal search only really works when you already know the right term. Most of the time people don't. They have a rough intent, not the exact keyword, and that gap is where the friction was.

What we wanted was something closer to a technical interface in plain language. Not a general chatbot. More like a system that can take a vague question, find the most relevant source material, and answer in a way that is still tied back to the docs.

How it works

At a high level, it is still standard RAG: chunk the documentation, embed it, retrieve the nearest matches for a question, and give those retrieved chunks to the model as context for the answer.
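
Since I can't share the actual code, here is a minimal sketch of that loop in Python. Everything in it (the Chunk fields, the function names, the generate hook) is illustrative, and it assumes you already have a vector for each chunk from whatever embedding model you use:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Chunk:
        doc_id: str
        version: str
        source_type: str    # e.g. "manual", "release-notes", "internal"
        text: str
        vector: np.ndarray  # precomputed embedding of `text`

    def top_k(question_vec: np.ndarray, chunks: list[Chunk], k: int = 5):
        # Score every chunk by cosine similarity to the question embedding.
        scored = []
        for c in chunks:
            sim = float(np.dot(question_vec, c.vector)
                        / (np.linalg.norm(question_vec) * np.linalg.norm(c.vector)))
            scored.append((sim, c))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    def answer(question: str, question_vec: np.ndarray,
               chunks: list[Chunk], generate):
        # `generate` stands in for whatever LLM call is in use; the prompt
        # pins the model to the retrieved context and asks for citations.
        hits = top_k(question_vec, chunks)
        context = "\n\n".join(f"[{c.doc_id} v{c.version}]\n{c.text}"
                              for _, c in hits)
        prompt = ("Answer only from the sources below, and cite them by id.\n\n"
                  f"{context}\n\nQuestion: {question}")
        return generate(prompt)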

What made it useful was getting the details right. We index documentation by version, pull in more than one source type, filter out weak matches, and rerank results before generation so the best chunks are more likely to end up in front of the model. The answer is supposed to come from retrieved context, with citations, not from the model filling gaps on its own.
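
Layered on the toy loop above, that retrieval step looks roughly like the sketch below. The threshold, the pool size, and the rerank_score hook are all placeholders, not our production values or models:

    MIN_SIM = 0.35  # illustrative cut-off, not a tuned production value

    def retrieve(question: str, question_vec, chunks, version, rerank_score,
                 k: int = 5, pool: int = 25):
        # 1. Hard filter on version metadata so answers never mix releases.
        in_version = [c for c in chunks if c.version == version]
        # 2. Vector search over a wider pool than we finally need.
        candidates = top_k(question_vec, in_version, k=pool)
        # 3. Drop weak matches instead of letting them pad the context.
        candidates = [(s, c) for s, c in candidates if s >= MIN_SIM]
        # 4. Rerank the survivors with a stronger, slower relevance model.
        candidates.sort(key=lambda sc: rerank_score(question, sc[1].text),
                        reverse=True)
        return [c for _, c in candidates[:k]]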

We also ended up building two paths through the system: a fast retrieval path for normal use, and a deeper evaluation path where we can decompose a question, grade retrieved chunks, retry retrieval, and verify whether the final answer is actually supported by the sources.
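
The deeper path is easiest to show as a control loop. Again, this is only the shape of it; decompose, grade, and supports stand in for model-backed checks I can't publish:

    def deep_answer(question, decompose, retrieve, grade, generate, supports,
                    max_retries: int = 2):
        # Break a vague question into focused sub-questions.
        context = []
        for sub_q in decompose(question):
            for attempt in range(max_retries + 1):
                # `attempt` lets the retriever widen or reformulate the search.
                hits = retrieve(sub_q, attempt=attempt)
                # Keep only chunks a grader judges relevant to the sub-question.
                good = [h for h in hits if grade(sub_q, h)]
                if good:
                    context.extend(good)
                    break
        draft = generate(question, context)
        # Final check: is the draft actually supported by the retrieved sources?
        return draft if supports(draft, context) else None  # caller escalates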

What we learned in practice

The model was not the hard part. The hard part was everything around it.

Keeping the index aligned with changing source material matters more than most of the prompt work. If the retrieval layer is stale, incomplete, or slightly off, the final answer still looks confident while quietly getting worse, and that failure mode is hard to notice quickly. A lot of the real engineering ended up being ingestion, chunking, routing, and evaluation rather than “AI” in the flashy sense.
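
One cheap trick that helped here is generic enough to share: hash each source document at ingestion time and only re-chunk and re-embed when the hash changes. Staleness then shows up as a visible diff rather than a slow drift in answer quality. The dict store below is just for illustration:

    import hashlib

    def needs_reindex(doc_id: str, doc_text: str, indexed_hashes: dict) -> bool:
        # Re-chunk and re-embed a document only when its content changed.
        digest = hashlib.sha256(doc_text.encode("utf-8")).hexdigest()
        if indexed_hashes.get(doc_id) == digest:
            return False
        indexed_hashes[doc_id] = digest
        return True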

The other lesson was that retrieval quality is rarely one problem. It is usually several small ones stacked together: the wrong chunk boundaries, terms users ask for not matching the wording in the docs, one source type dominating another, or a decent result appearing just low enough in the ranking to miss the context window.
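
The source-dominance problem in particular has a simple mitigation worth showing: cap how many chunks any one source type can contribute before the context is assembled. The cap value here is illustrative:

    from collections import defaultdict

    def diversify(ranked_chunks, per_source_cap: int = 3, k: int = 5):
        # Walk the ranking in order, but cap each source type so, say,
        # release notes cannot crowd the reference docs out of the context.
        taken = defaultdict(int)
        out = []
        for chunk in ranked_chunks:
            if taken[chunk.source_type] < per_source_cap:
                out.append(chunk)
                taken[chunk.source_type] += 1
            if len(out) == k:
                break
        return out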

Where it is now

The system is working well for grounded question answering over the core documentation. It is especially good when the question can be answered directly from retrieved material and shown with clear citations.

What is still being improved is the part people casually call “conversation”. Follow-up questions across turns are harder than first-pass answers, and retrieval ranking still has edge cases. We are also spending time on evaluation and operator feedback so we can see where the system actually helps and where it still falls short.

This is a company project, so the real code and internal screenshots stay private; the sketches above are generic stand-ins. What I can say is that it has already changed how people look for information. Once there is a natural-language layer on top of the docs, the old interface starts to feel far more rigid than you realised it was.