
Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) is a method for building AI systems that retrieve relevant documents, records, or data from a controlled knowledge base before generating a response, so that outputs are grounded in specific, current, authoritative sources rather than relying solely on what the model learned during training.

Why RAG exists: the hallucination problem

Large language models are trained on vast text corpora and can generate fluent, plausible-sounding responses to almost any question. The problem is that their knowledge is frozen at a training cutoff, they have no access to proprietary organisational data, and they sometimes generate confident-sounding responses that are factually wrong, a behaviour known as hallucination.

For enterprise use cases, such as answering questions about internal policies, processing documents from specific suppliers, or generating reports grounded in actual operational data, hallucination is not a minor inconvenience: it is a production blocker. RAG addresses this directly. Rather than asking the model to generate from memory, the system first retrieves the relevant source material and then generates a response grounded in that material, with citations. This does not eliminate hallucination entirely, but it supplies the model with the facts it needs and lets a reader check each claim against its source.
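The grounding step can be sketched in Python as prompt assembly, assuming the relevant passages have already been retrieved. `Passage` and `build_grounded_prompt` are hypothetical names, the instruction wording is illustrative rather than a tested prompt, and the example policy text is made up. The resulting string would be sent to whichever language model the system uses; that call is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source_id: str  # e.g. a document path or record ID (illustrative)
    text: str


def build_grounded_prompt(question: str, passages: list[Passage]) -> str:
    """Ask the model to answer only from the numbered passages and to cite them."""
    numbered = "\n".join(
        f"[{i}] ({p.source_id}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite each claim with its source number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved passages, for illustration only.
passages = [
    Passage("policies/travel.pdf", "Economy class is required for flights under 6 hours."),
    Passage("policies/expenses.pdf", "Receipts are required for claims over 25 GBP."),
]
prompt = build_grounded_prompt("What class must I book for a 4-hour flight?", passages)
print(prompt)
```

Numbering the sources in the prompt is what makes citation possible: the model can refer to `[1]`, and the application can map that number back to `policies/travel.pdf` when displaying the answer.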

How RAG works in an enterprise context

In production, RAG involves three steps. First, organisational knowledge sources such as SharePoint libraries, PDF documents, database records, and CRM notes are split into passages and indexed into a retrieval system, typically by converting each passage into a vector embedding. Second, when a user submits a query, the system embeds the query the same way and retrieves the passages whose embeddings are most similar to it. Third, the language model generates a response based on those passages, citing the sources it used.
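The indexing and retrieval steps can be sketched with the Python standard library alone. Real systems use a neural embedding model and a vector database; this sketch substitutes bag-of-words term counts and an in-memory list so it runs anywhere. `embed`, `chunk`, `add_document`, and `retrieve` are hypothetical names, and the source paths and document text are invented for illustration. The generation step (passing the top passages and their sources to the model) is not shown.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split a document into fixed-size word windows (a simple chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# The index holds one (source, passage, vector) entry per chunk.
index: list[tuple[str, str, Counter]] = []


def add_document(source: str, text: str) -> None:
    """Indexing: chunk the document and store each chunk with its vector and source."""
    for passage in chunk(text):
        index.append((source, passage, embed(passage)))


def retrieve(query: str, k: int = 3) -> list[tuple[str, str, float]]:
    """Retrieval: embed the query and return the k most similar passages with sources."""
    q = embed(query)
    scored = [(src, passage, cosine(q, vec)) for src, passage, vec in index]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


add_document("sharepoint/hr/leave-policy.docx", "Employees receive 25 days of annual leave per year.")
add_document("crm/notes/acme.txt", "Acme renewal call scheduled; customer asked about pricing tiers.")

for source, passage, score in retrieve("How many days of annual leave do employees get?", k=1):
    print(f"{score:.2f} {source}: {passage}")
```

Keeping the source path next to every passage is a deliberate choice: it is what lets the final answer cite where each fact came from.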

The key enterprise requirements are access control (users should only retrieve from sources they are authorised to see), freshness (the index should stay current as source documents update), and citation (responses should reference the source so the output can be verified).
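Access control and citation can be sketched together: each indexed chunk carries the permissions copied from its source system, and retrieval filters on the user's groups before ranking, so restricted text never reaches the model. This is a minimal sketch under stated assumptions; `Chunk`, `score`, and `retrieve_for_user` are hypothetical names, group-based permissions are an assumed model, and the word-overlap score stands in for vector similarity.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str
    text: str
    allowed_groups: frozenset[str]  # assumed to be copied from the source system's ACL at index time


def score(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_for_user(query: str, user_groups: set[str], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    # Enforce access control before ranking: drop chunks the user may not see.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda c: score(query, c.text), reverse=True)
    # Keep only chunks with some relevance; each still carries its source for citation.
    return [c for c in ranked[:k] if score(query, c.text) > 0]


# Hypothetical chunks and groups, for illustration only.
chunks = [
    Chunk("hr/salary-bands.xlsx", "salary bands for engineering grades", frozenset({"hr"})),
    Chunk("handbook/benefits.pdf", "salary is paid monthly on the last working day", frozenset({"all-staff"})),
]
results = retrieve_for_user("when is salary paid", {"all-staff"}, chunks)
for c in results:
    print(f"{c.text} [source: {c.source}]")
```

Filtering before ranking, rather than after generation, matters: if a restricted passage were retrieved and only hidden afterwards, its content could still leak into the model's answer.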

RAG versus fine-tuning

A common question is whether to use RAG or to fine-tune a model on organisational data. Fine-tuning bakes knowledge into the model weights: it improves the model's general behaviour in a domain but does not give it access to specific, current documents. RAG retrieves from a live knowledge base: it can reference a document updated this morning, as soon as the updated version has been re-indexed.
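The practical difference shows up when a document changes. With RAG, updating knowledge means replacing an entry in the index; the model weights are untouched. The Python sketch below assumes documents are keyed by ID and carry a last-modified timestamp from the source system; `DocumentIndex`, `upsert`, and `search` are hypothetical names, substring search stands in for vector retrieval, and the pricing text and dates are invented. A real sync would also re-chunk and re-embed the new text.

```python
from datetime import datetime, timezone


class DocumentIndex:
    """In-memory index keyed by document ID; a stand-in for a vector store."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[str, datetime]] = {}

    def upsert(self, doc_id: str, text: str, modified: datetime) -> bool:
        """Insert or replace a document if the incoming copy is newer. Returns True if the index changed."""
        current = self._docs.get(doc_id)
        if current is not None and current[1] >= modified:
            return False  # the index already holds this version or a newer one
        self._docs[doc_id] = (text, modified)
        return True

    def search(self, term: str) -> list[tuple[str, str]]:
        """Return (doc_id, text) pairs whose text contains the term, case-insensitively."""
        return [(doc_id, text) for doc_id, (text, _) in self._docs.items() if term.lower() in text.lower()]


index = DocumentIndex()
index.upsert("pricing.md", "Standard plan: 40 EUR per seat.", datetime(2024, 1, 1, tzinfo=timezone.utc))

# The source document is edited; re-syncing replaces the old text with no retraining.
changed = index.upsert("pricing.md", "Standard plan: 45 EUR per seat.", datetime(2024, 6, 1, 8, 0, tzinfo=timezone.utc))
print(changed, index.search("standard plan"))
```

Comparing timestamps makes the sync idempotent: replaying an older event (for example, a delayed notification) does not overwrite newer content.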

For most enterprise use cases involving proprietary knowledge (policies, contracts, product data, customer records), RAG is the appropriate architecture. Fine-tuning is more relevant for changing the model's style, tone, or task-specific capabilities — not for grounding it in organisational content.

Next step

See how GenOS puts this into production for enterprise teams.
