The rapid adoption of AI tools has brought about AI fragmentation, where the sheer variety of tools and their inconsistent implementations lead to challenges in quality and accuracy. One approach to addressing this issue is Retrieval Augmented Generation (RAG), which grounds AI models in reliable data sources to reduce the risk of hallucinations and improve trustworthiness.
However, while RAG offers a promising step forward, it introduces its own complexities, particularly in ensuring that responses are accurate, complete, and well-sequenced.
Does RAG actually solve hallucinations?
Because foundation AI models carry a risk of “hallucinations”, a common strategy is to “ground” the model using supplied data (the uploaded files and web links in the examples above). This approach, called “Retrieval Augmented Generation” (RAG), provides transparency about how the AI arrived at its results by including references.
By providing this level of transparency, RAG helps users build trust in AI responses: the answers not only look and sound credible but also include reference links back to the source. However, this trust can be misplaced. Just because an answer sounds credible and links to a source doesn’t mean it is the best possible (most accurate) answer to the question at hand.
Another key issue is the sequencing and completeness of the retrieved results. Most basic RAG implementations use a technique called semantic search: the AI model encodes the question into a vector, which is then used to find blocks of text with a similar meaning via a nearest neighbour search. This powerful technique moves well beyond traditional keyword or fuzzy matching, enabling use cases such as multi-lingual search across the same set of content. However, it doesn’t provide any guidance on how the search results relate to each other, which can cause key steps in “How do I…” questions to be missed or returned out of order.
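To make this concrete, here is a minimal sketch of semantic search with a nearest neighbour lookup. The embedding model, sample chunks and question are illustrative assumptions, not part of any product or benchmark mentioned in this article:

```python
# Minimal semantic-search sketch using the sentence-transformers library.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# A tiny "knowledge base" of text chunks (invented for the example).
chunks = [
    "Step 3: Submit the completed claim form to the approvals queue.",
    "Step 1: Download and fill in the claim form.",
    "Step 2: Attach receipts as supporting evidence.",
    "Annual leave requests are handled by a separate process.",
]

# Encode the question and the chunks into vectors.
question_vec = model.encode(["How do I submit an expense claim?"])[0]
chunk_vecs = model.encode(chunks)

# Nearest-neighbour search by cosine similarity.
sims = chunk_vecs @ question_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(question_vec)
)
for idx in np.argsort(-sims):
    print(f"{sims[idx]:.2f}  {chunks[idx]}")

# Note: results come back ranked by similarity alone -- nothing here
# guarantees that steps 1, 2 and 3 are returned together or in order.
```

As the final comment notes, the ranking reflects similarity to the question only, which is exactly why “How do I…” answers can arrive incomplete or out of sequence.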
Working with tables
Another area where AI question-answering accuracy varies greatly is when structured information is deeply embedded in documents or content – for example, when answers depend on tables or figures in a complex report.
The ability of the AI to interpret the information in a table depends on how the data is passed to the AI. Broadly, there are three options:

- Pass the page (or table) as an image to a multimodal or vision-capable model and let it read the table directly.
- Pass the raw OCR text of the page and let the model infer the table structure from the text.
- Pre-extract the table into structured data using an Intelligent Document Processing (IDP) step, then pass the structured result to the model.
Again, the technique used and the capabilities of the underlying model to accurately identify and extract the needed data have a massive impact on results. Numerous benchmarks test different GenAI multimodal and vision models for OCR and table extraction accuracy, with scores ranging from 42% to 91% depending on the model used.
Similarly, using OCR results can confuse the AI if the layout and semantics of the document or table are lost. Consider a page formatted with columns of text: OCR results by themselves will provide a line of text read across all the columns on the page, creating “sentences” made from words in different paragraphs. For tables, this can result in fragmented or clipped sets of data, with boundaries between cells, rows, or columns in the table being lost.
In comparison, pre-extracting tables (and, if needed, validating with human-in-the-loop) using the latest computer vision-powered IDP models typically achieves 97%+ extraction accuracy for table data. Likewise, computer vision can interpret document layouts to provide coherent “chunks” of text to LLMs, regardless of paragraph, section or page boundaries. This approach provides content to the AI structured in a way that is easy for it to consume, ensuring the visual layout of tables and documents is preserved.
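As an illustration, the sketch below shows how a pre-extracted table (represented here as plain Python data; the headers and values are invented for the example) might be serialised into a markdown chunk so that cell, row and column boundaries survive into the RAG context:

```python
# Illustrative sketch: once a table has been extracted upstream (e.g. by an
# IDP / computer-vision step), serialise it so the LLM sees explicit row and
# column boundaries rather than raw, line-by-line OCR text.

def table_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Render an extracted table as a markdown table the model can parse."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)

# Invented example data standing in for the output of an extraction step.
header = ["Region", "Q1 revenue", "Q2 revenue"]
rows = [
    ["EMEA", "1.2m", "1.4m"],
    ["APAC", "0.9m", "1.1m"],
]

chunk = table_to_markdown(header, rows)
print(chunk)
# This chunk can now be placed into the RAG context: cell, row and column
# boundaries are preserved, unlike a raw OCR reading across the page.
```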
Again, for consistency and accuracy, the RAG process needs to both correctly locate the most appropriate table to use *and* provide the results in a format that preserves the original table’s layout and intent.
Can’t the AI just figure it out?
Another consideration is how much content or data is provided to the AI to work with.
With growing “context windows”, Large Language Models can accept more content or data to work with. However, bigger isn’t always better, as this article from Chroma Research explores. A phenomenon called “Context Rot” means we should be cautious about feeding ever larger sets of inputs to an LLM: the AI typically gets much better results from a smaller, curated set of data. Given too much input, it can become “lost in the middle” as it tries to process everything provided.
RAG provides one way to address this, since it gathers curated results from the Knowledge Base search. However, it also creates a dependency on the Knowledge Base to return the right results; otherwise, the whole downstream agent process is built on sand.
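The sketch below illustrates one common way to curate retrieved results before prompting. The cut-off values, scoring and prompt wording are illustrative assumptions rather than recommendations from this article:

```python
# Hedged sketch: curate retrieved chunks before prompting, instead of
# stuffing everything into a large context window.

def curate_context(scored_chunks: list[tuple[float, str]],
                   k: int = 5,
                   min_score: float = 0.5,
                   max_chars: int = 4000) -> list[str]:
    """Keep only the best few, sufficiently relevant chunks within a size budget."""
    selected: list[str] = []
    used = 0
    for score, chunk in sorted(scored_chunks, reverse=True):
        if len(selected) >= k or score < min_score:
            break
        if used + len(chunk) > max_chars:
            continue  # skip chunks that would blow the budget
        selected.append(chunk)
        used += len(chunk)
    return selected

# Example: pair each retrieved chunk with its similarity score, then build
# the prompt from the curated subset only.
results = [
    (0.82, "Step 1: Download and fill in the claim form."),
    (0.79, "Step 2: Attach receipts as supporting evidence."),
    (0.31, "Unrelated policy text."),
]
context = "\n\n".join(curate_context(results))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: ..."
```

Of course, this only helps if the retrieval step surfaced the right chunks in the first place, which is the dependency described above.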
Going beyond basic RAG for AI accuracy
As context windows grow, more input does not guarantee better answers; the dependency shifts to the Knowledge Base. If retrieval falls short, reference links can mask gaps, steps can be missed or returned out of order, and tables passed as images or noisy OCR can trip up models. Accuracy rests on curated knowledge, precise retrieval, and complete, well-sequenced results, not just the model.
A practical path is a centrally governed Knowledge Base paired with sequence- and layout-aware retrieval, structured extraction of tabular data via IDP with human-in-the-loop validation, and continuous accuracy measurement. With TotalAgility, organisations can centralise and curate content, standardise retrieval patterns (including multi-step search and “graph” relationships between entities or concepts), extract and validate tables, and monitor outcomes while remaining model-agnostic, turning experimentation into reliable, auditable answers.