As enterprises embrace AI at an unprecedented pace, the problem of AI fragmentation continues to pose significant challenges. The inconsistent implementation of tools, coupled with siloed approaches to data, undermines AI accuracy and creates risks for scalability and trustworthiness.
To overcome these challenges, organizations must shift their focus toward building a unified foundation for AI through centralized governance, robust knowledge management, and systematic accuracy measurement. By addressing fragmentation at its core, businesses can unlock the full potential of AI while ensuring consistent, reliable outcomes.
Mitigating the risks of unreliable AI accuracy
Many factors affect the accuracy of an AI agent’s answers, including its implementation, the underlying technology, and the techniques used. Yet few of the enterprises I’ve spoken with have solved the challenge of mitigating these risks.
In a recent webinar about creating AI agents, I asked attendees how they measured the accuracy of AI responses or the impact of different prompt wording. Eleven percent of respondents selected “blind trust in AI”, though such faith in the AI’s ability to provide an optimal response is misplaced.
At Tungsten, we recently undertook some testing of how different combinations of AI models, RAG approaches, and ways of structuring datasets perform by benchmarking an agent’s understanding of complex software product concepts through certification exams. We also took the opportunity to test some popular search agents in the market that have access to the entire internet to find answers to questions.
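For illustration, here is a minimal sketch of the kind of harness such certification-exam benchmarking can use. The question format and the `ask_agent` callable are illustrative placeholders, not our actual exam or pipelines:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExamQuestion:
    prompt: str
    choices: dict   # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    answer: str     # key of the correct choice, e.g. "B"

def score_agent(exam: list, ask_agent: Callable) -> float:
    """Run every exam question through the agent and return its accuracy."""
    correct = sum(1 for q in exam if ask_agent(q) == q.answer)
    return correct / len(exam)

# Running the same exam against each model/RAG/dataset combination gives
# directly comparable accuracy figures, e.g.:
# print(score_agent(exam, rag_pipeline_a), score_agent(exam, rag_pipeline_b))
```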
The variation in results was huge: I recorded over a 30% difference in accuracy across different agents and knowledge retrieval approaches for the same questions using the same data sets, with some big-name vendors falling far below the expectations set by their marketing!
I think the reason they get away with such bold claims is that responses that look credible are often accepted as the truth – even if they are only partially correct or, worse, complete hallucinations.
So how can we start to systematically address the challenges of a 30%+ variation in AI accuracy, without “throwing the baby out with the bathwater” and completely missing the benefits that AI can offer?
Create an AI centre of excellence
A great first step is to create a centre of excellence (CoE), steering committee, or task force not just to drive AI adoption across the enterprise, but also to define policies that ensure the use of AI supports the company’s goals and regulatory compliance. This means:
- Detail approved technologies that have passed IT, legal and security due diligence
- Align the types of available AI services to the needs of different user populations
- Support user access to AI while meeting the company’s governance requirements
- Ensure best practices are captured and standardised
- Establish clear guidance on when AI can, should not, or must not be used
- Encourage cross-functional collaboration
- Foster and share AI talent
- Define AI technology, data and reporting standards
- Monitor adoption, benchmarking AI performance and tracking success/failures
Measure, measure, and measure again
Once the CoE has been established, a key task is to test the effectiveness of different AI technologies and approaches to meet the company’s objectives.
With the pace of AI innovation accelerating, there is a continual flow of new models, vendors and design patterns competing for attention.
Ongoing monitoring of AI performance, using a combination of automated testing, benchmarking and human-in-the-loop (HITL) sampling of responses, is key to establishing a baseline against which improvements from new technology or approaches can be measured.
This provides a framework both to identify phenomena such as model drift – where the accuracy of predictions produced from new inputs “drifts” from the performance seen during training – and to test the potential benefits of newer LLMs or different RAG techniques.
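As a sketch of what that monitoring can look like in practice – assuming a fixed regression exam like the harness above, with illustrative thresholds and the alerting/review helpers stubbed out:

```python
import random

BASELINE_ACCURACY = 0.87   # illustrative: measured when the agent was approved
DRIFT_THRESHOLD = 0.05     # illustrative: alert on a 5-point drop

def alert(message: str) -> None:
    print("ALERT:", message)       # stub: wire to your alerting channel

def queue_for_human_review(question, answer) -> None:
    pass                           # stub: push to a HITL review queue

def nightly_check(exam, ask_agent, hitl_sample_rate=0.02) -> float:
    """Re-run a fixed regression exam, flag drift, and sample answers for review."""
    results = [(q, ask_agent(q)) for q in exam]
    accuracy = sum(1 for q, a in results if a == q.answer) / len(results)

    if BASELINE_ACCURACY - accuracy > DRIFT_THRESHOLD:
        alert(f"Possible model drift: accuracy {accuracy:.1%} "
              f"vs baseline {BASELINE_ACCURACY:.1%}")

    # Human-in-the-loop: a small random sample goes to reviewers regardless
    # of the score, to catch failure modes the regression set misses.
    for q, a in random.sample(results, max(1, int(len(results) * hitl_sample_rate))):
        queue_for_human_review(q, a)

    return accuracy
```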
While they require additional configuration and setup, techniques like result reranking, semantic or layout-aware chunking, prompt rewriting, multi-step RAG, tuning the number of results returned, or secondary searches that expand the result set (e.g., following graph connections between chunks) can dramatically improve RAG accuracy for question-answering. Identifying what works best for your company’s use cases is essential to avoid every department reinventing the wheel.
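For illustration, here is a minimal sketch of one of those techniques – reranking an over-fetched semantic search result set with a cross-encoder before it reaches the LLM. It assumes the sentence-transformers library; the model name is one public example, and `vector_search` stands in for your existing retriever:

```python
from sentence_transformers import CrossEncoder

# Public example model; swap for whatever your due diligence approves.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve_and_rerank(query, vector_search, fetch_k=50, keep_k=5):
    """Over-fetch from the vector store, then let a cross-encoder pick the best."""
    candidates = vector_search(query, top_k=fetch_k)   # your existing retriever
    scores = reranker.predict([(query, chunk) for chunk in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:keep_k]]
```

Because the cross-encoder scores the query and each passage together, it can demote superficially similar chunks that don’t actually answer the question – something a plain embedding search often misses.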
Based on our tests, it is possible to improve the base “semantic search” RAG accuracy for question-answering by more than 20% simply by tweaking some of the parameters described above. Some of these are essentially “quick fixes,” such as switching to a best-practice knowledge base template for agents instead of relying on hand-crafted RAG and search prompts. Leveraging an automation platform that encodes that best practice should also save significant implementation time and cost compared to a home-grown solution built with code and open-source tools for chunking and related tasks.
Consolidate
An emerging strategy tackles two problems at once: the lack of suitable content and data for AI projects, and the limits consumer AI tools place on how much supporting data can be provided (either for RAG or directly in the context window). The answer is to provide ready-made agents that are optimised for knowledge discovery or retrieval.
For example, many AI chat and agent use cases rely on access to company data, operating procedures or policies:
- Product support agents – answering technical or usage questions based on documentation and support knowledge bases.
- Pricing and licensing bots – providing accurate quotes, terms, and regional variations based on internal pricing policies.
- Sales enablement assistants – generating outreach emails or RFP responses using curated product and compliance information.
- Legal and contract analysis agents – reviewing clauses, identifying risks, and summarising terms using approved legal templates and guidelines.
- Demo coaches – guiding sales engineers through product capabilities and competitive positioning during live demos.
- InfoSec and compliance chatbots – responding to security questionnaires or audits using validated internal documentation.
- Internal helpdesk agents – assisting employees with HR, IT, or operational queries using company policies and procedures.
- Training and onboarding assistants – helping new hires understand tools, workflows, and company culture using curated onboarding materials.
Rather than relying on each citizen developer to find the right base documents and data – and leaving them responsible for keeping that information up to date and meeting quality standards – a single enterprise knowledge base can be provided that is easy to include in citizen-developed agents, while the tasks of maintaining, updating, and validating responses are managed centrally.
Architecting for simplicity and reuse
To truly overcome AI fragmentation, enterprises must move beyond tactical fixes and embrace architectural thinking. This means providing access to centralised knowledge bases optimised for accurate retrieval and proactively managing them to stay up to date – making it easy for citizen developers to create agents that have a reliable source of knowledge.
Expecting citizen developers to understand and connect to APIs creates a barrier to adoption that will hold back AI transformation. Luckily, a new standard is emerging that provides a unified framework for how AI agents access tools, data, and knowledge – one that supports both current needs and future growth.
Model Context Protocol (MCP) provides a standardised way for agents to access tools, knowledge bases, APIs, other agents and enterprise systems in a consistent, well-governed way. In effect, it creates a service-oriented architecture for agents, letting them discover and use services without exposing and managing the complexities of wiring together point-to-point APIs.
When combined with a centrally managed, consolidated, best practice knowledge base, this isn’t just about technical plumbing; it’s about creating agent memory, where every AI interaction is grounded in the same trusted source of truth.
By consolidating and curating AI-ready content in a knowledge base that embeds best-practice chunking, graphing and retrieval approaches, and then exposing it via MCP, we ensure every agent – whether embedded in Teams, Copilot, GPTs, or any other third-party AI chat environment – delivers consistent, accurate, and auditable responses.
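As a sketch of what that exposure can look like, assuming the official `mcp` Python SDK’s FastMCP interface – `search_kb` is a stub standing in for the curated retrieval pipeline described above:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("enterprise-knowledge")

def search_kb(query: str, top_k: int) -> list[str]:
    return []   # stub: call the centrally managed retrieval pipeline

@mcp.tool()
def search_knowledge_base(query: str, top_k: int = 5) -> list[str]:
    """Return the most relevant curated, pre-chunked passages for a query."""
    return search_kb(query, top_k)

if __name__ == "__main__":
    mcp.run()   # serve the tool to any MCP-capable client
```

Any MCP-capable client or agent platform can then discover and call this tool, so citizen developers get the curated knowledge base without wiring up point-to-point APIs.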
Curate for quality, not quantity
This insight has profound implications for enterprise AI strategy. Rather than chasing bigger context windows or brute-force RAG implementations, we should focus on content quality, structure, and relevance. Investing in intelligent document processing (IDP), layout-aware chunking, and multi-step RAG techniques to provide optimised knowledge discovery can yield dramatic improvements in accuracy – often with minimal additional cost. Combined with well-governed and managed knowledge bases – where content is kept up to date and responses are continually monitored to meet quality and accuracy standards – this approach minimises risks while maximising the benefits of AI.
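To illustrate the chunking point, here is a simplified sketch of layout-aware chunking that splits on document structure (markdown headings here, for brevity) rather than fixed-size windows; real IDP pipelines derive the same structure from PDF or Office layouts:

```python
import re

def chunk_by_headings(markdown_text: str) -> list[dict]:
    """Split a document into heading-scoped chunks instead of fixed windows."""
    chunks, current = [], {"heading": "", "body": []}
    for line in markdown_text.splitlines():
        if re.match(r"^#{1,6}\s", line):          # a heading starts a new chunk
            if current["body"]:
                chunks.append(current)
            current = {"heading": line.lstrip("# ").strip(), "body": []}
        else:
            current["body"].append(line)
    if current["body"]:
        chunks.append(current)
    # Each chunk keeps its heading as context for embedding and retrieval,
    # so retrieved passages stay self-contained units of meaning.
    return [{"heading": c["heading"], "text": "\n".join(c["body"]).strip()}
            for c in chunks]
```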
Longer term, this means writing documents with AI reuse in mind – the clearer your contracts, policies, product descriptions or documentation are, the easier they will be for AI Agents to leverage.
In our tests, switching to a best-practice knowledge base template improved accuracy by over 20%, and using advanced retrieval techniques with careful LLM selection pushed performance even higher. These aren’t theoretical gains – they’re real, measurable improvements that directly impact automation success rates, customer satisfaction, and trust in AI.
Build the business case for unified knowledge
AI fragmentation isn’t just a technical issue – it’s a strategic risk. Every time an agent gives a different answer to the same question, it erodes trust, introduces liability, and undermines the promise of AI.
The cost of inconsistent AI responses isn’t just reputational – it’s operational. A 15% drop in automation accuracy can mean thousands of hours of manual rework, lower CSAT scores, and reduced adoption of AI tools across the enterprise. Worse, it can lead to a proliferation of “silver bullet” workarounds that become production dependencies, putting the business at further risk.
That’s why it’s essential to factor AI measurement and accuracy into the business case. By investing in a unified knowledge platform – one that supports curated content, advanced retrieval, and consistent governance – we can unlock the full value of AI while mitigating its risks.
Building AI trust on a unified foundation
Trust in AI depends on reducing fragmentation and improving accuracy. The way forward is a single, governed knowledge base, standardised access to tools and data (for example via MCP), and continuous measurement to preserve completeness, sequencing, and explainability.