3 Vital Insights for Fraud Heads & CTOs on LLMs in AI Onboarding

May 29

By SprintHive | May 2026

Every financial institution in South Africa right now is either using LLMs in its digital onboarding stack, evaluating them, or being sold them. The universal answer to "Can you teach an LLM to understand bank statements?" is yes. What that “yes” means in practice varies enormously, and for banks, lenders, and telcos, the gap between a confident “yes” and a production-grade “yes” is where the risk lives. The technology stack sitting upstream of your fraud controls is not a procurement decision; it is a risk decision. And in a market where every service provider will tell you they use LLMs for digital customer onboarding, the only question that protects you is the one that asks how.

AI-generated payslips, deepfake onboarding attempts, and synthetic identities constructed from real data are now production-level threats. The KPIs that define a Head of Fraud's performance need to reflect that reality: prevention ROI alongside detection rates, legitimate customer impact alongside fraud loss, and document processing accuracy as a first-order input to fraud signal quality.

SprintHive's 2026 white paper, Can You Teach AI to Understand Bank Statements?, tested every major frontier AI model on real South African bank statements from production environments. The findings produced three insights that every Head of Fraud and CTO should understand before signing off on any AI-powered onboarding infrastructure.

Insight one: Hallucination is structural, not incidental. LLMs do not retrieve verified facts. They predict statistically likely outputs. When a bank statement is clean and representative, prediction and accuracy look the same. When it is a degraded mobile photograph, a modified PDF, or a layout the model has not seen before, the model does not flag uncertainty; it generates a confident wrong answer. In the white paper, every frontier model tested made the same transaction misclassification error on a real South African bank statement, and none of them signalled that the answer might be wrong.

While OpenAI confirmed in 2025 that hallucination is a permanent property of transformer models, the weakness is not the AI itself, but the lack of a secondary verification layer. For fraud detection, a signal is only as good as the data it is generated from. If the LLM reading the bank statement is fabricating transaction categories, the anomaly detection layer is working from corrupted input. You are not detecting fraud; you are detecting patterns in hallucinated data.

Insight two: Frontier Model Performance & Cost. The white paper found that frontier models take an average of 8.5 minutes per bank statement and cost between R29 and R219 per extraction. At 65,000 statements per month, that is roughly R1.9 million at the low end of that range (65,000 × R29 = R1,885,000) and more than R14 million at the high end, for document reading alone, before validation, human review, or compliance costs. SprintHive also tested whether a custom-trained Vision-Language Model, built specifically on South African bank statement data, could close this gap. It reduced cost and latency compared to frontier models, but it did not solve the core problem. If any single model is the only line of defence, a fraudster probing the system's response time still has a significant window in which nothing is being checked. Real-time fraud decisioning requires a pipeline that balances the contextual intelligence of AI with the speed and certainty of deterministic validation.
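To make the cost figures concrete, the arithmetic can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the per-extraction costs, the 8.5-minute average, and the 65,000 monthly volume come from the figures quoted above, while the function and variable names are illustrative, and the processing-time total assumes statements are handled one after another with no parallelism.

```python
# Figures quoted in the article (rand per extraction, minutes per statement).
STATEMENTS_PER_MONTH = 65_000
COST_LOW_RAND = 29
COST_HIGH_RAND = 219
MINUTES_PER_STATEMENT = 8.5


def monthly_cost(statements: int, cost_per_extraction: int) -> int:
    """Monthly spend in rand on document reading alone."""
    return statements * cost_per_extraction


low_estimate = monthly_cost(STATEMENTS_PER_MONTH, COST_LOW_RAND)
high_estimate = monthly_cost(STATEMENTS_PER_MONTH, COST_HIGH_RAND)

# Assumption: purely sequential processing, so this is total model time,
# not wall-clock time on a parallelised pipeline.
total_model_hours = STATEMENTS_PER_MONTH * MINUTES_PER_STATEMENT / 60

print(f"Low estimate:  R{low_estimate:,}")
print(f"High estimate: R{high_estimate:,}")
print(f"Model time:    {total_model_hours:,.0f} hours per month")
```

Even the low end of the range sits at about R1.89 million a month, and the spread between the two estimates is wide enough that the per-extraction price of a given model matters as much as the volume.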

Insight three: Trust requires architecture. SprintHive trained its own specialist Vision-Language Model exclusively on South African bank statement data. The result showed that even a purpose-built, domain-trained model suffers from the same fabrication patterns and generalisation failures as frontier models when documents drift from training conditions. The conclusion was not that an LLM is the wrong tool. It is that no single model, however well trained, can replace the need for a consensus and validation architecture around it. A custom model is a better candidate agent within a larger system, not a replacement for the system itself. The answer is an architecture where the AI's output is cross-referenced against verification rules, such as checking that Opening Balance plus Credits minus Debits equals Closing Balance, to catch what any single model will inevitably get wrong.
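The balance rule mentioned above is simple enough to sketch in code. The following is a minimal illustration, not SprintHive's implementation: the `ExtractedStatement` structure, the function names, and the sample amounts are all hypothetical, and it assumes the model's output has already been parsed into decimal amounts.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ExtractedStatement:
    """Hypothetical shape of a model's extraction output."""
    opening_balance: Decimal
    closing_balance: Decimal
    credits: tuple[Decimal, ...]
    debits: tuple[Decimal, ...]


def balance_discrepancy(stmt: ExtractedStatement) -> Decimal:
    """Closing balance minus (opening + credits - debits); zero when consistent."""
    expected = (
        stmt.opening_balance
        + sum(stmt.credits, Decimal("0"))
        - sum(stmt.debits, Decimal("0"))
    )
    return stmt.closing_balance - expected


def passes_balance_check(
    stmt: ExtractedStatement, tolerance: Decimal = Decimal("0.00")
) -> bool:
    """Deterministic rule: the extracted figures must reconcile within tolerance."""
    return abs(balance_discrepancy(stmt)) <= tolerance


# Illustrative amounts only.
consistent = ExtractedStatement(
    opening_balance=Decimal("1000.00"),
    closing_balance=Decimal("3250.50"),
    credits=(Decimal("2500.00"), Decimal("150.50")),
    debits=(Decimal("300.25"), Decimal("99.75")),
)

# Same statement, but one credit was misread or altered.
tampered = ExtractedStatement(
    opening_balance=Decimal("1000.00"),
    closing_balance=Decimal("3250.50"),
    credits=(Decimal("3500.00"), Decimal("150.50")),
    debits=(Decimal("300.25"), Decimal("99.75")),
)

print(passes_balance_check(consistent))
print(passes_balance_check(tampered), balance_discrepancy(tampered))
```

A rule like this catches extractions whose amounts do not reconcile, whether the cause is a hallucinated figure or an edited document. It does not catch errors that leave the totals intact, such as a misclassified transaction category, which is why it is one check among several rather than a complete validation layer.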

A natural response to these findings is to build internally. Your data is proprietary, and your formats are known. SprintHive tested this assumption and found it creates a high "Maintenance Tax." Bank-specific training improved scores initially, but exposed deeper failures when banks updated their layouts. However, for a sufficiently large institution, the choice to build is not a failure of logic, but a question of engineering focus. Every engineering month spent maintaining a document extraction system is a month not spent on the proprietary fraud models and credit decisioning logic that actually differentiate a financial services provider.

The National Credit Act (NCA) closes the argument. It is not enough to build something that works in testing. You must demonstrate to the National Credit Regulator that it worked correctly on every document, with a full audit trail. Whether you build or buy, the investment is not just in the "AI"; it is in the validation architecture that makes that demonstration possible.

The relevant metric in this architecture is not cost per page. It is cost per error. In a regulated lending environment, under the National Credit Act, with NCR audit scrutiny on every income verification decision, those two numbers lead to very different outcomes.

For a Head of Fraud evaluating service providers, the question that cuts through the noise is not "do you use AI?" but "what happens when your AI is wrong, and how does your system know?" Watch SprintHive CEO Dirk le Roux unpack the “Can You Teach AI to Understand Bank Statements?” white paper, or download the full white paper at sprinthive.com.