Fintech teams budget their LLM costs carefully. They estimate request volume, multiply by token count, apply current pricing, add a safety margin. Then fraud detection goes live and the bill is four times the estimate.
This happens consistently enough that it's a pattern, not a surprise. Fintech LLM workloads have three properties that make standard cost estimates unreliable: volume scales with every transaction rather than every user, compliance constraints limit which optimizations you can safely use, and document-heavy use cases carry token counts that are an order of magnitude higher than typical chat or classification tasks.
1. Fraud detection is the top cost driver in fintech LLM stacks — it runs on every transaction, not just flagged ones, and teams consistently underestimate volume by 10–50x.
2. PCI-DSS and GDPR don't prevent caching — they constrain how you cache. The right architecture separates PII from cached responses and invalidates by customer ID.
3. The highest-ROI optimization in fintech is cascade routing: a cheap classifier first, expensive model only on escalations. Typical result: 50–70% cost reduction on fraud workflows with no accuracy loss.
Fintech AI Use Cases and What They Actually Cost
Consider the monthly cost estimates for a mid-size fintech — approximately $5M ARR, 50,000 active customers — running production AI workloads.
Customer support sits at the bottom despite being the most visible AI feature. The reason: it has the highest duplicate rate of any fintech workload — the same account questions asked by thousands of users — making it the most cacheable. Fraud scoring sits at the top because it runs against every transaction in the system.
For the full picture of how these numbers fit into your total AI cost structure, see our breakdown of why AI costs compound even when LLM prices fall.
Why Fraud Detection Costs 4x Your Budget
The budgeting mistake follows a consistent pattern. The fraud team scopes the LLM against "suspicious transactions" — those flagged by existing rule-based systems. That's typically 1–3% of total transaction volume. At 1 million transactions per day, that's 10,000–30,000 LLM calls. The estimate looks reasonable.
Then the system goes live. The engineering team discovers that scoring only pre-flagged transactions produces too many false negatives — the LLM misses fraud that the rules didn't catch. The right architecture scores all transactions and uses the LLM output as a signal, not a gatekeeper. Volume goes from 30,000 calls per day to 1,000,000.
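Putting numbers on that gap — a sketch using the figures above, taking the 3% flag rate at the top of the quoted range:

```python
# Back-of-envelope volume math for the scenario above: 1M transactions/day,
# 1-3% pre-flagged by rules, versus scoring every transaction.
daily_transactions = 1_000_000
flag_rate = 0.03  # upper end of the 1-3% rule-based flag rate

budgeted_calls = int(daily_transactions * flag_rate)  # LLM calls/day in the estimate
actual_calls = daily_transactions                     # calls/day once all transactions are scored

print(budgeted_calls)                   # 30000
print(actual_calls // budgeted_calls)   # 33 — a ~33x volume miss
```

The token prices can change; the multiplier is what breaks the budget.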
LLMs are genuinely valuable for fraud detection — they reduce false positives by 60–80% compared to rules-only systems by understanding transaction context that structured data misses. The cost is real and justified. The problem is that teams discover the real cost in production rather than before go-live.
Want to see what fraud detection is actually costing per transaction?
Preto attributes LLM cost by feature and endpoint — so you can see the per-transaction economics before they surprise you.
See Your Cost Breakdown by Feature
One URL change. See which features cost the most. Free to start.
The Compliance Caching Problem (and the Workaround)
The instinct to cache fraud and KYC responses runs into PCI-DSS and GDPR immediately. You can't store cardholder data or personal financial information in a cache without proper controls. So most fintech teams conclude caching isn't available to them — and overpay.
The workaround is architectural: separate what you cache from what contains PII.
For prompt caching: Cache the SHA-256 hash of the sanitized prompt — not the prompt itself. Strip or tokenize PII (card numbers, account IDs, customer names) before hashing. The cache key is derived from the content pattern, not the customer's data.
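A minimal sketch of that key derivation — the regexes and placeholder tokens here are illustrative, not a complete PII-detection scheme:

```python
import hashlib
import re

# Hypothetical sanitization rules; a production system would use a real
# PII detector (card numbers, names, account IDs) rather than two regexes.
PII_PATTERNS = [
    (re.compile(r"\b\d{13,19}\b"), "<CARD>"),    # card-number-like digit runs
    (re.compile(r"\bACCT-\d+\b"), "<ACCOUNT>"),  # assumed internal account-ID format
]

def cache_key(prompt: str) -> str:
    """Derive a cache key from the PII-stripped prompt, never the raw prompt."""
    sanitized = prompt
    for pattern, placeholder in PII_PATTERNS:
        sanitized = pattern.sub(placeholder, sanitized)
    return hashlib.sha256(sanitized.encode("utf-8")).hexdigest()

# Two customers with the same transaction pattern hit the same cache entry:
k1 = cache_key("Score transaction 4111111111111111 from ACCT-1001: 5 charges in 2 minutes")
k2 = cache_key("Score transaction 5500005555555559 from ACCT-2042: 5 charges in 2 minutes")
assert k1 == k2
```

The stored key reveals nothing about either customer: it identifies the pattern, and the pattern is what repeats.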
For document analysis: Cache at the document-type + extraction-schema level. "What fields does this type of income statement contain?" and "What risk indicators appear in this credit report format?" are cacheable questions. The specific customer values are not.
For GDPR right-to-erasure: Tag every cached response with the customer IDs it relates to. When a deletion request arrives, invalidate the corresponding cache entries. This is a few extra lines in your cache layer — not a reason to skip caching entirely.
PCI-DSS's constraint is on storing cardholder data, not on caching structured outputs. A cached response that says "high fraud risk: velocity pattern detected" contains no cardholder data — it's safe to store.
Three Optimizations That Work in Constrained Environments
1. Cascade routing for fraud scoring
The most impactful change in any high-volume fintech LLM workflow. Instead of sending every transaction to a capable (expensive) model, run a fast classifier first and escalate only uncertain cases:
70% of transactions get a high-confidence classification from the cheap model and never touch the expensive one. 30% escalate. Blended cost drops 50–70% with no measurable accuracy loss on the overall system — because the uncertain cases, where you need more reasoning, still get full model capability.
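In code, the cascade is a confidence gate. A minimal sketch — `cheap_model`, `frontier_model`, and the 0.90 threshold are placeholders to tune against your own traffic:

```python
def score_transaction(txn, cheap_model, frontier_model, threshold=0.90):
    """Cascade: trust the cheap model when it is confident; escalate otherwise."""
    label, confidence = cheap_model(txn)
    if confidence >= threshold:
        return label, "cheap"               # most traffic stops here
    return frontier_model(txn), "frontier"  # uncertain cases get full capability

# Stub models standing in for real LLM calls:
cheap = lambda txn: ("legitimate", 0.99) if txn["amount"] < 1_000 else ("unsure", 0.55)
frontier = lambda txn: "fraud: velocity pattern"

assert score_transaction({"amount": 40}, cheap, frontier) == ("legitimate", "cheap")
assert score_transaction({"amount": 9_000}, cheap, frontier)[1] == "frontier"
```

The threshold is the lever: raise it and more traffic escalates (higher cost, more scrutiny); lower it and the cheap model handles more on its own.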
2. Document-level result caching for KYC
KYC documents — passports, utility bills, bank statements — are submitted once per customer during onboarding. If your system re-analyzes them on every subsequent verification check, you're paying for redundant work. Cache the structured extraction result against the document hash. The same document analyzed six months later returns the cached result instantly.
This is not a compliance risk because you're caching the extracted fields (document type: passport, issuing country: US, expiry: valid) — not the raw document image or PII. The cache stores what your system learned from the document, not the document itself.
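A sketch of the pattern, with `analyze_kyc_document` and `extract_fn` as hypothetical names for your verification entry point and the underlying LLM extraction call:

```python
import hashlib

_extraction_cache: dict = {}

def analyze_kyc_document(doc_bytes: bytes, extract_fn) -> dict:
    """Return the cached structured extraction for a previously seen document.

    Only extract_fn's structured output is stored — never the raw image/PDF."""
    doc_hash = hashlib.sha256(doc_bytes).hexdigest()
    if doc_hash not in _extraction_cache:
        _extraction_cache[doc_hash] = extract_fn(doc_bytes)  # the expensive LLM call
    return _extraction_cache[doc_hash]

# Stub extractor that counts how often the "LLM" actually runs:
calls = {"n": 0}
def fake_extract(_doc: bytes) -> dict:
    calls["n"] += 1
    return {"document_type": "passport", "issuing_country": "US", "expiry": "valid"}

passport = b"...same bytes on re-verification..."
analyze_kyc_document(passport, fake_extract)
analyze_kyc_document(passport, fake_extract)  # cache hit: no second model call
assert calls["n"] == 1
```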
3. Compliance prompt template caching
Fintech system prompts carrying regulatory context — AML rules, OFAC screening criteria, KYC policy — are often 1,500–3,000 tokens long. That's repeated on every call. OpenAI's prompt caching charges $0.025/1M tokens for cached input reads versus $2.00/1M for uncached. For a 2,000-token system prompt at 100,000 requests per day, that's a $3,500/month difference for one template change.
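The arithmetic behind that figure, as a sketch. The 2,000-token prompt, 100,000 requests/day, and per-million-token prices come from the text above; the 30% cache-hit rate is an assumption that lands near the $3,500/month figure — at a 100% hit rate the gap would be roughly $11,850/month:

```python
# Worked savings math for the compliance-template caching claim.
PROMPT_TOKENS = 2_000        # tokens in the regulatory system prompt
REQUESTS_PER_DAY = 100_000
DAYS = 30
UNCACHED_PER_M = 2.00        # $/1M input tokens, uncached (from the text)
CACHED_PER_M = 0.025         # $/1M input tokens, cached read (from the text)
HIT_RATE = 0.30              # ASSUMED fraction of requests served from the prompt cache

monthly_tokens = PROMPT_TOKENS * REQUESTS_PER_DAY * DAYS       # 6B tokens/month
cached_tokens = monthly_tokens * HIT_RATE
savings = cached_tokens * (UNCACHED_PER_M - CACHED_PER_M) / 1_000_000

print(round(savings))  # 3555 — in the ballpark of the $3,500/month figure
```

Check current vendor pricing before relying on these rates; cached-input discounts vary by provider and model.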
Frequently Asked Questions
Why does fraud detection cost so much more than expected?
Can fintech companies cache LLM responses given PCI-DSS and GDPR?
What are the largest LLM cost drivers in fintech?
How does PCI-DSS affect LLM proxy architecture?
What is cascade routing and how does it apply to fraud detection?
See which fintech features are driving your LLM bill.
Preto attributes cost by feature, model, and endpoint in real time — so fraud detection, KYC, and compliance monitoring each show their own cost line. One URL change, no code refactor.
See Your Cost Breakdown by Feature
Free forever up to 10K requests. No credit card required.