Legal tech teams scope their LLM costs carefully. They count the contracts to review, estimate average document length, multiply by token price, add a buffer. Then contract analysis goes live and the bill is five to seven times the estimate.
The surprise isn't volume — it's token count. Legal documents are an order of magnitude longer than the inputs most LLM cost estimates assume. An NDA is 3,000 tokens. A commercial lease is 20,000. An M&A agreement can exceed 100,000. Teams that calibrate their estimates on short documents and then process complex agreements face per-document costs that bear no resemblance to the budget.
1. Document length — not document count — is the primary cost driver in legal tech LLM stacks. Contract analysis budgets built on short-document assumptions run 5–7x over when complex agreements enter the pipeline.
2. Attorney-client privilege does not prevent caching — it constrains what you cache. Analytical outputs (clause types, risk signals) are cacheable; document content is not.
3. Cascaded document review — cheap model for relevance screening, expensive model for substantive analysis — cuts eDiscovery LLM costs 50–70% with no accuracy loss on the overall review.
Legal Tech AI Use Cases and What They Actually Cost
Monthly estimates for a mid-size legal tech SaaS — approximately $5M ARR, 50 law firm clients — running production AI workloads across active matters.
Client intake sits at the bottom despite being client-facing. The reason: intake questions follow predictable patterns — the same matter type, jurisdiction, and practice area questions asked by many clients — making it the most cacheable legal workflow. eDiscovery sits at the top because document volume is tied to litigation activity, which is unpredictable and can spike 10x overnight with a new matter.
For the full picture of how these numbers fit into broader AI cost growth, see why AI costs compound even when LLM prices fall.
Why Contract Analysis Costs 5–7x the Estimate
The budgeting error follows a consistent pattern. The legal team demos the contract analysis feature on NDAs — 3,000–5,000 tokens each, the most common contract type. The cost looks reasonable: roughly $0.01 per document at gpt-4.1 pricing. A volume estimate is built. The feature ships.
Then real production traffic arrives. The NDA is not the most common contract by revenue — it's the most common by count. The work that actually drives billing is commercial agreements: MSAs, software license agreements, commercial leases, supplier contracts. These run 15,000–40,000 tokens each. A single M&A purchase agreement exceeds 100,000 tokens.
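The gap is pure arithmetic. A quick sketch using the token ranges above and an assumed $2.00 per 1M input-token rate (the figures and rate here are illustrative — check current model pricing, and note output tokens are ignored for simplicity):

```python
# Rough per-document input cost by contract type. The $2.00/1M
# input rate is an assumption for illustration, not a quoted price.
RATE_PER_TOKEN = 2.00 / 1_000_000

contract_tokens = {
    "NDA": 4_000,                      # 3,000-5,000 token range
    "MSA / lease / supplier": 25_000,  # 15,000-40,000 token range
    "M&A purchase agreement": 100_000,
}

for kind, tokens in contract_tokens.items():
    print(f"{kind}: ${tokens * RATE_PER_TOKEN:.3f} per document")
```

The NDA lands near $0.01 per document — the number the demo calibrated on — while the commercial agreements that drive billing land 5–25x higher.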
LLM contract analysis earns its cost — it finds issues that manual review misses and speeds review by 60–80%. The problem is discovering the real cost after the feature is live, not before.
Want to see what contract analysis is actually costing per document?
Preto attributes LLM cost by feature and endpoint — so you see the per-document economics broken down by contract type before they surprise you.
See Your Cost Breakdown by Feature

One URL change. See which features cost the most. Free to start.
The Privilege Caching Problem (and the Workaround)
Most legal tech engineering teams conclude early that attorney-client privilege prevents caching. The reasoning: you can't store privileged communications in a cache without risking inadvertent disclosure. So caching gets ruled out — and the team overpays on every request.
The constraint is real, but it's narrower than it appears. Privilege protects client communications and attorney work product. It does not apply to the analytical output your system generated about a document type.
For prompt caching: Regulatory context, statute summaries, clause classification schemas, and standard form definitions repeat across every document in a matter. Cache these as system prompt prefixes. A 2,000-token legal context block at OpenAI's cached-input rate for gpt-4.1 ($0.50/1M versus $2.00/1M uncached) saves roughly $9,000/month at 100,000 requests per day — 6B prefix tokens at $1.50/1M saved.
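Provider-side prompt caching keys on an exact prefix match, so the mechanical requirement is that the legal context block stays byte-identical and comes first in every request. A minimal sketch — `build_messages` and the prompt contents are illustrative, not a real API:

```python
# Keep the static legal context as an immutable, byte-identical prefix
# so provider-side prompt caching can hit on every request.
LEGAL_CONTEXT = (
    "You are a contract analysis assistant.\n"
    "Clause classification schema: ...\n"  # ~2,000 tokens of static context
    "Standard form definitions and regulatory notes: ...\n"
)

def build_messages(document_text: str) -> list[dict]:
    # Static prefix first, variable document last. Reordering these,
    # or interpolating per-request data into the prefix, would break
    # the exact-prefix match and invalidate the cache for everyone.
    return [
        {"role": "system", "content": LEGAL_CONTEXT},
        {"role": "user", "content": f"Analyze this contract:\n{document_text}"},
    ]
```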
For document-level results: Cache the extraction result — clause types identified, risk signals found, jurisdiction confirmed — against a hash of the document ID and version. The cache stores what your system learned about the document structure, not the document content itself. No privilege exposure.
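The document-level cache can be sketched in a few lines. Everything here — the function names, the in-memory dict standing in for a real cache — is illustrative; the point is that the key derives from document identity, and the value holds only derived metadata:

```python
import hashlib

# Cache analytical outputs -- not document text -- keyed by a hash
# of document ID + version. Illustrative sketch, not a real API.
_result_cache: dict[str, dict] = {}

def cache_key(doc_id: str, version: int) -> str:
    return hashlib.sha256(f"{doc_id}:{version}".encode()).hexdigest()

def get_or_analyze(doc_id: str, version: int, analyze) -> dict:
    key = cache_key(doc_id, version)
    if key not in _result_cache:
        # `analyze` returns derived metadata only: clause types,
        # risk signals, jurisdiction. No privileged content is stored.
        _result_cache[key] = analyze(doc_id)
    return _result_cache[key]
```

A new document version produces a new key, so stale analysis is never served after an amendment.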
For matter closeout: Tag cached entries with matter IDs. When a matter closes or a client departs, invalidate those entries. This satisfies both privilege management and any applicable data retention policies — and it's a few extra lines in the cache layer, not a reason to avoid caching entirely.
Three Optimizations That Work in Legal Environments
1. Cascaded document review for eDiscovery
The highest-ROI change in any high-volume legal LLM workflow. Run every document through a cheap classifier first — relevance screening, basic responsiveness determination, obvious non-responsive filtering.
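The routing logic is simple. A sketch of the cascade — the model names, the `classify`/`analyze` callables, and the 0.9 confidence threshold are all illustrative assumptions, not a prescribed configuration:

```python
# Two-tier document review: a cheap model screens, the capable model
# analyzes only what survives screening. Names and the confidence
# threshold are illustrative assumptions.
CHEAP_MODEL = "small-screening-model"
CAPABLE_MODEL = "large-analysis-model"

def review(document: str, classify, analyze) -> dict:
    # First pass: cheap relevance / responsiveness screen.
    label, confidence = classify(document, model=CHEAP_MODEL)
    if label == "non-responsive" and confidence >= 0.9:
        # Most documents in a typical matter stop here.
        return {"status": "non-responsive", "model": CHEAP_MODEL}
    # Second pass: full privilege review and substantive analysis.
    return analyze(document, model=CAPABLE_MODEL)
```

Low-confidence screening results fall through to the capable model, so the cascade fails toward more review rather than less.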
In a typical eDiscovery matter, 60–75% of documents are non-responsive and need only a relevance determination. The cheap model handles those entirely. Only documents that pass first-pass screening go to the capable model for privilege review, key passage extraction, and substantive analysis. Blended cost drops 50–70% with no loss in review accuracy — the cases that require reasoning still get full model capability.
2. Boilerplate prefix caching for contract analysis
Standard contract forms are 60–80% boilerplate. The governing law clause, limitation of liability structure, IP assignment mechanics — these repeat with minor variation across thousands of contracts of the same type. Cache the system prompt with full clause-type definitions and risk criteria for each contract category (NDA, MSA, lease, SOW). The cache hit rate for same-category contracts typically exceeds 85%, turning a 2,000-token system prompt from a cost center into a near-zero line item.
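The blended economics follow directly from the hit rate. A worked sketch — the $2.00/1M uncached and $0.50/1M cached rates are assumptions for illustration:

```python
# Effective per-request cost of a 2,000-token system prompt at a
# given cache hit rate. Rates are illustrative assumptions.
PROMPT_TOKENS = 2_000
UNCACHED, CACHED = 2.00 / 1_000_000, 0.50 / 1_000_000

def prompt_cost_per_request(hit_rate: float) -> float:
    # Blend the cached and uncached rates by the hit rate.
    return PROMPT_TOKENS * (hit_rate * CACHED + (1 - hit_rate) * UNCACHED)
```

At an 85% hit rate the blended prompt cost is roughly a third of the uncached cost, and it keeps falling as the hit rate climbs.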
3. Matter-level cost attribution for billing passthrough
Legal is the one vertical where LLM costs can often be recovered directly from clients — treated as a disbursement, similar to court filing fees or expert costs. But only if you can produce per-matter cost breakdowns. A proxy layer that tags every request with the matter ID and generates monthly cost reports by matter turns AI cost visibility into a billing asset. Teams that implement this typically recover 30–50% of their LLM spend through client billing.
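The attribution layer itself is a small roll-up keyed by matter ID. A sketch with hypothetical per-token rates and function names (a production version would read token counts from the provider's usage response):

```python
from collections import defaultdict

# Per-matter cost roll-up for billing passthrough. The proxy tags
# each request with a matter ID; rates here are hypothetical.
RATE_PER_INPUT_TOKEN = 2.00 / 1_000_000
RATE_PER_OUTPUT_TOKEN = 8.00 / 1_000_000

matter_costs: defaultdict = defaultdict(float)

def record_request(matter_id: str, input_tokens: int, output_tokens: int) -> None:
    matter_costs[matter_id] += (input_tokens * RATE_PER_INPUT_TOKEN
                                + output_tokens * RATE_PER_OUTPUT_TOKEN)

def monthly_report() -> dict:
    """Disbursement line items: matter ID -> dollars, rounded to cents."""
    return {m: round(c, 2) for m, c in sorted(matter_costs.items())}
```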
Frequently Asked Questions
Why does contract analysis cost more than legal tech teams expect?
Can legal tech companies cache LLM responses given attorney-client privilege?
What is cascaded document review and how much does it save?
How do you attribute LLM costs per matter for client billing passthrough?
What is the typical per-document LLM cost for eDiscovery?
See which legal features are driving your LLM bill.
Preto attributes cost by feature, model, and endpoint in real time — so document review, contract analysis, and legal research each show their own cost line. One URL change, no code refactor.
See Your Cost Breakdown by Feature

Free forever up to 10K requests. No credit card required.