Get instant access to the calculator
Enter your work email and start calculating your real AI unit economics.
- Real gross margin per pricing tier
- Power user break-even threshold
- LLM intensity ratio health check
- Optimization impact (routing + caching)
No spam. Unsubscribe anytime.
Pricing Tiers
Add each tier. Set price to $0 for free tiers.
Model & Tokens
Primary model and average tokens per query.
Other Costs / User
Non-LLM per-user monthly costs.
What-If Scenario
Toggle to see optimized vs. current margins.
Margin by Tier
| Tier | Price | Users | LLM/User | Profit | Margin |
|---|---|---|---|---|---|
Health Check
Power User Break-Even
Max queries/day before margin goes negative
Want your real numbers?
Preto tracks LLM costs per feature, per user, per tier. One URL change.
See Real Unit Economics — Free. Up to 10K requests. No credit card.
How to Calculate AI SaaS Unit Economics
Traditional SaaS gross margin calculations assume infrastructure costs are fixed and scale sublinearly with users. AI SaaS breaks this assumption. LLM API costs are variable COGS that scale directly with user engagement — when a user makes more queries, the cost goes up proportionally.
Why Blended Margins Are Misleading
A single "gross margin" number across all pricing tiers hides the real story. Your free tier may have the same usage patterns as your paid tier but generates zero revenue. Your enterprise tier likely has higher per-user costs but much higher pricing. Calculating each tier separately often reveals that one tier is subsidizing another.
The LLM Intensity Ratio
Track your total LLM API spend as a percentage of MRR every month. Below 20% is healthy. 20-30% needs monitoring. Above 30% requires active cost management. Above 50% is a crisis that will show up in your next board meeting.
The Power User Problem
In most AI SaaS products, the top 10% of users by feature usage account for 40-60% of LLM costs. If those power users are on your lowest pricing tier — which they often are — you have a systematic negative-margin cohort. This calculator helps you find the break-even query volume for each tier.
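The break-even threshold can be sketched like this (function and parameter names are ours; the example numbers are illustrative):

```python
def break_even_queries_per_day(price, other_cost_per_user, cost_per_query):
    """Max daily queries a user can make before their margin goes negative."""
    monthly_llm_budget = price - other_cost_per_user  # what's left for LLM COGS
    return monthly_llm_budget / (cost_per_query * 30)

# e.g. a $29 tier, $1.20/month non-LLM costs, ~$0.01 per query
limit = break_even_queries_per_day(29, 1.20, 0.01)  # ~93 queries/day
```

A power user on this hypothetical tier averaging 150 queries a day is a guaranteed negative-margin account.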
From Estimates to Real Data
This calculator gives you directional unit economics. For real per-user, per-feature cost attribution — the numbers you'd put in a board deck or investor memo — try Preto free. One URL change, and you'll see exactly how LLM costs map to your revenue tiers.
Frequently Asked Questions
What is the real gross margin for AI SaaS products?
Most AI SaaS products show 70-80% gross margins in pitch decks, but after properly classifying LLM API costs as variable COGS, the real number is often 40-55% before optimization. With model routing and caching applied, teams typically reach 60-68%. The gap depends on your model choice, query volume per user, and whether you have agentic workflows multiplying calls.
How do I calculate LLM cost per user?
Multiply daily queries per user by 30 (for monthly), then multiply by average tokens per query. Divide by 1 million and multiply by your model's per-million-token price. For agentic workflows, multiply the query count by calls per action (typically 5-15). The result is your LLM COGS per user per month.
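The steps above, as a short sketch (the function name is ours; the example figures are illustrative):

```python
def llm_cost_per_user_per_month(queries_per_day, avg_tokens_per_query,
                                price_per_million_tokens, calls_per_action=1):
    """LLM COGS per user per month, following the formula above."""
    monthly_queries = queries_per_day * 30 * calls_per_action
    monthly_tokens = monthly_queries * avg_tokens_per_query
    return monthly_tokens / 1_000_000 * price_per_million_tokens

# 10 queries/day at 2,000 tokens on a $3/M-token model -> $1.80/user/month
chat_cost = llm_cost_per_user_per_month(10, 2000, 3.0)
# The same usage driving an agentic workflow at 10 calls per action -> $18.00
agent_cost = llm_cost_per_user_per_month(10, 2000, 3.0, calls_per_action=10)
```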
What is the LLM intensity ratio?
The LLM intensity ratio is your total monthly LLM API spend divided by your MRR. It tells you how much of each revenue dollar goes to AI costs. Below 20% is comfortable, 20-30% needs monitoring, above 30% requires active cost management, and above 50% is a crisis. Track it monthly — when it grows faster than MRR, your margin is compressing in real time.
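A minimal sketch of the ratio and its health bands (the function name and band labels are ours):

```python
def llm_intensity(monthly_llm_spend, mrr):
    """LLM intensity ratio plus the health band it falls into."""
    ratio = monthly_llm_spend / mrr
    if ratio < 0.20:
        band = "healthy"
    elif ratio < 0.30:
        band = "monitor"
    elif ratio < 0.50:
        band = "active cost management"
    else:
        band = "crisis"
    return ratio, band

ratio, band = llm_intensity(12_000, 50_000)  # 0.24 -> "monitor"
```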
Should LLM API costs be classified as COGS?
Yes. LLM API costs should be classified as variable COGS, not fixed infrastructure. Unlike traditional SaaS hosting costs that scale sublinearly with users, LLM costs scale directly and linearly with user engagement. This distinction matters for accurate gross margin calculation and for investor due diligence.
How much can model routing and caching reduce costs?
Model routing typically saves 20-40% by sending simple tasks (classification, extraction, yes/no questions) to cheap models at $0.10-0.60/M tokens instead of frontier models at $2-15/M tokens. Prompt caching saves another 15-25% on duplicate requests. Combined, most teams see 40-60% total reduction without any quality loss on the tasks that matter.
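One subtlety worth making explicit: the two savings compound multiplicatively on the remaining spend rather than simply adding. A sketch using percentages picked from the ranges above (names and figures are illustrative):

```python
def optimized_spend(base_spend, routing_savings, caching_savings):
    """Apply routing savings first, then caching savings to what remains."""
    return base_spend * (1 - routing_savings) * (1 - caching_savings)

# 30% from routing and 20% from caching -> 44% total reduction, not 50%
spend = optimized_spend(100.0, 0.30, 0.20)  # ~$56 of every original $100
```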