The slide was titled "Unit Economics." It said gross margin: 72%. I'd built the spreadsheet myself, copied the formula from the SaaS metrics deck Lenny published, and felt good about how the number compared to the traditional SaaS benchmarks I'd grown up reading.
Then a friend who runs an AI infra fund asked me where LLM costs sat in the model. I'd put them under "operating expenses" — the same bucket as Vercel hosting and the Postgres bill. He asked me to move them into COGS.
The 72% became 31%.
The product hadn't changed. The customers hadn't changed. The bill from OpenAI hadn't changed. What changed was that I stopped pretending the API spend was infrastructure and started counting it as what it actually is: variable cost of goods sold that scales with how much my users use the product.
1. If you're an AI founder and your LLM API spend isn't in COGS, your gross margin is wrong by 20–40 points — and your investors are going to find this during diligence.
2. The fix is real: model routing, prompt caching, and usage tier enforcement typically recover 20–40 points of gross margin without raising prices.
3. The teams that get this right early raise rounds. The teams that don't end up explaining a 30-point margin compression to existing investors who feel misled.
The Spreadsheet I'd Been Lying to Myself With
I run a $30/month AI writing tool. About 1,200 paying users. Most of them are light — they paste in three blog drafts a week and walk away. A handful of power users grind on it daily. I had been calculating margin like this:
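Reconstructed after the fact, the spreadsheet logic was roughly this. The per-user infra split is an illustrative round number, not my exact P&L:

```python
# The "before" model: revenue minus hosting and database,
# with LLM spend nowhere in sight.
users = 1200
mrr = 30 * users                    # $30/month x 1,200 paying users
infra_cogs = 8.40 * users           # hosting + database (illustrative split)

gross_margin = (mrr - infra_cogs) / mrr
print(f"{gross_margin:.0%}")        # 72% -- the number on the slide
```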
Looked great. Felt great. Was wrong.
The OpenAI bill was sitting in a separate line on my P&L called "Software & Tools," next to my Notion subscription. About $13,500/month across 1,200 users — so an average of $11.25 per user, per month, in API spend. Nobody had told me to put that in COGS, so I hadn't.
Add it where it belongs:
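Same sketch with the OpenAI bill moved where it belongs. With these round numbers it lands at 34.5%; my actual books, with the real line items, landed at 31%:

```python
# The "after" model: LLM API spend counted as COGS.
users = 1200
mrr = 30 * users
infra_cogs = 8.40 * users           # hosting + database (illustrative split)
llm_cogs = 13_500                   # the OpenAI bill: $11.25/user blended

gross_margin = (mrr - infra_cogs - llm_cogs) / mrr
print(f"{gross_margin:.1%}")        # 34.5% with these round numbers
```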
That's the blended number. The blended number is also a lie — because the average hides what's happening at the tails.
Where the Margin Actually Goes
I segmented the OpenAI spend by user. The top decile — about 120 power users — accounted for roughly 55% of total API spend. Their per-user math:
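The decile math, derived from the figures above. The 55% share is the measured number; everything else follows from it:

```python
total_llm = 13_500                  # monthly API spend across all users
power_users = 120                   # top decile of 1,200

power_llm_per_user = total_llm * 0.55 / power_users       # ~= $61.9/user/month
power_margin = 30 - power_llm_per_user                    # ~= -$32/user/month

light_llm_per_user = total_llm * 0.45 / (1200 - power_users)  # ~= $5.6/user/month
light_margin = 30 - light_llm_per_user                        # ~= +$24/user/month
```

Every power user was a negative-margin customer before infra costs even entered the picture.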
The light users were subsidizing the power users. That's not a unit-economics model. That's a hope strategy.
I'm not the first person to hit this. Sam Altman publicly said in January 2025 that ChatGPT Pro at $200/month was losing money. Cursor's 2024 transition from "unlimited" to 500 fast requests/month on Pro was an explicit response to power-user economics. Anthropic added weekly usage limits to Claude Pro in 2025, citing "a small number of users consuming disproportionate resources." GitHub Copilot introduced "premium requests" tiering. Replit Agent moved to checkpoint-based pricing. Vercel v0 went from free to credit-based.
The pattern is the same across all of them: flat subscription pricing breaks at the power user. Either you fix the math at the cost side or you fix it at the price side. Most companies eventually do both.
The Investor Diligence Conversation Is Coming
Here's what an investor sees when they look at AI SaaS metrics in 2026:
Sequoia's "$600B Question" framed the gap between AI infrastructure spend and AI revenue. a16z's "Navigating the High Cost of AI Compute" made the case that compute is the dominant COGS line for AI products. Bessemer's State of the Cloud 2024 explicitly wrote about inference costs moving from R&D into COGS as products mature. Jamin Ball at Altimeter has been writing about public AI-exposed SaaS gross margins compressing into the 60s versus traditional SaaS 75–85%.
The investor reading your deck has internalized all of that. When they see 80% gross margin on an AI product, the first question is: is LLM cost in there? If the answer is no, the rest of the conversation is about whether you understand your own business.
I learned this the easy way — a friend who happens to invest in this space asked the question over coffee, not over a pitch. Most founders learn it the hard way.
Want to know which of your users are actually unprofitable?
Preto attributes API spend to user IDs and pricing tiers automatically. You see your real per-user margin — and which power users are eating the rest.
See Which Users Are Costing You Money
Your LLM costs are variable COGS. Preto shows you per-user profitability.
The Four Levers I Pulled (Without Raising Prices)
I gave myself a quarter to fix the margin without changing the public price. Here's what worked.
Lever 1: Model routing. About 60% of my API spend was on Sonnet 4.6 doing tasks that Haiku 4.5 handles fine — outline expansion, formatting cleanup, simple rewrites — at roughly a third of the price per token. I built a routing layer that classified the task type and sent the easy ones to Haiku, escalating only structurally complex generation to Sonnet. Tools like OpenRouter and NotDiamond can do this without you building the classifier yourself; both report 40–85% cost reduction on routable workloads.
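A minimal sketch of the routing layer. The keyword-set classifier and the model IDs are illustrative stand-ins (my classifier was more involved, and you should plug in your own cheap/expensive pair):

```python
# Cheap-by-default routing: only structurally complex work
# escalates to the expensive model.
CHEAP_MODEL = "claude-haiku-4-5"    # placeholder IDs; use your provider's
SMART_MODEL = "claude-sonnet-4-5"   # current model names and prices

SIMPLE_TASKS = {"outline_expansion", "formatting_cleanup", "simple_rewrite"}

def route(task_type: str) -> str:
    """Send easy, well-specified task types to the cheap model."""
    return CHEAP_MODEL if task_type in SIMPLE_TASKS else SMART_MODEL

# At the call site: client.messages.create(model=route(task_type), ...)
```

OpenRouter and NotDiamond effectively replace route() with a hosted classifier if you'd rather not maintain the task taxonomy yourself.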
Result: 28% reduction in monthly API spend.
Lever 2: Prompt caching. My system prompt was 1,800 tokens — full of writing-style instructions, brand voice rules, output formatting. Same prompt on every call. Anthropic's prompt caching launched in August 2024 with the official claim of up to 90% cost reduction on long shared prompts. Du'An Lightfoot publicly documented going from $720/mo to $72/mo on the same pattern.
I added the cache_control markers, made sure no dynamic content was in the prefix, and watched the input cost line drop.
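The shape of the change, in Messages API terms. The payload is shown as a plain dict you would pass to client.messages.create(**payload); the model ID and prompt text are placeholders:

```python
# Static system prompt marked for caching. Nothing dynamic may appear
# before (or inside) this block, or the prefix changes and the cache misses.
STATIC_SYSTEM_PROMPT = "You are a writing assistant. [1,800 tokens of style rules]"

def build_request(user_draft: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",       # placeholder model ID
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,           # identical on every call
                "cache_control": {"type": "ephemeral"},  # cache this prefix
            }
        ],
        # Dynamic, per-request content goes after the cached prefix.
        "messages": [{"role": "user", "content": user_draft}],
    }
```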
Result: another 22% reduction on what was left after lever 1.
Lever 3: System prompt audit. I went line by line through my system prompt and removed anything that wasn't either (a) measurably improving output quality on my eval set or (b) actively preventing a known failure mode. Three of my five "few-shot examples" hadn't moved quality numbers in months. Half my formatting instructions were redundant after I switched to Structured Outputs. The prompt went from 1,800 tokens to 950.
Microsoft's LLMLingua research demonstrates up to 20x compression with about 1.5% performance loss. I didn't go that far. A manual audit got me almost 2x.
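The audit loop, schematically. run_eval is a stand-in for your own eval harness (mine scored a fixed set of drafts), and the greedy one-section-at-a-time search is the simplest version of what I did by hand:

```python
def audit_prompt(sections: list[str], run_eval, tolerance: float = 0.01) -> list[str]:
    """Drop one prompt section at a time; keep the cut only if quality holds."""
    baseline = run_eval("\n\n".join(sections))
    kept = list(sections)
    for section in sections:
        trial = [s for s in kept if s is not section]
        if run_eval("\n\n".join(trial)) >= baseline - tolerance:
            kept = trial            # the section wasn't earning its tokens
    return kept
```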
Result: another 12% reduction.
Lever 4: Usage tier enforcement. The hardest one to ship and the one I'd been avoiding. I added a soft cap at 100 generations per month on the $30 tier and offered a $79 "power" tier with 500 generations. The 120 power users either upgraded (about half did) or hit the cap (the other half). The downgrade churn from this was real — 14 of them cancelled — but the math on the remaining 106 went from −$32/user/month to either +$15 (cap users) or +$23 (upgraded users).
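The enforcement check itself is small; the work is in the billing and upgrade flow around it. The limits below are the ones I shipped, and everything else is a stand-in:

```python
# Soft-cap quota check, run before every generation request.
TIER_LIMITS = {"standard": 100, "power": 500}   # generations per month

def check_quota(tier: str, used_this_month: int) -> str:
    if used_this_month < TIER_LIMITS[tier]:
        return "allow"
    if tier == "standard":
        return "offer_upgrade"   # soft cap: block the call, surface the $79 tier
    return "block"               # hard stop at the top tier
```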
Result: stopped bleeding on the long tail.
Where the Margin Landed
After all four levers, the new per-user math:
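A rough reconstruction of the cost side. The three cost levers compound because each applies to what the previous one left; lever 4 then moved the revenue side too (upgrades, cancellations), which is what carries the blended margin to 70.2%:

```python
# Compounding the three cost levers on the original bill.
api_spend = 13_500
for reduction in (0.28, 0.22, 0.12):    # routing, caching, prompt audit
    api_spend *= 1 - reduction          # each lever applies to the remainder

per_user = api_spend / (1200 - 14)      # 14 users cancelled over the cap
print(f"${api_spend:,.0f}/month, ${per_user:.2f}/user")   # ~$6,672 and ~$5.63
```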
From the real 31% to a real 70.2%. Not the fake 72% I started with. The actual number, with LLM costs in the right place and the four levers actually pulled.
The thing I didn't appreciate until I'd done all the work: the optimization isn't a one-time project. The day I shipped the routing layer, OpenAI released a new model. The day I added prompt caching, my product team wanted to add a feature that broke the cache prefix. Margin is a moving target on AI products in a way it isn't on traditional SaaS, because the cost side is volatile. Pricing pages don't change every month. Model pricing does. New features land in the prompt prefix and silently break caching. Power users find new ways to grind on the product.
You don't fix this once. You instrument for it once and then keep watching.
What I Wish I'd Done at $5K MRR
If you're earlier-stage than I was when I learned this, here's the cheap version of the lesson:
Put LLM costs in COGS from the day you charge your first dollar. It doesn't matter that formal accounting guidance gives you some discretion here: your investors and your future CFO already think this way. The discipline of looking at the right number every month is worth more than the small amount of extra modeling work it takes.
Track per-user cost from day one, not from the day you raise. Tag every API request with a user ID. The infrastructure is one URL change — a proxy that does this for you exists, and the cost of setting it up before you have a problem is less than 1% of the cost of discovering you have a problem after you've raised on a misleading metric.
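The minimum viable version of that instrumentation, if you'd rather hand-roll it than change a URL. The per-MTok prices are illustrative; read real token counts off the usage object your provider returns with every response:

```python
from collections import defaultdict

# Per-user cost attribution: tag every request with a user ID and
# record token usage as it happens.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}   # illustrative rates
spend_by_user: dict[str, float] = defaultdict(float)

def record_usage(user_id: str, input_tokens: int, output_tokens: int) -> None:
    cost = (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
    spend_by_user[user_id] += cost
```

Join spend_by_user against your billing tiers at month end and the per-user margin table falls out.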
This is the work I cover in more depth in why cost-per-request is the number you should be losing sleep over and the unit economics nobody shows on AI SaaS pitch decks. Both go deeper on implementation than the founder narrative I've kept this post to.
The TL;DR for everyone who hasn't done it yet: your gross margin number is wrong, and the fix is straightforward. Move LLM costs into COGS where they belong. Then start pulling the four levers above, in roughly that order. You'll recover most of the gap. The teams that don't are going to be the ones explaining a 30-point margin surprise during their Series A diligence.
Frequently Asked Questions
Should LLM API costs be classified as COGS or operating expenses?
COGS. LLM API spend scales with how much customers use the product, which makes it a variable cost of delivering the service, not overhead. Leaving it in opex overstates gross margin by whatever the API bill is as a share of revenue.

What is a realistic gross margin for an AI startup with LLM costs in COGS?
Lower than traditional SaaS. Public AI-exposed SaaS gross margins have been compressing into the 60s versus the traditional 75–85%. Before optimization mine was in the low 30s; after routing, caching, prompt trimming, and tiering, it landed at 70.2%.

Why are AI startups losing money on power users?
Flat subscription pricing meets usage-based costs. A small fraction of users drives a disproportionate share of inference spend (my top decile accounted for roughly 55% of API costs), and at a flat price those users can cost more to serve than they pay.

How do you fix gross margin without raising prices?
Four levers, roughly in order of effort: route easy tasks to cheaper models, cache the static prompt prefix, audit the system prompt down to what measurably earns its tokens, and enforce usage tiers so heavy users either upgrade or consume less.

What is the difference between gross margin and contribution margin for AI startups?
Gross margin is revenue minus COGS (inference, hosting). Contribution margin also subtracts the other variable costs of serving a customer, such as support and payment processing, so it is the stricter test of per-user profitability.
Find your real gross margin in 5 minutes.
Preto attributes LLM spend to users, tiers, and features automatically. You see per-user profitability — and which of your power users are silently eating the rest of your margin. One URL change. No code refactor.
See Which Users Are Costing You Money
Free forever up to 10K requests. No credit card required.