Software engineer with 11 years of engineering leadership. I built Preto.ai after watching teams repeatedly waste 40–60% of their LLM budget on model choices they never audited.
I kept seeing the same pattern: a team ships an AI feature using GPT-4. The first month's bill comes in and it's fine. By month three, costs have tripled and nobody knows why. The OpenAI dashboard shows total spend — not which feature, not which user, not which call is driving it.
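The fix for that blind spot is attribution: tag every call, then aggregate. Here's a minimal sketch of the idea (the call log, feature names, and dollar figures are made-up illustration, not Preto's actual data model):

```python
from collections import defaultdict

# Hypothetical call log: in practice each LLM call is tagged with
# the feature (and user) that triggered it at request time.
calls = [
    {"feature": "summarize", "cost_usd": 0.06},
    {"feature": "summarize", "cost_usd": 0.06},
    {"feature": "autocomplete", "cost_usd": 0.002},
    {"feature": "search", "cost_usd": 0.03},
]

# Aggregate spend per feature instead of one opaque total.
spend = defaultdict(float)
for call in calls:
    spend[call["feature"]] += call["cost_usd"]

# Rank features by spend to see what is actually driving the bill.
for feature, total in sorted(spend.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: ${total:.3f}")
```

Once spend is broken down this way, "be mindful of LLM usage" turns into "this one feature is 80% of the bill."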
So the team sends a Slack message: "Hey everyone, be mindful of LLM usage." Nothing changes. The CFO asks again next quarter.
The actual fix is usually simple: a GPT-4 call costing $0.06 could run on GPT-4o mini for $0.002 — same quality, 97% cheaper. But nobody makes that change because nobody can prove it's safe without data. Preto generates that data and ranks the changes by projected dollar impact, so teams can act on evidence rather than guessing.
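The ranking itself is simple arithmetic: per-call savings times call volume. A rough sketch (the per-call prices are the ones quoted above; the route names and monthly volumes are hypothetical):

```python
# Per-call prices from the example above; volumes are made-up numbers.
candidates = [
    # (change, current $/call, proposed $/call, calls/month)
    ("summarize: GPT-4 -> GPT-4o mini", 0.06, 0.002, 200_000),
    ("search: GPT-4 -> GPT-4o mini", 0.06, 0.002, 15_000),
]

# Projected monthly dollar impact of each change, largest first.
ranked = sorted(
    ((name, (cur - new) * volume) for name, cur, new, volume in candidates),
    key=lambda item: -item[1],
)

for name, savings in ranked:
    print(f"{name}: saves ${savings:,.0f}/month")

# Sanity check on the headline figure:
# (0.06 - 0.002) / 0.06 is roughly 0.97, i.e. about 97% cheaper per call.
```

The quality half of the question ("is it safe?") is what needs real data; the dollar half is just this multiplication.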
I've spent most of my career building infrastructure: distributed systems, data pipelines, and API platforms at scale. Preto.ai is built on Go and ClickHouse, chosen specifically for low per-request proxy overhead and petabyte-scale cost analytics. The proxy adds less than 50ms at p95, which is under 3% of a typical LLM call's total latency.
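To put that overhead claim in concrete numbers (the 50ms p95 figure is from above; the 2-second typical call latency is an assumed round number, since LLM calls commonly take a second or more):

```python
proxy_overhead_ms = 50          # p95 proxy overhead
typical_call_latency_ms = 2000  # assumed typical end-to-end LLM call latency

# Overhead as a share of total latency.
overhead_pct = proxy_overhead_ms / typical_call_latency_ms * 100
print(f"{overhead_pct:.1f}% of total latency")  # 2.5%, under the 3% figure
```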
If you want the engineering details, I write about it on the Preto blog: how the proxy architecture works, why we chose Go over Rust and Python, and what semantic caching actually does in production.
I answer every email personally. If you're evaluating Preto, want to discuss your team's LLM cost situation, or just want to talk infrastructure — email me at gaurav@preto.ai.
If you'd rather dive straight in: start with the free tier — 10,000 requests, no credit card, first recommendation in 24 hours.