Everyone calls their product a "gateway" now. LiteLLM markets itself as both a proxy and a gateway. Portkey is a gateway. Kong has an LLM plugin. Helicone's docs use proxy and gateway interchangeably. There's a well-cited Medium post by Bijit Ghosh that ranks on Google for this comparison — it gives correct high-level definitions but stops before the implementation details that tell you what to actually choose and deploy.
Here's the precise version: three different layers, concrete code for each, and a decision framework based on team size and scale.
Proxy = transport layer. Pipes requests from your app to the provider. Handles how traffic gets there.
Router = decision layer. Chooses which model or provider handles the request. Handles where traffic goes.
Gateway = policy layer. Auth, rate limiting, budget enforcement, compliance. Handles who can send traffic and under what rules.
In practice: these aren't three separate products — they're three layers. Most "gateways" bundle all three. What you need depends on your scale.
Policy Layer
Auth, multi-tenant API key management, per-team rate limits, budget enforcement, compliance controls, audit logging. Governs who can send traffic and under what constraints.
Decision Layer
Model selection, provider fallback, cost-based routing, load balancing. Decides where each request goes based on task complexity, budget, or availability.
Transport Layer
HTTP forwarding, connection pooling, TLS, request/response capture, streaming passthrough. Gets the request from your app to the provider.
The Proxy: Transport Layer
A proxy is the simplest component. It intercepts your HTTP request and forwards it to the provider. Your application changes one thing: the base_url. Everything else stays the same.
// Before
client := openai.NewClient(apiKey)

// After — same client, same code, different URL
cfg := openai.DefaultConfig(apiKey)
cfg.BaseURL = "https://proxy.your-company.com/v1"
client := openai.NewClientWithConfig(cfg)
A minimal Go proxy handler:
func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// 1. Swap auth header: client key → upstream provider key
	r.Header.Set("Authorization", "Bearer "+p.providerKey)
	// 2. Forward to upstream (parse the URL once at startup in production)
	target, _ := url.Parse("https://api.openai.com")
	r.Host = target.Host // upstream expects its own Host header, not the proxy's
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.ServeHTTP(w, r)
}
That's the core. A proxy by itself doesn't decide anything — it doesn't choose GPT-4o over GPT-4o-mini, it doesn't enforce rate limits, it doesn't require auth beyond whatever the client presents. It pipes traffic. Everything else is built on top of this foundation.
Where a bare proxy earns its place: even without routing or policies, it captures every request for cost attribution and latency measurement — immediately, before you've written a line of routing logic.
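Cost attribution falls out of that capture almost for free: OpenAI-compatible providers return a `usage` object in every response, so the proxy can turn each captured body into a dollar figure. A minimal sketch — the prices in `pricePerMTok` are illustrative placeholders, not a real price sheet:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Usage mirrors the "usage" object OpenAI-compatible providers return.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}

// Illustrative per-million-token prices: {input, output}. Not real pricing.
var pricePerMTok = map[string][2]float64{
	"gpt-4.1":      {2.00, 8.00},
	"gpt-4.1-nano": {0.10, 0.40},
}

// costUSD turns a captured response body into a dollar figure for attribution.
func costUSD(model string, body []byte) (float64, error) {
	var resp struct {
		Usage Usage `json:"usage"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	p, ok := pricePerMTok[model]
	if !ok {
		return 0, fmt.Errorf("no price for model %q", model)
	}
	in := float64(resp.Usage.PromptTokens) / 1e6 * p[0]
	out := float64(resp.Usage.CompletionTokens) / 1e6 * p[1]
	return in + out, nil
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":1200,"completion_tokens":300}}`)
	c, _ := costUSD("gpt-4.1", body)
	fmt.Printf("$%.6f\n", c)
}
```

Log that figure alongside the request path and caller, and per-feature cost reports become a database query instead of a guessing game.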
The Router: Decision Layer
A router decides which model and provider handle each request. It doesn't touch transport — it returns a routing decision that the proxy executes.
There are three types of routing decisions. The type determines how much cost reduction you actually get:
Cost-based routing — send simple tasks to cheaper models:
type RoutingDecision struct {
Model string
Provider string
}
func (r *Router) Route(req *ChatRequest) RoutingDecision {
// Estimate task complexity from prompt length and system instruction
complexity := r.estimateComplexity(req)
switch {
case complexity < 0.3:
// Short, simple: classification, extraction, boolean questions
return RoutingDecision{Model: "gpt-4.1-nano", Provider: "openai"}
case complexity < 0.7:
// Medium: summarization, translation, structured output
return RoutingDecision{Model: "gpt-4.1", Provider: "openai"}
default:
// Complex: multi-step reasoning, code generation, analysis
return RoutingDecision{Model: "claude-opus-4-6", Provider: "anthropic"}
}
}
func (r *Router) estimateComplexity(req *ChatRequest) float64 {
totalTokens := estimateTokens(req.Messages)
	hasSystemPrompt := len(req.Messages) > 0 && req.Messages[0].Role == "system"
isMultiTurn := len(req.Messages) > 3
score := float64(totalTokens) / 2000.0
if hasSystemPrompt { score += 0.1 }
if isMultiTurn { score += 0.2 }
return min(score, 1.0)
}
Failover routing — fall back when the primary provider is unavailable:
var providerChain = []RoutingDecision{
{Model: "gpt-4.1", Provider: "openai"},
{Model: "claude-sonnet-4-6", Provider: "anthropic"},
{Model: "gemini-2.5-pro", Provider: "google"},
}
func (r *Router) RouteWithFailover(req *ChatRequest) RoutingDecision {
for _, candidate := range providerChain {
if r.circuit.IsAvailable(candidate.Provider) {
return candidate
}
}
return providerChain[len(providerChain)-1] // last resort
}
Metadata-based routing — route based on request headers or tags your app sets:
func (r *Router) RouteByTag(req *ChatRequest, headers http.Header) RoutingDecision {
switch headers.Get("X-Preto-Feature") {
case "support-bot":
return RoutingDecision{Model: "gpt-4.1-nano", Provider: "openai"}
case "code-review":
return RoutingDecision{Model: "claude-sonnet-4-6", Provider: "anthropic"}
case "report-generation":
return RoutingDecision{Model: "gpt-4.1", Provider: "openai"}
default:
return r.Route(req) // fall back to complexity-based routing
}
}
The router is pure business logic — no HTTP, no transport. This separation makes it testable independently of the proxy layer, and swappable — you can change routing logic without touching the proxy.
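That testability claim is concrete: because routing returns plain values, the thresholds can be table-tested with no HTTP machinery or mocks. A sketch, restating the `Route` switch as a free function for illustration:

```go
package main

import "fmt"

type RoutingDecision struct {
	Model    string
	Provider string
}

// routeByComplexity mirrors the switch in Router.Route as a free function,
// which makes the routing boundaries trivially table-testable.
func routeByComplexity(complexity float64) RoutingDecision {
	switch {
	case complexity < 0.3:
		return RoutingDecision{Model: "gpt-4.1-nano", Provider: "openai"}
	case complexity < 0.7:
		return RoutingDecision{Model: "gpt-4.1", Provider: "openai"}
	default:
		return RoutingDecision{Model: "claude-opus-4-6", Provider: "anthropic"}
	}
}

func main() {
	// Table-driven check of the boundaries — no HTTP, no mocks, no network.
	cases := []struct {
		complexity float64
		wantModel  string
	}{
		{0.1, "gpt-4.1-nano"},
		{0.3, "gpt-4.1"},
		{0.7, "claude-opus-4-6"},
	}
	for _, c := range cases {
		got := routeByComplexity(c.complexity)
		fmt.Printf("%.1f -> %s (want %s)\n", c.complexity, got.Model, c.wantModel)
	}
}
```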
Want to see which requests are costing more than they should?
Preto flags over-routed requests and projects the savings from routing them to cheaper models.
See Your LLM Costs Free — 10K Requests Included
No credit card required. Works with OpenAI, Anthropic, and more.
The Gateway: Policy Layer
A gateway sits above the router and proxy and adds one thing: policy enforcement. The defining characteristic is that the gateway doesn't just route traffic — it governs it. Who can send requests, how many, at what cost, with what audit trail.
In Go, a gateway is a middleware chain that wraps the proxy handler:
func BuildGateway(proxy http.Handler) http.Handler {
return chain(
AuthMiddleware, // validate internal API key → map to tenant identity
RateLimitMiddleware, // per-tenant request and token rate limits
BudgetMiddleware, // per-team monthly budget enforcement
AuditMiddleware, // log every request with identity + policy decision
proxy, // finally: forward to the router + proxy
)
}
func AuthMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Strip the "Bearer " prefix before the key lookup
		key := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		tenant, err := db.LookupTenant(key)
if err != nil {
http.Error(w, "unauthorized", 401)
return
}
// Inject tenant identity — replace client key with upstream provider key
r = r.WithContext(context.WithValue(r.Context(), tenantKey, tenant))
r.Header.Set("Authorization", "Bearer "+tenant.ProviderKey)
next.ServeHTTP(w, r)
})
}
func BudgetMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
tenant := r.Context().Value(tenantKey).(*Tenant)
if tenant.MonthlySpend >= tenant.BudgetLimit {
http.Error(w, `{"error":"budget_exceeded"}`, 429)
return
}
next.ServeHTTP(w, r)
})
}
The key difference from a proxy: the gateway has a concept of identity. It knows which team or user is sending the request and makes policy decisions based on that. Which means you can charge teams back, enforce per-team budgets, and prove compliance — none of which is possible when you're flying blind on who sent what. A proxy is stateless with respect to the caller. A gateway is not.
How Products Map to These Layers
| Product | Proxy | Router | Gateway | Cost Intelligence |
|---|---|---|---|---|
| LiteLLM | ✓ | ✓ (100+ providers) | Partial (basic team mgmt) | — |
| Helicone | ✓ | — | Partial (rate limits, no budget) | Basic |
| Portkey | ✓ | ✓ | ✓ (full enterprise) | Basic |
| Langfuse | — (async observer) | — | — | Basic |
| Preto | ✓ | ✓ | ✓ | ✓ (per-feature waste + projected savings) |
One thing to know about Langfuse: it's an async observer — it doesn't sit in the request path at all. It reads logs after the fact through SDK hooks or API polling. Zero proxy latency, but also no caching, no routing, and no real-time budget enforcement. It's a deliberate architectural choice — just a different layer entirely.
What You Actually Need
The right choice depends on your scale:
One model, one team, under $2K/month. Skip all three. Call the OpenAI SDK directly. Add a proxy for logging once you have real production traffic to observe.
Multiple models, cost visibility needed. Add a proxy with cost logging and a router. This is the inflection point — the LLM bill exceeds what any one person can attribute to a specific feature. One URL change gives you per-request cost attribution and the ability to route simple tasks to cheaper models. Teams typically see 20–40% cost reduction within the first week of enabling model routing.
Multiple teams, budget enforcement needed. You need a gateway. The moment two teams share an OpenAI API key and neither can see what the other is spending, you have a governance problem. A bill spike hits. Nobody knows which team caused it. Nobody can be held accountable. A gateway with per-team budget limits and identity-mapped logging solves this without requiring every team to instrument their own cost tracking.
Compliance requirements (SOC 2, HIPAA, GDPR). Gateway, with audit logging and PII controls. The audit trail needs to capture which team sent which request to which model with what parameters — and prove that no PII left the approved perimeter. A gateway gives you the audit trail to prove it.
If you're evaluating your options, see how the major products compare in more detail: Helicone, Langfuse, LangSmith, and Datadog. Or read the deep dive on what happens inside the proxy layer itself.
All three layers — proxy, router, and gateway — in one URL change.
Preto sits between your app and your LLM provider. Every request is logged, routed, and attributed to the team and feature that triggered it. See your costs, waste, and exactly which requests to route differently — in real time.
See Your LLM Costs Free — Start in 5 Minutes
Adds <50ms p95 overhead — under 1% of a typical LLM call.
Free forever up to 10K requests. No credit card required.