Everyone calls their product a "gateway" now. LiteLLM markets itself as both a proxy and a gateway. Portkey is a gateway. Kong has an LLM plugin. Helicone's docs use proxy and gateway interchangeably. There's a well-cited Medium post by Bijit Ghosh that ranks on Google for this comparison — it gives correct high-level definitions but stops before the implementation details that tell you what to actually choose and deploy.

Here's the precise version: three different layers, concrete code for each, and a decision framework based on team size and scale.

TL;DR

Proxy = transport layer. Pipes requests from your app to the provider. Handles how traffic gets there.
Router = decision layer. Chooses which model or provider handles the request. Handles where traffic goes.
Gateway = policy layer. Auth, rate limiting, budget enforcement, compliance. Handles who can send traffic and under what rules.
In practice: these aren't three separate products — they're three layers. Most "gateways" bundle all three. What you need depends on your scale.

Gateway

Policy Layer

Auth, multi-tenant API key management, per-team rate limits, budget enforcement, compliance controls, audit logging. Governs who can send traffic and under what constraints.

Router

Decision Layer

Model selection, provider fallback, cost-based routing, load balancing. Decides where each request goes based on task complexity, budget, or availability.

Proxy

Transport Layer

HTTP forwarding, connection pooling, TLS, request/response capture, streaming passthrough. Gets the request from your app to the provider.

The Proxy: Transport Layer

A proxy is the simplest component. It intercepts your HTTP request and forwards it to the provider. Your application changes one thing: the base_url. Everything else stays the same.

// Before
client := openai.NewClient(apiKey)

// After — same client, same code, different URL
client := openai.NewClient(
    apiKey,
    openai.WithBaseURL("https://proxy.your-company.com/v1"),
)

A minimal Go proxy handler:

func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
  // 1. Swap auth header: client key → upstream provider key
  r.Header.Set("Authorization", "Bearer "+p.providerKey)

  // 2. Forward to upstream
  target, _ := url.Parse("https://api.openai.com") // constant URL; parse error can't occur
  r.Host = target.Host // the reverse proxy rewrites the URL but not the Host header
  proxy := httputil.NewSingleHostReverseProxy(target)
  proxy.ServeHTTP(w, r)
}

That's the core. A proxy by itself doesn't decide anything — it doesn't choose GPT-4o over GPT-4o-mini, it doesn't enforce rate limits, it doesn't require auth beyond whatever the client presents. It pipes traffic. Everything else is built on top of this foundation.

Where a bare proxy earns its place: even without routing or policies, it captures every request for cost attribution and latency measurement — immediately, before you've written a line of routing logic.

The Router: Decision Layer

A router decides which model and provider handle each request. It doesn't touch transport — it returns a routing decision that the proxy executes.

There are three types of routing decisions. The type determines how much cost reduction you actually get:

Cost-based routing — send simple tasks to cheaper models:

type RoutingDecision struct {
  Model    string
  Provider string
}

func (r *Router) Route(req *ChatRequest) RoutingDecision {
  // Estimate task complexity from prompt length and system instruction
  complexity := r.estimateComplexity(req)

  switch {
  case complexity < 0.3:
    // Short, simple: classification, extraction, boolean questions
    return RoutingDecision{Model: "gpt-4.1-nano", Provider: "openai"}
  case complexity < 0.7:
    // Medium: summarization, translation, structured output
    return RoutingDecision{Model: "gpt-4.1", Provider: "openai"}
  default:
    // Complex: multi-step reasoning, code generation, analysis
    return RoutingDecision{Model: "claude-opus-4-6", Provider: "anthropic"}
  }
}

func (r *Router) estimateComplexity(req *ChatRequest) float64 {
  if len(req.Messages) == 0 {
    return 0 // guard: an empty request gets the cheapest route
  }
  totalTokens := estimateTokens(req.Messages)
  hasSystemPrompt := req.Messages[0].Role == "system"
  isMultiTurn := len(req.Messages) > 3

  score := float64(totalTokens) / 2000.0
  if hasSystemPrompt { score += 0.1 }
  if isMultiTurn     { score += 0.2 }
  return min(score, 1.0)
}
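To make the thresholds concrete, here is the same arithmetic pulled into a standalone function with worked inputs (the 2000-token divisor and the 0.1/0.2 bonuses match estimateComplexity above; min is the Go 1.21+ builtin):

```go
// score reproduces the estimateComplexity arithmetic so the bands are easy to check:
// tokens/2000, plus 0.1 for a system prompt, plus 0.2 for multi-turn, capped at 1.0.
func score(tokens int, hasSystemPrompt, isMultiTurn bool) float64 {
  s := float64(tokens) / 2000.0
  if hasSystemPrompt {
    s += 0.1
  }
  if isMultiTurn {
    s += 0.2
  }
  return min(s, 1.0)
}
```

score(100, false, false) is 0.05 and lands in the gpt-4.1-nano band; score(1000, false, false) is 0.5 and lands in the gpt-4.1 band; a 1,900-token multi-turn request with a system prompt saturates at 1.0 and routes to the most capable model.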

Failover routing — fall back when the primary provider is unavailable:

var providerChain = []RoutingDecision{
  {Model: "gpt-4.1",           Provider: "openai"},
  {Model: "claude-sonnet-4-6", Provider: "anthropic"},
  {Model: "gemini-2.5-pro",    Provider: "google"},
}

func (r *Router) RouteWithFailover(req *ChatRequest) RoutingDecision {
  for _, candidate := range providerChain {
    if r.circuit.IsAvailable(candidate.Provider) {
      return candidate
    }
  }
  return providerChain[len(providerChain)-1] // last resort
}

Metadata-based routing — route based on request headers or tags your app sets:

func (r *Router) RouteByTag(req *ChatRequest, headers http.Header) RoutingDecision {
  switch headers.Get("X-Preto-Feature") {
  case "support-bot":
    return RoutingDecision{Model: "gpt-4.1-nano", Provider: "openai"}
  case "code-review":
    return RoutingDecision{Model: "claude-sonnet-4-6", Provider: "anthropic"}
  case "report-generation":
    return RoutingDecision{Model: "gpt-4.1", Provider: "openai"}
  default:
    return r.Route(req) // fall back to complexity-based routing
  }
}

The router is pure business logic — no HTTP, no transport. This separation makes it testable independently of the proxy layer, and swappable — you can change routing logic without touching the proxy.

Want to see which requests are costing more than they should?

Preto flags over-routed requests and projects the savings from routing them to cheaper models.

See Your LLM Costs Free — 10K Requests Included

No credit card required. Works with OpenAI, Anthropic, and more.

The Gateway: Policy Layer

A gateway sits above the router and proxy and adds one thing: policy enforcement. The defining characteristic is that the gateway doesn't just route traffic — it governs it. Who can send requests, how many, at what cost, with what audit trail.

In Go, a gateway is a middleware chain that wraps the proxy handler:

func BuildGateway(proxy http.Handler) http.Handler {
  return chain(
    AuthMiddleware,       // validate internal API key → map to tenant identity
    RateLimitMiddleware,  // per-tenant request and token rate limits
    BudgetMiddleware,     // per-team monthly budget enforcement
    AuditMiddleware,      // log every request with identity + policy decision
    proxy,                // finally: forward to the router + proxy
  )
}

func AuthMiddleware(next http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    // Strip the scheme prefix before the lookup — the stored key is bare
    key := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
    tenant, err := db.LookupTenant(key)
    if err != nil {
      http.Error(w, "unauthorized", 401)
      return
    }
    // Inject tenant identity — replace client key with upstream provider key
    r = r.WithContext(context.WithValue(r.Context(), tenantKey, tenant))
    r.Header.Set("Authorization", "Bearer "+tenant.ProviderKey)
    next.ServeHTTP(w, r)
  })
}

func BudgetMiddleware(next http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    tenant := r.Context().Value(tenantKey).(*Tenant)
    if tenant.MonthlySpend >= tenant.BudgetLimit {
      http.Error(w, `{"error":"budget_exceeded"}`, 429)
      return
    }
    next.ServeHTTP(w, r)
  })
}

The key difference from a proxy: the gateway has a concept of identity. It knows which team or user is sending the request and makes policy decisions based on that, which means you can charge teams back, enforce per-team budgets, and prove compliance — none of which is possible when you can't tell who sent what. A proxy is stateless with respect to the caller. A gateway is not.

How Products Map to These Layers

| Product  | Proxy              | Router | Gateway                          | Cost Intelligence                          |
|----------|--------------------|--------|----------------------------------|--------------------------------------------|
| LiteLLM  | ✓ (100+ providers) | ✓      | Partial (basic team mgmt)        | Basic                                      |
| Helicone | ✓                  | —      | Partial (rate limits, no budget) | Basic                                      |
| Portkey  | ✓                  | ✓      | ✓ (full enterprise)              | Basic                                      |
| Langfuse | — (async observer) | —      | —                                | Basic                                      |
| Preto    | ✓                  | ✓      | ✓                                | ✓ (per-feature waste + projected savings)  |

One thing to know about Langfuse: it's an async observer — it doesn't sit in the request path at all. It reads logs after the fact through SDK hooks or API polling. Zero proxy latency, but also no caching, no routing, and no real-time budget enforcement. That's a deliberate architectural choice, not a shortcoming. It's simply a different layer entirely.

What You Actually Need

The right choice depends on your scale:

One model, one team, under $2K/month. Skip all three. Call the OpenAI SDK directly. Add a proxy for logging once you have real production traffic to observe.

Multiple models, cost visibility needed. Add a proxy with cost logging and a router. This is the inflection point — the LLM bill exceeds what any one person can attribute to a specific feature. One URL change gives you per-request cost attribution and the ability to route simple tasks to cheaper models. Teams typically see 20–40% cost reduction within the first week of enabling model routing.

Multiple teams, budget enforcement needed. You need a gateway. The moment two teams share an OpenAI API key and neither can see what the other is spending, you have a governance problem. A bill spike hits. Nobody knows which team caused it. Nobody can be held accountable. A gateway with per-team budget limits and identity-mapped logging solves this without requiring every team to instrument their own cost tracking.

Compliance requirements (SOC 2, HIPAA, GDPR). Gateway, with audit logging and PII controls. The audit trail needs to capture which team sent which request to which model with what parameters — and prove that no PII left the approved perimeter. Only a gateway, which sees every request with identity attached, can produce that trail.

If you're evaluating your options, see how the major products compare in more detail: Helicone, Langfuse, LangSmith, and Datadog. Or read the deep dive on what happens inside the proxy layer itself.

Frequently Asked Questions

What is an LLM proxy?
An LLM proxy is the transport layer between your application and an LLM provider. It intercepts HTTP requests and forwards them to the provider, capturing them along the way for logging, auth key swapping, and request/response capture. It does not decide which model handles the request. Your app changes one thing: the base_url.
What is an LLM router?
An LLM router is the decision layer that chooses which model, provider, or endpoint handles a request. Routing decisions are based on task complexity, cost budget, provider availability, or custom metadata tags. The router decides where traffic goes; the proxy handles transport.
What is an LLM gateway?
An LLM gateway is the policy layer — it includes proxy and routing and adds multi-tenant auth, per-team rate limiting, budget enforcement, compliance controls, and audit logging. It has a concept of identity: it knows which team or user is sending each request and makes policy decisions based on that. Think API gateway (Kong, Nginx), but LLM-aware.
Do I need a gateway or just a proxy?
One team, one model, under $2K/month: a proxy with basic logging is enough. Add a router when you have multiple models or need cost-based routing. You need a gateway when you have multiple teams, compliance requirements, or per-team budget enforcement and audit trails.
What's the difference between LiteLLM and Portkey?
LiteLLM is primarily a proxy and router — it normalizes the API across 100+ providers and handles fallback routing. Portkey is more gateway-oriented — it adds team management, access controls, compliance features, and enterprise audit trails. Both are expanding their feature sets and use the terms inconsistently in their own marketing.

All three layers — proxy, router, and gateway — in one URL change.

Preto sits between your app and your LLM provider. Every request is logged, routed, and attributed to the team and feature that triggered it. See your costs, waste, and exactly which requests to route differently — in real time.

See Your LLM Costs Free — Start in 5 Minutes

Adds <50ms p95 overhead — under 1% of a typical LLM call.

Free forever up to 10K requests. No credit card required.

Gaurav Dagade

Founder of Preto.ai. 11 years engineering leadership. Previously Engineering Manager at Bynry. Building the cost intelligence layer for AI infrastructure.

LinkedIn · Twitter