We were logging 50,000 LLM requests per day to PostgreSQL. Query latency was fine. At 400,000 requests, cost aggregation queries started taking 3 seconds. At 2 million, the database was the slowest thing in the stack — slower than the LLM calls themselves.

We switched to ClickHouse. Here's exactly what we did, why, and what the latency numbers look like now.

TL;DR

1. LLM request logs are append-only, high-cardinality, and read through aggregation queries. That's a ClickHouse workload — not PostgreSQL.
2. Switching dropped our cost dashboard p95 query latency from 3.2s to 180ms on 10M+ rows.
3. Our async write path keeps logging overhead under 2ms p95 — the client never waits for a log write.

The Schema That Broke PostgreSQL

Our initial log schema was straightforward:

CREATE TABLE llm_requests (
  id          UUID PRIMARY KEY,
  tenant_id   UUID NOT NULL,
  model       TEXT NOT NULL,
  provider    TEXT NOT NULL,
  feature     TEXT,
  user_id     TEXT,
  input_tokens  INT,
  output_tokens INT,
  cost_usd    NUMERIC(12,8),
  latency_ms  INT,
  cached      BOOLEAN,
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_tenant_created ON llm_requests (tenant_id, created_at DESC);
CREATE INDEX idx_model ON llm_requests (model);
CREATE INDEX idx_feature ON llm_requests (feature);

This worked fine at low volume. The problem was our query pattern. The dashboard runs queries like:

-- Cost by model, last 30 days
SELECT model, SUM(cost_usd), COUNT(*)
FROM llm_requests
WHERE tenant_id = $1
  AND created_at > NOW() - INTERVAL '30 days'
GROUP BY model;

-- Cost by feature, last 7 days
SELECT feature, SUM(cost_usd), AVG(latency_ms)
FROM llm_requests
WHERE tenant_id = $1
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY feature
ORDER BY SUM(cost_usd) DESC;

At 10 million rows, these GROUP BY queries scanned hundreds of thousands of rows per tenant despite the index. PostgreSQL is row-oriented — to compute SUM(cost_usd), it reads entire rows to extract one column. At scale, that's a lot of I/O for a field you could pre-aggregate.

Why LLM Logs Are a ClickHouse Workload

Three properties of LLM request logs make them a natural fit for ClickHouse:

Append-only writes. LLM logs are never updated or deleted (outside retention policies). ClickHouse is optimized for high-throughput inserts and efficient sequential reads — it doesn't pay the overhead PostgreSQL does for MVCC (multi-version concurrency control) on write-heavy workloads.

Columnar storage. When you query SUM(cost_usd) GROUP BY model, ClickHouse reads only the cost_usd and model columns from disk. PostgreSQL reads entire rows. For a table with 20 columns where you're aggregating 2, ClickHouse does 10% of the I/O.

Materialized views. ClickHouse can pre-aggregate data at insert time. When a new log row lands, a materialized view can increment a running total in a summary table. Your dashboard query hits the summary — not the raw logs. This is what takes the query from 3.2 seconds to 12ms.

For comparison: Langfuse originally ran its logging backend on PostgreSQL and later moved trace analytics to ClickHouse for exactly these reasons. PostgreSQL works well at lower volumes and for teams that need ACID transactions. If you're running more than 1M requests/month and querying with analytics patterns, you'll eventually feel the difference.

Want to see your LLM costs broken down by model and feature?

Preto logs every request to ClickHouse and surfaces per-feature cost attribution in real time.

See What Your LLM Spend Looks Like

Free forever for up to 10K requests. No credit card.

Our ClickHouse Schema

The production schema uses a MergeTree engine, partitioned by month, ordered by (tenant_id, created_at) to colocate each tenant's data on disk for fast tenant-scoped scans:

CREATE TABLE llm_requests (
  request_id    String,
  tenant_id     String,
  model         LowCardinality(String),
  provider      LowCardinality(String),
  feature       LowCardinality(String),
  user_id       String,
  input_tokens  UInt32,
  output_tokens UInt32,
  cost_usd      Float64,
  latency_ms    UInt32,
  cached        UInt8,
  created_at    DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (tenant_id, created_at)
SETTINGS index_granularity = 8192;

A few details worth noting:

LowCardinality(String) dictionary-encodes columns with a small set of distinct values (model, provider, feature), which shrinks storage and speeds up GROUP BY on them.

PARTITION BY toYYYYMM(created_at) makes retention cheap: dropping a month of logs is a single DROP PARTITION, not a mass DELETE.

ORDER BY (tenant_id, created_at) doubles as the primary index in MergeTree, so tenant-scoped, time-bounded queries skip most of the table.

cached is a UInt8 because ClickHouse conventionally stores booleans as 0/1.
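With that sort key in place, ad-hoc queries against the raw table stay tenant-scoped and time-bounded. Here's a sketch of the earlier 30-day cost query in ClickHouse dialect (the tenant value is illustrative):

```sql
-- Cost by model, last 30 days, against the raw log table.
-- The WHERE clause matches the ORDER BY key, so ClickHouse
-- skips data outside this tenant and time range.
SELECT model, sum(cost_usd) AS cost_usd, count() AS requests
FROM llm_requests
WHERE tenant_id = 'acme'
  AND created_at > now() - INTERVAL 30 DAY
GROUP BY model;
```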

Materialized Views That Pre-Aggregate at Insert Time

This is the highest-leverage feature in ClickHouse for a logging use case. Every time a row is inserted into llm_requests, the materialized view fires and updates a running aggregate in a separate summary table.

-- Summary table (stores partial aggregate states, merged in the background)
CREATE TABLE cost_by_model_daily (
  tenant_id   String,
  model       LowCardinality(String),
  day         Date,
  total_cost  AggregateFunction(sum, Float64),
  total_reqs  AggregateFunction(count, UInt64),
  avg_latency AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, model, day);

-- Materialized view that populates it
CREATE MATERIALIZED VIEW cost_by_model_mv
TO cost_by_model_daily AS
SELECT
  tenant_id,
  model,
  toDate(created_at) AS day,
  sumState(cost_usd)    AS total_cost,
  countState()          AS total_reqs,
  avgState(latency_ms)  AS avg_latency
FROM llm_requests
GROUP BY tenant_id, model, day;

Dashboard queries now read from cost_by_model_daily — a pre-aggregated table with orders of magnitude fewer rows than the raw log. The *State/*Merge function pattern is ClickHouse's way of storing partial aggregation states so they can be merged at query time.
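Reading the summary table means merging those stored states with the *Merge combinators. A sketch of the dashboard query against cost_by_model_daily (the tenant value is illustrative):

```sql
-- Cost by model, last 30 days, from the pre-aggregated summary.
-- sumMerge/countMerge/avgMerge combine the partial states written
-- by the materialized view into final values at query time.
SELECT
  model,
  sumMerge(total_cost)   AS cost_usd,
  countMerge(total_reqs) AS requests,
  avgMerge(avg_latency)  AS avg_latency_ms
FROM cost_by_model_daily
WHERE tenant_id = 'acme'
  AND day >= today() - 30
GROUP BY model
ORDER BY cost_usd DESC;
```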

Query latency before and after:

PostgreSQL p95 (10M rows, GROUP BY model): 3.2s
ClickHouse raw scan (same 10M rows): 180ms
ClickHouse materialized view (pre-aggregated summary): 12ms

The Async Write Path: How Logging Stays Under 2ms

Getting the query latency down was the first problem. The second was making sure logging never adds meaningful latency to the LLM request path.

Our architecture in Go:

// logCh buffers log entries in memory
var logCh = make(chan LogEntry, 10_000)

// In the request handler — fire and forget
func (p *Proxy) logRequest(entry LogEntry) {
  select {
  case logCh <- entry:
    // buffered successfully
  default:
    // channel full — drop rather than block the request
    metrics.Increment("log_dropped")
  }
}

// Background goroutine: batch flush every 500ms.
// flushBatch (not shown) performs one batched INSERT into llm_requests.
func runLogFlusher(ch <-chan LogEntry, db *clickhouse.Conn) {
  ticker := time.NewTicker(500 * time.Millisecond)
  batch := make([]LogEntry, 0, 500)

  for {
    select {
    case entry := <-ch:
      batch = append(batch, entry)
      if len(batch) >= 500 {
        flushBatch(db, batch)
        batch = batch[:0]
      }
    case <-ticker.C:
      if len(batch) > 0 {
        flushBatch(db, batch)
        batch = batch[:0]
      }
    }
  }
}

The client gets its LLM response immediately. The log entry goes into a buffered channel. A background goroutine flushes to ClickHouse in batches of up to 500 entries every 500ms. If the channel is full (backpressure), we drop the log entry and increment a metric — we'd rather lose a log than slow down a request.

At Preto, this gives us p95 logging overhead under 2ms. The ClickHouse batch insert itself takes 10–30ms for 500 rows — invisible to any individual request.
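The drop-rather-than-block behavior is easy to see in isolation. A minimal, runnable sketch of the same select pattern (the struct fields and channel size are illustrative, not Preto's actual code):

```go
package main

import "fmt"

// LogEntry is a stand-in for the real log struct.
type LogEntry struct {
	Model   string
	CostUSD float64
}

// tryLog mirrors the fire-and-forget select above: it buffers the
// entry if the channel has room and reports a drop otherwise.
// Either way it returns immediately and never blocks the caller.
func tryLog(ch chan LogEntry, e LogEntry) bool {
	select {
	case ch <- e:
		return true
	default:
		return false // channel full: caller increments a drop metric
	}
}

func main() {
	ch := make(chan LogEntry, 2) // tiny buffer to force drops
	dropped := 0
	for i := 0; i < 5; i++ {
		if !tryLog(ch, LogEntry{Model: "gpt-4o", CostUSD: 0.01}) {
			dropped++
		}
	}
	fmt.Println("buffered:", len(ch), "dropped:", dropped) // buffered: 2 dropped: 3
}
```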

ClickHouse vs. PostgreSQL: When to Switch

Factor                 | PostgreSQL                        | ClickHouse
Write pattern          | Mixed read/write, ACID            | Append-only inserts
Query pattern          | Point lookups, joins              | Aggregations, scans, GROUP BY
Performance ceiling    | ~5M rows before index degradation | Billions of rows, still sub-second
Operational complexity | Low (most teams know it)          | Medium (different mental model)
Materialized views     | Manual refresh required           | Real-time, insert-triggered
Right choice when      | <500K requests/month              | >1M requests/month

Don't over-engineer. If you're logging under 500K LLM requests per month, a Postgres table with a (tenant_id, created_at) index is fine. Add ClickHouse when dashboard queries start timing out — not before.

If you're evaluating LLM observability tools and curious how different products approach the storage question, see how Preto compares to Helicone and Langfuse.

Frequently Asked Questions

Why is ClickHouse better than PostgreSQL for LLM request logging?
LLM request logs are append-only, high-cardinality, and read mostly through aggregation queries (GROUP BY model, GROUP BY feature, GROUP BY time window). This is exactly the workload ClickHouse is designed for — columnar storage, vectorized query execution, and pre-aggregation through materialized views. PostgreSQL's row-oriented storage struggles above a few million rows with analytics-heavy query patterns.
How much latency does logging add to LLM requests?
In Preto's architecture, logging adds under 2ms p95 to the request path. This is achieved through an async write pipeline: the response is streamed to the client immediately, while a Go channel buffers log entries and a background goroutine flushes them to ClickHouse in batches every 500ms. The client never waits for the log write.
What columns should I track in an LLM request log?
At minimum: request_id, tenant_id, model, provider, input_tokens, output_tokens, cost_usd, latency_ms, created_at, and a feature/endpoint tag. The feature tag is the highest-value addition — it enables per-feature cost attribution, which the OpenAI dashboard doesn't provide.
Does ClickHouse support real-time cost dashboards?
Yes — ClickHouse materialized views pre-aggregate data as it's inserted, so cost dashboard queries hit a pre-computed summary table instead of scanning raw log rows. Cost-by-model aggregations that take 3+ seconds on 10M PostgreSQL rows complete in under 200ms on ClickHouse with materialized views.
When should I stick with PostgreSQL for LLM logging?
If you're under 500K requests/month and don't need real-time cost dashboards, PostgreSQL is fine. The operational overhead of running ClickHouse isn't worth it at low volume. Migrate when your analytics queries start timing out — not before.

See what your LLM spend looks like, broken down by model and feature.

Preto logs every request through a ClickHouse-backed pipeline and surfaces per-feature cost attribution in real time. One URL change — no code refactor required.

Start for Free — 10K Requests Included

Free forever for up to 10K requests. No credit card.

Gaurav Dagade

Founder of Preto.ai. 11 years engineering leadership. Previously Engineering Manager at Bynry. Building the cost intelligence layer for AI infrastructure.

LinkedIn · Twitter