Analytics & observability
Three complementary observability systems cover different granularities:
| System | Granularity | Use case |
|---|---|---|
| PostgreSQL materialized views | Daily aggregates | Cost attribution, team leaderboards, billing |
| Langfuse | Per-request traces | Debugging, prompt quality, latency analysis |
| Prometheus | Real-time counters/histograms | Alerting, dashboards, SLO tracking |
PostgreSQL usage data
Every request records a UsageRecord with: user_id, team_id, model, prompt_tokens, completion_tokens, cost_usd, latency_ms, created_at.
A PostgreSQL materialized view (usage_daily) pre-aggregates these by (day, user_id, team_id, model) and is refreshed hourly in the background.
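To make the grouping concrete, here is a minimal Python sketch of what the `usage_daily` aggregation amounts to. The `UsageRecord` fields come from the list above; the `aggregate_daily` helper, the `requests` counter, and the definition of `total_tokens` as prompt plus completion tokens are assumptions for illustration, not the actual view definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UsageRecord:
    user_id: str
    team_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    latency_ms: int
    created_at: datetime


def aggregate_daily(records):
    """Group records by (day, user_id, team_id, model), as usage_daily does.

    Hypothetical helper: the real view is computed in PostgreSQL.
    """
    totals = defaultdict(
        lambda: {"requests": 0, "total_tokens": 0, "cost_usd": 0.0}
    )
    for r in records:
        key = (r.created_at.date(), r.user_id, r.team_id, r.model)
        row = totals[key]
        row["requests"] += 1
        # Assumption: total_tokens = prompt_tokens + completion_tokens.
        row["total_tokens"] += r.prompt_tokens + r.completion_tokens
        row["cost_usd"] += r.cost_usd
    return dict(totals)
```

Because the view is refreshed hourly, queries against it can lag the raw records by up to an hour.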
Query via admin API
```bash
# Usage for a specific user since a given date (here 2025-01-01)
curl "http://localhost:8000/internal/usage?user_id=user_01j...&since=2025-01-01" \
  -H "Authorization: Bearer $PROXY_MASTER_KEY"

# Team leaderboard: top token consumers since a given date
curl "http://localhost:8000/internal/usage/leaderboard?dimension=team&metric=tokens&since=2025-01-01" \
  -H "Authorization: Bearer $PROXY_MASTER_KEY"
```
Direct SQL
```sql
-- Daily cost by team, last 30 days
SELECT
  day,
  team_id,
  SUM(cost_usd) AS total_cost,
  SUM(total_tokens) AS total_tokens
FROM usage_daily
WHERE day >= NOW() - INTERVAL '30 days'
GROUP BY day, team_id
ORDER BY day DESC, total_cost DESC;
```
Langfuse
Langfuse provides per-request traces with prompt/completion content, token counts, latency, and cost.
Enable
```yaml
analytics:
  enabled: true
  provider: langfuse
```
Set credentials:
```bash
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=   # empty = Langfuse Cloud; set for self-hosted
```
Helm:
```yaml
secrets:
  langfusePublicKey: "pk-lf-..."
  langfuseSecretKey: "sk-lf-..."
  langfuseHost: ""
```
Self-hosted Langfuse
The docker-compose.yml includes a Langfuse stack:
```bash
docker compose --profile langfuse up -d
```
Then set LANGFUSE_HOST=http://langfuse:3000.
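A small preflight check can catch a missing key or an unintended host before the proxy starts. The sketch below is illustrative only: `resolve_langfuse_host` and `missing_credentials` are hypothetical helpers, and the default URL is an assumption about the Langfuse Cloud endpoint (it may differ by region), not something the proxy is documented to use.

```python
import os

# Assumption: default Langfuse Cloud endpoint; verify the URL for your region.
DEFAULT_LANGFUSE_HOST = "https://cloud.langfuse.com"


def resolve_langfuse_host(env=None):
    """Return the self-hosted URL if LANGFUSE_HOST is set, else the cloud default."""
    env = os.environ if env is None else env
    host = (env.get("LANGFUSE_HOST") or "").strip()
    return host.rstrip("/") if host else DEFAULT_LANGFUSE_HOST


def missing_credentials(env=None):
    """List the required Langfuse variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
    return [name for name in required if not env.get(name)]
```

For the docker compose setup above, `resolve_langfuse_host({"LANGFUSE_HOST": "http://langfuse:3000"})` returns the self-hosted address unchanged.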
Prometheus
All key proxy metrics are exposed at /metrics in Prometheus text format. See Health & metrics for the full metric list.
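For quick checks without a Prometheus server, the text format is simple enough to read with a few lines of Python. The following is a simplified sketch, not a complete parser: it ignores timestamps and escaped quotes in label values. The `SAMPLE` payload is hypothetical (the HELP text and the values are made up), although the metric name and the `model` and `status` labels mirror the ones used on this page.

```python
import re

# Matches `name{labels} value` or `name value`; any trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')

# Hypothetical /metrics excerpt for illustration.
SAMPLE = """\
# HELP relay_requests_total Total requests.
# TYPE relay_requests_total counter
relay_requests_total{model="m1",status="200"} 42
relay_requests_total{model="m1",status="502"} 3
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified pattern cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

In practice the text would come from an HTTP GET against the proxy's `/metrics` endpoint rather than from a string literal.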
Recommended alerts:
```yaml
# Alert: high rate limit hit rate
- alert: HighRateLimitHits
  expr: rate(relay_rate_limit_hits_total[5m]) > 1
  for: 5m

# Alert: p95 latency over 5s
# histogram_quantile expects per-second bucket rates aggregated by `le`,
# not the raw cumulative bucket counters.
- alert: HighLatency
  expr: histogram_quantile(0.95, sum by (le) (rate(relay_request_duration_seconds_bucket[5m]))) > 5
  for: 10m

# Alert: upstream errors
- alert: UpstreamErrors
  expr: rate(relay_requests_total{status="502"}[5m]) > 0.1
  for: 2m
```
Grafana dashboard
A suggested panel layout:
- Request rate: `rate(relay_requests_total[5m])` by model
- Latency p50/p95/p99: histogram quantiles
- Token throughput: `rate(relay_tokens_total[5m])` split prompt/completion
- Rate limit hits: `rate(relay_rate_limit_hits_total[5m])` by limit type
- Cache hit rate: `rate(relay_cache_hits_total[5m]) / rate(relay_requests_total[5m])`
- PII entities scrubbed: `rate(relay_pii_entities_total[5m])` by entity type
- Daily cost (from PostgreSQL or Langfuse via data source plugin)
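The latency panel and the latency alert both rely on quantiles estimated from histogram buckets, which are approximations rather than exact percentiles. The sketch below is a simplified Python re-implementation of the idea behind `histogram_quantile` (linear interpolation inside the bucket that contains the target rank). It is for intuition only and omits some of the edge-case handling in Prometheus itself; the bucket bounds in any example are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    The last bucket must have upper bound +Inf, as in Prometheus histograms.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("need cumulative buckets ending with +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Rank falls in the +Inf bucket (or an empty one):
                # the best available estimate is the previous finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

One consequence of the interpolation is that the estimate can never be finer than the bucket layout: if most requests land in a single wide bucket, p95 and p99 will look closer together than they really are.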