One gateway
for every LLM
Drop-in OpenAI- and Anthropic-compatible proxy with enterprise controls — rate limiting, PII scrubbing, RAG, content policy, caching, and full usage analytics. Deploy to Kubernetes in minutes.
Works with every major LLM provider via LiteLLM
Enterprise controls, zero config
Every feature you need to safely operate LLMs at scale — out of the box, configurable via a single YAML file.
Content Policy
Block prompt injection, jailbreak attempts, and custom patterns before they reach the model. Configurable blocklists with regex support.
PII Scrubbing
Automatic detection and redaction of names, emails, phone numbers, SSNs, credit cards, and IPs using spaCy NER. Original values are restored in responses.
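To illustrate the scrub-and-restore round trip, here is a minimal sketch. It is not the gateway's implementation: detection there uses spaCy NER across all entity types, while this stand-in matches only emails with a regex. The placeholder format and function names are illustrative.

```python
import re

# Stand-in detector: the gateway uses spaCy NER; a simple email regex
# is enough to show the scrub/restore mechanics.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(text: str) -> tuple[str, dict[str, str]]:
    """Replace each detected entity with a placeholder; remember the mapping."""
    mapping: dict[str, str] = {}

    def _sub(match: re.Match) -> str:
        token = f"<PII_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    return EMAIL_RE.sub(_sub, text), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's response, locally."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

scrubbed, mapping = scrub("Contact alice@example.com about the invoice.")
# Only the placeholder version ever reaches the model provider.
```

The key property, as described above: the mapping never leaves your network, so the provider sees placeholders while your users see real values.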
Rate Limiting
Per-user and per-team limits on requests/minute, tokens/minute, and tokens/day. Redis backend for multi-replica deployments.
RAG Integration
Automatic context injection from your ChromaDB knowledge base. Semantic search with configurable top-k and score thresholds.
Response Caching
Semantic cache reduces latency and cost. Redis-backed for clusters, in-memory for single-node. Configurable TTL.
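The idea behind a semantic cache, in miniature: match incoming prompts against cached ones by similarity rather than exact string equality, and expire entries after a TTL. In this sketch, bag-of-words cosine similarity stands in for a real embedding model, and the threshold and TTL values are illustrative, not the gateway's defaults.

```python
import math

def _vec(text: str) -> dict:
    """Toy bag-of-words vector; a real semantic cache would use embeddings."""
    v: dict = {}
    for w in text.lower().split():
        v[w] = v.get(w, 0) + 1
    return v

def _cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.9, ttl_s: float = 300.0):
        self.threshold = threshold
        self.ttl_s = ttl_s
        self._entries = []  # (prompt_vector, cached_response, stored_at)

    def get(self, prompt: str, now: float):
        # Expire stale entries, then look for a similar-enough prompt.
        self._entries = [e for e in self._entries if now - e[2] < self.ttl_s]
        qv = _vec(prompt)
        for vec, response, _ in self._entries:
            if _cosine(qv, vec) >= self.threshold:
                return response  # cache hit: skip the upstream model call
        return None

    def put(self, prompt: str, response: str, now: float) -> None:
        self._entries.append((_vec(prompt), response, now))
```

Swapping the entry list for Redis (as the gateway does in cluster mode) keeps hits shared across replicas; the in-memory list mirrors the single-node mode.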
Usage Analytics
Daily materialized views for cost, token consumption, and request volume. Team and user leaderboards. Langfuse integration.
Team Management
Multi-team support with per-team API keys and rate limits. Google OAuth SSO creates keys automatically on first login.
Kubernetes Native
Production Helm chart with Bitnami PostgreSQL and Redis subcharts. Auto-generated master key, PVC management, HPA.
Every request through 9 stages
A hardened pipeline that runs on every inference request — from authentication to metered usage recording.
Works with your existing code
Change one line — the base URL. Every SDK, tool, and framework that supports OpenAI or Anthropic works with Geeper Relay.
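Concretely, here is a stdlib sketch of that one-line change. The gateway URL and API key below are placeholders for your own deployment; with the official openai or anthropic SDKs, you would pass the same gateway URL as base_url when constructing the client.

```python
import json
import urllib.request

# The only line that changes from a direct integration:
BASE_URL = "https://relay.example.com/v1"  # was: https://api.openai.com/v1
API_KEY = "grk-..."                        # key issued by the gateway, not the provider

def chat_completion_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-compatible chat request; only the host differs."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_completion_request("gpt-4o-mini", "Hello")
# urllib.request.urlopen(req) would send it; omitted here.
```

Everything downstream of the URL, including request and response shapes, stays exactly as the provider documents it, which is why existing SDKs and frameworks keep working.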
Enterprise ready
Security and observability built in from day one — not bolted on.
Security
Defence in depth
- HMAC-signed OAuth state: stateless SSO — no server-side session storage, safe across replicas
- API key hashing: keys stored as SHA-256 hashes — a compromised DB cannot leak secrets
- PII never leaves your network: entities scrubbed before inference, restored locally in responses
- Content policy enforcement: block prompt injection and jailbreak patterns at the gateway
- Non-root containers: runAsUser 1001, allowPrivilegeEscalation: false, drop ALL capabilities
- Secret auto-rotation safe: master key preserved across helm upgrades via lookup() + resource-policy: keep
Observability
Know what's happening
- Prometheus metrics: /metrics endpoint with ServiceMonitor for kube-prometheus-stack
- Structured logging: JSON logs with request ID, user, model, tokens, and latency per request
- Daily usage materialization: PostgreSQL materialized view refreshed hourly — zero-cost queries
- Team leaderboards: cost, token, and request rankings across teams and users
- Langfuse integration: full trace export — prompt, completion, latency, and cost per request
- Health endpoints: /healthz (liveness) and /readyz (readiness) — Kubernetes native
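The /healthz and /readyz endpoints map directly onto standard Kubernetes probes. A minimal sketch (the port and timing values are illustrative; in a Helm deployment, probe wiring like this would live in the chart's templates):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000          # illustrative: use your container port
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 8000
  periodSeconds: 5
```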
Get started in minutes
Two paths: deploy to Kubernetes with Helm, or run locally with Docker Compose.
Kubernetes (Helm)
Production deployment
Local / Docker Compose
Development & testing
Enable Google SSO for your team
Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to activate
/auth/login. Team members log in with Google and receive their API key automatically — no admin intervention needed.