Open source · Self-hosted · Production ready

One gateway
for every LLM

A drop-in, OpenAI- and Anthropic-compatible proxy with enterprise controls — rate limiting, PII scrubbing, RAG, content policy, caching, and full usage analytics. Deploy to Kubernetes in minutes.

9
Pipeline stages
3+
LLM providers
100%
API compatible
K8s
Helm chart included
terminal
# Point any OpenAI client to your proxy
export OPENAI_BASE_URL="https://proxy.internal/v1"
# Works with Anthropic SDK too
export ANTHROPIC_BASE_URL="https://proxy.internal"
# Or with Claude Code CLI
claude --api-url https://proxy.internal \
--api-key your-team-api-key
# Deploy with Helm
helm install relay ./helm/relay \
--set secrets.openaiApiKey=$OPENAI_KEY
Ready on port 8000

Works with every major LLM provider via LiteLLM

OpenAI
Anthropic
Azure OpenAI
Google Gemini
Mistral
100+ via LiteLLM

Enterprise controls, zero config

Every feature you need to safely operate LLMs at scale — out of the box, configurable via a single YAML file.

Security

Content Policy

Block prompt injection, jailbreak attempts, and custom patterns before they reach the model. Configurable blocklists with regex support.

Privacy

PII Scrubbing

Automatic detection and redaction of names, emails, phone numbers, SSNs, credit cards, and IPs using spaCy NER. Original values are restored in responses.
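The scrub-then-restore round trip can be sketched in a few lines. Relay uses spaCy NER; this illustrative sketch uses a simple email regex instead, and the `<PII_n>` placeholder format is a hypothetical choice, not Relay's actual token scheme.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub(text: str):
    """Replace each email with a placeholder; return scrubbed text and a mapping."""
    mapping = {}
    def repl(m):
        token = f"<PII_{len(mapping)}>"
        mapping[token] = m.group(0)
        return token
    return EMAIL.sub(repl, text), mapping

def restore(text: str, mapping: dict) -> str:
    """Put the original values back into the model's response, locally."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

scrubbed, mapping = scrub("Contact alice@example.com about the invoice")
# scrubbed == "Contact <PII_0> about the invoice"
restored = restore(scrubbed, mapping)
# restored == "Contact alice@example.com about the invoice"
```

The key property: only the placeholder ever reaches the upstream model, and restoration happens on your side of the network boundary.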

Control

Rate Limiting

Per-user and per-team limits on requests/minute, tokens/minute, and tokens/day. Redis backend for multi-replica deployments.

Knowledge

RAG Integration

Automatic context injection from your ChromaDB knowledge base. Semantic search with configurable top-k and score thresholds.

Performance

Response Caching

Semantic cache reduces latency and cost. Redis-backed for clusters, in-memory for single-node. Configurable TTL.
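A semantic cache differs from an exact-match cache in that it returns a hit when a new prompt's embedding is close enough to a cached one. A hedged in-memory sketch (the embedding vectors here are hand-made stand-ins for a real embedding model; the threshold value is illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Return a cached response when a query embedding is similar enough."""
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, embedding):
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None  # miss: fall through to the LLM call

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.98, 0.05])   # nearly identical vector → hit
miss = cache.get([0.0, 1.0])    # orthogonal vector → miss
```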

Observability

Usage Analytics

Daily materialized views for cost, token consumption, and request volume. Team and user leaderboards. Langfuse integration.

Enterprise

Team Management

Multi-team support with per-team API keys and rate limits. Google OAuth SSO creates keys automatically on first login.

Deployment

Kubernetes Native

Production Helm chart with Bitnami PostgreSQL and Redis subcharts. Auto-generated master key, PVC management, HPA.

Every request through 9 stages

A hardened pipeline that runs on every inference request — from authentication to metered usage recording.

01
Authentication
API key lookup, team resolution, user identification
02
Content Policy
Block injection attempts, jailbreaks, and blocked patterns
03
Token Count
Count prompt tokens, enforce per-model max limits
04
Rate Limiting
req/min, tokens/min, tokens/day — per user and per team
05
PII Scrubbing
Detect and redact PII entities with spaCy NER
06
RAG Context
Semantic search in ChromaDB, inject relevant chunks
07
Cache Lookup
Return cached response if semantic match found
08
LLM Call
Route to provider via LiteLLM with fallback models
09
Metrics & Usage
Record tokens, cost, latency; restore PII in response
Streaming supported (SSE)
OpenAI & Anthropic format
Each stage independently configurable
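The stage chain above can be sketched as a list of functions that each take a request, transform it, or raise to short-circuit. This is an illustrative model only; stage bodies, the `PipelineError` type, and the request-dict shape are assumptions, not Relay's internals:

```python
class PipelineError(Exception):
    """Raised by a stage to stop the pipeline and return an error to the client."""

def authenticate(req):
    # Stage 01: reject requests without an API key
    if not req.get("api_key"):
        raise PipelineError("401: missing API key")
    return req

def content_policy(req):
    # Stage 02: block a known jailbreak pattern
    if "ignore previous instructions" in req["prompt"].lower():
        raise PipelineError("403: blocked pattern")
    return req

def run_pipeline(req, stages):
    for stage in stages:
        req = stage(req)  # each stage may modify the request in flight
    return req

# ...token count, rate limit, PII, RAG, cache, LLM call, metrics follow the same shape
stages = [authenticate, content_policy]
ok = run_pipeline({"api_key": "k1", "prompt": "Summarise this doc"}, stages)
```

Because every stage has the same signature, each one can be toggled or reconfigured independently, which is what makes the "each stage independently configurable" property cheap to provide.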

Works with your existing code

Change one line — the base URL. Every SDK, tool, and framework that supports OpenAI or Anthropic works with Geeper Relay.

Enterprise ready

Security and observability built in from day one — not bolted on.

Security

Defence in depth

  • HMAC-signed OAuth state
    Stateless SSO — no server-side session storage, safe across replicas
  • API key hashing
    Keys stored as SHA-256 hashes — compromised DB cannot leak secrets
  • PII never leaves your network
    Entities scrubbed before inference, restored locally in responses
  • Content policy enforcement
    Block prompt injection and jailbreak patterns at the gateway
  • Non-root containers
    runAsUser 1001, allowPrivilegeEscalation: false, drop ALL capabilities
  • Secret auto-rotation safe
    Master key preserved across helm upgrades via lookup() + resource-policy: keep
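Two of the mechanisms above, HMAC-signed OAuth state and SHA-256 key hashing, can be sketched with the standard library. The secret value and function names are hypothetical; this shows the shape of the technique, not Relay's implementation:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical; in practice derived from server config

def sign_state(payload: dict) -> str:
    """Encode OAuth state and append an HMAC, so no server-side session is needed."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_state(state: str):
    body, sig = state.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered, or signed with a different secret
    return json.loads(base64.urlsafe_b64decode(body))

def hash_key(api_key: str) -> str:
    """Store only the hash; compare hashes at auth time, never raw keys."""
    return hashlib.sha256(api_key.encode()).hexdigest()

state = sign_state({"redirect": "/dashboard"})
payload = verify_state(state)                              # round-trips
tampered = verify_state(state[:-1] + ("0" if state[-1] != "0" else "1"))  # → None
```

Because any replica holding the shared secret can verify the state, SSO works across replicas with no sticky sessions, and a leaked database of key hashes cannot be replayed as credentials.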

Observability

Know what's happening

  • Prometheus metrics
    /metrics endpoint with ServiceMonitor for Kube-Prometheus-Stack
  • Structured logging
    JSON logs with request ID, user, model, tokens, latency per request
  • Daily usage materialization
    PostgreSQL materialized view refreshed hourly — zero-cost queries
  • Team leaderboards
    Cost, token, and request rankings across teams and users
  • Langfuse integration
    Full trace export: prompt, completion, latency, cost per request
  • Health endpoints
    /healthz (liveness) and /readyz (readiness) — Kubernetes native
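The daily usage materialization amounts to a grouped rollup that queries read instead of scanning raw requests. A hedged sketch using sqlite3 in place of PostgreSQL (table and column names are illustrative, not Relay's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE requests (
    team TEXT, day TEXT, tokens INTEGER, cost REAL)""")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?)",
    [("alpha", "2024-05-01", 1200, 0.03),
     ("alpha", "2024-05-01",  800, 0.02),
     ("beta",  "2024-05-01",  500, 0.01)])

# The "materialized view": a precomputed summary refreshed on a schedule.
conn.execute("""CREATE TABLE daily_usage AS
    SELECT team, day, SUM(tokens) AS tokens, SUM(cost) AS cost, COUNT(*) AS requests
    FROM requests GROUP BY team, day""")

rows = conn.execute(
    "SELECT team, tokens, requests FROM daily_usage ORDER BY tokens DESC").fetchall()
# → [('alpha', 2000, 2), ('beta', 500, 1)]
```

Dashboard and leaderboard queries hit the summary table, so their cost is independent of raw request volume.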

Get started in minutes

Two paths: deploy to Kubernetes with Helm, or run locally with Docker Compose.

Kubernetes (Helm)

Production deployment

# 1. Add Bitnami for subcharts
helm repo add bitnami \
https://charts.bitnami.com/bitnami
# 2. Fetch chart dependencies
helm dependency build ./helm/relay
# 3. Install (master key auto-generated)
helm install relay ./helm/relay \
--set secrets.openaiApiKey=$OPENAI_KEY \
--set ingress.enabled=true \
--set ingress.hosts[0].host=proxy.internal
✓ PostgreSQL, Redis, PVCs created automatically
Bitnami PostgreSQL + Redis included
HPA, ServiceMonitor, PVC management
PROXY_MASTER_KEY auto-generated and preserved across upgrades

Local / Docker Compose

Development & testing

# Clone and configure
git clone https://github.com/geeper-io/relay
cp .env.example .env
# edit .env with your API keys
# Start everything
docker compose up -d
# Proxy is ready at localhost:8000
curl localhost:8000/healthz
# → {"status": "ok"}
# Create your first API key via admin
curl -X POST localhost:8000/internal/api-keys \
-H "Authorization: Bearer $MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{"name":"dev","user_id":"alice"}'
SQLite by default — no external DB needed
Google OAuth optional — enable with client ID/secret
Hot-reload config without restart

Enable Google SSO for your team

Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to activate /auth/login. Team members log in with Google and receive their API key automatically — no admin intervention needed.

Setup guide →