Open source · Self-hosted · Production ready

One gateway
for every LLM

A drop-in, OpenAI- and Anthropic-compatible proxy with enterprise controls — rate limiting, PII scrubbing, RAG, content policy, caching, and full usage analytics. Deploy to Kubernetes in minutes.

9
Pipeline stages
3+
LLM providers
100%
API compatible
K8s
Helm chart included
terminal
# Point any OpenAI client to your proxy
export OPENAI_BASE_URL="https://proxy.internal/v1"
# Works with Anthropic SDK too
export ANTHROPIC_BASE_URL="https://proxy.internal"
# Or with Claude Code CLI
claude --api-url https://proxy.internal \
--api-key your-team-api-key
# Deploy with Helm
helm install relay ./helm/relay \
--set secrets.openaiApiKey=$OPENAI_KEY
Ready on port 8000

Works with every major LLM provider via LiteLLM

OpenAI
Anthropic
Azure OpenAI
Google Gemini
Mistral
100+ via LiteLLM

Enterprise controls, zero config

Every feature you need to safely operate LLMs at scale — out of the box, configurable via a single YAML file.

Security

Content Policy

Block prompt injection, jailbreak attempts, and custom patterns before they reach the model. Configurable blocklists with regex support.

Privacy

PII Scrubbing

Automatic detection and redaction of names, emails, phone numbers, SSNs, credit cards, and IPs using spaCy NER. Original values are restored in responses.
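The scrub-then-restore round trip can be sketched in a few lines. Relay uses spaCy NER; this illustrative sketch uses a simple email regex instead, and the `<PII_n>` placeholder format is a hypothetical choice, not Relay's actual token scheme.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub(text: str):
    """Replace each email with a placeholder; return scrubbed text and a mapping."""
    mapping = {}
    def repl(m):
        token = f"<PII_{len(mapping)}>"
        mapping[token] = m.group(0)
        return token
    return EMAIL.sub(repl, text), mapping

def restore(text: str, mapping: dict) -> str:
    """Put the original values back into the model's response, locally."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

scrubbed, mapping = scrub("Contact alice@example.com about the invoice")
# scrubbed == "Contact <PII_0> about the invoice"
restored = restore(scrubbed, mapping)
# restored == "Contact alice@example.com about the invoice"
```

The key property: only the placeholder ever reaches the upstream model, and restoration happens on your side of the network boundary.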

Control

Rate Limiting

Per-user and per-team limits on requests/minute, tokens/minute, and tokens/day. Redis backend for multi-replica deployments.

Knowledge

RAG Integration

Automatic context injection from your ChromaDB knowledge base. Semantic search with configurable top-k and score thresholds.

Performance

Response Caching

Semantic cache reduces latency and cost. Redis-backed for clusters, in-memory for single-node. Configurable TTL.
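A semantic cache differs from an exact-match cache in that it returns a hit when a new prompt's embedding is close enough to a cached one. A hedged in-memory sketch (the embedding vectors here are hand-made stand-ins for a real embedding model; the threshold value is illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Return a cached response when a query embedding is similar enough."""
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, embedding):
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None  # miss: fall through to the LLM call

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.98, 0.05])   # nearly identical vector → hit
miss = cache.get([0.0, 1.0])    # orthogonal vector → miss
```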

Observability

Usage Analytics

Daily materialized views for cost, token consumption, and request volume. Team and user leaderboards. Langfuse integration.

Enterprise

Team Management

Multi-team support with per-team API keys and rate limits. Google OAuth SSO creates keys automatically on first login.

Deployment

Kubernetes Native

Production Helm chart with Bitnami PostgreSQL and Redis subcharts. Auto-generated master key, PVC management, HPA.

Every request through 9 stages

A hardened pipeline that runs on every inference request — from authentication to metered usage recording.

01
Authentication
API key lookup, team resolution, user identification
02
Content Policy
Block injection attempts, jailbreaks, and blocked patterns
03
Token Count
Count prompt tokens, enforce per-model max limits
04
Rate Limiting
req/min, tokens/min, tokens/day — per user and per team
05
PII Scrubbing
Detect and redact PII entities with spaCy NER
06
RAG Context
Semantic search in ChromaDB, inject relevant chunks
07
Cache Lookup
Return cached response if semantic match found
08
LLM Call
Route to provider via LiteLLM with fallback models
09
Metrics & Usage
Record tokens, cost, latency; restore PII in response
Streaming supported (SSE)
OpenAI & Anthropic format
Each stage independently configurable
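The stage chain above can be sketched as a list of functions that each take a request, transform it, or raise to short-circuit. This is an illustrative model only; stage bodies, the `PipelineError` type, and the request-dict shape are assumptions, not Relay's internals:

```python
class PipelineError(Exception):
    """Raised by a stage to stop the pipeline and return an error to the client."""

def authenticate(req):
    # Stage 01: reject requests without an API key
    if not req.get("api_key"):
        raise PipelineError("401: missing API key")
    return req

def content_policy(req):
    # Stage 02: block a known jailbreak pattern
    if "ignore previous instructions" in req["prompt"].lower():
        raise PipelineError("403: blocked pattern")
    return req

def run_pipeline(req, stages):
    for stage in stages:
        req = stage(req)  # each stage may modify the request in flight
    return req

# ...token count, rate limit, PII, RAG, cache, LLM call, metrics follow the same shape
stages = [authenticate, content_policy]
ok = run_pipeline({"api_key": "k1", "prompt": "Summarise this doc"}, stages)
```

Because every stage has the same signature, each one can be toggled or reconfigured independently, which is what makes the "each stage independently configurable" property cheap to provide.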

Works with your existing code

Change one line — the base URL. Every SDK, tool, and framework that supports OpenAI or Anthropic works with Geeper Relay.

Enterprise ready

Security and observability built in from day one — not bolted on.

Security

Defence in depth

  • HMAC-signed OAuth state
    Stateless SSO — no server-side session storage, safe across replicas
  • API key hashing
    Keys stored as SHA-256 hashes — compromised DB cannot leak secrets
  • PII never leaves your network
    Entities scrubbed before inference, restored locally in responses
  • Content policy enforcement
    Block prompt injection and jailbreak patterns at the gateway
  • Non-root containers
    runAsUser 1001, allowPrivilegeEscalation: false, drop ALL capabilities
  • Secret auto-rotation safe
    Master key preserved across helm upgrades via lookup() + resource-policy: keep
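Two of the mechanisms above, HMAC-signed OAuth state and SHA-256 key hashing, can be sketched with the standard library. The secret value and function names are hypothetical; this shows the shape of the technique, not Relay's implementation:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical; in practice derived from server config

def sign_state(payload: dict) -> str:
    """Encode OAuth state and append an HMAC, so no server-side session is needed."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_state(state: str):
    body, sig = state.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered, or signed with a different secret
    return json.loads(base64.urlsafe_b64decode(body))

def hash_key(api_key: str) -> str:
    """Store only the hash; compare hashes at auth time, never raw keys."""
    return hashlib.sha256(api_key.encode()).hexdigest()

state = sign_state({"redirect": "/dashboard"})
payload = verify_state(state)                              # round-trips
tampered = verify_state(state[:-1] + ("0" if state[-1] != "0" else "1"))  # → None
```

Because any replica holding the shared secret can verify the state, SSO works across replicas with no sticky sessions, and a leaked database of key hashes cannot be replayed as credentials.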

Observability

Know what's happening

  • Prometheus metrics
    /metrics endpoint with ServiceMonitor for Kube-Prometheus-Stack
  • Structured logging
    JSON logs with request ID, user, model, tokens, latency per request
  • Daily usage materialization
    PostgreSQL materialized view refreshed hourly — zero-cost queries
  • Team leaderboards
    Cost, token, and request rankings across teams and users
  • Langfuse integration
    Full trace export: prompt, completion, latency, cost per request
  • Health endpoints
    /healthz (liveness) and /readyz (readiness) — Kubernetes native
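The daily usage materialization amounts to a grouped rollup that queries read instead of scanning raw requests. A hedged sketch using sqlite3 in place of PostgreSQL (table and column names are illustrative, not Relay's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE requests (
    team TEXT, day TEXT, tokens INTEGER, cost REAL)""")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?)",
    [("alpha", "2024-05-01", 1200, 0.03),
     ("alpha", "2024-05-01",  800, 0.02),
     ("beta",  "2024-05-01",  500, 0.01)])

# The "materialized view": a precomputed summary refreshed on a schedule.
conn.execute("""CREATE TABLE daily_usage AS
    SELECT team, day, SUM(tokens) AS tokens, SUM(cost) AS cost, COUNT(*) AS requests
    FROM requests GROUP BY team, day""")

rows = conn.execute(
    "SELECT team, tokens, requests FROM daily_usage ORDER BY tokens DESC").fetchall()
# → [('alpha', 2000, 2), ('beta', 500, 1)]
```

Dashboard and leaderboard queries hit the summary table, so their cost is independent of raw request volume.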

Get started in minutes

Two paths: deploy to Kubernetes with Helm, or run locally with Docker Compose.

Kubernetes (Helm)

Production deployment

# 1. Add Bitnami for subcharts
helm repo add bitnami \
https://charts.bitnami.com/bitnami
# 2. Fetch chart dependencies
helm dependency build ./helm/relay
# 3. Install (master key auto-generated)
helm install relay ./helm/relay \
--set secrets.openaiApiKey=$OPENAI_KEY \
--set ingress.enabled=true \
--set ingress.hosts[0].host=proxy.internal
✓ PostgreSQL, Redis, PVCs created automatically
Bitnami PostgreSQL + Redis included
HPA, ServiceMonitor, PVC management
PROXY_MASTER_KEY auto-generated and preserved across upgrades

Local / Docker Compose

Development & testing

# Clone and configure
git clone https://github.com/geeper-io/relay
cp .env.example .env
# edit .env with your API keys
# Start everything
docker compose up -d
# Proxy is ready at localhost:8000
curl localhost:8000/healthz
# → {"status": "ok"}
# Create your first API key via admin
curl -X POST localhost:8000/internal/api-keys \
-H "Authorization: Bearer $MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{"name":"dev","user_id":"alice"}'
SQLite by default — no external DB needed
Google OAuth optional — enable with client ID/secret
Hot-reload config without restart

Enable Google SSO for your team

Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to activate /auth/login. Team members log in with Google and receive their API key automatically — no admin intervention needed.

Setup guide →