Configuration reference
The proxy reads a YAML config file on startup. The path is set via the CONFIG_FILE environment variable (default: config/config.yaml). In Kubernetes the file is mounted from a ConfigMap generated from values.yaml.
server
Section titled “server”| Key | Type | Default | Description |
|---|---|---|---|
workers | int | 4 | Number of uvicorn worker processes |
log_level | string | "info" | Log level: debug, info, warning, error |
Helm: config.workers, config.logLevel
| Key | Type | Default | Description |
|---|---|---|---|
default_model | string | "gpt-4o" | Model used when none is specified in the request |
allowed_models | list | see below | Requests for any other model are rejected with 400 |
fallback_models | list | [] | Tried in order when the primary model returns an error |
model_aliases | map | {} | e.g. gpt-4: gpt-4o — rewrite model names before routing |
per_model_max_tokens | map | {} | Override max output tokens per model |
Default allowed_models:
- gpt-4o- gpt-4o-mini- claude-3-5-sonnet-20241022- claude-3-haiku-20240307Helm: config.llm.*
| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable RAG context injection |
top_k | int | 5 | Maximum chunks to retrieve |
score_threshold | float | 0.4 | Minimum cosine similarity score |
embedding_model | string | "all-MiniLM-L6-v2" | sentence-transformers model for embedding |
Helm: config.rag.*
| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable PII detection and scrubbing |
score_threshold | float | 0.7 | Minimum Presidio confidence score to redact |
entities | list | see below | Entity types to detect |
Default entities: PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, US_SSN, IP_ADDRESS, LOCATION
Helm: config.pii.*
rate_limiting
Section titled “rate_limiting”| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable rate limiting |
backend | string | "memory" | memory or redis. Auto-set to redis when redis.enabled=true in Helm |
defaults.requests_per_minute | int | 60 | Per-user req/min limit |
defaults.tokens_per_minute | int | 100000 | Per-user tokens/min limit |
defaults.tokens_per_day | int | 1000000 | Per-user tokens/day limit |
Per-team limits are set via the admin API — see Teams & API keys.
Helm: config.rateLimiting.*
content_policy
Section titled “content_policy”| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable content policy checks |
max_input_tokens | int | 32000 | Reject requests with more prompt tokens than this |
blocked_patterns | list | see below | Literal strings (case-insensitive) to block |
Default blocked patterns:
- "ignore previous instructions"- "ignore all previous"- "jailbreak"Helm: config.contentPolicy.*
| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | false | Enable response caching |
type | string | "local" | local (in-process dict) or redis. Auto-set to redis when redis.enabled=true in Helm |
ttl | int | 3600 | Cache TTL in seconds |
Helm: config.cache.*
analytics
Section titled “analytics”| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | false | Enable Langfuse trace export |
provider | string | "langfuse" | Only langfuse supported currently |
Langfuse credentials are set via environment variables: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST (empty = Langfuse Cloud).
Helm: config.analytics.*, secrets.langfuse*
Complete example
Section titled “Complete example”server: workers: 4 log_level: info
llm: default_model: gpt-4o allowed_models: - gpt-4o - gpt-4o-mini - claude-3-5-sonnet-20241022 fallback_models: [] model_aliases: gpt-4: gpt-4o per_model_max_tokens: gpt-4o: 8192
rag: enabled: true top_k: 5 score_threshold: 0.4 embedding_model: all-MiniLM-L6-v2
pii: enabled: true score_threshold: 0.7 entities: - PERSON - EMAIL_ADDRESS - PHONE_NUMBER - CREDIT_CARD - US_SSN - IP_ADDRESS - LOCATION
rate_limiting: enabled: true backend: memory defaults: requests_per_minute: 60 tokens_per_minute: 100000 tokens_per_day: 1000000
content_policy: enabled: true max_input_tokens: 32000 blocked_patterns: - "ignore previous instructions" - "ignore all previous" - "jailbreak"
cache: enabled: false type: local ttl: 3600
analytics: enabled: false provider: langfuse