
Configuration reference

The proxy reads a YAML config file on startup. The path is set via the CONFIG_FILE environment variable (default: config/config.yaml). In Kubernetes the file is mounted from a ConfigMap generated from values.yaml.
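A minimal sketch of that lookup (illustrative, not the proxy's actual startup code; the override path shown is a hypothetical example):

```python
import os

# The proxy resolves its config path from the CONFIG_FILE environment
# variable, falling back to the documented default.
default_path = os.environ.get("CONFIG_FILE", "config/config.yaml")

# Setting the variable points the proxy at a different file
# (the path here is just an example).
os.environ["CONFIG_FILE"] = "/etc/proxy/config.yaml"
override_path = os.environ.get("CONFIG_FILE", "config/config.yaml")
```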

server

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| workers | int | 4 | Number of uvicorn worker processes |
| log_level | string | "info" | Log level: debug, info, warning, error |

Helm: config.workers, config.logLevel

llm

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default_model | string | "gpt-4o" | Model used when none is specified in the request |
| allowed_models | list | see below | Requests for any other model are rejected with 400 |
| fallback_models | list | [] | Tried in order when the primary model returns an error |
| model_aliases | map | {} | Rewrite model names before routing, e.g. gpt-4: gpt-4o |
| per_model_max_tokens | map | {} | Override max output tokens per model |

Default allowed_models:

- gpt-4o
- gpt-4o-mini
- claude-3-5-sonnet-20241022
- claude-3-haiku-20240307

Helm: config.llm.*

rag

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable RAG context injection |
| top_k | int | 5 | Maximum chunks to retrieve |
| score_threshold | float | 0.4 | Minimum cosine similarity score |
| embedding_model | string | "all-MiniLM-L6-v2" | sentence-transformers model for embedding |

Helm: config.rag.*
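The way top_k and score_threshold combine can be sketched with hypothetical similarity scores (in the real proxy the scores come from the sentence-transformers model named in embedding_model; whether the threshold is inclusive is an assumption here):

```python
TOP_K = 5
SCORE_THRESHOLD = 0.4

# Hypothetical (chunk_id, cosine similarity) pairs.
scored_chunks = [
    ("chunk-a", 0.91), ("chunk-b", 0.55), ("chunk-c", 0.38), ("chunk-d", 0.72),
]

# Keep chunks at or above the threshold, best first, at most top_k.
selected = sorted(
    (c for c in scored_chunks if c[1] >= SCORE_THRESHOLD),
    key=lambda c: c[1],
    reverse=True,
)[:TOP_K]
```

With these sample scores, chunk-c falls below the 0.4 threshold and is dropped even though fewer than top_k chunks remain.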

pii

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable PII detection and scrubbing |
| score_threshold | float | 0.7 | Minimum Presidio confidence score to redact |
| entities | list | see below | Entity types to detect |

Default entities: PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, US_SSN, IP_ADDRESS, LOCATION

Helm: config.pii.*
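A sketch of the score_threshold filter, using hypothetical findings in the shape Presidio's analyzer returns (entity type, character span, confidence score). The placeholder format and the inclusive threshold are assumptions, and only two entity types are shown for brevity:

```python
SCORE_THRESHOLD = 0.7
ENTITIES = {"PERSON", "EMAIL_ADDRESS"}  # subset of the default list

# Hypothetical detector output for "Alice, mail alice@corp.com".
findings = [
    {"entity_type": "EMAIL_ADDRESS", "start": 12, "end": 26, "score": 0.99},
    {"entity_type": "PERSON", "start": 0, "end": 5, "score": 0.55},  # below threshold
]

def scrub(text, findings):
    # Replace confident findings right to left so earlier offsets survive.
    for f in sorted(findings, key=lambda f: f["start"], reverse=True):
        if f["entity_type"] in ENTITIES and f["score"] >= SCORE_THRESHOLD:
            text = text[:f["start"]] + "<" + f["entity_type"] + ">" + text[f["end"]:]
    return text
```

Here the low-confidence PERSON finding (0.55 < 0.7) is left in place while the email is redacted.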

rate_limiting

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable rate limiting |
| backend | string | "memory" | memory or redis; auto-set to redis when redis.enabled=true in Helm |
| defaults.requests_per_minute | int | 60 | Per-user requests/min limit |
| defaults.tokens_per_minute | int | 100000 | Per-user tokens/min limit |
| defaults.tokens_per_day | int | 1000000 | Per-user tokens/day limit |

Per-team limits are set via the admin API — see Teams & API keys.

Helm: config.rateLimiting.*

content_policy

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable content policy checks |
| max_input_tokens | int | 32000 | Reject requests with more prompt tokens than this |
| blocked_patterns | list | see below | Literal strings (case-insensitive) to block |

Default blocked patterns:

- "ignore previous instructions"
- "ignore all previous"
- "jailbreak"

Helm: config.contentPolicy.*
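Since the patterns are literal strings matched case-insensitively, the check amounts to a lowercase substring test (a sketch, using the default pattern list above):

```python
BLOCKED_PATTERNS = [
    "ignore previous instructions",
    "ignore all previous",
    "jailbreak",
]

def violates_policy(prompt: str) -> bool:
    # Literal, case-insensitive substring match; no regex semantics.
    lowered = prompt.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)
```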

cache

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable response caching |
| type | string | "local" | local (in-process dict) or redis; auto-set to redis when redis.enabled=true in Helm |
| ttl | int | 3600 | Cache TTL in seconds |

Helm: config.cache.*
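The local type behaves roughly like a TTL-stamped in-process dict (a sketch; the real cache key derivation and eviction policy are not specified here):

```python
import time

TTL = 3600  # seconds
_cache: dict = {}  # key -> (stored_at, value)

def cache_set(key, value, now=None):
    _cache[key] = (time.time() if now is None else now, value)

def cache_get(key, now=None):
    now = time.time() if now is None else now
    entry = _cache.get(key)
    if entry is None or now - entry[0] > TTL:
        return None                  # miss, or entry older than ttl
    return entry[1]
```

As with rate limiting, the redis type exists so cached responses survive restarts and are shared across workers.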

analytics

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable Langfuse trace export |
| provider | string | "langfuse" | Only langfuse is currently supported |

Langfuse credentials are set via environment variables: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST (empty = Langfuse Cloud).

Helm: config.analytics.*, secrets.langfuse*
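A sketch of the host fallback described above: an unset or empty LANGFUSE_HOST means Langfuse Cloud (the URL shown is Langfuse's documented cloud endpoint; the function is illustrative, not the proxy's code):

```python
def langfuse_host(env: dict) -> str:
    # Empty string and missing variable both fall back to Langfuse Cloud.
    return env.get("LANGFUSE_HOST") or "https://cloud.langfuse.com"
```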

Full example:

```yaml
server:
  workers: 4
  log_level: info
llm:
  default_model: gpt-4o
  allowed_models:
    - gpt-4o
    - gpt-4o-mini
    - claude-3-5-sonnet-20241022
  fallback_models: []
  model_aliases:
    gpt-4: gpt-4o
  per_model_max_tokens:
    gpt-4o: 8192
rag:
  enabled: true
  top_k: 5
  score_threshold: 0.4
  embedding_model: all-MiniLM-L6-v2
pii:
  enabled: true
  score_threshold: 0.7
  entities:
    - PERSON
    - EMAIL_ADDRESS
    - PHONE_NUMBER
    - CREDIT_CARD
    - US_SSN
    - IP_ADDRESS
    - LOCATION
rate_limiting:
  enabled: true
  backend: memory
  defaults:
    requests_per_minute: 60
    tokens_per_minute: 100000
    tokens_per_day: 1000000
content_policy:
  enabled: true
  max_input_tokens: 32000
  blocked_patterns:
    - "ignore previous instructions"
    - "ignore all previous"
    - "jailbreak"
cache:
  enabled: false
  type: local
  ttl: 3600
analytics:
  enabled: false
  provider: langfuse
```