Helm values reference

All values with their defaults. Override via --set key=value or a -f values.yaml file.

| Key | Default | Description |
| --- | --- | --- |
| `image.repository` | `llm-proxy` | Container image repository |
| `image.tag` | `"latest"` | Image tag (overrides `Chart.appVersion`) |
| `image.pullPolicy` | `IfNotPresent` | Kubernetes pull policy |
| `imagePullSecrets` | `[]` | Image pull secrets for private registries |
| `replicaCount` | `1` | Pod replicas. See Scaling for RWX requirements. |
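
As a minimal sketch of the `-f values.yaml` route, a file overriding the image settings might look like this. The registry path, tag, and pull secret name are hypothetical placeholders, and the `name:` entry under `imagePullSecrets` assumes the chart passes the list through in the standard Kubernetes form.

```yaml
# values.yaml (illustrative only; registry, tag, and secret name are placeholders)
image:
  repository: registry.example.com/llm-proxy   # hypothetical private registry path
  tag: "1.2.3"                                  # hypothetical pinned tag instead of "latest"
  pullPolicy: IfNotPresent
imagePullSecrets:
  - name: regcred                               # assumed shape: list of {name: <secret>}
replicaCount: 1
```

The same override for a single key can be passed on the command line, for example `--set image.tag=1.2.3`.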

Non-secret settings mounted as /app/config/config.yaml:

| Key | Default | Description |
| --- | --- | --- |
| `config.workers` | `4` | uvicorn worker processes |
| `config.logLevel` | `"info"` | Log level |
| `config.llm.defaultModel` | `"gpt-4o"` | Default model |
| `config.llm.allowedModels` | `[gpt-4o, gpt-4o-mini, ...]` | Model allowlist |
| `config.llm.fallbackModels` | `[]` | Fallback chain on provider error |
| `config.llm.modelAliases` | `{}` | Alias → canonical model map |
| `config.llm.perModelMaxTokens` | `{}` | Per-model output token caps |
| `config.rag.enabled` | `true` | Enable RAG |
| `config.rag.topK` | `5` | Top-k chunks |
| `config.rag.scoreThreshold` | `0.4` | Min similarity score |
| `config.rag.embeddingModel` | `"all-MiniLM-L6-v2"` | Embedding model |
| `config.pii.enabled` | `true` | Enable PII scrubbing |
| `config.pii.scoreThreshold` | `0.7` | Min Presidio confidence |
| `config.pii.entities` | `[PERSON, EMAIL_ADDRESS, ...]` | Entity types |
| `config.rateLimiting.enabled` | `true` | Enable rate limiting |
| `config.rateLimiting.backend` | `"memory"` | Auto-set to `redis` when `redis.enabled` |
| `config.rateLimiting.defaults.requestsPerMinute` | `60` | |
| `config.rateLimiting.defaults.tokensPerMinute` | `100000` | |
| `config.rateLimiting.defaults.tokensPerDay` | `1000000` | |
| `config.contentPolicy.enabled` | `true` | Enable content policy |
| `config.contentPolicy.maxInputTokens` | `32000` | Max prompt tokens |
| `config.contentPolicy.blockedPatterns` | `[...]` | Blocked phrase list |
| `config.cache.enabled` | `false` | Enable response caching |
| `config.cache.type` | `"local"` | Auto-set to `redis` when `redis.enabled` |
| `config.cache.ttl` | `3600` | Cache TTL (seconds) |
| `config.analytics.enabled` | `false` | Enable Langfuse export |
| `config.analytics.provider` | `"langfuse"` | Analytics provider |
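
The nested `config.*` keys map onto a values file as shown in the sketch below. The alias name `fast`, the token cap, and the adjusted numbers are made up for illustration, and the shapes of `modelAliases` (alias to model) and `perModelMaxTokens` (model to cap) are inferred from the descriptions above rather than confirmed against the chart.

```yaml
# values.yaml (illustrative; specific values are hypothetical)
config:
  llm:
    defaultModel: "gpt-4o-mini"
    allowedModels:
      - gpt-4o
      - gpt-4o-mini
    fallbackModels:
      - gpt-4o              # tried when the provider returns an error
    modelAliases:
      fast: gpt-4o-mini     # hypothetical alias; assumed alias -> canonical model
    perModelMaxTokens:
      gpt-4o-mini: 2048     # assumed model -> output token cap
  rag:
    topK: 8
    scoreThreshold: 0.5
  rateLimiting:
    defaults:
      requestsPerMinute: 120
```

Only the keys being changed need to appear; everything omitted keeps the default from the table.
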

| Key | Default | Description |
| --- | --- | --- |
| `secrets.create` | `true` | Create a Secret from values below |
| `secrets.existingSecret` | `""` | Name of a pre-existing Secret (disables `secrets.create`) |
| `secrets.existingMasterKeySecret` | `""` | Bring your own master key Secret. Empty = auto-generate. |
| `secrets.openaiApiKey` | `""` | |
| `secrets.anthropicApiKey` | `""` | |
| `secrets.azureOpenaiApiKey` | `""` | |
| `secrets.azureOpenaiEndpoint` | `""` | |
| `secrets.googleClientId` | `""` | |
| `secrets.googleClientSecret` | `""` | |
| `secrets.authBaseUrl` | `""` | e.g. `https://proxy.internal` |
| `secrets.langfusePublicKey` | `""` | |
| `secrets.langfuseSecretKey` | `""` | |
| `secrets.langfuseHost` | `""` | Empty = Langfuse Cloud |
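
A rough sketch of pointing the chart at a Secret managed outside Helm, so that API keys never appear in a values file. The Secret name is hypothetical, and the key names the chart expects inside that Secret are not listed here, so check the chart templates before relying on this.

```yaml
# values.yaml (illustrative; the Secret name is a placeholder)
secrets:
  # Setting existingSecret disables secrets.create, per the table above.
  existingSecret: "llm-proxy-secrets"   # hypothetical, created out of band
  # Left empty, so the master key is auto-generated.
  existingMasterKeySecret: ""
```
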

| Key | Default | Description |
| --- | --- | --- |
| `externalDatabase.url` | `""` | Skip bundled PostgreSQL. e.g. `postgresql+asyncpg://user:pass@host:5432/llm_proxy` |
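
For an externally managed database, a values fragment could look like the sketch below. The hostname and credentials are placeholders. Setting `postgresql.enabled: false` explicitly is an assumption added for clarity; the table only states that a non-empty URL skips the bundled PostgreSQL.

```yaml
# values.yaml (illustrative; host and credentials are placeholders)
externalDatabase:
  url: "postgresql+asyncpg://user:pass@db.example.internal:5432/llm_proxy"
postgresql:
  enabled: false   # assumption: also switch off the bundled subchart explicitly
```
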

| Key | Default | Description |
| --- | --- | --- |
| `persistence.chroma.enabled` | `true` | Create ChromaDB PVC |
| `persistence.chroma.size` | `10Gi` | |
| `persistence.chroma.storageClass` | `""` | Empty = cluster default |
| `persistence.chroma.accessMode` | `ReadWriteOnce` | Use `ReadWriteMany` for `replicaCount > 1` |
| `persistence.knowledgeBase.enabled` | `true` | Create knowledge base PVC |
| `persistence.knowledgeBase.size` | `5Gi` | |
| `persistence.knowledgeBase.storageClass` | `""` | |
| `persistence.knowledgeBase.accessMode` | `ReadWriteOnce` | |
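
The access-mode note matters once more than one pod mounts the same volume. The sketch below shows one plausible multi-replica setup. The storage class name is hypothetical and must refer to a provisioner that supports `ReadWriteMany`. Applying the same mode to the knowledge base PVC is an assumption, since the table only states the requirement for ChromaDB.

```yaml
# values.yaml (illustrative; storage class name is a placeholder)
replicaCount: 2
persistence:
  chroma:
    accessMode: ReadWriteMany
    storageClass: "nfs-client"    # hypothetical RWX-capable class
  knowledgeBase:
    accessMode: ReadWriteMany     # assumption: shared across replicas as well
    storageClass: "nfs-client"
```
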

| Key | Default | Description |
| --- | --- | --- |
| `service.type` | `ClusterIP` | `ClusterIP`, `NodePort`, or `LoadBalancer` |
| `service.port` | `8000` | Service port |

| Key | Default | Description |
| --- | --- | --- |
| `ingress.enabled` | `false` | |
| `ingress.className` | `""` | Ingress class name |
| `ingress.annotations` | `{}` | e.g. `cert-manager.io/cluster-issuer` |
| `ingress.hosts` | (example) | Host + path configuration |
| `ingress.tls` | `[]` | TLS configuration |
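
A sketch of an Ingress with TLS issued by cert-manager follows. The exact shape of `ingress.hosts` and `ingress.tls` is not spelled out in the table, so this assumes the layout generated by the standard `helm create` scaffold. The hostname, class name, issuer name, and TLS secret name are all placeholders.

```yaml
# values.yaml (illustrative; assumes the `helm create` ingress layout)
ingress:
  enabled: true
  className: "nginx"                                    # hypothetical ingress class
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"  # hypothetical issuer name
  hosts:
    - host: proxy.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: llm-proxy-tls                         # hypothetical TLS Secret
      hosts:
        - proxy.example.com
```
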

| Key | Default | Description |
| --- | --- | --- |
| `resources.requests.cpu` | `500m` | |
| `resources.requests.memory` | `1500Mi` | Sized for spaCy `en_core_web_lg` (~800 MB) |
| `resources.limits.cpu` | `"2"` | |
| `resources.limits.memory` | `3Gi` | |

| Key | Default | Description |
| --- | --- | --- |
| `autoscaling.enabled` | `false` | |
| `autoscaling.minReplicas` | `1` | |
| `autoscaling.maxReplicas` | `5` | |
| `autoscaling.targetCPUUtilizationPercentage` | `70` | |
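
Enabling the autoscaler could look like the sketch below; the replica bounds are arbitrary. In Kubernetes, a CPU utilization target is measured relative to the container's CPU request, so with the default `resources.requests.cpu` of `500m` a 70% target corresponds to roughly 350m of average usage per pod.

```yaml
# values.yaml (illustrative; replica bounds are arbitrary)
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
resources:
  requests:
    cpu: 500m        # the utilization target is computed against this request
    memory: 1500Mi
```

Running more than one replica also brings the `ReadWriteMany` requirement for the PVCs into play.
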

| Key | Default | Description |
| --- | --- | --- |
| `livenessProbe.initialDelaySeconds` | `60` | Allow time for spaCy model to load |
| `readinessProbe.initialDelaySeconds` | `60` | |

| Key | Default | Description |
| --- | --- | --- |
| `prometheus.serviceMonitor.enabled` | `false` | Create ServiceMonitor for Prometheus Operator |
| `prometheus.serviceMonitor.interval` | `"15s"` | Scrape interval |
| `prometheus.serviceMonitor.namespace` | `""` | Empty = release namespace |
| `prometheus.serviceMonitor.labels` | `{}` | Labels to match Prometheus Operator |
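
For clusters running the Prometheus Operator, a ServiceMonitor can be switched on roughly as follows. The namespace and the `release` label are assumptions: they must match whatever `serviceMonitorSelector` your Prometheus instance is configured with, and the values shown are only a common convention.

```yaml
# values.yaml (illustrative; namespace and label are placeholders)
prometheus:
  serviceMonitor:
    enabled: true
    interval: "30s"
    namespace: "monitoring"            # hypothetical; empty = release namespace
    labels:
      release: kube-prometheus-stack   # hypothetical selector label
```
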

| Key | Default | Description |
| --- | --- | --- |
| `postgresql.enabled` | `true` | Deploy bundled PostgreSQL |
| `postgresql.auth.username` | `proxy` | |
| `postgresql.auth.password` | `""` | Auto-generated when empty |
| `postgresql.auth.database` | `llm_proxy` | |
| `postgresql.primary.persistence.size` | `20Gi` | |

These values are passed through to the bitnami/postgresql chart. See the Bitnami docs for all options.

| Key | Default | Description |
| --- | --- | --- |
| `redis.enabled` | `false` | Deploy bundled Redis |
| `redis.auth.enabled` | `false` | Enable Redis auth |
| `redis.master.persistence.size` | `2Gi` | |

When redis.enabled: true, the proxy automatically uses Redis for rate limiting and caching.
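
A sketch of turning on the bundled Redis follows. Because `config.rateLimiting.backend` and `config.cache.type` are auto-set to `redis`, neither needs to be written out. Setting `config.cache.enabled: true` is an assumption based on its default of `false`; the TTL value is arbitrary.

```yaml
# values.yaml (illustrative; TTL is arbitrary)
redis:
  enabled: true
  auth:
    enabled: true
  master:
    persistence:
      size: 2Gi
config:
  cache:
    enabled: true   # assumption: caching stays off unless enabled explicitly
    ttl: 600        # seconds
```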