Response caching

Caching runs at stage 07. On a cache hit, the response is returned immediately: the LLM is never called, and stages 08–09 are skipped entirely.
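The short-circuit on a cache hit can be pictured with a minimal sketch. The function name `run_cache_stage`, the plain dict used as the cache, and the `call_llm` callback (standing in for the LLM call and the stages that follow) are all hypothetical and are not taken from the project's code. Storing the response on a miss is also an assumption about how the stage behaves.

```python
from typing import Callable, Dict, Optional


def run_cache_stage(
    key: str,
    cache: Dict[str, str],
    call_llm: Callable[[], str],
) -> str:
    """Return the cached response for `key`, or call the LLM and store the result."""
    cached: Optional[str] = cache.get(key)
    if cached is not None:
        # Cache hit: return immediately, so call_llm is never invoked.
        return cached
    # Cache miss: fall through to the LLM (hypothetical stand-in for later stages).
    response = call_llm()
    cache[key] = response
    return response
```

Calling `run_cache_stage` twice with the same key invokes `call_llm` only once; the second call is served from the dict.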

Configuration

cache:
  enabled: false   # disabled by default
  type: local      # "local" or "redis"
  ttl: 3600        # seconds

Enable it:

cache:
  enabled: true
  type: local
  ttl: 3600

The cache key is a hash of the request content, including the messages and the model.

Two requests with identical messages and model hit the same cache entry, regardless of which user sent them.
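The key behavior described above can be sketched as follows. This is an illustration only: the function name `cache_key`, the choice of SHA-256, and the JSON serialization are assumptions, and the real key may cover additional request fields. The point is that no user identifier goes into the hash, so identical requests from different users map to the same entry.

```python
import hashlib
import json
from typing import Dict, List


def cache_key(model: str, messages: List[Dict[str, str]]) -> str:
    """Derive a deterministic key from the model name and the message list."""
    # sort_keys and fixed separators make the serialization stable,
    # so equal inputs always produce the same bytes and the same hash.
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the hash covers the exact message text, changing a single word in a message, or switching models, produces a different key and therefore a cache miss.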

The local cache (type: local) is best for: single-replica development or deterministic demo workloads.
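To see why an in-process cache suits a single replica, here is a minimal sketch of a TTL cache held in one process's memory. The class name `LocalTTLCache` and its methods are hypothetical and do not describe the project's actual implementation. The injectable `clock` parameter exists only to make expiry easy to demonstrate.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LocalTTLCache:
    """In-memory cache whose entries expire `ttl_seconds` after being set."""

    def __init__(
        self,
        ttl_seconds: float = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry timestamp, cached value).
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry has outlived its TTL: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
```

Each instance keeps its own dictionary, so two replicas running this code would not see each other's entries. A shared store such as Redis avoids that.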

cache:
  type: redis

CACHE__REDIS_URL=redis://redis.internal:6379

Caching is most effective for:

Avoid caching for:

Streaming responses are not cached; only non-streaming requests are.
Cache entries are matched on exact request shape only; there is no partial or semantic cache (a rephrased question always misses).
There is no cache invalidation endpoint; entries expire naturally via TTL, or you can flush Redis manually.