feat: Redis caching + async queue for LLM scaling (v1.6.0)

- Add async Redis client singleton (redis_client.py) for caching and arq pool
- Add arq job functions (jobs.py) for background LLM processing
- Cache ask/explain LLM responses with TTL (1h ask, 24h explain)
- Add async mode to /api/ask: enqueue job, return job_id, poll /api/jobs/{id}
- Add GET /api/jobs/{job_id} endpoint for job status polling
- Add arq worker service to docker-compose (dev + prod)
- Switch from Redis to Valkey (BSD fork) in Docker Compose
- Add REDIS_URL config setting
- Add tests for cache hit, async mode, and job status
2026-04-22 09:55:05 +02:00
parent 47e0dfc2ca
commit f75f165911
16 changed files with 498 additions and 14 deletions
@@ -50,6 +50,11 @@ LLM_MAX_EVENTS=200
 LLM_TIMEOUT_SECONDS=30
 LLM_API_VERSION=

+# Valkey (caching + async job queue for LLM calls)
+# In Docker Compose, this is set automatically to redis://redis:6379/0
+# For local dev, start Valkey with: docker run -d -p 6379:6379 valkey/valkey:8-alpine
+REDIS_URL=redis://localhost:6379/0
+
 # Optional: privacy / access control
 # Hide entire services from users without PRIVACY_SERVICE_ROLES
 # PRIVACY_SERVICES=Exchange,Teams