aoc/ROADMAP.md
Tomas Kracmar f75f165911
feat: Redis caching + async queue for LLM scaling (v1.6.0)
- Add async Redis client singleton (redis_client.py) for caching and arq pool
- Add arq job functions (jobs.py) for background LLM processing
- Cache ask/explain LLM responses with TTL (1h ask, 24h explain)
- Add async mode to /api/ask: enqueue job, return job_id, poll /api/jobs/{id}
- Add GET /api/jobs/{job_id} endpoint for job status polling
- Add arq worker service to docker-compose (dev + prod)
- Switch from Redis to Valkey (BSD fork) in Docker Compose
- Add REDIS_URL config setting
- Add tests for cache hit, async mode, and job status
2026-04-22 09:55:05 +02:00


AOC Roadmap

This roadmap tracks planned improvements for the Admin Operations Center (AOC) project, organized by phase.


Phase 1: Harden

Goal: fix critical security and reliability gaps before production use.

  • Fix JWT signature verification in auth.py
  • Fix broken frontend auth button references (loginBtn / logoutBtn)
  • Add MongoDB indexes (dedupe_key, timestamp, service+timestamp, id, text search)
  • Add MongoDB TTL index for data retention (RETENTION_DAYS)
  • Add /health endpoint with database connectivity check
  • Replace manual os.getenv parsing with Pydantic Settings (pydantic-settings)
  • Add structured JSON logging (structlog)
  • Configure CORS middleware via CORS_ORIGINS environment variable
  • Escape user input before MongoDB $regex queries (routes/events.py)
  • Fix incorrect return value in maintenance.py dedupe()
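As an illustration of the `$regex` escaping item above, here is a minimal sketch that builds a MongoDB filter matching user input literally. The function name `build_text_filter` and the field name `operation` are hypothetical; the actual code in routes/events.py is not shown in this roadmap and may differ.

```python
import re


def build_text_filter(field: str, user_input: str) -> dict:
    """Build a case-insensitive MongoDB filter that matches user_input literally.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    # re.escape backslash-escapes regex metacharacters such as . * ( ) [ ]
    # so that they are treated as plain characters inside $regex.
    pattern = re.escape(user_input)
    return {field: {"$regex": pattern, "$options": "i"}}


if __name__ == "__main__":
    # A search for "a.b*" matches only that literal text, not "aXbbb".
    print(build_text_filter("operation", "a.b*"))
```

Without the escaping step, input such as `.*` would match every document, and crafted patterns could make the query needlessly expensive.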

Phase 2: Stabilize

Goal: improve resilience, code quality, and development experience.

  • Cache Graph API tokens and reuse them until near expiry
  • Add exponential backoff / retry logic for Graph API and Office 365 API calls
  • Add unit tests for normalize_event(), _make_dedupe_key(), and auth.py
  • Add integration tests for /api/events and /api/fetch-audit-logs
  • Configure linter/formatter (ruff) and pre-commit hooks
  • Set up GitHub Actions CI pipeline (lint + test)
  • Add Pydantic request/response models for API endpoints
  • Validate page_size and hours with strict FastAPI constraints
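The retry item above could look roughly like the following sketch of exponential backoff with full jitter. It uses only the standard library; the helper name `retry_with_backoff`, the default delays, and the choice of retryable exceptions are assumptions, not the project's actual implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds or max_attempts is reached.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, ceiling] so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

A real client for the Graph or Office 365 APIs would likely also retry on HTTP 429 and 5xx responses and respect a `Retry-After` header when the server sends one.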

Phase 3: Scale

Goal: handle larger data volumes and support real-time ingestion.

  • Replace skip-based pagination with cursor-based (search-after) pagination
  • Add Prometheus /metrics endpoint and a Grafana dashboard
  • Implement incremental fetch watermarking per source (store last fetch timestamp)
  • Add webhook endpoints to receive Microsoft Graph change notifications
  • Evaluate Elasticsearch or Azure Cognitive Search for advanced full-text search (the MongoDB text index is sufficient at the current scale)
  • Add request ID / correlation ID middleware for distributed tracing
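To make the cursor-based pagination item above concrete, here is a minimal sketch of an opaque cursor and the matching MongoDB filter. It assumes events are sorted by `timestamp` descending with `id` as a tie-breaker, and that timestamps are stored in a form that sorts correctly (for example ISO 8601 strings in UTC). The function names are hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


def encode_cursor(timestamp: str, event_id: str) -> str:
    """Pack the sort key of the last returned event into an opaque token."""
    raw = json.dumps([timestamp, event_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[str, str]:
    """Recover (timestamp, event_id) from a token made by encode_cursor."""
    timestamp, event_id = json.loads(base64.urlsafe_b64decode(cursor))
    return timestamp, event_id


def page_filter(cursor: Optional[str]) -> dict:
    """Filter selecting events strictly after the cursor position.

    Assumes sort order (timestamp desc, id desc); with no cursor the
    first page is requested and no extra filter is needed.
    """
    if cursor is None:
        return {}
    timestamp, event_id = decode_cursor(cursor)
    return {
        "$or": [
            {"timestamp": {"$lt": timestamp}},
            # Same timestamp: fall back to the id to break the tie.
            {"timestamp": timestamp, "id": {"$lt": event_id}},
        ]
    }
```

Unlike skip-based pagination, the database does not have to walk past all earlier rows to reach a deep page, provided a compound index covers the sort key.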

Phase 4: Enhance

Goal: evolve from a polling dashboard into a full security operations tool.

  • Migrate frontend to Alpine.js for better state management and maintainability
  • Add rule-based alerting (e.g., alert on privileged operations, after-hours activity)
  • Add SIEM export (Splunk, Sentinel, syslog webhook)
  • Build an audit trail for AOC itself (who queried what, who triggered fetches)
  • Add event tagging and commenting (e.g., investigating, false_positive)
  • Add export functionality (CSV / JSON) from the UI
  • Add source health dashboard showing last fetch time and status per source
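One of the alerting rules mentioned above, after-hours activity, can be sketched in a few lines. The business-hours window below is a placeholder assumption, since the roadmap does not define one, and the sketch ignores time zones and public holidays.

```python
from datetime import datetime, time

# Hypothetical business-hours window; the roadmap does not specify one.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def is_after_hours(event_time: datetime) -> bool:
    """Return True for weekend events or events outside the window."""
    if event_time.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (BUSINESS_START <= event_time.time() < BUSINESS_END)
```

A rule engine could evaluate predicates like this one against each normalized event and emit an alert whenever one returns True.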

Phase 5: Intelligence

Goal: add AI-powered analysis and external tool integration.

  • AI feature flag (AI_FEATURES_ENABLED) to gate LLM-dependent features
  • Natural language query endpoint (/api/ask) with intent extraction and smart sampling
  • MCP (Model Context Protocol) server for Claude Desktop / Cursor integration
  • Valkey caching for LLM responses and frequent queries
  • Async queue (arq) for LLM requests to prevent timeout/cost explosions at scale
  • Advanced analytics dashboard (trending operations, anomaly detection)
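For the LLM response caching item above, a deterministic cache key is the main design decision. The sketch below derives one from the request payload; the TTL values follow the v1.6.0 commit message (1 hour for ask, 24 hours for explain), while the `aoc:llm:` key prefix and the function name are assumptions.

```python
import hashlib
import json

# TTLs from the v1.6.0 commit message: 1 h for ask, 24 h for explain.
CACHE_TTL_SECONDS = {"ask": 3600, "explain": 86400}


def cache_key(kind: str, payload: dict) -> str:
    """Derive a stable cache key from the request kind and payload.

    Hypothetical helper: the "aoc:llm:" prefix is an illustrative choice.
    """
    # Sorting keys makes logically equal payloads hash identically
    # regardless of the order in which fields were supplied.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"aoc:llm:{kind}:{digest}"
```

With an async Redis-compatible client, the response would then be stored under this key with the matching expiry, so that a repeated question within the TTL is answered from Valkey instead of triggering another LLM call.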

Completed in this PR

All Phase 5 items marked done were implemented in v1.3.0 through v1.5.0. Redis caching and the async queue were implemented in v1.6.0, which also switched the Docker Compose services from Redis to Valkey.