# AOC Roadmap

This roadmap tracks planned improvements for the Admin Operations Center (AOC) project, organized by phase.

---

## Phase 1: Harden ✅

Goal: fix critical security and reliability gaps before production use.

- [x] Fix JWT signature verification in `auth.py`
- [x] Fix broken frontend auth button references (`loginBtn` / `logoutBtn`)
- [x] Add MongoDB indexes (`dedupe_key`, `timestamp`, `service+timestamp`, `id`, text search)
- [x] Add MongoDB TTL index for data retention (`RETENTION_DAYS`)
- [x] Add `/health` endpoint with database connectivity check
- [x] Replace manual `os.getenv` parsing with Pydantic Settings (`pydantic-settings`)
- [x] Add structured JSON logging (`structlog`)
- [x] Configure CORS middleware via `CORS_ORIGINS` environment variable
- [x] Escape user input before MongoDB `$regex` queries (`routes/events.py`)
- [x] Fix incorrect return value in `maintenance.py dedupe()`

---

## Phase 2: Stabilize ✅

Goal: improve resilience, code quality, and development experience.

- [x] Cache Graph API tokens and reuse them until near expiry
- [x] Add exponential backoff / retry logic for Graph API and Office 365 API calls
- [x] Add unit tests for `normalize_event()`, `_make_dedupe_key()`, and `auth.py`
- [x] Add integration tests for `/api/events` and `/api/fetch-audit-logs`
- [x] Configure linter/formatter (`ruff`) and pre-commit hooks
- [x] Set up GitHub Actions CI pipeline (lint + test)
- [x] Add Pydantic request/response models for API endpoints
- [x] Validate `page_size` and `hours` with strict FastAPI constraints

---

## Phase 3: Scale ✅

Goal: handle larger data volumes and support real-time ingestion.

- [x] Replace skip-based pagination with cursor-based (search-after) pagination
- [x] Add Prometheus `/metrics` endpoint and a Grafana dashboard
- [x] Implement incremental fetch watermarking per source (store last fetch timestamp)
- [x] Add webhook endpoints to receive Microsoft Graph change notifications
- [x] Evaluate Elasticsearch or Azure Cognitive Search for advanced full-text search (MongoDB text index sufficient for current scale)
- [x] Add request ID / correlation ID middleware for distributed tracing

---

## Phase 4: Enhance ✅

Goal: evolve from a polling dashboard into a full security operations tool.

- [x] Migrate frontend to Alpine.js for better state management and maintainability
- [x] Add rule-based alerting (e.g., alert on privileged operations, after-hours activity)
- [x] Add SIEM export (Splunk, Sentinel, syslog webhook)
- [x] Build an audit trail for AOC itself (who queried what, who triggered fetches)
- [x] Add event tagging and commenting (e.g., `investigating`, `false_positive`)
- [x] Add export functionality (CSV / JSON) from the UI
- [x] Add source health dashboard showing last fetch time and status per source

---

## Phase 5: Intelligence

Goal: add AI-powered analysis and external tool integration.

- [x] AI feature flag (`AI_FEATURES_ENABLED`) to gate LLM-dependent features
- [x] Natural language query endpoint (`/api/ask`) with intent extraction and smart sampling
- [x] MCP (Model Context Protocol) server for Claude Desktop / Cursor integration
- [x] Valkey caching for LLM responses and frequent queries
- [x] Async queue (arq) for LLM requests to prevent timeout/cost explosions at scale
- [ ] Advanced analytics dashboard (trending operations, anomaly detection)

## Completed in this PR

- All Phase 5 items marked done were implemented in v1.3.0–v1.5.0.
- Redis caching + async queue implemented in v1.6.0, switched to Valkey.
- UI polish (topbar, footer, clickable pills) in v1.6.1–v1.6.4.


---

## Phase 6: Multi-Tenancy (Premium) ⏸️

Goal: allow MSPs to manage multiple client tenants from a single deployment.

Status: **Planned, not started**. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.

### Architecture

- Row-level isolation: `tenant_id` field on every MongoDB document
- Each tenant has their own Microsoft Entra tenant + app registration credentials
- Auth: user's JWT `tid` claim maps to tenant config automatically
- Super-admin role for MSP staff to access all tenants

### Implementation phases

- **Phase 6.1** (2–3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- **Phase 6.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- **Phase 6.3** (2 days): Frontend tenant switcher, tenant name display, admin page
- **Phase 6.4** (1 day): License gating, where a signed JWT `LICENSE_KEY` gates multi-tenant mode

### Licensing model

- Single-tenant: remains MIT/free
- Multi-tenant: premium feature requiring a signed license key
- License key is a JWT with claims: `plan`, `max_tenants`, `exp`, `features`
- Offline license generation tool included

### Effort estimate

~7–9 days total. Deferred until SIEM export and alerting are battle-tested.
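
### Sketch: tenant resolution and row-level isolation

Since Phase 6 has not been started, the following is only a rough sketch of how the Architecture bullets might fit together, not a committed design. It assumes the JWT signature has already been verified and that `claims` is the decoded payload. The names `TenantConfig`, `TENANT_REGISTRY`, `resolve_tenant`, `scoped_filter`, and the role string `aoc.superadmin` are all hypothetical; only `tid` and `tenant_id` come from the roadmap itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical role name for MSP staff; the roadmap only says "super-admin role".
SUPER_ADMIN_ROLE = "aoc.superadmin"


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical per-tenant record; field names are illustrative."""

    tenant_id: str  # value stored in the `tenant_id` field of every document
    name: str
    is_active: bool = True


# Hypothetical in-memory registry keyed by the Entra tenant ID (JWT `tid` claim).
# A real deployment would presumably load this from the database.
TENANT_REGISTRY: dict[str, TenantConfig] = {
    "11111111-1111-1111-1111-111111111111": TenantConfig("contoso", "Contoso"),
    "22222222-2222-2222-2222-222222222222": TenantConfig("fabrikam", "Fabrikam"),
}


class TenantError(Exception):
    """Raised when no active tenant matches the request."""


def resolve_tenant(
    claims: Mapping[str, Any], requested_tid: str | None = None
) -> TenantConfig:
    """Map already-verified JWT claims to a tenant config.

    Regular users always get the tenant named by their own `tid` claim.
    Only callers holding the super-admin role may select another tenant.
    """
    roles = claims.get("roles") or []
    if requested_tid is not None and SUPER_ADMIN_ROLE in roles:
        key = requested_tid
    else:
        key = claims.get("tid")
    tenant = TENANT_REGISTRY.get(key) if key else None
    if tenant is None or not tenant.is_active:
        raise TenantError(f"no active tenant for tid={key!r}")
    return tenant


def scoped_filter(tenant: TenantConfig, query: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of a MongoDB filter restricted to one tenant.

    `tenant_id` is written last so a caller-supplied value cannot override it.
    """
    return {**query, "tenant_id": tenant.tenant_id}
```

A route handler could then pass `scoped_filter(tenant, user_query)` to the collection's `find()` call, so every read is tenant-restricted in one place rather than in each query.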