AOC Roadmap
This roadmap tracks planned improvements for the Admin Operations Center (AOC) project, organized by phase.
Phase 1: Harden ✅
Goal: fix critical security and reliability gaps before production use.
- Fix JWT signature verification in auth.py
- Fix broken frontend auth button references (loginBtn/logoutBtn)
- Add MongoDB indexes (dedupe_key, timestamp, service+timestamp, id, text search)
- Add MongoDB TTL index for data retention (RETENTION_DAYS)
- Add /health endpoint with database connectivity check
- Replace manual os.getenv parsing with Pydantic Settings (pydantic-settings)
- Add structured JSON logging (structlog)
- Configure CORS middleware via CORS_ORIGINS environment variable
- Escape user input before MongoDB $regex queries (routes/events.py)
- Fix incorrect return value in maintenance.py dedupe()
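The $regex escaping item can be illustrated with a minimal sketch. The helper name build_contains_filter and the case-insensitive option are assumptions for illustration, not the actual code in routes/events.py.

```python
import re


def build_contains_filter(field: str, user_input: str) -> dict:
    """Build a MongoDB filter that matches user text literally (hypothetical helper)."""
    # re.escape neutralizes regex metacharacters such as . * ( ) [ ] so the
    # input is treated as plain text rather than as a pattern.
    return {field: {"$regex": re.escape(user_input), "$options": "i"}}
```

Without the escape step, input such as `.*` would match every document, and crafted patterns could trigger expensive backtracking on the server.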
Phase 2: Stabilize ✅
Goal: improve resilience, code quality, and development experience.
- Cache Graph API tokens and reuse them until near expiry
- Add exponential backoff / retry logic for Graph API and Office 365 API calls
- Add unit tests for normalize_event(), _make_dedupe_key(), and auth.py
- Add integration tests for /api/events and /api/fetch-audit-logs
- Configure linter/formatter (ruff) and pre-commit hooks
- Set up GitHub Actions CI pipeline (lint + test)
- Add Pydantic request/response models for API endpoints
- Validate page_size and hours with strict FastAPI constraints
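As a rough sketch of the backoff/retry item, assuming a hypothetical TransientError raised for retryable failures (for example HTTP 429 or 503). The function name, defaults, and injectable sleep are illustrative, not the project's actual retry code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. throttling)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), doubling the wait after each transient failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps unit tests fast, since a test can record the delays instead of actually waiting.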
Phase 3: Scale ✅
Goal: handle larger data volumes and support real-time ingestion.
- Replace skip-based pagination with cursor-based (search-after) pagination
- Add Prometheus /metrics endpoint and a Grafana dashboard
- Implement incremental fetch watermarking per source (store last fetch timestamp)
- Add webhook endpoints to receive Microsoft Graph change notifications
- Evaluate Elasticsearch or Azure Cognitive Search for advanced full-text search (MongoDB text index sufficient for current scale)
- Add request ID / correlation ID middleware for distributed tracing
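The cursor-based pagination item can be sketched as follows. This assumes events are sorted by timestamp and then id, both descending, and that timestamps are ISO 8601 strings that sort lexicographically; the cursor encoding and function names are hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


def encode_cursor(timestamp: str, doc_id: str) -> str:
    """Pack the sort keys of the last returned event into an opaque cursor."""
    raw = json.dumps({"ts": timestamp, "id": doc_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> Tuple[str, str]:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["ts"], data["id"]


def page_filter(cursor: Optional[str]) -> dict:
    """Return a MongoDB filter selecting events strictly after the cursor.

    Intended for use with sort=[("timestamp", -1), ("id", -1)].
    """
    if cursor is None:
        return {}  # first page: no restriction
    ts, doc_id = decode_cursor(cursor)
    return {
        "$or": [
            {"timestamp": {"$lt": ts}},
            # Tie-break on id so events sharing a timestamp are not skipped.
            {"timestamp": ts, "id": {"$lt": doc_id}},
        ]
    }
```

Unlike skip-based paging, the query cost does not grow with page depth, because the database seeks directly to the cursor position using the index.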
Phase 4: Enhance ✅
Goal: evolve from a polling dashboard into a full security operations tool.
- Migrate frontend to Alpine.js for better state management and maintainability
- Add rule-based alerting (e.g., alert on privileged operations, after-hours activity)
- Add SIEM export (Splunk, Sentinel, syslog webhook)
- Build an audit trail for AOC itself (who queried what, who triggered fetches)
- Add event tagging and commenting (e.g., investigating, false_positive)
- Add export functionality (CSV / JSON) from the UI
- Add source health dashboard showing last fetch time and status per source
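A minimal sketch of rule-based alerting, covering the two examples named above. The rule names, the PRIVILEGED_OPERATIONS set, the business-hours window, and the event field names are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List

# Hypothetical list of operations considered privileged.
PRIVILEGED_OPERATIONS = {"Add member to role", "Update application"}


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]


def is_after_hours(event: dict, start_hour: int = 8, end_hour: int = 18) -> bool:
    """True when the event falls outside the assumed 08:00-18:00 window."""
    ts = datetime.fromisoformat(event["timestamp"])
    return not (start_hour <= ts.hour < end_hour)


RULES: List[Rule] = [
    Rule("privileged_operation", lambda e: e.get("operation") in PRIVILEGED_OPERATIONS),
    Rule("after_hours", is_after_hours),
]


def evaluate(event: dict) -> List[str]:
    """Return the names of every rule the event triggers."""
    return [rule.name for rule in RULES if rule.matches(event)]
```

Keeping each rule as a named predicate makes it straightforward to add new rules or load them from configuration later.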
Phase 5: Intelligence ✅
Goal: add AI-powered analysis and external tool integration.
- AI feature flag (AI_FEATURES_ENABLED) to gate LLM-dependent features
- Natural language query endpoint (/api/ask) with intent extraction and smart sampling
- MCP (Model Context Protocol) server for Claude Desktop / Cursor integration
- Valkey caching for LLM responses and frequent queries
- Async queue (arq) for LLM requests to prevent timeout/cost explosions at scale
- Advanced analytics dashboard (trending operations, anomaly detection)
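The LLM response caching item could look roughly like this. TTLCache is an in-memory stand-in for a Valkey client so that the sketch is self-contained; the key scheme, TTL, and function names are assumptions.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Valkey client, for illustration only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def cache_key(question: str) -> str:
    """Normalize case and whitespace so equivalent questions share a key."""
    normalized = " ".join(question.lower().split())
    return "llm:" + hashlib.sha256(normalized.encode()).hexdigest()


def answer(question: str, cache: TTLCache, ask_llm: Callable[[str], str],
           ttl_seconds: float = 300.0) -> str:
    key = cache_key(question)
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no LLM call, no cost
    result = ask_llm(question)
    cache.set(key, result, ttl_seconds)
    return result
```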
Completed in this PR
All Phase 5 items marked done were implemented in v1.3.0–v1.5.0. Redis caching and the async queue were implemented in v1.6.0, and the cache was then switched to Valkey. UI polish (topbar, footer, clickable pills) landed in v1.6.1–v1.6.4.
Phase 6: Security Hardening ✅
Goal: address penetration test findings and threat model gaps.
- Fix CORS credentials leak (v1.7.12)
- Add security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) (v1.7.12)
- Make rate limiter fail-closed on Redis failure (v1.7.12)
- Disable OpenAPI docs by default (v1.7.12)
- Hide tenant_id/client_id from config endpoint when auth disabled (v1.7.12)
- Validate webhook validationToken before echo (v1.7.12)
- Gate /metrics behind IP allowlist (v1.7.12)
- Add LLM domain allowlist (LLM_ALLOWED_DOMAINS) (v1.7.14)
- Add SIEM webhook SSRF guard + domain allowlist (v1.7.14)
- Add SRI hashes to CDN scripts (v1.7.14)
- Add startup warning for auth misconfiguration (v1.7.14)
- Add Azure Key Vault integration for secrets storage (v1.7.14)
- Internal penetration test + threat model documentation
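One way the SIEM webhook SSRF guard and domain allowlist might work is sketched below. The function name and the exact policy (HTTPS only, no IP literals, allowlisted domain or subdomain) are assumptions, not the shipped implementation.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlparse


def is_allowed_webhook_url(url: str, allowed_domains: Iterable[str]) -> bool:
    """Accept only HTTPS URLs whose host is an allowlisted domain or subdomain."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    try:
        ipaddress.ip_address(host)
        return False  # reject IP literals such as 169.254.169.254 or ::1
    except ValueError:
        pass  # not an IP literal, so check it as a domain name
    # Match the domain itself or a true subdomain; "example.com.evil.net"
    # does not end with ".example.com" and is rejected.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

A string check like this does not cover DNS rebinding, where an allowlisted name resolves to an internal address, so a production guard would likely also validate the resolved IP at connection time.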
Phase 7: Multi-Tenancy (Premium) ⏸️
Goal: allow MSPs to manage multiple client tenants from a single deployment.
Status: Planned — not started. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.
Architecture
- Row-level isolation: tenant_id field on every MongoDB document
- Each tenant has their own Microsoft Entra tenant + app registration credentials
- Auth: user's JWT tid claim maps to tenant config automatically
- Super-admin role for MSP staff to access all tenants
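Since this phase is not yet built, the following is only a sketch of how the tid claim lookup and row-level isolation could fit together. TenantConfig, REGISTRY, the placeholder tenant ID, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    name: str


# Hypothetical registry keyed by the Entra tenant ID found in the JWT tid claim.
REGISTRY: Dict[str, TenantConfig] = {
    "00000000-0000-0000-0000-000000000001": TenantConfig(
        tenant_id="00000000-0000-0000-0000-000000000001", name="Contoso"
    ),
}


def resolve_tenant(claims: dict) -> TenantConfig:
    """Map a verified JWT's tid claim to its tenant configuration."""
    tenant = REGISTRY.get(claims.get("tid", ""))
    if tenant is None:
        raise PermissionError("unknown tenant")
    return tenant


def scoped_filter(tenant: TenantConfig, base_filter: dict) -> dict:
    """Force every MongoDB query to the caller's tenant.

    tenant_id is applied last so it overrides any tenant_id a caller
    tried to supply in base_filter.
    """
    return {**base_filter, "tenant_id": tenant.tenant_id}
```

Routing all reads and writes through one helper such as scoped_filter would keep isolation in a single place rather than relying on each route to remember the tenant_id condition.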
Implementation phases
- Phase 7.1 (2–3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- Phase 7.2 (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- Phase 7.3 (2 days): Frontend tenant switcher, tenant name display, admin page
- Phase 7.4 (1 day): License gating, where a signed JWT LICENSE_KEY gates multi-tenant mode
Licensing model
- Single-tenant: remains MIT/free
- Multi-tenant: premium feature requiring a signed license key
- License key is a JWT with claims: plan, max_tenants, exp, features
- Offline license generation tool included
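To make the license key format concrete, here is a standard-library sketch that signs and verifies a JWT carrying the claims listed above. It assumes HS256 with a shared secret purely to stay dependency-free; a real offline licensing scheme would more plausibly use an asymmetric algorithm so the deployment holds only a public key. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()


def sign_license(claims: dict, secret: bytes) -> str:
    """Produce a compact HS256 JWT (assumed algorithm) for the given claims."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = _signature(f"{header}.{payload}", secret)
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_license(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature is valid and exp is in the future."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed license key")
    header, payload, sig = parts
    # The header's alg field is ignored on purpose: the verifier always
    # uses HS256, which avoids algorithm-confusion attacks.
    expected = _signature(f"{header}.{payload}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig)):
        raise ValueError("invalid license signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("license expired")
    return claims
```

Multi-tenant mode would then be enabled only when verify_license succeeds and the number of configured tenants does not exceed max_tenants.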
Effort estimate
~7–9 days total. Deferred until SIEM export and alerting are battle-tested.