aoc/ROADMAP.md
tomas.kracmar 7639f5f69d
Release v1.7.18: fix Alpine.js SRI + CSP, add frontend modernization roadmap
- Revert @alpinejs/csp (CSP build has no support for template literals,
  optional chaining, or x-html — all used in the app template); switch
  back to the regular alpinejs build
- Pin Alpine.js to 3.15.12 with a verified SRI hash (replaces the
  floating @3.x.x tag that caused the integrity check failure)
- Restore 'unsafe-eval' to script-src (required by Alpine.js's
  new Function() expression evaluator; inline script was already
  eliminated in v1.7.17 so 'unsafe-inline' stays removed)
- Add Phase 7.5 Frontend Modernization to ROADMAP: Vue 3 + Vite
  migration plan that will allow a clean CSP without unsafe-eval

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 08:01:57 +02:00


AOC Roadmap

This roadmap tracks planned improvements for the Admin Operations Center (AOC) project, organized by phase.


Phase 1: Harden

Goal: fix critical security and reliability gaps before production use.

  • Fix JWT signature verification in auth.py
  • Fix broken frontend auth button references (loginBtn / logoutBtn)
  • Add MongoDB indexes (dedupe_key, timestamp, service+timestamp, id, text search)
  • Add MongoDB TTL index for data retention (RETENTION_DAYS)
  • Add /health endpoint with database connectivity check
  • Replace manual os.getenv parsing with Pydantic Settings (pydantic-settings)
  • Add structured JSON logging (structlog)
  • Configure CORS middleware via CORS_ORIGINS environment variable
  • Escape user input before MongoDB $regex queries (routes/events.py)
  • Fix incorrect return value in maintenance.py dedupe()
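
The `$regex` escaping item above can be illustrated with a minimal sketch, assuming the search term arrives as a plain string that should match literally. `build_search_filter` and the `operation` field name are hypothetical and are not taken from routes/events.py.

```python
import re


def build_search_filter(term: str, field: str = "operation") -> dict:
    """Return a MongoDB-style filter that matches `term` literally.

    `field` is a hypothetical field name used only for illustration.
    """
    # re.escape neutralises regex metacharacters such as . * ( ) [ ] so the
    # user-supplied term cannot change the shape of the pattern.
    escaped = re.escape(term)
    return {field: {"$regex": escaped, "$options": "i"}}


# A term full of metacharacters is matched as plain text.
filter_doc = build_search_filter("a.b*(c)")
```

Without the escape, a term such as `.*` would match every document and could also be used to craft expensive patterns.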

Phase 2: Stabilize

Goal: improve resilience, code quality, and development experience.

  • Cache Graph API tokens and reuse them until near expiry
  • Add exponential backoff / retry logic for Graph API and Office 365 API calls
  • Add unit tests for normalize_event(), _make_dedupe_key(), and auth.py
  • Add integration tests for /api/events and /api/fetch-audit-logs
  • Configure linter/formatter (ruff) and pre-commit hooks
  • Set up GitHub Actions CI pipeline (lint + test)
  • Add Pydantic request/response models for API endpoints
  • Validate page_size and hours with strict FastAPI constraints
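
As a rough sketch of the retry item above, the helper below retries a callable with exponential backoff and full jitter. `TransientError`, `call_with_backoff`, and the default delays are assumptions for illustration; the project's actual retry logic may differ, and a server-provided Retry-After value would normally take precedence where one is returned.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or 5xx)."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt (base, 2x, 4x, ...), capped.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random amount up to the ceiling so that
            # concurrent clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that unit tests can record delays instead of waiting.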

Phase 3: Scale

Goal: handle larger data volumes and support real-time ingestion.

  • Replace skip-based pagination with cursor-based (search-after) pagination
  • Add Prometheus /metrics endpoint and a Grafana dashboard
  • Implement incremental fetch watermarking per source (store last fetch timestamp)
  • Add webhook endpoints to receive Microsoft Graph change notifications
  • Evaluate Elasticsearch or Azure Cognitive Search for advanced full-text search (MongoDB text index sufficient for current scale)
  • Add request ID / correlation ID middleware for distributed tracing
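
The cursor-based pagination item above might look roughly like the following, assuming events are sorted newest first by `timestamp` with `id` as a tie-breaker and that timestamps are ISO 8601 strings in a single time zone (so string order equals time order). The function names and cursor encoding are illustrative assumptions, not the project's API.

```python
import base64
import json
from typing import Optional


def encode_cursor(timestamp: str, event_id: str) -> str:
    """Pack the sort key of the last row into an opaque, URL-safe token."""
    raw = json.dumps([timestamp, event_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[str, str]:
    timestamp, event_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return timestamp, event_id


def build_page_filter(cursor: Optional[str]) -> dict:
    """MongoDB-style filter selecting rows strictly after `cursor`."""
    if cursor is None:
        return {}  # first page: no lower bound
    ts, event_id = decode_cursor(cursor)
    # Older than the last row seen; `id` breaks ties on equal timestamps.
    return {
        "$or": [
            {"timestamp": {"$lt": ts}},
            {"timestamp": ts, "id": {"$lt": event_id}},
        ]
    }


def next_cursor(page: list[dict]) -> Optional[str]:
    """Cursor for the page after `page`, or None when the page is empty."""
    if not page:
        return None
    last = page[-1]
    return encode_cursor(last["timestamp"], last["id"])
```

The filter would be combined with a sort on `timestamp` and `id`, both descending, and a `limit` of the page size. Unlike skip-based paging, the query cost does not grow with the page number.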

Phase 4: Enhance

Goal: evolve from a polling dashboard into a full security operations tool.

  • Migrate frontend to Alpine.js for better state management and maintainability
  • Add rule-based alerting (e.g., alert on privileged operations, after-hours activity)
  • Add SIEM export (Splunk, Sentinel, syslog webhook)
  • Build an audit trail for AOC itself (who queried what, who triggered fetches)
  • Add event tagging and commenting (e.g., investigating, false_positive)
  • Add export functionality (CSV / JSON) from the UI
  • Add source health dashboard showing last fetch time and status per source
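
A minimal sketch of the rule-based alerting item above, assuming each event is a dict with `operation` and an ISO 8601 `timestamp` carrying a UTC offset. The rule names, the operation list, and the business-hours window are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]


# Hypothetical examples of operations treated as privileged.
PRIVILEGED_OPERATIONS = {"Add member to role", "Update application"}


def is_privileged(event: dict) -> bool:
    return event.get("operation") in PRIVILEGED_OPERATIONS


def is_after_hours(event: dict, start_hour: int = 8, end_hour: int = 18) -> bool:
    # Uses the hour in the timestamp's own offset; a real rule would likely
    # convert to the organisation's time zone first.
    ts = datetime.fromisoformat(event["timestamp"])
    return not (start_hour <= ts.hour < end_hour)


RULES: Sequence[Rule] = (
    Rule("privileged-operation", is_privileged),
    Rule("after-hours", is_after_hours),
)


def evaluate(event: dict, rules: Sequence[Rule] = RULES) -> list[str]:
    """Return the names of all rules that fire for `event`."""
    return [rule.name for rule in rules if rule.matches(event)]
```

Keeping rules as plain predicates makes each one unit-testable in isolation.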

Phase 5: Intelligence

Goal: add AI-powered analysis and external tool integration.

  • AI feature flag (AI_FEATURES_ENABLED) to gate LLM-dependent features
  • Natural language query endpoint (/api/ask) with intent extraction and smart sampling
  • MCP (Model Context Protocol) server for Claude Desktop / Cursor integration
  • Valkey caching for LLM responses and frequent queries
  • Async queue (arq) for LLM requests to prevent timeout/cost explosions at scale
  • Advanced analytics dashboard (trending operations, anomaly detection)
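
The LLM response caching item above can be sketched as a deterministic cache key plus a TTL store. The in-memory `TTLCache` below only stands in for Valkey so the example is self-contained; the key prefix, the normalisation, and the class itself are assumptions.

```python
import hashlib
import json
import time
from typing import Callable, Optional


def cache_key(question: str, filters: dict) -> str:
    """Derive a stable key from a normalised question and its filters."""
    normalised = " ".join(question.lower().split())  # fold case and whitespace
    payload = json.dumps({"q": normalised, "f": filters}, sort_keys=True)
    return "aoc:ask:" + hashlib.sha256(payload.encode()).hexdigest()


class TTLCache:
    """In-memory stand-in for a Valkey SET-with-expiry / GET pair."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value
```

Hashing the normalised question means trivially different phrasings (extra spaces, different casing) share one cached answer, while any change in filters produces a new key.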

Completed in this PR

All Phase 5 items marked done were implemented in v1.3.0–v1.5.0. Redis caching and the async queue were implemented in v1.6.0, then switched to Valkey. UI polish (topbar, footer, clickable pills) landed in v1.6.1–v1.6.4.


Phase 6: Security Hardening

Goal: address penetration test findings and threat model gaps.

  • Fix CORS credentials leak (v1.7.12)
  • Add security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) (v1.7.12)
  • Make rate limiter fail-closed on Redis failure (v1.7.12)
  • Disable OpenAPI docs by default (v1.7.12)
  • Hide tenant_id/client_id from config endpoint when auth disabled (v1.7.12)
  • Validate webhook validationToken before echo (v1.7.12)
  • Gate /metrics behind IP allowlist (v1.7.12)
  • Add LLM domain allowlist (LLM_ALLOWED_DOMAINS) (v1.7.14)
  • Add SIEM webhook SSRF guard + domain allowlist (v1.7.14)
  • Add SRI hashes to CDN scripts (v1.7.14)
  • Add startup warning for auth misconfiguration (v1.7.14)
  • Add Azure Key Vault integration for secrets storage (v1.7.14)
  • Internal penetration test + threat model documentation
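
The SIEM webhook SSRF guard and domain allowlist above could take roughly this shape. The allowlist contents and the function name are hypothetical, and this sketch only validates the URL string; a fuller guard would likely also resolve the host and reject private or link-local addresses at connection time.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice this would come from configuration.
ALLOWED_DOMAINS = frozenset({"siem.example.com"})


def is_allowed_webhook_url(url: str, allowed_domains=ALLOWED_DOMAINS) -> bool:
    """Accept only https URLs whose host is an allowlisted domain or subdomain."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme != "https":
        return False
    host = parts.hostname  # lowercased; userinfo and port already stripped
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # IP literals are never allowlisted domains
    except ValueError:
        pass  # not an IP literal, continue with the domain check
    host = host.rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

Matching on `"." + d` rather than a bare suffix prevents a host such as `siem.example.com.evil.net` or `notsiem.example.com` from passing.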

Phase 7.5: Frontend Modernization 📋

Goal: eliminate unsafe-eval from the Content Security Policy by migrating from Alpine.js to a compiled frontend framework.

Status: Planned. Alpine.js currently requires unsafe-eval because it uses new Function() to evaluate attribute expressions at runtime. A compiled framework turns template expressions into plain JavaScript at build time, so the browser only receives static JS and a fully clean CSP (script-src 'self') becomes possible.

Alpine.js was inspired by Vue, so the migration is largely mechanical:

| Alpine.js | Vue 3 |
| --- | --- |
| x-data="aocApp()" | <script setup> or createApp(aocApp) |
| x-text, x-show, x-if, x-for | v-text, v-show, v-if, v-for |
| x-model, x-html | v-model, v-html |
| @click="method()" | @click="method()" (identical) |

The app.js logic (aocApp() function body, ~820 lines) translates almost directly. The CDN dependencies on cdn.jsdelivr.net and alcdn.msauth.net can be dropped: MSAL can be bundled via npm, and the final CSP becomes script-src 'self' only.
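
To make the CSP change concrete, the sketch below builds a header value before and after the migration. Only the `script-src` sources come from this roadmap; the `default-src` directive and the `build_csp` helper are assumptions, and the real policy may carry further directives.

```python
def build_csp(script_sources: list[str]) -> str:
    """Join directives into a Content-Security-Policy header value."""
    directives = {
        "default-src": ["'self'"],  # assumed baseline, not from the roadmap
        "script-src": script_sources,
    }
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )


# With Alpine.js loaded from CDNs: unsafe-eval plus two external hosts.
current_csp = build_csp(
    ["'self'", "'unsafe-eval'", "https://cdn.jsdelivr.net", "https://alcdn.msauth.net"]
)

# After bundling everything with Vite: scripts from the app's own origin only.
target_csp = build_csp(["'self'"])
```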

Effort estimate

  • Vite + Vue 3 project setup: ~2–3 hours
  • Template migration (HTML directives): ~4–6 hours
  • app.js → Vue component: ~2–3 hours
  • MSAL integration via npm: ~1 hour
  • Testing + polish: ~2–4 hours

Total: ~1–2 days


Phase 7: Multi-Tenancy (Premium) ⏸️

Goal: allow MSPs to manage multiple client tenants from a single deployment.

Status: Planned — not started. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.

Architecture

  • Row-level isolation: tenant_id field on every MongoDB document
  • Each tenant has their own Microsoft Entra tenant + app registration credentials
  • Auth: user's JWT tid claim maps to tenant config automatically
  • Super-admin role for MSP staff to access all tenants
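
A minimal sketch of the row-level isolation described above, assuming the JWT has already been verified and its claims are available as a dict. `TenantConfig`, `resolve_tenant`, and `scoped_filter` are hypothetical names, and the super-admin path is left out.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    name: str


class UnknownTenant(Exception):
    """Raised when the token's tid claim has no registered tenant."""


def resolve_tenant(claims: Mapping, registry: Mapping[str, TenantConfig]) -> TenantConfig:
    """Map the verified JWT `tid` claim to a tenant configuration."""
    tid = claims.get("tid")
    if tid not in registry:
        raise UnknownTenant(str(tid))
    return registry[tid]


def scoped_filter(tenant: TenantConfig, user_filter: dict) -> dict:
    """Wrap a caller-supplied MongoDB filter so it can only see one tenant."""
    # $and keeps the tenant condition separate from the user filter, so a
    # caller-supplied `tenant_id` key cannot override it.
    return {"$and": [user_filter, {"tenant_id": tenant.tenant_id}]}
```

Routing every query through one helper such as `scoped_filter` gives a single place to audit for cross-tenant leaks.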

Implementation phases

  • Phase 7.1 (2–3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
  • Phase 7.2 (1 day): Tenant-scoped API routes, tenant-specific config endpoints
  • Phase 7.3 (2 days): Frontend tenant switcher, tenant name display, admin page
  • Phase 7.4 (1 day): License gating — signed JWT LICENSE_KEY gates multi-tenant mode

Licensing model

  • Single-tenant: remains MIT/free
  • Multi-tenant: premium feature requiring a signed license key
  • License key is a JWT with claims: plan, max_tenants, exp, features
  • Offline license generation tool included

Effort estimate

~7–9 days total. Deferred until SIEM export and alerting are battle-tested.