aoc/ROADMAP.md
Tomas Kracmar a220494bcf
docs: add Phase 6 multi-tenancy plan to roadmap
- Row-level isolation architecture
- Per-tenant Entra + Graph credentials
- License-gated premium feature
- Deferred until SIEM export and alerting are production-tested
2026-04-22 13:49:56 +02:00


AOC Roadmap

This roadmap tracks planned improvements for the Admin Operations Center (AOC) project, organized by phase.


Phase 1: Harden

Goal: fix critical security and reliability gaps before production use.

  • Fix JWT signature verification in auth.py
  • Fix broken frontend auth button references (loginBtn / logoutBtn)
  • Add MongoDB indexes (dedupe_key, timestamp, service+timestamp, id, text search)
  • Add MongoDB TTL index for data retention (RETENTION_DAYS)
  • Add /health endpoint with database connectivity check
  • Replace manual os.getenv parsing with Pydantic Settings (pydantic-settings)
  • Add structured JSON logging (structlog)
  • Configure CORS middleware via CORS_ORIGINS environment variable
  • Escape user input before MongoDB $regex queries (routes/events.py)
  • Fix incorrect return value in maintenance.py dedupe()
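
As an illustration of the `$regex` escaping item above, here is a minimal sketch that treats user-supplied search text as a literal string. The helper name `build_regex_filter` and the `operation` field are hypothetical; the actual code in routes/events.py may be structured differently.

```python
import re


def build_regex_filter(field: str, user_text: str) -> dict:
    """Build a MongoDB filter that matches user_text literally, ignoring case.

    Hypothetical helper: re.escape() backslash-escapes regex metacharacters
    (., *, (, ), [ and so on), so the input cannot change the pattern's meaning.
    """
    return {field: {"$regex": re.escape(user_text), "$options": "i"}}


if __name__ == "__main__":
    # The dot and star are escaped and therefore matched literally.
    print(build_regex_filter("operation", "a.b*"))
```

Without escaping, an input such as `.*` would match every document, and a pathological pattern could make the query expensive.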

Phase 2: Stabilize

Goal: improve resilience, code quality, and development experience.

  • Cache Graph API tokens and reuse them until near expiry
  • Add exponential backoff / retry logic for Graph API and Office 365 API calls
  • Add unit tests for normalize_event(), _make_dedupe_key(), and auth.py
  • Add integration tests for /api/events and /api/fetch-audit-logs
  • Configure linter/formatter (ruff) and pre-commit hooks
  • Set up GitHub Actions CI pipeline (lint + test)
  • Add Pydantic request/response models for API endpoints
  • Validate page_size and hours with strict FastAPI constraints
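
The retry item above could look roughly like the following sketch: exponential backoff with a cap and a little jitter. `TransientError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the decision about which Graph or Office 365 responses count as retryable (for example HTTP 429 or 5xx) is assumed to happen in the caller.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays of base * 2**attempt seconds, each limited to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to `retries` times on TransientError."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except TransientError:
            # Jitter of up to half the delay spreads out simultaneous retries.
            sleep(delay + random.uniform(0, delay / 2))
    # Final attempt: any exception now propagates to the caller.
    return fn()
```

The `sleep` parameter is injectable so that unit tests can record the delays instead of actually waiting.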

Phase 3: Scale

Goal: handle larger data volumes and support real-time ingestion.

  • Replace skip-based pagination with cursor-based (search-after) pagination
  • Add Prometheus /metrics endpoint and a Grafana dashboard
  • Implement incremental fetch watermarking per source (store last fetch timestamp)
  • Add webhook endpoints to receive Microsoft Graph change notifications
  • Evaluate Elasticsearch or Azure Cognitive Search for advanced full-text search (the MongoDB text index is sufficient at the current scale)
  • Add request ID / correlation ID middleware for distributed tracing
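
To make the cursor-based pagination item concrete, the sketch below encodes the last row of a page as an opaque cursor and turns it into a MongoDB filter for the next page. It assumes results are sorted by `timestamp` and then `id`, both descending; those field names are taken from the index list in Phase 1, while the function names and the cursor encoding are hypothetical.

```python
from __future__ import annotations

import base64
import json
from typing import Optional


def encode_cursor(timestamp: str, doc_id: str) -> str:
    """Pack the sort key of the last returned row into an opaque token."""
    raw = json.dumps([timestamp, doc_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[str, str]:
    """Inverse of encode_cursor."""
    timestamp, doc_id = json.loads(base64.urlsafe_b64decode(cursor))
    return timestamp, doc_id


def page_filter(cursor: Optional[str]) -> dict:
    """Filter selecting rows strictly after the cursor in descending order."""
    if cursor is None:
        return {}  # first page: no lower bound
    timestamp, doc_id = decode_cursor(cursor)
    return {
        "$or": [
            {"timestamp": {"$lt": timestamp}},
            # Tie-breaker for rows that share the same timestamp.
            {"timestamp": timestamp, "id": {"$lt": doc_id}},
        ]
    }
```

Unlike `skip`, this filter can be served from an index on the sort fields, so the cost of fetching a page does not grow with the page number.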

Phase 4: Enhance

Goal: evolve from a polling dashboard into a full security operations tool.

  • Migrate frontend to Alpine.js for better state management and maintainability
  • Add rule-based alerting (e.g., alert on privileged operations, after-hours activity)
  • Add SIEM export (Splunk, Sentinel, syslog webhook)
  • Build an audit trail for AOC itself (who queried what, who triggered fetches)
  • Add event tagging and commenting (e.g., investigating, false_positive)
  • Add export functionality (CSV / JSON) from the UI
  • Add source health dashboard showing last fetch time and status per source
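
The rule-based alerting item could start from something as small as the sketch below, which covers the two example rules named in the list. The event shape (`operation` and an ISO 8601 `timestamp` with a UTC offset), the contents of `PRIVILEGED_OPERATIONS`, and the 08:00 to 18:00 working window are all assumptions made for illustration.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical examples; a real list would come from configuration.
PRIVILEGED_OPERATIONS = {"Add member to role", "Update application"}


def is_after_hours(ts: datetime, start_hour: int = 8, end_hour: int = 18) -> bool:
    """True on weekends or outside [start_hour, end_hour) in ts's own offset."""
    return ts.weekday() >= 5 or not (start_hour <= ts.hour < end_hour)


def evaluate(event: dict) -> list[str]:
    """Return the names of the rules that the event triggers."""
    alerts = []
    if event.get("operation") in PRIVILEGED_OPERATIONS:
        alerts.append("privileged_operation")
    if is_after_hours(datetime.fromisoformat(event["timestamp"])):
        alerts.append("after_hours")
    return alerts
```

Note that the after-hours check uses the offset carried by the timestamp itself; converting to a per-tenant local time zone would be a separate design decision.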

Phase 5: Intelligence

Goal: add AI-powered analysis and external tool integration.

  • AI feature flag (AI_FEATURES_ENABLED) to gate LLM-dependent features
  • Natural language query endpoint (/api/ask) with intent extraction and smart sampling
  • MCP (Model Context Protocol) server for Claude Desktop / Cursor integration
  • Valkey caching for LLM responses and frequent queries
  • Async queue (arq) for LLM requests to prevent timeout/cost explosions at scale
  • Advanced analytics dashboard (trending operations, anomaly detection)
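
For the LLM response caching item, the following sketch shows one way to derive a stable cache key from a question and its filters, with a small in-memory TTL cache standing in for Valkey. The `aoc:ask:` key prefix, the normalization rules, and the `TTLCache` class are assumptions, not the project's actual implementation.

```python
from __future__ import annotations

import hashlib
import json
import time
from typing import Any, Callable, Optional


def cache_key(question: str, filters: dict) -> str:
    """Hash the normalized question plus filters into a fixed-length key."""
    normalized = " ".join(question.lower().split())  # fold case and whitespace
    payload = json.dumps({"q": normalized, "f": filters}, sort_keys=True)
    return "aoc:ask:" + hashlib.sha256(payload.encode()).hexdigest()


class TTLCache:
    """In-memory stand-in for a Valkey SET-with-expiry / GET pair."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value
```

Normalizing before hashing means that questions differing only in case or spacing share one cached answer.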

Completed in this PR

All Phase 5 items marked done were implemented in v1.3.0 through v1.5.0. Redis caching and the async queue were implemented in v1.6.0, with the cache since switched to Valkey. UI polish (topbar, footer, clickable pills) landed in v1.6.1 through v1.6.4.


Phase 6: Multi-Tenancy (Premium) ⏸️

Goal: allow MSPs to manage multiple client tenants from a single deployment.

Status: Planned — not started. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.

Architecture

  • Row-level isolation: tenant_id field on every MongoDB document
  • Each tenant has their own Microsoft Entra tenant + app registration credentials
  • Auth: user's JWT tid claim maps to tenant config automatically
  • Super-admin role for MSP staff to access all tenants
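
The architecture bullets above can be sketched as two small functions: one that maps a validated token's `tid` claim to a tenant, and one that forces every query through a `tenant_id` filter. The registry shape, the `roles` claim, the `super_admin` role name, and the use of the Entra `tid` value directly as `tenant_id` are assumptions for illustration only.

```python
from __future__ import annotations

from typing import Optional


class UnknownTenantError(Exception):
    """Raised when a tenant is not present in the registry."""


def resolve_tenant(claims: dict, registry: dict, requested: Optional[str] = None) -> str:
    """Pick the tenant a request may act on, given already-verified JWT claims."""
    own = claims.get("tid")
    target = requested or own
    # Only super-admins (MSP staff) may act on a tenant other than their own.
    if target != own and "super_admin" not in claims.get("roles", []):
        raise PermissionError("cross-tenant access requires the super-admin role")
    if target not in registry:
        raise UnknownTenantError(target)
    return target


def scoped_filter(tenant_id: str, user_filter: dict) -> dict:
    """Wrap a caller-supplied filter so it can never escape its tenant."""
    if not user_filter:
        return {"tenant_id": tenant_id}
    # $and keeps the tenant clause even if user_filter also sets tenant_id.
    return {"$and": [{"tenant_id": tenant_id}, user_filter]}
```

Combining with `$and` rather than merging dictionaries means a caller-supplied `tenant_id` cannot overwrite the enforced one; a conflicting value simply matches nothing.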

Implementation phases

  • Phase 6.1 (2-3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
  • Phase 6.2 (1 day): Tenant-scoped API routes, tenant-specific config endpoints
  • Phase 6.3 (2 days): Frontend tenant switcher, tenant name display, admin page
  • Phase 6.4 (1 day): License gating — signed JWT LICENSE_KEY gates multi-tenant mode

Licensing model

  • Single-tenant: remains MIT/free
  • Multi-tenant: premium feature requiring a signed license key
  • License key is a JWT with claims: plan, max_tenants, exp, features
  • Offline license generation tool included

Effort estimate

~7-9 days total. Deferred until SIEM export and alerting are battle-tested.