Tomas Kracmar fe95dfcfce
docs: update AGENTS.md, README.md, DEPLOY.md, ROADMAP.md for v1.7.14 security features
2026-04-27 16:52:35 +02:00

# AOC Roadmap
This roadmap tracks planned improvements for the Admin Operations Center (AOC) project, organized by phase.

---
## Phase 1: Harden ✅
Goal: fix critical security and reliability gaps before production use.
- [x] Fix JWT signature verification in `auth.py`
- [x] Fix broken frontend auth button references (`loginBtn` / `logoutBtn`)
- [x] Add MongoDB indexes (`dedupe_key`, `timestamp`, `service+timestamp`, `id`, text search)
- [x] Add MongoDB TTL index for data retention (`RETENTION_DAYS`)
- [x] Add `/health` endpoint with database connectivity check
- [x] Replace manual `os.getenv` parsing with Pydantic Settings (`pydantic-settings`)
- [x] Add structured JSON logging (`structlog`)
- [x] Configure CORS middleware via `CORS_ORIGINS` environment variable
- [x] Escape user input before MongoDB `$regex` queries (`routes/events.py`)
- [x] Fix incorrect return value of `dedupe()` in `maintenance.py`
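
The regex-escaping and TTL-retention items above can be illustrated with a small sketch. It is not the code in `routes/events.py` or AOC's index setup. `build_search_filter`, `ttl_index_options`, and the `operation` field are hypothetical. The sketch also assumes the TTL index is on an event `timestamp` field stored as a BSON date.

```python
import re
from datetime import timedelta


def build_search_filter(user_text: str, field: str = "operation") -> dict:
    """Case-insensitive substring filter for a MongoDB find() (hypothetical helper).

    re.escape() neutralises regex metacharacters, so input such as ".*" or
    "(a+)+$" is matched literally instead of being executed as a pattern.
    """
    return {field: {"$regex": re.escape(user_text), "$options": "i"}}


def ttl_index_options(retention_days: int) -> dict:
    """Options for a TTL index (hypothetical helper).

    MongoDB's TTL monitor removes a document once its indexed date field is
    older than expireAfterSeconds; documents whose field is not a date never expire.
    """
    return {"expireAfterSeconds": int(timedelta(days=retention_days).total_seconds())}


# With pymongo this could be applied as, for example:
#   events.create_index("timestamp", **ttl_index_options(retention_days))
```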
---
## Phase 2: Stabilize ✅
Goal: improve resilience, code quality, and development experience.
- [x] Cache Graph API tokens and reuse them until near expiry
- [x] Add exponential backoff / retry logic for Graph API and Office 365 API calls
- [x] Add unit tests for `normalize_event()`, `_make_dedupe_key()`, and `auth.py`
- [x] Add integration tests for `/api/events` and `/api/fetch-audit-logs`
- [x] Configure linter/formatter (`ruff`) and pre-commit hooks
- [x] Set up GitHub Actions CI pipeline (lint + test)
- [x] Add Pydantic request/response models for API endpoints
- [x] Validate `page_size` and `hours` with strict FastAPI constraints
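
Here is a minimal sketch of the token reuse and retry behaviour listed above. It assumes a fetch function that returns `(token, expires_in_seconds)`. `TokenCache`, `backoff_delay`, `call_with_retry`, the 300-second refresh margin, and the choice of full-jitter backoff are all illustrative assumptions, not AOC's actual implementation. The clock and sleep functions are injectable, so the logic can be tested without waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TokenCache:
    """Reuse an access token until it is within `skew` seconds of expiry.

    `fetch` is a hypothetical callable returning (token, expires_in_seconds),
    e.g. a wrapper around a client-credentials token request.
    """

    def __init__(
        self,
        fetch: Callable[[], tuple[str, int]],
        skew: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._skew = skew
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or it is inside the safety margin.
        if self._token is None or now >= self._expires_at - self._skew:
            self._token, expires_in = self._fetch()
            self._expires_at = now + expires_in
        return self._token


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2**attempt))


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 4,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries or not is_retryable(exc):
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```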
---
## Phase 3: Scale ✅
Goal: handle larger data volumes and support real-time ingestion.
- [x] Replace skip-based pagination with cursor-based (search-after) pagination
- [x] Add Prometheus `/metrics` endpoint and a Grafana dashboard
- [x] Implement incremental fetch watermarking per source (store last fetch timestamp)
- [x] Add webhook endpoints to receive Microsoft Graph change notifications
- [x] Evaluate Elasticsearch or Azure Cognitive Search for advanced full-text search (outcome: the MongoDB text index is sufficient at current scale)
- [x] Add request ID / correlation ID middleware for distributed tracing
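
Search-after pagination can be sketched as below. The sketch assumes newest-first ordering on `(timestamp, id)`. It also assumes timestamps are stored as uniformly formatted ISO-8601 UTC strings, so string order matches time order. The cursor encoding and helper names are hypothetical. `skip` still makes the server walk past every skipped document. A cursor filter instead lets an index on those fields start at the last position seen.

```python
from __future__ import annotations

import base64
import json
from typing import Optional

# Newest-first ordering; `id` breaks ties between events sharing a timestamp.
SORT = [("timestamp", -1), ("id", -1)]


def encode_cursor(timestamp: str, event_id: str) -> str:
    """Opaque cursor built from the last event on the current page."""
    raw = json.dumps([timestamp, event_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[str, str]:
    timestamp, event_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return timestamp, event_id


def page_filter(base: dict, cursor: Optional[str]) -> dict:
    """Search-after filter: only events that sort strictly after the cursor."""
    if cursor is None:
        return dict(base)
    ts, eid = decode_cursor(cursor)
    after = {
        "$or": [
            {"timestamp": {"$lt": ts}},
            {"timestamp": ts, "id": {"$lt": eid}},
        ]
    }
    return {"$and": [base, after]} if base else after


# With pymongo, one page would then be roughly:
#   docs = list(events.find(page_filter(base, cursor)).sort(SORT).limit(page_size + 1))
# Fetching one extra document shows whether a next page exists.
```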
---
## Phase 4: Enhance ✅
Goal: evolve from a polling dashboard into a full security operations tool.
- [x] Migrate frontend to Alpine.js for better state management and maintainability
- [x] Add rule-based alerting (e.g., alert on privileged operations, after-hours activity)
- [x] Add SIEM export (Splunk, Sentinel, syslog webhook)
- [x] Build an audit trail for AOC itself (who queried what, who triggered fetches)
- [x] Add event tagging and commenting (e.g., `investigating`, `false_positive`)
- [x] Add export functionality (CSV / JSON) from the UI
- [x] Add source health dashboard showing last fetch time and status per source
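
The rule-based alerting item can be pictured as a list of named predicates evaluated against each normalized event. This is a sketch, not AOC's rule engine. The `Rule` type, the 08:00 to 18:00 working window, and the operation names in `PRIVILEGED_OPERATIONS` are hypothetical. So is the assumption that timestamps are ISO-8601 strings with an explicit UTC offset.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, time, tzinfo
from typing import Callable, Optional

# Hypothetical examples of operations treated as privileged.
PRIVILEGED_OPERATIONS = {"Add member to role.", "Reset user password."}


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]


def is_after_hours(
    event: dict,
    start: time = time(8, 0),
    end: time = time(18, 0),
    tz: Optional[tzinfo] = None,
) -> bool:
    """True on weekends or outside [start, end), optionally converted to `tz`."""
    ts = datetime.fromisoformat(event["timestamp"])  # expects e.g. "...+00:00"
    local = ts.astimezone(tz) if tz is not None else ts
    return local.weekday() >= 5 or not (start <= local.time() < end)


RULES = [
    Rule("privileged_operation", lambda e: e.get("operation") in PRIVILEGED_OPERATIONS),
    Rule("after_hours_activity", is_after_hours),
]


def evaluate(event: dict, rules: list[Rule] = RULES) -> list[str]:
    """Names of all rules the event triggers, in rule order."""
    return [rule.name for rule in rules if rule.matches(event)]
```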
---
## Phase 5: Intelligence ✅
Goal: add AI-powered analysis and external tool integration.
- [x] AI feature flag (`AI_FEATURES_ENABLED`) to gate LLM-dependent features
- [x] Natural language query endpoint (`/api/ask`) with intent extraction and smart sampling
- [x] MCP (Model Context Protocol) server for Claude Desktop / Cursor integration
- [x] Valkey caching for LLM responses and frequent queries
- [x] Async queue (arq) for LLM requests to prevent timeouts and runaway costs at scale
- [ ] Advanced analytics dashboard (trending operations, anomaly detection)
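
The Valkey caching item hinges on a deterministic cache key. Below is a minimal sketch that assumes a redis-py-compatible client. valkey-py is a fork of redis-py and keeps the same `get`/`set(..., ex=...)` calls. The `aoc:ask:` prefix, the 15-minute TTL, and the helper names are hypothetical. The key covers the normalized question, the query filters, and the model name, because changing any of them can change the answer.

```python
from __future__ import annotations

import hashlib
import json
from typing import Callable, Optional, Protocol


class CacheClient(Protocol):
    """The subset of a redis-py-style client used here."""

    def get(self, name: str): ...
    def set(self, name: str, value: str, ex: Optional[int] = None): ...


def llm_cache_key(question: str, filters: dict, model: str, prefix: str = "aoc:ask:") -> str:
    """Same normalised question + filters + model -> same key (filters must be JSON-serialisable)."""
    normalized = " ".join(question.lower().split())
    payload = json.dumps(
        {"q": normalized, "filters": filters, "model": model},
        sort_keys=True,
        separators=(",", ":"),
    )
    return prefix + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_answer(
    cache: CacheClient, key: str, compute: Callable[[], str], ttl_seconds: int = 900
) -> str:
    """Return the cached answer, or compute it once and store it with a TTL."""
    hit = cache.get(key)
    if hit is not None:
        # redis-py-style clients return bytes unless decode_responses=True.
        return hit.decode() if isinstance(hit, bytes) else hit
    answer = compute()
    cache.set(key, answer, ex=ttl_seconds)
    return answer
```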
### Release history
Most Phase 5 items marked done were implemented in v1.3.0 to v1.5.0.
Caching and the async queue were implemented in v1.6.0 (originally on Redis, since switched to Valkey).
UI polish (topbar, footer, clickable pills) shipped in v1.6.1 to v1.6.4.

---
## Phase 6: Security Hardening ✅
Goal: address penetration test findings and threat model gaps.
- [x] Fix CORS credentials leak (v1.7.12)
- [x] Add security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) (v1.7.12)
- [x] Make the rate limiter fail closed on Redis failure (v1.7.12)
- [x] Disable OpenAPI docs by default (v1.7.12)
- [x] Hide `tenant_id`/`client_id` from the config endpoint when auth is disabled (v1.7.12)
- [x] Validate the webhook `validationToken` before echoing it (v1.7.12)
- [x] Gate `/metrics` behind an IP allowlist (v1.7.12)
- [x] Add LLM domain allowlist (`LLM_ALLOWED_DOMAINS`) (v1.7.14)
- [x] Add SIEM webhook SSRF guard + domain allowlist (v1.7.14)
- [x] Add SRI hashes to CDN scripts (v1.7.14)
- [x] Add startup warning for auth misconfiguration (v1.7.14)
- [x] Add Azure Key Vault integration for secrets storage (v1.7.14)
- [x] Internal penetration test + threat model documentation
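
The SIEM webhook SSRF guard could look roughly like the sketch below. It requires `https` and matches the host against a domain allowlist (exact or subdomain match). It then rejects any hostname that resolves to a private, loopback, or link-local address. The function names are hypothetical, and this is not AOC's code. A check-then-connect guard like this still leaves a DNS-rebinding window, so a stricter version would connect to the exact IP it validated.

```python
from __future__ import annotations

import ipaddress
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit


def _default_resolve(host: str) -> list[str]:
    """All addresses the host currently resolves to (IPv4 and IPv6)."""
    return [info[4][0] for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)]


def host_in_allowlist(host: str, allowed_domains: Iterable[str]) -> bool:
    """Exact or subdomain match; 'notexample.com' does not match 'example.com'."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in (x.lower() for x in allowed_domains))


def check_webhook_url(
    url: str,
    allowed_domains: Iterable[str],
    resolve: Optional[Callable[[str], list[str]]] = None,
) -> None:
    """Raise ValueError unless url is https, allowlisted, and resolves only to public IPs."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("webhook URL must use https")
    host = parts.hostname
    if not host:
        raise ValueError("webhook URL has no host")
    if not host_in_allowlist(host, allowed_domains):
        raise ValueError(f"host {host!r} is not in the allowlist")
    addresses = (resolve or _default_resolve)(host)
    if not addresses:
        raise ValueError(f"host {host!r} did not resolve")
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("%", 1)[0])  # drop any IPv6 zone id
        if not ip.is_global:
            raise ValueError(f"host {host!r} resolves to non-public address {ip}")
```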
---
## Phase 7: Multi-Tenancy (Premium) ⏸️
Goal: allow MSPs to manage multiple client tenants from a single deployment.
Status: **Planned, not started**. The architecture is designed; implementation is pending production validation of the core features (SIEM export, alerting).
### Architecture
- Row-level isolation: `tenant_id` field on every MongoDB document
- Each tenant has its own Microsoft Entra tenant and app registration credentials
- Auth: the user's JWT `tid` claim maps automatically to that tenant's config
- Super-admin role for MSP staff to access all tenants
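
One way to combine the row-level isolation and the `tid` mapping is a helper that rewrites every MongoDB filter before it reaches the database. This is a sketch of the planned design, not existing code. `TenantRegistry`, `scoped_filter`, and the `AOC.SuperAdmin` app role (read from the token's `roles` claim) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional

SUPER_ADMIN_ROLE = "AOC.SuperAdmin"  # hypothetical Entra app-role name


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str  # Entra tenant ID, matched against the JWT `tid` claim
    name: str


class TenantRegistry:
    def __init__(self, tenants: Iterable[TenantConfig]) -> None:
        self._by_id = {t.tenant_id: t for t in tenants}

    def get(self, tenant_id: Optional[str]) -> TenantConfig:
        if tenant_id not in self._by_id:
            raise PermissionError(f"unknown tenant {tenant_id!r}")
        return self._by_id[tenant_id]


def scoped_filter(
    query: dict,
    claims: dict,
    registry: TenantRegistry,
    requested_tenant: Optional[str] = None,
) -> dict:
    """Pin every MongoDB filter to one tenant unless the caller is a super-admin."""
    if SUPER_ADMIN_ROLE in claims.get("roles", []):
        if requested_tenant is None:
            return dict(query)  # MSP staff: cross-tenant view
        return {**query, "tenant_id": registry.get(requested_tenant).tenant_id}
    own = registry.get(claims.get("tid"))
    if requested_tenant not in (None, own.tenant_id):
        raise PermissionError("cross-tenant access denied")
    # Top-level keys are ANDed, and setting tenant_id last overrides any
    # tenant_id the caller put into the query.
    return {**query, "tenant_id": own.tenant_id}
```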
### Implementation phases
- **Phase 7.1** (2 to 3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- **Phase 7.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- **Phase 7.3** (2 days): Frontend tenant switcher, tenant name display, admin page
- **Phase 7.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode
### Licensing model
- Single-tenant: remains MIT/free
- Multi-tenant: premium feature requiring a signed license key
- License key is a JWT with claims: `plan`, `max_tenants`, `exp`, `features`
- Offline license generation tool included
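
The license JWT's signature must be verified first. PyJWT's `jwt.decode(token, public_key, algorithms=["RS256"])` is one way to do that, and it also rejects expired tokens by default. After that, the claims listed above still need validating. A minimal sketch follows; the `multi_tenant` feature name and the helper names are hypothetical. An asymmetric algorithm would let deployments ship only the public key, while the offline generation tool keeps the private key.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional

MULTI_TENANT_FEATURE = "multi_tenant"  # hypothetical feature name


@dataclass(frozen=True)
class License:
    plan: str
    max_tenants: int
    features: frozenset
    expires_at: int


def license_from_claims(claims: dict, now: Optional[float] = None) -> License:
    """Validate the claims of a license JWT whose signature was ALREADY verified."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, int) or exp <= now:
        raise ValueError("license is expired or has no valid exp claim")
    max_tenants = claims.get("max_tenants")
    if not isinstance(max_tenants, int) or max_tenants < 1:
        raise ValueError("max_tenants must be a positive integer")
    features = claims.get("features", [])
    if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
        raise ValueError("features must be a list of strings")
    return License(
        plan=str(claims.get("plan", "")),
        max_tenants=max_tenants,
        features=frozenset(features),
        expires_at=exp,
    )


def allows_multi_tenant(lic: License, tenant_count: int) -> bool:
    """Multi-tenant mode needs the feature flag and must stay within max_tenants."""
    return MULTI_TENANT_FEATURE in lic.features and tenant_count <= lic.max_tenants
```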
### Effort estimate
~7 to 9 days total. Deferred until SIEM export and alerting are battle-tested.