2 Commits

Author SHA1 Message Date
fe95dfcfce docs: update AGENTS.md, README.md, DEPLOY.md, ROADMAP.md for v1.7.14 security features
All checks were successful
Release / build-and-push (push) Successful in 21s
CI / lint-and-test (push) Successful in 25s
2026-04-27 16:52:35 +02:00
8d951fc335 v1.7.14: LLM/SIEM domain allowlists, SRI hashes, auth misconfig warning, Azure Key Vault integration
All checks were successful
CI / lint-and-test (push) Successful in 22s
Release / build-and-push (push) Successful in 1m7s
2026-04-27 16:45:06 +02:00
15 changed files with 718 additions and 28 deletions

@@ -30,6 +30,15 @@ CORS_ORIGINS=*
# OpenAPI docs exposure (set true only for dev)
DOCS_ENABLED=false
# LLM endpoint domain restriction (comma-separated, supports wildcards like *.openai.azure.com)
# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
# SIEM webhook domain restriction (comma-separated)
# SIEM_ALLOWED_DOMAINS=your-siem.com
# Optional Azure Key Vault for secrets storage
# AZURE_KEY_VAULT_NAME=your-keyvault-name
# Optional: SIEM export webhook (e.g., Splunk HEC, Sentinel, or generic syslog webhook)
SIEM_ENABLED=false
SIEM_WEBHOOK_URL=

@@ -9,20 +9,24 @@ AOC is a FastAPI microservice that ingests Microsoft Entra (Azure AD) audit logs
- **Runtime**: Python 3.11 (3.14 for tests)
- **Web Framework**: FastAPI + Uvicorn (Gunicorn in production)
- **Database**: MongoDB (PyMongo)
- **Cache/Queue**: Valkey/Redis 8 (caching + arq async job queue)
- **Frontend**: Alpine.js + HTML/CSS (served as static files from `backend/frontend/`)
- **Authentication**: Optional OIDC Bearer token validation against Microsoft Entra (using `python-jose` and MSAL.js on the frontend)
- **External APIs**: Microsoft Graph API, Office 365 Management Activity API, Azure OpenAI / MS Foundry
- **Deployment**: Docker Compose (dev), Docker Compose + nginx (prod)
- **CI/CD**: Gitea Actions (lint + test + Docker build + release)
- **Secrets Storage**: Environment variables (`.env`) or optional Azure Key Vault
## Project Structure
```
backend/
main.py # FastAPI app, router registration, background periodic fetch
config.py # Pydantic Settings configuration (loads .env)
config.py # Pydantic Settings configuration (loads .env + optional Key Vault)
database.py # MongoClient setup (db = micro_soc, collection = events)
auth.py # OIDC Bearer token validation, JWKS caching, role/group checks
secrets_manager.py # Optional Azure Key Vault integration for secrets
rate_limiter.py # Redis-backed fixed-window rate limiter (fail-closed)
requirements.txt # Python dependencies
Dockerfile # python:3.11-slim image, non-root user, version baked at build
mcp_server.py # Standalone MCP server for Claude Desktop / Cursor integration
@@ -34,6 +38,9 @@ backend/
health.py # GET /health, GET /metrics
rules.py # Rule-based alerting endpoints
webhooks.py # Microsoft Graph change notification webhooks
alerts.py # Alert management endpoints
saved_searches.py # Saved filter presets
jobs.py # Async job status polling
graph/
auth.py # Client credentials token acquisition for Graph
audit_logs.py # Fetch and enrich directory audit logs from Graph
@@ -59,16 +66,42 @@ Copy `.env.example` to `.env` at the repo root and fill in values:
cp .env.example .env
```
Key variables:
### Core variables
- `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` — Microsoft app registration credentials (application permissions)
- `AUTH_ENABLED` — set `true` to protect API/UI with OIDC Bearer tokens
- `AUTH_TENANT_ID`, `AUTH_CLIENT_ID` — token validation audience/issuer
- `AUTH_ALLOWED_ROLES`, `AUTH_ALLOWED_GROUPS` — comma-separated access control lists
- `ENABLE_PERIODIC_FETCH`, `FETCH_INTERVAL_MINUTES` — background ingestion scheduler
- `MONGO_ROOT_USERNAME`, `MONGO_ROOT_PASSWORD`, `MONGO_PORT` — used by Docker Compose for MongoDB
### AI / LLM variables
- `AI_FEATURES_ENABLED` — set `false` to completely disable AI endpoints and UI (default `true`)
- `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_MAX_EVENTS`, `LLM_TIMEOUT_SECONDS` — LLM provider settings
- `LLM_API_VERSION` — required for Azure OpenAI / MS Foundry endpoints
- `LLM_ALLOWED_DOMAINS` — comma-separated domain allowlist for LLM endpoints (e.g. `api.openai.com,*.openai.azure.com`)
### Security variables
- `CORS_ORIGINS` — comma-separated allowed origins (default `*`; set explicit origins in production)
- `DOCS_ENABLED` — set `true` to expose `/docs`, `/redoc`, `/openapi.json` (default `false`)
- `METRICS_ALLOWED_IPS` — comma-separated CIDRs allowed to access `/metrics` (default: private networks + loopback)
- `WEBHOOK_CLIENT_SECRET` — secret for validating Graph webhook `clientState`
- `SIEM_ENABLED`, `SIEM_WEBHOOK_URL` — optional SIEM forwarding
- `SIEM_ALLOWED_DOMAINS` — comma-separated domain allowlist for SIEM webhook URLs
- `RATE_LIMIT_ENABLED`, `RATE_LIMIT_REQUESTS`, `RATE_LIMIT_WINDOW_SECONDS` — Redis-backed rate limiting
### Optional Azure Key Vault
- `AZURE_KEY_VAULT_NAME` — name of the Azure Key Vault to load secrets from
- When set, AOC fetches these secrets at startup:
- `aoc-client-secret` → `CLIENT_SECRET`
- `aoc-llm-api-key` → `LLM_API_KEY`
- `aoc-mongo-uri` → `MONGO_URI`
- `aoc-webhook-client-secret` → `WEBHOOK_CLIENT_SECRET`
- Requires `azure-identity` and `azure-keyvault-secrets` (uncomment in `requirements.txt`)
### Privacy / access control
- `PRIVACY_SERVICES` — comma-separated services to hide from non-privileged users (e.g. `Exchange,Teams`)
- `PRIVACY_SENSITIVE_OPERATIONS` — comma-separated operations to gate
- `PRIVACY_SERVICE_ROLES` — comma-separated Entra roles that grant access to privacy data
## Build and Run Commands
@@ -102,7 +135,9 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000
- `GET /api/config/features` — feature flags (`ai_features_enabled`)
- `POST /api/ask` — natural language query; returns LLM narrative + referenced events (only when `AI_FEATURES_ENABLED=true`)
- `GET /health` — liveness probe with DB connectivity
- `GET /metrics` — Prometheus metrics
- `GET /metrics` — Prometheus metrics (IP-restricted by default)
- `GET /api/source-health` — last fetch status per ingestion source
- `GET /api/version` — running version
## MCP Server
@@ -162,16 +197,30 @@ When adding new features or bug fixes, add or update tests in `backend/tests/`.
- Auth middleware and token validation
- API endpoints (`/api/events`, `/api/fetch-audit-logs`, `/api/ask`)
- NLQ time range extraction, entity extraction, query building
- Rate limiting behavior
## Security Considerations
- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env`. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer claims. Tokens are decoded without strict signature verification (`jwt.get_unverified_claims`), so the tenant and issuer checks are the primary gate.
- **Role/Group gating**: Access is allowed if the token’s `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed.
- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env` or Azure Key Vault. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer/audience claims. Tokens are decoded with RS256 signature verification.
- **Role/Group gating**: Access is allowed if the token's `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed — a startup warning is logged in this case.
- **CORS**: When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, `allow_credentials` is forced to `false` to prevent cross-origin token leakage.
- **Rate limiting**: Redis-backed fixed-window rate limiting with per-category limits (fetch=10/hr, ask=30/min, write=20/min, default=120/min). Fails closed (returns 429) when Redis is unavailable.
- **Pagination limits**: `page_size` is clamped to a maximum of 500 to prevent large queries.
- **Fetch window cap**: `hours` is clamped to 720 (30 days) to avoid runaway API calls.
- **LLM SSRF guard**: `LLM_BASE_URL` must be HTTPS and cannot point to private IPs. Optional `LLM_ALLOWED_DOMAINS` restricts to specific domains.
- **SIEM SSRF guard**: `SIEM_WEBHOOK_URL` has the same validation as LLM URLs, plus optional `SIEM_ALLOWED_DOMAINS`.
- **Metrics IP gating**: `/metrics` is restricted to private/loopback IPs by default via `METRICS_ALLOWED_IPS`.
- **OpenAPI docs**: Disabled by default (`DOCS_ENABLED=false`). Enable only in development.
- **CSP**: Content-Security-Policy headers are set on all responses. `unsafe-eval` is required for Alpine.js v3 expression evaluation.
- **SRI**: CDN scripts (Alpine.js, MSAL.js) include Subresource Integrity hashes to prevent supply chain compromise.
- **MCP server**: The MCP server bypasses auth entirely. Only run it in trusted environments or behind a VPN.
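
The metrics IP gating described in the list above can be illustrated with a minimal sketch. The function names and the default CIDR list below are assumptions chosen to mirror "private networks + loopback"; they are not taken from the AOC source.

```python
import ipaddress

# Assumed default that mirrors "private networks + loopback"; the real default may differ.
DEFAULT_METRICS_ALLOWED = "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,127.0.0.0/8,::1/128"


def parse_allowed_networks(raw):
    """Turn a comma-separated CIDR string (METRICS_ALLOWED_IPS style) into network objects."""
    return [ipaddress.ip_network(part.strip(), strict=False)
            for part in raw.split(",") if part.strip()]


def is_metrics_client_allowed(client_ip, allowed_networks):
    """Return True if the client address falls inside any allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable client address: deny
    return any(addr in net for net in allowed_networks)


networks = parse_allowed_networks(DEFAULT_METRICS_ALLOWED)
print(is_metrics_client_allowed("10.1.2.3", networks))
print(is_metrics_client_allowed("8.8.8.8", networks))
```

Behind a reverse proxy the client address seen by the backend is usually the proxy's, so a real check would need to decide which forwarded header, if any, to trust.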
### Security Documentation
- `PEN_TEST_REPORT_v1.7.11.md` — Internal soft penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra/token abuse vectors
## Maintenance and Operations
The `backend/maintenance.py` script provides two CLI commands useful for backfilling or correcting stored data:

@@ -7,6 +7,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
- **nginx** — reverse proxy, TLS termination, static file serving
- **backend** — FastAPI application (Gunicorn + Uvicorn workers)
- **mongo** — MongoDB data store (not exposed externally)
- **valkey** — Redis-compatible cache and async job queue (not exposed externally)
## Prerequisites
@@ -20,7 +21,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
1. **Clone / pull the latest release**
```bash
git checkout v1.1.0
git checkout v1.7.14
```
2. **Copy and edit environment variables**
@@ -33,7 +34,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
3. **Set the release version**
```bash
export AOC_VERSION=v1.1.0
export AOC_VERSION=v1.7.14
```
4. **Deploy**
@@ -53,7 +54,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
## Updating to a new release
```bash
export AOC_VERSION=v1.2.0
export AOC_VERSION=v1.7.14
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
@@ -75,24 +76,56 @@ docker compose -f docker-compose.prod.yml up -d
Replace the `nginx` service in `docker-compose.prod.yml` with a Certbot-friendly setup (e.g., use the `nginx-proxy` + `acme-companion` stack) or mount the Certbot certificates into `nginx/ssl/`.
## Security hardening
## Security Hardening
- MongoDB is **not exposed** to the host — only the backend container can reach it.
- Valkey/Redis is **not exposed** to the host — only the backend container can reach it.
- The backend runs as a non-root (`aoc`) user inside the container.
- nginx adds security headers (`X-Frame-Options`, `X-Content-Type-Options`, etc.).
- Keep `.env` out of version control — it is listed in `.gitignore`.
- Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access to admin/security roles.
- Set explicit `CORS_ORIGINS` — do not use `*` in production when auth is enabled.
- Set `DOCS_ENABLED=false` to hide OpenAPI docs (`/docs`, `/openapi.json`).
- Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications.
- Set `LLM_ALLOWED_DOMAINS` if using AI features (e.g. `api.openai.com,*.openai.azure.com`).
- Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding.
- Review `METRICS_ALLOWED_IPS` — defaults to private networks + loopback.
## Azure Key Vault (Optional)
To eliminate long-lived secrets from `.env`:
1. Create an Azure Key Vault and add these secrets:
- `aoc-client-secret` — your Graph app `CLIENT_SECRET`
- `aoc-llm-api-key` — your `LLM_API_KEY` (if using AI)
- `aoc-mongo-uri` — your `MONGO_URI`
- `aoc-webhook-client-secret` — your `WEBHOOK_CLIENT_SECRET`
2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
4. Grant the container identity `Get` permission on secrets:
- If using Azure Container Instances / AKS: assign a managed identity
- If using VM: assign a managed identity or use a service principal
- If using local Docker: authenticate via `az login` on the host
5. Rebuild and redeploy:
```bash
docker compose -f docker-compose.prod.yml up -d --build
```
## Rollback
```bash
export AOC_VERSION=v1.0.3
export AOC_VERSION=v1.7.13
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
## Monitoring
- Prometheus metrics: `http://your-host/metrics`
- Prometheus metrics: `http://your-host/metrics` (IP-restricted by default)
- Health check: `http://your-host/health`
- Container logs:
@@ -100,4 +133,13 @@ docker compose -f docker-compose.prod.yml up -d
docker compose -f docker-compose.prod.yml logs -f backend
docker compose -f docker-compose.prod.yml logs -f nginx
docker compose -f docker-compose.prod.yml logs -f mongo
docker compose -f docker-compose.prod.yml logs -f valkey
```
## Troubleshooting
- **Auth warning in logs**: "AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured" — set these to restrict access.
- **CORS issues**: Set `CORS_ORIGINS` to your exact frontend origin(s). Wildcard with auth enabled disables credentials.
- **Rate limiting 429s**: Check Redis/Valkey connectivity. The rate limiter fails closed (returns 429) when Redis is down.
- **LLM errors**: Verify `LLM_BASE_URL` is in `LLM_ALLOWED_DOMAINS` if the allowlist is configured.
- **SIEM not forwarding**: Verify `SIEM_WEBHOOK_URL` uses HTTPS and is in `SIEM_ALLOWED_DOMAINS`.
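
The fail-closed behaviour behind the "Rate limiting 429s" entry can be sketched as follows. This is a simplified illustration, not the contents of `rate_limiter.py`: the store classes are stand-ins for a Redis/Valkey client (whose `incr` and `expire` methods have the same shape), and the key format is an assumption.

```python
import time


class InMemoryStore:
    """Stand-in for Redis/Valkey, used only to keep this sketch self-contained."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        pass  # a real store would drop the key after `seconds`


class BrokenStore:
    """Simulates an unreachable store."""

    def incr(self, key):
        raise ConnectionError("store unavailable")

    def expire(self, key, seconds):
        raise ConnectionError("store unavailable")


def allow_request(store, client_id, limit, window_seconds, now=None):
    """Fixed-window check: count requests per client per window; deny on store errors."""
    now = time.time() if now is None else now
    window_index = int(now // window_seconds)
    key = f"ratelimit:{client_id}:{window_index}"  # hypothetical key layout
    try:
        count = store.incr(key)
        if count == 1:
            store.expire(key, window_seconds)  # first hit in this window sets the TTL
    except Exception:
        return False  # fail closed: the caller should answer 429
    return count <= limit
```

With this shape, a store outage looks the same to clients as an exhausted quota, which is why connectivity is the first thing to check when 429s appear unexpectedly.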

@@ -11,13 +11,14 @@ FastAPI microservice that ingests Microsoft Entra (Azure AD) and other admin aud
- Optional OIDC bearer auth (Entra) to protect the API/UI and gate access by roles/groups.
- Natural language query (`/api/ask`) powered by LLM (OpenAI, Azure OpenAI, or any compatible API).
- MCP server for Claude Desktop / Cursor integration.
- Optional Azure Key Vault integration for secrets storage.
## Prerequisites (macOS)
- Python 3.11
- Docker Desktop (for the quickest start) or a local MongoDB instance
- An Entra app registration with **Application** permission `AuditLog.Read.All` and admin consent granted
- Also required to fetch other sources:
- `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registration’s API permissions for Office 365 Management APIs)
- `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registration's API permissions for Office 365 Management APIs)
- Intune audit: `DeviceManagementConfiguration.Read.All` (or broader) for `/deviceManagement/auditEvents`
- Optional API protection: configure `AUTH_ENABLED=true` and set `AUTH_TENANT_ID`/`AUTH_CLIENT_ID` (the audience) plus allowed roles/groups.
@@ -49,8 +50,43 @@ cp .env.example .env
# LLM_BASE_URL=https://api.openai.com/v1
# LLM_MODEL=gpt-4o-mini
# LLM_TIMEOUT_SECONDS=30
# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
# Optional: SIEM forwarding
# SIEM_ENABLED=true
# SIEM_WEBHOOK_URL=https://your-siem.com/webhook
# SIEM_ALLOWED_DOMAINS=your-siem.com
# Optional: Azure Key Vault for secrets storage
# AZURE_KEY_VAULT_NAME=your-keyvault-name
```
### Using Azure Key Vault for secrets
Instead of storing `CLIENT_SECRET`, `LLM_API_KEY`, `MONGO_URI`, and `WEBHOOK_CLIENT_SECRET` in `.env`, you can store them in Azure Key Vault:
1. Create a Key Vault and add secrets with these names:
- `aoc-client-secret` → your Graph app `CLIENT_SECRET`
- `aoc-llm-api-key` → your `LLM_API_KEY`
- `aoc-mongo-uri` → your `MONGO_URI`
- `aoc-webhook-client-secret` → your `WEBHOOK_CLIENT_SECRET`
2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
4. Ensure the container has Azure identity credentials (managed identity, service principal, or Azure CLI auth)
## Security Hardening Checklist
Before deploying to production:
- [ ] Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access
- [ ] Set explicit `CORS_ORIGINS` (do not use `*` in production with auth enabled)
- [ ] Set `DOCS_ENABLED=false` (default) to hide OpenAPI docs
- [ ] Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications
- [ ] Set `LLM_ALLOWED_DOMAINS` if using AI features to prevent data exfiltration
- [ ] Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding
- [ ] Review `METRICS_ALLOWED_IPS` — defaults to private networks only
- [ ] Consider Azure Key Vault instead of `.env` for secrets
- [ ] Review the threat model: `THREAT_MODEL_v1.7.13.md`
## Run with Docker Compose (recommended)
```bash
docker compose up --build
@@ -76,7 +112,7 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000
## API
- `GET /health` — health check with MongoDB connectivity status.
- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors.
- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors (IP-restricted).
- `GET /api/version` — running version (baked into the Docker image at build time).
- `GET /api/fetch-audit-logs` — pulls the last 7 days by default (override with `?hours=N`, capped to 30 days) of:
- Entra directory audit logs (`/auditLogs/directoryAudits`)
@@ -171,7 +207,7 @@ curl http://localhost:8000/api/fetch-audit-logs
- Visit the UI at http://localhost:8000 to filter by user/service/action/result/time, search raw text, paginate, and view raw events.
## Maintenance (Dockerized)
Use the backend image so you don’t need a local venv:
Use the backend image so you don't need a local venv:
```bash
# ensure Mongo + backend network are up
docker compose up -d mongo
@@ -182,10 +218,15 @@ docker compose run --rm backend python maintenance.py dedupe
```
Omit `--limit` to process all events. You can also run commands inside a running backend container with `docker compose exec backend ...`.
## Security Documentation
- `PEN_TEST_REPORT_v1.7.11.md` — Penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra application abuse, token handling, data exfiltration vectors
## Notes / Troubleshooting
- Ensure `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET` match an app registration with `AuditLog.Read.All` (application) permission and admin consent.
- Additional permissions: Office 365 Management Activity (`ActivityFeed.Read`), and Intune audit (`DeviceManagementConfiguration.Read.All`).
- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set).
- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set). A startup warning is logged if auth is enabled but no roles/groups are configured.
- Backfill limits: Management Activity API typically exposes ~7 days of history via API (longer if your tenant has extended/Advanced Audit retention). Directory/Intune audit retention follows your tenant policy (commonly 30-90 days, longer with Advanced Audit).
- If you change Mongo credentials/ports, update `MONGO_URI` in `.env` (Docker Compose passes it through to the backend).
- The service uses the `micro_soc` database and `events` collection by default; adjust in `backend/config.py` if needed.
- If using Azure Key Vault, ensure the runtime identity (managed identity, service principal, or local Azure CLI) has `Get` permission on secrets.

RELEASE_NOTES_v1.7.14.md (new file, 64 lines)

@@ -0,0 +1,64 @@
# AOC v1.7.14 Release Notes
**Release Date:** 2026-04-27
## Security Hardening: Threat Model Remediation
This release addresses the high-severity findings from the v1.7.13 threat model review.
### LLM Endpoint Domain Allowlist
- **New config:** `LLM_ALLOWED_DOMAINS` (comma-separated, supports wildcards like `*.openai.azure.com`)
- **Behavior:** When configured, the `/api/ask` endpoint rejects `LLM_BASE_URL` domains not in the allowlist
- **Impact:** Prevents audit data exfiltration via a compromised or attacker-controlled LLM endpoint
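
A minimal sketch of how a comma-separated allowlist with `*.` wildcards could be matched against `LLM_BASE_URL`. The helper names are hypothetical and the use of `fnmatch` is an assumption; the actual check in `backend/routes/ask.py` may be implemented differently.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def parse_allowed_domains(raw):
    """Split an LLM_ALLOWED_DOMAINS style string into lowercase patterns."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def is_domain_allowed(url, allowed_patterns):
    """Return True if the URL's host matches a pattern, or if no allowlist is set."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    if not allowed_patterns:
        return True  # the allowlist is optional: empty means no domain restriction
    # "*" also matches dots here, so "*.openai.azure.com" covers nested subdomains.
    return any(fnmatchcase(host, pattern) for pattern in allowed_patterns)


allowed = parse_allowed_domains("api.openai.com,*.openai.azure.com")
print(is_domain_allowed("https://api.openai.com/v1", allowed))
print(is_domain_allowed("https://attacker.com/fake-llm", allowed))
```

Matching on the parsed hostname rather than the raw URL string is the important part: a substring test would accept hosts such as `api.openai.com.attacker.com`.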
### SIEM Webhook SSRF Guard
- **New config:** `SIEM_ALLOWED_DOMAINS` (comma-separated)
- **Behavior:** The SIEM forwarder now validates `SIEM_WEBHOOK_URL` with the same SSRF checks as the LLM endpoint (HTTPS-only, blocks private IPs, enforces domain allowlist)
- **Impact:** Prevents real-time audit data exfiltration via a malicious SIEM webhook URL
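
The HTTPS-only and private-IP checks named above might look roughly like the sketch below. It is an illustration under assumptions: the function names are invented, and the resolver is injectable only so the example can run without network access.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _is_public_ip(ip_text):
    """True only for addresses that are not private, loopback, link-local, etc."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # unparseable address: treat as unsafe
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def _default_resolve(host):
    """Resolve a hostname to the IP strings it currently points at."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_outbound_url(url, resolve=_default_resolve):
    """Return None if the URL passes, otherwise a short reason string."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return "URL must use HTTPS"
    host = parsed.hostname
    if not host:
        return "URL has no host"
    try:
        addresses = resolve(host)
    except OSError:
        return "host could not be resolved"
    if not addresses or not all(_is_public_ip(a) for a in addresses):
        return "host resolves to a non-public address"
    return None
```

A check like this only reflects DNS at validation time; a host that later resolves to a different address would need to be re-validated, which is one reason a domain allowlist is a useful second layer.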
### CDN Subresource Integrity (SRI)
- Added `integrity` hashes to both CDN scripts in the frontend:
- Alpine.js 3.15.11: `sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be`
- MSAL.js 2.37.0: `sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW`
- **Impact:** Browser refuses to execute CDN scripts if the content doesn't match the hash, preventing supply chain compromise
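
For reference, an `integrity` value of this form is the Base64 encoding of the SHA-384 digest of the exact script bytes, prefixed with `sha384-`. The sketch below shows how such a value could be recomputed when a CDN script version is bumped; the function name is hypothetical and the sample input is a placeholder, not the real Alpine.js or MSAL.js content.

```python
import base64
import hashlib


def sri_sha384(content: bytes) -> str:
    """Compute a Subresource Integrity value for the given script bytes."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


# Placeholder content; in practice this would be the downloaded script file's bytes.
print(sri_sha384(b"console.log('hello');"))
```

Because the hash covers every byte, any version bump or CDN-side change to a script requires updating the `integrity` attribute in `backend/frontend/index.html` at the same time.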
### Auth Misconfiguration Warning
- At startup, AOC now logs a `WARNING` if `AUTH_ENABLED=true` but neither `AUTH_ALLOWED_ROLES` nor `AUTH_ALLOWED_GROUPS` is configured
- **Impact:** Operators are alerted when the app is accidentally left open to all Entra users
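
A startup check of this kind could be as small as the sketch below. The logger name and function names are assumptions for illustration; the actual code in `backend/main.py` is not shown in this diff.

```python
import logging

logger = logging.getLogger("aoc.startup")  # hypothetical logger name


def split_csv(raw):
    """Parse a comma-separated setting into a list of non-empty values."""
    return [item.strip() for item in (raw or "").split(",") if item.strip()]


def warn_if_auth_open(auth_enabled, allowed_roles, allowed_groups):
    """Log a WARNING and return True when auth is on but nothing restricts access."""
    if auth_enabled and not allowed_roles and not allowed_groups:
        logger.warning(
            "AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or "
            "AUTH_ALLOWED_GROUPS are configured"
        )
        return True
    return False


# Example: auth enabled with both lists left empty triggers the warning.
warn_if_auth_open(True, split_csv(""), split_csv(""))
```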
### Azure Key Vault Integration (Optional)
- **New module:** `backend/secrets_manager.py`
- **New config:** `AZURE_KEY_VAULT_NAME`
- **Behavior:** If `AZURE_KEY_VAULT_NAME` is set, AOC fetches these secrets from Key Vault at startup:
- `aoc-client-secret` → `CLIENT_SECRET`
- `aoc-llm-api-key` → `LLM_API_KEY`
- `aoc-mongo-uri` → `MONGO_URI`
- `aoc-webhook-client-secret` → `WEBHOOK_CLIENT_SECRET`
- Falls back silently to `.env` / environment variables when Key Vault is not configured
- **Dependencies:** `azure-identity` and `azure-keyvault-secrets` (commented out in `requirements.txt` — uncomment when using Key Vault)
- **Impact:** Eliminates long-lived secrets from `.env` files and Docker images
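
The secret-name mapping above can be sketched as a small loader. This is not the contents of `backend/secrets_manager.py`: the function name, the injectable `fetch_secret` callable, and the choice to keep the environment value when a single secret cannot be read are all assumptions made so the example runs without Azure packages.

```python
import os

# Key Vault secret name -> setting name, as listed in the release notes.
SECRET_MAP = {
    "aoc-client-secret": "CLIENT_SECRET",
    "aoc-llm-api-key": "LLM_API_KEY",
    "aoc-mongo-uri": "MONGO_URI",
    "aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
}


def load_secrets(vault_name, fetch_secret, env=None):
    """Return settings with Key Vault values layered over environment values."""
    settings = dict(os.environ) if env is None else dict(env)
    if not vault_name:
        return settings  # Key Vault not configured: use .env / environment only
    for secret_name, setting_name in SECRET_MAP.items():
        try:
            value = fetch_secret(secret_name)
        except Exception:
            continue  # assumption: an unreadable secret keeps the environment value
        if value:
            settings[setting_name] = value
    return settings


# With the optional packages installed, fetch_secret could be built like this:
#   from azure.identity import DefaultAzureCredential
#   from azure.keyvault.secrets import SecretClient
#   client = SecretClient(vault_url=f"https://{vault_name}.vault.azure.net",
#                         credential=DefaultAzureCredential())
#   fetch_secret = lambda name: client.get_secret(name).value
```

Passing the fetcher in as a callable keeps the Azure imports optional, which matches the packages being commented out in `requirements.txt` by default.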
## Files Changed
| File | Change |
|------|--------|
| `backend/config.py` | Added `LLM_ALLOWED_DOMAINS`, `SIEM_ALLOWED_DOMAINS`, `AZURE_KEY_VAULT_NAME` |
| `backend/routes/ask.py` | Domain allowlist enforcement for LLM URL |
| `backend/siem.py` | SSRF guard + domain allowlist for SIEM webhook |
| `backend/frontend/index.html` | SRI hashes for Alpine.js and MSAL.js |
| `backend/main.py` | Startup warning for auth misconfiguration |
| `backend/secrets_manager.py` | New — Azure Key Vault integration |
| `backend/requirements.txt` | Added optional Azure Key Vault packages |
| `.env.example` | Documented new settings |
| `VERSION` | Bumped to 1.7.14 |
| `THREAT_MODEL_v1.7.13.md` | Threat model documentation |
## Test Results
- **80/80 pytest tests passing**
- Ruff lint/format clean

@@ -59,7 +59,7 @@ Goal: evolve from a polling dashboard into a full security operations tool.
---
## Phase 5: Intelligence
## Phase 5: Intelligence
Goal: add AI-powered analysis and external tool integration.
- [x] AI feature flag (`AI_FEATURES_ENABLED`) to gate LLM-dependent features
@@ -76,7 +76,26 @@ UI polish (topbar, footer, clickable pills) in v1.6.1-v1.6.4.
---
## Phase 6: Multi-Tenancy (Premium) ⏸️
## Phase 6: Security Hardening ✅
Goal: address penetration test findings and threat model gaps.
- [x] Fix CORS credentials leak (v1.7.12)
- [x] Add security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) (v1.7.12)
- [x] Make rate limiter fail-closed on Redis failure (v1.7.12)
- [x] Disable OpenAPI docs by default (v1.7.12)
- [x] Hide tenant_id/client_id from config endpoint when auth disabled (v1.7.12)
- [x] Validate webhook validationToken before echo (v1.7.12)
- [x] Gate `/metrics` behind IP allowlist (v1.7.12)
- [x] Add LLM domain allowlist (`LLM_ALLOWED_DOMAINS`) (v1.7.14)
- [x] Add SIEM webhook SSRF guard + domain allowlist (v1.7.14)
- [x] Add SRI hashes to CDN scripts (v1.7.14)
- [x] Add startup warning for auth misconfiguration (v1.7.14)
- [x] Add Azure Key Vault integration for secrets storage (v1.7.14)
- [x] Internal penetration test + threat model documentation
---
## Phase 7: Multi-Tenancy (Premium) ⏸️
Goal: allow MSPs to manage multiple client tenants from a single deployment.
Status: **Planned — not started**. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.
@@ -88,10 +107,10 @@ Status: **Planned — not started**. Architecture designed, pending validation o
- Super-admin role for MSP staff to access all tenants
### Implementation phases
- **Phase 6.1** (2-3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- **Phase 6.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- **Phase 6.3** (2 days): Frontend tenant switcher, tenant name display, admin page
- **Phase 6.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode
- **Phase 7.1** (2-3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- **Phase 7.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- **Phase 7.3** (2 days): Frontend tenant switcher, tenant name display, admin page
- **Phase 7.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode
### Licensing model
- Single-tenant: remains MIT/free

THREAT_MODEL_v1.7.13.md (new file, 321 lines)

@@ -0,0 +1,321 @@
# AOC Threat Model — v1.7.13
**Date:** 2026-04-27
**Scope:** Entra ID / Microsoft Graph integration, token handling, data flows, external dependencies
**Assumptions:** Deployment is Docker Compose behind nginx reverse proxy; `AUTH_ENABLED=true`; `AI_FEATURES_ENABLED` may be true or false.
---
## Attack Surface Map
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ATTACKER │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Frontend │ │ API │ │ Webhook │ │
│ │ (CDN JS) │ │ (/api/*) │ │ (/api/webhooks)│ │
│ └──────┬──────┘ └──────┬───────┘ └────────┬────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ AOC BACKEND │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Auth │ │ Events │ │ Fetch │ │ Ask/LLM │ │ │
│ │ │ (JWT) │ │ (Mongo) │ │ (Graph) │ │ (HTTP) │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │ │
│ │ ▼ ▼ ▼ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ SECRETS / CREDENTIALS │ │ │
│ │ │ CLIENT_SECRET │ LLM_API_KEY │ MONGO_PASSWORD │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Microsoft │ │ LLM API │ │ SIEM Webhook │ │
│ │ Graph API │ │ (OpenAI/ │ │ (optional) │ │
│ │ │ │ Azure) │ │ │ │
│ └─────────────┘ └──────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 1. Entra App Registration Abuse — HIGH
### 1.1 Client Credentials Leak = Full Tenant Read
**How it works:**
- AOC uses `client_credentials` flow (`graph/auth.py`)
- `CLIENT_ID` + `CLIENT_SECRET` are exchanged for an access token at `login.microsoftonline.com`
- The token has `https://graph.microsoft.com/.default` scope
- This grants **all application permissions** configured in the Entra app registration
**Typical permissions:**
- `Directory.Read.All` — read all users, groups, devices, roles
- `AuditLog.Read.All` — read all audit logs
- `DeviceManagementManagedDevices.Read.All` — read all Intune devices
**Attack scenario:**
1. Attacker gains read access to `.env` or the Docker container filesystem
2. Attacker calls the token endpoint directly with the leaked `CLIENT_ID`/`CLIENT_SECRET`
3. Attacker receives a Graph API access token valid for ~1 hour
4. Attacker can query ALL tenant data independently of AOC
**Impact:** Complete tenant data exfiltration — users, groups, devices, audit logs, mailboxes (if `Exchange.Read` granted).
**Mitigation in place:** None. The backend needs these permissions to function.
**Recommendation:**
- Store `CLIENT_SECRET` in a secret manager (Azure Key Vault, HashiCorp Vault) rather than `.env`
- Use short-lived certificates instead of long-lived secrets for app authentication
- Monitor Entra sign-in logs for anomalous `client_credentials` token requests
- Restrict app registration permissions to the absolute minimum (e.g., `AuditLog.Read.All` + `Directory.Read.All` only)
---
### 1.2 No Scope Restriction on Graph Token
**Finding:** `get_access_token()` always requests `https://graph.microsoft.com/.default` — the full permission set. There's no mechanism to request narrower scopes for specific operations.
**Impact:** If the app registration has 10 permissions, every token has all 10. A bug in one code path could expose data from all 10 permission areas.
**Recommendation:** Not easily fixable without splitting into multiple app registrations. Document as accepted risk.
---
## 2. Authentication & Token Validation — MEDIUM
### 2.1 JWKS Fetch Without TLS Certificate Validation Hardening
**Finding:** `_get_jwks()` fetches OIDC configuration and JWKS from `login.microsoftonline.com` using standard `requests` TLS validation. No certificate pinning or CA bundle restriction.
**Attack scenario (advanced):**
1. Attacker compromises DNS or a network hop between AOC and Microsoft
2. Attacker serves a fake JWKS endpoint with their own public key
3. Attacker issues a forged JWT signed with their private key
4. AOC validates the forged JWT against the attacker's public key
5. Attacker gains authenticated access
**Likelihood:** Very low (requires DNS compromise or nation-state-level interception).
**Mitigation:** Standard TLS validation is in place. For high-security environments, consider pinning the `login.microsoftonline.com` certificate thumbprint.
---
### 2.2 Missing `nbf` / `iat` Claim Verification
**Finding:** `_decode_token()` verifies `exp`, `tid`, `iss`, and `aud` but does not check `nbf` (not before) or `iat` (issued at) claims.
**Impact:** A token used before its validity period (`nbf`) or with a suspicious future `iat` would be accepted. Minor issue — MSAL tokens are well-formed in practice.
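
One way to close this gap is an explicit check on the decoded claims, sketched below. The function name and the 60-second leeway are assumptions; this is not the code in `auth.py`, and a JWT library's own validation options may already cover the same ground.

```python
import time


def check_time_claims(claims, now=None, leeway=60):
    """Return None if nbf/iat look sane, otherwise a short reason string."""
    now = time.time() if now is None else now
    nbf = claims.get("nbf")
    if nbf is not None and now + leeway < nbf:
        return "token not yet valid (nbf)"
    iat = claims.get("iat")
    if iat is not None and iat > now + leeway:
        return "token issued in the future (iat)"
    return None


# A token whose nbf lies well in the future is rejected.
print(check_time_claims({"nbf": 2000, "iat": 1000}, now=1500))
```

The leeway allows for small clock differences between the issuer and the backend; claims that are absent are simply skipped rather than treated as failures.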
---
### 2.3 Role/Group Gating Defaults to "Allow All"
**Finding:** In `auth.py`:
```python
def _allowed(claims, allowed_roles, allowed_groups):
    if not allowed_roles and not allowed_groups:
        return True
```
**Impact:** If `AUTH_ENABLED=true` but `AUTH_ALLOWED_ROLES` and `AUTH_ALLOWED_GROUPS` are left empty (the default), **every Entra user in the tenant** can authenticate and use AOC. This is a common misconfiguration.
**Recommendation:** Add a startup warning when auth is enabled but no roles/groups are configured. Consider changing the default to deny-all.
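
The deny-all default suggested in the recommendation could take the following shape. This is a hypothetical variant for discussion, not the project's `_allowed()` function; the `deny_when_unconfigured` flag is invented here to show both behaviours side by side.

```python
def is_allowed(claims, allowed_roles, allowed_groups, deny_when_unconfigured=True):
    """Role/group gate that can refuse access when no allowlist is configured."""
    if not allowed_roles and not allowed_groups:
        # Current behaviour corresponds to deny_when_unconfigured=False (allow all).
        return not deny_when_unconfigured
    roles = set(claims.get("roles") or [])
    groups = set(claims.get("groups") or [])
    return bool(roles & set(allowed_roles)) or bool(groups & set(allowed_groups))


claims = {"roles": ["SecurityReader"], "groups": []}
print(is_allowed(claims, [], []))                  # nothing configured: denied
print(is_allowed(claims, ["SecurityReader"], []))  # role matches: allowed
```

Switching the default would be a breaking change for deployments that rely on the allow-all behaviour, so a flag or a deprecation period is one way to stage it.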
---
### 2.4 Privacy Service Role Gating Also Defaults to "Allow All"
**Finding:** `user_can_access_privacy_services()` returns `True` if `PRIVACY_SERVICE_ROLES` is empty. If an admin configures `PRIVACY_SERVICES` (e.g., `Exchange`) but forgets to set `PRIVACY_SERVICE_ROLES`, all users see all privacy data.
---
## 3. Data Exfiltration Paths — HIGH
### 3.1 LLM Endpoint as Data Exfiltration Channel
**Finding:** When `AI_FEATURES_ENABLED=true` and `LLM_API_KEY` is set:
- The `/api/ask` endpoint sends audit event data (actors, targets, operations, summaries) to the configured LLM API
- `_validate_llm_url()` blocks private IPs but does NOT restrict the domain to an allowlist
- Any HTTPS URL is accepted
**Attack scenario:**
1. Attacker gains `.env` write access (or container filesystem access)
2. Attacker changes `LLM_BASE_URL` to `https://attacker.com/fake-llm`
3. Attacker sends an `/api/ask` request like "show me all events"
4. AOC queries MongoDB and sends up to `LLM_MAX_EVENTS` (default 200) events to the attacker's URL
5. Attacker receives structured audit data including actor names, UPNs, device names, operation details
**Impact:** Up to 200 audit events exfiltrated per API call. With pagination, an attacker could exfiltrate the entire database.
**Mitigation in place:** SSRF guard blocks private IPs and localhost.
**Gap:** No domain allowlist. An attacker-controlled public HTTPS endpoint is accepted.
**Recommendation:**
- Add `LLM_ALLOWED_DOMAINS` config (e.g., `api.openai.com,*.openai.azure.com`)
- Validate `LLM_BASE_URL` against this allowlist at startup and on every request
- Log all LLM requests with event counts sent
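The allowlist check recommended above can be sketched as follows. `host_allowed` is a hypothetical helper name; the pattern syntax (exact host, or `*.` prefix for subdomains) follows the example values given for `LLM_ALLOWED_DOMAINS`:

```python
from urllib.parse import urlparse


def host_allowed(url: str, patterns: list) -> bool:
    """Return True if the URL's hostname matches an exact entry or a '*.domain' wildcard.

    Hypothetical helper for illustration. An empty pattern list matches nothing;
    whether an empty allowlist should mean "unrestricted" is a separate policy choice.
    """
    hostname = (urlparse(url).hostname or "").lower()
    if not hostname:
        return False
    for pattern in patterns:
        pattern = pattern.strip().lower()
        if pattern.startswith("*."):
            # "*.example.com" matches "a.example.com" but not "example.com"
            # and not "evilexample.com", because the leading dot is kept.
            if hostname.endswith(pattern[1:]):
                return True
        elif hostname == pattern:
            return True
    return False
```

Keeping the leading dot in the suffix comparison is what prevents a look-alike host such as `evilopenai.azure.com` from matching `*.openai.azure.com`.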
---
### 3.2 SIEM Webhook as Real-Time Exfiltration Channel
**Finding:** `siem.py` forwards every normalized event to `SIEM_WEBHOOK_URL` during ingestion:
```python
def forward_event(event):
    if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
        return
    requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
```
**Gap:** No URL validation at all. Unlike the LLM endpoint, the SIEM webhook has NO SSRF guard.
**Attack scenario:**
1. Attacker sets `SIEM_ENABLED=true` and `SIEM_WEBHOOK_URL=https://attacker.com/collect`
2. Every new audit event fetched from Graph is immediately POSTed to the attacker's URL
3. Attacker receives real-time stream of all tenant audit events
**Impact:** Real-time, continuous data exfiltration of all audit events.
**Recommendation:**
- Add the same SSRF validation to `SIEM_WEBHOOK_URL` that exists for `LLM_BASE_URL`
- Add `SIEM_ALLOWED_DOMAINS` config
- Log SIEM forwarding failures prominently
---
### 3.3 Export Features (JSON/CSV)
**Finding:** The frontend has `exportJSON()` and `exportCSV()` functions that download all currently filtered events. These are authenticated but not rate-limited separately from `/api/events`.
**Impact:** A compromised account can export large batches of events. However, this requires authentication and is bounded by the 500-event page size limit.
**Risk level:** LOW — requires valid auth and is noisy.
---
## 4. Webhook Abuse — MEDIUM
### 4.1 Graph Change Notification Webhook
**Finding:** `/api/webhooks/graph` receives Microsoft Graph change notifications:
- Echoes `validationToken` for subscription handshake
- Accepts notifications with optional `clientState` validation
- `WEBHOOK_CLIENT_SECRET` is empty by default
**Attack scenario 1 — Subscription hijacking:**
1. Attacker discovers the webhook URL (via API enumeration or guess)
2. Attacker creates a Graph subscription pointing to the AOC webhook URL
3. Graph delivers change notifications for the attacker-chosen resource to AOC, which accepts them as if they came from its own subscription
**Mitigation:** Notifications without a matching `clientState` are rejected when `WEBHOOK_CLIENT_SECRET` is configured, but it is empty by default.
**Attack scenario 2 — Validation token abuse:**
1. Attacker sends a POST to `/api/webhooks/graph?validationToken=<arbitrary content>`
2. AOC echoes the token back as `text/plain`
3. The reflected content could be used for cache poisoning or response splitting
**Mitigation:** Length and ASCII validation added in v1.7.12.
**Recommendation:**
- Require `WEBHOOK_CLIENT_SECRET` to be set in production
- Document that the webhook endpoint should NOT be exposed to the public internet
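A sketch of a strict `clientState` check that refuses notifications when no secret is configured. The helper name `client_state_valid` is an assumption, and the actual handler behind `/api/webhooks/graph` may be structured differently:

```python
import hmac


def client_state_valid(notification: dict, expected_secret: str) -> bool:
    """Return True only if a secret is configured and the notification's clientState matches it.

    Hypothetical helper: fails closed when expected_secret is empty.
    """
    if not expected_secret:
        return False
    received = notification.get("clientState") or ""
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))
```

Failing closed on an empty secret is the design choice that enforces the first recommendation: the endpoint simply does not process notifications until `WEBHOOK_CLIENT_SECRET` is set.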
---
## 5. Supply Chain — MEDIUM
### 5.1 CDN Scripts Without Subresource Integrity (SRI)
**Finding:** The frontend loads two external scripts without SRI hashes:
```html
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
```
**Attack scenario:**
1. `cdn.jsdelivr.net` or `alcdn.msauth.net` is compromised (supply chain attack)
2. Malicious JavaScript is served instead of the legitimate library
3. Malicious script can steal MSAL tokens, modify API requests, or exfiltrate data
**Impact:** Complete frontend compromise — token theft, data exfiltration, UI spoofing.
**Recommendation:**
- Add SRI hashes to both script tags:
```html
<script defer src="..." integrity="sha384-..." crossorigin="anonymous"></script>
```
- Or vendor the JS files and serve them from the same origin
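For reference, an SRI value is the algorithm name followed by the Base64-encoded digest of the exact file bytes. A minimal sketch of computing one, with `sri_hash` as a hypothetical helper name:

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity string such as 'sha384-<base64 digest>' for the given bytes."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

Because the hash pins the exact bytes served, it only works reliably against a fixed version URL. A floating tag such as `alpinejs@3.x.x` will fail the integrity check, and the script will be blocked, as soon as the tag resolves to a new release.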
---
## 6. Privilege Escalation — MEDIUM
### 6.1 Application Permissions Bypass User Boundaries
**Finding:** Because AOC uses application permissions (not delegated permissions), the backend can read audit logs for ALL users, not just the authenticated user. The privacy service filtering (`PRIVACY_SERVICES`) is the only boundary — and it's opt-in.
**Impact:** A user with minimal Entra permissions (e.g., a regular user who can authenticate) can view audit logs for the entire tenant if:
- `PRIVACY_SERVICES` is not configured, OR
- `PRIVACY_SERVICE_ROLES` is not configured
**Recommendation:**
- Document that AOC should be restricted to admin/security roles via `AUTH_ALLOWED_ROLES`
- Consider adding per-user event filtering (only show events where the authenticated user is the actor or target)
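The per-user filtering idea could be expressed as an extra MongoDB filter clause. This is a sketch under stated assumptions: the field names `actor_upn` and `target_upns` are hypothetical, as is the premise that stored UPNs are lowercased; the real event schema is not shown here:

```python
def per_user_filter(base_filter: dict, user_upn: str) -> dict:
    """Combine an existing MongoDB filter with an actor-or-target restriction.

    Hypothetical sketch: `actor_upn` (string) and `target_upns` (array of
    strings) are assumed field names, not the documented AOC schema.
    """
    upn = user_upn.strip().lower()
    if not upn:
        raise ValueError("user_upn must not be empty")

    # Matching a scalar against an array field selects documents whose array contains it.
    restriction = {"$or": [{"actor_upn": upn}, {"target_upns": upn}]}
    if not base_filter:
        return restriction
    return {"$and": [base_filter, restriction]}
```

Wrapping both parts in `$and` keeps the user's own filters intact while ensuring the restriction cannot be overridden by a caller-supplied `$or`.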
---
## 7. Miscellaneous Vectors — LOW
### 7.1 Token Cache in Memory
**Finding:** `_TOKEN_CACHE` in `graph/auth.py` is an in-memory dictionary. If an attacker gains code execution in the Python process, they can read the cache or call `get_access_token()` directly.
**Impact:** Attacker with code execution can get Graph API tokens. But if they have code execution, they already have `CLIENT_SECRET` from memory or `.env`.
### 7.2 MongoDB Connection String
**Finding:** `MONGO_URI` contains credentials. If an attacker gains filesystem access, they can connect directly to MongoDB and bypass all AOC auth/privacy controls.
**Mitigation:** MongoDB is internal to Docker network (not exposed to host in production compose file).
### 7.3 Audit Trail Log Injection
**Finding:** `audit_trail.log_action()` stores actions in MongoDB. The `details` dict could contain user-controlled data (e.g., filter values). If the audit log is ever rendered without escaping, this could lead to XSS.
**Risk level:** LOW — audit logs are not currently rendered in the UI.
---
## Risk Summary
| Vector | Severity | Likelihood | Requires |
|--------|----------|------------|----------|
| Client secret leak → full tenant read | **HIGH** | Medium | `.env` or container access |
| LLM endpoint hijacking → data exfil | **HIGH** | Low | `.env` write access |
| SIEM webhook hijacking → real-time exfil | **HIGH** | Low | `.env` write access |
| CDN compromise → frontend token theft | **MEDIUM** | Low | Supply chain attack |
| Role gating misconfig → all users access | **MEDIUM** | High | Misconfiguration |
| Webhook subscription hijacking | **MEDIUM** | Low | URL discovery |
| DNS compromise → fake JWKS | **MEDIUM** | Very low | Network compromise |
| Application permissions bypass boundaries | **MEDIUM** | High | Default config |
| Token replay | LOW | Low | Token theft |
| Audit log injection | LOW | Low | Filter manipulation |
---
## Immediate Recommendations
1. **Add LLM domain allowlist** (`LLM_ALLOWED_DOMAINS`) and validate at startup
2. **Add SIEM SSRF guard** — reuse `_validate_llm_url()` for `SIEM_WEBHOOK_URL`
3. **Add SRI hashes** to CDN script tags, or vendor the libraries
4. **Add startup warning** when auth is enabled but no `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` configured
5. **Document webhook security** — require `WEBHOOK_CLIENT_SECRET` in production
6. **Consider Key Vault integration** for `CLIENT_SECRET` and `LLM_API_KEY`
7. **Add per-user filtering option** — restrict events to those involving the authenticated user

View File

@@ -1 +1 @@
1.7.13
1.7.14

View File

@@ -1,4 +1,10 @@
from pydantic_settings import BaseSettings, SettingsConfigDict
from secrets_manager import load_key_vault_secrets
# Pre-load Azure Key Vault secrets into os.environ before pydantic-settings reads them.
# This is a no-op if AZURE_KEY_VAULT_NAME is not set.
load_key_vault_secrets()
from pydantic_settings import BaseSettings, SettingsConfigDict # noqa: E402
class Settings(BaseSettings):
@@ -80,6 +86,15 @@ class Settings(BaseSettings):
    DOCS_ENABLED: bool = False
    METRICS_ALLOWED_IPS: str = "127.0.0.1,::1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
    # LLM endpoint restriction (comma-separated domains, e.g. "api.openai.com,*.openai.azure.com")
    LLM_ALLOWED_DOMAINS: str = ""
    # SIEM webhook restriction (comma-separated domains)
    SIEM_ALLOWED_DOMAINS: str = ""
    # Optional Azure Key Vault integration for secrets
    AZURE_KEY_VAULT_NAME: str = ""
_settings = Settings()
@@ -134,3 +149,8 @@ RATE_LIMIT_WINDOW_SECONDS = _settings.RATE_LIMIT_WINDOW_SECONDS
DOCS_ENABLED = _settings.DOCS_ENABLED
METRICS_ALLOWED_IPS = _settings.METRICS_ALLOWED_IPS
LLM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.LLM_ALLOWED_DOMAINS.split(",") if d.strip()]
SIEM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.SIEM_ALLOWED_DOMAINS.split(",") if d.strip()]
AZURE_KEY_VAULT_NAME = _settings.AZURE_KEY_VAULT_NAME

View File

@@ -5,8 +5,8 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Admin Operations Center</title>
<link rel="stylesheet" href="/style.css?v=15" />
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js" integrity="sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be" crossorigin="anonymous"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" integrity="sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW" crossorigin="anonymous"></script>
</head>
<body>
<div class="page" x-data="aocApp()" x-init="initApp()">

View File

@@ -10,6 +10,8 @@ import structlog
from audit_trail import log_action
from config import (
AI_FEATURES_ENABLED,
AUTH_ALLOWED_GROUPS,
AUTH_ALLOWED_ROLES,
AUTH_ENABLED,
CORS_ORIGINS,
DOCS_ENABLED,
@@ -275,6 +277,13 @@ async def start_periodic_fetch():
        auth_enabled=AUTH_ENABLED,
        ai_enabled=AI_FEATURES_ENABLED,
    )
    # Warn when auth is enabled but no role/group restrictions are configured
    if AUTH_ENABLED and not AUTH_ALLOWED_ROLES and not AUTH_ALLOWED_GROUPS:
        logger.warning(
            "AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured. "
            "Any Entra user in the tenant can authenticate and access AOC. "
            "Set AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS to restrict access."
        )
    if ENABLE_PERIODIC_FETCH:
        app.state.fetch_task = asyncio.create_task(_periodic_fetch())

View File

@@ -16,3 +16,8 @@ gunicorn
mcp
redis
arq
# Optional: Azure Key Vault integration for secrets storage
# Uncomment if using AZURE_KEY_VAULT_NAME
# azure-identity
# azure-keyvault-secrets

View File

@@ -7,6 +7,7 @@ import httpx
import structlog
from auth import require_auth, user_can_access_privacy_services
from config import (
LLM_ALLOWED_DOMAINS,
LLM_API_KEY,
LLM_API_VERSION,
LLM_BASE_URL,
@@ -398,7 +399,7 @@ def _format_events_for_llm(
def _validate_llm_url(url: str):
"""Prevent SSRF by rejecting internal/reserved addresses."""
"""Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
from urllib.parse import urlparse
parsed = urlparse(url)
@@ -420,6 +421,12 @@ def _validate_llm_url(url: str):
    except ValueError:
        pass  # hostname is not an IP, which is fine
    # Enforce domain allowlist if configured
    if LLM_ALLOWED_DOMAINS:
        allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in LLM_ALLOWED_DOMAINS)
        if not allowed:
            raise RuntimeError(f"LLM_BASE_URL domain '{hostname}' is not in LLM_ALLOWED_DOMAINS")
def _build_chat_url(base_url: str, api_version: str) -> str:
base = base_url.rstrip("/")

View File

@@ -0,0 +1,76 @@
"""Optional Azure Key Vault integration for secrets storage.

If AZURE_KEY_VAULT_NAME is configured, sensitive secrets are fetched from
Azure Key Vault at startup and injected into the environment so that
pydantic-settings can read them. Falls back to .env / environment variables
when Key Vault is not configured.

Secret naming convention in Key Vault:
    aoc-client-secret → CLIENT_SECRET
    aoc-llm-api-key → LLM_API_KEY
    aoc-mongo-uri → MONGO_URI
    aoc-webhook-client-secret → WEBHOOK_CLIENT_SECRET
"""

import os

import structlog

logger = structlog.get_logger("aoc.secrets")

_KEY_VAULT_SECRET_MAP = {
    "aoc-client-secret": "CLIENT_SECRET",
    "aoc-llm-api-key": "LLM_API_KEY",
    "aoc-mongo-uri": "MONGO_URI",
    "aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
}


def _load_from_key_vault(vault_name: str) -> dict[str, str]:
    """Fetch secrets from Azure Key Vault and return as {env_name: value}."""
    try:
        from azure.identity import DefaultAzureCredential
        from azure.keyvault.secrets import SecretClient
    except ImportError as exc:
        raise RuntimeError(
            "Azure Key Vault libraries are not installed. Run: pip install azure-identity azure-keyvault-secrets"
        ) from exc
    vault_url = f"https://{vault_name}.vault.azure.net/"
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=vault_url, credential=credential)
    loaded = {}
    for kv_name, env_name in _KEY_VAULT_SECRET_MAP.items():
        try:
            secret = client.get_secret(kv_name)
            if secret.value:
                loaded[env_name] = secret.value
                logger.info("Loaded secret from Key Vault", secret_name=kv_name)
        except Exception as exc:
            logger.warning(
                "Failed to load secret from Key Vault",
                secret_name=kv_name,
                error=str(exc),
            )
    return loaded


def load_key_vault_secrets(vault_name: str | None = None):
    """Load secrets from Azure Key Vault into os.environ if configured.

    This should be called BEFORE pydantic-settings parses configuration.
    """
    vault = vault_name or os.environ.get("AZURE_KEY_VAULT_NAME", "")
    if not vault:
        return
    logger.info("Loading secrets from Azure Key Vault", vault_name=vault)
    secrets = _load_from_key_vault(vault)
    for env_name, value in secrets.items():
        os.environ[env_name] = value
    logger.info(
        "Key Vault secrets loaded",
        count=len(secrets),
        keys=list(secrets.keys()),
    )

View File

@@ -1,15 +1,43 @@
import ipaddress
import requests
import structlog
from config import SIEM_ENABLED, SIEM_WEBHOOK_URL
from config import SIEM_ALLOWED_DOMAINS, SIEM_ENABLED, SIEM_WEBHOOK_URL
logger = structlog.get_logger("aoc.siem")
def _validate_siem_url(url: str):
    """Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
    from urllib.parse import urlparse

    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise RuntimeError("SIEM_WEBHOOK_URL must use HTTPS")
    hostname = (parsed.hostname or "").lower()
    if not hostname:
        raise RuntimeError("SIEM_WEBHOOK_URL must have a valid hostname")
    blocked = {"localhost", "127.0.0.1", "0.0.0.0", "::1", "169.254.169.254"}
    if hostname in blocked:
        raise RuntimeError(f"SIEM_WEBHOOK_URL hostname '{hostname}' is not allowed")
    try:
        ip = ipaddress.ip_address(hostname)
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            raise RuntimeError(f"SIEM_WEBHOOK_URL IP '{hostname}' is not allowed")
    except ValueError:
        pass
    if SIEM_ALLOWED_DOMAINS:
        allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in SIEM_ALLOWED_DOMAINS)
        if not allowed:
            raise RuntimeError(f"SIEM_WEBHOOK_URL domain '{hostname}' is not in SIEM_ALLOWED_DOMAINS")


def forward_event(event: dict):
    """Forward a normalized event to the configured SIEM webhook."""
    if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
        return
    try:
        _validate_siem_url(SIEM_WEBHOOK_URL)
        res = requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
        res.raise_for_status()
        logger.debug("Event forwarded to SIEM", event_id=event.get("id"))