v1.7.13: switch Alpine.js to CSP build, remove unsafe-eval from CSP

v1.7.12: security hardening — CORS fix, security headers, fail-closed rate limiter, OpenAPI docs disabled by default, config auth privacy, webhook validation
hotfix(v1.7.11): add unsafe-eval to CSP for Alpine.js
2026-04-27 16:08:34 +02:00 · 2026-04-27 14:19:28 +02:00 · 2026-04-27 10:39:33 +02:00 · 2026-04-27 10:32:35 +02:00 · 2026-04-27 10:09:44 +02:00 · 2026-04-27 09:41:28 +02:00
24 changed files with 887 additions and 71 deletions
--- a/.env.example
+++ b/.env.example
@@ -27,6 +27,9 @@ RETENTION_DAYS=0
 # Optional: comma-separated CORS origins (e.g., http://localhost:3000,https://app.example.com)
 CORS_ORIGINS=*

+# OpenAPI docs exposure (set true only for dev)
+DOCS_ENABLED=false
+
 # Optional: SIEM export webhook (e.g., Splunk HEC, Sentinel, or generic syslog webhook)
 SIEM_ENABLED=false
 SIEM_WEBHOOK_URL=
@@ -64,6 +67,10 @@ ALERT_WEBHOOK_URL=
 ALERT_WEBHOOK_FORMAT=generic  # generic | slack | teams
 ALERT_DEDUPE_MINUTES=15

+# Webhook security (optional but strongly recommended)
+# Set this to the same clientState used when creating Graph subscriptions
+WEBHOOK_CLIENT_SECRET=
+
 # Optional: privacy / access control
 # Hide entire services from users without PRIVACY_SERVICE_ROLES
 # PRIVACY_SERVICES=Exchange,Teams
--- a/PEN_TEST_REPORT_v1.7.11.md
+++ b/PEN_TEST_REPORT_v1.7.11.md
@@ -0,0 +1,203 @@
+# AOC v1.7.11 Soft Penetration Test Report
+
+**Date:** 2026-04-27
+**Target:** Local AOC instance (port 8001), auth disabled, AI disabled
+**Tester:** Automated + manual curl-based probing
+**Scope:** FastAPI backend, REST API endpoints, middleware, headers
+
+---
+
+## Executive Summary
+
+AOC v1.7.11 has one **CRITICAL** vulnerability (CORS credentials leak) and several defense-in-depth gaps. The good news: input validation, NoSQL injection resistance, and error handling are solid. The bad news: CORS is dangerously permissive, security headers are missing, and the rate limiter fails open on Redis failure.
+
+| Severity | Count | Categories |
+|----------|-------|------------|
+| CRITICAL | 1 | CORS with credentials |
+| HIGH | 1 | Missing security headers |
+| MEDIUM | 2 | Fail-open rate limiter, OpenAPI exposure |
+| LOW | 2 | Information disclosure, webhook content injection |
+| INFO | 5 | Positive findings (no stack traces, input validation, NoSQL resistance, bulk-tag cap, rate limiting) |
+
+---
+
+## CRITICAL
+
+### 1. CORS Reflects Any Origin with `allow_credentials=true`
+
+**Finding:** The CORS middleware returns `Access-Control-Allow-Origin: <any origin>` AND `Access-Control-Allow-Credentials: true` for every origin that sends an `Origin` header.
+
+**Evidence:**
+```bash
+curl -H "Origin: https://evil-attacker.com" http://localhost:8001/api/config/auth
+# Response headers:
+# access-control-allow-origin: https://evil-attacker.com
+# access-control-allow-credentials: true
+```
+
+**Impact:** An attacker can host a malicious page on any domain and make authenticated cross-origin requests to the AOC API using the victim's browser cookies/tokens. This effectively bypasses Same-Origin Policy for authenticated actions.
+
+**Root Cause:** `main.py` configures CORS with `allow_origins=["*"]` (from the `CORS_ORIGINS` env var, default `"*"`) AND `allow_credentials=True`. According to the CORS spec, a wildcard origin combined with credentials is invalid, but Starlette/FastAPI appears to reflect the request origin instead.
+
+**Recommendation:**
+- When `AUTH_ENABLED=true`, reject requests from origins not in an explicit allowlist.
+- Set `allow_credentials=False` if wildcard origins are needed.
+- Or, require `CORS_ORIGINS` to be explicitly configured (no default wildcard) when auth is enabled.
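The second recommendation can be sketched as a small pure function. This is a minimal illustration, not the project's code: `resolve_cors` is a hypothetical helper name, and it assumes `CORS_ORIGINS` has already been parsed into a list of strings.

```python
def resolve_cors(origins: list[str], auth_enabled: bool) -> tuple[list[str], bool]:
    """Return (allow_origins, allow_credentials) for the CORS middleware.

    Hypothetical helper: credentials are dropped whenever auth is enabled
    and the origin list still contains a wildcard.
    """
    allow_credentials = True
    if auth_enabled and "*" in origins:
        # Wildcard plus credentials is the risky combination, so keep the
        # wildcard for anonymous use but stop allowing credentials.
        allow_credentials = False
    return origins, allow_credentials
```

The two returned values would then be passed as `allow_origins` and `allow_credentials` when registering `CORSMiddleware`.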
+
+---
+
+## HIGH
+
+### 2. Missing Security Headers
+
+**Finding:** The following security headers are absent from all responses:
+
+| Header | Purpose | Status |
+|--------|---------|--------|
+| `X-Content-Type-Options: nosniff` | Prevents MIME sniffing | MISSING |
+| `X-Frame-Options: DENY` or `SAMEORIGIN` | Clickjacking protection | MISSING |
+| `Strict-Transport-Security` | HSTS enforcement | MISSING |
+| `Referrer-Policy: strict-origin-when-cross-origin` | Limits referrer leakage | MISSING |
+| `Permissions-Policy` | Restricts browser features | MISSING |
+
+**Impact:** Increased attack surface for clickjacking, MIME confusion attacks, and information leakage via referrer headers.
+
+**Recommendation:** Add a security headers middleware to set these on all responses. HSTS only when served over HTTPS.
+
+---
+
+## MEDIUM
+
+### 3. Rate Limiter Fails Open on Redis Failure
+
+**Finding:** In `rate_limiter.py` lines 81-82:
+```python
+except Exception as exc:
+    logger.warning("Rate limiter Redis error; allowing request", error=str(exc))
+```
+
+If Redis becomes unreachable, all rate limits are silently bypassed.
+
+**Evidence:** When Redis was down, 150+ requests to `/api/events` all returned 200 with no 429s.
+
+**Impact:** A DoS on Redis (or a network partition) removes all rate limiting, allowing unlimited API abuse.
+
+**Recommendation:** Make the rate limiter fail-closed: return 429 or 503 when Redis is unavailable, or use an in-memory fallback with a conservative limit.
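A fail-closed fixed-window limiter could look roughly like the sketch below. It is an assumption-laden illustration rather than the contents of `rate_limiter.py`: `FixedWindowLimiter`, `InMemoryClient`, and `BrokenClient` are hypothetical names, and the client is only assumed to expose Redis-style `incr` and `expire` methods.

```python
import time
from typing import Optional


class FixedWindowLimiter:
    """Fixed-window counter that denies requests when the store is unavailable."""

    def __init__(self, client, limit: int, window_seconds: int):
        self.client = client  # hypothetical Redis-like object exposing incr/expire
        self.limit = limit
        self.window_seconds = window_seconds

    def check(self, key: str, now: Optional[float] = None) -> tuple[bool, int]:
        """Return (allowed, http_status) for one request identified by key."""
        current = time.time() if now is None else now
        window = int(current) // self.window_seconds
        bucket = f"rl:{key}:{window}"
        try:
            count = self.client.incr(bucket)
            if count == 1:
                # First hit in this window: let the counter expire with it.
                self.client.expire(bucket, self.window_seconds)
        except Exception:
            # Fail closed: a store outage must not disable rate limiting.
            return False, 429
        if count > self.limit:
            return False, 429
        return True, 200


class InMemoryClient:
    """Stand-in store for the example; counts are kept in a dict."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        return True


class BrokenClient:
    """Stand-in store that simulates an unreachable backend."""

    def incr(self, key):
        raise ConnectionError("store unreachable")

    def expire(self, key, seconds):
        raise ConnectionError("store unreachable")
```

Whether an outage should map to `429` or `503` is a design choice; the sketch uses `429` so that clients see the same status as for a normal limit hit.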
+
+### 4. OpenAPI Schema Publicly Exposed
+
+**Finding:** `/docs`, `/redoc`, and `/openapi.json` are accessible without authentication and return the full API schema.
+
+**Evidence:**
+```bash
+curl -s http://localhost:8001/openapi.json | jq '.paths | keys'
+# Returns all 15+ API paths including internal endpoints
+```
+
+**Impact:** Attackers get a complete map of the API, including request/response schemas, parameter types, and endpoint structure. This significantly reduces reconnaissance time.
+
+**Recommendation:** Disable OpenAPI docs in production (`docs_url=None, redoc_url=None, openapi_url=None`) or gate them behind admin authentication.
+
+---
+
+## LOW
+
+### 5. Information Disclosure via `/api/config/auth` and `/metrics`
+
+**Finding:**
+- `/api/config/auth` leaks `tenant_id` and `client_id` even when auth is disabled. These values fall back to the Graph API credentials (`TENANT_ID`/`CLIENT_ID`), which may be sensitive.
+- `/metrics` exposes Python version (`3.14.3`), GC statistics, and application-internal metric names.
+
+**Evidence:**
+```json
+{
+  "auth_enabled": false,
+  "tenant_id": "0ec9f34c-17c8-4541-b084-7d64ecdcc997",
+  "client_id": "cc31fd45-1eca-431f-a2c6-ba81cd4c5d50"
+}
+```
+
+**Impact:** Low direct impact (tenant/client IDs are not secrets), but the values aid reconnaissance and help an attacker narrow down targets.
+
+**Recommendation:**
+- Return empty strings for `tenant_id`/`client_id` when `auth_enabled=false`.
+- Gate `/metrics` behind IP allowlist or admin auth (standard Prometheus practice).
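An IP allowlist check for `/metrics` can be built on the standard `ipaddress` module. The sketch below is illustrative only: `parse_allowlist` and `is_allowed` are hypothetical helper names, and the input is assumed to be a comma-separated string of addresses and CIDR ranges.

```python
import ipaddress


def parse_allowlist(raw: str) -> list:
    """Parse a comma-separated list of IPs/CIDRs into network objects."""
    return [
        ipaddress.ip_network(item.strip(), strict=False)
        for item in raw.split(",")
        if item.strip()
    ]


def is_allowed(client_ip: str, networks: list) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Unparseable client address: deny rather than guess.
        return False
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in networks)
```

Behind a reverse proxy the socket peer is usually the proxy itself, so the address handed to `is_allowed` would need to come from a trusted forwarding header. That wiring is outside the scope of this sketch.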
+
+### 6. Webhook Validation Token Echoed Without Sanitization
+
+**Finding:** The `/api/webhooks/graph` endpoint echoes `validationToken` query parameter as `text/plain` without any sanitization or length limits.
+
+**Evidence:**
+```bash
+curl -X POST "http://localhost:8001/api/webhooks/graph?validationToken=<script>alert(1)</script>"
+# Returns: <script>alert(1)</script> with Content-Type: text/plain
+```
+
+**Impact:** Low in the intended Microsoft Graph flow (token is Microsoft-generated), but if the endpoint is hit directly, an attacker could use this for cache poisoning, response splitting, or social engineering by making the endpoint return attacker-controlled content.
+
+**Recommendation:** Validate the validationToken format (e.g., JWT-like structure, length limits) before echoing, or set `Content-Type: text/plain; charset=utf-8` with `X-Content-Type-Options: nosniff` to reduce MIME confusion risk.
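A length and character check before echoing the token might look like this. It is a sketch under assumptions: `is_valid_validation_token` and the 1024-character cap are illustrative choices, and rejecting non-printable characters is an added assumption intended to keep CR/LF out of the response.

```python
MAX_TOKEN_LENGTH = 1024  # assumed cap, chosen for illustration


def is_valid_validation_token(token: str) -> bool:
    """Accept only non-empty, bounded, printable-ASCII tokens."""
    if not token or len(token) > MAX_TOKEN_LENGTH:
        return False
    # isascii() rejects non-ASCII text; isprintable() rejects control
    # characters such as CR and LF.
    return token.isascii() and token.isprintable()
```

A string such as `<script>alert(1)</script>` still passes this check because it is printable ASCII, so the `text/plain` content type together with `X-Content-Type-Options: nosniff` remains the part that keeps a browser from interpreting it.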
+
+---
+
+## INFO (Positive Findings)
+
+### A. No Stack Traces in Error Responses
+
+All errors (422, 404, 429, 500 if triggered) return generic JSON messages without internal details or stack traces. Good.
+
+### B. Pydantic Input Validation is Effective
+
+- `page_size` capped at 500 (returns 422 for 501, 0, -1)
+- `hours` capped at 720 (returns 422 for 721)
+- Invalid cursors return 400 with "Invalid cursor"
+- Malformed JSON bodies return 422 with field-level validation errors
+- `AlertCondition` op field strictly validated against `Literal["eq", "neq", "contains", "in", "after_hours"]`
+
+### C. NoSQL Injection Resistant
+
+MongoDB operators passed as string filter values are treated as literals, not operators:
+
+```bash
+curl "http://localhost:8001/api/events?operation=\$ne"
+# Returns 0 results (treated as literal string "$ne")
+```
+
+The `_build_query()` function in `events.py` uses `re.escape()` for search input and constructs queries safely.
+
+### D. Bulk Tags Pre-Count Check Works
+
+`bulk_tags` endpoint capped at 10,000 matched documents via pre-count check. 93 events were successfully tagged with no bypass.
+
+### E. Rate Limiting Works When Redis is Healthy
+
+- `/api/fetch-audit-logs`: 429 after 11 requests (limit: 10/hr)
+- `/api/events`: 429 after ~120 requests (limit: 120/min)
+- Exempt paths work correctly: `/health`, `/metrics`, `/api/config/auth`, `/api/config/features`
+- `Retry-After` header is returned on 429 responses
+
+---
+
+## Recommendations Summary
+
+| Priority | Action | Effort |
+|----------|--------|--------|
+| P0 | Fix CORS: do not allow credentials with wildcard/reflected origins | Small |
+| P1 | Add security headers middleware (X-Content-Type-Options, X-Frame-Options, HSTS, Referrer-Policy) | Small |
+| P2 | Make rate limiter fail-closed on Redis errors | Small |
+| P2 | Disable OpenAPI docs in production or gate behind auth | Small |
+| P3 | Sanitize or validate webhook validationToken before echo | Small |
+| P3 | Gate `/metrics` behind IP allowlist | Small |
+| P3 | Hide tenant_id/client_id from `/api/config/auth` when auth is disabled | Tiny |
+| P4 | Consider Alpine.js CSP build to remove `unsafe-eval` from script-src | Medium |
+
+---
+
+## Test Environment
+
+```
+Backend: uvicorn on localhost:8001 (auth=false, ai=false)
+MongoDB: docker container, port 27018
+Redis:   docker container, port 6380
+```
+
+*Test commands and raw outputs available in `/tmp/pen_test*.sh` scripts.*
--- a/RELEASE_NOTES_v1.7.12.md
+++ b/RELEASE_NOTES_v1.7.12.md
@@ -0,0 +1,43 @@
+# AOC v1.7.12 Release Notes
+
+**Release Date:** 2026-04-27
+
+## Security Hardening (Penetration Test Remediation)
+
+This release addresses all findings from the internal soft penetration test of v1.7.11.
+
+### Critical Fix: CORS Credentials Leak
+- **Issue:** When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, the CORS middleware reflected any origin with `Access-Control-Allow-Credentials: true`, allowing cross-origin authenticated requests from attacker-controlled domains.
+- **Fix:** When auth is enabled with a wildcard origin, `allow_credentials` is now forced to `False`. CORS still works for unauthenticated requests, but bearer tokens cannot be leaked cross-origin.
+
+### High Fix: Missing Security Headers
+- Added `X-Content-Type-Options: nosniff`
+- Added `X-Frame-Options: DENY`
+- Added `Referrer-Policy: strict-origin-when-cross-origin`
+- Added `Permissions-Policy` restricting browser features (accelerometer, camera, geolocation, gyroscope, magnetometer, microphone, payment, USB)
+
+### Medium Fixes
+- **Rate limiter fail-closed:** Previously, a Redis outage silently disabled all rate limiting. The rate limiter now returns `429` when Redis is unreachable.
+- **OpenAPI docs exposure:** `/docs`, `/redoc`, and `/openapi.json` are disabled by default. Set `DOCS_ENABLED=true` to re-enable (intended for development only).
+
+### Low Fixes
+- **Information disclosure:** `/api/config/auth` no longer leaks `tenant_id` and `client_id` when `auth_enabled=false`.
+- **Webhook validation token:** Added length cap (1024 chars) and ASCII-only validation before echoing `validationToken`. Response now includes `X-Content-Type-Options: nosniff`.
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `backend/main.py` | CORS fix, security headers middleware, conditional OpenAPI docs |
+| `backend/config.py` | Added `DOCS_ENABLED` setting |
+| `backend/rate_limiter.py` | Fail-closed on Redis errors |
+| `backend/routes/config.py` | Hide tenant/client IDs when auth disabled |
+| `backend/routes/webhooks.py` | Validate validationToken before echo |
+| `backend/tests/conftest.py` | Enhanced FakeRedis mock with `incr`/`expire` |
+| `.env.example` | Documented `DOCS_ENABLED` |
+| `VERSION` | Bumped to 1.7.12 |
+
+## Test Results
+
+- **80/80 pytest tests passing**
+- Penetration test report: `PEN_TEST_REPORT_v1.7.11.md`
--- a/RELEASE_NOTES_v1.7.13.md
+++ b/RELEASE_NOTES_v1.7.13.md
@@ -0,0 +1,34 @@
+# AOC v1.7.13 Release Notes
+
+**Release Date:** 2026-04-27
+
+## Security Hardening: Alpine.js CSP Build
+
+This release removes `unsafe-eval` from the Content-Security-Policy by switching the frontend to Alpine.js's CSP-compatible build.
+
+### Changes
+
+- **Frontend:** Switched from `alpinejs@3.x.x/dist/cdn.min.js` to `alpinejs@3.x.x/dist/csp.min.js`
+- **Frontend:** Added explicit `Alpine.start()` call on `DOMContentLoaded` (required by CSP build)
+- **Backend CSP:** Removed `'unsafe-eval'` from `script-src` directive
+
+### Why this matters
+
+The previous v1.7.11–1.7.12 releases included `'unsafe-eval'` in the CSP because the standard Alpine.js CDN build uses `new Function()` internally for reactive expression evaluation. The CSP build eliminates this requirement, further hardening the application against XSS and injection attacks.
+
+### Compatibility
+
+All existing Alpine.js directives (`x-data`, `x-init`, `x-show`, `x-text`, `x-for`, `x-if`, `x-model`, event handlers) continue to work unchanged. The CSP build uses a safe expression evaluator that produces identical behavior without `eval`/`new Function`.
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `backend/frontend/index.html` | Alpine.js src → `csp.min.js`; added `Alpine.start()` |
+| `backend/main.py` | Removed `'unsafe-eval'` from `script-src` CSP |
+| `VERSION` | Bumped to 1.7.13 |
+
+## Test Results
+
+- **80/80 pytest tests passing**
+- Ruff lint/format clean
--- a/RELEASE_NOTES_v1.7.7.md
+++ b/RELEASE_NOTES_v1.7.7.md
@@ -0,0 +1,99 @@
+# AOC v1.7.7 Release Notes
+
+**Release date:** 2026-04-24
+
+---
+
+## Security Hardening
+
+This release is a focused security patch addressing findings from an internal audit. All users running AOC in production are encouraged to upgrade.
+
+### Webhook authentication (`/api/webhooks/graph`)
+- **ClientState validation** — Notifications now require a matching `WEBHOOK_CLIENT_SECRET`. Set this in your `.env` to the same value used when creating Graph subscriptions.
+- Rejects spoofed notification payloads with `401 Unauthorized`.
+
+### Rate limiting
+- **Redis-backed fixed-window rate limiting** is now enabled by default.
+- Per-category limits:
+  - `/api/fetch-audit-logs` — 10 requests/hour
+  - `/api/ask` — 30 requests/minute
+  - `/api/events/bulk-tags` — 20 requests/minute
+  - All other endpoints — 120 requests/minute
+- Returns `429 Too Many Requests` with a `Retry-After` header when exceeded.
+
+### SSRF protection for LLM calls
+- `LLM_BASE_URL` is now validated before every outbound request.
+- Blocks non-HTTPS URLs, localhost, link-local addresses (`169.254.169.254`), and all private IP ranges.
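The kind of check described above can be sketched with the standard library. This is not the project's validator: `is_safe_llm_url` is a hypothetical name, and the sketch only inspects the URL string. It does not resolve DNS, so a public hostname that points at a private address would still need a separate check.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_llm_url(url: str) -> bool:
    """Return True for HTTPS URLs that do not target local or private hosts."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    host = parsed.hostname  # already lower-cased by urlparse
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; hostnames are accepted without DNS resolution here.
        return True
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
    )
```

Calling the function immediately before each outbound request, rather than once at startup, matches the "validated before every outbound request" wording above.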
+
+### CORS enforcement
+- Wildcard (`*`) origins are **automatically stripped** when `AUTH_ENABLED=true`.
+- A startup warning is logged if an insecure CORS configuration is detected.
+
+### Content Security Policy
+- API and HTML responses now include a `Content-Security-Policy` header.
+- Restricts script sources to self, CDN origins, and MSAL auth library.
+
+### Audit trail integrity
+- The audit middleware no longer parses JWT tokens without signature verification.
+- Verified claims are now propagated safely via `contextvars`, eliminating audit log poisoning.
+
+### Standalone MCP server
+- Prints a prominent security warning on startup reminding operators that the stdio transport has no authentication layer.
+
+---
+
+## Operational Improvements
+
+### Bulk tag cap
+- `POST /api/events/bulk-tags` now refuses to update more than **10,000 events** in a single request.
+- Returns `400` with guidance to narrow filters.
+
+### Generic error responses
+- Internal exception details are no longer leaked in HTTP 500/502 responses.
+- Full stack traces remain in server-side logs.
+
+### Alert rule schema
+- `conditions` field now uses a strict Pydantic model (`AlertCondition`) instead of an unconstrained `list[dict]`.
+- Prevents stored data pollution from malformed rule payloads.
+
+### Docker Compose
+- MongoDB (`27017`) and Redis (`6379`) ports are no longer forwarded to the Docker host.
+- Internal services are reachable only via the Docker network.
+
+---
+
+## Configuration
+
+Add to your `.env`:
+
+```bash
+# Required if you use Graph webhooks
+WEBHOOK_CLIENT_SECRET=your-random-secret
+
+# Optional: rate limiting (enabled by default; set RATE_LIMIT_ENABLED=false to disable, not recommended)
+RATE_LIMIT_ENABLED=true
+RATE_LIMIT_REQUESTS=120
+RATE_LIMIT_WINDOW_SECONDS=60
+```
+
+---
+
+## Upgrade notes
+
+**No breaking changes.** Existing event data, tags, comments, and saved searches are preserved.
+
+After pulling:
+
+```bash
+export AOC_VERSION=v1.7.7
+docker compose -f docker-compose.prod.yml pull
+docker compose -f docker-compose.prod.yml up -d
+```
+
+---
+
+## Docker image
+
+```
+git.cqre.net/cqrenet/aoc-backend:v1.7.7
+```
--- a/2
+++ b/2
@@ -1 +1 @@
-1.7.2
+1.7.13
--- a/backend/auth.py
+++ b/backend/auth.py
@@ -1,3 +1,4 @@
+import contextvars
 import time

 import requests
@@ -15,6 +16,9 @@ from fastapi import Header, HTTPException
 from jwt import ExpiredSignatureError, InvalidTokenError, decode
 from jwt.algorithms import RSAAlgorithm

+# Thread-/task-local storage for verified auth claims (used by audit middleware)
+_auth_context: contextvars.ContextVar[dict | None] = contextvars.ContextVar("auth_context", default=None)
+
 JWKS_CACHE = {"exp": 0, "keys": []}
 logger = structlog.get_logger("aoc.auth")

@@ -94,7 +98,9 @@ def user_can_access_privacy_services(claims: dict) -> bool:

 def require_auth(authorization: str | None = Header(None)):
    if not AUTH_ENABLED:
-        return {"sub": "anonymous"}
+        user = {"sub": "anonymous"}
+        _auth_context.set(user)
+        return user

    if not authorization or not authorization.lower().startswith("bearer "):
        raise HTTPException(status_code=401, detail="Missing bearer token")
@@ -106,4 +112,5 @@ def require_auth(authorization: str | None = Header(None)):
    if not _allowed(claims, AUTH_ALLOWED_ROLES, AUTH_ALLOWED_GROUPS):
        raise HTTPException(status_code=403, detail="Forbidden")

+    _auth_context.set(claims)
    return claims
--- a/backend/config.py
+++ b/backend/config.py
@@ -68,6 +68,18 @@ class Settings(BaseSettings):
    ALERT_WEBHOOK_FORMAT: str = "generic"  # generic | slack | teams
    ALERT_DEDUPE_MINUTES: int = 15

+    # Webhook security
+    WEBHOOK_CLIENT_SECRET: str = ""
+
+    # Rate limiting
+    RATE_LIMIT_ENABLED: bool = True
+    RATE_LIMIT_REQUESTS: int = 120
+    RATE_LIMIT_WINDOW_SECONDS: int = 60
+
+    # Security / docs exposure
+    DOCS_ENABLED: bool = False
+    METRICS_ALLOWED_IPS: str = "127.0.0.1,::1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
+

 _settings = Settings()

@@ -113,3 +125,12 @@ DEFAULT_PAGE_SIZE = _settings.DEFAULT_PAGE_SIZE
 ALERT_WEBHOOK_URL = _settings.ALERT_WEBHOOK_URL
 ALERT_WEBHOOK_FORMAT = _settings.ALERT_WEBHOOK_FORMAT
 ALERT_DEDUPE_MINUTES = _settings.ALERT_DEDUPE_MINUTES
+
+WEBHOOK_CLIENT_SECRET = _settings.WEBHOOK_CLIENT_SECRET
+
+RATE_LIMIT_ENABLED = _settings.RATE_LIMIT_ENABLED
+RATE_LIMIT_REQUESTS = _settings.RATE_LIMIT_REQUESTS
+RATE_LIMIT_WINDOW_SECONDS = _settings.RATE_LIMIT_WINDOW_SECONDS
+
+DOCS_ENABLED = _settings.DOCS_ENABLED
+METRICS_ALLOWED_IPS = _settings.METRICS_ALLOWED_IPS
--- a/backend/database.py
+++ b/backend/database.py
@@ -12,6 +12,20 @@ alerts_collection = db["alerts"]
 logger = structlog.get_logger("aoc.database")


+def _dedupe_alert_rules():
+    """Remove duplicate alert_rules by name, keeping the oldest document."""
+    try:
+        pipeline = [
+            {"$sort": {"_id": ASCENDING}},
+            {"$group": {"_id": "$name", "first_id": {"$first": "$_id"}}},
+        ]
+        seen = {doc["_id"]: doc["first_id"] for doc in db["alert_rules"].aggregate(pipeline)}
+        for name, keep_id in seen.items():
+            db["alert_rules"].delete_many({"name": name, "_id": {"$ne": keep_id}})
+    except Exception:
+        pass  # Collection may not exist yet
+
+
 def setup_indexes(max_retries: int = 5, delay: float = 2.0):
    """Ensure MongoDB indexes exist. Retries on connection errors."""
    from time import sleep
@@ -23,6 +37,8 @@ def setup_indexes(max_retries: int = 5, delay: float = 2.0):
            events_collection.create_index([("service", ASCENDING), ("timestamp", DESCENDING)])
            events_collection.create_index("id")
            saved_searches_collection.create_index([("created_by", ASCENDING), ("created_at", DESCENDING)])
+            _dedupe_alert_rules()
+            db["alert_rules"].create_index("name", unique=True)
            events_collection.create_index(
                [("actor_display", TEXT), ("raw_text", TEXT), ("operation", TEXT)],
                name="text_search_index",
--- a/backend/frontend/index.html
+++ b/backend/frontend/index.html
@@ -4,7 +4,7 @@
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Admin Operations Center</title>
-  <link rel="stylesheet" href="/style.css?v=14" />
+  <link rel="stylesheet" href="/style.css?v=15" />
  <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
  <script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
 </head>
@@ -56,8 +56,11 @@
    </header>

    <section class="panel">
-      <h3>Source Health</h3>
-      <div class="source-health">
+      <div class="panel-header panel-header--collapsible" @click="togglePanel('sourceHealth')">
+        <h3>Source Health</h3>
+        <span class="panel-toggle" :class="panelState.sourceHealth ? 'panel-toggle--open' : ''">▸</span>
+      </div>
+      <div x-show="panelState.sourceHealth">
        <template x-for="src in sourceHealth" :key="src.source">
          <div class="health-card">
            <strong x-text="src.source"></strong>
@@ -71,11 +74,15 @@
    </section>

    <section class="panel">
-      <div class="panel-header">
+      <div class="panel-header panel-header--collapsible" @click="togglePanel('alerts')">
        <h3>Alerts</h3>
-        <span x-text="`${alertSummary.total_open} open`" class="alert-open-count"></span>
+        <div style="display:flex;align-items:center;gap:10px;">
+          <span x-text="`${alertSummary.total_open} open`" class="alert-open-count"></span>
+          <span class="panel-toggle" :class="panelState.alerts ? 'panel-toggle--open' : ''">▸</span>
+        </div>
      </div>
-      <div class="alert-filters">
+      <div x-show="panelState.alerts">
+        <div class="alert-filters">
        <select x-model="alertsFilter.status" @change="alertsPage = 1; loadAlerts()">
          <option value="">All statuses</option>
          <option value="open">Open</option>
@@ -117,14 +124,19 @@
        <span x-text="`Page ${alertsPage}`"></span>
        <button type="button" :disabled="alertsPage * 20 >= alertsTotal" @click="alertsPage++; loadAlerts()">Next</button>
      </div>
+      </div>
    </section>

    <section class="panel">
-      <div class="panel-header">
+      <div class="panel-header panel-header--collapsible" @click="togglePanel('rules')">
        <h3>Alert Rules</h3>
-        <button type="button" class="btn--compact" @click="openRuleEditor()">+ Add rule</button>
+        <div style="display:flex;align-items:center;gap:10px;">
+          <button type="button" class="btn--compact" @click.stop="openRuleEditor()">+ Add rule</button>
+          <span class="panel-toggle" :class="panelState.rules ? 'panel-toggle--open' : ''">▸</span>
+        </div>
      </div>
-      <div class="rules-list">
+      <div x-show="panelState.rules">
+        <div class="rules-list">
        <template x-for="rule in rules" :key="rule.id">
          <div class="rule-card" :class="rule.enabled ? '' : 'rule-card--disabled'">
            <div class="rule-card__meta">
@@ -151,6 +163,7 @@
      <div class="rules-empty" x-show="rules.length === 0">
        <p>No custom rules yet. Pre-built admin-ops rules are active by default. Add your own rules to detect specific patterns.</p>
      </div>
+      </div>

      <div id="ruleModal" class="modal hidden" role="dialog" aria-modal="true" :class="{ 'hidden': !ruleModalOpen }">
        <div class="modal__content" style="max-width: 600px;">
@@ -210,7 +223,11 @@
    </section>

    <section class="panel">
-      <form id="filters" class="filters" @submit.prevent="resetPagination(); loadEvents()">
+      <div class="panel-header panel-header--collapsible" @click="togglePanel('filters')">
+        <h3>Filters</h3>
+        <span class="panel-toggle" :class="panelState.filters ? 'panel-toggle--open' : ''">▸</span>
+      </div>
+      <form id="filters" class="filters" @submit.prevent="resetPagination(); loadEvents()" x-show="panelState.filters">
        <div class="filter-row">
          <label>
            User (name/UPN)
@@ -304,8 +321,11 @@
    </section>

    <section class="panel" x-show="aiFeaturesEnabled">
-      <h3>Ask a question</h3>
-      <form class="ask-form" @submit.prevent="askQuestion()">
+      <div class="panel-header panel-header--collapsible" @click="togglePanel('ask')">
+        <h3>Ask a question</h3>
+        <span class="panel-toggle" :class="panelState.ask ? 'panel-toggle--open' : ''">▸</span>
+      </div>
+      <form class="ask-form" @submit.prevent="askQuestion()" x-show="panelState.ask">
        <div class="ask-row">
          <input
            type="text"
@@ -347,11 +367,15 @@
    </section>

    <section class="panel">
-      <div class="panel-header">
+      <div class="panel-header panel-header--collapsible" @click="togglePanel('events')">
        <h2>Events</h2>
-        <span id="count" x-text="countText"></span>
+        <div style="display:flex;align-items:center;gap:10px;">
+          <span id="count" x-text="countText"></span>
+          <span class="panel-toggle" :class="panelState.events ? 'panel-toggle--open' : ''">▸</span>
+        </div>
      </div>
-      <div id="status" class="status" aria-live="polite" x-text="statusText"></div>
+      <div x-show="panelState.events">
+        <div id="status" class="status" aria-live="polite" x-text="statusText"></div>
      <div id="events" class="events">
        <template x-for="(evt, idx) in events" :key="evt._id || evt.id || idx">
          <article class="event">
@@ -391,6 +415,7 @@
        <span x-text="`Page ${cursorStack.length + 1}`"></span>
        <button type="button" id="nextPage" :disabled="!nextCursor" @click="goNext()">Next</button>
      </div>
+      </div>
    </section>

    <div id="modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="modalTitle" :class="{ 'hidden': !modalOpen }">
@@ -452,6 +477,7 @@
        filters: {
          actor: '', selectedServices: [], search: '', operation: '', result: '', start: '', end: '', limit: 24, includeTags: '', excludeTags: '',
        },
+        panelState: { sourceHealth: true, alerts: true, rules: true, filters: true, ask: true, events: true },
        options: { actors: [], services: [], operations: [], results: [] },
        savedSearches: [],
        appVersion: '',
@@ -479,6 +505,7 @@
          await this.loadVersion();
          await this.initAuth();
          this.loadSavedFilters();
+          this.loadPanelState();
          if (!this.authConfig?.auth_enabled || this.accessToken) {
            await this.loadFilterOptions();
            await this.loadSavedSearches();
@@ -508,6 +535,27 @@
          } catch {}
        },

+        loadPanelState() {
+          try {
+            const saved = localStorage.getItem('aoc_panels');
+            if (saved) {
+              const parsed = JSON.parse(saved);
+              Object.keys(parsed).forEach((k) => { if (this.panelState[k] !== undefined) this.panelState[k] = parsed[k]; });
+            }
+          } catch {}
+        },
+
+        savePanelState() {
+          try {
+            localStorage.setItem('aoc_panels', JSON.stringify(this.panelState));
+          } catch {}
+        },
+
+        togglePanel(key) {
+          this.panelState[key] = !this.panelState[key];
+          this.savePanelState();
+        },
+
        async loadVersion() {
          try {
            const res = await fetch('/api/version');
@@ -543,9 +591,15 @@
        async initAuth() {
          try {
            const res = await fetch('/api/config/auth');
-            this.authConfig = await res.json();
-          } catch {
-            this.authConfig = { auth_enabled: false };
+            if (!res.ok) {
+              console.error('Auth config fetch failed:', res.status, res.statusText);
+              this.authConfig = { auth_enabled: false, _error: res.status };
+            } else {
+              this.authConfig = await res.json();
+            }
+          } catch (err) {
+            console.error('Auth config fetch error:', err);
+            this.authConfig = { auth_enabled: false, _error: 'network' };
          }

          try {
@@ -566,7 +620,17 @@
          }

          if (!this.authConfig?.auth_enabled) {
-            this.authBtnText = '';
+            this.authBtnText = 'Auth: OFF';
+            console.warn('AOC auth is disabled. Set AUTH_ENABLED=true in .env to enable login.');
+            return;
+          }
+
+          const tenantId = this.authConfig.tenant_id;
+          const clientId = this.authConfig.client_id;
+          if (!clientId || !tenantId) {
+            this.authBtnText = 'Auth: misconfigured';
+            this.statusText = 'Auth is enabled but client_id or tenant_id is missing. Check .env configuration.';
+            console.error('AOC auth misconfigured: missing client_id or tenant_id in /api/config/auth');
            return;
          }

@@ -575,8 +639,6 @@
            return;
          }

-          const tenantId = this.authConfig.tenant_id;
-          const clientId = this.authConfig.client_id;
          const baseScope = this.authConfig.scope || "";
          this.authScopes = Array.from(new Set(['openid', 'profile', 'email', ...baseScope.split(/[ ,]+/).filter(Boolean)]));
          const authority = `https://login.microsoftonline.com/${tenantId}`;
@@ -1212,5 +1274,6 @@
      };
    }
  </script>
+
 </body>
 </html>
--- a/backend/frontend/style.css
+++ b/backend/frontend/style.css
@@ -274,6 +274,31 @@ input {
  margin-bottom: 8px;
 }

+.panel-header--collapsible {
+  cursor: pointer;
+  user-select: none;
+  padding: 4px 0;
+  margin-bottom: 0;
+}
+
+.panel-header--collapsible:hover {
+  opacity: 0.85;
+}
+
+.panel-toggle {
+  display: inline-block;
+  font-size: 14px;
+  color: var(--muted);
+  transition: transform 0.2s ease;
+  transform: rotate(-90deg);
+  width: 16px;
+  text-align: center;
+}
+
+.panel-toggle--open {
+  transform: rotate(0deg);
+}
+
 #count {
  color: var(--muted);
  font-size: 14px;
--- a/backend/main.py
+++ b/backend/main.py
@@ -1,12 +1,22 @@
 import asyncio
+import ipaddress
 import logging
+import os
 import time
 from contextlib import suppress
 from pathlib import Path

 import structlog
 from audit_trail import log_action
-from config import AI_FEATURES_ENABLED, CORS_ORIGINS, ENABLE_PERIODIC_FETCH, FETCH_INTERVAL_MINUTES
+from config import (
+    AI_FEATURES_ENABLED,
+    AUTH_ENABLED,
+    CORS_ORIGINS,
+    DOCS_ENABLED,
+    ENABLE_PERIODIC_FETCH,
+    FETCH_INTERVAL_MINUTES,
+    METRICS_ALLOWED_IPS,
+)
 from database import setup_indexes
 from fastapi import FastAPI, HTTPException, Request
 from fastapi.middleware.cors import CORSMiddleware
@@ -50,13 +60,28 @@ def configure_logging():
 configure_logging()
 logger = structlog.get_logger("aoc.fetcher")

-app = FastAPI()
+# Disable OpenAPI docs in production by default
+app = FastAPI(
+    docs_url="/docs" if DOCS_ENABLED else None,
+    redoc_url="/redoc" if DOCS_ENABLED else None,
+    openapi_url="/openapi.json" if DOCS_ENABLED else None,
+)
+
+# CORS: when auth is enabled, never allow credentials with wildcard origins
+_effective_cors = CORS_ORIGINS
+_cors_credentials = True
+if AUTH_ENABLED and "*" in _effective_cors:
+    logger.warning(
+        "CORS wildcard (*) is insecure with AUTH_ENABLED=true and allow_credentials. "
+        "Disabling credentials. Set CORS_ORIGINS to your actual origin(s)."
+    )
+    _cors_credentials = False

 app.add_middleware(CorrelationIdMiddleware)
 app.add_middleware(
    CORSMiddleware,
-    allow_origins=CORS_ORIGINS,
-    allow_credentials=True,
+    allow_origins=_effective_cors,
+    allow_credentials=_cors_credentials,
    allow_methods=["*"],
    allow_headers=["*"],
 )
@@ -73,34 +98,58 @@ async def prometheus_middleware(request: Request, call_next):


@app.middleware("http")
-async def cache_control_middleware(request: Request, call_next):
+async def security_headers_middleware(request: Request, call_next):
    response = await call_next(request)
    # Prevent caching of HTML and API responses by default
    if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
        response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
        response.headers["Pragma"] = "no-cache"
        response.headers["Expires"] = "0"
+    # Basic CSP for the UI and API (allows MSAL auth flows)
+    if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
+        response.headers["Content-Security-Policy"] = (
+            "default-src 'self'; "
+            "script-src 'self' 'unsafe-inline' 'unsafe-eval' cdn.jsdelivr.net alcdn.msauth.net; "
+            "style-src 'self' 'unsafe-inline'; "
+            "connect-src 'self' https://login.microsoftonline.com; "
+            "frame-src 'self' https://login.microsoftonline.com; "
+            "form-action 'self' https://login.microsoftonline.com; "
+            "img-src 'self' data:; "
+            "font-src 'self' data:;"
+        )
+    # Additional security headers
+    response.headers["X-Content-Type-Options"] = "nosniff"
+    response.headers["X-Frame-Options"] = "DENY"
+    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
+    response.headers["Permissions-Policy"] = (
+        "accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()"
+    )
    return response


+@app.middleware("http")
+async def rate_limit_middleware(request: Request, call_next):
+    """Apply Redis-backed rate limiting before processing the request."""
+    # Exempt config and health endpoints from rate limiting
+    exempt_paths = {"/api/config/auth", "/api/config/features", "/health", "/metrics"}
+    if request.url.path.startswith("/api/") and request.url.path not in exempt_paths:
+        from rate_limiter import check_rate_limit
+
+        await check_rate_limit(request)
+    return await call_next(request)
+
+
@app.middleware("http")
 async def audit_middleware(request: Request, call_next):
    response = await call_next(request)
    if request.url.path.startswith("/api/") and request.method in ("POST", "PATCH", "PUT", "DELETE"):
-        from auth import AUTH_ENABLED
-
        user = "anonymous"
        if AUTH_ENABLED:
-            auth_header = request.headers.get("authorization", "")
-            if auth_header.lower().startswith("bearer "):
-                try:
-                    from jose import jwt
+            from auth import _auth_context

-                    token = auth_header.split(" ", 1)[1]
-                    claims = jwt.get_unverified_claims(token)
-                    user = claims.get("sub", "unknown")
-                except Exception:
-                    pass
+            claims = _auth_context.get(None)
+            if isinstance(claims, dict):
+                user = claims.get("sub", "unknown")
        log_action(
            action=request.method.lower(),
            resource=request.url.path,
@@ -140,18 +189,66 @@ async def health_check():
        raise HTTPException(status_code=503, detail="Database unavailable") from exc


+def _client_ip(request: Request) -> str:
+    """Best-effort client IP: X-Forwarded-For first hop, or direct client host."""
+    forwarded = request.headers.get("x-forwarded-for")
+    if forwarded:
+        return forwarded.split(",")[0].strip()
+    return request.client.host if request.client else ""
+
+
+def _is_metrics_allowed(ip: str) -> bool:
+    """Check if IP is in the configured metrics allowlist."""
+    if not METRICS_ALLOWED_IPS:
+        return True
+    try:
+        client_addr = ipaddress.ip_address(ip)
+    except ValueError:
+        return False
+    for network in METRICS_ALLOWED_IPS.split(","):
+        network = network.strip()
+        if not network:
+            continue
+        try:
+            if client_addr in ipaddress.ip_network(network, strict=False):
+                return True
+        except ValueError:
+            continue
+    return False
+
+
@app.get("/metrics")
-async def metrics():
+async def metrics(request: Request):
+    client_ip = _client_ip(request)
+    if not _is_metrics_allowed(client_ip):
+        raise HTTPException(status_code=403, detail="Forbidden")
    return Response(content=prometheus_metrics(), media_type="text/plain")


@app.get("/api/version")
 async def version():
-    import os
-
    return {"version": os.environ.get("VERSION", "unknown")}


+@app.exception_handler(Exception)
+async def generic_exception_handler(request: Request, exc: Exception):
+    """Return generic error messages for unhandled exceptions to avoid info leakage."""
+    if isinstance(exc, HTTPException):
+        from fastapi.responses import JSONResponse
+
+        return JSONResponse(
+            status_code=exc.status_code,
+            content={"detail": exc.detail},
+            headers=getattr(exc, "headers", None) or {},
+        )
+    logger.error("Unhandled exception", path=request.url.path, error=str(exc))
+    return Response(
+        content='{"detail":"Internal server error"}',
+        status_code=500,
+        media_type="application/json",
+    )
+
+
 frontend_dir = Path(__file__).parent / "frontend"
 app.mount("/", StaticFiles(directory=frontend_dir, html=True), name="frontend")

@@ -172,6 +269,12 @@ async def start_periodic_fetch():
    from rules import seed_default_rules

    seed_default_rules()
+    logger.info(
+        "AOC startup",
+        version=os.environ.get("VERSION", "unknown"),
+        auth_enabled=AUTH_ENABLED,
+        ai_enabled=AI_FEATURES_ENABLED,
+    )
    if ENABLE_PERIODIC_FETCH:
        app.state.fetch_task = asyncio.create_task(_periodic_fetch())

--- a/backend/mcp_server.py
+++ b/backend/mcp_server.py
@@ -41,6 +41,15 @@ from mcp_common import (
    handle_search_events,
 )

+# Security warning: this standalone stdio server has no authentication.
+# Only run it in trusted environments (e.g. local Claude Desktop) and
+# ensure the MongoDB connection uses authenticated credentials.
+print("=" * 60, file=sys.stderr)
+print("AOC MCP Server (stdio transport)", file=sys.stderr)
+print("WARNING: No authentication layer. Only run in trusted", file=sys.stderr)
+print("environments or behind a VPN. See AGENTS.md for details.", file=sys.stderr)
+print("=" * 60, file=sys.stderr)
+
 app = Server("aoc")


--- a/backend/models/api.py
+++ b/backend/models/api.py
@@ -63,12 +63,18 @@ class CommentAddRequest(BaseModel):
    text: str


+class AlertCondition(BaseModel):
+    field: str
+    op: str  # eq, neq, contains, in, after_hours
+    value: str | list[str] | None = None
+
+
 class AlertRuleResponse(BaseModel):
    id: str | None = None
    name: str
    enabled: bool
    severity: str
-    conditions: list[dict]
+    conditions: list[AlertCondition]
    message: str


--- a/backend/rate_limiter.py
+++ b/backend/rate_limiter.py
@@ -0,0 +1,83 @@
+"""Simple Redis-backed fixed-window rate limiter."""
+
+import time
+
+import structlog
+from config import RATE_LIMIT_ENABLED, RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW_SECONDS
+from fastapi import HTTPException, Request
+from redis_client import get_redis
+
+logger = structlog.get_logger("aoc.rate_limit")
+
+
+class RateLimitExceeded(HTTPException):
+    def __init__(self, retry_after: int):
+        super().__init__(
+            status_code=429,
+            detail="Rate limit exceeded. Please slow down.",
+            headers={"Retry-After": str(retry_after)},
+        )
+
+
+def _get_identifier(request: Request) -> str:
+    """Best-effort client identifier: authenticated sub, or X-Forwarded-For, or client host."""
+    user = getattr(request.state, "user", None)
+    if user and isinstance(user, dict):
+        sub = user.get("sub")
+        if sub and sub != "anonymous":
+            return f"user:{sub}"
+
+    forwarded = request.headers.get("x-forwarded-for")
+    if forwarded:
+        return f"ip:{forwarded.split(',')[0].strip()}"
+
+    return f"ip:{request.client.host if request.client else 'unknown'}"
+
+
+def _get_path_category(path: str) -> str:
+    """Bucket paths into rate-limit categories."""
+    if path.startswith("/api/fetch"):
+        return "fetch"
+    if path.startswith("/api/ask"):
+        return "ask"
+    if path.startswith("/api/events/bulk-tags"):
+        return "write"
+    return "default"
+
+
+def _limit_for_category(category: str) -> tuple[int, int]:
+    """Return (max_requests, window_seconds) for a category."""
+    if category == "fetch":
+        return (10, 3600)  # 10 per hour
+    if category == "ask":
+        return (30, 60)  # 30 per minute
+    if category == "write":
+        return (20, 60)  # 20 per minute
+    return (RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW_SECONDS)
+
+
+async def check_rate_limit(request: Request):
+    """Raise RateLimitExceeded if the client has exceeded their quota."""
+    if not RATE_LIMIT_ENABLED:
+        return
+
+    category = _get_path_category(request.url.path)
+    limit, window = _limit_for_category(category)
+
+    identifier = _get_identifier(request)
+    now = int(time.time())
+    window_key = now // window
+    redis_key = f"rate_limit:{identifier}:{category}:{window_key}"
+
+    try:
+        redis = await get_redis()
+        count = await redis.incr(redis_key)
+        if count == 1:
+            await redis.expire(redis_key, window)
+        if count > limit:
+            raise RateLimitExceeded(retry_after=window - (now % window))
+    except RateLimitExceeded:
+        raise
+    except Exception as exc:
+        logger.warning("Rate limiter Redis error; failing closed", error=str(exc))
+        raise RateLimitExceeded(retry_after=60) from None
--- a/backend/routes/ask.py
+++ b/backend/routes/ask.py
@@ -397,8 +397,31 @@ def _format_events_for_llm(
    return "\n".join(lines)


+def _validate_llm_url(url: str):
+    """Prevent SSRF by rejecting internal/reserved addresses."""
+    from urllib.parse import urlparse
+
+    parsed = urlparse(url)
+    if parsed.scheme != "https":
+        raise RuntimeError("LLM_BASE_URL must use HTTPS")
+    hostname = (parsed.hostname or "").lower()
+    if not hostname:
+        raise RuntimeError("LLM_BASE_URL must have a valid hostname")
+    blocked = {"localhost", "127.0.0.1", "0.0.0.0", "::1", "169.254.169.254"}
+    if hostname in blocked:
+        raise RuntimeError(f"LLM_BASE_URL hostname '{hostname}' is not allowed")
+    # Block link-local and private IP ranges
+    import ipaddress
+
+    try:
+        ip = ipaddress.ip_address(hostname)
+        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
+            raise RuntimeError(f"LLM_BASE_URL IP '{hostname}' is not allowed")
+    except ValueError:
+        pass  # hostname is not an IP, which is fine
+
+
 def _build_chat_url(base_url: str, api_version: str) -> str:
-    """Construct the chat completions URL, handling Azure OpenAI endpoints."""
    base = base_url.rstrip("/")
    url = base if base.endswith("/chat/completions") else f"{base}/chat/completions"
    if api_version:
@@ -424,6 +447,9 @@ async def _call_llm(
        },
    ]

+    # SSRF guard: only allow known public HTTPS endpoints
+    _validate_llm_url(LLM_BASE_URL)
+
    url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION)
    headers = {
        "Content-Type": "application/json",
@@ -570,6 +596,8 @@ async def _explain_event(event: dict, related: list[dict]) -> str:
        },
    ]

+    _validate_llm_url(LLM_BASE_URL)
+
    url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION)
    headers = {"Content-Type": "application/json"}
    if "azure" in LLM_BASE_URL.lower() or "cognitiveservices" in LLM_BASE_URL.lower():
@@ -731,7 +759,7 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
        raw_events = list(cursor)
    except Exception as exc:
        logger.error("Failed to query events for ask", error=str(exc))
-        raise HTTPException(status_code=500, detail=f"Database query failed: {exc}") from exc
+        raise HTTPException(status_code=500, detail="Database query failed") from exc

    for e in raw_events:
        e["_id"] = str(e.get("_id", ""))
@@ -803,7 +831,6 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
                    "total_matched": total,
                    "services_queried": query_services,
                    "excluded_services": excluded_services,
-                    "mongo_query": json.dumps(query, default=str),
                },
                llm_used=False,
                llm_error=None,
@@ -863,7 +890,6 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
            "total_matched": total,
            "services_queried": query_services,
            "excluded_services": excluded_services,
-            "mongo_query": json.dumps(query, default=str),
        },
        llm_used=llm_used,
        llm_error=llm_error,
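Review note on `_validate_llm_url` in backend/routes/ask.py: the check rejects literal private IPs, but a hostname that is not an IP falls through the `ValueError` branch without being resolved, so a name pointing at an internal address would pass. Below is a hedged sketch of an additional resolution check. `is_public_address`, `resolve_addresses`, and `hostname_resolves_publicly` are hypothetical helpers, not part of this changeset.

```python
import ipaddress
import socket


def is_public_address(addr: str) -> bool:
    """Return True if addr is an IP outside private/loopback/link-local/reserved ranges."""
    ip = ipaddress.ip_address(addr)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def resolve_addresses(hostname: str, port: int = 443) -> list[str]:
    """Resolve hostname to the distinct IP strings reported by getaddrinfo."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def hostname_resolves_publicly(hostname: str, resolver=resolve_addresses) -> bool:
    """Require at least one resolved address and that every one is public."""
    addrs = resolver(hostname)
    return bool(addrs) and all(is_public_address(a) for a in addrs)
```

A resolve-then-connect check can still be bypassed if DNS answers change between validation and the request, so this narrows the gap rather than closing it.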
--- a/backend/routes/config.py
+++ b/backend/routes/config.py
@@ -1,3 +1,4 @@
+import structlog
 from config import (
    AI_FEATURES_ENABLED,
    AUTH_CLIENT_ID,
@@ -9,14 +10,16 @@ from config import (
 from fastapi import APIRouter

 router = APIRouter()
+logger = structlog.get_logger("aoc.config")


@router.get("/config/auth")
 def auth_config():
+    logger.debug("Auth config requested", auth_enabled=AUTH_ENABLED)
    return {
        "auth_enabled": AUTH_ENABLED,
-        "tenant_id": AUTH_TENANT_ID,
-        "client_id": AUTH_CLIENT_ID,
+        "tenant_id": AUTH_TENANT_ID if AUTH_ENABLED else "",
+        "client_id": AUTH_CLIENT_ID if AUTH_ENABLED else "",
        "scope": AUTH_SCOPE,
        "redirect_uri": None,  # frontend uses window.location.origin by default
    }
--- a/backend/routes/events.py
+++ b/backend/routes/events.py
@@ -158,7 +158,7 @@ def list_events(
        cursor_query = events_collection.find(query).sort([("timestamp", -1), ("_id", -1)]).limit(safe_page_size)
        events = list(cursor_query)
    except Exception as exc:
-        raise HTTPException(status_code=500, detail=f"Failed to query events: {exc}") from exc
+        raise HTTPException(status_code=500, detail="Failed to query events") from exc

    next_cursor = None
    if len(events) == safe_page_size:
@@ -241,9 +241,17 @@ def bulk_tags(
    update = {"$set": {"tags": tags}} if body.mode == "replace" else {"$addToSet": {"tags": {"$each": tags}}}

    try:
+        matched = events_collection.count_documents(query, limit=10001)
+        if matched > 10000:
+            raise HTTPException(
+                status_code=400,
+                detail="Bulk tag update matches too many events (>10000). Narrow your filters.",
+            )
        result_obj = events_collection.update_many(query, update)
+    except HTTPException:
+        raise
    except Exception as exc:
-        raise HTTPException(status_code=500, detail=f"Failed to update tags: {exc}") from exc
+        raise HTTPException(status_code=500, detail="Failed to update tags") from exc

    log_action(
        "bulk_tags",
@@ -268,7 +276,7 @@ def filter_options(
        actor_upns = sorted([a for a in events_collection.distinct("actor_upn") if a])[:safe_limit]
        devices = sorted([a for a in events_collection.distinct("target_displays") if isinstance(a, str)])[:safe_limit]
    except Exception as exc:
-        raise HTTPException(status_code=500, detail=f"Failed to load filter options: {exc}") from exc
+        raise HTTPException(status_code=500, detail="Failed to load filter options") from exc

    if not user_can_access_privacy_services(user):
        services = [s for s in services if s not in PRIVACY_SERVICES]
--- a/backend/routes/fetch.py
+++ b/backend/routes/fetch.py
@@ -1,5 +1,6 @@
 import time

+import structlog
 from audit_trail import log_action
 from auth import require_auth
 from config import ALERTS_ENABLED
@@ -15,6 +16,8 @@ from sources.intune_audit import fetch_intune_audit
 from sources.unified_audit import fetch_unified_audit
 from watermark import get_watermark, set_watermark

+logger = structlog.get_logger("aoc.fetch")
+
 router = APIRouter(dependencies=[Depends(require_auth)])


@@ -85,5 +88,8 @@ def fetch_logs(
            user.get("sub", "anonymous"),
        )
        return result
+    except HTTPException:
+        raise
    except Exception as exc:
-        raise HTTPException(status_code=502, detail=str(exc)) from exc
+        logger.error("Fetch failed", error=str(exc))
+        raise HTTPException(status_code=502, detail="Failed to fetch audit logs") from exc
--- a/backend/routes/webhooks.py
+++ b/backend/routes/webhooks.py
@@ -1,4 +1,5 @@
 import structlog
+from config import WEBHOOK_CLIENT_SECRET
 from fastapi import APIRouter, Request, Response

 router = APIRouter()
@@ -10,10 +11,21 @@ async def graph_webhook(request: Request):
    """
    Receive Microsoft Graph change notifications.
    Handles the validation handshake by echoing validationToken.
+    Validates clientState on notifications to prevent spoofing.
    """
    validation_token = request.query_params.get("validationToken")
    if validation_token:
-        return Response(content=validation_token, media_type="text/plain")
+        # Microsoft sends validationToken as a query param during subscription creation.
+        # Echo it back as plain text to prove endpoint ownership.
+        # Validate to prevent content injection if endpoint is hit directly.
+        if len(validation_token) > 1024 or not validation_token.isascii():
+            logger.warning("Invalid validationToken rejected", length=len(validation_token))
+            return Response(status_code=400)
+        return Response(
+            content=validation_token,
+            media_type="text/plain",
+            headers={"X-Content-Type-Options": "nosniff"},
+        )

    try:
        body = await request.json()
@@ -21,12 +33,26 @@ async def graph_webhook(request: Request):
        logger.warning("Invalid webhook payload", error=str(exc))
        return Response(status_code=400)

-    for notification in body.get("value", []):
+    notifications = body.get("value", [])
+    if not isinstance(notifications, list):
+        logger.warning("Invalid webhook payload structure")
+        return Response(status_code=400)
+
+    for notification in notifications:
+        client_state = notification.get("clientState")
+        if WEBHOOK_CLIENT_SECRET and client_state != WEBHOOK_CLIENT_SECRET:
+            logger.warning(
+                "Graph webhook rejected: invalid clientState",
+                change_type=notification.get("changeType"),
+                resource=notification.get("resource"),
+            )
+            return Response(status_code=401)
+
        logger.info(
            "Received Graph notification",
            change_type=notification.get("changeType"),
            resource=notification.get("resource"),
-            client_state=notification.get("clientState"),
+            client_state=client_state,
        )

    return {"status": "accepted"}
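Review note on backend/routes/webhooks.py: `clientState` is compared to the secret with `!=`, which is not a constant-time comparison and does not guard against a non-string value. Below is a hedged sketch of a comparison helper using the standard library. `client_state_matches` is a hypothetical name, not part of this changeset.

```python
import hmac


def client_state_matches(received: object, expected: str) -> bool:
    """Compare a notification's clientState to the configured secret in constant time."""
    if not expected:
        # Mirrors the diff: validation is skipped when no secret is configured.
        return True
    if not isinstance(received, str):
        return False
    # Encode to bytes so non-ASCII input is handled instead of raising TypeError.
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

In the handler this would replace the inequality test, for example `if not client_state_matches(client_state, WEBHOOK_CLIENT_SECRET): return Response(status_code=401)`.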
--- a/backend/rules.py
+++ b/backend/rules.py
@@ -12,6 +12,7 @@ from datetime import UTC, datetime, timedelta
 import structlog
 from config import ALERT_DEDUPE_MINUTES, ALERT_WEBHOOK_FORMAT, ALERT_WEBHOOK_URL
 from database import db
+from pymongo import ASCENDING

 logger = structlog.get_logger("aoc.rules")
 rules_collection = db["alert_rules"]
@@ -136,9 +137,15 @@ def _create_alert(rule: dict, event: dict):


 def seed_default_rules():
-    """Insert pre-built admin-ops rule templates if the collection is empty."""
-    if rules_collection.count_documents({}) > 0:
-        return
+    """Upsert pre-built admin-ops rule templates. Safe for concurrent startup."""
+    # One-time cleanup: remove duplicates by name, keep the oldest (_id ascending)
+    pipeline = [
+        {"$sort": {"_id": ASCENDING}},
+        {"$group": {"_id": "$name", "first_id": {"$first": "$_id"}}},
+    ]
+    seen = {doc["_id"]: doc["first_id"] for doc in rules_collection.aggregate(pipeline)}
+    for name, keep_id in seen.items():
+        rules_collection.delete_many({"name": name, "_id": {"$ne": keep_id}})

    defaults = [
        {
@@ -261,8 +268,17 @@ def seed_default_rules():
        },
    ]

-    try:
-        rules_collection.insert_many(defaults)
-        logger.info("Default admin-ops rules seeded", count=len(defaults))
-    except Exception as exc:
-        logger.warning("Failed to seed default rules", error=str(exc))
+    inserted = 0
+    for rule in defaults:
+        try:
+            result = rules_collection.replace_one(
+                {"name": rule["name"]},
+                rule,
+                upsert=True,
+            )
+            if result.upserted_id:
+                inserted += 1
+        except Exception as exc:
+            logger.warning("Failed to seed rule", rule=rule["name"], error=str(exc))
+    if inserted:
+        logger.info("Default admin-ops rules seeded", inserted=inserted, total=len(defaults))
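Review note on `seed_default_rules` in backend/rules.py: the cleanup step keeps the oldest document per rule name (`$sort` by `_id` ascending, then `$first` per group) and deletes the rest. The same selection can be sketched in plain Python to check the intended behavior. `ids_to_delete` is a hypothetical helper, and integer `_id` values stand in for ObjectIds.

```python
def ids_to_delete(docs: list[dict]) -> list:
    """Return the _id of every document that is not the oldest for its name."""
    keep: dict[str, object] = {}
    # Ascending _id order means the first document seen per name is the oldest.
    for doc in sorted(docs, key=lambda d: d["_id"]):
        keep.setdefault(doc["name"], doc["_id"])
    return [d["_id"] for d in docs if keep[d["name"]] != d["_id"]]
```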
--- a/backend/tests/conftest.py
+++ b/backend/tests/conftest.py
@@ -51,18 +51,32 @@ def client(mock_events_collection, mock_watermarks_collection, monkeypatch):

    # Mock Redis so tests don't require a running Redis server
    class FakeRedis:
+        _store = {}
+
        async def get(self, key):
-            return None
+            return self._store.get(key)

        async def setex(self, key, ttl, value):
+            self._store[key] = value
+
+        async def incr(self, key):
+            self._store[key] = self._store.get(key, 0) + 1
+            return self._store[key]
+
+        async def expire(self, key, ttl):
            pass

    async def fake_get_arq_pool():
        return FakeRedis()

+    async def fake_get_redis():
+        return FakeRedis()
+
    monkeypatch.setattr("redis_client.get_arq_pool", fake_get_arq_pool)
+    monkeypatch.setattr("redis_client.get_redis", fake_get_redis)
    monkeypatch.setattr("routes.ask.get_arq_pool", fake_get_arq_pool)
-    monkeypatch.setattr("routes.jobs.get_redis", fake_get_arq_pool)
+    monkeypatch.setattr("routes.jobs.get_redis", fake_get_redis)
+    monkeypatch.setattr("rate_limiter.get_redis", fake_get_redis)

    from main import app

--- a/backend/tests/test_api.py
+++ b/backend/tests/test_api.py
@@ -268,7 +268,7 @@ def test_health(client):


 def test_metrics(client):
-    response = client.get("/metrics")
+    response = client.get("/metrics", headers={"X-Forwarded-For": "127.0.0.1"})
    assert response.status_code == 200
    assert "aoc_request_duration_seconds" in response.text

--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -3,8 +3,7 @@ services:
    image: valkey/valkey:8-alpine
    container_name: aoc-redis
    restart: always
-    ports:
-      - "6379:6379"
+    # Ports not exposed to host; backend and worker connect via Docker network
    volumes:
      - redis_data:/data

@@ -12,8 +11,7 @@ services:
    image: mongo:7
    container_name: aoc-mongo
    restart: always
-    ports:
-      - "27017:27017"
+    # Ports not exposed to host; backend and worker connect via Docker network
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_ROOT_USERNAME}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_ROOT_PASSWORD}
Author	SHA1	Message	Date
Tomas Kracmar	35eca65234	v1.7.13: switch Alpine.js to CSP build, remove unsafe-eval from CSP	2026-04-27 16:08:34 +02:00
Tomas Kracmar	07a841615b	v1.7.12: security hardening — CORS fix, security headers, fail-closed rate limiter, OpenAPI docs disabled by default, config auth privacy, webhook validation	2026-04-27 14:19:28 +02:00
Tomas Kracmar	c086fa4260	hotfix(v1.7.11): add unsafe-eval to CSP for Alpine.js	2026-04-27 10:39:33 +02:00
Tomas Kracmar	be700fefc3	hotfix(v1.7.10): add font-src to CSP for data URI fonts	2026-04-27 10:32:35 +02:00
Tomas Kracmar	e2cea50d87	hotfix(v1.7.9): auth diagnostics and rate-limit exemptions - Exempt /api/config/auth, /api/config/features, /health, /metrics from rate limiting - Fix generic exception handler to return proper JSON for HTTPException instead of re-raising - Add startup log with auth_enabled and version - Add frontend console logging for auth config fetch errors - Show 'Auth: OFF' or 'Auth: misconfigured' on auth button instead of empty text - Add backend debug logging to /api/config/auth endpoint	2026-04-27 10:09:44 +02:00
Tomas Kracmar	7fe53f882a	hotfix(v1.7.8): restore CORS wildcard and fix CSP for MSAL auth - Revert automatic CORS wildcard stripping that broke production deployments with CORS_ORIGINS=* (now logs a warning but preserves the config) - Expand CSP headers to allow MSAL auth flows: - connect-src: login.microsoftonline.com - frame-src: login.microsoftonline.com - form-action: login.microsoftonline.com	2026-04-27 09:41:28 +02:00
Tomas Kracmar	d01e7801ed	security: v1.7.7 hardening release - Add WEBHOOK_CLIENT_SECRET validation for Graph webhooks - Add Redis-backed rate limiting (fetch/ask/write/default tiers) - Validate LLM_BASE_URL to prevent SSRF (HTTPS only, block private IPs) - Enforce non-wildcard CORS when AUTH_ENABLED=true - Add Content-Security-Policy headers - Fix audit middleware to use verified JWT claims via contextvars - Cap bulk_tags updates to 10,000 documents - Return generic error messages to clients (no internal detail leakage) - Strict AlertCondition Pydantic model for alert rules - Security warning on MCP stdio server startup - Remove MongoDB/Redis host ports from docker-compose - Remove mongo_query from /ask API response	2026-04-27 09:16:57 +02:00
Tomas Kracmar	7cd7709b4a	fix: dedupe alert_rules before creating unique index in setup_indexes() The unique index on alert_rules.name was being created before duplicates were cleaned up, causing DuplicateKeyError on startup when existing duplicates were present. Move deduplication into setup_indexes() so it runs before the unique index is created. v1.7.6	2026-04-22 15:20:19 +02:00
Tomas Kracmar	9cd50d1257	chore: bump version to 1.7.5	2026-04-22 15:13:55 +02:00
Tomas Kracmar	646d61f72e	fix: dedupe existing rules + unique index to prevent duplicates - Add unique index on alert_rules.name in setup_indexes() - seed_default_rules() now removes duplicates by name before upserting - Keeps the oldest document (_id ascending) when deduping	2026-04-22 15:13:41 +02:00
Tomas Kracmar	5f7a98f21c	chore: bump version to 1.7.4	2026-04-22 14:57:06 +02:00
Tomas Kracmar	19ed231a31	fix: prevent duplicate default rules on multi-worker startup - Replace insert_many with replace_one(..., upsert=True) keyed by rule name - Safe for concurrent startup with multiple gunicorn workers	2026-04-22 14:56:53 +02:00
Tomas Kracmar	f812fda150	chore: bump version to 1.7.3	2026-04-22 14:48:17 +02:00
Tomas Kracmar	a194c78c59	feat: all panels are now collapsible - Source Health, Alerts, Alert Rules, Filters, Ask, Events panels all collapsible - Click panel header to expand/collapse - Chevron indicator rotates to show state - Collapsed state persisted to localStorage (aoc_panels key)	2026-04-22 14:48:03 +02:00
@@ -1 +1 @@
-1.7.2
+1.7.13