# Admin Operations Center (AOC)

## Project Overview

AOC is a FastAPI microservice that ingests Microsoft Entra (Azure AD) audit logs, Intune audit logs, and Exchange/SharePoint/Teams admin audits (via the Office 365 Management Activity API) into MongoDB. It deduplicates events, enriches them with readable names from Microsoft Graph, and exposes a REST API plus a minimal web UI for searching, filtering, and reviewing events.

## Technology Stack

- **Runtime**: Python 3.11 (3.14 for tests)
- **Web Framework**: FastAPI + Uvicorn (Gunicorn in production)
- **Database**: MongoDB (PyMongo)
- **Cache/Queue**: Valkey/Redis 8 (caching + arq async job queue)
- **Frontend**: Alpine.js + HTML/CSS (served as static files from `backend/frontend/`)
- **Authentication**: Optional OIDC Bearer token validation against Microsoft Entra (using `python-jose` and MSAL.js on the frontend)
- **External APIs**: Microsoft Graph API, Office 365 Management Activity API, Azure OpenAI / MS Foundry
- **Deployment**: Docker Compose (dev), Docker Compose + nginx (prod)
- **CI/CD**: Gitea Actions (lint + test + Docker build + release)
- **Secrets Storage**: Environment variables (`.env`) or optional Azure Key Vault

## Project Structure

```
backend/
  main.py              # FastAPI app, router registration, background periodic fetch
  config.py            # Pydantic Settings configuration (loads .env + optional Key Vault)
  database.py          # MongoClient setup (db = micro_soc, collection = events)
  auth.py              # OIDC Bearer token validation, JWKS caching, role/group checks
  secrets_manager.py   # Optional Azure Key Vault integration for secrets
  rate_limiter.py      # Redis-backed fixed-window rate limiter (fail-closed)
  requirements.txt     # Python dependencies
  Dockerfile           # python:3.11-slim image, non-root user, version baked at build
  mcp_server.py        # Standalone MCP server for Claude Desktop / Cursor integration
  routes/
    fetch.py           # GET /api/fetch-audit-logs, run_fetch()
    events.py          # GET /api/events, GET /api/filter-options, PATCH tags, POST comments
    config.py          # GET /api/config/auth, GET /api/config/features
    ask.py             # POST /api/ask — natural language query with LLM
    health.py          # GET /health, GET /metrics
    rules.py           # Rule-based alerting endpoints
    webhooks.py        # Microsoft Graph change notification webhooks
    alerts.py          # Alert management endpoints
    saved_searches.py  # Saved filter presets
    jobs.py            # Async job status polling
  graph/
    auth.py            # Client credentials token acquisition for Graph
    audit_logs.py      # Fetch and enrich directory audit logs from Graph
    resolve.py         # Resolve directory object IDs to human-readable names
  sources/
    unified_audit.py   # Office 365 Management Activity API (Exchange/SharePoint/Teams)
    intune_audit.py    # Intune audit events from Graph
  models/
    event_model.py     # normalize_event() — transforms raw events to stored schema
  mapping_loader.py    # Loads mappings.yml (cached) with fallback defaults
  mappings.yml         # User-editable category labels and summary templates
  maintenance.py       # CLI for re-normalization and deduplication of stored events
  frontend/
    index.html         # Single-page UI with filters, pagination, ask panel, raw-event modal
    style.css          # Dark-themed stylesheet
```

## Configuration

Copy `.env.example` to `.env` at the repo root and fill in values:

```bash
cp .env.example .env
```

### Core variables
- `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` — Microsoft app registration credentials (application permissions)
- `AUTH_ENABLED` — set `true` to protect API/UI with OIDC Bearer tokens
- `AUTH_TENANT_ID`, `AUTH_CLIENT_ID` — token validation audience/issuer
- `AUTH_ALLOWED_ROLES`, `AUTH_ALLOWED_GROUPS` — comma-separated access control lists
- `ENABLE_PERIODIC_FETCH`, `FETCH_INTERVAL_MINUTES` — background ingestion scheduler
- `MONGO_ROOT_USERNAME`, `MONGO_ROOT_PASSWORD`, `MONGO_PORT` — used by Docker Compose for MongoDB
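
The access-control lists are plain comma-separated strings. As a rough sketch of how such values might be parsed and applied, following the role/group rule described under Security Considerations: the helper names `parse_csv` and `is_allowed` are hypothetical and are not taken from `auth.py`.

```python
from __future__ import annotations


def parse_csv(value: str | None) -> set[str]:
    """Split a comma-separated env value into a set of trimmed, non-empty items."""
    if not value:
        return set()
    return {item.strip() for item in value.split(",") if item.strip()}


def is_allowed(claims: dict, allowed_roles: set[str], allowed_groups: set[str]) -> bool:
    """Return True when the token claims satisfy the configured access lists."""
    # Neither list configured: every authenticated user is allowed.
    if not allowed_roles and not allowed_groups:
        return True
    roles = set(claims.get("roles") or [])
    groups = set(claims.get("groups") or [])
    # A match on either list is enough.
    return bool(roles & allowed_roles) or bool(groups & allowed_groups)
```

For example, `is_allowed({"roles": ["Reader"]}, parse_csv("Reader, Admin"), set())` evaluates to `True`.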

### AI / LLM variables
- `AI_FEATURES_ENABLED` — set `false` to completely disable AI endpoints and UI (default `true`)
- `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_MAX_EVENTS`, `LLM_TIMEOUT_SECONDS` — LLM provider settings
- `LLM_API_VERSION` — required for Azure OpenAI / MS Foundry endpoints
- `LLM_ALLOWED_DOMAINS` — comma-separated domain allowlist for LLM endpoints (e.g. `api.openai.com,*.openai.azure.com`)
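
To illustrate how `LLM_BASE_URL` and `LLM_ALLOWED_DOMAINS` could interact, here is a minimal sketch of a URL check using only the standard library. The function name `validate_llm_base_url` is an assumption, and the sketch only inspects literal IP hosts; it does not resolve DNS names, which a real guard may also need to do.

```python
from __future__ import annotations

import ipaddress
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def validate_llm_base_url(url: str, allowed_domains: list[str] | None = None) -> None:
    """Raise ValueError if the URL is not HTTPS, is a private IP, or is off the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        raise ValueError("LLM_BASE_URL must be an https:// URL with a host")

    # Literal IP hosts: reject private, loopback, and link-local ranges.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # Not an IP literal; treated as a DNS name.
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        raise ValueError("LLM_BASE_URL must not point to a private address")

    # Optional allowlist; entries may use a wildcard such as *.openai.azure.com.
    if allowed_domains and not any(
        fnmatchcase(host, pattern.strip().lower()) for pattern in allowed_domains
    ):
        raise ValueError(f"{host} is not in LLM_ALLOWED_DOMAINS")
```

With the example allowlist above, `https://api.openai.com/v1` passes, while `http://api.openai.com` and `https://10.0.0.5/v1` raise `ValueError`.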

### Security variables
- `CORS_ORIGINS` — comma-separated allowed origins (default `*`; set explicit origins in production)
- `DOCS_ENABLED` — set `true` to expose `/docs`, `/redoc`, `/openapi.json` (default `false`)
- `METRICS_ALLOWED_IPS` — comma-separated CIDRs allowed to access `/metrics` (default: private networks + loopback)
- `WEBHOOK_CLIENT_SECRET` — secret for validating Graph webhook `clientState`
- `SIEM_ENABLED`, `SIEM_WEBHOOK_URL` — optional SIEM forwarding
- `SIEM_ALLOWED_DOMAINS` — comma-separated domain allowlist for SIEM webhook URLs
- `RATE_LIMIT_ENABLED`, `RATE_LIMIT_REQUESTS`, `RATE_LIMIT_WINDOW_SECONDS` — Redis-backed rate limiting
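
As an illustration of the CIDR format that `METRICS_ALLOWED_IPS` expects, the following sketch checks a client address against a comma-separated list. The default value shown is an assumption that mirrors "private networks + loopback"; the actual default in `config.py` may differ, and the function names are hypothetical.

```python
import ipaddress

# Assumed default: loopback plus the RFC 1918 private ranges.
DEFAULT_METRICS_CIDRS = "127.0.0.0/8,::1/128,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"


def parse_cidrs(value: str) -> list:
    """Turn a comma-separated CIDR string into a list of network objects."""
    return [
        ipaddress.ip_network(item.strip(), strict=False)
        for item in value.split(",")
        if item.strip()
    ]


def metrics_access_allowed(client_ip: str, allowed_cidrs: str = DEFAULT_METRICS_CIDRS) -> bool:
    """Return True when client_ip falls inside any allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # Unparseable address: deny.
    return any(addr in net for net in parse_cidrs(allowed_cidrs))
```

Behind a reverse proxy such as nginx, the address seen by the app is the proxy's unless forwarded headers are handled, so the allowlist should be chosen with that in mind.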

### Optional Azure Key Vault
- `AZURE_KEY_VAULT_NAME` — name of the Azure Key Vault to load secrets from
- When set, AOC fetches these secrets at startup:
  - `aoc-client-secret` → `CLIENT_SECRET`
  - `aoc-llm-api-key` → `LLM_API_KEY`
  - `aoc-mongo-uri` → `MONGO_URI`
  - `aoc-webhook-client-secret` → `WEBHOOK_CLIENT_SECRET`
- Requires `azure-identity` and `azure-keyvault-secrets` (uncomment in `requirements.txt`)
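
The secret-to-setting mapping above can be sketched as a small lookup table. This is only an illustration of the mapping, not the code in `secrets_manager.py`: `load_vault_overrides` and its `fetch_secret` parameter are hypothetical, and the callable stands in for whatever actually talks to Key Vault.

```python
from typing import Callable, Dict, Optional

# Key Vault secret name -> setting name, as listed above.
SECRET_MAP = {
    "aoc-client-secret": "CLIENT_SECRET",
    "aoc-llm-api-key": "LLM_API_KEY",
    "aoc-mongo-uri": "MONGO_URI",
    "aoc-webhook-client-secret": "WEBHOOK_CLIENT_SECRET",
}


def load_vault_overrides(fetch_secret: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Collect settings for every mapped secret that the vault returns."""
    overrides = {}
    for secret_name, setting_name in SECRET_MAP.items():
        value = fetch_secret(secret_name)
        if value:  # Skip secrets that are missing or empty.
            overrides[setting_name] = value
    return overrides
```

With the Azure SDK installed, `fetch_secret` could wrap `SecretClient.get_secret(name).value` from `azure-keyvault-secrets`; in tests, a plain `dict.get` works as a stand-in.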

### Privacy / access control
- `PRIVACY_SERVICES` — comma-separated services to hide from non-privileged users (e.g. `Exchange,Teams`)
- `PRIVACY_SENSITIVE_OPERATIONS` — comma-separated operations to gate
- `PRIVACY_SERVICE_ROLES` — comma-separated Entra roles that grant access to privacy data
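
One possible reading of how these three settings combine is sketched below. The exact gating logic is not documented here, so treat this as an assumption: the function name `filter_for_user` and the event field names `service` and `operation` are illustrative.

```python
from __future__ import annotations


def filter_for_user(
    events: list[dict],
    user_roles: list[str],
    privacy_services: set[str],
    sensitive_operations: set[str],
    privileged_roles: set[str],
) -> list[dict]:
    """Drop privacy-gated events unless the user holds a privileged role."""
    if set(user_roles) & set(privileged_roles):
        return list(events)  # Privileged users see everything.
    return [
        event
        for event in events
        if event.get("service") not in privacy_services
        and event.get("operation") not in sensitive_operations
    ]
```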

## Build and Run Commands

**Docker Compose (recommended):**
```bash
docker compose up --build
```
- API/UI: http://localhost:8000
- MongoDB: localhost:27017

**Local development (without Docker):**
```bash
# 1) Start MongoDB
docker run --rm -p 27017:27017 -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=example mongo:7

# 2) Run backend
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Export every variable from .env (tolerates comments and quoted values)
set -a; source ../.env; set +a
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

## API Endpoints

- `GET /api/fetch-audit-logs?hours=168` — pulls last N hours (capped at 720 / 30 days) from all sources, normalizes, dedupes, and upserts into MongoDB
- `GET /api/events` — list stored events with filters (`service`, `actor`, `operation`, `result`, `start`, `end`, `search`) and cursor-based pagination
- `GET /api/filter-options` — best-effort distinct values for UI dropdowns
- `GET /api/config/auth` — auth configuration exposed to the frontend
- `GET /api/config/features` — feature flags (`ai_features_enabled`)
- `POST /api/ask` — natural language query; returns LLM narrative + referenced events (only when `AI_FEATURES_ENABLED=true`)
- `GET /health` — liveness probe with DB connectivity
- `GET /metrics` — Prometheus metrics (IP-restricted by default)
- `GET /api/source-health` — last fetch status per ingestion source
- `GET /api/version` — running version
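
For scripting against these endpoints, a client-side sketch of building request URLs might look like the following. The helper names are hypothetical; the caps mirror the documented server-side limits (720 hours, `page_size` of 500), and the cursor parameter is left out because its name is not documented here.

```python
from urllib.parse import urlencode

MAX_FETCH_HOURS = 720  # 30 days
MAX_PAGE_SIZE = 500


def clamp(value: int, low: int, high: int) -> int:
    """Constrain value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def fetch_url(base_url: str, hours: int = 168) -> str:
    """Build the ingestion URL with hours clamped to the documented cap."""
    query = urlencode({"hours": clamp(hours, 1, MAX_FETCH_HOURS)})
    return f"{base_url.rstrip('/')}/api/fetch-audit-logs?{query}"


def events_url(base_url: str, page_size: int = 50, **filters: str) -> str:
    """Build an /api/events URL from the documented filter names."""
    params = {name: value for name, value in filters.items() if value}
    params["page_size"] = clamp(page_size, 1, MAX_PAGE_SIZE)
    return f"{base_url.rstrip('/')}/api/events?{urlencode(params)}"
```

For instance, `events_url("http://localhost:8000", service="Exchange", result="failure")` yields a URL that can be passed to any HTTP client, with a Bearer token header added when `AUTH_ENABLED=true`.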

## MCP Server

A standalone MCP server (`backend/mcp_server.py`) exposes audit log tools for Claude Desktop, Cursor, and other MCP clients.

Available tools:
- `search_events` — Search by entity, service, operation, result, time range
- `get_event` — Retrieve a single event by ID (raw JSON)
- `get_summary` — Aggregated counts by service, operation, result, actor
- `ask` — Natural language question (returns recent events + guidance)

**Claude Desktop config** (`claude_desktop_config.json`, e.g. `~/Library/Application Support/Claude/` on macOS or `~/.config/claude/` on Linux):
```json
{
  "mcpServers": {
    "aoc": {
      "command": "python",
      "args": ["/path/to/aoc/backend/mcp_server.py"],
      "env": {"MONGO_URI": "mongodb://root:example@localhost:27017/"}
    }
  }
}
```

The MCP server imports `database.py` directly and does not go through the FastAPI layer, so it reads from the same MongoDB database as the API but bypasses auth.

## AI Feature Flag

Set `AI_FEATURES_ENABLED=false` in `.env` to:
- Prevent the `ask` router from being registered in FastAPI
- Hide the "Ask a question" panel in the frontend
- Return `ai_features_enabled: false` from `/api/config/features`

This is intended for the open-core monetization split: core features (ingestion, filtering, search, export) are always available; premium AI features (NLQ, MCP) can be disabled.
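
The flag's effect can be sketched with plain functions. This is an illustration rather than the code in `main.py`: the helper names are hypothetical, the accepted truthy strings are an assumption, and the router names are taken from the project structure listing.

```python
from __future__ import annotations


def env_flag(value: str | None, default: bool = True) -> bool:
    """Interpret an env string as a boolean, falling back to default when unset."""
    if value is None or not value.strip():
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def features_payload(environ: dict) -> dict:
    """Shape of the response served by /api/config/features."""
    return {"ai_features_enabled": env_flag(environ.get("AI_FEATURES_ENABLED"))}


def routers_to_register(environ: dict) -> list[str]:
    """Core routers are always present; the ask router depends on the flag."""
    routers = ["fetch", "events", "config", "health"]
    if features_payload(environ)["ai_features_enabled"]:
        routers.append("ask")
    return routers
```

In a FastAPI app the same condition would wrap the `app.include_router(...)` call for the `ask` router, so that `/api/ask` does not exist at all when the flag is off.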

## Code Conventions

- Python modules use absolute imports within the `backend/` package (e.g., `from graph.auth import get_access_token`). When running locally, ensure the working directory is `backend/` so these resolve correctly.
- The project uses `ruff` for linting and formatting. Run `ruff check . && ruff format .` before committing.
- Keep changes consistent with the existing style: simple functions, explicit exception handling, and informative docstrings.
- The frontend is a single HTML file with inline JavaScript and Alpine.js.

## Testing

Tests run with pytest and mongomock (no real MongoDB required):

```bash
cd backend
python -m venv .venv_test
source .venv_test/bin/activate
pip install -r requirements.txt
pytest tests/ -q
```

When adding new features or bug fixes, add or update tests in `backend/tests/`. The test suite covers:
- Event normalization and deduplication
- Auth middleware and token validation
- API endpoints (`/api/events`, `/api/fetch-audit-logs`, `/api/ask`)
- NLQ time range extraction, entity extraction, query building
- Rate limiting behavior

## Security Considerations

- **Secrets**: `CLIENT_SECRET`, `LLM_API_KEY`, and other credentials come from `.env` or Azure Key Vault. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend reads the OIDC discovery document at `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, fetches the JWKS it points to, caches the keys for 1 hour, and validates the tenant, issuer, and audience claims. Tokens are decoded with RS256 signature verification.
- **Role/Group gating**: Access is allowed if the token's `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed — a startup warning is logged in this case.
- **CORS**: When `AUTH_ENABLED=true` and `CORS_ORIGINS="*"`, `allow_credentials` is forced to `false` to prevent cross-origin token leakage.
- **Rate limiting**: Redis-backed fixed-window rate limiting with per-category limits (fetch=10/hr, ask=30/min, write=20/min, default=120/min). Fails closed (returns 429) when Redis is unavailable.
- **Pagination limits**: `page_size` is clamped to a maximum of 500 to prevent large queries.
- **Fetch window cap**: `hours` is clamped to 720 (30 days) to avoid runaway API calls.
- **LLM SSRF guard**: `LLM_BASE_URL` must be HTTPS and cannot point to private IPs. Optional `LLM_ALLOWED_DOMAINS` restricts to specific domains.
- **SIEM SSRF guard**: `SIEM_WEBHOOK_URL` has the same validation as LLM URLs, plus optional `SIEM_ALLOWED_DOMAINS`.
- **Metrics IP gating**: `/metrics` is restricted to private/loopback IPs by default via `METRICS_ALLOWED_IPS`.
- **OpenAPI docs**: Disabled by default (`DOCS_ENABLED=false`). Enable only in development.
- **CSP**: Content-Security-Policy headers are set on all responses. `unsafe-eval` is required for Alpine.js v3 expression evaluation.
- **SRI**: CDN scripts (Alpine.js, MSAL.js) include Subresource Integrity hashes to prevent supply chain compromise.
- **MCP server**: The MCP server bypasses auth entirely. Only run it in trusted environments or behind a VPN.
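
The fixed-window, fail-closed behavior described in the rate limiting item can be sketched as follows. This is not the code in `rate_limiter.py`: `InMemoryCounter` is a hypothetical stand-in for Redis `INCR`, and the key layout is an assumption. The per-category limits are the ones listed above.

```python
from __future__ import annotations

import time

# Per-category limits as (max_requests, window_seconds).
LIMITS = {
    "fetch": (10, 3600),
    "ask": (30, 60),
    "write": (20, 60),
    "default": (120, 60),
}


class InMemoryCounter:
    """Stand-in for Redis INCR; real keys would also get an expiry."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def check_rate_limit(store, client_id: str, category: str = "default",
                     now: float | None = None) -> int:
    """Return an HTTP status: 200 if allowed, 429 if limited or the store is down."""
    max_requests, window = LIMITS.get(category, LIMITS["default"])
    now = time.time() if now is None else now
    # All requests in the same window share one counter key.
    window_start = int(now // window) * window
    key = f"rl:{category}:{client_id}:{window_start}"
    try:
        count = store.incr(key)
    except Exception:
        return 429  # Fail closed: without a counter, the request is refused.
    return 200 if count <= max_requests else 429
```

A fixed window is simple and cheap (one counter per client per window), at the cost of allowing short bursts across a window boundary.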

### Security Documentation

- `PEN_TEST_REPORT_v1.7.11.md` — Internal soft penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra/token abuse vectors

## Maintenance and Operations

The `backend/maintenance.py` script provides two CLI commands useful for backfilling or correcting stored data:

```bash
# Re-run Graph enrichment + normalization on stored events
docker compose run --rm backend python maintenance.py renormalize --limit 500

# Remove duplicate events based on dedupe_key
docker compose run --rm backend python maintenance.py dedupe
```

Both commands operate directly against the MongoDB collection configured in `config.py`.
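
Conceptually, the `dedupe` command keeps the first event for each `dedupe_key` and removes the rest. A minimal in-memory sketch of that idea, assuming events are dicts with `_id` and `dedupe_key` fields (the function name `find_duplicates` is hypothetical and the real command may work differently, for example via an aggregation):

```python
from __future__ import annotations


def find_duplicates(events: list[dict]) -> list:
    """Return the _id values of events whose dedupe_key was already seen."""
    seen = set()
    duplicate_ids = []
    for event in events:
        key = event.get("dedupe_key")
        if key is None:
            continue  # Events without a key are left alone.
        if key in seen:
            duplicate_ids.append(event["_id"])
        else:
            seen.add(key)
    return duplicate_ids
```

The resulting IDs could then be removed with PyMongo's `delete_many({"_id": {"$in": duplicate_ids}})`.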