# Admin Operations Center (AOC)

FastAPI microservice that ingests Microsoft Entra (Azure AD) and other admin audit logs into MongoDB, dedupes them, and exposes a UI/API to fetch, search, and review events.

## Components
- FastAPI app under `backend/` with routes to fetch audit logs and list stored events.
- MongoDB for persistence (provisioned via Docker Compose).
- Microsoft Graph client (client credentials) for retrieving directory audit events and Intune audit events.
- Office 365 Management Activity API client for Exchange/SharePoint/Teams admin audit logs.
- Frontend served from the backend for filtering/searching events and viewing raw entries.
- Optional OIDC bearer auth (Entra) to protect the API/UI and gate access by roles/groups.
- Natural language query (`/api/ask`) powered by LLM (OpenAI, Azure OpenAI, or any compatible API).
- MCP server for Claude Desktop / Cursor integration.
- Optional Azure Key Vault integration for secrets storage.

## Prerequisites (macOS)
- Python 3.11
- Docker Desktop (for the quickest start) or a local MongoDB instance
- An Entra app registration with **Application** permission `AuditLog.Read.All` and admin consent granted
  - Also required to fetch other sources:
    - `https://manage.office.com/.default` (Audit API) with `ActivityFeed.Read`/`ActivityFeed.ReadDlp` (built into the app registration's API permissions for Office 365 Management APIs)
    - Intune audit: `DeviceManagementConfiguration.Read.All` (or broader) for `/deviceManagement/auditEvents`
- Optional API protection: configure `AUTH_ENABLED=true` and set `AUTH_TENANT_ID`/`AUTH_CLIENT_ID` (the audience) plus allowed roles/groups.

## Configuration
Create a `.env` file at the repo root (copy `.env.example`) and fill in your Microsoft Graph app credentials. The provided `MONGO_URI` works with the bundled MongoDB container; change it if you use a different Mongo instance.

```bash
cp .env.example .env
# edit .env to add TENANT_ID, CLIENT_ID, CLIENT_SECRET (and MONGO_URI if needed)
# optional: enable auth & periodic fetch
# AUTH_ENABLED=true
# AUTH_TENANT_ID=...
# AUTH_CLIENT_ID=...
# AUTH_ALLOWED_ROLES=Admins,SecurityOps
# ENABLE_PERIODIC_FETCH=true
# FETCH_INTERVAL_MINUTES=60

# Optional: data retention (auto-expire old events via MongoDB TTL)
# RETENTION_DAYS=90

# Optional: CORS origins if the frontend is served separately
# CORS_ORIGINS=http://localhost:3000,https://app.example.com

# Optional: enable AI/natural-language features (/api/ask, MCP server)
# AI_FEATURES_ENABLED=true

# Optional: LLM configuration for natural language querying
# LLM_API_KEY=...
# LLM_BASE_URL=https://api.openai.com/v1
# LLM_MODEL=gpt-4o-mini
# LLM_TIMEOUT_SECONDS=30
# LLM_ALLOWED_DOMAINS=api.openai.com,*.openai.azure.com

# Optional: SIEM forwarding
# SIEM_ENABLED=true
# SIEM_WEBHOOK_URL=https://your-siem.com/webhook
# SIEM_ALLOWED_DOMAINS=your-siem.com

# Optional: Azure Key Vault for secrets storage
# AZURE_KEY_VAULT_NAME=your-keyvault-name
```
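
All of these settings arrive as plain strings, so booleans, integers, and comma-separated lists need to be coerced somewhere. The following is a minimal sketch of that step, not the contents of `backend/config.py`: the `Settings` class, `load_settings`, the accepted truthy spellings, and the fallback defaults are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


def as_bool(value: str) -> bool:
    # Treat a few common truthy spellings as True; anything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def as_list(value: str) -> List[str]:
    # Split a comma-separated value and drop empty entries.
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass
class Settings:
    # Hypothetical container; defaults here are assumptions, not documented ones.
    auth_enabled: bool = False
    auth_allowed_roles: List[str] = field(default_factory=list)
    enable_periodic_fetch: bool = False
    fetch_interval_minutes: int = 60
    retention_days: Optional[int] = None  # None: no TTL expiry configured
    cors_origins: List[str] = field(default_factory=list)


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from a mapping such as os.environ."""
    retention = env.get("RETENTION_DAYS", "").strip()
    return Settings(
        auth_enabled=as_bool(env.get("AUTH_ENABLED", "false")),
        auth_allowed_roles=as_list(env.get("AUTH_ALLOWED_ROLES", "")),
        enable_periodic_fetch=as_bool(env.get("ENABLE_PERIODIC_FETCH", "false")),
        fetch_interval_minutes=int(env.get("FETCH_INTERVAL_MINUTES", "60")),
        retention_days=int(retention) if retention else None,
        cors_origins=as_list(env.get("CORS_ORIGINS", "")),
    )
```

Calling `load_settings(os.environ)` after exporting `.env` would yield typed values; passing a plain dict instead makes the parsing easy to unit test.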

### Using Azure Key Vault for secrets
Instead of storing `CLIENT_SECRET`, `LLM_API_KEY`, `MONGO_URI`, and `WEBHOOK_CLIENT_SECRET` in `.env`, you can store them in Azure Key Vault:

1. Create a Key Vault and add secrets with these names:
   - `aoc-client-secret` → your Graph app `CLIENT_SECRET`
   - `aoc-llm-api-key` → your `LLM_API_KEY`
   - `aoc-mongo-uri` → your `MONGO_URI`
   - `aoc-webhook-client-secret` → your `WEBHOOK_CLIENT_SECRET`
2. Uncomment `azure-identity` and `azure-keyvault-secrets` in `backend/requirements.txt`
3. Set `AZURE_KEY_VAULT_NAME=your-keyvault-name` in `.env`
4. Ensure the container has Azure identity credentials (managed identity, service principal, or Azure CLI auth)
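
The steps above imply a lookup that maps each environment variable to a vault secret name. Here is a minimal sketch of one way that could work, assuming the vault is consulted first and `.env` is the fallback (the backend's actual precedence is not documented here). `resolve_secret` and `keyvault_fetcher` are hypothetical names; `DefaultAzureCredential` and `SecretClient.get_secret` come from the `azure-identity` and `azure-keyvault-secrets` packages and are imported lazily so the rest of the sketch runs without them. The vault URL format assumes the Azure public cloud.

```python
from typing import Callable, Mapping, Optional

# Secret names from step 1, keyed by the environment variable they replace.
SECRET_NAMES = {
    "CLIENT_SECRET": "aoc-client-secret",
    "LLM_API_KEY": "aoc-llm-api-key",
    "MONGO_URI": "aoc-mongo-uri",
    "WEBHOOK_CLIENT_SECRET": "aoc-webhook-client-secret",
}

Fetcher = Callable[[str], Optional[str]]


def keyvault_fetcher(vault_name: str) -> Fetcher:
    """Return a function that reads one secret value from the named vault."""
    # Imported here so the module loads even when the Azure packages are absent.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url=f"https://{vault_name}.vault.azure.net",  # public-cloud URL (assumption)
        credential=DefaultAzureCredential(),
    )
    # get_secret raises if the secret is missing; this sketch does not catch that.
    return lambda name: client.get_secret(name).value


def resolve_secret(
    env_name: str, env: Mapping[str, str], fetch: Optional[Fetcher] = None
) -> Optional[str]:
    """Prefer the vault when a fetcher is configured, otherwise use the environment."""
    if fetch is not None and env_name in SECRET_NAMES:
        value = fetch(SECRET_NAMES[env_name])
        if value:
            return value
    return env.get(env_name)
```

With `AZURE_KEY_VAULT_NAME` set, a caller would pass `keyvault_fetcher(name)` as `fetch`; without it, `fetch` stays `None` and the values come from `.env` as before.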

## Security Hardening Checklist

Before deploying to production:

- [ ] Set `AUTH_ENABLED=true` and configure `AUTH_ALLOWED_ROLES` or `AUTH_ALLOWED_GROUPS` to restrict access
- [ ] Set explicit `CORS_ORIGINS` (do not use `*` in production with auth enabled)
- [ ] Set `DOCS_ENABLED=false` (default) to hide OpenAPI docs
- [ ] Configure `WEBHOOK_CLIENT_SECRET` to validate Graph webhook notifications
- [ ] Set `LLM_ALLOWED_DOMAINS` if using AI features to prevent data exfiltration
- [ ] Set `SIEM_ALLOWED_DOMAINS` if using SIEM forwarding
- [ ] Review `METRICS_ALLOWED_IPS` — defaults to private networks only
- [ ] Consider Azure Key Vault instead of `.env` for secrets
- [ ] Review the threat model: `THREAT_MODEL_v1.7.13.md`
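
To make the `LLM_ALLOWED_DOMAINS` and `SIEM_ALLOWED_DOMAINS` items concrete, here is a minimal sketch of how an outbound URL might be checked against such an allowlist. It assumes shell-style wildcards (as in `*.openai.azure.com`) and additionally requires HTTPS; both the function name `is_allowed_url` and the HTTPS requirement are assumptions, not documented behavior of the backend.

```python
from fnmatch import fnmatchcase
from typing import List
from urllib.parse import urlsplit


def is_allowed_url(url: str, allowed_domains: List[str]) -> bool:
    """Return True when the URL uses HTTPS and its host matches an allowlist entry."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or not host:
        return False
    # Match on the parsed hostname only, so paths and userinfo cannot spoof a domain.
    return any(fnmatchcase(host, pattern.strip().lower()) for pattern in allowed_domains)
```

Matching the parsed hostname rather than the raw string matters: a URL such as `https://api.openai.com.evil.example/v1` contains an allowed name as a prefix but does not match the `api.openai.com` entry.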

## Run with Docker Compose (recommended)
```bash
docker compose up --build
```
- API: http://localhost:8000
- Frontend: http://localhost:8000
- Health: http://localhost:8000/health
- Mongo: localhost:27017 (root/example)

## Run locally without Docker
1) Start MongoDB (e.g. with Docker):
   `docker run --rm -p 27017:27017 -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=example mongo:7`

2) Prepare the backend environment:
   ```bash
   cd backend
   python3 -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   # export every variable defined in .env (comment lines are skipped), or set env vars manually
   set -a; source ../.env; set +a
   uvicorn main:app --reload --host 0.0.0.0 --port 8000
   ```

## API
- `GET /health` — health check with MongoDB connectivity status.
- `GET /metrics` — Prometheus metrics for request latency, fetch volume, and errors (IP-restricted).
- `GET /api/version` — running version (baked into the Docker image at build time).
- `GET /api/fetch-audit-logs` — pulls the last 7 days by default (override with `?hours=N`, capped to 30 days) of:
  - Entra directory audit logs (`/auditLogs/directoryAudits`)
  - Exchange/SharePoint/Teams admin audits (via Office 365 Management Activity API)
  - Intune audit logs (`/deviceManagement/auditEvents`)
  Dedupes on a stable key (source id or timestamp/category/operation/target). Returns count and per-source warnings.
  - **Incremental fetch**: each source remembers its last successful fetch time in MongoDB (`watermarks` collection). Subsequent calls fetch only new events since the watermark.
  - **Alerting**: if `ALERTS_ENABLED=true`, events are evaluated against stored rules during ingestion.
  - **SIEM export**: if `SIEM_ENABLED=true`, each ingested event is forwarded to `SIEM_WEBHOOK_URL`.
- `GET /api/events` — list stored events with filters:
  - `service`, `actor`, `operation`, `result`, `start`, `end`, `search` (free text over raw/summary/actor/targets)
  - Pagination: `cursor`-based (`page_size` defaults to 50, max 500). Pass `cursor` from `next_cursor` to paginate forward.
- `GET /api/filter-options` — best-effort distinct values for services, operations, results, actors (used by UI dropdowns).
- `POST /api/webhooks/graph` — receive Microsoft Graph change notifications. Echoes `validationToken` when present.
- `GET /api/source-health` — last fetch status for each ingestion source (`directory`, `unified`, `intune`).
- `PATCH /api/events/{id}/tags` — update tags on an event (e.g., `investigating`, `false_positive`).
- `POST /api/events/{id}/comments` — add a comment to an event.
- `POST /api/events/{id}/explain` — AI explanation of a single audit event with security context (requires `LLM_API_KEY`).
- `POST /api/ask` — natural language query. Returns a narrative answer + referenced events. Supports time ranges, entity names, and respects active UI filters. Only available when `AI_FEATURES_ENABLED=true`.
- `GET /api/config/features` — feature flags (`ai_features_enabled`).
- `GET /api/rules` — list alert rules.
- `POST /api/rules` — create an alert rule.
- `PUT /api/rules/{id}` — update an alert rule.
- `DELETE /api/rules/{id}` — delete an alert rule.
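
The "stable key" used by `/api/fetch-audit-logs` for deduplication can be pictured as follows. This is a minimal sketch under stated assumptions, not the backend's implementation: it prefers the source id and otherwise hashes timestamp, service, operation, and targets. The function name, the `id:`/`hash:` prefixes, and the choice of SHA-256 are all illustrative.

```python
import hashlib
import json


def dedupe_key(event: dict) -> str:
    """Prefer the source id; otherwise hash the fields that identify the event."""
    source_id = event.get("id")
    if source_id:
        return f"id:{source_id}"
    parts = [
        event.get("timestamp", ""),
        event.get("service", ""),
        event.get("operation", ""),
        # Sort targets so the key does not depend on their order in the payload.
        sorted(event.get("target_displays", [])),
    ]
    digest = hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()
    return f"hash:{digest}"
```

A key like this would typically serve as the filter of a MongoDB upsert, so fetching the same event twice updates one document instead of inserting a second.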

### MCP Server
AOC exposes an MCP interface in two forms:

**1. HTTP/SSE (production)** — mounted at `/mcp` inside the FastAPI app, behind OIDC auth:
- `GET /mcp/sse` — establish SSE stream (requires Bearer token if `AUTH_ENABLED=true`)
- `POST /mcp/messages/?session_id=...` — send tool calls

This is the recommended way to use MCP against a remote deployment like `aoc.cqre.net`. Any MCP client that supports SSE transport (e.g. Cursor, Claude Desktop with an SSE bridge, or custom scripts) can connect using the same Entra token as the web UI.

**2. stdio (local development)** — `python backend/mcp_server.py`:
- Runs as a local subprocess for Claude Desktop
- Connects directly to MongoDB (bypasses FastAPI auth)
- Useful for local development when you have the repo cloned and MongoDB running locally

Available tools (both transports):
- `search_events` — filter by entity, service, operation, result, time range.
- `get_event` — retrieve raw event JSON by ID.
- `get_summary` — aggregated summary (service, operation, result, actor counts) for the last N days.
- `ask` — natural language query returning recent events.
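
To illustrate the kind of output `get_summary` describes, here is a minimal in-memory sketch that counts events per service, operation, result, and actor. It is an assumption-laden stand-in: the real tool presumably aggregates in MongoDB and applies the "last N days" window, both of which are omitted here, and the `summarize` name and return shape are hypothetical.

```python
from collections import Counter
from typing import List


def summarize(events: List[dict], top: int = 5) -> dict:
    """Count events per service, operation, result, and actor (most common first)."""
    # Maps the summary label to the stored document field it is counted from.
    fields = {
        "service": "service",
        "operation": "operation",
        "result": "result",
        "actor": "actor_display",
    }
    summary: dict = {"total": len(events)}
    for label, key in fields.items():
        counts = Counter(event.get(key) or "unknown" for event in events)
        summary[label] = counts.most_common(top)
    return summary
```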

Stored document shape (collection `micro_soc.events`):
```json
{
  "id": "...", // original source id
  "timestamp": "...", // activityDateTime
  "service": "...", // category
  "operation": "...", // activityDisplayName
  "result": "...",
  "actor_display": "...", // resolved user/app name
  "target_displays": [ ... ],
  "display_summary": "...",
  "dedupe_key": "...", // used for upserts
  "actor": { ... }, // initiatedBy
  "targets": [ ... ], // targetResources
  "raw": { ... }, // full source event
  "raw_text": "..." // raw as string for text search
}
```
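
The field comments above suggest a direct mapping from a Microsoft Graph `directoryAudit` record. Below is a minimal sketch of that normalization, assuming the Graph field names noted in the comments; how the backend actually resolves `actor_display` and `target_displays` is not documented here, so the fallback order is an assumption, and `display_summary` and `dedupe_key` are left out. The function name is hypothetical.

```python
import json


def normalize_directory_audit(raw: dict) -> dict:
    """Map one Graph directoryAudit record onto the stored document shape."""
    initiated_by = raw.get("initiatedBy") or {}
    user = initiated_by.get("user") or {}
    app = initiated_by.get("app") or {}
    # Assumed fallback order: user display name, then UPN, then app name.
    actor_display = (
        user.get("displayName")
        or user.get("userPrincipalName")
        or app.get("displayName")
        or "unknown"
    )
    targets = raw.get("targetResources") or []
    return {
        "id": raw.get("id"),
        "timestamp": raw.get("activityDateTime"),
        "service": raw.get("category"),
        "operation": raw.get("activityDisplayName"),
        "result": raw.get("result"),
        "actor_display": actor_display,
        "target_displays": [t.get("displayName") or t.get("id") or "unknown" for t in targets],
        "actor": initiated_by,
        "targets": targets,
        "raw": raw,
        "raw_text": json.dumps(raw, sort_keys=True),  # flattened copy for text search
    }
```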

## Development

### Linting and formatting
We use `ruff` for linting and formatting.

```bash
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
ruff check ..
ruff format ..
```

### Running tests
```bash
cd backend
pytest -q
```

## Quick smoke tests
With the server running:
```bash
curl http://localhost:8000/health
curl http://localhost:8000/api/events
curl http://localhost:8000/api/fetch-audit-logs
```
- Visit the UI at http://localhost:8000 to filter by user/service/action/result/time, search raw text, paginate, and view raw events.
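
Beyond a single `curl`, a script can walk every page of `/api/events` by following `next_cursor`. The sketch below shows the loop only; the HTTP call is injected as `fetch_page` so it runs without a server. The `next_cursor` field comes from the API description, while `items` as the name of the event list, the `iter_events` helper, and the `max_pages` safety limit are assumptions.

```python
from typing import Callable, Iterator, Optional


def iter_events(
    fetch_page: Callable[[Optional[str]], dict], max_pages: int = 100
) -> Iterator[dict]:
    """Yield events page by page, following next_cursor until it is empty."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop in case the server keeps returning cursors
        page = fetch_page(cursor)
        yield from page.get("items", [])  # "items" is an assumed field name
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

In practice `fetch_page` would issue `GET /api/events` with the `cursor` query parameter (omitted on the first call) and return the decoded JSON body.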

## Maintenance (Dockerized)
Use the backend image so you don't need a local venv:
```bash
# ensure Mongo + backend network are up
docker compose up -d mongo
# re-run enrichment/normalization on stored events (uses .env for Graph/Mongo)
docker compose run --rm backend python maintenance.py renormalize --limit 500
# deduplicate existing events (optional)
docker compose run --rm backend python maintenance.py dedupe
```
Omit `--limit` to process all events. You can also run commands inside a running backend container with `docker compose exec backend ...`.

## Security Documentation
- `PEN_TEST_REPORT_v1.7.11.md` — Penetration test findings and remediation
- `THREAT_MODEL_v1.7.13.md` — Comprehensive threat model covering Entra application abuse, token handling, data exfiltration vectors

## Notes / Troubleshooting
- Ensure `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET` match an app registration with `AuditLog.Read.All` (application) permission and admin consent.
- Additional permissions: Office 365 Management Activity (`ActivityFeed.Read`), and Intune audit (`DeviceManagementConfiguration.Read.All`).
- Auth: if `AUTH_ENABLED=true`, issued tokens must be from `AUTH_TENANT_ID`, audience = `AUTH_CLIENT_ID`; access is granted if roles or groups overlap `AUTH_ALLOWED_ROLES`/`AUTH_ALLOWED_GROUPS` (if set). A startup warning is logged if auth is enabled but no roles/groups are configured.
- Backfill limits: Management Activity API typically exposes ~7 days of history via API (longer if your tenant has extended/Advanced Audit retention). Directory/Intune audit retention follows your tenant policy (commonly 30–90 days, longer with Advanced Audit).
- If you change Mongo credentials/ports, update `MONGO_URI` in `.env` (Docker Compose passes it through to the backend).
- The service uses the `micro_soc` database and `events` collection by default; adjust in `backend/config.py` if needed.
- If using Azure Key Vault, ensure the runtime identity (managed identity, service principal, or local Azure CLI) has `Get` permission on secrets.
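
The role/group overlap rule in the auth note can be sketched as a small check on the decoded token claims. This is a minimal illustration that assumes signature, issuer, and audience were already validated elsewhere, and that the token carries Entra's `roles` and `groups` claims. Treating an empty allowlist as "any valid token passes" is an inference from the "(if set)" wording and the startup warning, not confirmed behavior; `is_authorized` is a hypothetical name.

```python
from typing import Set


def is_authorized(claims: dict, allowed_roles: Set[str], allowed_groups: Set[str]) -> bool:
    """Grant access when the token's roles or groups overlap the configured allowlists."""
    if not allowed_roles and not allowed_groups:
        # Assumption: with no allowlist configured, any validated token is accepted.
        return True
    roles = set(claims.get("roles") or [])
    groups = set(claims.get("groups") or [])
    return bool(roles & allowed_roles or groups & allowed_groups)
```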