feat: implement Phase 1 hardening

- Verify JWT signatures via JWKS in auth.py
- Fix broken frontend auth button references
- Add Pydantic Settings for env validation (RETENTION_DAYS, CORS_ORIGINS)
- Create MongoDB indexes + TTL on startup
- Add /health endpoint and CORS middleware
- Escape regex input in event queries
- Fix dedupe() return calculation in maintenance.py
- Replace basic logging with structured structlog JSON logs
- Update README and add ROADMAP.md
This commit is contained in:
2026-04-14 11:48:29 +02:00
parent f9f1399f57
commit 4f6e16d64d
12 changed files with 392 additions and 46 deletions

132
AGENTS.md Normal file
View File

@@ -0,0 +1,132 @@
# Admin Operations Center (AOC)
## Project Overview
AOC is a FastAPI microservice that ingests Microsoft Entra (Azure AD) audit logs, Intune audit logs, and Exchange/SharePoint/Teams admin audits (via the Office 365 Management Activity API) into MongoDB. It deduplicates events, enriches them with readable names from Microsoft Graph, and exposes a REST API plus a minimal web UI for searching, filtering, and reviewing events.
## Technology Stack
- **Runtime**: Python 3.11
- **Web Framework**: FastAPI + Uvicorn
- **Database**: MongoDB (PyMongo)
- **Frontend**: Vanilla HTML/CSS/JS (served as static files from `backend/frontend/`)
- **Authentication**: Optional OIDC Bearer token validation against Microsoft Entra (using `python-jose` and MSAL.js on the frontend)
- **External APIs**: Microsoft Graph API, Office 365 Management Activity API
- **Deployment**: Docker Compose
## Project Structure
```
backend/
main.py # FastAPI app, router registration, background periodic fetch
config.py # Environment-based configuration (loads .env)
database.py # MongoClient setup (db = micro_soc, collection = events)
auth.py # OIDC Bearer token validation, JWKS caching, role/group checks
requirements.txt # Python dependencies
Dockerfile # python:3.11-slim image
routes/
fetch.py # GET /api/fetch-audit-logs, run_fetch()
events.py # GET /api/events, GET /api/filter-options
config.py # GET /api/config/auth
graph/
auth.py # Client credentials token acquisition for Graph
audit_logs.py # Fetch and enrich directory audit logs from Graph
resolve.py # Resolve directory object IDs to human-readable names
sources/
unified_audit.py # Office 365 Management Activity API (Exchange/SharePoint/Teams)
intune_audit.py # Intune audit events from Graph
models/
event_model.py # normalize_event() — transforms raw events to stored schema
mapping_loader.py # Loads mappings.yml (cached) with fallback defaults
mappings.yml # User-editable category labels and summary templates
maintenance.py # CLI for re-normalization and deduplication of stored events
frontend/
index.html # Single-page UI with filters, pagination, raw-event modal
style.css # Dark-themed stylesheet
```
## Configuration
Copy `.env.example` to `.env` at the repo root and fill in values:
```bash
cp .env.example .env
```
Key variables:
- `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` — Microsoft app registration credentials (application permissions)
- `AUTH_ENABLED` — set `true` to protect API/UI with OIDC Bearer tokens
- `AUTH_TENANT_ID`, `AUTH_CLIENT_ID` — token validation audience/issuer
- `AUTH_ALLOWED_ROLES`, `AUTH_ALLOWED_GROUPS` — comma-separated access control lists
- `ENABLE_PERIODIC_FETCH`, `FETCH_INTERVAL_MINUTES` — background ingestion scheduler
- `MONGO_ROOT_USERNAME`, `MONGO_ROOT_PASSWORD`, `MONGO_PORT` — used by Docker Compose for MongoDB
## Build and Run Commands
**Docker Compose (recommended):**
```bash
docker compose up --build
```
- API/UI: http://localhost:8000
- MongoDB: localhost:27017
**Local development (without Docker):**
```bash
# 1) Start MongoDB
docker run --rm -p 27017:27017 -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=example mongo:7
# 2) Run backend
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export $(cat ../.env | xargs)
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
## API Endpoints
- `GET /api/fetch-audit-logs?hours=168` — pulls last N hours (capped at 720 / 30 days) from all sources, normalizes, dedupes, and upserts into MongoDB
- `GET /api/events` — list stored events with filters (`service`, `actor`, `operation`, `result`, `start`, `end`, `search`) and pagination (`page`, `page_size`)
- `GET /api/filter-options` — best-effort distinct values for UI dropdowns
- `GET /api/config/auth` — auth configuration exposed to the frontend
## Code Conventions
- Python modules use absolute imports within the `backend/` package (e.g., `from graph.auth import get_access_token`). When running locally, ensure the working directory is `backend/` so these resolve correctly.
- No formal formatter or linter is configured. Keep changes consistent with the existing style: simple functions, explicit exception handling, and informative docstrings.
- The frontend is a single HTML file with inline JavaScript. It relies on the MSAL.js CDN (`https://alcdn.msauth.net/browser/2.37.0/js/msal-browser.min.js`).
## Testing
There are currently **no automated tests** in this repository. When adding new features or bug fixes, verify behavior manually:
1. Start the server (Docker Compose or local uvicorn).
2. Run a smoke test:
```bash
curl http://localhost:8000/api/events
curl http://localhost:8000/api/fetch-audit-logs
```
3. Open http://localhost:8000 in a browser, apply filters, paginate, and click "View raw event".
## Security Considerations
- **Secrets**: `CLIENT_SECRET` and other credentials come from `.env`. Never commit `.env`.
- **Auth validation**: When `AUTH_ENABLED=true`, the backend fetches JWKS from `https://login.microsoftonline.com/{AUTH_TENANT_ID}/v2.0/.well-known/openid-configuration`, caches keys for 1 hour, and validates tenant/issuer claims. Tokens are decoded without strict signature verification (`jwt.get_unverified_claims`), so the tenant and issuer checks are the primary gate.
- **Role/Group gating**: Access is allowed if the tokens `roles` intersect `AUTH_ALLOWED_ROLES` or `groups` intersect `AUTH_ALLOWED_GROUPS`. If neither list is configured, all authenticated users are allowed.
- **Pagination limits**: `page_size` is clamped to a maximum of 500 to prevent large queries.
- **Fetch window cap**: `hours` is clamped to 720 (30 days) to avoid runaway API calls.
## Maintenance and Operations
The `backend/maintenance.py` script provides two CLI commands useful for backfilling or correcting stored data:
```bash
# Re-run Graph enrichment + normalization on stored events
docker compose run --rm backend python maintenance.py renormalize --limit 500
# Remove duplicate events based on dedupe_key
docker compose run --rm backend python maintenance.py dedupe
```
Both commands operate directly against the MongoDB collection configured in `config.py`.