Sync from dev @ 497baf0
Source: main (497baf0) Excluded: live tenant exports, generated artifacts, and dev-only tooling.
245
infra/change-probe/README.md
Normal file
@@ -0,0 +1,245 @@
# ASTRAL Change Probe

Event-driven backup trigger for ASTRAL. Monitors Intune and Entra ID audit logs via Microsoft Graph, debounces change bursts, and queues the Azure DevOps backup pipeline only when actual drift is detected.

## Why this exists

Microsoft Graph change notifications and delta queries do **not** support Intune device management or Conditional Access resources. The only viable event-driven approach is polling the Graph audit log APIs, which have a 5–15 minute propagation delay. This probe implements a debouncer on top of that polling to avoid backup storms during bulk changes.
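Concretely, each poll filters the two audit endpoints by timestamp. The sketch below shows the shape of the queries involved; the endpoint paths are the documented Graph routes, but the helper name and `since` handling are illustrative, not the probe script's actual code:

```python
from datetime import datetime, timezone

GRAPH = "https://graph.microsoft.com"

def audit_urls(since: datetime) -> list[str]:
    """Build the two audit-log queries for events newer than `since`."""
    ts = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        # Intune device management audit events
        f"{GRAPH}/v1.0/deviceManagement/auditEvents?$filter=activityDateTime ge {ts}",
        # Entra ID directory audits
        f"{GRAPH}/v1.0/auditLogs/directoryAudits?$filter=activityDateTime ge {ts}",
    ]

urls = audit_urls(datetime(2024, 1, 1, tzinfo=timezone.utc))
```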

## Architecture

```
┌─────────────────┐     5 min      ┌──────────────┐   quiet window    ┌─────────────────┐
│  Timer Trigger  │ ─────────────► │  probe_timer │ ────────────────► │  backup-trigger │
│  (probe_timer)  │                │  (debouncer) │   (15 min armed)  │      -queue     │
└─────────────────┘                └──────────────┘                   └────────┬────────┘
         │                                                                     │
         │ load/save state                                                     │ dequeue
         │ (Azure Table Storage)                                               ▼
         │                                                            ┌─────────────────┐
         │                                                            │  queue_consumer │
         └───────────────────────────────────────────────────────────►│  (ADO REST API) │
                                                                      └─────────────────┘
                                                                               │
                                                                               ▼
                                                                      ┌─────────────────┐
                                                                      │  Azure DevOps   │
                                                                      │ backup pipeline │
                                                                      └─────────────────┘
```

## Components

### `probe_timer` (Timer Trigger)

- **Schedule**: every 5 minutes (`0 */5 * * * *`)
- **Input**: `TimerRequest` from the Functions runtime
- **Output**: queue message to `backup-trigger-queue` (via `func.Out[str]`)
- **Actions**:
  1. Load debouncer state from Azure Table Storage (`ProbeState` / `singleton` / `default`).
  2. Run `scripts/probe_tenant_changes.py` via subprocess.
  3. Save updated state back to Table Storage.
  4. If `trigger=true`, emit a queue message.

### `queue_consumer` (Queue Trigger)

- **Input**: `QueueMessage` from `backup-trigger-queue`
- **Actions**:
  1. Parse the JSON payload (`reason`, `checked_at`).
  2. Call the Azure DevOps REST API to queue the backup pipeline run.
  3. Raise on failure so the Functions runtime handles retry and poison-queue logic.
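Step 2 boils down to a single REST request. A minimal sketch of how that request could be assembled — the `Runs - Run Pipeline` endpoint and the Basic-auth PAT scheme are standard Azure DevOps REST conventions, but the helper name is ours and the consumer actually delegates to `trigger_backup_pipeline.py`:

```python
import base64
import json

def build_run_request(organization: str, project: str, pipeline_id: int,
                      pat: str, branch: str = "refs/heads/main"):
    """Assemble URL, headers, and body for ADO's 'Runs - Run Pipeline' endpoint."""
    url = (f"https://dev.azure.com/{organization}/{project}"
           f"/_apis/pipelines/{pipeline_id}/runs?api-version=7.1")
    # A PAT is sent as the password half of HTTP Basic auth, with an empty username.
    auth = base64.b64encode(f":{pat}".encode()).decode()
    headers = {"Content-Type": "application/json", "Authorization": f"Basic {auth}"}
    body = json.dumps({"resources": {"repositories": {"self": {"refName": branch}}}})
    return url, headers, body

url, headers, body = build_run_request("contoso", "Intune", 1, "dummy-pat")
```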

### `scripts/probe_tenant_changes.py`

Standalone CLI script that can also be run locally. It:

- Queries the Intune (`deviceManagement/auditEvents`) and Entra (`directoryAudits`) audit logs.
- Implements a three-state debouncer: `idle` → `armed` → `cooldown`.
- Returns JSON with `trigger`, `reason`, and `new_state`.

### `scripts/trigger_backup_pipeline.py`

Standalone CLI script that queues an Azure DevOps pipeline run via the REST API. Can be used locally or from the queue consumer.

## Debouncer State Machine

| State | Condition to transition | Output |
|---|---|---|
| **idle** | Audit log shows a new change | → `armed` |
| **armed** | Quiet window elapsed (default 15 min) with no newer events | → `cooldown`, `trigger=true` |
| **armed** | Newer event arrives while armed | Stay `armed`, extend quiet window |
| **cooldown** | Cooldown elapsed (default 30 min) | → `idle` |
| **cooldown** | New event arrives | Stay `cooldown` (change is buffered until cooldown ends) |
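The transitions above can be sketched as a small pure function. This is a simplified model, not the script's actual implementation: times are plain minute counts, and events buffered during cooldown are not modeled (the real script re-detects them after returning to `idle`):

```python
from dataclasses import dataclass

QUIET_WINDOW_MIN = 15
COOLDOWN_MIN = 30

@dataclass
class ProbeState:
    state: str = "idle"          # idle | armed | cooldown
    last_event_min: float = 0.0  # time of the newest observed event
    entered_min: float = 0.0     # when the current state was entered

def step(s: ProbeState, now_min: float, new_event: bool) -> tuple[ProbeState, bool]:
    """Advance the debouncer one poll; returns (new state, trigger?)."""
    if s.state == "idle":
        if new_event:
            return ProbeState("armed", now_min, now_min), False
        return s, False
    if s.state == "armed":
        if new_event:  # burst continues: extend the quiet window
            return ProbeState("armed", now_min, s.entered_min), False
        if now_min - s.last_event_min >= QUIET_WINDOW_MIN:
            return ProbeState("cooldown", s.last_event_min, now_min), True
        return s, False
    # cooldown: new events are ignored until the cooldown elapses
    if now_min - s.entered_min >= COOLDOWN_MIN:
        return ProbeState("idle", s.last_event_min, now_min), False
    return s, False
```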

## Configuration

All settings are provided via Function App application settings (environment variables):

| Setting | Required | Default | Description |
|---|---|---|---|
| `AzureWebJobsStorage` | Yes | — | Storage account connection string (tables + queues) |
| `PROBE_APP_ID` | Yes* | — | Entra app registration client ID |
| `PROBE_APP_SECRET` | Yes* | — | Entra app client secret |
| `TENANT_ID` | Yes* | — | Microsoft 365 tenant ID |
| `GRAPH_TOKEN` | No | — | Optional passthrough token (skips the client credentials flow) |
| `ADO_ORGANIZATION` | Yes | — | Azure DevOps organization name |
| `ADO_PROJECT` | Yes | — | Azure DevOps project name |
| `ADO_PIPELINE_ID` | Yes | — | Backup pipeline definition ID |
| `ADO_TOKEN` | Yes | — | Azure DevOps PAT with **Build (read & execute)** |
| `ADO_BRANCH` | No | `main` | Git ref to queue the pipeline against |
| `PROBE_QUIET_WINDOW_MINUTES` | No | `15` | Minutes to wait for a change burst to settle |
| `PROBE_COOLDOWN_MINUTES` | No | `30` | Minutes between successive triggers |

\* Required unless `GRAPH_TOKEN` is provided.

## Local Development

### Prerequisites

- Python 3.11+
- [Azure Functions Core Tools](https://learn.microsoft.com/en-us/azure/azure-functions/functions-run-local)
- An Azure Storage account (or Azurite for local emulation)

### Install dependencies

```bash
cd infra/change-probe
pip install -r requirements.txt
```

### Copy shared scripts

The probe reuses scripts from the repository root. Copy them into this directory before building or running locally:

```bash
cp ../../scripts/common.py scripts/
cp ../../scripts/probe_tenant_changes.py scripts/
cp ../../scripts/trigger_backup_pipeline.py scripts/
```

### Run locally

```bash
# Start Azurite (Storage emulator)
azurite --silent --location ./azurite --debug ./azurite/debug.log

# Copy the local settings template
cp local.settings.json.example local.settings.json
# Edit local.settings.json with your values

# Start the Functions host
func start
```

### Run the probe script standalone

```bash
cd ../..
python3 scripts/probe_tenant_changes.py \
  --client-id "$PROBE_APP_ID" \
  --client-secret "$PROBE_APP_SECRET" \
  --tenant-id "$TENANT_ID" \
  --state-file ./probe-state.json \
  --output ./probe-result.json
```
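The `--output` file contains the `trigger` / `reason` / `new_state` fields described above. A quick way to inspect the result (the JSON shown is a hypothetical example; the `reason` wording and `new_state` contents are illustrative):

```python
import json

# Shape of a hypothetical ./probe-result.json written by the probe
result = json.loads(
    '{"trigger": true, "reason": "new Intune audit events", '
    '"new_state": {"state": "cooldown"}}'
)

if result["trigger"]:
    print(f"would queue backup: {result['reason']}")
# Persist result["new_state"] so the next run can pick it up again.
```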

### Trigger the backup pipeline standalone

```bash
python3 scripts/trigger_backup_pipeline.py \
  --organization "contoso" \
  --project "Intune" \
  --pipeline-id 1 \
  --token "$ADO_TOKEN" \
  --branch refs/heads/main
```

## Deployment

Use the unified provisioning script:

```powershell
.\deploy\provision-change-probe.ps1 `
    -TenantName "contoso.onmicrosoft.com" `
    -ResourceGroupName "rg-astral-probe" `
    -Location "westeurope" `
    -DeployFunctionApp
```

The script will:

1. Register an Entra app (or reuse an existing one).
2. Grant admin consent for the Graph permissions.
3. Create a client secret.
4. Provision the Resource Group, Storage Account, and Function App (Linux Consumption, Python 3.11).
5. Configure application settings.
6. Build and deploy the function package.

### Manual deployment (zip package)

If you prefer to deploy manually:

```bash
cd infra/change-probe

# Copy shared scripts into the package directory
cp ../../scripts/common.py scripts/
cp ../../scripts/probe_tenant_changes.py scripts/
cp ../../scripts/trigger_backup_pipeline.py scripts/

# Install production dependencies into the package
pip install -r requirements.txt --target .python_packages/lib/site-packages

# Build the zip (Linux Consumption requires .python_packages/lib/site-packages, NOT python3.11/)
zip -r function-package.zip \
  probe_timer/ queue_consumer/ scripts/ .python_packages/ \
  host.json requirements.txt \
  -x "*.pyc" -x "__pycache__/*"

# Upload and set WEBSITE_RUN_FROM_PACKAGE
az functionapp deployment source config-zip \
  --resource-group rg-astral-probe \
  --name func-astral-probe \
  --src function-package.zip
```

## Permissions

### Entra App (Graph access)

The probe requires the same read permissions as the main backup pipeline:

- `DeviceManagementConfiguration.Read.All`
- `DeviceManagementApps.Read.All`
- `AuditLog.Read.All`
- `Directory.Read.All`

### Azure DevOps PAT

The `ADO_TOKEN` must have:

- **Build** → *Read & execute*

## Monitoring

Check the `ProbeState` table for the current debouncer state:

```bash
az storage entity query --table-name ProbeState --account-name <storage>
```

Check the queue depth:

```bash
az storage queue list --account-name <storage>
```
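The `state` column of the `ProbeState` row holds a JSON string. To decode a row returned by the query above (or by the tables SDK), a helper like this works — a sketch, with the inner state fields illustrative:

```python
import json

def read_probe_state(entity: dict) -> dict:
    """Decode the JSON 'state' column of a ProbeState entity row."""
    raw = entity.get("state", "{}")
    return json.loads(raw) if isinstance(raw, str) else dict(raw)

# A row shaped like what the function app stores (inner fields illustrative)
row = {"PartitionKey": "singleton", "RowKey": "default",
       "state": '{"state": "armed"}'}
print(read_probe_state(row))
```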

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Timer fires but no state update | `schedule_status["last"]` case mismatch (fixed in current version) | Ensure deployed code uses `.get("Last")` |
| Probe script `ModuleNotFoundError` | Bundled packages in wrong path | Use `.python_packages/lib/site-packages`, not `python3.11/site-packages` |
| Queue message lands in poison queue | `ADO_TOKEN` missing or invalid | Verify token in Function App settings and restart |
| Probe never triggers | No audit events in Graph window | Normal if tenant is idle; verify `AuditLog.Read.All` permission |
| Duplicate pipeline runs | Multiple messages queued | Check debouncer state; cooldown should prevent this |
15
infra/change-probe/host.json
Normal file
@@ -0,0 +1,15 @@
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[4.*, 5.0.0)"
  }
}
19
infra/change-probe/local.settings.json.example
Normal file
@@ -0,0 +1,19 @@
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "PROBE_APP_ID": "",
    "PROBE_APP_SECRET": "",
    "TENANT_ID": "",
    "GRAPH_TOKEN": "",
    "ADO_ORGANIZATION": "",
    "ADO_PROJECT": "",
    "ADO_PIPELINE_ID": "",
    "ADO_TOKEN": "",
    "ADO_BRANCH": "main",
    "PROBE_QUIET_WINDOW_MINUTES": "15",
    "PROBE_COOLDOWN_MINUTES": "30",
    "REPO_ROOT": "../../"
  }
}
137
infra/change-probe/probe_timer/__init__.py
Normal file
@@ -0,0 +1,137 @@
#!/usr/bin/env python3
"""Azure Function timer trigger that probes tenant audit logs and queues a backup run when changes are detected."""

from __future__ import annotations

import json
import logging
import os
import subprocess
import sys
from typing import Any

import azure.functions as func
from azure.data.tables import TableServiceClient

_TABLE_NAME = "ProbeState"
_PARTITION_KEY = "singleton"
_ROW_KEY = "default"


def _repo_root() -> str:
    """Resolve the repository root so we can invoke scripts/probe_tenant_changes.py."""
    env_root = os.environ.get("REPO_ROOT", "").strip()
    if env_root:
        return os.path.abspath(env_root)
    return os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))


def _load_state(connection_string: str) -> dict[str, Any]:
    """Load persisted probe state from Azure Table Storage."""
    try:
        service = TableServiceClient.from_connection_string(conn_str=connection_string)
        table = service.get_table_client(table_name=_TABLE_NAME)
        entity = table.get_entity(partition_key=_PARTITION_KEY, row_key=_ROW_KEY)
        raw = entity.get("state", "{}")
        return json.loads(raw) if isinstance(raw, str) else dict(raw)
    except Exception as exc:
        logging.warning(f"Unable to load state from Table Storage ({exc}); starting fresh.")
        return {}


def _save_state(connection_string: str, state: dict[str, Any]) -> None:
    """Persist probe state to Azure Table Storage."""
    service = TableServiceClient.from_connection_string(conn_str=connection_string)
    table = service.get_table_client(table_name=_TABLE_NAME)
    table.upsert_entity(
        {
            "PartitionKey": _PARTITION_KEY,
            "RowKey": _ROW_KEY,
            "state": json.dumps(state),
        }
    )


def main(mytimer: func.TimerRequest, msg: func.Out[str]) -> None:
    utc_now = mytimer.schedule_status.get("Last", "n/a") if mytimer.schedule_status else "n/a"
    logging.info(f"Probe timer triggered at {utc_now}")

    client_id = os.environ.get("PROBE_APP_ID", "").strip()
    client_secret = os.environ.get("PROBE_APP_SECRET", "").strip()
    tenant_id = os.environ.get("TENANT_ID", "").strip()
    token = os.environ.get("GRAPH_TOKEN", "").strip()

    auth_args: list[str] = []
    if token:
        auth_args = ["--token", token]
    elif client_id and client_secret and tenant_id:
        auth_args = [
            "--client-id", client_id,
            "--client-secret", client_secret,
            "--tenant-id", tenant_id,
        ]
    else:
        logging.error("No Graph authentication configured (PROBE_APP_ID/SECRET/TENANT_ID or GRAPH_TOKEN).")
        return

    connection_string = os.environ.get("AzureWebJobsStorage", "").strip()
    if not connection_string:
        logging.error("AzureWebJobsStorage connection string is missing.")
        return

    state = _load_state(connection_string)
    state_json = json.dumps(state) if state else ""
    quiet_window = os.environ.get("PROBE_QUIET_WINDOW_MINUTES", "15")
    cooldown = os.environ.get("PROBE_COOLDOWN_MINUTES", "30")

    probe_script = os.path.join(_repo_root(), "scripts", "probe_tenant_changes.py")
    if not os.path.exists(probe_script):
        logging.error(f"Probe script not found at {probe_script}")
        return

    cmd = [
        sys.executable,
        probe_script,
        *auth_args,
        "--quiet-window-minutes", quiet_window,
        "--cooldown-minutes", cooldown,
    ]
    if state_json:
        cmd.extend(["--state-json", state_json])

    logging.info(f"Running probe script: {probe_script}")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except subprocess.TimeoutExpired:
        logging.error("Probe script timed out after 60 seconds.")
        return
    except Exception as exc:
        logging.error(f"Failed to run probe script ({exc}).")
        return

    if result.returncode != 0:
        logging.error(f"Probe script failed (exit {result.returncode}): {result.stderr}")
        return

    try:
        output = json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        logging.error(f"Probe script returned invalid JSON ({exc}): {result.stdout[:500]}")
        return

    new_state = output.get("new_state", state)
    _save_state(connection_string, new_state)

    trigger = output.get("trigger", False)
    reason = output.get("reason", "no reason given")
    logging.info(f"Probe result: trigger={trigger}, reason={reason}")

    if trigger:
        queue_payload = json.dumps(
            {
                "reason": reason,
                "checked_at": output.get("checked_at", ""),
            }
        )
        msg.set(queue_payload)
        logging.info("Queued backup trigger message.")
18
infra/change-probe/probe_timer/function.json
Normal file
@@ -0,0 +1,18 @@
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */5 * * * *"
    },
    {
      "name": "msg",
      "type": "queue",
      "direction": "out",
      "queueName": "backup-trigger-queue",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
77
infra/change-probe/queue_consumer/__init__.py
Normal file
@@ -0,0 +1,77 @@
#!/usr/bin/env python3
"""Azure Function queue trigger that calls the Azure DevOps REST API to queue a backup pipeline run."""

from __future__ import annotations

import logging
import os
import subprocess
import sys

import azure.functions as func


def _repo_root() -> str:
    """Resolve the repository root so we can invoke scripts/trigger_backup_pipeline.py."""
    env_root = os.environ.get("REPO_ROOT", "").strip()
    if env_root:
        return os.path.abspath(env_root)
    return os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))


def main(msg: func.QueueMessage) -> None:
    body = msg.get_body().decode("utf-8")
    logging.info(f"Queue consumer received message: {body}")

    org = os.environ.get("ADO_ORGANIZATION", "").strip()
    project = os.environ.get("ADO_PROJECT", "").strip()
    pipeline_id = os.environ.get("ADO_PIPELINE_ID", "").strip()
    token = os.environ.get("ADO_TOKEN", "").strip()
    branch = os.environ.get("ADO_BRANCH", "main").strip()

    if not all([org, project, pipeline_id, token]):
        logging.error("Missing one or more ADO configuration variables (ADO_ORGANIZATION, ADO_PROJECT, ADO_PIPELINE_ID, ADO_TOKEN).")
        # Raising causes the Functions runtime to retry the message after the visibility timeout.
        raise RuntimeError("Incomplete ADO configuration")

    trigger_script = os.path.join(_repo_root(), "scripts", "trigger_backup_pipeline.py")
    if not os.path.exists(trigger_script):
        logging.error(f"Trigger script not found at {trigger_script}")
        raise RuntimeError("Trigger script missing")

    cmd = [
        sys.executable,
        trigger_script,
        "--organization", org,
        "--project", project,
        "--pipeline-id", pipeline_id,
        "--token", token,
        "--branch", branch,
    ]

    logging.info(f"Triggering ADO pipeline {pipeline_id} ...")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except subprocess.TimeoutExpired:
        logging.error("Trigger script timed out after 60 seconds.")
        raise
    except Exception as exc:
        logging.error(f"Failed to run trigger script ({exc}).")
        raise

    if result.returncode != 0:
        logging.error(f"Trigger script failed (exit {result.returncode}): {result.stderr}")
        raise RuntimeError(f"Trigger script failed: {result.stderr}")

    logging.info(f"Trigger script succeeded: {result.stdout.strip()}")
12
infra/change-probe/queue_consumer/function.json
Normal file
@@ -0,0 +1,12 @@
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "msg",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "backup-trigger-queue",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
3
infra/change-probe/requirements.txt
Normal file
@@ -0,0 +1,3 @@
azure-functions
azure-data-tables
azure-storage-queue