astral/infra/change-probe/README.md
Tomas Kracmar 2c41eaca44 Sync from dev @ 497baf0
Source: main (497baf0)
Excluded: live tenant exports, generated artifacts, and dev-only tooling.
2026-04-21 22:21:43 +02:00


ASTRAL Change Probe

Event-driven backup trigger for ASTRAL. Monitors Intune and Entra ID audit logs via Microsoft Graph, debounces change bursts, and queues the Azure DevOps backup pipeline only when actual drift is detected.

Why this exists

Microsoft Graph change notifications and delta queries do not support Intune device management or Conditional Access resources. The only viable event-driven approach is polling the Graph audit log APIs, which have a 5 to 15 minute propagation delay. This probe implements a debouncer on top of that polling to avoid backup storms during bulk changes.

Architecture

┌─────────────────┐     5 min      ┌──────────────┐    quiet window    ┌─────────────────┐
│  Timer Trigger  │ ─────────────► │  probe_timer │ ─────────────────► │  backup-trigger │
│  (probe_timer)  │                │  (debouncer) │   (15 min armed)   │  -queue         │
└─────────────────┘                └──────────────┘                    └────────┬────────┘
        │                                                                        │
        │  load/save state                                                       │ dequeue
        │  (Azure Table Storage)                                                 ▼
        │                                                               ┌─────────────────┐
        │                                                               │ queue_consumer  │
        └──────────────────────────────────────────────────────────────►│  (ADO REST API) │
                                                                        └─────────────────┘
                                                                                │
                                                                                ▼
                                                                        ┌─────────────────┐
                                                                        │  Azure DevOps   │
                                                                        │  backup pipeline│
                                                                        └─────────────────┘

Components

probe_timer (Timer Trigger)

  • Schedule: every 5 minutes (0 */5 * * * *)
  • Input: TimerRequest from Functions runtime
  • Output: queue message to backup-trigger-queue (via func.Out[str])
  • Actions:
    1. Load debouncer state from Azure Table Storage (table ProbeState, partition key singleton, row key default).
    2. Run scripts/probe_tenant_changes.py via subprocess.
    3. Save updated state back to Table Storage.
    4. If trigger=true, emit a queue message.
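A minimal sketch of steps 2 to 4, assuming the function passes state to the script and reads its result through temporary files using the --state-file and --output flags shown under Local Development. The Table Storage load and save are left to the caller. probe_cycle, default_runner, and extra_args (for the credential flags) are hypothetical names, not the deployed code, and stamping checked_at in the function is also an assumption.

```python
import json
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List, Optional, Tuple

Runner = Callable[[List[str]], None]


def default_runner(cmd: List[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit, so a failed
    # probe surfaces in the Functions host logs instead of being ignored.
    subprocess.run(cmd, check=True, capture_output=True, text=True)


def probe_cycle(
    state: dict,
    extra_args: List[str],
    runner: Runner = default_runner,
    script: str = "scripts/probe_tenant_changes.py",
) -> Tuple[dict, Optional[str]]:
    """Run one probe pass and return (new_state, queue_message_or_None)."""
    with tempfile.TemporaryDirectory() as tmp:
        state_path = Path(tmp) / "probe-state.json"
        output_path = Path(tmp) / "probe-result.json"
        state_path.write_text(json.dumps(state))
        runner([sys.executable, script, *extra_args,
                "--state-file", str(state_path), "--output", str(output_path)])
        result = json.loads(output_path.read_text())

    message = None
    if result.get("trigger"):
        # Same fields the queue consumer parses: reason and checked_at.
        message = json.dumps({
            "reason": result.get("reason", ""),
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })
    return result["new_state"], message
```

In the timer function, the caller would save new_state back to ProbeState and call msg.set(message) on the func.Out[str] binding only when message is not None.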

queue_consumer (Queue Trigger)

  • Input: QueueMessage from backup-trigger-queue
  • Actions:
    1. Parse JSON payload (reason, checked_at).
    2. Call Azure DevOps REST API to queue the backup pipeline run.
    3. Raise on failure so the Functions runtime handles retry and poison-queue logic.
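The parsing step might look like this. parse_trigger_message is a hypothetical helper; only the two payload fields and the raise-on-failure contract come from the list above. Raising instead of logging lets the runtime redeliver the message and, after maxDequeueCount attempts (5 by default), move it to backup-trigger-queue-poison.

```python
import json
from typing import Tuple


def parse_trigger_message(body: str) -> Tuple[str, str]:
    """Return (reason, checked_at) from a backup-trigger-queue message.

    Raises ValueError on malformed input; letting it propagate hands retry
    and poison-queue handling to the Functions runtime.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"queue message is not JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("queue message must be a JSON object")
    missing = [key for key in ("reason", "checked_at") if key not in payload]
    if missing:
        raise ValueError(f"queue message missing fields: {missing}")
    return str(payload["reason"]), str(payload["checked_at"])
```

Inside the queue trigger, the body would come from msg.get_body().decode("utf-8").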

scripts/probe_tenant_changes.py

Standalone CLI script that can also be run locally. It:

  • Queries Intune (deviceManagement/auditEvents) and Entra (directoryAudits) audit logs.
  • Implements a three-state debouncer: idle → armed → cooldown.
  • Returns JSON with trigger, reason, and new_state.
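A sketch of the query side, assuming both sources are filtered on activityDateTime and that the debouncer only needs the newest event timestamp. The two v1.0 paths are Graph's audit endpoints, but whether deviceManagement/auditEvents honours this exact $filter, and how the real script pages and filters, are assumptions; audit_urls, parse_graph_time, and newest_activity are illustrative names. Timestamps are parsed by hand because Graph can return more fractional digits than datetime stores.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional
from urllib.parse import quote, urlencode

GRAPH = "https://graph.microsoft.com/v1.0"
AUDIT_PATHS = ("deviceManagement/auditEvents", "auditLogs/directoryAudits")
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def audit_urls(since: datetime) -> List[str]:
    """Build one filtered URL per audit source for events at or after `since`."""
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # quote (not quote_plus) encodes spaces as %20; "$" and ":" stay literal.
    query = urlencode({"$filter": f"activityDateTime ge {stamp}"}, quote_via=quote, safe="$:")
    return [f"{GRAPH}/{path}?{query}" for path in AUDIT_PATHS]


def parse_graph_time(value: str) -> datetime:
    """Parse a UTC Graph timestamp; digits beyond microseconds are truncated."""
    match = _TS.match(value)
    if not match:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    micros = int((match.group(2) or "0")[:6].ljust(6, "0"))
    return base + timedelta(microseconds=micros)


def newest_activity(pages: Iterable[dict]) -> Optional[datetime]:
    """Return the newest activityDateTime across Graph response pages, or None."""
    stamps = [parse_graph_time(item["activityDateTime"])
              for page in pages for item in page.get("value", [])]
    return max(stamps, default=None)
```

Each request needs an Authorization: Bearer header, and multi-page results are followed through @odata.nextLink before the pages reach newest_activity.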

scripts/trigger_backup_pipeline.py

Standalone CLI script that queues an Azure DevOps pipeline run via REST API. Can be used locally or from the queue consumer.
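The REST call can be sketched with the standard library alone. This assumes the classic Build Queue endpoint (POST _apis/build/builds) with api-version=7.1; the README does not say which endpoint trigger_backup_pipeline.py actually calls. normalize_ref and build_queue_request are hypothetical names. A PAT is sent as HTTP Basic credentials with an empty user name.

```python
import base64
import json
import urllib.parse
import urllib.request


def normalize_ref(branch: str) -> str:
    # ADO_BRANCH defaults to "main" while the CLI example passes
    # "refs/heads/main"; accept both by qualifying bare branch names.
    return branch if branch.startswith("refs/") else f"refs/heads/{branch}"


def build_queue_request(organization: str, project: str, pipeline_id: int,
                        token: str, branch: str = "main") -> urllib.request.Request:
    """Build (but do not send) a POST that queues one run of the pipeline."""
    url = (f"https://dev.azure.com/{urllib.parse.quote(organization)}/"
           f"{urllib.parse.quote(project)}/_apis/build/builds?api-version=7.1")
    body = json.dumps({
        "definition": {"id": int(pipeline_id)},
        "sourceBranch": normalize_ref(branch),
    }).encode("utf-8")
    auth = base64.b64encode(f":{token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, data=body, method="POST", headers={
        "Authorization": f"Basic {auth}",
        "Content-Type": "application/json",
    })
```

urllib.request.urlopen(request, timeout=30) raises HTTPError on a 4xx or 5xx response, which gives the queue consumer its raise-on-failure behaviour without extra code.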

Debouncer State Machine

| State | Condition to transition | Output |
|---|---|---|
| idle | Audit log shows a new change | armed |
| armed | Quiet window elapsed (default 15 min) with no newer events | cooldown, trigger=true |
| armed | Newer event arrives while armed | Stay armed, extend quiet window |
| cooldown | Cooldown elapsed (default 30 min) | idle |
| cooldown | New event arrives | Stay in cooldown (change is buffered until cooldown ends) |
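The table can be expressed as a pure function, which makes the transitions unit-testable without Graph or Table Storage. Two details are assumptions not stated above: the quiet window is measured from the poll that saw the newest event, and a change buffered during cooldown is replayed by keeping a watermark of the newest event already covered by a trigger, so it re-arms the probe on the first poll after returning to idle. DebounceState and step are hypothetical names.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class DebounceState:
    phase: str = "idle"                        # "idle" | "armed" | "cooldown"
    last_event: Optional[datetime] = None      # newest audit event seen in the burst
    quiet_until: Optional[datetime] = None     # armed: fire once this passes quietly
    cooldown_until: Optional[datetime] = None  # cooldown: back to idle after this
    watermark: Optional[datetime] = None       # newest event already covered by a trigger


def step(state: DebounceState, newest_event: Optional[datetime], now: datetime,
         quiet: timedelta = timedelta(minutes=15),
         cooldown: timedelta = timedelta(minutes=30)) -> Tuple[DebounceState, bool, str]:
    """Advance the debouncer by one poll; returns (new_state, trigger, reason)."""
    if state.phase == "idle":
        if newest_event is not None and (state.watermark is None or newest_event > state.watermark):
            return replace(state, phase="armed", last_event=newest_event,
                           quiet_until=now + quiet), False, "change detected"
        return state, False, "no new changes"

    if state.phase == "armed":
        if newest_event is not None and newest_event > state.last_event:
            # Burst still in progress: remember the newer event, restart the window.
            return replace(state, last_event=newest_event,
                           quiet_until=now + quiet), False, "quiet window extended"
        if now >= state.quiet_until:
            return replace(state, phase="cooldown", cooldown_until=now + cooldown,
                           watermark=state.last_event), True, "quiet window elapsed"
        return state, False, "waiting for quiet window"

    # Cooldown: the watermark does not move, so newer events stay buffered
    # and re-arm the probe after it returns to idle.
    if now >= state.cooldown_until:
        return replace(state, phase="idle", quiet_until=None,
                       cooldown_until=None), False, "cooldown elapsed"
    return state, False, "in cooldown"
```

The quiet and cooldown arguments would be built from PROBE_QUIET_WINDOW_MINUTES and PROBE_COOLDOWN_MINUTES, e.g. timedelta(minutes=15) when the setting is absent.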

Configuration

All settings are provided via Function App application settings (environment variables):

| Setting | Required | Default | Description |
|---|---|---|---|
| AzureWebJobsStorage | Yes | | Storage account connection string (tables + queues) |
| PROBE_APP_ID | Yes* | | Entra app registration client ID |
| PROBE_APP_SECRET | Yes* | | Entra app client secret |
| TENANT_ID | Yes* | | Microsoft 365 tenant ID |
| GRAPH_TOKEN | No | | Optional passthrough token (skips the client credentials flow) |
| ADO_ORGANIZATION | Yes | | Azure DevOps organization name |
| ADO_PROJECT | Yes | | Azure DevOps project name |
| ADO_PIPELINE_ID | Yes | | Backup pipeline definition ID |
| ADO_TOKEN | Yes | | Azure DevOps PAT with Build (read & execute) scope |
| ADO_BRANCH | No | main | Git ref to queue the pipeline against |
| PROBE_QUIET_WINDOW_MINUTES | No | 15 | Minutes to wait for a change burst to settle |
| PROBE_COOLDOWN_MINUTES | No | 30 | Minutes between successive triggers |

* Required unless GRAPH_TOKEN is provided.
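Validating these settings once at startup turns a missing value into a clear error instead of a failure deep inside a Graph or Azure DevOps call. A sketch, assuming the table above is the whole contract; load_settings and ProbeSettings are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ALWAYS_REQUIRED = ("AzureWebJobsStorage", "ADO_ORGANIZATION", "ADO_PROJECT",
                   "ADO_PIPELINE_ID", "ADO_TOKEN")
CLIENT_CREDENTIALS = ("PROBE_APP_ID", "PROBE_APP_SECRET", "TENANT_ID")


@dataclass(frozen=True)
class ProbeSettings:
    storage_connection: str
    graph_token: Optional[str]
    app_id: Optional[str]
    app_secret: Optional[str]
    tenant_id: Optional[str]
    ado_organization: str
    ado_project: str
    ado_pipeline_id: int
    ado_token: str
    ado_branch: str
    quiet_window_minutes: int
    cooldown_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> ProbeSettings:
    """Check required settings and apply the documented defaults."""
    graph_token = env.get("GRAPH_TOKEN") or None
    # The client-credentials trio is only required without a passthrough token.
    required = ALWAYS_REQUIRED + (() if graph_token else CLIENT_CREDENTIALS)
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return ProbeSettings(
        storage_connection=env["AzureWebJobsStorage"],
        graph_token=graph_token,
        app_id=env.get("PROBE_APP_ID"),
        app_secret=env.get("PROBE_APP_SECRET"),
        tenant_id=env.get("TENANT_ID"),
        ado_organization=env["ADO_ORGANIZATION"],
        ado_project=env["ADO_PROJECT"],
        ado_pipeline_id=int(env["ADO_PIPELINE_ID"]),
        ado_token=env["ADO_TOKEN"],
        ado_branch=env.get("ADO_BRANCH") or "main",
        quiet_window_minutes=int(env.get("PROBE_QUIET_WINDOW_MINUTES") or 15),
        cooldown_minutes=int(env.get("PROBE_COOLDOWN_MINUTES") or 30),
    )
```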

Local Development

Prerequisites

Install dependencies

cd infra/change-probe
pip install -r requirements.txt

Copy shared scripts

The probe reuses scripts from the repository root. Copy them into this directory before building or running locally:

cp ../../scripts/common.py scripts/
cp ../../scripts/probe_tenant_changes.py scripts/
cp ../../scripts/trigger_backup_pipeline.py scripts/

Run locally

# Start Azurite (Storage emulator)
azurite --silent --location ./azurite --debug ./azurite/debug.log

# Copy local settings template
cp local.settings.json.example local.settings.json
# Edit local.settings.json with your values

# Start the Functions host
func start

Run the probe script standalone

cd ../..
python3 scripts/probe_tenant_changes.py \
  --client-id "$PROBE_APP_ID" \
  --client-secret "$PROBE_APP_SECRET" \
  --tenant-id "$TENANT_ID" \
  --state-file ./probe-state.json \
  --output ./probe-result.json

Trigger the backup pipeline standalone

python3 scripts/trigger_backup_pipeline.py \
  --organization "contoso" \
  --project "Intune" \
  --pipeline-id 1 \
  --token "$ADO_TOKEN" \
  --branch refs/heads/main
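To chain the two scripts locally, the probe's --output file can decide whether the trigger runs at all. This sketch assumes the output file holds the trigger, reason, and new_state keys described under scripts/probe_tenant_changes.py; load_result and trigger_command are hypothetical helpers that only build the command line.

```python
import json
import sys
from pathlib import Path
from typing import List, Optional


def load_result(path: str) -> dict:
    """Read the JSON written by probe_tenant_changes.py --output."""
    return json.loads(Path(path).read_text())


def trigger_command(result: dict, organization: str, project: str, pipeline_id: int,
                    token: str, branch: str = "refs/heads/main") -> Optional[List[str]]:
    """Return the trigger_backup_pipeline.py argv if the probe fired, else None."""
    if not result.get("trigger"):
        return None
    return [sys.executable, "scripts/trigger_backup_pipeline.py",
            "--organization", organization, "--project", project,
            "--pipeline-id", str(pipeline_id), "--token", token,
            "--branch", branch]
```

For example, cmd = trigger_command(load_result("./probe-result.json"), "contoso", "Intune", 1, os.environ["ADO_TOKEN"]), followed by subprocess.run(cmd, check=True) when cmd is not None.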

Deployment

Use the unified provisioning script:

.\deploy\provision-change-probe.ps1 `
  -TenantName "contoso.onmicrosoft.com" `
  -ResourceGroupName "rg-astral-probe" `
  -Location "westeurope" `
  -DeployFunctionApp

The script will:

  1. Register an Entra app (or reuse an existing one).
  2. Grant admin consent for Graph permissions.
  3. Create a client secret.
  4. Provision Resource Group, Storage Account, and Function App (Linux Consumption, Python 3.11).
  5. Configure application settings.
  6. Build and deploy the function package.

Manual deployment (zip package)

If you prefer to deploy manually:

cd infra/change-probe

# Copy shared scripts into the package directory
cp ../../scripts/common.py scripts/
cp ../../scripts/probe_tenant_changes.py scripts/
cp ../../scripts/trigger_backup_pipeline.py scripts/

# Install production dependencies into the package
pip install -r requirements.txt --target .python_packages/lib/site-packages

# Build the zip (Linux Consumption requires .python_packages/lib/site-packages, NOT python3.11/)
# "*/__pycache__/*" also catches nested cache dirs; "__pycache__/*" only matches at the top level
zip -r function-package.zip \
  probe_timer/ queue_consumer/ scripts/ .python_packages/ \
  host.json requirements.txt \
  -x "*.pyc" "*/__pycache__/*"

# Upload and set WEBSITE_RUN_FROM_PACKAGE
az functionapp deployment source config-zip \
  --resource-group rg-astral-probe \
  --name func-astral-probe \
  --src function-package.zip
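Because a wrong package layout only shows up at runtime (see the ModuleNotFoundError row under Troubleshooting), it can be worth checking the zip before uploading. A sketch using zipfile; the required entries simply mirror the zip command above, and check_names and check_package_layout are hypothetical names.

```python
import zipfile
from typing import Iterable, List

REQUIRED_ENTRIES = (
    "host.json", "requirements.txt",
    "probe_timer/", "queue_consumer/",
    "scripts/common.py", "scripts/probe_tenant_changes.py",
    "scripts/trigger_backup_pipeline.py",
)
SITE_PACKAGES = ".python_packages/lib/site-packages/"


def check_names(names: Iterable[str]) -> List[str]:
    """Return human-readable problems found in a list of zip entry names."""
    names = list(names)
    problems = []
    for entry in REQUIRED_ENTRIES:
        # Directory entries only need some file underneath them.
        if entry.endswith("/"):
            present = any(n.startswith(entry) for n in names)
        else:
            present = entry in names
        if not present:
            problems.append(f"missing {entry}")
    if not any(n.startswith(SITE_PACKAGES) for n in names):
        problems.append(f"no dependencies under {SITE_PACKAGES}")
    if any(n.startswith(".python_packages/lib/python3") for n in names):
        problems.append("dependencies under .python_packages/lib/python3.x/ (wrong path)")
    if any(n.endswith(".pyc") or "__pycache__/" in n for n in names):
        problems.append("bytecode caches included")
    return problems


def check_package_layout(zip_path: str) -> List[str]:
    with zipfile.ZipFile(zip_path) as zf:
        return check_names(zf.namelist())
```

Run check_package_layout("function-package.zip") and stop the deployment if the returned list is non-empty.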

Permissions

Entra App (Graph access)

The probe requires the same read permissions as the main backup pipeline:

  • DeviceManagementConfiguration.Read.All
  • DeviceManagementApps.Read.All
  • AuditLog.Read.All
  • Directory.Read.All
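To confirm that a token (client-credentials or GRAPH_TOKEN) actually carries these application permissions, its roles claim can be inspected. This sketch decodes the JWT payload without verifying the signature, so it is a troubleshooting aid only, not something production code should depend on; token_roles and missing_roles are hypothetical names.

```python
import base64
import json
from typing import Iterable, List

REQUIRED_ROLES = (
    "DeviceManagementConfiguration.Read.All",
    "DeviceManagementApps.Read.All",
    "AuditLog.Read.All",
    "Directory.Read.All",
)


def token_roles(access_token: str) -> List[str]:
    """Decode (without verifying) a JWT payload and return its 'roles' claim."""
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return list(payload.get("roles", []))


def missing_roles(access_token: str, required: Iterable[str] = REQUIRED_ROLES) -> List[str]:
    """Return the required application permissions absent from the token."""
    granted = set(token_roles(access_token))
    return [role for role in required if role not in granted]
```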

Azure DevOps PAT

The ADO_TOKEN must have:

  • Build (Read & execute)

Monitoring

Check the ProbeState table for current debouncer state:

az storage entity query --table-name ProbeState --account-name <storage>

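List the queues (backup-trigger-queue-poison appears once a message has exhausted its retries):

az storage queue list --account-name <storage>

Peek at pending messages without dequeuing them:

az storage message peek --queue-name backup-trigger-queue --num-messages 32 --account-name <storage>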

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Timer fires but no state update | schedule_status["last"] case mismatch (fixed in current version) | Ensure deployed code uses .get("Last") |
| Probe script ModuleNotFoundError | Bundled packages in wrong path | Use .python_packages/lib/site-packages, not python3.11/site-packages |
| Queue message lands in poison queue | ADO_TOKEN missing or invalid | Verify token in Function App settings and restart |
| Probe never triggers | No audit events in Graph window | Normal if tenant is idle; verify AuditLog.Read.All permission |
| Duplicate pipeline runs | Multiple messages queued | Check debouncer state; cooldown should prevent this |
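The first row comes down to key casing: TimerRequest.schedule_status uses PascalCase keys such as "Last", so indexing with "last" fails. A defensive lookup, sketched here with the hypothetical helper status_value, tolerates either casing and a missing status.

```python
from typing import Mapping, Optional


def status_value(schedule_status: Optional[Mapping[str, str]], key: str) -> Optional[str]:
    """Look up a schedule_status field without tripping on key casing."""
    if not schedule_status:
        return None  # e.g. first run, or schedule monitoring disabled
    if key in schedule_status:
        return schedule_status[key]
    # Fall back to a case-insensitive match instead of raising KeyError.
    by_lower = {k.lower(): v for k, v in schedule_status.items()}
    return by_lower.get(key.lower())
```

In the timer function: last = status_value(mytimer.schedule_status, "Last").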