feat: Add vulnerability-management arc — Book VII, quantum framework, ORION, and kill-chain assessment tool

feat: Add four consultant assignments (identity, CA, Intune, collaboration)
feat: Add management overlay pattern (Nebula T0 / Tailscale T1) and cloud admin VM guidance
2026-06-15 07:56:50 +02:00 · 2026-06-09 16:56:48 +02:00 · 2026-06-09 14:40:34 +02:00
17 changed files with 2871 additions and 22 deletions
@@ -34,11 +34,13 @@ Most security and resilience frameworks optimize for **robustness**—the abilit
 │   ├── executive-summary.md          # One-page board brief
 │   ├── executive-summary-cs.md       # Czech version of board brief (Výkonné shrnutí)
 │   ├── c-suite-conversation-guide.md # Persuasion scripts for top management
-│   └── t0-asset-framework.md       # Tier 0 asset classification and protection
+│   ├── t0-asset-framework.md       # Tier 0 asset classification and protection
 │   └── quantum-vulnerability-management.md # Time-budgeted quanta model for the exploitation-first era (Book VII companion)
 ├── playbooks/                      # Executable modernisation and response plans
 │   ├── rapid-modernisation-plan.md # 30-60-90-180 day transformation roadmap
 │   ├── endpoint-management-entry-vector.md # Intune/device management as engagement entry point
 │   ├── ai-assisted-tvm.md          # AI-powered vulnerability management blueprint
 │   ├── kill-chain-assessment-app.md # Spec for the offline kill-chain mapping tool (tools/kill-chain-assessment.html)
 │   ├── zero-budget-vulnerability-discovery.md # Script-based vuln discovery without commercial scanners
 │   ├── perimeter-scanning-capability.md # External attack surface scanning strategy
 │   ├── osquery-custom-platform.md    # Build a sovereign vuln/asset discovery platform on osquery
@@ -66,6 +68,10 @@ Most security and resilience frameworks optimize for **robustness**—the abilit
 │   ├── vertical-power-utilities.md # Power generation, transmission, water utilities
 │   ├── vertical-telco.md           # Telecommunications and mobile operators
 │   └── vertical-banking.md         # Financial services regulatory alignment
 ├── tools/                          # Standalone runnable instruments (offline, single-file)
 │   ├── README.md                   # Tool index and design constraints
 │   └── kill-chain-assessment.html  # Maps unknown estates → shortest existential path → quanta
 ├── books/                          # The Antifragile Handbook (Books I–VII + field guides)
 └── assets/                         # Diagrams, visuals, and presentation materials
 ```
@@ -96,7 +96,18 @@ Nothing here replaces the governing question from Book I:
 - `[LOOK AT]` How many Domain Admins and Enterprise Admins exist, and are they all justified with named owners?
 - `[ASK]` When was the privileged account list last reviewed, and by whom?
-### B2. PIM / JIT
+### B2. Admin workstations and management plane
 - `[ASK]` What do admins use to reach a domain controller remotely? Is that path independent of the AD it manages, or does it depend on AD for authentication?
 - `[LOOK AT]` Do admins use the same device for privileged work (DC management, PIM activation) and daily tasks (email, browsing)?
 - `[ASK]` Is there a dedicated admin workstation — physical PAW or cloud admin VM (Windows 365 / AVD) — that is used only for privileged tasks?
 - `[LOOK AT]` If a cloud admin VM exists: is it enrolled in Intune with a hardened profile? Is it excluded from email and general browsing? Is it the device scoped in the CA policy restricting privileged role access?
 - `[LOOK AT]` Is there a management overlay (Nebula, Tailscale, Headscale) providing the admin access path to on-prem Tier 0 systems?
 - `[ASK]` If a Nebula T0 overlay exists: where is the CA key stored? Who can sign new node certificates? When was the last signing ceremony?
 - `[ASK]` If a Tailscale T1 overlay exists: is key expiry configured? Does re-authentication require phishing-resistant MFA via Entra?
 - `[LOOK AT]` For multi-cloud clients without a physical data centre: is the management plane explicitly designed, or is access to cloud management consoles and on-prem servers done ad hoc (VPN, direct RDP, per-cloud bastion, no unified plane)?
 ### B3. PIM / JIT
 - `[LOOK AT]` Is Entra PIM deployed and enforced for Entra administrative roles?
 - `[LOOK AT]` Are Entra roles set to eligible (not active) by default?
@@ -106,7 +117,7 @@ Nothing here replaces the governing question from Book I:
 - `[LOOK AT]` Is PIM alert configuration enabled (Roles activated without MFA, Redundant assignments, etc.)?
 - `[ASK]` For on-prem DA/EA: is there any JIT or time-limited elevation mechanism in place?
-### B3. Service Accounts (On-Prem)
+### B4. Service Accounts (On-Prem)
 - `[LOOK AT]` Are there service accounts with SPNs and static passwords older than 12 months? (Kerberoastable)
 - `[LOOK AT]` Which service accounts are over-permissioned (e.g., Domain Admin, local admin on all servers)?
@@ -114,7 +125,7 @@ Nothing here replaces the governing question from Book I:
 - `[LOOK AT]` Are there service accounts nobody can identify a current owner for?
 - `[TEST]` Run a Kerberoast simulation: do ticket requests for service account SPNs generate any detection?
-### B4. Service Principals & App Registrations (Cloud)
+### B5. Service Principals & App Registrations (Cloud)
 - `[LOOK AT]` Which app registrations hold escalation-grade Graph permissions (application permissions): `RoleManagement.ReadWrite.Directory`, `AppRoleAssignment.ReadWrite.All`, `Application.ReadWrite.All`, `Directory.ReadWrite.All`?
 - `[LOOK AT]` Which app registrations have non-expiring client secrets?
@@ -122,14 +133,14 @@ Nothing here replaces the governing question from Book I:
 - `[LOOK AT]` Which apps have tenant-wide admin consent, and is each justified and reviewed?
 - `[LOOK AT]` Which Azure workloads use client secrets instead of managed identities where managed identities are available?
-### B5. Tier Model / Clean Source
+### B6. Tier Model / Clean Source
 - `[LOOK AT]` Do Domain Admins / Enterprise Admins authenticate from standard workstations used for email and browsing?
 - `[LOOK AT]` Is ADCS (Active Directory Certificate Services) deployed? If so, is it on a Tier 0 or hardened host, or on a standard server?
 - `[LOOK AT]` Are there shared administrative jump boxes that cross tier boundaries (used for both Tier 0 and Tier 1 work)?
 - `[LOOK AT]` Do cloud admins use the same device for privileged Entra work as for daily activity?
-### B6. Escalation Paths
+### B7. Escalation Paths
 - `[LOOK AT]` Are there accounts with `GenericAll`, `WriteDACL`, or `WriteOwner` on high-value AD objects (domain root, DCs, admin groups) that are not themselves Tier 0?
 - `[LOOK AT]` Are there computers with unconstrained delegation enabled (excluding DCs)?
@@ -137,7 +148,7 @@ Nothing here replaces the governing question from Book I:
 - `[LOOK AT]` Is LAPS (Windows LAPS preferred) deployed across all workstations and servers? What is the coverage percentage?
 - `[TEST]` Run BloodHound (or equivalent) and count attack paths to Domain Admin. Note the number as a baseline. Is it going up or down over time?
-### B7. Break-Glass
+### B8. Break-Glass
 - `[LOOK AT]` Do cloud-only break-glass Global Admin accounts exist?
 - `[LOOK AT]` Is phishing-resistant authentication (FIDO2 or certificate) configured on break-glass accounts?
@@ -146,7 +157,7 @@ Nothing here replaces the governing question from Book I:
 - `[TEST]` Sign in to the break-glass account in a controlled drill. Does it work? Does the alert fire? Does someone respond?
 - `[ASK]` Where are the break-glass credentials stored, and can they be retrieved without the systems they recover?
-### B8. Phishing-Resistant MFA for Admins
+### B9. Phishing-Resistant MFA for Admins
 - `[LOOK AT]` What MFA method is enforced for Global Admins: FIDO2, certificate-based auth, or push/SMS?
 - `[LOOK AT]` Push-approve and SMS are not acceptable for administrative accounts. If they are in use, that is a P0.
@@ -77,8 +77,9 @@ Privilege is the domain where deletion is the entire strategy. Adding "privilege
 **Paranoid protection for privileged work means, non-negotiably:**
- **PAWs** — privileged access workstations. All Tier 0 / Global Admin work happens from a clean, hardened, single-purpose device that never reads email or browses the web. The admin's normal laptop is Tier 2 and stays there.
+- **PAWs — the principle and the practical reality.** The principle: all Tier 0 / Global Admin work from a clean, hardened, single-purpose device that never reads email or browses the web. The admin's normal laptop is Tier 2. This is right. The practical reality: physical PAWs almost never get deployed. The hardware procurement, the second device on the desk, the behaviour change — all of it defeats the project before it starts. The deployable alternative that preserves the essential properties is a **cloud-hosted admin workstation** — a Windows 365 or Azure Virtual Desktop VM provisioned from a hardened template, enrolled in the management overlay, used only for privileged tasks. The admin connects from their normal device via browser or RDP. Privileged credentials live in the cloud VM, not on the admin's local device. If the VM is compromised: wipe it, reprovision from template in 20 minutes. The security property is the same — credentials isolated from the daily-use device — without the hardware problem. This is the practical PAW. Recommend it before recommending a dedicated physical device; it will actually get deployed.
- **Phishing-resistant MFA only** for admins — FIDO2 / passkeys / certificate- based. SMS and push-approve are not admin-grade; they're phishable, and admins are the phishing prize.
+- **The management overlay** connects the admin workstation (cloud VM or physical PAW) to the systems it manages without exposing those systems to the general network. The T0/T1 split matters here and maps directly to the tier model: T0 systems (DCs, ADCS, sync server) get an overlay with no external runtime dependency (Nebula with pre-distributed certificates); T1 systems (member servers, cloud workloads, multi-cloud resources) get an overlay with identity-aware access and per-session MFA (Tailscale with Entra OIDC). The realistic T0 node count for a 5,000-person organisation is 15–25 nodes — small enough to manage with a documented certificate ceremony and a spreadsheet, not a full PKI team. The management overlay is what makes remote and hybrid admin work possible without either a traditional VPN's flat-network problem or physical-presence-only access.
 - **Phishing-resistant MFA only** for admins — FIDO2 / passkeys / certificate-based. SMS and push-approve are not admin-grade; they're phishable, and admins are the phishing prize. For the management overlay, this means Tailscale configured with key expiry and an Entra OIDC IdP enforcing FIDO2 — so the WireGuard device trust and a per-session identity assertion are both present, not just the device key.
 - **Separate, cloud-only privileged identities** for cloud admin (the Book II firebreak, enforced here). On-prem admin identity must not be the cloud admin identity.
 - **JIT for everything** via PIM: eligible-not-active, time-boxed, MFA on activation, justification logged, and **approval workflow on the crown roles**.
 - **Conditional Access scoped to admins** — privileged roles usable only from PAWs / compliant devices / named locations.
@@ -116,6 +117,8 @@ Stable and Lindy (teach with confidence): standing privilege is the core risk; t
 What moves, and what you must verify against current Microsoft documentation:
 - **The management overlay pattern** (covered in §3 above) is stable in principle — the T0/T1 split, the clean-source reasoning for isolating the management plane, the cloud admin VM as the deployable PAW substitute. What moves: the specific tooling. Nebula's CA and ACL model, Tailscale's per-session MFA configuration and OIDC integration, and the Windows 365 / AVD provisioning model all evolve. Verify current implementation guidance before deploying, and confirm Tailscale's key-expiry and IdP enforcement behaviour is still available as described.
 - **PIM capabilities, role definitions, and the risk classification of specific Graph permissions** evolve continually. Confirm which scopes are escalation-grade *today* rather than trusting a 2026 list.
 - **On-prem JIT/PAM tooling is genuinely weaker and more fragmented than the cloud story.** Native time-bound group membership, MIM PAM, and third-party PAM all have trade-offs that shift. Don't promise a client a clean AD-native JIT experience without checking current reality — and be honest that on-prem eligibility is harder than PIM makes cloud look.
 - **gMSA vs dMSA.** gMSA is the established, Lindy answer for managed service accounts. **dMSA** (delegated managed service accounts, introduced with the Windows Server 2025 generation) targets the real gap — migrating a standing service account and disabling the original — but newer mechanisms carry newer attack surface, and there has been published privilege-escalation research against the dMSA migration path. **Verify current patch and hardening guidance before you recommend dMSA**; this is exactly the kind of new-and-shiny that Book I principle 8 warns about. gMSA until you've checked dMSA's current state.
@@ -136,6 +139,8 @@ If a client's safety hinges on a current specific, look it up and cite it. "I ne
 - Is ADCS treated as Tier 0? When was KRBTGT last rotated? Is LAPS deployed?
 - Break-glass: does it exist, is it monitored to scream on use, and when was it last *tested* — not created, tested?
 - How many paths to Domain Admin / Global Admin exist right now, and is that number going up or down?
 - What does an admin use to reach a domain controller remotely — and if that path is compromised, what does the attacker get? Is the management access path independent of the estate it manages?
 - Are privileged credentials ever typed into or stored on a device that is also used for email and browsing? If yes, the session isolation that PAWs are meant to provide does not exist, regardless of what the policy says.
 ---
@@ -0,0 +1,203 @@
 # The Antifragile Handbook for M365 & Active Directory
 ## Book VII — Vulnerability Management
 > *The patch cycle was built for a world where you had weeks. That world is gone. Exploitation now arrives in hours, the patch arrives in days, and no amount of "patch faster" closes a gap that runs the wrong way by two orders of magnitude. Stop racing the attacker to the patch. Change the race.*
 ---
 ## The governing question
 The first six books were written for a world in which the dominant way into an estate was a person — phished, tricked, talked past the controls. That assumption is now wrong. As of the 2026 Verizon DBIR, **exploitation of vulnerabilities is the leading initial-access vector in confirmed breaches — roughly twice phishing, for the first time in the report's history.** The front door changed. This book changes the lens to match.
 The governing question is the same as everywhere else in the handbook, pointed at the vulnerability surface:
 > **When — not if — a vulnerability on your estate is exploited, does the estate come back weaker, the same, or stronger?**
 A fragile estate treats every CVE as a race it has already lost and patches by score until the analyst burns out. A robust estate patches the important ones fast and survives. An antifragile estate **stops treating the vulnerability list as the unit of work at all** — it asks where the vulnerability sits on the kill chain, removes the false urgency that hides the real targets, contains the few that matter in hours, and feeds every exploited path back into architecture so the *next* vulnerability on that path is a non-event.
 The reframe that powers the book: **you cannot win a speed race against machine-speed exploitation by moving your humans faster, and you do not have to.** The winning move is not to patch the long tail before the attacker reaches it — that is arithmetically impossible and getting worse. The winning move is to make most vulnerabilities not matter (blast-radius and reachability), contain the few that do in the time you actually have (hours, not weeks), and convert every near-miss into a permanently shorter kill chain.
 ---
 ## Why the old model is finished — the arithmetic
 Four numbers end the debate, and they are worth saying out loud to a client in a room:
 - **Time-to-exploit has collapsed** from a median of 771 days in 2018 to roughly **4 hours** by 2024. The window the entire patch-management model was built around — the weeks between disclosure and exploitation — has effectively closed.
 - **Patching still takes weeks.** The 2026 DBIR puts median remediation of edge-device vulnerabilities at **43 days**, with only **54% remediated within a year.** 43 days versus 4 hours is the whole story.
 - **Volume has gone vertical.** ~59,000 new CVEs were projected for 2025, a ~50% year-on-year increase, and 2026 is on pace to exceed it. The enrichment infrastructure has buckled under the load — NIST reclassified ~29,000 backlogged CVEs to "Not Scheduled," meaning the data you relied on to prioritise is arriving late or never.
 - **Exploitation is being automated.** Autonomous exploitation research has demonstrated AI systems exploiting 174 of 178 CISA Known-Exploited Vulnerabilities at an average of ~21 minutes each, with no human in the loop, and an ~87% success rate against one-day vulnerabilities in real software. The attacker side automates faster than the defender side because generating a working exploit for a known bug is a clean, verifiable, deterministic problem — exactly what machines are good at — while *defending* requires environmental context, which is exactly what they have historically been bad at.
 The honest conclusion: **a human-paced, score-sorted patch programme is now structurally incapable of keeping pace.** This is not a maturity problem to be solved with more analysts. It is a model that has run out of road. Everything below is the replacement.
 One piece of good news hides in the data, and the whole framework leans on it: **roughly 90% of "critical" vulnerabilities are not actually exploitable in a given environment once compensating controls, reachability, and segmentation are properly mapped.** The fragility is not that you have 40,000 criticals. It is that you cannot yet tell which ~10% are real, so you treat all 40,000 as equally urgent and drown. Antifragile vulnerability management is, before anything else, the discipline of removing the 90% of false urgency so the real targets become visible.
 ---
 ## 1. Fragility inventory — where vulnerability management rots
 ### CVSS as the prioritisation engine
 The original sin. CVSS scores *severity in the abstract* — it knows nothing about whether the vulnerable asset is internet-reachable, whether it sits on the kill chain, whether an exploit exists, or whether an existing control already neutralises it. A 9.8 on a segmented, non-privileged, unreachable host is noise; a 7.5 on an internet-facing box one hop from a domain controller is a P0. Sorting 40,000 findings by CVSS produces a list that is precisely uncorrelated with where the attacker will actually go. It feels like prioritisation. It is sorting by the wrong key.
 ### The infinite, undifferentiated backlog
 "We have 40,000 criticals" is not a vulnerability problem; it is a *triage* problem wearing a vulnerability costume. An undifferentiated backlog has no front — every item looks equally urgent and equally hopeless — so the team either patches by score (wrong key) or freezes. The backlog grows faster than any human process can drain it, which means a backlog-draining strategy is a strategy to fall behind forever.
 ### Patch velocity treated as the only lever
 The reflex when the AI-exploitation story lands is "we need to patch faster." It is the wrong reflex, and it is the most expensive one. You cannot out-patch a 4-hour exploitation window with a 43-day cycle by trimming the cycle to 30 days. Velocity is a real lever for the long tail, but as the *primary* response to the speed problem it is a fragilizing illusion — it consumes the entire budget defending a race you mathematically cannot win, and leaves nothing for the moves that actually change the outcome (reachability, blast radius, containment, architecture).
 ### The half-done remediation — the ghost patch
 Book I's ghost-policy corollary, applied to vulnerabilities. A patch deployed to 80% of the fleet, a compensating rule applied but never verified to actually block, a "remediated" ticket closed against a host that quietly rolled back — these are *worse* than an open finding, because the open finding is at least honest. A remediation that displays as done while enforcing nothing is a vulnerability with a clean bill of health. **A vulnerability that is partly fixed is not partly safe; it is fully exploitable and now invisible.**
 ### The unscanned and the unscannable
 You cannot prioritise what you cannot see. The fleet you don't scan (Book IV's shadow and dark device populations), the appliance whose firmware no scanner reads, the SaaS you don't own, the dependency buried three layers into a container image — these are the dangerous quanta precisely because they carry no score at all. An estate that congratulates itself on draining the *known* backlog while the unknown surface grows is optimising the lit area under the streetlight.
 ### Reachability and compensating controls left unmapped
 If you have not mapped which assets are internet-reachable, which sit behind a WAF or EDR, which are segmented away from the crown jewels, then you have no way to perform the one subtraction that matters — collapsing 40,000 criticals to the ~10% that are genuinely exploitable here. Without reachability and control context, every finding is theoretically critical and therefore practically un-prioritisable.
 ### Remediation as the silent bottleneck
 Detection is largely solved — most teams are *drowning* in findings, not short of them. The bottleneck is everything after: triage, ownership, change windows, approvals, deployment, verification. Each human handoff in that chain costs hours or days, and there are usually five or six of them. In a world of 4-hour exploitation, a six-handoff remediation pipeline *is* the vulnerability.
 ### Detection without a feedback path to architecture
 A vuln gets exploited (or nearly), it gets patched, the ticket closes, and the *path* the attacker used — the flat segment, the over-privileged service account, the reachable management interface — stays exactly as it was, waiting for the next CVE to land on it. The incident produced a patch but no structural change. The disorder was wasted. This is the Book VI failure mode pointed at the vulnerability layer, and it is the difference between a programme that gets stronger and one that runs in place forever.
 ---
 ## 2. Via negativa — what to remove
 The defining act of antifragile vulnerability management is **subtraction before addition.** You remove false urgency, false comfort, and false work before you add a single new tool.
 1. **Remove CVSS as the sort key.** It does not go away — it stays as one input — but it stops being the thing that orders the queue. The queue is ordered by kill-chain position and exploitability in *this* environment.
 2. **Remove the ~90% of criticals that aren't exploitable here.** Map reachability and compensating controls and *delete the false urgency* on everything segmented, unreachable, or already neutralised. This is the single highest-leverage move in the entire programme: it turns "40,000 criticals" into "400 that are real and 40 that are on fire," and it is pure subtraction.
 3. **Remove the undifferentiated backlog.** A backlog with no structure is itself a fragility. Replace it with quanta (Section 3) — time-budgeted, atomic, completable units. An item that cannot be placed in a quantum is either not real (delete it) or not yet understood (route it to discovery).
 4. **Remove "patch faster" as the headline strategy.** Demote velocity to what it is — a lever for the long tail — and stop letting it consume the budget that belongs to reachability, blast radius, and containment.
 5. **Remove the half-done remediation from the "done" column.** A fix is not done until it is *verified to enforce* against a real test, not until the ticket is closed. Every quantum closes with a signal or it does not close. (Book I: validate by observation, never by inspection.)
 6. **Remove human handoffs from the hours-lane.** The steps in the critical-quantum pipeline that require no judgement — detection, reachability assessment, work-item generation, routing — get automated within policy guardrails so the scarce human judgement is spent only where judgement is actually required. You are not removing the human; you are removing the human from the steps that were only ever latency.
 ---
 ## 3. Quantum vulnerability management — the core model
 Here is the model the rest of the book turns on, and the direct answer to "how do we size remediation to a world that moves in hours."
 A **quantum** is the smallest unit of remediation that (a) fully closes a specific exploitable path, (b) is sized to a time budget it can *actually be completed within*, and (c) ends in a verifiable signal. The word is deliberate. A quantum is *atomic* — you cannot ship half of it and claim half the protection (that is the ghost patch). And it is *discrete* — work is packetised into units that fit the time you have, not smeared across an infinite backlog.
 The sort key is not severity. It is **time-to-existential-impact**, which is a function of three things the estate actually determines:
 > **kill-chain position × reachability × exploit availability**
 A vulnerability that sits on the path to existential compromise, is reachable by the adversary, and has a working exploit in the wild has a time-to-impact measured in hours. The same vulnerability, segmented away and unreachable, has a time-to-impact measured in months — or never. **The vulnerability is identical; its quantum is different, because its position is different.** This is the Book I principle (kill-chain position changes priority, not the CVE) made operational.
 That sort produces three live quanta and one that is more dangerous than all of them:
 ### Critical quantum — the hours lane
 On the kill chain, reachable, exploitable now. The time budget is **hours**, and that fact dictates the response: **you cannot wait for a patch cycle, so the critical quantum is closed by a compensating control, not necessarily the patch.** Block it at the edge, sever the reachability, disable the vulnerable feature, isolate the host, pull it behind the WAF. The patch follows later in the standard lane on the normal change calendar. The critical quantum's job is to **move the asset out of the hours-window** — to convert a 4-hour time-to-impact into a non-urgent one — by the cheapest fast control available. This is the lane that must be partly autonomous (Section 6), because human-paced execution cannot meet an hours budget.
 ### Severe quantum — the days lane
 Material risk, reachable with friction, or where a compensating control already buys partial cover. The time budget is **days**. These are batched into a days-sized packet of work that can be fully completed and verified inside a single short change window — not started and left at 80%.
 ### Standard quantum — the sprint lane
 The long, real, non-urgent tail. The time budget is a **sprint**. The discipline here is batching: the long tail is drained in sprint-sized quanta of work that *can actually be finished*, each one atomic and verified, rather than as an ever-growing list nobody ever reaches the bottom of. This is the only lane where "patch velocity" is the right tool, and it is fine for it to be slow, because by definition nothing in it is on fire.
 ### Dark quantum — the unsized unknown
 The most dangerous quantum is the one you cannot size, because you cannot yet see the asset, cannot establish reachability, or cannot determine exploitability. An unsized quantum is not a low priority — it is an *uncharacterised* one, and uncharacterised risk on an unknown asset is exactly how estates die. The antifragile response is not to ignore it (it has no score, so the old model does) but to **route it to discovery and to the Kill Chain Assessment** — to spend effort turning a dark quantum into a sized one, because a known severe is safer than an unknown nothing. This lane is why discovery (Book IV, the zero-budget discovery playbooks, the Kill Chain Assessment app) is part of vulnerability management and not separate from it.
 **The quantum discipline in one line:** size every remediation to the time you actually have, make each unit atomic and verifiable, and spend your scarce judgement converting dark quanta into sized ones — not re-sorting the known list by the wrong key.
 ---
 ## 4. The barbell — fast containment and deep architecture, nothing in the fragile middle
 The vulnerability barbell has two ends and a lethal middle.
 **One end: cheap, fast, reversible containment.** The hours-lane compensating controls — edge blocks, reachability cuts, feature disables, isolation. Low cost, high speed, applied within policy, reversible when the patch lands. This end exists to win the time race the patch can never win.
 **The other end: slow, structural, blast-radius reduction.** Segmentation, least privilege, T0 protection, assume-breach architecture (the whole of Books II–V). This is the end that makes the ~90% of vulnerabilities *not matter*, because a vulnerability that cannot reach anything important and cannot pivot is a finding, not an incident. It is slow and expensive and it is the only durable bet — architecture beats velocity in the vulnerability race, and it is the only race you can actually win.
 **The fragile middle to avoid: the aging critical-patch backlog.** A months-long queue of "critical" patches is neither fast containment nor structural fix. It is the worst of both — it carries the urgency of the hours-lane but moves at the speed of the sprint-lane, so it spends maximum anxiety for minimum protection while the attacker clears it for you, one exploited host at a time. The barbell says: contain it fast *or* architect it away. Do not let it sit in the middle, aging, pretending that "we're working through the criticals" is a posture.
 The asymmetric-payoff reading (Pillar 5): a few hours of compensating-control work on a kill-chain node prevents a catastrophe, and a segmentation project that costs a quarter makes a thousand future CVEs irrelevant. Both ends of the barbell are convex. The fragile middle is concave — maximum cost, minimum return.
 ---
 ## 5. Optionality & recovery — designing so most vulnerabilities can't matter
 - **Reachability as a control surface.** If you can cut a vulnerable asset off from the adversary faster than you can patch it — and you almost always can — then reachability *is* your fastest remediation. Build the capability to sever reachability quickly (edge policy as code, network isolation on demand) and you have an answer to every hours-lane finding that does not depend on a vendor patch existing yet.
 - **Compensating-control inventory, mapped in advance.** The ~90% reduction only works if you already know, per asset, what controls are in front of it. Map EDR coverage, WAF rules, segmentation, and internet reachability *before* the incident, so that when a zero-day drops you can answer "are we actually exposed?" in minutes instead of days. This map is the single most valuable artefact in the programme.
 - **Blast-radius limitation as vulnerability management.** Every segmentation boundary and every collapsed standing privilege is a vulnerability-management control, because it converts "exploit one thing, own everything" into "exploit one thing, contain it." The cheapest way to manage a vulnerability is to have already made it survivable.
 - **Known-good baselines and config-as-code (ASTRAL).** When a vulnerability is exploited, the ability to restore the affected control plane to a verified baseline collapses the cost of exploitation. A reachable, recoverable, version-controlled estate treats a successful exploit as an inconvenience, not a catastrophe.
 - **The pre-made "isolate vs patch vs rebuild" decision.** Decide the criteria before the incident: when do we contain-and-wait, when do we emergency-patch, when do we rebuild from known-good? Deciding under fire is how the half-done remediation gets created.
 ---
 ## 6. Stressor — the autonomy and the feedback loop
 Two stressors run this book, and the second is the one that makes it antifragile rather than merely fast.
 ### Autonomy in the hours-lane — matching machine speed with machine speed
 The article that prompted this book is right about the core asymmetry: **attackers are executing at machine speed and defenders are still running remediation through human-paced processes designed for a world with weeks of lead time.** The hours-lane cannot be served by a pipeline with five human handoffs. So the critical quantum's execution — detect the new exposure, cross-reference the asset inventory, assess reachability and compensating controls, generate the work item with context, route it, and in the clear cases *apply the compensating control* — runs autonomously **within human-defined guardrails.**
 The repo's standing scepticism applies and sharpens the point rather than contradicting it: **AI on a broken foundation is expensive noise.** Autonomy without environmental context just generates tickets faster — "faster noise," the exact toil that makes developers dread security. The autonomy only works *because* the foundation is in place: the compensating-control map, the reachability model, the known-good baseline, the segmented architecture. Autonomy is the accelerator on the hours-lane; architecture is still the durable bet. The human role moves up a level — from doing the remediation to **governing the policy**: which classes of action the system may take, which severity thresholds trigger automated containment, which changes still require a human. That is a better use of scarce security talent and the only operating model that survives the volume. The concrete blueprint for this lane is in [AI-Assisted TVM](../playbooks/ai-assisted-tvm.md); this book is the principle, that playbook is the build.
 The guardrail is the whole game. Autonomous does not mean uncontrolled. The most defensible implementations keep the human at the policy boundary and delegate only execution — and they apply compensating controls (reversible, contained) far more readily than irreversible changes. Start the autonomy on the safest, highest-value action: cutting reachability on a confirmed-exploitable, internet-facing, kill-chain asset.
 ### The feedback loop — every exploited path becomes a shorter kill chain
 This is the climax, and it is the same machine as Book VI. A vulnerability that was exploited, or nearly exploited, is the cheapest penetration test you will ever get — honest, real-world data about exactly where a path to the crown jewels was open. Patching the CVE wastes that data. The antifragile move is to **sever the path**: the flat segment gets a boundary, the over-privileged service account gets collapsed, the reachable management interface gets pulled behind the bastion — so that the *next* vulnerability that lands on that path is a non-event before it is ever disclosed.
 Measure the loop, not just the lane. MTTR tells you how fast you patch; it does not tell you whether you are getting stronger. The antifragile metric is: **after each exploited-or-near vulnerability, did the kill chain get shorter?** If the last ten vulnerability incidents produced ten patches and zero severed paths, the loop is broken and you are merely fast. If they produced ten patches and six structurally shortened kill chains, the estate is getting harder to compromise every time it is tested — which is the only honest definition of antifragile.
 ---
 ## Honest uncertainty (verify the moving parts)
 Stable and Lindy (teach with confidence): CVSS is not a priority; kill-chain position is. Most criticals aren't reachable. A half-done remediation is a hidden full vulnerability. You cannot out-patch machine-speed exploitation; you can make most vulnerabilities not matter and contain the few that do. Every exploited path should shorten the kill chain. None of that churns — it is the architecture-beats-velocity thesis applied to vulnerabilities, and it will outlive every tool named here.
 What moves, and what you must verify:
 - **The headline statistics churn annually.** The "exploitation is #1, ~2× phishing" finding is the 2026 DBIR; the 4-hour and 43-day figures, the ~59,000-CVE projection, the autonomous-exploitation benchmarks — all of these are point-in-time and will move. The *direction* (exploitation rising, time-to-exploit collapsing, volume exploding) is the stable signal; the specific numbers need re-checking against the current year's DBIR, M-Trends, and FIRST/CVE data before you put them on a slide.
 - **The enrichment infrastructure is actively degrading.** NVD's backlog and the "Not Scheduled" reclassification mean the data you use to prioritise is itself unreliable and getting worse. Verify what enrichment you can actually trust *today*, and lean harder on your own reachability and exploitability signals precisely because the public ones are thinning.
 - **The autonomous-execution tooling is immature and fast-moving.** The Zero-Day-Agent-class pattern (autonomous detect → reachability assessment → compensating control) is real and operational but the products, their accuracy, and their guardrail models are evolving monthly. Verify current capability and, more importantly, current *failure modes* before you delegate any action — and start with reversible compensating controls, never irreversible change.
 - **The ~90%-not-exploitable figure is environment-specific.** It is a defensible industry estimate, not a law. The real number depends entirely on how well your compensating controls are actually mapped and enforced — and a mapped control that has rotted into a ghost is a false negative that will hurt you. Test the controls you are counting on, do not trust the map.
 - **Exploit-availability and threat-intelligence feeds** (CISA KEV, exploit databases, vendor advisories) are reliable in principle but vary in latency and coverage — verify which feeds are current and how fast they update before you wire them into the hours-lane.
 If a prioritisation decision hinges on a current specific, verify it and test it. "We confirmed this asset is internet-reachable and the EDR rule actually blocks the exploit" beats any CVSS score ever published.
 ---
 ## Consolidated judgement prompts
 - When a vulnerability on this estate is exploited, do we come back weaker, the same, or stronger? What's the mechanism that makes it stronger?
 - Are we sorting by CVSS, or by kill-chain position × reachability × exploit availability?
 - Of our "criticals," how many are actually reachable by an adversary right now? If we don't know, that is the first finding.
 - For our top exploitable findings: can we sever reachability faster than we can patch? If yes, why are we waiting for the patch?
 - Is anything in the "done" column a ghost patch — closed but never verified to enforce?
 - What is sitting in the fragile middle — the aging critical-patch backlog that is neither contained fast nor architected away?
 - How many human handoffs are in our hours-lane, and which of them require actual judgement versus just adding latency?
 - What's in the dark quantum — the unscanned, the unscannable, the unowned — and what are we doing to size it?
 - For the last ten vulnerability incidents: how many produced a severed path versus just a patch? Is the kill chain getting shorter?
 ---
 ## Where this book sits in the arc
 Books II–V harden the containers and contents; Book VI builds the loop that makes shocks pay. Book VII is what happens when the dominant shock stops being a phished human and becomes an exploited vulnerability arriving at machine speed. The answer is not a seventh thing bolted on — it is the same antifragile lens (subtract the false, protect the irreplaceable, contain the few that matter, feed every shock back into structure) applied to the surface the attacker now prefers. The vulnerability list was never the unit of work. The kill chain always was.
 Move fast and fix things.
 ---
 *Book VII of the Antifragile Handbook. Pairs with the [Quantum Vulnerability Management](../core/quantum-vulnerability-management.md) framework and the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md); the build-level companion is the [AI-Assisted TVM Blueprint](../playbooks/ai-assisted-tvm.md).*
@@ -68,7 +68,17 @@ For most estates the honest answer to "can you see where it went?" is no. That's
 The capstone, because it decides whether everything before it was merely robust or genuinely antifragile. Detection and recovery are not the sad afterthought — they're the feedback loop that changes the structure of the estate after every shock. An org that buries incidents stays fragile. An org that treats them as fuel becomes antifragile. This book covers the recovery lies the industry tells itself (untested backups, undocumented break-glass, AD forest recovery nobody has practised), builds the detection architecture, and — most importantly — describes the machine that turns incidents, alerts, and near-misses into structural improvement.
-Read this last. It only makes sense once you've built something worth protecting.
+Read this once you've built something worth protecting — it closes the original defensive arc (Books I–VI).
 ---
 ### [Book VII — Vulnerability Management](06-vulnerability-management.md)
 *The patch cycle was built for a world where you had weeks. That world is gone. Stop racing the attacker to the patch — change the race.*
 The first six books assume the dominant way into an estate is a phished human. As of the 2026 Verizon DBIR that assumption is wrong: **exploitation of vulnerabilities is now the leading initial-access vector, roughly twice phishing.** This book changes the lens to match. It refuses the two losing moves — sorting 40,000 findings by CVSS, and trying to "patch faster" against a 4-hour exploitation window — and replaces them with the antifragile alternative: subtract the ~90% of criticals that aren't actually reachable, size the rest into **quanta** by time-to-existential-impact (hours / days / sprint, plus the dangerous *dark* quantum you can't yet size), contain the few that matter with compensating controls rather than waiting for a patch, and feed every exploited path back into a shorter kill chain.
 It pairs with the [Quantum Vulnerability Management](../core/quantum-vulnerability-management.md) framework and the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md). Read it when the threat landscape — not the maturity model — forces the question.
 ---
@@ -221,11 +221,65 @@ ADCS is Tier 0. It sits on whatever server it runs on, and that server should ha
 ---
-### Privileged Access Workstations — scope the conversation honestly
+### Admin workstations — the cloud VM is the deployable PAW
-PAWs are right in principle. In 2026, the practical conversation with most mid-market clients is: **dedicated devices for Tier 0 administration** (Global Admins and Domain Admins use a separate machine for those tasks, even if that machine is just a hardened Windows device or a VM they launch for admin work).
+Physical PAWs are right in principle and almost never get deployed. Hardware procurement, second device, behaviour change — the project does not survive contact with a real IT budget. Do not open the conversation with "you need a dedicated PAW laptop." Open it with the cloud admin VM.
-The minimum viable version: a dedicated Intune-enrolled, Entra-joined device with no email, no browser for general use, and a Conditional Access policy that restricts Global Admin and Domain Admin-equivalent activity to that device only. Not perfect PAW architecture but a massive improvement over "I use my laptop for everything."
+**The cloud admin VM:** a Windows 365 or Azure Virtual Desktop instance provisioned from a hardened template. The admin connects from their normal device via browser or RDP. Privileged credentials — including WireGuard keys for the management overlay — live in the cloud VM, not on the admin's local device. Compromise response: wipe it, reprovision from template in under 20 minutes.
 **Provisioning the cloud admin VM:**
 1. Create a Windows 365 or AVD instance from a hardened base image (CIS L2 baseline or equivalent)
 2. Enrol in Intune, apply a configuration profile: no internet browsing, no personal email, no Microsoft Store apps, screen lock on idle, BitLocker enforced
 3. Scope a CA policy restricting Global Admin and privileged role activation to this device (device compliance + named Intune group)
 4. Install the Nebula client (if deploying T0 overlay) and distribute the pre-signed node certificate
 5. Install the Tailscale client (if deploying T1 overlay) and enrol with the Entra OIDC identity
 **Minimum viable without the overlay:** a dedicated Intune-enrolled, Entra-joined cloud VM with no email and no general browsing, and a CA policy restricting GA activation to it. Not perfect, but it will actually get deployed and maintained.
 ---
 ### Management overlay — Nebula for T0, Tailscale for T1
 **When a client needs this:** SME and mid-market clients with multi-cloud resources, DevOps workloads, or remote admins — and no physical data centre with a proper management VLAN. The overlay builds the management plane that the physical network cannot provide.
 **When a client does not need this:** organisations with their own data centres and physical network infrastructure already in place. Traditional management VLAN segmentation plus jump boxes is the right answer there. Adding an overlay creates a new Tier 0 component without proportional benefit.
 **The T0 overlay — Nebula:**
 Nebula has no coordinator in the runtime path. Once certificates are distributed, the overlay runs with zero external dependencies. This is the right property for T0: a compromised or unavailable external service cannot affect access to your domain controllers.
 Deployment steps:
 1. Provision the Nebula CA on a dedicated air-gapped machine (a dedicated laptop that is never networked, or a cheap PC kept in a drawer)
 2. Generate and sign node certificates for each T0 node (DCs, sync server, ADCS, cloud admin VMs/PAWs)
 3. Distribute the signed certificates and the CA certificate to each node
 4. Configure the Nebula ACL policy: cloud admin VMs can reach DCs on port 3389 (RDP) and 5985/5986 (WinRM); nothing else. DCs do not reach each other through Nebula (they have their own replication channel)
 5. Start the Nebula service on each node. Test connectivity from the cloud admin VM to a DC
 6. Document the CA signing ceremony: who can sign new certs, what approval is needed, where the CA key is stored, how to revoke (distribute updated blocklist to all nodes)
 **Realistic T0 node count:** 15–25 nodes for a 5,000-person organisation. Certificate management is a documented ceremony run a few times a year, not an ongoing operational burden.
 **The T1 overlay — Tailscale:**
 Tailscale with Entra OIDC + key expiry gives you device trust (WireGuard node key) plus per-session identity assertion (Entra MFA on re-authentication). Configure key expiry to force re-authentication on a schedule aligned with the session risk tolerance (8–24 hours for admin access).
 Deployment steps:
 1. Create a Tailscale account or deploy Headscale (for sovereign requirements)
 2. Configure the OIDC integration with Entra ID. Set the MFA requirement to phishing-resistant (FIDO2) in the Entra Conditional Access policy that governs Tailscale authentication
 3. Set key expiry: 8–24 hours for admin nodes, 24–72 hours for standard nodes
 4. Define ACL policy: cloud admin VMs reach T1 servers on management ports only; standard user devices do not appear in the T1 ACL
 5. Enrol cloud admin VMs as nodes. Enrol T1 servers (member servers, cloud management hosts, K8s API server endpoints)
 6. Test: attempt to reach a T1 server from a non-enrolled device. Expected: no route. From an enrolled cloud admin VM: connected
 **What Tailscale carries for multi-cloud:** kubectl access to K8s clusters, SSH/RDP to member servers and cloud VMs, cloud CLI access where the management API is behind a private endpoint. It does not carry M365 admin traffic — that goes direct to Microsoft over the internet, gated by Conditional Access.
 **The Nebula CA — the one critical operation:**
 The CA key is the trust anchor for the entire T0 overlay. Its compromise means an attacker can enrol their own node and grant it access to every DC. Treat it accordingly:
 - Air-gapped machine, never networked after initial setup
 - CA key encrypted at rest on the machine and backed up separately
 - Certificate lifetime: 180 days maximum, so non-renewal handles most revocation cases
 - Revocation: generate and distribute an updated `blocklist.pem` to all nodes if a PAW is lost or an admin departs before cert expiry
 - At least two named people who know the ceremony and can perform it
 ---
@@ -0,0 +1,133 @@
 # Quantum Vulnerability Management
 > *"You do not have 40,000 critical vulnerabilities. You have ~400 that are real, ~40 that are on fire, and a process that cannot tell them apart. Quantum vulnerability management is the discipline of sizing remediation to the time you actually have — and of admitting that the unit of work was never the vulnerability. It was the path."*
 This is the operating framework behind [Book VII — Vulnerability Management](../books/06-vulnerability-management.md). Book VII is the philosophy; this is the model a consultant runs in an engagement. It pairs with the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md) (which sizes the quanta) and the [AI-Assisted TVM Blueprint](../playbooks/ai-assisted-tvm.md) (which automates the hours-lane).
 ---
 ## The problem in one paragraph
 Time-to-exploit has collapsed to roughly **4 hours** while median remediation sits at **43 days**; CVE volume has gone past **59,000/year** and the public enrichment data (NVD) is degrading; and as of the **2026 Verizon DBIR, vulnerability exploitation is the #1 initial-access vector, roughly twice phishing.** A human-paced, CVSS-sorted patch programme cannot close a gap that runs the wrong way by two orders of magnitude. The answer is not "patch faster." It is to **stop using the vulnerability list as the unit of work**, size remediation into time-budgeted quanta, contain the few that matter in hours, make the rest not matter through architecture, and feed every exploited path back into a shorter kill chain.
 ---
 ## What a quantum is
 A **quantum** is the smallest unit of remediation that:
 1. **Fully closes a specific exploitable path** — not a CVE in the abstract, a path an adversary could actually walk.
 2. **Is sized to a time budget it can actually be completed within** — hours, days, or a sprint.
 3. **Ends in a verifiable signal** — a test that proves the path is closed, not a ticket marked done.
 The word is chosen deliberately:
 - **Atomic.** You cannot ship half a quantum and claim half the protection. A patch on 80% of the fleet, or a rule applied but never verified to block, is a *ghost patch* — fully exploitable and now invisible. A quantum is all-or-nothing.
 - **Discrete.** Work is packetised into units that fit the time available, not smeared across an infinite backlog. An undifferentiated backlog has no front; quanta give it one.
 ---
 ## The sort key: time-to-existential-impact
 Quanta are ordered not by severity but by **time-to-existential-impact**, a function of three things the *environment* determines — not the CVE:
 > **time-to-existential-impact = f( kill-chain position, reachability, exploit availability )**
 | Factor | Question | Where it comes from |
 |--------|----------|---------------------|
 | **Kill-chain position** | Does this sit on a path to existential compromise? | [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md), BloodHound, the diagnostic |
 | **Reachability** | Can the adversary actually get to it (internet-facing, one hop from T0, behind segmentation)? | Network topology, external scan, [Perimeter Scanning](../playbooks/perimeter-scanning-capability.md) |
 | **Exploit availability** | Is there a working exploit in the wild now? | CISA KEV, exploit databases, threat intel |
 The same CVE has a different quantum on different assets, because position, not severity, sets the clock. **A 9.8 on a segmented, unreachable, non-privileged host is a sprint quantum. A 7.5 on an internet-facing box one hop from a domain controller is an hours quantum.** This is the Book I principle — kill-chain position changes the priority, not the score — made operational.
 ---
 ## The four quanta
 | Quantum | Time budget | What's in it | The response | Lane character |
 |---------|-------------|--------------|--------------|----------------|
 | **Critical** | **Hours** | On the kill chain, reachable, exploit available now | **Compensating control, not the patch** — sever reachability, edge-block, isolate, disable feature. Patch follows later. | Must be partly **autonomous**; human at policy boundary |
 | **Severe** | **Days** | Material risk; reachable with friction, or partial compensating cover | Batched, completed and verified inside one short change window | Human-run, tightly scheduled |
 | **Standard** | **Sprint** | The long, real, non-urgent tail | Drained in sprint-sized batches that can actually be finished; this is where patch velocity is the right tool | Routine engineering rhythm |
 | **Dark** | **Unsized** | Can't see the asset, can't establish reachability, can't determine exploitability | **Route to discovery** — turn an uncharacterised risk into a sized quantum | Discovery, not remediation |
 ### Why "compensating control, not the patch" for the critical quantum
 You cannot meet an hours budget with a vendor patch cycle, and often the patch does not exist yet. So the critical quantum's job is **not to fix the vulnerability — it is to move the asset out of the hours-window** by the cheapest fast control available: cut the reachability, block at the edge, isolate the host, disable the vulnerable feature, pull it behind the WAF. A 4-hour time-to-impact becomes a non-urgent one, and the actual patch drops into the standard lane on the normal change calendar. Reachability is almost always faster to change than a patch is to ship — which makes **reachability the fastest remediation you own.**
 ### Why the dark quantum is the most dangerous
 The old model ignores the dark quantum because it has no score. That is exactly backwards: an uncharacterised risk on an unknown asset is how estates die. A *known* severe is safer than an *unknown* nothing, because you can plan around the known one. The antifragile move is to spend judgement converting dark quanta into sized ones — which is why discovery (the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md), [zero-budget discovery](../playbooks/zero-budget-vulnerability-discovery.md), osquery) is part of vulnerability management, not separate from it.
 ---
 ## The barbell: contain fast or architect away — never the fragile middle
 ```
  CHEAP / FAST / REVERSIBLE                                 SLOW / STRUCTURAL / DURABLE
  Hours-lane compensating controls                          Segmentation, least privilege,
  (edge block, isolate, cut reachability)                   T0 protection, assume-breach
  ── wins the time race the patch can't ──                  ── makes ~90% of vulns not matter ──
            ◄──────────────  THE FRAGILE MIDDLE TO AVOID  ──────────────►
            The aging "critical patch backlog": carries hours-lane urgency,
            moves at sprint-lane speed. Max anxiety, min protection,
            and the attacker clears it for you one exploited host at a time.
 ```
 Both ends of the barbell are convex (small cost, large payoff — Pillar 5). The fragile middle is concave (maximum cost, minimum return). The rule: **contain it fast, or architect it away. Never let it age in the middle.**
 ---
 ## The ~90% subtraction — via negativa applied to the list
 The single highest-leverage move, and it is pure subtraction. Industry data suggests **roughly 90% of "critical" vulnerabilities are not exploitable in a given environment** once compensating controls, reachability, and segmentation are mapped. So before adding any work:
 1. Map, per asset: internet reachability, EDR coverage, WAF rules, segmentation distance from T0.
 2. Delete the false urgency on everything segmented, unreachable, or already neutralised.
 3. What remains — the genuinely reachable, genuinely exploitable ~10% — is the only thing the hours- and days-lanes ever touch.
 This turns "40,000 criticals" into a few hundred real findings and a few dozen on fire. The compensating-control map that makes it possible is **the single most valuable artefact in the programme** — build it before the incident, because during a zero-day it answers "are we actually exposed?" in minutes instead of days. The caveat (Book I): a mapped control that has rotted into a ghost is a false negative. **Test the controls you are counting on; do not trust the map.**
 ---
 ## The feedback loop — the antifragile difference
 A vulnerability that was exploited or nearly exploited is the cheapest penetration test you will ever get. Patching the CVE wastes the data. The antifragile move is to **sever the path** the attacker used — boundary the flat segment, collapse the over-privileged service account, pull the reachable management interface behind the bastion — so the *next* vulnerability that lands there is a non-event before it is even disclosed.
 **The metric is not MTTR. It is: did the kill chain get shorter?** Ten incidents that produce ten patches and zero severed paths mean you are merely fast. Ten incidents that produce six structurally shortened kill chains mean the estate is getting harder to compromise every time it is tested — the only honest definition of antifragile.
 ---
 ## Running it in an engagement — the sequence
 1. **Discover** — run the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md) to map assets, reachability, and the shortest existential path. Anything you cannot characterise is a dark quantum; route it to deeper discovery.
 2. **Subtract** — apply the ~90% reduction using the compensating-control and reachability map. Delete false urgency.
 3. **Size** — place every remaining real finding into a quantum (critical / severe / standard) by time-to-existential-impact.
 4. **Contain the hours-lane** — apply compensating controls to the critical quantum *today*, autonomously where guardrails allow ([AI-Assisted TVM](../playbooks/ai-assisted-tvm.md)). Verify each closes with a signal.
 5. **Batch the rest** — days-lane in the next change window, sprint-lane in the engineering rhythm.
 6. **Architect away the middle** — feed the recurring paths into segmentation and least-privilege work (Books II–V) so the same class of vulnerability stops mattering.
 7. **Close the loop** — after every exploited-or-near finding, ask what path got shorter, and track that number over time.
 ---
 ## What to measure
 | Metric | Why it matters | Antifragile target |
 |--------|----------------|--------------------|
 | Critical-quantum containment time | The hours-lane is the race you must not lose | Hours, trending down |
 | % of "criticals" confirmed reachable | Proves the ~90% subtraction is real, not assumed | Known, not "unknown" |
 | Ghost-patch rate (closed-but-unverified) | Half-done remediation is hidden full exposure | Zero — every quantum closes with a signal |
 | Dark-quantum count | Uncharacterised risk is the dangerous kind | Shrinking; each one converted to sized |
 | **Kill-chain length after incidents** | The only measure of getting *stronger* | Shorter after each exploited-or-near event |
 | Items aging in the fragile middle | The concave zone the barbell forbids | Zero — contained or architected, never aging |
 ---
 ## Honest uncertainty
 The headline statistics (the 4-hour, 43-day, ~59,000-CVE, ~90%-not-exploitable, and "#1, ~2× phishing" figures) are point-in-time and churn annually — re-check them against the current DBIR, M-Trends, and FIRST/CVE data before putting them on a slide. The *direction* is the stable signal; the numbers move. The autonomous-execution tooling for the hours-lane is real but immature and fast-moving — verify current capability and failure modes, and start with reversible compensating controls, never irreversible change. What does not churn: kill-chain position beats CVSS, most criticals aren't reachable, a half-done remediation is a hidden full vulnerability, and every exploited path should shorten the chain.
 ---
 *See [Book VII — Vulnerability Management](../books/06-vulnerability-management.md) for the full philosophy, [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md) for sizing the quanta in unknown territory, and [AI-Assisted TVM Blueprint](../playbooks/ai-assisted-tvm.md) for automating the hours-lane.*
@@ -42,6 +42,7 @@ Operational and persuasion documents used in engagements. **Start every new clie
 | [Antifragile Manifest](core/antifragile-manifest.md) | Five pillars of antifragile enterprise | Executives, Architects, Consultants |
 | [AI Sovereignty Framework](core/ai-sovereignty-framework.md) | Strategic arguments and implementation for local AI | CISOs, CTOs, Security Architects |
 | [T0 Asset Framework](core/t0-asset-framework.md) | Tier 0 classification and protection for critical assets | Security Architects, Infrastructure Leads |
 | [Quantum Vulnerability Management](core/quantum-vulnerability-management.md) | Sizing remediation into time-budgeted quanta (hours/days/sprint/dark) for the exploitation-first era; companion to Book VII | CISOs, Vulnerability Management, Consultants |
 | [Spontaneous Order Principles](core/spontaneous-order-principles.md) | Philosophical foundation for the five pillars | Executives, Architects, Strategists |
 ## Playbooks
@@ -51,6 +52,7 @@ Operational and persuasion documents used in engagements. **Start every new clie
 | [Rapid Modernisation Plan](playbooks/rapid-modernisation-plan.md) | 30-60-90-180 day transformation roadmap | Program Managers, Consultants, CISOs |
 | [Endpoint Management Entry Vector](playbooks/endpoint-management-entry-vector.md) | Intune/device management as the ideal engagement entry point | M365 Consultants, Account Managers |
 | [AI-Assisted TVM Blueprint](playbooks/ai-assisted-tvm.md) | AI-powered vulnerability management for AI-powered adversaries | CTOs, CISOs, Vulnerability Management |
 | [Kill Chain Assessment App](playbooks/kill-chain-assessment-app.md) | Spec for the offline tool that maps unknown estates into an attack graph, computes the shortest existential path, and sizes quanta. Tool: [`tools/kill-chain-assessment.html`](tools/kill-chain-assessment.html) | Consultants, Assessors, Security Architects |
 | [Zero-Budget Vulnerability Discovery](playbooks/zero-budget-vulnerability-discovery.md) | Script-based and osquery-based server/container vuln discovery without Tenable/Qualys | Security Engineers, Consultants |
 | [Perimeter Scanning Capability](playbooks/perimeter-scanning-capability.md) | External attack surface strategy: build, partner, or hybrid | Security Architects, Consultants |
 | [Osquery: The Sovereign Discovery Platform](playbooks/osquery-custom-platform.md) | Build a custom vulnerability and asset inventory platform on osquery | Security Engineers, Consultants, CTOs |
@@ -0,0 +1,292 @@
 # Assignment: Conditional Access Architecture
 > *CA policies are enforcement points, not audit tools. A policy in report-only mode is a sensor. A policy in enabled mode is a wall. Know which you're building before you start.*
 This is a **scoped assignment package** — a complete, principled delivery guide for one specific client brief. It can be delivered standalone or immediately after [Assignment: Identity Baseline](assignment-identity-baseline.md). If identity baseline has not been completed, the prerequisites section below applies first.
 ---
 ## The Brief
 Client requests that fall within this scope:
 - *"Review our Conditional Access policies — we're not sure they're right"*
 - *"We need to enforce MFA properly, not just per-user MFA"*
 - *"Our auditor wants evidence of access controls"*
 - *"We got a new employee and nobody knows how access actually works"*
 - *"We bought E5 and want to use the CA features"*
 - *"We need compliant devices to be required for access"* (if Intune baseline is already deployed)
 This assignment does not require executive sponsorship. It requires one named IT lead with Global Administrator access, tolerance for a 72-hour report-only period per policy before enforcement, and awareness that policy changes affect all users.
 ---
 ## Scope Boundary
 **In scope:**
 - Audit of all existing CA policies (coverage, gaps, naming, exclusions, mode)
 - Design and documentation of a complete CA policy set
 - Staged deployment of the baseline policy set (identity-level controls)
 - Device compliance integration if Intune compliance policies are already active
 - Named locations configuration
 - Authentication strengths configuration (phishing-resistant MFA for admins)
 **Out of scope:**
 - Intune compliance policy configuration → [Assignment: Intune Security Baseline](assignment-intune-security-baseline.md)
 - Microsoft Defender for Cloud Apps session controls (app-enforced restrictions are in scope; MDCA-dependent session policies are not)
 - Privileged Identity Management configuration → privileged access engagement
 - Identity Baseline (MFA registration, legacy auth, admin account hygiene) → [Assignment: Identity Baseline](assignment-identity-baseline.md)
 **Dependency:** This assignment can configure device compliance as a CA signal, but only if Intune compliance policies are already active and returning compliance state for enrolled devices. If Intune is not deployed, the device-compliance policies in this assignment are designed in report-only mode and left for activation when Intune is ready. Do not activate device-compliance CA policies against an environment where device enrollment is incomplete — the result is a broad lockout.
 ---
 ## Before You Touch Anything
 **1. Break-glass confirmation.**
 Before touching any CA policy, confirm that two cloud-only break-glass Global Admin accounts exist and are excluded from all CA policies. If they do not exist, create them and configure sign-in alerts before proceeding. See [Assignment: Identity Baseline](assignment-identity-baseline.md) for the break-glass standard. This step is non-negotiable — a misconfigured CA policy with no break-glass is a full tenant lockout.
 **2. CAExporter baseline.**
 Export all existing CA policies using [CAExporter](https://github.com/merill/caexporter). Store the JSON export as the before-state. Every change is measurable against it. This is also the rollback reference.
 **3. Per-user MFA audit.**
 Run the per-user MFA state report (Entra admin center → Users → Per-user MFA). If per-user MFA is enabled for any accounts, document it. Per-user MFA and CA-enforced MFA operate on separate control planes and interact unpredictably: a user with per-user MFA *enforced* may bypass some CA policies. Resolution is part of Step 3 below.
 **4. Sign-in log baseline.**
 Export 30 days of sign-in logs. Note the distribution of authentication methods in use, client application types (modern vs. legacy), and any conditional access results (success, failure, report-only). This is the baseline against which policy impact is measured.
 ---
 ## Principles Applied
 **Automation over procedure.**
 A CA policy enforces MFA whether or not anyone remembers to ask for it. A checklist does not. Every identity control in this assignment is implemented as a CA policy — self-enforcing, continuous, requiring no human decision to operate after deployment.
 **Kill chain first.**
 The policy set in this assignment is sequenced by structural impact. Legacy auth block and universal MFA enforcement come first because they close the widest attack path. Device compliance, location controls, and session policies come after. If the engagement ends early, the first two policies are the ones that matter.
 **Explicit design, documented intent.**
 Every policy deployed in this assignment has a documented name, purpose, conditions, grant controls, exclusions, and the date it was set to enabled. A CA policy with no documented intent is a liability: nobody can safely modify it, nobody knows if it can be removed, and future administrators work around it rather than through it. The leave-behind package for this assignment is the policy design document — not just the JSON export.
 **Report-only before enforcement.**
 Every new policy goes to report-only mode for a minimum of 48–72 hours. Sign-in logs are reviewed during that window to confirm expected behavior before enforcement. This is not optional. The cost of a production lockout — even for 30 minutes — is higher than the cost of 72 hours' delay.
 ---
 ## Delivery Architecture
 ### Step 1 — Audit (no changes)
 Document the current state honestly. The finding is not a criticism of the IT team — it is the starting point.
 | Action | Output |
 |--------|--------|
 | CAExporter export | CA policy baseline JSON and human-readable summary |
 | Per-user MFA state export | Accounts with per-user MFA enforced vs. disabled vs. not configured |
 | Policy coverage matrix | Every policy: name, state (enabled/report-only/disabled), conditions, grant, exclusions, last modified, named owner |
 | Gap analysis | Conditions with no coverage; duplicate coverage; exclusion lists with individual accounts |
 | Sign-in log review | Authentication methods in use; legacy auth clients; CA policy results |
 | Named locations inventory | Trusted IPs and named locations configured, if any |
 Deliver the audit findings to the named client lead before writing any policies. The coverage matrix should be readable without technical background — each row is one policy, each column answers one question. Include a plain-language summary: "You have 14 policies. Three are disabled and appear forgotten. Two overlap in ways that may create gaps. Five have no named owner and no documented purpose. Legacy authentication is not blocked at the CA level."
 ---
 ### Step 2 — Design
 Before deploying anything, produce the complete policy set design on paper (or in a document). Every policy defined, every exclusion justified, every interaction between policies mapped. Review with the named client lead before deployment begins.
 The policy set is designed in three layers. Deploy them in order.
 **Layer 1 — Identity controls (no device dependency)**
 These work immediately, without Intune or any device management. Deploy first.
 **Layer 2 — Admin controls (elevated requirements for privileged roles)**
 Stricter controls applied specifically to accounts holding privileged roles. Deploy after Layer 1 is stable.
 **Layer 3 — Device and session controls (Intune dependency)**
 Require device compliance as a CA signal. Deploy only when Intune compliance policies are active and returning results. Design these policies now; activate them when the Intune assignment is complete.
 ---
 ### Step 3 — Deploy Layer 1 (staged)
 Each policy follows the same deployment sequence:
 1. Create policy in **report-only** mode
 2. Wait 48–72 hours; review sign-in logs for the policy's report-only results
 3. Identify any legitimate traffic that would be blocked; create exclusion groups or refine conditions
 4. Switch to **enabled**
 5. Monitor sign-in logs for 24 hours
 6. Only then move to the next policy
 Do not deploy multiple policies simultaneously. Each policy change has independent blast radius; sequential deployment makes causality clear when something breaks.
 **Legacy authentication block first.** This is the one control that cannot afford to be partially deployed. If legacy auth is blocked via CA but not via Entra authentication policies, a policy gap in CA can allow legacy auth through. Confirm after deployment that the sign-in log shows zero legacy auth sign-ins. Zero is the only acceptable result.
 **Per-user MFA resolution.** After CA-enforced MFA is active for all users, disable per-user MFA for all accounts except break-glass. Leaving both active creates a split control plane. The CA policy is the authoritative control; per-user MFA is the legacy mechanism. They should not coexist once CA is stable.
 ---
 ## The Baseline Policy Set
 This is the policy set to deploy on every engagement. Adapt scope and exclusions to the client's environment; do not adapt the design principles.
 **Naming convention:**
 `CA-[Audience]-[Condition or Trigger]-[Grant or Block]`
 Examples: `CA-AllUsers-LegacyAuth-Block`, `CA-Admins-AllApps-RequirePhishingResistantMFA`
 Consistent naming is not aesthetic preference — it is the difference between a policy set that can be maintained and one that accumulates technical debt.
 **Exclusion groups:**
 All exclusions use Entra ID security groups, never individual accounts (except break-glass, which is excluded by account). Group membership is reviewed as part of the leave-behind. A group named `CA-Exclusion-BreakGlass` is named and owned; an individual account exclusion is invisible in aggregate policy review.
 ---
 ### Layer 1 — Identity Controls
 | Policy | Conditions | Grant / Block | Notes |
 |--------|-----------|---------------|-------|
 | `CA-AllUsers-LegacyAuth-Block` | All users / All cloud apps / Legacy auth clients (Exchange ActiveSync + Other clients) | Block | Deploy first. Confirm zero legacy auth in sign-in logs post-enforce. |
 | `CA-AllUsers-AllApps-RequireMFA` | All users / All cloud apps / All platforms / Exclude break-glass group | Require MFA | Core enforcement. Deploy second. Resolve per-user MFA conflict after this is stable. |
 | `CA-GuestUsers-AllApps-RequireMFA` | Guest and external users / All cloud apps | Require MFA | Separate policy: guests often require different exclusion handling. |
 **E3 stops here for identity-layer controls.** Risk-based policies (sign-in risk, user risk) require Entra ID P2. If the client has P2 licensing, add:
 | Policy | Conditions | Grant / Block | Notes |
 |--------|-----------|---------------|-------|
 | `CA-AllUsers-HighUserRisk-RequirePasswordChange` | All users / High user risk | Require MFA + password change | P2 required. Requires Identity Protection enabled. |
 | `CA-AllUsers-MedHighSignInRisk-RequireMFA` | All users / Medium and High sign-in risk | Require MFA | P2 required. Step-up for risky sign-ins. |
 ---
 ### Layer 2 — Admin Controls
 | Policy | Conditions | Grant / Block | Notes |
 |--------|-----------|---------------|-------|
 | `CA-Admins-AllApps-RequirePhishingResistantMFA` | Directory roles (Global Admin, Privileged Role Admin, Security Admin, Exchange Admin, SharePoint Admin, User Admin, Conditional Access Admin, Application Admin) / All cloud apps | Require authentication strength: Phishing-resistant MFA | Phishing-resistant = FIDO2 security key, Windows Hello for Business, or certificate-based auth. Requires auth strength configured in Entra. Standard Authenticator push is not phishing-resistant. |
 | `CA-Admins-AllApps-RequireCompliantOrHybridDevice` | Same role scope / All cloud apps | Require compliant device OR hybrid Azure AD joined | Layer 3 control applied early to admins specifically. Activate this even before broad device compliance enforcement if Intune covers admin workstations. |
 **Why admins get a separate, stricter policy set:** Admin credentials are the highest-value target in the tenant. An attacker who can bypass MFA on an admin account owns the tenant. Standard Authenticator push MFA is bypassed by MFA fatigue attacks (request flooding until the user approves). Phishing-resistant MFA is not. The separation in the policy set makes it explicit that admin accounts have a different requirement — and makes it auditable.
 ---
 ### Layer 3 — Device Controls (activate when Intune is ready)
 Design these policies now. Activate them after [Assignment: Intune Security Baseline](assignment-intune-security-baseline.md) is complete and device compliance results are stable.
 | Policy | Conditions | Grant / Block | Notes |
 |--------|-----------|---------------|-------|
 | `CA-AllUsers-AllApps-RequireCompliantDevice` | All users / All cloud apps / All platforms | Require compliant device OR require MFA | Start with OR (compliant device OR MFA) — gives unmanaged-device users a path via MFA. Once enrollment is high enough, switch to AND or compliant-only. |
 | `CA-AllUsers-SensitiveApps-RequireCompliantDevice` | All users / Exchange Online + SharePoint Online / All platforms | Require compliant device | Strict. Apply to sensitive apps first before all apps. |
 | `CA-AllUsers-UnmanagedDevice-AppEnforcedRestrictions` | All users / Exchange Online + SharePoint Online / Any platform / Filter: not compliant, not hybrid-joined | Session: app-enforced restrictions (use limited web access) | Limits download and sync on unmanaged devices accessing mail and documents. Requires Exchange Online and SharePoint to be configured for app-enforced restrictions. E3-compatible. |
 The `CA-AllUsers-UnmanagedDevice-AppEnforcedRestrictions` policy is the most immediately valuable Layer 3 control for E3 clients without full Intune enrollment — it degrades access rather than blocks it, which is easier to deploy without user disruption.
 ---
 ### Named Locations (supporting the policy set)
 Configure named locations before deploying any location-based policies.
 | Location | Purpose |
 |----------|---------|
 | **Trusted corporate networks** | Office IP ranges. Used to relax MFA requirements on trusted networks if the client explicitly requests it. Default recommendation: do not relax MFA on any network — trusted location is less durable than device compliance. |
 | **High-risk countries** (optional) | Countries from which the client has no operations and no expected sign-ins. Can be used to block access or require MFA as a step-up. Use carefully: VPN exit nodes and mobile roaming will trigger this. Document the decision. |
 Named locations are often requested but rarely worth the operational overhead unless the client has a specific use case (blocking sign-ins from a defined list of countries, or relaxing physical office controls). Include in the design document; deploy only if the client has a clear requirement.
 ---
 ## Structural Resilience Checklist
 Controls that hold without ongoing human willingness after this engagement closes.
 - [ ] `CA-AllUsers-LegacyAuth-Block` is **enabled** — not report-only — and sign-in logs confirm zero legacy auth clients
 - [ ] `CA-AllUsers-AllApps-RequireMFA` is **enabled** and covers all users including guests (separate guest policy)
 - [ ] `CA-Admins-AllApps-RequirePhishingResistantMFA` is **enabled** and authentication strength is configured
 - [ ] Per-user MFA has been disabled for all accounts after CA-enforced MFA is stable (except break-glass)
 - [ ] All exclusions use named Entra ID groups — no individual account exclusions except break-glass
 - [ ] Every policy has a documented name, intent, owner, and date of last review
 - [ ] CAExporter export (before and after) stored in client documentation
 - [ ] Layer 3 policies exist in **report-only** mode, ready for activation when Intune is complete
 ---
 ## Kill Chain Contribution
 **What this assignment closes:**
 | Attack vector | Control deployed |
 |---------------|-----------------|
 | Password spray with no MFA prompt | `CA-AllUsers-AllApps-RequireMFA` |
 | MFA fatigue attack against admin accounts (push flooding) | `CA-Admins-AllApps-RequirePhishingResistantMFA` |
 | Legacy protocol abuse (SMTP AUTH, IMAP, Basic Auth REST) | `CA-AllUsers-LegacyAuth-Block` |
 | Credential stuffing from breached credential lists | MFA enforcement |
 | Guest account lateral movement through weakly controlled external access | `CA-GuestUsers-AllApps-RequireMFA` |
 | Unmanaged device access to sensitive apps (if Layer 3 activated) | `CA-AllUsers-UnmanagedDevice-AppEnforcedRestrictions` |
 **What this assignment does not close:**
 | Remaining gap | Addressed by |
 |---------------|-------------|
 | Adversary-in-the-middle / session token theft post-MFA | Device compliance in CA + Entra token protection (P2) |
 | Unmanaged device as unrestricted access vector | [Assignment: Intune Security Baseline](assignment-intune-security-baseline.md) + Layer 3 activation |
 | Standing admin privilege (long-lived sessions, no JIT) | Privileged access engagement (PIM) |
 | Sign-in risk and impossible travel detection | Entra ID P2 Layer 1 additions |
 | App permission abuse (OAuth consent phishing) | Service identity engagement |
 The residual gap the client is most likely to feel: a stolen session token (from phishing with AiTM proxy) bypasses MFA because it captures the token after MFA completes. This is the next-generation phishing technique. Mitigating it requires token binding to device compliance — a Layer 3 control — plus Entra token protection (P2 feature). Document this in the residual risk statement.
 ---
 ## Leave-Behind Package
 | Artifact | Description |
 |----------|-------------|
 | **CAExporter JSON (before)** | CA policy state at engagement start |
 | **CAExporter JSON (after)** | CA policy state at engagement close |
 | **Policy design document** | Every deployed policy: name, intent, conditions, grant/block, exclusion groups, owner, date enabled |
 | **Policy coverage matrix** | Human-readable: which users are covered by which policies, which apps, which platforms |
 | **Per-user MFA resolution record** | Confirmation that per-user MFA has been disabled post-CA deployment |
 | **Layer 3 design document** | Device compliance policies designed but not yet activated; activation prerequisites and checklist |
 | **Exclusion group inventory** | Every CA exclusion group: name, members, review cadence |
 | **Sign-in log confirmation** | Legacy auth: zero clients post-block. MFA: applied to >99% of sign-ins. |
 | **Named locations documentation** | Any configured named locations with business justification |
 | **Scope boundary log** | Every finding outside this scope, named and prioritized |
 | **Residual risk statement** | What this assignment did not close, specifically including AiTM/token theft risk |
 The Layer 3 design document is the explicit handoff to the Intune assignment. A CISO reading the leave-behind package can see exactly what was built, why, what it prevents, and what comes next — without needing to ask.
 ---
 ## Scope Boundary Signals
 | Signal | Points toward |
 |--------|--------------|
 | No Intune enrollment or compliance policies active | Intune Security Baseline assignment — activate Layer 3 after |
 | Global Admins have no phishing-resistant MFA method registered | Auth method enrollment drive; may need hardware key procurement |
 | Entra ID P2 not licensed; client has credential-stuffing exposure | Licensing recommendation: P2 for Identity Protection (cheaper than full E5) |
 | App registrations with broad Graph permissions visible in sign-in logs | Service identity engagement |
 | Service accounts authenticating with CA policies applied | Service account remediation — service accounts should use managed identities or workload identity federation, not user-like credential flows through CA |
 | Defender for Cloud Apps not licensed; session control requests needed | MDCA engagement for full session control |
 | Sign-in logs show access from unexpected geographies | Named location policy review; may warrant country block |
 | Audit log retention < 90 days | Detection baseline assignment |
 ---
 ## Buildable-On: What the Next Assignment Depends On
 The Intune Security Baseline assignment builds directly on the CA architecture deployed here. Specifically, it depends on:
 1. **`CA-AllUsers-AllApps-RequireCompliantDevice` exists in report-only mode.** The Intune assignment activates this policy as its final step — the point where device compliance becomes an access control, not just a reporting tool.
 2. **CA exclusion groups are using the right naming convention.** Device compliance policies deployed in Intune reference the same user groups used in CA. Consistent group naming prevents the Intune assignment from having to clean up CA policy exclusions mid-deployment.
 3. **Sign-in logs show MFA is enforced.** The Intune assignment cannot safely activate device-compliance CA policies if MFA enforcement is incomplete — an unmanaged device could otherwise use the compliance check as a bypass path.
 If all three conditions are true at handover, the Intune assignment can activate Layer 3 without revisiting the CA work. If any condition is false, the scope boundary log documents what needs to be resolved first.
 ---
 *For the identity foundation this builds on, see [Assignment: Identity Baseline](assignment-identity-baseline.md).*
 *For the device compliance integration that activates Layer 3, see [Assignment: Intune Security Baseline](assignment-intune-security-baseline.md).*
 *For the technical depth on privileged access architecture that informs admin CA requirements, see [Book III — Privileged Access](../books/02-privileged-access.md).*
@@ -0,0 +1,443 @@
 # Assignment: Collaboration and Data Security
 > *Data is liquid. It leaves where you put it — copied, shared, forwarded, synced, linked. The question is never "is it locked down" but "where can it flow, who can reshare it, and can you see and reverse the flow?"*
 This is a **scoped assignment package** and the fourth in the M365 security sequence. It addresses the data and collaboration layer: how corporate data moves, where it leaks, and what structural controls reduce the blast radius when it does. It can be delivered standalone, but the device and identity controls from the preceding assignments are assumed in the residual risk analysis.
 This assignment completes the **"Secure M365"** engagement when delivered after Identity Baseline, CA Architecture, and Intune Security Baseline.
 ---
 ## The Brief
 Client requests that fall within this scope:
 - *"Secure our M365 / harden our Exchange and SharePoint"*
 - *"We're worried about data leaking through email or shared links"*
 - *"We got a phishing email and want to prevent it"*
 - *"Our auditor wants to see DLP controls"*
 - *"We need email authentication — DMARC / DKIM / SPF"*
 - *"We need to know what's being shared externally"*
 - *"Set up sensitivity labels"*
 This assignment does not require executive sponsorship. It requires one named IT lead with Global Administrator and Exchange Administrator access, tolerance for discovering that external sharing is significantly wider than assumed, and willingness to remove sharing types that users may push back on.
 ---
 ## Scope Boundary
 **In scope:**
 - External sharing exposure mapping ("Anyone" links, external guests, external shares)
 - Removal of anonymous sharing and external auto-forwarding
 - Exchange Online Protection (EOP) hardening: anti-phishing, anti-malware, anti-spam
 - Email authentication: SPF verification, DKIM enablement, DMARC deployment
 - SharePoint and OneDrive tenant-level sharing governance
 - Guest access governance: expiration, review cadence
 - Sensitivity label taxonomy and deployment (foundation: 3–4 labels)
 - DLP baseline: 3–5 known high-value patterns for Exchange, SharePoint, OneDrive
 - Audit logging verification and configuration
 - App consent governance: restrict user consent, enable admin consent workflow
 **Out of scope:**
 - Comprehensive data classification programme → separate Purview engagement
 - Defender for Office 365 P1/P2 advanced configuration (Safe Links, Safe Attachments, Attack Simulation) → E5 or add-on engagement
 - Microsoft Defender for Cloud Apps session controls → MDCA engagement
 - Retention policies and data lifecycle governance → separate Purview engagement
 - On-premises Exchange decommissioning → separate hybrid engagement
 - Cross-tenant access configuration (B2B direct connect) → out of scope unless specifically requested
 - Entitlement management and full guest lifecycle (P2 feature) → out of scope for E3
 When the client asks for comprehensive DLP — covering all data types across all services — scope it as a separate engagement. A DLP programme that attempts to cover everything produces alert fatigue that degrades the protection for the things that actually matter.
 ---
 ## Before You Touch Anything
 **1. Crown jewels question.**
 Before configuring any control, ask the named client lead one question: *"Which three data sets, if leaked, would cause the most harm to the organisation — regulatory, competitive, or reputational?"*
 If they cannot answer, that inability is finding #1. You cannot apply protection asymmetrically until you know what the asymmetry is for. Sensitivity labels, DLP policies, and restricted-site configurations all depend on this answer. If the organisation genuinely cannot identify its crown jewels, document it and apply the default framework (financial data, HR data, and strategic/M&A communications) as a starting point.
 **2. Surface map.**
 Before making any changes, enumerate the actual external exposure. The findings are almost always worse than the client assumes — and the enumeration itself, shared with the client lead, is often the moment that creates willingness for the removal steps that follow.
 Run these reports before touching configuration:
 | Report | Tool / Location |
 |--------|----------------|
 | "Anyone" (anonymous) links | SharePoint admin center → Reports → Sharing → or Graph API |
 | External shares (authenticated guest links) | SharePoint admin center → Sharing report |
 | Guest users with last sign-in date | Entra ID → External Identities → All users (filter: Guest) |
 | External auto-forwarding rules | Exchange admin center → Mail flow → Rules; or PowerShell: `Get-TransportRule` filtered for external redirect |
 | User-consented OAuth app grants | Entra ID → Enterprise applications → filter: User consent |
 | SPF, DKIM, DMARC status | MXToolbox or PowerShell DNS lookup per domain |
 | Unified Audit Log status | Compliance portal → Audit → or `Get-AdminAuditLogConfig` |
 Deliver the surface map to the named client lead before proceeding to any removal steps. State the findings plainly: "You have 847 anonymous sharing links. Fourteen mailboxes have active external forwarding rules. You have 312 guest accounts, 189 of whom have not signed in within 90 days. DMARC is not configured. Your Unified Audit Log has not been enabled."
 These are facts, not accusations. The client lead needs to see the actual exposure before approving the removal steps.
 ---
 ## Principles Applied
 **Remove first, then govern.**
 The highest-impact actions in this assignment are removals: anonymous links, external auto-forwarding, over-permissioned OAuth grants. These are not governance gaps — they are open doors. No amount of sensitivity labelling or DLP configuration compensates for an anonymous sharing link that routes around every identity control built in the preceding three assignments. Subtraction comes first.
 **Name the crown jewels before you protect them.**
 Even-spreading protection across all data is the concave failure: enormous maintenance cost, false positive noise that trains users to click through warnings, and the real exfiltration lost in the background. Sensitivity labels and DLP policies are applied to the crown jewels and known high-value patterns — not to everything. Three well-targeted DLP policies that fire reliably are worth more than thirty policies that nobody trusts.
 **Visibility before governance.**
 The surface map is the most valuable deliverable in this assignment. An organisation that has never seen its "Anyone" link count, its guest list with last sign-in dates, or its auto-forward rule inventory cannot govern what it has. The surface map creates visibility; governance follows from it.
 **Protection must travel with the data.**
 A sensitivity label with encryption is the only control that survives data leaving the tenant. Container controls — SharePoint permissions, CA policies, device compliance — stop working the moment the file is downloaded and forwarded. For the crown jewels, the protection must be bound to the file itself. Everything else is a gate on the way out, not a lock on the data.
 ---
 ## Delivery Architecture
 ### Step 1 — Surface Map (no changes)
 *Described above in "Before You Touch Anything." Complete and deliver before proceeding.*
 The surface map has a second purpose beyond informing the work: it is the before-state that makes the leave-behind measurable. "You had 847 anonymous links; you now have 0" is a concrete, auditable risk-reduction statement.
 ---
 ### Step 2 — Remove the Dangerous Paths
 These actions have the highest impact per unit of effort in the entire assignment. They should be completed before any additive control is deployed.
 **Kill anonymous "Anyone" links.**
 Set the tenant-level sharing policy to prohibit new "Anyone" links:
 - SharePoint admin center → Policies → Sharing
 - External sharing: set to **New and existing guests** (requires authentication) — not "Anyone"
 - This stops new anonymous links from being created. It does not revoke existing links.
 Existing anonymous links must be revoked separately. Use the SharePoint Sharing Report or a Graph API query to enumerate them, then decide with the client lead: bulk revoke all, or review and selectively revoke. Bulk revoke is correct for any link created more than 90 days ago with no documented business justification. Document the decision and the revocation count.
 **Block external auto-forwarding.**
 External auto-forwarding rules are the most reliable mailbox-compromise exfiltration technique. They should not exist.
 - Exchange admin center → Mail flow → Remote domains → Default domain → Uncheck "Allow automatic forwarding"
 - Or via the outbound anti-spam policy: set automatic forwarding to **Off**
 - After disabling, audit existing rules: `Get-TransportRule | Where-Object { $_.RedirectMessageTo -like "*@*" }` and `Get-Mailbox -ResultSize Unlimited | Get-InboxRule | Where-Object { $_.ForwardTo -or $_.RedirectTo -like "*@*" }`
 Any active external forwarding rule found during the audit is a potential incident indicator. Treat each one as suspicious until confirmed legitimate by the mailbox owner and the named client lead. Document the outcome for each.
 **Restrict user OAuth consent.**
 Users should not be able to grant arbitrary third-party applications access to tenant data.
 - Entra ID → Enterprise applications → Consent and permissions → User consent settings
 - Set to: **Allow user consent for apps from verified publishers, for selected permissions (classified as low impact)** — or **Do not allow user consent** (more restrictive; requires admin approval workflow to compensate)
 - Enable the **Admin consent workflow**: users can submit a request; named admins receive and review it
 Review existing user-consented grants. Flag any app with permissions in these categories:
 - `Mail.Read`, `Mail.ReadWrite`, `Mail.Send` — reads or sends all mail
 - `Files.ReadWrite.All`, `Sites.Read.All` — accesses all files and sites
 - `User.Read.All`, `Directory.Read.All` — reads full directory
 High-permission user-consented grants should be reviewed with the named client lead and revoked where the app is not recognised, not actively used, or not from a verified publisher. Revoke through Entra ID → Enterprise applications → [App] → Permissions → Revoke user consent.
 ---
 ### Step 3 — Exchange Online Protection Baseline
 EOP is included in E3 and M365 Business Premium. It handles anti-phishing, anti-malware, and anti-spam for Exchange Online. Default EOP configuration is functional but not optimal.
 **Email authentication (SPF, DKIM, DMARC):**
 | Protocol | What it does | Configuration |
 |----------|-------------|---------------|
 | **SPF** | Declares which servers may send email as your domain | DNS TXT record — verify it exists and is not over-broad (`+all` invalidates it) |
 | **DKIM** | Cryptographically signs outbound email | Enable in Exchange admin center → Email authentication → DKIM → Enable for each domain. Key rotation is handled automatically. |
 | **DMARC** | Specifies how receiving servers handle SPF/DKIM failures | DNS TXT record. Deploy in stages: `p=none` (monitoring) → verify no legitimate mail fails → `p=quarantine` → eventually `p=reject`. Minimum target for this assignment: `p=quarantine` after 30-day monitoring period shows no legitimate mail failing. |
 Without DMARC, your domain can be spoofed in inbound email to your users and in outbound email to others. SPF and DKIM without DMARC do not enforce — DMARC is the enforcement record.
 **Anti-phishing policy (EOP):**
 - Exchange admin center → Policies & rules → Threat policies → Anti-phishing
 - Enable impersonation protection for: the organisation's own domain(s), key users (CEO, CFO, board members, finance team)
 - Enable mailbox intelligence (learning sender patterns)
 - Set action for impersonation detections: **Quarantine** (not move to Junk — quarantine is reviewed; Junk is ignored)
 If the client has Defender for Office 365 P1 (included in M365 Business Premium or as an add-on): enable Safe Links and Safe Attachments. These are materially more effective than EOP baseline anti-phishing. Note the gap if E3 without the add-on.
 **Anti-malware policy:**
 - Threat policies → Anti-malware
 - Enable common attachment filter: block executable file types (.exe, .vbs, .js, .ps1, .bat, .cmd and others)
 - Zero-hour auto purge (ZAP): ensure it is enabled — retroactively quarantines malware found after delivery
 - Admin notifications: notify security team on malware detection
 **Anti-spam policy:**
 - Threat policies → Anti-spam
 - Bulk complaint level threshold: set to 6 (aggressive; default is 7)
 - Enable outbound spam notifications: alert the security team when a mailbox is detected sending spam (indicator of compromise)
 - Verify SPF hard fail is evaluated
 ---
 ### Step 4 — Sharing Governance
 Sharing governance operates at multiple levels in M365. The tenant setting is the ceiling — per-site can be more restrictive but never more permissive than the tenant setting.
 **Tenant-level settings (SharePoint admin center → Policies → Sharing):**
 | Setting | Target value | Notes |
 |---------|-------------|-------|
 | External sharing — SharePoint | New and existing guests | Requires guest authentication. "Anyone" was removed in Step 2. |
 | External sharing — OneDrive | New and existing guests | Match SharePoint setting or more restrictive. |
 | Require guests to sign in using the same account | Yes | Prevents link forwarding to a different account. |
 | Allow guests to share items they don't own | No | Prevents reshare chain from escaping first-hop control. |
 | Guest access expiration | 30 days (or per organisation policy) | Guests must be reviewed and re-invited; standing access expires. |
 | Link permissions default | View | Least privilege; users explicitly upgrade if edit is needed. |
 | Link expiry (new and existing guest links) | 30 days | Prevents permanent link accumulation. |
 **Per-site controls — crown jewel sites:**
 For sites identified in the crown jewels question (Step 1 of "Before You Touch Anything"):
 - Set external sharing to **Only people in your organization**
 - Remove broad internal permissions ("Everyone except external users", "All company")
 - Document the named owners of the site and the access review schedule
 Internal oversharing is often overlooked: a finance site accessible to "All company" means any compromised internal account reaches the financial data. Restrict sensitive sites to named groups with specific membership.
 ---
 ### Step 5 — Guest Governance
 Guest accounts are standing external blast radius. Every guest that has not been reviewed is an unknown with access to unknown data.
 **Immediate actions:**
 1. **Export the guest list with last sign-in date.** In Entra ID → Users → filter by User type: Guest. Export to CSV. Sort by last sign-in date.
 2. **Flag for removal:** guests who have not signed in within 90 days and have no active project sponsorship. Present the list to the named client lead for approval before removing.
 3. **Remove approved stale guests.** Document the count.
 **Ongoing governance (configure before handover):**
 | Control | Configuration |
 |---------|--------------|
 | Guest invitation restrictions | Restrict to Entra ID admins only (not all users can invite guests) |
 | Guest access expiration | Configure in Entra ID → External Identities → External collaboration settings: Guest user access expires after 180 days unless reviewed |
 | Access reviews | Entra ID → Identity Governance → Access reviews — create a quarterly review for all guests. Reviewer: IT lead or line-of-business owner. Action on no response: remove access. |
 Access reviews require Entra ID P2 for full automation. For E3, a manual quarterly review using the Entra guest export is the alternative — document the cadence in the leave-behind and assign an owner.
 ---
 ### Step 6 — Sensitivity Labels Foundation
 Sensitivity labels are the mechanism that makes protection travel with the data. A labelled document carries its permissions wherever it goes — downloaded, emailed, shared externally.
 **Label taxonomy — baseline (4 labels):**
 | Label | Meaning | Default protection |
 |-------|---------|-------------------|
 | **Public** | Intended for external distribution | No restrictions |
 | **Internal** | Default for internal business content | No external sharing by default |
 | **Confidential** | Business-sensitive; restricted distribution | Encrypt; restrict to organisation members; no external forwarding |
 | **Highly Confidential** | Crown jewels: financial, legal, M&A, HR | Encrypt; restrict to named group; no download on unmanaged device; watermark |
 Keep the taxonomy to four labels. More labels increase classification fatigue and reduce the percentage of content that gets labelled at all. A four-label taxonomy that users understand and apply is worth more than a twelve-label taxonomy that nobody uses.
 **Deployment:**
 1. Create labels in Microsoft Purview compliance portal → Information protection → Labels
 2. Publish labels to all users via a label policy
 3. Configure auto-labelling for the Highly Confidential label: define content patterns (e.g., project name, internal designation) that trigger auto-labelling in SharePoint and OneDrive
 4. Set the default label for SharePoint sites identified as crown jewel sites: Confidential
 **For Highly Confidential — encryption configuration:**
 - Rights Management encryption: Only organisation members can open; no external forwarding; no printing
 - Apply to: the named crown-jewel sites and document libraries
 The label is the escape hatch. A Highly Confidential document downloaded to an unmanaged device and forwarded externally is still encrypted — the attacker has ciphertext, not data. This is the only control in this assignment that holds after data leaves the tenant.
 ---
 ### Step 7 — DLP Baseline
 DLP policies intercept known sensitive information patterns transiting Exchange, SharePoint, and OneDrive. Deploy DLP as a scalpel: 3–5 specific, high-confidence patterns. Do not attempt comprehensive coverage.
 **Target patterns for most organisations:**
 | Policy | Pattern | Initial action |
 |--------|---------|---------------|
 | Payment card data | Credit card numbers (PCI scope) | Policy tip to user + admin alert |
 | National identity numbers | National ID / tax number format for the client's jurisdiction | Policy tip to user |
 | Crown jewel content | Sensitivity label: Highly Confidential (label-based DLP) | Block external sharing + admin alert |
 | External forwarding with attachments | Email to external recipients with attachments > threshold | Notify user |
 Start every DLP policy in **simulation mode** (test/audit) before enforcement. Review DLP activity reports after 48 hours of simulation. Identify false positives. Tune the policy. Then enable with **notify only** before moving to **block**.
 The sequence: simulation → notify → block. Never skip the simulation and notify stages.
 **What E3 DLP covers:** Exchange Online, SharePoint Online, OneDrive for Business. It does not cover Teams messages (requires Purview add-on) or endpoint DLP (requires Purview or E5 compliance).
 Note the gaps in the residual risk statement: DLP at this scope does not cover Teams conversations or files shared through channels. If Teams is a primary working environment for crown-jewel content, document this as a gap pointing toward a Purview engagement.
 ---
 ### Step 8 — Audit Logging
 Audit logging is the foundation of any post-incident forensics capability. If it is not enabled, every breach investigation starts with nothing.
 **Unified Audit Log:**
 ```powershell
 # Verify status
 Get-AdminAuditLogConfig | Select-Object UnifiedAuditLogIngestionEnabled
 # Enable if false
 Set-AdminAuditLogConfig -UnifiedAuditLogIngestionEnabled $true
 ```
 E3 default retention: 90 days. Verify actual retention in the Compliance portal → Audit. If the client has regulatory requirements for longer retention (NIS2, DORA, banking regulations typically require 1 year minimum), document the gap. The E3 upgrade path is the Audit (Premium) add-on or E5 compliance.
 **Mailbox audit logging:**
 ```powershell
 Get-Mailbox -ResultSize Unlimited | 
  Where-Object {$_.AuditEnabled -eq $false} | 
  Set-Mailbox -AuditEnabled $true
 ```
 Verify that key mailbox audit operations are captured: MailboxLogin, SendAs, SendOnBehalf, HardDelete, FolderBind.
 **Critical audit events to verify are captured:**
 | Event category | Why it matters |
 |---------------|---------------|
 | File and page activities | Accessed, downloaded, shared — the data exfiltration footprint |
 | Sharing and access request activities | External shares created; guest invitations sent |
 | Synchronization activities | Files synced to devices (OneDrive sync client) |
 | Exchange admin activities | Transport rule creation/modification; external forwarding |
 | Azure AD sign-in events | Anomalous sign-ins, MFA failures, conditional access decisions |
 | DLP rule matches | Evidence that DLP policies are firing |
 ---
 ## Structural Resilience Checklist
 Controls that hold without ongoing human willingness after this engagement closes.
 - [ ] Anonymous sharing blocked at tenant level — confirmed by SharePoint sharing settings
 - [ ] Existing anonymous links revoked — count documented
 - [ ] External auto-forwarding blocked at tenant level — confirmed by transport rule and outbound spam policy
 - [ ] Active external forwarding rules reviewed and removed
 - [ ] DKIM enabled for all domains
 - [ ] DMARC deployed at minimum `p=quarantine` after monitoring period
 - [ ] User OAuth consent restricted — admin consent workflow active
 - [ ] High-permission user-consented OAuth grants reviewed
 - [ ] Guest expiration configured — new guests expire by default
 - [ ] Stale guests removed (90+ days inactive, no active sponsorship)
 - [ ] Guest access review cadence documented with named owner
 - [ ] Sensitivity labels published to all users — Highly Confidential label with encryption
 - [ ] DLP baseline policies active (post-simulation and notify stages) — not in simulation only
 - [ ] Unified Audit Log enabled
 - [ ] Mailbox audit logging enabled for all mailboxes
 ---
 ## Kill Chain Contribution
 **What this assignment closes:**
 | Attack vector | Control deployed |
 |---------------|-----------------|
 | Data exfiltration via anonymous link (bypasses all identity controls) | Anonymous link prohibition + existing link revocation |
 | Business email compromise via mailbox forwarding rule | External auto-forwarding block + rule audit |
 | OAuth consent phishing (malicious app requesting mail/file access) | User consent restriction + high-permission grant review |
 | Domain spoofing (impersonation of the client's domain in email) | DMARC `p=quarantine` |
 | Phishing email impersonating known users or domain | Anti-phishing impersonation protection |
 | Crown-jewel document leaking outside the tenant | Sensitivity label encryption (Highly Confidential) — protection travels with file |
 | Known sensitive data patterns transiting email or SharePoint | DLP baseline policies |
 | Stale guest accounts as standing external foothold | Guest expiration + stale guest removal |
 **What this assignment does not close:**
 | Remaining gap | Addressed by |
 |---------------|-------------|
 | Advanced phishing: Safe Links, Safe Attachments | Defender for Office 365 P1 (E5 or add-on) |
 | Teams message DLP | Purview compliance add-on |
 | Endpoint DLP (data leaving via USB, local app) | Purview E5 compliance or endpoint DLP engagement |
 | Full data lifecycle governance (retention, disposal) | Purview engagement |
 | MDCA session controls (block download from browser on unmanaged device) | MDCA engagement |
 | Full guest lifecycle management (access packages, entitlement) | Entra ID Governance (P2) engagement |
 | Residual data on unmanaged/BYOD devices | App Protection Policies (Intune assignment) |
 ---
 ## Leave-Behind Package
 | Artifact | Description |
 |----------|-------------|
 | **Surface map report** | Before-state: "Anyone" link count, external shares, guest list with last sign-in, forwarding rules found, OAuth grant inventory, SPF/DKIM/DMARC status |
 | **Anonymous link revocation record** | Links revoked: count, method, date |
 | **External forwarding rule audit** | Rules found, disposition of each (removed / confirmed legitimate / flagged as suspicious) |
 | **OAuth grant review record** | Grants reviewed, grants revoked, grants retained with justification |
 | **EOP policy documentation** | Anti-phishing, anti-malware, anti-spam settings with rationale |
 | **DMARC monitoring report** | DMARC aggregate reports at `p=none` before moving to `p=quarantine`; confirmation of quarantine deployment |
 | **Sharing governance configuration** | Tenant sharing settings, crown-jewel site configurations |
 | **Guest governance documentation** | Expiration settings, access review configuration, stale guest removal count, review cadence with named owner |
 | **Sensitivity label documentation** | Label taxonomy, label policy, encryption configuration for Highly Confidential |
 | **DLP policy documentation** | Each policy: target pattern, scope, actions, simulation results before enforcement |
 | **Audit logging confirmation** | Unified Audit Log status, retention period, mailbox audit status |
 | **Scope boundary log** | Every finding outside this scope, named and prioritized |
 | **Residual risk statement** | What this assignment did not close: Teams DLP gap, endpoint exfil path, advanced phishing gap, guest lifecycle limitations |
 ---
 ## Scope Boundary Signals
 | Signal | Points toward |
 |--------|--------------|
 | Significant Teams usage for crown-jewel content; Teams DLP not covered | Purview compliance engagement |
 | No independent M365 backup — Microsoft recycle bin only | Recovery and detection engagement (Book VI) |
 | Audit log retention < regulatory requirement | Audit (Premium) add-on; or compliance-driven M365 upgrade |
 | On-premises Exchange still in the estate | Hybrid Exchange engagement — decommissioning path |
 | Advanced phishing; no Defender for Office 365 P1 | E5 / MDO add-on evaluation |
 | High volume of user-consented high-permission OAuth apps | Entitlement management engagement |
 | Crown-jewel data accessible to broad internal groups | Information architecture engagement (governance, IA, Purview classification) |
 | No independent M365 backup | Recovery and detection engagement |
 | No incident response plan | IR planning engagement |
 ---
 ## Completing the "Secure M365" Engagement
 When all four assignments are delivered, the client has:
 **Identity Baseline** — MFA enforced for all users and phishing-resistant MFA for admins. Legacy authentication blocked at the tenant level. Break-glass accounts established and monitored. Admin accounts separated and audited.
 **CA Architecture** — A named, documented, principled CA policy set. Layer 1 (identity) and Layer 2 (admin elevation) enforced. Layer 3 (device compliance) activated following the Intune assignment. Per-user MFA conflict resolved.
 **Intune Security Baseline** — Device compliance policies returning results for the enrolled fleet. Compliant device required for M365 access (CA Layer 3 active). BitLocker, patch compliance, and LAPS deployed. Update rings with canary. App Protection Policies for BYOD. The real device population is mapped and documented.
 **Collaboration and Data Security** — Anonymous links removed. External auto-forwarding blocked. Email authentication at DMARC quarantine. External sharing governed. Stale guests removed. Sensitivity labels deployed with crown-jewel encryption. DLP baseline active for known high-value patterns. Audit logging enabled.
 **What this engagement does not close** — and what the CISO has in writing:
 - Session token theft (AiTM phishing) → Entra ID P2 + token protection
 - EDR and post-compromise detection → Defender for Endpoint P2 or Wazuh augmentation
 - Standing privilege → PIM / PAM engagement
 - Active Directory on-premises hardening → hybrid identity and AD hardening engagement
 - Full data governance → Purview engagement
 - Backup and recovery → recovery and detection engagement
 - Incident response capability → IR planning and detection baseline engagement
 The residual risk statement across all four packages is the honest description of what has been built and what remains. It is not a sales document — it is the record that the client's security posture was improved deliberately, with full awareness of what was and was not in scope.
 ---
 *For the identity foundation, see [Assignment: Identity Baseline](assignment-identity-baseline.md).*
 *For the CA architecture, see [Assignment: CA Architecture](assignment-ca-architecture.md).*
 *For the device security baseline, see [Assignment: Intune Security Baseline](assignment-intune-security-baseline.md).*
 *For the data and collaboration philosophy, see [Book V — Data & Collaboration](../books/04-data-and-collaboration.md).*
 *For the recovery and detection layer this engagement exposes as the next priority, see [Book VI — Recovery & Detection](../books/05-recovery-and-detection.md).*
@@ -0,0 +1,222 @@
 # Assignment: Identity Baseline
 > *Enforce what you already have. Every other M365 security control is downstream of this one.*
 This is a **scoped assignment package** — a complete, principled delivery guide for one specific client brief. It is designed to work with limited organizational engagement and to leave behind infrastructure that holds without anyone needing to want it.
 ---
 ## The Brief
 Client requests that fall within this scope:
 - *"Secure our M365 / our identities are a mess"*
 - *"We need MFA enforced — the auditor asked for it"*
 - *"We got phished and IT wants to prevent it happening again"*
 - *"Review our user accounts and admin accounts"*
 - *"Make sure only the right people have access"*
 This assignment does not require executive sponsorship. It requires one named IT lead with Global Administrator access and a tolerance for findings.
 ---
 ## Scope Boundary
 **In scope:**
 - Entra ID authentication configuration (MFA, legacy auth, auth methods)
 - Conditional Access policy review for existing policies (not full CA architecture)
 - Global Administrator and other privileged role audit
 - Break-glass account establishment
 - Entra ID Protection risk policy baseline
 - Authentication method registration and SSPR configuration
 - Service principal and app registration review (inventory and flag — not remediate)
 **Out of scope:**
 - Conditional Access policy design and architecture → [Assignment: CA Architecture](assignment-ca-architecture.md)
 - Device compliance and Intune → [Assignment: Intune Security Baseline](assignment-intune-security-baseline.md)
 - Privileged Access Management (PIM, PAM, PAW) → separate privileged access engagement
 - Active Directory on-premises → hybrid identity engagement
 - Application permissions remediation → separate service identity engagement
 When the client asks for something adjacent, log it in the scope boundary signals section at the end of the engagement. Do not absorb it silently and do not pitch the next engagement. The log is the record.
 ---
 ## Before You Touch Anything
 These three steps happen before any change, on day one.
 **1. Break-glass accounts.**
 If the tenant has no cloud-only break-glass accounts excluded from all CA policies, create two before proceeding. Document their credentials out of band (not in the same tenant). Alert on their sign-in. This is the safety net. Without it, a misconfigured CA policy can lock the entire tenant — including you.
 **2. CAExporter baseline.**
 Export the current CA policy state using [CAExporter](https://github.com/merill/caexporter). This JSON export is the before-state. Every change made during this engagement is measurable against it. It is also the rollback reference if something breaks.
 **3. Authentication sign-in log baseline.**
 Export 30 days of Entra sign-in logs, filtered for legacy authentication clients. This is the baseline for measuring the impact of legacy auth block and the evidence that the block is complete. Without it, you cannot demonstrate that legacy auth is actually gone — only that a policy exists.
 ---
 ## Principles Applied
 **Automation over procedure.**
 Every control in this assignment is a policy, not a document. MFA enforcement is a CA policy, not a user awareness campaign. Legacy auth block is an authentication policy or CA rule, not a helpdesk notification. A procedure only works when someone follows it. A policy works when no one is looking.
 **Kill chain first.**
 There are two controls in this assignment that matter more than all others: MFA enforcement on all users, and legacy auth block. Everything else — admin hygiene, SSPR configuration, risk policies — is valuable but secondary. If the engagement ends early, these two must be complete.
 **Visibility as accountability.**
 Every export, every report, every baseline produced during this engagement exists in the client's own tenant and documentation system permanently. A sign-in log showing zero legacy auth clients is evidence that outlasts the engagement. An admin account inventory with a date on it creates accountability that does not require anyone to actively manage it.
 **Scope discipline.**
 Anything discovered outside scope goes into the scope boundary log — not into the work plan. A consultant who silently fixes adjacent problems during a scoped engagement creates unscoped liability and destroys the client's ability to understand what was done. Log it, name it, leave it.
 ---
 ## Delivery Architecture
 Sequenced by impact, not by calendar. Each step depends on the one before it.
 ### Step 1 — Baseline (no changes)
 | Action | Output |
 |--------|--------|
 | CAExporter export | CA policy baseline JSON |
 | Break-glass accounts created and monitored | Break-glass documentation (out of band) |
 | Sign-in log export: legacy auth clients | Legacy auth client list |
 | Global Administrator audit: who holds it, cloud-only vs synced, standing vs eligible | Admin account inventory |
 | Service principal inventory: client secrets expiry, Graph permissions, admin consent | Service principal risk log |
 | Authentication method registration report | Who has MFA registered, by method |
 | SSPR configuration review | Current state documented |
 At the end of Step 1, share the admin account inventory and legacy auth client list with the named client lead. No recommendations yet. Just findings, plainly stated.
 ---
 ### Step 2 — Kill Chain (two controls)
 **Legacy authentication block.**
 Deploy via Entra authentication policies (tenant-wide, preferred) or CA policy (targeted by legacy auth client type). Stage it: report mode for 48 hours, confirm zero legitimate legacy auth clients in sign-in logs, then enforce. The 48-hour window exists because there are always surprises — a printer, a shared mailbox script, an MFA-unregistered VIP. Find them before enforcement, not after.
 **MFA enforcement.**
 If the client has no CA policies at all: deploy one CA policy requiring MFA for all users, all cloud apps, excluding break-glass accounts. If the client has existing CA policies: review coverage gaps and close them. Staged: exclude a pilot group of 10 users for 24 hours, confirm no breakage, then enforce broadly.
 These two controls are the assignment's kill chain contribution. Legacy auth block plus MFA enforcement closes the most common attack path in the Microsoft ecosystem. Both should be complete before Step 3 begins.
 ---
 ### Step 3 — Admin Hygiene
 **Global Administrator audit.**
 Every account with Global Administrator should be cloud-only (not synced from on-premises AD — a synced account can be compromised on-prem to take the cloud). Count standing Global Admins. The target is zero standing Global Admins beyond break-glass and emergency access. If PIM is not in scope, document the gap and log it. If the client has PIM licensing (P2), note it — it is the correct next step.
 **Admin account separation.**
 Admins should have a dedicated admin account separate from their daily-use account. If they do not, log it as a scope boundary signal for a privileged access engagement. If the client will accept one quick win: rename or create dedicated admin accounts for any standing Global Admins. This is a short task with meaningful blast-radius reduction.
 **Service principal review.**
 Flag any service principal with:
 - Client secrets expiring in under 30 days (operational risk, not security risk — but surfaces the gap)
 - Tenant-wide admin consent granted
 - Graph permissions: `RoleManagement.ReadWrite.Directory`, `AppRoleAssignment.ReadWrite.All`, `Application.ReadWrite.All`, `Directory.ReadWrite.All`
 Log all flags in the scope boundary signals. Do not remediate service principals in this assignment — it requires application owner coordination and deserves its own scoped engagement.
 ---
 ### Step 4 — Risk Baseline
 **Entra ID Protection.**
 If the tenant has P2 licensing (included in E5, available separately), deploy:
 - User risk policy: require password change at High risk (Conditional Access, not legacy user risk policy)
 - Sign-in risk policy: require MFA step-up at Medium or High risk
 If no P2: document the gap. Log the licensing delta for the leave-behind.
 **SSPR.**
 If SSPR is not enabled: enable it for all users with a minimum of two authentication methods required. Default to Microsoft Authenticator + email or phone. SSPR with strong auth methods removes helpdesk dependency for password resets and is a prerequisite for a healthy MFA rollout.
 ---
 ## Structural Resilience Checklist
 Controls that hold without ongoing human willingness after this engagement closes.
 - [ ] MFA enforcement CA policy active — not in report mode
 - [ ] Legacy authentication blocked at tenant level — not just reported
 - [ ] Break-glass accounts exist, are cloud-only, are excluded from CA, are monitored with alerts
 - [ ] Break-glass credentials documented out of band
 - [ ] Sign-in risk and user risk policies active (if P2 licensed)
 - [ ] CAExporter export stored in client documentation
 - [ ] SSPR active for all users
 These are the controls that keep working after the engagement ends. If any item is not checked at handover, document why and log the residual risk.
 ---
 ## Kill Chain Contribution
 **What this assignment closes:**
 | Attack vector | Control deployed |
 |---------------|-----------------|
 | Password spray against cloud accounts | MFA enforcement |
 | Credential stuffing using breached passwords | MFA enforcement + Entra ID Protection |
 | Legacy authentication protocol abuse (SMTP, IMAP, MAPI) | Legacy auth block |
 | Basic phishing for MFA bypass via legacy clients | Legacy auth block |
 | Attacker using compromised admin account persistently | Break-glass monitoring, admin hygiene |
 **What this assignment does not close:**
 | Remaining gap | Addressed by |
 |---------------|-------------|
 | Device-based attacks (unmanaged device as access vector) | [Assignment: Intune Security Baseline](assignment-intune-security-baseline.md) |
 | Adversary-in-the-middle / session token theft | Device compliance in CA + token protection |
 | Standing Global Administrator accounts | Privileged access engagement (PIM) |
 | Service principal over-permission | Service identity engagement |
 | Data exfiltration through sanctioned apps | Collaboration and data security assignment |
 | Persistence via application consent abuse | Service identity engagement |
 The kill chain contribution of this assignment is significant and real. The residual gaps are also real. Both belong in the leave-behind.
 ---
 ## Leave-Behind Package
 Every item below must be delivered at handover. The engagement is not complete until all items exist in the client's own documentation system.
 | Artifact | Description |
 |----------|-------------|
 | **CAExporter JSON (before)** | CA policy state at engagement start |
 | **CAExporter JSON (after)** | CA policy state at engagement close |
 | **Admin account inventory** | Every privileged role assignment: account name, role, cloud-only vs. synced, standing vs. eligible, last sign-in |
 | **Legacy auth sign-in confirmation** | Sign-in log export showing zero legacy auth clients post-block |
 | **MFA registration report** | Authentication method registration by user, at engagement close |
 | **Break-glass documentation** | Account names, monitoring alert confirmation, out-of-band credential storage reference |
 | **Service principal risk log** | Flagged principals with permissions and expiry dates |
 | **Scope boundary log** | Every finding outside this scope, named and prioritized |
 | **Residual risk statement** | Plain-language summary of what this assignment did not close and why |
 The residual risk statement is not optional. A client who receives a clean handover without a residual risk statement has been misled about their posture.
 ---
 ## Scope Boundary Signals
 Log these when you find them. Do not fix them. Do not pitch them. The log is the record.
 | Signal | Points toward |
 |--------|--------------|
 | No device compliance policies exist | Intune Security Baseline assignment |
 | CA policies exist but are poorly designed (overlapping, unnamed, undocumented) | CA Architecture assignment |
 | Global Admins have standing privilege with no PIM | Privileged access engagement |
 | Entra Connect / Cloud Sync server is domain-joined to production domain | Hybrid identity engagement — T0 isolation |
 | AD FS present | Hybrid identity engagement — Golden SAML risk, migration to PHS |
 | Service principals with tenant-wide admin consent | Service identity engagement |
 | No Defender for Office 365 baseline | Collaboration security assignment |
 | Audit logging not configured or retention < 90 days | Detection baseline assignment |
 ---
 *For the conditional access architecture built on top of this baseline, see [Assignment: CA Architecture](assignment-ca-architecture.md).*
 *For technical depth on hybrid identity and the sync server risk, see [Book II — Hybrid Identity](../books/01-hybrid-identity.md).*
 *For privileged access architecture, see [Book III — Privileged Access](../books/02-privileged-access.md).*
@@ -0,0 +1,384 @@
 # Assignment: Intune Security Baseline
 > *The device will be compromised. Compliant is not the same as secure, and the portal toggle is not the same as the device's behaviour. Build for the compromise, not against it.*
 This is a **scoped assignment package** — a complete, principled delivery guide for one specific client brief. It closes the device-layer gap and activates the CA Layer 3 policies designed in [Assignment: CA Architecture](assignment-ca-architecture.md). It can be delivered standalone, but its full structural value is realised when CA Layer 3 is activated at the end.
 ---
 ## The Brief
 Client requests that fall within this scope:
 - *"Deliver a security baseline for our Intune-managed endpoints"*
 - *"Set up Intune / we need device management"*
 - *"We need compliant devices to be required for M365 access"*
 - *"Our auditor wants evidence that devices are encrypted and patched"*
 - *"We have Intune but nobody set up the security policies"*
 - *"We're retiring SCCM and going cloud-native"* (if co-management migration is explicitly scoped)
 This assignment does not require executive sponsorship. It requires one named IT lead with Intune Administrator access, a tolerance for a grace-period before enforcement, and an understanding that the enrollment rate at the start is almost never what the CMDB says.
 ---
 ## Scope Boundary
 **In scope:**
 - Device population mapping (what is actually authenticating, vs. what is enrolled, vs. what the CMDB says)
 - Compliance policies: Windows, macOS, iOS, Android — as applicable to the fleet
 - Device configuration profiles: Windows security baseline settings
 - Windows Update rings (quality and feature updates)
 - Windows LAPS (local admin password management)
 - App Protection Policies for BYOD iOS and Android (MAM without MDM)
 - Enrollment review and gaps (not a new enrollment deployment unless scoped separately)
 - CA Layer 3 activation: connecting compliance state to Conditional Access
 **Out of scope:**
 - SCCM co-management migration → separate engagement (scope is complex and fleet-specific)
 - Autopilot setup and Autopilot-based provisioning → separate deployment engagement
 - EDR configuration: Defender for Endpoint advanced features, custom detection rules → separate or within E5 engagement
 - WDAC / Smart App Control / application allowlisting → advanced application control engagement
 - Driver and firmware update management → note as gap, recommend Windows Update for Business or third-party where Intune is insufficient
 - GPO conflict resolution for hybrid-joined estates → flag; recommend cloud-native migration path
 - Endpoint Privilege Management (JIT local admin elevation) → note as follow-on if standing local admin cannot be removed
 When the client asks about SCCM migration or Autopilot, scope it separately. Co-management is a legitimate transitional architecture but it adds complexity that deserves its own scoped engagement with its own completion criteria.
 ---
 ## Before You Touch Anything
 **1. Break-glass exclusion.**
 Confirm that break-glass accounts are excluded from all device-compliance CA policies. A flaky compliance signal must never lock out tenant recovery. If CA Layer 3 is not yet designed, this step ensures the door is open when it is deployed.
 **2. Four-population mapping.**
 The CMDB is a claim. Authentication logs are facts. Before configuring compliance policies, build the real device picture from four sources:
 | Population | Source |
 |-----------|--------|
 | **Enrolled (MDM)** | Intune device list |
 | **Registered (Entra)** | Entra ID → Devices → All devices |
 | **Authenticating** | Entra sign-in logs (30 days), filtered by device detail |
 | **CMDB** | Whatever the client has |
 Map the differences. Devices in sign-in logs but not in Intune are known-unmanaged — they reach data and you cannot apply compliance policies to them. Devices in the CMDB but not in sign-in logs may be retired equipment or offline devices that have never actually authenticated. The gap between enrolled and authenticating is the real finding, and it belongs in the leave-behind regardless of whether it is addressed in this engagement.
 **3. Existing Intune policy audit.**
 If Intune has been configured before — even partially — audit what exists before touching anything. Duplicate compliance policies, conflicting configuration profiles, and orphaned enrollment restrictions are common. A client who says "Intune is set up" often has one compliance policy created in 2021, three enrollment profiles nobody recognises, and a Windows security baseline applied to a group that no longer exists. Export the current state.
 **4. CA Layer 3 status.**
 Check whether `CA-AllUsers-AllApps-RequireCompliantDevice` exists in report-only mode from the CA Architecture assignment. If it does, this assignment ends by activating it. If it does not exist, design and deploy it in report-only mode as part of this assignment — but do not activate it until compliance coverage is proven.
 ---
 ## Principles Applied
 **Compliance is a signal, not a checkbox.**
 A device marked compliant in Intune carries a staleness window: compliance is evaluated on check-in cadence, not continuously. A device can fall out of compliance — lose encryption, miss patches, be rooted — and still hold a valid compliant token and access grant for hours. Design around this: the compliance requirement at CA is a meaningful control that raises the cost of attack, not a guarantee of device integrity. Document what it is and what it isn't.
 **Test on real devices, not portal configurations.**
 A Conditional Access policy can show a perfectly correct configuration in the portal and enforce nothing. The same applies to compliance policies: a policy assigned to a group can appear active and produce no compliance results for enrolled devices whose group membership has drifted. And MAM/App Protection enforcement has documented gaps between the toggle and the actual device behaviour — gaps that vary by platform, OS build, and companion app version. For every control that matters, confirm it with a real device producing the expected result. Write the expected result down before you test, not after.
 **Velocity with a brake.**
 Update rings exist not to slow patching but to make patching safe at speed. An unbraked push to the entire fleet is one bad update away from a mass outage — the kind that stops production, not the kind that stops attackers. A canary ring with a real halt-and-rollback capability is the mechanism that lets the rest of the fleet patch fast and safely. The canary must be tested — an untested canary is just the first domino with a friendly name.
 **The device is disposable; the data boundary is the protection.**
 Every design decision in this assignment should ask: if this device is wiped and reprovisioned in an hour, does anything important break? A device that can be reprovisioned in an hour is antifragile. A device whose compromise is a crisis is fragile, regardless of how many compliance policies are applied to it. Build for reprovisionability: Autopilot, LAPS, application deployment from Intune, user profile from OneDrive. The compliance baseline hardens the device; the reprovision capability makes its loss survivable.
 ---
 ## Delivery Architecture
 ### Step 1 — Population Mapping and Audit (no changes)
 | Action | Output |
 |--------|--------|
 | Four-population mapping (enrolled / registered / authenticating / CMDB) | Device population report: counts, deltas, known-unmanaged estimate |
 | Existing compliance policy audit | Policy inventory: assignments, settings, mode, last modified |
 | Existing configuration profile audit | Profile inventory: conflicts, orphaned assignments, platform coverage |
 | Update ring inventory | Current rings or absence of rings |
 | Sign-in log: device compliance state | What proportion of sign-ins carried a compliant device signal in the last 30 days |
 | LAPS status | Whether Windows LAPS is deployed or legacy LAPS or neither |
 Share the device population report with the named client lead before writing any policies. The finding is almost always the same: the managed fleet is smaller than assumed, the dark population is larger than assumed, and several CMDB entries have not authenticated in months. State it plainly.
 ---
 ### Step 2 — Compliance Policies (report mode first)
 Deploy all compliance policies in report mode. Review results for 72 hours before activating noncompliance actions. The goal at this step is to see the real compliance state of the fleet — not to block anyone.
 **Noncompliance action sequence (apply to all compliance policies):**
 | Day | Action |
 |-----|--------|
 | 0 | Mark noncompliant (reporting only — this is immediate and always on) |
 | 1 | Send email notification to user |
 | 7 | Block access (activates when `CA-AllUsers-AllApps-RequireCompliantDevice` is enabled) |
 | 30 | Retire device (for persistent noncompliance — confirm with client lead before activating) |
 The 7-day grace window is not leniency — it is the window in which IT can identify and remediate legitimate noncompliance (device in repair, device offline, missed check-in) before a user is blocked. Without it, the first enforcement wave produces a support ticket flood. With it, enforcement is gradual and explainable.
 **Windows compliance policy — baseline settings:**
 | Setting | Value | Rationale |
 |---------|-------|-----------|
 | BitLocker required | Yes | Unencrypted devices lose data on physical theft |
 | OS minimum version | Windows 10 22H2 / Windows 11 22H2 | Below this: no Windows LAPS; OS in extended support only |
 | Defender AV enabled | Yes | Baseline detection |
 | Defender real-time protection | Yes | |
 | Firewall enabled | Yes | |
 | Secure boot enabled | Yes | Blocks bootkit-level compromise |
 | TPM required | Yes (for new enrollments; consider exclusion group for legacy hardware) | PRT TPM-binding requires TPM |
 | Password required | Yes | Minimum complexity, minimum length 8 |
 | Maximum inactivity before screen lock | 15 minutes | |
 Do not configure the compliance policy to evaluate Microsoft Defender for Endpoint risk score unless Defender for Endpoint P2 (E5) is licensed. Misconfiguring this setting against an E3 tenant produces false noncompliance for all devices.
 **macOS compliance policy (if fleet includes Macs):**
 | Setting | Value |
 |---------|-------|
 | FileVault enabled | Yes |
 | OS minimum version | macOS 13 (Ventura) or later |
 | Password required | Yes |
 | Firewall enabled | Yes |
 | System Integrity Protection | Yes |
 **iOS compliance policy:**
 | Setting | Value |
 |---------|-------|
 | OS minimum version | iOS 16 or later |
 | Passcode required | Yes |
 | Jailbreak detection | Block jailbroken devices |
 | Device threat level | Secured (no threat level tolerance) |
 **Android compliance policy:**
 | Setting | Value |
 |---------|-------|
 | OS minimum version | Android 12 or later |
 | Device PIN required | Yes |
 | Rooted devices | Block |
 | Minimum security patch level | Within 90 days |
 **The honest note on jailbreak/root detection:** detection is an arms race. A motivated attacker with a current tool bypasses it. Treat root detection as a tripwire that raises the cost of the attack, never as a barrier that stops it. Document this in the residual risk statement.
 ---
 ### Step 3 — Device Configuration Baseline
 The Microsoft Windows Security Baseline (available in Intune → Endpoint security → Security baselines) is the starting point. It encodes Microsoft's recommended settings as an Intune profile that enforces continuously.
 **Deployment approach:**
 1. Deploy the Windows Security Baseline in **report mode** to a pilot group (10–20 devices, IT team first)
 2. Review conflicts and configuration gaps for 48 hours
 3. Resolve any conflicts with existing policies (overlapping profiles produce unpredictable results — Intune applies the stricter setting per-setting by default, but conflicting values create undefined behaviour)
 4. Expand to production groups
 5. Monitor Intune reports for policy conflicts and noncompliance
 **Additional configuration profiles (deploy after the security baseline is stable):**
 | Profile | Purpose | Notes |
 |---------|---------|-------|
 | **BitLocker configuration** | Enable BitLocker silently, escrow recovery keys to Entra | Separate from compliance (compliance requires BitLocker; this profile configures how it's applied) |
 | **Microsoft Defender AV** | Configure exclusions, scheduled scans, PUA protection | Do not configure AV exclusions broadly — each exclusion reduces coverage |
 | **Firewall configuration** | Block inbound connections, logging | Complements compliance requirement |
 | **Edge browser baseline** | SmartScreen, extension management, safe browsing, disable password manager sync | Applies to corporate Edge profile; test carefully — extension management can break legitimate workflows |
 | **Windows Hello for Business** | Phishing-resistant authentication at device layer | If deploying phishing-resistant MFA (required by CA-Admins policy), WHfB is the most practical path |
 ---
 ### Step 4 — Update Rings
 Update rings are the mechanism that makes patching fast and safe simultaneously. Deploy three rings minimum.
 **Ring structure:**
 | Ring | Assignment | Quality update deferral | Feature update deferral | Notes |
 |------|-----------|------------------------|------------------------|-------|
 | **Canary** | IT team (5–10 devices) | 0 days | 0 days | Takes every update immediately. Canary for production rings. Must include at least one machine that runs every critical business application. |
 | **Pilot** | 10–15% of fleet, varied roles | 7 days | 30 days | Broad business representation. If Canary is clear after 7 days, Pilot proceeds. |
 | **Production** | Remainder | 14 days | 90 days | Conservative deferral. If Pilot is clear after 7 days, Production proceeds. |
 **Pause and rollback configuration:**
 Configure Intune update rings with the pause capability enabled. Define in the client's runbook:
 - Who has authority to pause an update ring (named person, not a committee)
 - What the trigger is for pausing (Canary devices showing a known issue, not a vague "something might be wrong")
 - Maximum pause duration before the pause is reviewed (7 days)
 An untested pause capability is a fiction. Test it during the engagement: deploy an update to Canary, confirm it lands, pause the ring, confirm the pause holds, resume. This takes 30 minutes and is the only proof the mechanism works.
 ---
 ### Step 5 — Windows LAPS
 Standing local administrator accounts are the device-layer version of standing privilege. If the same local admin password is shared across the fleet (common in legacy environments), one compromised device yields lateral movement credentials for the entire estate.
 **Windows LAPS (cloud-native):**
 - Available on Windows 10 22H2+ and Windows 11 22H2+ with current patches
 - Configure backup target: Entra ID (cloud-native; no on-prem infrastructure required)
 - Rotation schedule: 30 days, plus rotate on device handoff
 - Requires Entra ID P1 (included in E3)
 **Deployment:**
 1. Enable LAPS in Entra ID (Entra admin center → Devices → Device settings → Enable Microsoft Entra Local Administrator Password Solution)
 2. Create an Intune LAPS policy (Endpoint security → Account protection → LAPS)
 3. Assign to a pilot group; confirm password backup to Entra after check-in
 4. Expand to production
 **For legacy LAPS (on-prem AD environments where Windows LAPS is not yet deployable):**
 Legacy LAPS (the original Microsoft LAPS MSI) remains deployable via Intune for hybrid-joined devices. Flag this as a transitional state — cloud-native Windows LAPS is the destination.
 **What this does not solve:** if standing Domain Admin or local admin is provided to specific IT staff outside of LAPS, that standing privilege is out of scope for this assignment. Log it in scope boundary signals.
 ---
 ### Step 6 — App Protection Policies (BYOD)
 App Protection Policies (MAM without MDM) manage the data layer on personal devices without enrolling the device. This is the correct model for BYOD: wall the corporate data, not the device.
 **The honest caveat, stated plainly:** App Protection Policy enforcement has gaps. The policy controls what managed apps should do; the actual enforcement is dependent on the app version, OS version, companion app (Company Portal on Android), and specific API support. "Block copy/paste to unmanaged apps" blocks in documented paths — it does not block screenshots, OS-level share sheet on some platforms, or every third-party clipboard manager. Test on real devices. Document what you verified and where the limits are.
 **Deploy separate policies per platform.** iOS and Android are not symmetric. A policy that works on iOS may not produce the same behaviour on Android. Test both independently.
 **iOS App Protection Policy — baseline settings:**
 | Setting | Value |
 |---------|-------|
 | Prevent "Save As" to personal storage | Block |
 | Restrict cut/copy/paste to managed apps only | Managed apps with paste in |
 | Require PIN for app access | Yes (after 5 minutes inactivity) |
 | Minimum OS version | iOS 16 |
 | Offline grace period before access blocked | 720 hours (30 days) |
 | Selective wipe after failed PIN attempts | Yes (after 10 attempts) |
 | Minimum app version | Latest − 1 (configure per app) |
 | Jailbroken/rooted devices | Block |
 Apply to: Outlook, Teams, Edge, OneDrive, SharePoint mobile. These are the apps through which corporate data flows on BYOD devices.
 **Android App Protection Policy — same baseline settings.** Test enforcement independently — behaviour on Android differs, particularly clipboard controls and "open in" restrictions.
 **Selective wipe verification:**
 Test selective wipe on a real BYOD device before the engagement closes. Confirm that corporate data (email, files, Teams content) is removed and personal data (photos, personal apps) is not. This is the capability that makes MAM politically viable — if the user doesn't trust that it won't touch their personal data, enrollment fails. Document the test.
 ---
 ### Step 7 — CA Layer 3 Activation
 This is the step that connects device compliance to access control. Everything before this point has been deploying and measuring; this step makes compliance matter for access.
 **Prerequisites before activating:**
 - [ ] Compliance policy deployed and returning results for ≥ 80% of the enrolled fleet
 - [ ] 72 hours of report-only compliance results reviewed — no widespread false noncompliance identified
 - [ ] Break-glass accounts confirmed excluded from device compliance CA policies
 - [ ] Named client lead has approved activation in writing
 - [ ] IT team briefed on noncompliance action timeline (users blocked after day 7 if noncompliant)
 - [ ] Helpdesk runbook written: what to do when a user is blocked due to noncompliance
 **Activation sequence:**
 1. Switch `CA-AllUsers-AllApps-RequireCompliantDevice` from report-only to **enabled**
 2. Monitor Intune compliance dashboard and Entra sign-in logs for 24 hours
 3. Confirm: compliant devices are signing in successfully; noncompliant devices are being blocked at CA
 4. Confirm: break-glass accounts are not blocked
 Do not activate device-compliance CA policies on a Monday or before a public holiday. An unexpected compliance failure during a period of low IT staffing is a bad outcome that a one-day wait entirely prevents.
 **After activation, the compliance signal is live.** A device that loses compliance — drops encryption, falls behind on patches, is rooted — will be blocked from M365 access within the 7-day noncompliance action window. This is the control working as designed.
 ---
 ## Structural Resilience Checklist
 Controls that hold without ongoing human willingness after this engagement closes.
 - [ ] Compliance policies deployed and returning results for enrolled devices
 - [ ] Noncompliance action timer active (day 7 block — not just report)
 - [ ] Windows Security Baseline profile active on production fleet
 - [ ] Update rings deployed with Canary, Pilot, and Production separation
 - [ ] Update ring pause tested at least once
 - [ ] Windows LAPS deployed; local admin passwords backing up to Entra
 - [ ] App Protection Policies active for iOS and Android BYOD (tested on real devices)
 - [ ] Selective wipe tested on BYOD device
 - [ ] `CA-AllUsers-AllApps-RequireCompliantDevice` **enabled** (not report-only)
 - [ ] Break-glass accounts excluded from device compliance CA policies — confirmed with a real sign-in
 ---
 ## Kill Chain Contribution
 **What this assignment closes (or significantly raises the cost of):**
 | Attack vector | Control deployed |
 |---------------|-----------------|
 | Stolen credentials used from unmanaged/unknown device | CA Layer 3: compliant device required |
 | Physical theft of unencrypted device | BitLocker compliance requirement |
 | Lateral movement via shared local admin credentials | Windows LAPS: unique per-device passwords |
 | Unpatched OS exploited at known CVE | Update rings: enforced patch cadence |
 | BYOD personal device accessing corporate data without controls | App Protection Policies: data container on unmanaged device |
 | Attacker persistence on device after credential reset | Compliance noncompliance action: device retired after persistent noncompliance |
 **What this assignment does not close:**
 | Remaining gap | Addressed by |
 |---------------|-------------|
 | Session token theft post-compliance check (AiTM phishing) | Entra token protection (P2) + continuous access evaluation |
 | Compromised but still-compliant device (stale signal window) | Defender for Endpoint device risk integration (E5) |
 | App-layer data exfiltration through sanctioned apps | Collaboration and data security assignment |
 | Advanced malware, post-exploitation on managed device | EDR: Defender for Endpoint P2 (E5) or Wazuh/Sysmon augmentation |
 | Standing privilege on servers accessed from managed devices | Privileged access engagement |
 | Dark access (legacy auth, long-lived tokens bypassing CA) | Legacy auth block (identity baseline) + token lifetime policies |
 The most important gap to document plainly: a managed, compliant device that carries a stolen session token (issued after legitimate MFA) still has access. The compliance signal does not re-evaluate session tokens retroactively. Continuous Access Evaluation (CAE) narrows this window for supported apps — verify which apps in the client's environment support CAE, and document the remainder as residual risk.
 ---
 ## Leave-Behind Package
 | Artifact | Description |
 |----------|-------------|
 | **Device population report** | Four-population map: enrolled, registered, authenticating, CMDB; delta analysis; known-unmanaged estimate |
 | **Compliance policy documentation** | Every policy: settings, assignments, noncompliance action timeline, rationale |
 | **Compliance dashboard export** | Compliance rates by policy and platform at engagement close |
 | **Configuration profile documentation** | Security baseline and supplemental profiles: settings, assignments, conflict analysis |
 | **Update ring documentation** | Ring structure, deferral schedule, pause/rollback procedure, pause test result |
 | **LAPS deployment confirmation** | Devices with LAPS active; Entra backup confirmed; rotation schedule |
 | **App Protection Policy documentation** | iOS and Android policies: settings, tested behaviours, documented gaps per platform |
 | **Selective wipe test record** | Device tested, result, personal data confirmed intact |
 | **CA Layer 3 activation confirmation** | Sign-in log showing compliant devices accessing successfully, noncompliant devices blocked |
 | **Scope boundary log** | Every finding outside this scope, named and prioritized |
 | **Residual risk statement** | What this assignment did not close: stale compliance signal, AiTM token theft, EDR gap, dark access |
 ---
 ## Scope Boundary Signals
 | Signal | Points toward |
 |--------|--------------|
 | Shadow IT apps visible in Intune application inventory | Collaboration and data security assignment; shadow AI discovery |
 | SCCM co-management active; GPO policies conflicting with Intune | Co-management migration engagement; AD hardening |
 | Hybrid-joined devices that depend on line-of-sight to DC | Cloud-native migration path; hybrid identity engagement |
 | No Defender for Endpoint P2; device risk signal not feeding CA | E5 licensing gap; E3 augmentation with Wazuh/Sysmon |
 | Standing local admin accounts for IT staff outside LAPS scope | Privileged access engagement (Endpoint Privilege Management) |
 | Autopilot not configured; device reprovision takes days not hours | Autopilot deployment engagement |
 | Legacy devices below Windows 10 22H2 in the compliance-excluded group | Accelerate OS refresh; document as known risk with timeline |
 | Audit log retention < 90 days | Detection baseline assignment |
 | MAM enforcement gaps found during BYOD testing | Document with vendor; consider MDM enrollment for corporate-issued mobile |
 ---
 ## Buildable-On: What the Next Assignment Depends On
 The Collaboration and Data Security assignment builds on the device posture deployed here. Specifically:
 1. **`CA-AllUsers-UnmanagedDevice-AppEnforcedRestrictions` behaviour** is now testable against the real unmanaged device population. With enrolled and unmanaged devices mapped, you know which users will be affected by app-enforced restrictions and can design the policy accurately.
 2. **The application inventory from Intune** surfaces the shadow IT picture that informs data security scope — what apps are running, what cloud storage is installed, whether consumer AI tools are present.
 3. **Managed device as a data exfiltration boundary** — with compliant devices required for access, the remaining data risk is through sanctioned apps on managed devices. That is the scope of the next assignment.
 ---
 *For the identity foundation, see [Assignment: Identity Baseline](assignment-identity-baseline.md).*
 *For the CA Layer 3 policies this assignment activates, see [Assignment: CA Architecture](assignment-ca-architecture.md).*
 *For the governing philosophy on device posture, see [Book IV — Devices & Endpoint](../books/03-devices-and-intune.md).*
@@ -0,0 +1,93 @@
 # Kill Chain Assessment App
 > *"We say it in every engagement: find the kill chain first. But how do you find it in territory you've never seen? You don't start with the chain — you start with the questions that surface the edges, and you let the graph tell you where the shortest path to the end of the company actually runs."*
 This document specifies the **Kill Chain Assessment app** — a single-file, offline browser tool a consultant runs during the diagnostic to turn an unknown estate into a mapped attack graph, compute the shortest existential path (the kill chain), and size every node on it into a remediation [quantum](../core/quantum-vulnerability-management.md).
 **The tool:** [`tools/kill-chain-assessment.html`](../tools/kill-chain-assessment.html) — open it in any browser. No install, no network, no data leaves the machine. State persists locally and exports to `.json` (to resume) and `.md` (to drop straight into the report or the [Findings Backlog](../assessment-templates/findings-backlog.md)).
 ---
 ## Why this needed to be built
 The handbook and the [Move Fast and Fix Things](../core/move-fast-and-fix-things.md) posture both rest on a single instruction: *fix the kill chain first.* The [assessment team guide](../assessment-templates/assessment-team-guide.md) tells you what to run (BloodHound, Purple Knight, Elysium, Entra checks); the [sample engagement](sample-engagement-mid-market.md) shows a finished kill chain drawn as an ASCII path. But between "run the tools" and "here is the finished chain" there is a synthesis step that has always lived only in the consultant's head: **taking a pile of findings about an unfamiliar estate and working out which sequence of them actually ends the company.**
 In unknown territory that synthesis is hard, inconsistent between consultants, and easy to get wrong — the obvious 9.8 grabs attention while the cheap two-hop path to the backups goes unseen. The app makes the synthesis explicit and repeatable: capture what you find as nodes and attacker moves, and let a shortest-path computation surface the chain you'd otherwise have to spot by eye. It is the missing instrument for the first and most important act of every engagement.
 ---
 ## The model
 ### Nodes
 A **node** is any asset, foothold, identity, or system. Each carries the attributes that determine its position in the chain:
 | Attribute | Meaning | Drives |
 |-----------|---------|--------|
 | **Layer** | entry / identity / privilege / device / data / infra-OT / recovery | Orientation, report grouping |
 | **Tier** | T0 / T1 / T2 ([T0 Asset Framework](../core/t0-asset-framework.md)) | Blast-radius weighting |
 | **Entry point** | Internet-reachable or unauth foothold | Source of the chain |
 | **Crown jewel** | Existential — the org cannot operate without it | End of the chain |
 | **Reachable?** | Can the adversary actually get to it (yes/no/**unknown**) | Quantum sizing |
 | **Exploit available?** | Working path/exploit in the wild (yes/no/**unknown**) | Quantum sizing |
 | **Compensating control** | EDR / WAF / segmentation already in front | Quantum sizing (the ~90% subtraction) |
 The "unknown" values are first-class, not placeholders: a node you cannot characterise is a **dark quantum**, and capturing it honestly is the point.
 ### Moves (edges)
 A **move** is one directed attacker step — "from here, an attacker can reach there" — with a *mechanism* (how: DCSync, NTLM relay, password spray, reused credential, OAuth consent) and an *effort* weight from 1 (trivial) to 5 (very hard). Effort is the consultant's judgement of how hard that single hop is for the adversary.
 ### The computation
 The app runs a **multi-source Dijkstra** from every entry point across the move graph, and finds the **lowest-total-effort path to any crown jewel.** That path *is* the kill chain — the cheapest route from foothold to existential impact. The tool then classifies every node:
 - **P0** — on the shortest chain. Break any one link and the existential path is severed.
 - **P1** — on *some* path from an entry to a jewel (reachable-from-entry ∧ can-reach-a-jewel), but not on the cheapest one.
 - **P2 / off-chain** — not on any path to a crown jewel. Real, but not existential — housekeeping, not kill chain.
 This is the [Move Fast](../core/move-fast-and-fix-things.md) doctrine made computable: *kill-chain position sets priority, not CVSS.*
 ### Quantum sizing
 Each node on a chain is sized into a [quantum](../core/quantum-vulnerability-management.md) by the same logic the framework defines:
 | Quantum | Condition | Budget / action |
 |---------|-----------|-----------------|
 | **Critical** | On shortest chain, reachable **yes**, exploit **yes**, not compensated | **Hours** — sever reachability / compensating control now |
 | **Severe** | On a chain, reachable **or** exploit = yes | **Days** — one change window, verify enforcement |
 | **Standard** | On a chain, neither reachable nor exploitable yet | **Sprint** — batch; patch velocity fits here |
 | **Dark** | On a chain but reachability **or** exploit = unknown | **Unsized** — route to discovery; characterise first |
 ---
 ## How to run it in an engagement
 1. **Open the tool** and clear the sample (or keep it as a worked reference). Switch to the **Discovery** tab — it lists, per layer, the questions and commands that surface edges (external scan for entries, the Connect sync account for the cloud↔on-prem bridge, BloodHound `shortestPath` for privilege, "what stops the business operating?" for jewels, flat-network checks for blast radius). This is the unknown-territory protocol.
 2. **Capture as you go.** Every finding from the [assessment team guide](../assessment-templates/assessment-team-guide.md) becomes a node; every "an attacker could move from X to Y" becomes a move. Mark entries and jewels. Leave reachability/exploit as *unknown* when you genuinely don't know — that flags the dark quanta to chase.
 3. **Read the chain.** The centre panel draws the attack graph and highlights the shortest existential path in red. The right panel sizes the quanta. If no path is found, either the estate is genuinely segmented there (note it as a win) or you haven't mapped the connecting moves yet — in unknown territory, assume the latter until proven.
 4. **Export.** `Export report .md` produces a kill-chain section, quantum-bucketed remediation, and a priority table ready to paste into the diagnostic deliverable. `Save .json` lets you resume or hand off.
 5. **Close the loop.** After remediation, reload the `.json` and ask the antifragile question the framework demands: *did the chain get shorter?* A severed link or a collapsed privilege should visibly lengthen the shortest path or remove it entirely.
 ---
 ## What it is and is not
 It is a **synthesis and prioritisation instrument** — it makes the consultant's kill-chain judgement explicit, repeatable, and exportable, and it removes the human error of eyeballing the cheapest path. It is deliberately **offline and dependency-free** (Pillar 4, Sovereign Intelligence: the attack graph of a client estate must never leave the consultant's machine for a vendor cloud).
 It is **not** a scanner and not an autonomous agent. It does not discover assets for you — it structures what you discover. The discovery still comes from the tools in the [assessment team guide](../assessment-templates/assessment-team-guide.md) and the [zero-budget discovery](zero-budget-vulnerability-discovery.md) playbooks; the autonomous hours-lane execution lives in [AI-Assisted TVM](ai-assisted-tvm.md). This tool is the bridge between them: it turns raw discovery into a sized, prioritised chain that the rest of the programme acts on.
 ---
 ## Roadmap (build-later)
 The current tool is a self-contained synthesis instrument. Natural extensions, in priority order:
 1. **Import from BloodHound / Purple Knight** — ingest exported attack paths directly as nodes and moves, rather than hand-entry.
 2. **PULSAR / ASTRAL signal overlay** — pull live reachability and config-drift signal so "reachable?" is answered by observation, not assertion (Book I: validate by observation).
 3. **Chain-shortening tracker** — store successive `.json` snapshots and chart kill-chain length over time, making the antifragile feedback loop a number on a dashboard.
 4. **Multi-chain view** — surface the top-N existential paths, not just the cheapest, so secondary chains (the [sample engagement](sample-engagement-mid-market.md) on-prem path) aren't hidden behind the primary.
 ---
 *Specified for [Book VII — Vulnerability Management](../books/06-vulnerability-management.md) and the [Quantum Vulnerability Management](../core/quantum-vulnerability-management.md) framework. The tool: [`tools/kill-chain-assessment.html`](../tools/kill-chain-assessment.html).*
@@ -0,0 +1,251 @@
 # ORION — Technical Proposition
 > *"The kill chain exists before you have access to a single system. It's already drawn — in the org chart, the procurement history, the sector's threat landscape, and the things people will tell you in a room if you ask the right questions. ORION is the instrument for reading that chain on day zero, before a single tool has touched the estate."*
 **Codename:** ORION (the Hunter — it hunts the kill chain). Celestial, consistent with ASTRAL / PULSAR / AURORA. Rename freely.
 **Status:** Technical proposition — pre-build. This document exists to be argued with before any code is written.
 **One line:** ORION is the pre-engagement intake, interview, and threat-intelligence layer that produces the input the [Kill Chain Assessment app](kill-chain-assessment-app.md) (L1) consumes — turning structured human answers and public intelligence into a *hypothesised* attack graph, without ever touching client infrastructure.
 ---
 ## 1. Why this needs to exist
 The L1 [Kill Chain Assessment app](kill-chain-assessment-app.md) is a synthesis instrument: you feed it nodes and attacker moves you've already discovered, and it computes the shortest existential path and sizes the [quanta](../core/quantum-vulnerability-management.md). It assumes you already have findings — BloodHound paths, Entra checks, the [assessment team guide](../assessment-templates/assessment-team-guide.md) output.
 But on **day zero of a new engagement** you have none of that. You may not even have access yet — the contract may not permit infrastructure contact, the change-advisory board hasn't met, the client's legal team is still reviewing the scope. And yet this is exactly the moment the consultant most needs a hypothesis: *where is this company's kill chain likely to run, what should we ask, and what should we look at first when access arrives?*
 Today that reasoning lives entirely in the experienced consultant's head. It is the single least reproducible, least scalable part of the practice — a senior consultant walks in, asks fifteen sharp questions, and forms a mental model of the likely kill chain; a junior consultant asks the obvious questions and misses it. ORION makes that reasoning **explicit, structured, intel-informed, and repeatable** — and it does so in the window before fieldwork is even possible.
 ORION is, deliberately, the "What If" tool of the assessment world (Book I). It produces a *declared* picture — what the client says, what public intel suggests — which is precisely the picture the rest of the engagement exists to validate by observation. Naming that honestly is the whole design (see §7).
 ---
 ## 2. The hard boundary: ORION never touches client infrastructure
 This is the defining constraint and the primary selling point, not a limitation to apologise for.
 ORION works from exactly two input classes:
 1. **What humans tell it** — structured intake and questionnaire responses from the client.
 2. **Passive public intelligence** — sector threat landscape, CISA KEV, vendor advisories, exploited-CVE feeds, public OSINT about the named technology stack. **Passive only**: ORION reads public and threat-intelligence sources. It does *not* perform active external scanning — that is a separate, consented capability (see [Perimeter Scanning Capability](perimeter-scanning-capability.md)) and explicitly out of ORION's scope.
 What this buys:
 - **Zero onboarding friction.** No credentials, no agent, no firewall change, no data-processing agreement for telemetry. ORION can run during the sales conversation, in the pre-contract phase, or in a sector where the client cannot yet grant access.
 - **No incident risk.** A tool that touches nothing breaks nothing and triggers no alerts. It can never be the cause of an outage or a "who ran that scan?" conversation.
 - **Clean legal posture.** The only client data ORION holds is what the client deliberately typed into a questionnaire. That is a categorically simpler privacy and liability position than any tool that ingests infrastructure data.
 The boundary is also the honest limit: because ORION observes nothing, everything it produces is a hypothesis (§7).
 ---
 ## 3. The three-stage workflow
 ### Stage 1 — Intake (minutes)
 A short structured form establishes the engagement's shape. The consultant fills this, usually from the first call:
 - Sector and sub-sector (drives the threat-landscape lookup and the regulatory profile)
 - Size, geography, and regulatory exposure (NIS2 / DORA / GDPR / sector-specific)
 - Technology footprint at a coarse level: M365 (E3/E5/BP), hybrid AD vs cloud-only, major cloud, OT/ICS presence, internet-facing services they'll admit to
 - Business-level crown jewels: "what stops the company operating?" — ERP, payment rails, OT control, the customer database
 - Known history: prior incidents, prior pentest, known pain points
 ### Stage 2 — Generate the tailored questionnaire (the core trick)
 ORION's LLM expands the intake into a **detailed, role-targeted, adaptive questionnaire**, and this is where it earns its keep. The questionnaire is:
 - **Role-segmented** — separate tracks for the identity/AD admin, the M365 admin, the network/OT lead, and the business owner. Each person answers only what they'd know.
 - **Adaptive** — questions branch on prior answers. Hybrid AD declared → the Entra Connect sync-account and DCSync questions appear. OT declared → Purdue-model and remote-vendor-access questions appear. Cloud-only → the questionnaire skips on-prem forest-recovery questions entirely.
 - **Framed against the kill chain, not compliance** — every question maps to a candidate node or edge ("Do any standing Domain Admins log into normal workstations for email?" targets a known privilege-path edge), not to a control checkbox. This is the inversion the whole practice rests on.
 The client fills it via a shared per-engagement link, partially and over time, with their own people answering their own sections.
 ### Stage 3 — Synthesis → hypothesised kill chain → L1 export
 From the responses plus the threat intel, ORION proposes:
 - **Candidate entry points** (internet-facing services, legacy auth, the contractor-access pattern), each with the intel that suggests it.
 - **Candidate crown jewels** (from the business answers).
 - **Hypothesised moves** between them, each with a *mechanism*, a *confidence*, and a *rationale citing its source* ("hybrid AD + unrotated KRBTGT declared → likely Entra-Connect→on-prem DCSync edge").
 - **A prioritised "look here first" list** for when fieldwork begins — what to point BloodHound, the Entra review, and the L1 app at on day one.
 The synthesis exports directly to the **L1 Kill Chain Assessment app's `.json` schema**, so the consultant opens L1 with the hypothesised graph already drawn and spends fieldwork *validating and correcting* it rather than building from a blank canvas. ORION hypothesises; L1 plus fieldwork confirm or kill each hypothesis by observation.
 ---
 ## 4. Threat-intelligence layer
 ORION continuously contextualises the client against the *current* threat environment — the dimension a static questionnaire can't capture and the one that feeds the [quantum](../core/quantum-vulnerability-management.md) sort key's "exploit availability" axis:
 - **CISA KEV and exploited-CVE feeds** — for the client's named technologies, what is being exploited *now*.
 - **Vendor advisories** — current critical advisories for their declared stack (the VPN appliance, the mail gateway, the ERP).
 - **Sector threat landscape** — which actors and ransomware groups are currently targeting their vertical, drawn from public reporting.
 Each intel item carries **provenance** (source, date, URL) because ORION's output is advisory and the consultant must be able to trace and re-verify every claim. Threat intel ages fast; ORION timestamps everything and treats stale intel as a prompt to re-check, never as fact.
 ---
 ## 5. Architecture
 Deliberately mirrors CISO Assistant and the AURORA model so it's familiar to operate and fits the suite.
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │  ORION (Docker Compose, consultant self-hosted)              │
 │                                                              │
 │  ┌────────────┐   ┌──────────────┐   ┌───────────────────┐  │
 │  │  Web UI    │   │  API backend │   │  PostgreSQL       │  │
 │  │ (SvelteKit │◄─►│ (FastAPI or  │◄─►│  engagements,     │  │
 │  │  or React) │   │  Django/DRF) │   │  responses,       │  │
 │  └────────────┘   └──────┬───────┘   │  hypotheses       │  │
 │   client fills           │           └───────────────────┘  │
 │   questionnaire          │                                   │
 │   via shared link        ▼                                   │
 │              ┌──────────────────────┐                        │
 │              │  LLM abstraction     │  pluggable backend     │
 │              │  layer               │──► Ollama (default)    │
 │              └──────────────────────┘──► Azure OpenAI (opt)  │
 │                         │           └──► llm.cqre.net (opt)  │
 │                         ▼                                     │
 │              ┌──────────────────────┐                        │
 │              │ Threat-intel         │  passive fetch only:   │
 │              │ connector module     │──► CISA KEV, advisories│
 │              └──────────────────────┘──► curated OSINT/search│
 │                         │                                     │
 │              ┌──────────┴───────────┐   ┌─────────────────┐  │
 │              │ L1 export adapter    │──►│ kill-chain .json│  │
 │              └──────────────────────┘   └─────────────────┘  │
 │              ┌──────────────────────┐                        │
 │              │ MCP server           │  AURORA / Claude can   │
 │              │ (query ORION)        │  query engagements     │
 │              └──────────────────────┘                        │
 └─────────────────────────────────────────────────────────────┘
            NO connection to client infrastructure
 ```
 Components:
 - **Backend** — FastAPI (Python) or Django REST, matching CISO Assistant's proven stack. Houses the questionnaire engine, synthesis orchestration, and export.
 - **Frontend** — SvelteKit or React. Two surfaces: the consultant console and the client-facing questionnaire (shareable per-engagement link, no client login burden beyond a token).
 - **LLM abstraction layer** — single internal interface, swappable backend. **Default: local Ollama** so sensitive intake data never leaves the box (§6). Optional: Azure OpenAI (EU) or managed `llm.cqre.net`, exactly as ASTRAL/AURORA offer.
 - **Questionnaire engine — questions-as-data** — adopting CISO Assistant's "frameworks as data, not code" principle: questionnaire templates, branching rules, and node/edge mappings live in the database as editable data, so new sector packs and question sets ship without code changes.
 - **Threat-intel connector** — passive fetchers for KEV, advisories, and curated search, each normalised into a provenance-tagged `ThreatIntelItem`.
 - **L1 export adapter** — emits the exact `.json` schema the L1 app imports.
 - **MCP server** — exposes ORION engagement state to AURORA and to AI assistants, consistent with the rest of the suite.
 ### Data model (sketch)
 | Entity | Holds | Notes |
 |--------|-------|-------|
 | `Engagement` | Client, scope, status | Per-engagement isolation boundary |
 | `IntakeProfile` | Stage-1 answers | Drives questionnaire generation |
 | `QuestionnaireTemplate` | Questions, branching rules, node/edge mappings | Questions-as-data; sector packs |
 | `Response` | Client answers, respondent role, timestamp | Sensitive — encrypted at rest |
 | `ThreatIntelItem` | Intel + source + date + URL | Provenance mandatory |
 | `Hypothesis` | Candidate node/edge + confidence + rationale + sources | The advisory output; never a "finding" |
 | `Export` | Generated L1 `.json` snapshots | Versioned, so you can diff intake-time vs post-fieldwork |
 ---
 ## 6. Sovereignty and data handling
 ORION holds something genuinely sensitive: a client's own description of where they are weak. That is a map of the kill chain drawn by the victim. The data posture must be uncompromising and is a direct expression of Pillar 4 (Sovereign Intelligence — never rent your ability to think) and Pillar 1.
 - **Local LLM by default.** Ollama runs in the same Compose stack; intake and responses never leave the consultant's host unless a backend is *explicitly* switched. The default must be the safe one.
 - **Encryption at rest** for `Response` and `Hypothesis` data; per-engagement key isolation.
 - **Retention and deletion.** Each engagement has a retention clock and a hard "right to delete" — when the engagement closes, the client's answers can be destroyed and the destruction evidenced (GDPR-friendly, and the right thing).
 - **No telemetry, no phone-home.** Consistent with the offline ethos of the L1 tool.
 - **Untrusted-content handling.** Threat-intel fetched from the web is untrusted input — treated as data, never as instructions to the LLM (prompt-injection defence, §8).
 ---
 ## 7. The epistemic honesty layer (the most important section)
 ORION's single greatest risk is that its confident, well-written output gets mistaken for fact. The repo's founding principle (Book I) is *validate by observation, never by inspection* — and ORION, by design, observes nothing. So the design must make its own uncertainty impossible to ignore:
 - **Everything ORION emits is a `Hypothesis`, never a `Finding`.** The vocabulary is enforced in the data model and the UI. A finding comes from the [assessment team guide](../assessment-templates/assessment-team-guide.md) fieldwork and lands in the [Findings Backlog](../assessment-templates/findings-backlog.md); a hypothesis comes from ORION and lands in L1 as something *to test*.
 - **Confidence and provenance on every claim.** No hypothesis without a stated confidence and the source(s) — the client answer or the intel item — that produced it.
 - **The "ghost-assessment" trap, named.** Just as a ghost CA policy displays correct config while enforcing nothing (Book I corollary), a client questionnaire can describe a control that has rotted into a ghost. ORION's hypotheses inherit the client's blind spots. The output must say so, loudly, and route every load-bearing claim to observation.
 - **The handoff is explicit.** ORION's deliverable is not "here is your kill chain." It is "here is the kill chain we *expect*, ranked by where to look first — now go and prove or disprove each link." That handoff into L1 and fieldwork is the product, not the hypothesis itself.
 Get this section right and ORION strengthens the practice. Get it wrong and it becomes the most dangerous thing in the toolkit: a confident map of a territory no one checked.
 ---
 ## 8. LLM guardrails
 - **Human-in-the-loop, always.** ORION proposes; the consultant disposes. No hypothesis auto-promotes to a finding, and ORION takes no action on anything.
 - **Prompt-injection defence.** Web/threat-intel content is wrapped and labelled as untrusted data; the system prompt instructs the model to treat fetched content as evidence to summarise, never as commands.
 - **Hallucination control.** Provenance is mandatory; a claim with no traceable source is flagged, not shown as fact. The consultant can click any hypothesis through to its sources.
 - **Quality floor.** Local models are weaker; the proposition should set an expectation that the default Ollama model is adequate for questionnaire generation and basic synthesis, with Azure OpenAI recommended where deeper reasoning materially helps — and the UI should make the active model and its limits visible.
 ---
 ## 9. How it fits the engagement
 | Phase | ORION's role |
 |-------|--------------|
 | Pre-contract / sales | Stage-1 intake during the first conversation; instant sector threat-landscape briefing as a credibility opener |
 | [Brownhat Diagnostic](../assessment-templates/nist-csf-baseline.md) intake | Generate and distribute the tailored questionnaire; collect responses before the on-site half-days |
 | Fieldwork ([assessment team guide](../assessment-templates/assessment-team-guide.md)) | Hand the consultant a hypothesised graph and a "look here first" list; fieldwork validates by observation |
 | L1 mapping | Import ORION's `.json`; correct and confirm; compute the real shortest existential path |
 | Reporting | Diff intake-time hypotheses against confirmed findings — a powerful "what you told us vs what we found" narrative for the client |
 ---
 ## 10. Regulatory alignment (EU)
 | Regulation | Requirement | ORION relevance |
 |------------|-------------|-----------------|
 | **NIS2** Art. 21 | Risk analysis, supply-chain and access governance | Structured intake produces documented evidence of risk-analysis scoping at engagement start |
 | **DORA** | ICT risk identification | The hypothesised kill chain is an ICT-risk-identification artefact (clearly marked as preliminary) |
 | **GDPR** Art. 5/32 | Data minimisation, appropriate measures, accountability | Local-LLM default, encryption, retention/deletion — minimal, sovereign handling of the only PII it holds |
 ---
 ## 11. Phased build (proposed MVP → product)
 1. **Phase 1 — MVP.** Stage-1 intake, LLM questionnaire generation (Ollama), manual-assisted synthesis, L1 `.json` export. No threat intel yet. Proves the core loop.
 2. **Phase 2 — Threat intel.** KEV / advisory / curated-search connectors with provenance; exploit-availability enrichment of hypotheses.
 3. **Phase 3 — Adaptive + integrated.** Full branching questionnaire engine (questions-as-data), MCP server, AURORA integration, sector question packs.
 4. **Phase 4 — Productisation.** Hosted tier, multi-engagement console, RBAC, retention automation.
 ---
 ## 12. Provisional commercial framing
 Positioned like AURORA — self-hosted and hosted tiers — though pricing is a placeholder pending the build decision:
 | Tier | Self-hosted | Hosted (managed) |
 |------|-------------|------------------|
 | Per-consultant / small practice | TBD | TBD |
 | Practice / multi-seat | TBD | TBD |
 Self-hosters bring their own LLM (Ollama / Azure OpenAI); hosted tier includes a managed model. Note the natural bundling: ORION (pre-engagement) → L1 Kill Chain Assessment (synthesis) → ASTRAL/PULSAR/AURORA (the operational layer once access exists).
 ---
 ## 13. What ORION is NOT
 - **Not a scanner and not an agent.** It touches no client system, active-scans nothing, and runs nothing in the client environment.
 - **Not autonomous.** It proposes hypotheses for a consultant; it never acts and never self-promotes a hypothesis to a finding.
 - **Not a replacement for fieldwork or for L1.** It is the layer *before* them — it tells you where to look, it does not tell you what is true.
 - **Not a compliance questionnaire tool.** The questions target the kill chain, not a control checklist; CISO Assistant covers the GRC/framework job and ORION should integrate with it, not duplicate it.
 ---
 ## 14. Open questions for the build decision
 1. **Backend choice** — FastAPI (lighter, our synthesis is bespoke) vs Django/DRF (matches CISO Assistant, more batteries). Leaning FastAPI.
 2. **Client-facing surface** — shared tokenised link (low friction) vs lightweight client login (more control). Leaning tokenised link with per-engagement expiry.
 3. **Where is the OSINT/active line drawn exactly?** Confirm ORION stays strictly passive and that any external scanning is deferred to the consented [Perimeter Scanning Capability](perimeter-scanning-capability.md).
 4. **CISO Assistant integration depth** — loose (export/import) vs deep (shared data model). Loose first.
 5. **Default Ollama model and the quality floor** — which local model is "good enough" for questionnaire generation, and where do we tell consultants to switch to Azure OpenAI.
 6. **Hypothesis accuracy expectations** — how do we measure and communicate that ORION's day-zero map is a starting hypothesis, and track how often it was right once fieldwork closed the loop?
 ---
 *Companion to the [Kill Chain Assessment app](kill-chain-assessment-app.md) (L1), [Book VII — Vulnerability Management](../books/06-vulnerability-management.md), and the [Quantum Vulnerability Management](../core/quantum-vulnerability-management.md) framework. Positioned in the suite alongside [ASTRAL, PULSAR, and AURORA](cqre-product-suite.md).*
@@ -17,6 +17,51 @@ The antifragile answer is a two-layer architecture: **network access** (Tailscal
 ---
 ## When overlay management networks help — and when they don't
 **Enterprises with their own data centres** already have the physical substrate for a proper management network: dedicated VLANs, hardware segmentation, jump boxes. Adding an overlay management network introduces a new Tier 0 component (the coordinator) on top of infrastructure that already solves the problem. The complexity cost outweighs the benefit. Traditional management VLAN segmentation, done properly, is the right answer.
 **SME clients with multi-cloud resources, containers, and DevOps workloads** have a different problem: there is no physical network to segment. Resources are scattered across Azure, AWS, a colo, and maybe on-prem. The management plane does not exist yet — you are building it. An overlay is how you build it, and it is the right answer for this context.
 **The T0/T1 split** — applying the tier model to the overlay itself:
 - **T0 systems** (domain controllers, ADCS, Entra Connect sync server — the identity control plane): use **Nebula**. No coordinator in the runtime path — once certificates are distributed, the overlay functions with zero external dependencies. The Nebula CA is the only Tier 0 component, and it can be kept offline. This means no coordinator to compromise, no external API call, no cloud service availability dependency for reaching your most critical systems.
 - **T1 systems** (member servers, cloud workloads, Kubernetes clusters, multi-cloud management): use **Tailscale** (or Headscale for sovereign requirements). Per-node ACLs, Entra OIDC integration, per-session MFA via key expiry and IdP enforcement. The coordinator trust concern is more acceptable at T1 — a compromised coordinator affects T1 access, not T0.
 **The T0 node count is not scary.** For a 5,000-person organisation, the realistic T0 Nebula population is:
 | Component | Count |
 |-----------|-------|
 | Domain Controllers | 4–8 |
 | Entra Connect / Cloud Sync server | 1–2 |
 | ADCS issuing CA | 1–2 |
 | AD FS servers (if not yet removed) | 0–4 |
 | Cloud admin VMs / PAWs | 5–10 |
 | **Total** | **~15–25 nodes** |
 Certificate management for 15–25 nodes is a documented procedure, not an operational burden. The CA signing ceremony happens a few times a year when a PAW is replaced or an admin leaves. This is tractable.
 ---
 ## The PAW problem and the cloud admin VM
 Physical PAWs are the right principle. They almost never get deployed. Hardware procurement, second device on the desk, behaviour change — the project dies before it starts.
 The **cloud-hosted admin workstation** preserves the essential security properties without the hardware problem:
 - A Windows 365 or Azure Virtual Desktop VM provisioned from a hardened template
 - Used only for privileged tasks (no email, no general browsing)
 - Connected to the Nebula T0 overlay (for DC access) and Tailscale T1 overlay (for server/cloud access)
 - Accessed by the admin from their normal device via browser or RDP client
 - Privileged credentials live in the cloud VM, not on the admin's local device
 - Compromise response: wipe the VM, reprovision from template in 20 minutes
 The security property that matters — privileged credentials do not touch the device used for email and browsing — is preserved. An attacker who compromises the admin's local device gets a browser session to a cloud VM that requires phishing-resistant MFA to reach. They do not get cached credentials, session tokens, or WireGuard keys for the management overlay.
 **When to use a physical PAW instead:** clients with a strong security culture and genuine appetite for the operational overhead, OT/ICS environments where the management workstation may need to be air-gapped, or engagements where the threat model includes a sophisticated attacker who would attempt to compromise the RDP session interactively.
 ---
 ## The Two Layers
 ### Layer 1: Network Access — Tailscale / Headscale + WireGuard
@@ -130,6 +175,30 @@ This catches more clients than it appears. A manufacturing company with 800 empl
 ---
 ### Nebula — T0 Management Overlay
 | Attribute | Detail |
 |-----------|--------|
 | **What it does** | WireGuard-based overlay mesh with no coordinator in the runtime path. Nodes authenticate via pre-distributed certificates signed by a local CA. Lighthouse nodes handle NAT traversal only — they are not in the authentication path. |
 | **Why it is right for T0** | No external runtime dependency. A compromised or unavailable coordinator cannot affect T0 access. The CA (the actual trust anchor) can be kept offline and brought up only for certificate issuance. |
 | **Trade-off vs Tailscale** | No dynamic node management (adding/removing a node requires a CA operation and cert redistribution); no cloud-managed control plane; higher initial setup complexity; certificate revocation requires distributing an updated blocklist |
 | **Why the trade-off is acceptable for T0** | T0 node population is small (15–25 nodes) and stable. Revocation events (lost PAW, departing admin) are rare and known immediately. The operational overhead is a documented ceremony run a few times a year, not a recurring burden. |
 | **Antifragile pillar** | Structural Decoupling, Sovereign Intelligence |
 | **When to deploy** | T0 systems (DCs, sync server, ADCS) in any estate; air-gapped or restricted environments; clients where the management plane must have zero external runtime dependencies |
 **Nebula CA management — the one non-trivial operation:**
 The Nebula CA private key is the trust anchor for the entire T0 overlay. It must be treated accordingly:
 - Air-gapped machine (a dedicated laptop that is never networked, or a hardware security module)
 - Documented signing ceremony: who is authorised to sign a new certificate, what approval is required, what the procedure is
 - Named individuals (minimum two) who know the procedure and can perform it
 - CA key backup: encrypted, stored separately from the signing machine, tested
 - Short certificate lifetimes (90–180 days) so revocation is handled implicitly by non-renewal as much as by explicit blocklist distribution
 This is the same discipline as an offline root CA — because that is functionally what it is.
 ---
 ### Smallstep — Certificate-Based SSH Access
 | Attribute | Detail |
@@ -145,20 +214,34 @@ This catches more clients than it appears. A manufacturing company with 800 empl
 ## The Decision Framework
 ```
-Does the client have legacy VPN sprawl or flat-network vendor access?
+Does the client have their own data centre with physical network infrastructure?
-├── YES → Deploy Layer 1 (network access) first
+├── YES → Traditional management VLAN segmentation + jump box
-│   ├── Wants managed service + commercial support → Tailscale (partnership)
+│          Overlay adds complexity without proportional benefit here
 └── NO / Multi-cloud / Scattered resources → Overlay is the right management plane
 Does the client need a T0 management overlay (DC, ADCS, sync server access)?
 ├── YES → Nebula (no external runtime dependency, CA offline)
 │   └── Admin workstation: cloud admin VM (W365/AVD) or physical PAW, enrolled in Nebula
 │
 Does the client need a T1 overlay (servers, cloud workloads, K8s, DevOps)?
 ├── YES → Layer 1 (network access)
 │   ├── Wants managed service + commercial support → Tailscale + Entra OIDC + key expiry MFA
 │   └── Wants full sovereignty / data residency → Headscale + WireGuard
 │
 Does the client need protocol-aware session recording / JIT / DB access?
 ├── YES → Add Layer 2 (PAM)
 │   ├── < 100 employees AND < $10M revenue → Teleport CE (free, self-hosted)
-│   ├── Larger org / needs support → Teleport Enterprise (commercial)
+│   ├── Larger org / needs support → Teleport Enterprise (commercial, verify current pricing)
-│   └── SSH-only, budget-constrained → Smallstep (certificates only)
+│   └── SSH-only, budget-constrained → Smallstep (certificates only, no session recording)
 │
-Does the client need both layers?
+Typical SME multi-cloud client:
-├── MOST CLIENTS → Tailscale (network) + Teleport CE/Enterprise (PAM)
+├── T0: Nebula + cloud admin VMs
-└── OT/CRITICAL INFRA → Headscale (sovereign network) + Teleport (recorded vendor access)
+├── T1: Tailscale + Entra OIDC
 └── Session recording: Teleport CE if eligible, otherwise accept the gap and compensate with
    cloud VM audit logging and Tailscale connection logs
 OT / Critical infrastructure:
 └── Headscale (sovereign T1) + Nebula (T0 where applicable) + Teleport (vendor session recording)
 ```
 ---
@@ -0,0 +1,15 @@
 # Tools
 Standalone, runnable instruments that support the engagement — as distinct from the markdown frameworks and playbooks elsewhere in the repository.
 | Tool | What it does | How to run |
 |------|--------------|------------|
 | [`kill-chain-assessment.html`](kill-chain-assessment.html) | Maps an unknown estate into an attack graph, computes the shortest existential path (the kill chain), and sizes every node into a remediation quantum. The synthesis instrument for the first act of every engagement. | Open in any browser. Offline, no install, no network. State persists locally; exports to `.json` and `.md`. |
 ## Design constraints for tools in this directory
 - **Offline and sovereign.** Client attack-surface data must never leave the consultant's machine for a vendor cloud (Antifragile Manifest, Pillar 4). Tools here are single-file and dependency-free wherever possible.
 - **Exportable.** Output drops into the engagement deliverables — the [diagnostic report](../assessment-templates/nist-csf-baseline.md) and the [Findings Backlog](../assessment-templates/findings-backlog.md) — not into a proprietary format.
 - **Explicit, not magic.** A tool makes the consultant's judgement repeatable; it does not replace it.
 See the [Kill Chain Assessment App spec](../playbooks/kill-chain-assessment-app.md) for the model behind the first tool.
@@ -0,0 +1,642 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>Kill Chain Assessment — Brownhat / CQRE</title>
 <style>
  :root{
    --bg:#0d1117; --panel:#161b22; --panel2:#1c2330; --line:#30363d; --line2:#3d4654;
    --ink:#e6edf3; --muted:#9aa6b2; --faint:#6e7781;
    --p0:#ff4d4f; --p1:#ff9f0a; --p2:#3fb950; --dark:#a371f7; --entry:#58a6ff; --jewel:#f7c948;
    --accent:#58a6ff; --accent2:#1f6feb;
    --crit:#ff4d4f; --sev:#ff9f0a; --std:#3fb950; --darkq:#a371f7; --house:#6e7781;
  }
  *{box-sizing:border-box}
  body{margin:0;background:var(--bg);color:var(--ink);font:14px/1.5 -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Helvetica,Arial,sans-serif}
  header{padding:16px 22px;border-bottom:1px solid var(--line);display:flex;align-items:center;gap:16px;flex-wrap:wrap;background:linear-gradient(180deg,#11161d,#0d1117)}
  header h1{font-size:18px;margin:0;letter-spacing:.3px}
  header .tag{font-size:11px;color:var(--faint);border:1px solid var(--line);padding:2px 8px;border-radius:20px}
  header .sub{color:var(--muted);font-size:12.5px;margin-left:auto;max-width:520px;text-align:right}
  .wrap{display:grid;grid-template-columns:340px 1fr 360px;gap:0;height:calc(100vh - 59px)}
  .col{overflow-y:auto;padding:16px}
  .col.left{border-right:1px solid var(--line)}
  .col.right{border-left:1px solid var(--line);background:#0b0f14}
  h2{font-size:12px;text-transform:uppercase;letter-spacing:1px;color:var(--muted);margin:4px 0 10px;font-weight:600}
  h2 .hint{text-transform:none;letter-spacing:0;font-weight:400;color:var(--faint);display:block;font-size:11.5px;margin-top:3px}
  .panel{background:var(--panel);border:1px solid var(--line);border-radius:10px;padding:13px;margin-bottom:14px}
  label{display:block;font-size:11.5px;color:var(--muted);margin:9px 0 3px}
  input,select,textarea,button{font:inherit;color:var(--ink)}
  input[type=text],select,textarea{width:100%;background:var(--panel2);border:1px solid var(--line2);border-radius:7px;padding:7px 9px}
  input[type=text]:focus,select:focus,textarea:focus{outline:none;border-color:var(--accent)}
  textarea{resize:vertical;min-height:34px}
  .row{display:flex;gap:8px}
  .row>*{flex:1}
  .chk{display:flex;align-items:center;gap:7px;margin:8px 0;font-size:12.5px;color:var(--ink)}
  .chk input{width:auto}
  button{cursor:pointer;background:var(--panel2);border:1px solid var(--line2);border-radius:7px;padding:8px 12px;transition:.12s}
  button:hover{border-color:var(--accent);color:#fff}
  button.primary{background:var(--accent2);border-color:var(--accent2);color:#fff;font-weight:600}
  button.primary:hover{background:#388bfd}
  button.ghost{background:transparent}
  button.danger:hover{border-color:var(--p0);color:var(--p0)}
  .btnrow{display:flex;gap:8px;flex-wrap:wrap;margin-top:10px}
  .btnrow button{flex:1;min-width:0}
  .pill{display:inline-block;font-size:10px;font-weight:700;letter-spacing:.5px;padding:2px 7px;border-radius:20px;text-transform:uppercase}
  .pill.entry{background:rgba(88,166,255,.16);color:var(--entry);border:1px solid var(--entry)}
  .pill.jewel{background:rgba(247,201,72,.14);color:var(--jewel);border:1px solid var(--jewel)}
  .node-item{background:var(--panel2);border:1px solid var(--line);border-radius:8px;padding:9px 10px;margin-bottom:7px;cursor:pointer}
  .node-item:hover{border-color:var(--accent)}
  .node-item.sel{border-color:var(--accent);box-shadow:0 0 0 1px var(--accent) inset}
  .node-item .nm{font-weight:600;display:flex;justify-content:space-between;align-items:center;gap:6px}
  .node-item .meta{font-size:11px;color:var(--faint);margin-top:3px;display:flex;gap:6px;flex-wrap:wrap}
  .edge-item{font-size:12px;background:var(--panel2);border:1px solid var(--line);border-radius:7px;padding:7px 9px;margin-bottom:6px;display:flex;justify-content:space-between;gap:8px;align-items:flex-start}
  .edge-item .x{cursor:pointer;color:var(--faint);flex-shrink:0}
  .edge-item .x:hover{color:var(--p0)}
  .tabs{display:flex;gap:4px;margin-bottom:12px;border-bottom:1px solid var(--line)}
  .tabs button{border:none;border-bottom:2px solid transparent;border-radius:0;background:none;color:var(--muted);padding:8px 12px}
  .tabs button.on{color:#fff;border-bottom-color:var(--accent)}
  svg{width:100%;display:block}
  .empty{color:var(--faint);font-size:12.5px;text-align:center;padding:30px 10px;border:1px dashed var(--line2);border-radius:10px}
  .kc-box{background:var(--panel);border:1px solid var(--line);border-radius:10px;padding:14px;margin-bottom:14px}
  .kc-step{display:flex;align-items:center;gap:10px;padding:7px 0}
  .kc-arrow{color:var(--p0);font-size:18px;text-align:center;margin:-2px 0}
  .kc-node{flex:1;background:var(--panel2);border:1px solid var(--line2);border-left:3px solid var(--p0);border-radius:6px;padding:7px 10px}
  .kc-node .n{font-weight:600;font-size:13px}
  .kc-node .m{font-size:11px;color:var(--muted)}
  .kc-mech{font-size:11px;color:var(--faint);font-style:italic;padding-left:14px}
  .stat{display:flex;justify-content:space-between;padding:5px 0;border-bottom:1px solid var(--line);font-size:13px}
  .stat:last-child{border:none}
  .stat b{font-variant-numeric:tabular-nums}
  .q{border-radius:8px;border:1px solid var(--line);padding:10px 12px;margin-bottom:9px;background:var(--panel)}
  .q .qh{display:flex;justify-content:space-between;align-items:center;font-weight:700;font-size:12px;letter-spacing:.5px;text-transform:uppercase}
  .q.crit{border-left:4px solid var(--crit)} .q.crit .qh{color:var(--crit)}
  .q.sev{border-left:4px solid var(--sev)}  .q.sev .qh{color:var(--sev)}
  .q.std{border-left:4px solid var(--std)}  .q.std .qh{color:var(--std)}
  .q.darkq{border-left:4px solid var(--darkq)} .q.darkq .qh{color:var(--darkq)}
  .q .ql{font-size:12.5px;margin-top:7px}
  .q .qi{padding:4px 0;border-top:1px solid var(--line);margin-top:5px}
  .q .qi:first-of-type{border:none}
  .q .qi .qn{font-weight:600}
  .q .qi .qd{font-size:11px;color:var(--muted)}
  .q .budget{font-size:10.5px;color:var(--faint);font-weight:400;text-transform:none;letter-spacing:0}
  .discovery h3{font-size:12.5px;margin:12px 0 5px;color:var(--accent)}
  .discovery ul{margin:0 0 6px;padding-left:18px;color:var(--muted);font-size:12px}
  .discovery li{margin-bottom:3px}
  .discovery code{background:var(--panel2);border:1px solid var(--line);border-radius:4px;padding:1px 5px;color:#e6edf3;font-size:11px}
  .note{font-size:11.5px;color:var(--faint);margin-top:6px}
  .legend{display:flex;gap:12px;flex-wrap:wrap;font-size:11px;color:var(--muted);margin-bottom:8px}
  .legend span{display:flex;align-items:center;gap:5px}
  .dot{width:10px;height:10px;border-radius:50%}
  .topbtns{display:flex;gap:8px}
  .file-in{display:none}
  ::-webkit-scrollbar{width:10px;height:10px}
  ::-webkit-scrollbar-thumb{background:#222b36;border-radius:6px}
  ::-webkit-scrollbar-track{background:transparent}
  .muted{color:var(--muted)} .small{font-size:11.5px}
 </style>
 </head>
 <body>
 <header>
  <h1>⛓ Kill Chain Assessment</h1>
  <span class="tag">Brownhat · CQRE</span>
  <div class="topbtns">
    <button class="ghost" onclick="loadSample()">Load sample</button>
    <button class="ghost" onclick="exportJSON()">Save .json</button>
    <button class="ghost" onclick="document.getElementById('imp').click()">Open .json</button>
    <button class="primary" onclick="exportMD()">Export report .md</button>
    <input type="file" id="imp" class="file-in" accept=".json" onchange="importJSON(event)">
  </div>
  <div class="sub">Map unknown territory into nodes and attacker moves. The tool finds the shortest path from a foothold to an existential asset — that path <b>is</b> the kill chain — and sizes each node into a remediation quantum.</div>
 </header>
 <div class="wrap">
  <!-- LEFT: capture -->
  <div class="col left">
    <div class="tabs">
      <button id="t-node" class="on" onclick="tab('node')">Nodes</button>
      <button id="t-edge" onclick="tab('edge')">Moves</button>
      <button id="t-disc" onclick="tab('disc')">Discovery</button>
    </div>
    <!-- NODE form -->
    <div id="pane-node">
      <div class="panel">
        <h2>Add / edit node<span class="hint">An asset, foothold, identity, or system in the estate.</span></h2>
        <label>Name</label>
        <input type="text" id="n-name" placeholder="e.g. Entra ID Connect sync server">
        <div class="row">
          <div>
            <label>Layer</label>
            <select id="n-type">
              <option value="entry">Entry / exposure</option>
              <option value="identity">Identity</option>
              <option value="privilege">Privilege</option>
              <option value="device">Device / endpoint</option>
              <option value="data">Data / collaboration</option>
              <option value="infra">Infrastructure / OT</option>
              <option value="recovery">Recovery / backup</option>
            </select>
          </div>
          <div>
            <label>Tier</label>
            <select id="n-tier">
              <option value="">— unknown —</option>
              <option value="T0">T0 (control plane)</option>
              <option value="T1">T1 (servers/apps)</option>
              <option value="T2">T2 (workstations)</option>
            </select>
          </div>
        </div>
        <div class="chk"><input type="checkbox" id="n-entry"><label style="margin:0;color:var(--entry)">Adversary entry point (internet-reachable / unauth foothold)</label></div>
        <div class="chk"><input type="checkbox" id="n-jewel"><label style="margin:0;color:var(--jewel)">Crown jewel (existential — org cannot operate if lost)</label></div>
        <div class="row">
          <div>
            <label>Reachable by adversary?</label>
            <select id="n-reach"><option value="unknown">Unknown</option><option value="yes">Yes</option><option value="no">No</option></select>
          </div>
          <div>
            <label>Exploit / path available?</label>
            <select id="n-expl"><option value="unknown">Unknown</option><option value="yes">Yes</option><option value="no">No</option></select>
          </div>
        </div>
        <div class="chk"><input type="checkbox" id="n-comp"><label style="margin:0">Compensating control already in front of it (EDR, WAF, segmentation)</label></div>
        <label>Finding / note (optional)</label>
        <textarea id="n-note" placeholder="What's wrong here, evidence, CVE…"></textarea>
        <div class="btnrow">
          <button class="primary" onclick="saveNode()">Save node</button>
          <button class="ghost" onclick="clearNodeForm()">Clear</button>
        </div>
      </div>
      <h2>Nodes <span id="n-count" class="muted small"></span></h2>
      <div id="node-list"></div>
    </div>
    <!-- EDGE form -->
    <div id="pane-edge" style="display:none">
      <div class="panel">
        <h2>Add attacker move<span class="hint">A directed step: "from here, an attacker can reach there."</span></h2>
        <label>From</label>
        <select id="e-from"></select>
        <label>To</label>
        <select id="e-to"></select>
        <label>Mechanism (how)</label>
        <input type="text" id="e-mech" placeholder="e.g. DCSync via sync-account rights">
        <label>Adversary effort: <span id="e-wlabel">3 — moderate</span></label>
        <input type="range" id="e-weight" min="1" max="5" value="3" style="width:100%" oninput="document.getElementById('e-wlabel').textContent=effortLabel(this.value)">
        <div class="note">Lower effort = easier for the attacker. The kill chain is the <i>lowest-effort</i> path to a crown jewel.</div>
        <div class="btnrow"><button class="primary" onclick="saveEdge()">Add move</button></div>
      </div>
      <h2>Moves <span id="e-count" class="muted small"></span></h2>
      <div id="edge-list"></div>
    </div>
    <!-- DISCOVERY -->
    <div id="pane-disc" style="display:none">
      <div class="panel discovery">
        <h2>Discovering the chain in unknown territory<span class="hint">What to ask and run to surface the edges you can't see yet. Each answer becomes a node or a move.</span></h2>
        <h3>1 · Find the entry points (reachability)</h3>
        <ul>
          <li>What does the internet see? External scan / Shodan / attack-surface mapping → every internet-facing service is a candidate entry node.</li>
          <li>Internet-facing VPN, RDP, mail, web apps, appliances — firmware current? MFA enforced?</li>
          <li>Legacy auth still enabled? (bypasses MFA — a silent entry edge)</li>
        </ul>
        <h3>2 · Find the identity bridges (Book II)</h3>
        <ul>
          <li><code>Entra Connect sync account</code> — does it hold DCSync rights on-prem? That's a cloud→on-prem edge.</li>
          <li>Federation / PTA / PHS path, writeback, seamless SSO — map the bridge.</li>
        </ul>
        <h3>3 · Find privilege paths (Book III)</h3>
        <ul>
          <li>BloodHound: <code>shortestPath</code> to Domain Admins from non-admins — every path is a chain of edges.</li>
          <li>Kerberoastable / AS-REP-roastable high-priv accounts; KRBTGT last-set date.</li>
          <li>App registrations with <code>RoleManagement.ReadWrite.Directory</code>, <code>Mail.ReadWrite</code> — OAuth consent edges.</li>
        </ul>
        <h3>4 · Find the crown jewels (existential nodes)</h3>
        <ul>
          <li>Ask the business, not IT: "what stops the company operating?" ERP, payment rails, OT control, the customer DB.</li>
          <li>Backups & recovery — are they reachable from the estate they protect? If yes, that's an edge into your lifeboat.</li>
        </ul>
        <h3>5 · Map blast radius (the edges between)</h3>
        <ul>
          <li>Flat network? NTLM relay, lateral movement → dense edges, short chains.</li>
          <li>Segmentation, least privilege, T0 isolation → sparse edges, long chains. Note where they're <i>missing</i>.</li>
        </ul>
        <p class="note">Anything you can't characterise (reachable? unknown) becomes a <span style="color:var(--darkq)">dark quantum</span> — capture the node anyway and mark reachability/exploit "unknown". An uncharacterised asset is the dangerous kind.</p>
      </div>
    </div>
  </div>
  <!-- CENTER: graph + chain -->
  <div class="col center">
    <h2>Attack graph &amp; kill chain</h2>
    <div class="legend">
      <span><span class="dot" style="background:var(--entry)"></span>entry</span>
      <span><span class="dot" style="background:var(--jewel)"></span>crown jewel</span>
      <span><span class="dot" style="background:var(--p0)"></span>on shortest chain (P0)</span>
      <span><span class="dot" style="background:var(--p1)"></span>on a chain (P1)</span>
      <span><span class="dot" style="background:var(--p2)"></span>off-chain (P2)</span>
    </div>
    <div class="panel" style="padding:6px"><div id="graph"></div></div>
    <div id="chain-out"></div>
  </div>
  <!-- RIGHT: results -->
  <div class="col right">
    <h2>Assessment</h2>
    <div class="panel" id="summary"></div>
    <h2>Remediation quanta<span class="hint">Sized by time-to-existential-impact, not CVSS.</span></h2>
    <div id="quanta"></div>
  </div>
 </div>
 <script>
 /* ---------------- state ---------------- */
 let nodes = [];   // {id,name,type,tier,entry,jewel,reach,expl,comp,note}
 let edges = [];   // {id,from,to,mech,w}
 let editingId = null;
 let uid = () => 'n'+Math.random().toString(36).slice(2,8);
 const STORE='brownhat-killchain-v1';
 function persist(){ try{localStorage.setItem(STORE,JSON.stringify({nodes,edges}));}catch(e){} }
 function restore(){ try{const s=JSON.parse(localStorage.getItem(STORE));if(s&&s.nodes){nodes=s.nodes;edges=s.edges||[];}}catch(e){} }
 function effortLabel(v){return {1:'1 — trivial',2:'2 — easy',3:'3 — moderate',4:'4 — hard',5:'5 — very hard'}[v];}
 /* ---------------- tabs ---------------- */
 function tab(t){
  ['node','edge','disc'].forEach(x=>{
    document.getElementById('pane-'+x).style.display = x===t?'block':'none';
    document.getElementById('t-'+x).classList.toggle('on',x===t);
  });
  if(t==='edge') refreshEdgeSelects();
 }
 /* ---------------- node CRUD ---------------- */
 function saveNode(){
  const name=document.getElementById('n-name').value.trim();
  if(!name){alert('Name the node first.');return;}
  const data={
    name,
    type:document.getElementById('n-type').value,
    tier:document.getElementById('n-tier').value,
    entry:document.getElementById('n-entry').checked,
    jewel:document.getElementById('n-jewel').checked,
    reach:document.getElementById('n-reach').value,
    expl:document.getElementById('n-expl').value,
    comp:document.getElementById('n-comp').checked,
    note:document.getElementById('n-note').value.trim()
  };
  if(editingId){ Object.assign(nodes.find(n=>n.id===editingId),data); }
  else { nodes.push(Object.assign({id:uid()},data)); }
  clearNodeForm(); render();
 }
 function editNode(id){
  const n=nodes.find(x=>x.id===id); if(!n)return;
  editingId=id;
  document.getElementById('n-name').value=n.name;
  document.getElementById('n-type').value=n.type;
  document.getElementById('n-tier').value=n.tier||'';
  document.getElementById('n-entry').checked=n.entry;
  document.getElementById('n-jewel').checked=n.jewel;
  document.getElementById('n-reach').value=n.reach;
  document.getElementById('n-expl').value=n.expl;
  document.getElementById('n-comp').checked=n.comp;
  document.getElementById('n-note').value=n.note||'';
  tab('node'); window.scrollTo(0,0);
 }
 function delNode(id){
  if(!confirm('Delete this node and its moves?'))return;
  nodes=nodes.filter(n=>n.id!==id);
  edges=edges.filter(e=>e.from!==id&&e.to!==id);
  if(editingId===id)clearNodeForm();
  render();
 }
 function clearNodeForm(){
  editingId=null;
  ['n-name','n-note'].forEach(i=>document.getElementById(i).value='');
  document.getElementById('n-type').value='entry';
  document.getElementById('n-tier').value='';
  ['n-entry','n-jewel','n-comp'].forEach(i=>document.getElementById(i).checked=false);
  document.getElementById('n-reach').value='unknown';
  document.getElementById('n-expl').value='unknown';
 }
 /* ---------------- edge CRUD ---------------- */
 function refreshEdgeSelects(){
  const opts=nodes.map(n=>`<option value="${n.id}">${esc(n.name)}</option>`).join('');
  document.getElementById('e-from').innerHTML=opts;
  document.getElementById('e-to').innerHTML=opts;
 }
 function saveEdge(){
  const from=document.getElementById('e-from').value, to=document.getElementById('e-to').value;
  if(!from||!to){alert('Add at least two nodes first.');return;}
  if(from===to){alert('A move must go between two different nodes.');return;}
  edges.push({id:uid(),from,to,mech:document.getElementById('e-mech').value.trim(),w:+document.getElementById('e-weight').value});
  document.getElementById('e-mech').value='';
  render();
 }
 function delEdge(id){ edges=edges.filter(e=>e.id!==id); render(); }
 /* ---------------- analysis: Dijkstra shortest existential path ---------------- */
 function analyse(){
  const entryIds=nodes.filter(n=>n.entry).map(n=>n.id);
  const jewelIds=new Set(nodes.filter(n=>n.jewel).map(n=>n.id));
  const adj={}; nodes.forEach(n=>adj[n.id]=[]);
  edges.forEach(e=>{ if(adj[e.from]) adj[e.from].push(e); });
  // multi-source Dijkstra from all entry points
  const dist={}, prev={}, prevEdge={};
  nodes.forEach(n=>dist[n.id]=Infinity);
  const pq=[];
  entryIds.forEach(id=>{dist[id]=0; pq.push([0,id]);});
  while(pq.length){
    pq.sort((a,b)=>a[0]-b[0]);
    const [d,u]=pq.shift();
    if(d>dist[u])continue;
    (adj[u]||[]).forEach(e=>{
      const nd=d+e.w;
      if(nd<dist[e.to]){dist[e.to]=nd;prev[e.to]=u;prevEdge[e.to]=e;pq.push([nd,e.to]);}
    });
  }
  // best jewel = reachable jewel with min dist
  let best=null;
  jewelIds.forEach(j=>{ if(dist[j]<Infinity && (!best||dist[j]<dist[best])) best=j; });
  // reconstruct shortest chain
  let chain=[],chainEdges=[];
  if(best!=null){
    let cur=best;
    while(cur!=null){ chain.unshift(cur); if(prevEdge[cur]){chainEdges.unshift(prevEdge[cur]);cur=prev[cur];} else cur=null; }
  }
  const onShortest=new Set(chain);
  // nodes on ANY existential path: reachable from entry AND can reach a jewel
  const reachFromEntry=new Set();
  (function(){const st=[...entryIds];entryIds.forEach(i=>reachFromEntry.add(i));
    while(st.length){const u=st.pop();(adj[u]||[]).forEach(e=>{if(!reachFromEntry.has(e.to)){reachFromEntry.add(e.to);st.push(e.to);}});}})();
  // reverse reachability to a jewel
  const radj={}; nodes.forEach(n=>radj[n.id]=[]); edges.forEach(e=>{if(radj[e.to])radj[e.to].push(e.from);});
  const canReachJewel=new Set();
  (function(){const st=[...jewelIds];jewelIds.forEach(i=>canReachJewel.add(i));
    while(st.length){const u=st.pop();(radj[u]||[]).forEach(f=>{if(!canReachJewel.has(f)){canReachJewel.add(f);st.push(f);}});}})();
  const onAnyChain=new Set(nodes.filter(n=>reachFromEntry.has(n.id)&&canReachJewel.has(n.id)).map(n=>n.id));
  return {chain,chainEdges,onShortest,onAnyChain,dist,best,entryIds,jewelIds,reachable:reachFromEntry};
 }
 /* priority + quantum per node */
 function priority(n,a){
  if(a.onShortest.has(n.id))return 'P0';
  if(a.onAnyChain.has(n.id))return 'P1';
  return 'P2';
 }
 function quantum(n,a){
  const onChain = a.onShortest.has(n.id)||a.onAnyChain.has(n.id);
  if(!onChain) return 'house';
  if(n.reach==='unknown'||n.expl==='unknown') return 'dark';
  if(a.onShortest.has(n.id) && n.reach==='yes' && n.expl==='yes' && !n.comp) return 'crit';
  if(n.reach==='yes' || n.expl==='yes') return 'sev';
  return 'std';
 }
 const QMETA={
  crit:{label:'Critical quantum',budget:'hours · compensating control, not the patch',cls:'crit'},
  sev:{label:'Severe quantum',budget:'days · batched into one change window',cls:'sev'},
  std:{label:'Standard quantum',budget:'sprint · drained in finishable batches',cls:'std'},
  dark:{label:'Dark quantum',budget:'unsized · route to discovery',cls:'darkq'},
  house:{label:'Housekeeping',budget:'off every kill chain — not urgent',cls:'std'}
 };
 /* ---------------- render ---------------- */
 function esc(s){return (s||'').replace(/[&<>"]/g,c=>({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;'}[c]));}
 const TYPELBL={entry:'Entry',identity:'Identity',privilege:'Privilege',device:'Device',data:'Data',infra:'Infra/OT',recovery:'Recovery'};
 function render(){
  persist();
  renderNodeList(); renderEdgeList(); refreshEdgeSelects();
  const a = analyse();
  renderGraph(a); renderChain(a); renderSummary(a); renderQuanta(a);
 }
 function renderNodeList(){
  document.getElementById('n-count').textContent = nodes.length?`(${nodes.length})`:'';
  const el=document.getElementById('node-list');
  if(!nodes.length){el.innerHTML='<div class="empty">No nodes yet. Add the footholds and assets you find — or “Load sample”.</div>';return;}
  const a=analyse();
  el.innerHTML=nodes.map(n=>{
    const p=priority(n,a);
    const pc=p==='P0'?'var(--p0)':p==='P1'?'var(--p1)':'var(--p2)';
    return `<div class="node-item ${editingId===n.id?'sel':''}" onclick="editNode('${n.id}')">
      <div class="nm"><span>${esc(n.name)}</span>
        <span style="display:flex;gap:5px;align-items:center">
          ${n.entry?'<span class="pill entry">entry</span>':''}
          ${n.jewel?'<span class="pill jewel">jewel</span>':''}
          <span style="color:${pc};font-weight:700;font-size:11px">${(a.onShortest.has(n.id)||a.onAnyChain.has(n.id))?p:'—'}</span>
          <span class="x" onclick="event.stopPropagation();delNode('${n.id}')" style="cursor:pointer;color:var(--faint)">✕</span>
        </span>
      </div>
      <div class="meta"><span>${TYPELBL[n.type]||n.type}</span>${n.tier?`<span>· ${n.tier}</span>`:''}
        <span>· reach:${n.reach}</span><span>· exploit:${n.expl}</span>${n.comp?'<span>· compensated</span>':''}</div>
    </div>`;
  }).join('');
 }
 function renderEdgeList(){
  document.getElementById('e-count').textContent = edges.length?`(${edges.length})`:'';
  const el=document.getElementById('edge-list');
  if(!edges.length){el.innerHTML='<div class="empty">No moves yet. A move is one attacker step from one node to another.</div>';return;}
  const nm=id=>{const n=nodes.find(x=>x.id===id);return n?esc(n.name):'?';};
  el.innerHTML=edges.map(e=>`<div class="edge-item">
    <div><b>${nm(e.from)}</b> → <b>${nm(e.to)}</b><br>
      <span class="muted small">${esc(e.mech)||'(mechanism unspecified)'} · effort ${e.w}</span></div>
    <span class="x" onclick="delEdge('${e.id}')">✕</span></div>`).join('');
 }
 function renderGraph(a){
  const g=document.getElementById('graph');
  if(!nodes.length){g.innerHTML='<div class="empty" style="margin:10px">The attack graph renders here.</div>';return;}
  // simple layered layout by distance-from-entry (BFS depth), entries left → jewels right
  const depth={}; nodes.forEach(n=>depth[n.id]=n.entry?0:null);
  const adj={};nodes.forEach(n=>adj[n.id]=[]);edges.forEach(e=>{if(adj[e.from])adj[e.from].push(e.to);});
  let q=nodes.filter(n=>n.entry).map(n=>n.id),guard=0;
  while(q.length&&guard++<999){const u=q.shift();(adj[u]||[]).forEach(v=>{if(depth[v]==null||depth[v]>depth[u]+1){depth[v]=depth[u]+1;q.push(v);}});}
  let maxd=0;nodes.forEach(n=>{if(depth[n.id]==null)depth[n.id]=999;maxd=Math.max(maxd,depth[n.id]===999?0:depth[n.id]);});
  // orphans (no depth) put in a trailing column
  const cols={};nodes.forEach(n=>{const d=depth[n.id]===999?maxd+1:depth[n.id];(cols[d]=cols[d]||[]).push(n);});
  const colKeys=Object.keys(cols).map(Number).sort((x,y)=>x-y);
  const W=Math.max(640,colKeys.length*180), colW=W/colKeys.length;
  let maxRows=0;colKeys.forEach(k=>maxRows=Math.max(maxRows,cols[k].length));
  const H=Math.max(220,maxRows*72+40);
  const pos={};
  colKeys.forEach((k,ci)=>{cols[k].forEach((n,ri)=>{const rows=cols[k].length;
    pos[n.id]={x:colW*ci+colW/2,y:H/(rows+1)*(ri+1)};});});
  const col=n=>{if(a.onShortest.has(n.id))return'var(--p0)';if(a.onAnyChain.has(n.id))return'var(--p1)';if(n.jewel)return'var(--jewel)';if(n.entry)return'var(--entry)';return'#3fb95066';};
  const onChainEdge=new Set(a.chainEdges.map(e=>e.id));
  let svg=`<svg viewBox="0 0 ${W} ${H}" preserveAspectRatio="xMidYMid meet">
    <defs><marker id="arr" markerWidth="9" markerHeight="9" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6 Z" fill="#5b6675"/></marker>
    <marker id="arrR" markerWidth="10" markerHeight="10" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6 Z" fill="var(--p0)"/></marker></defs>`;
  edges.forEach(e=>{const a1=pos[e.from],b=pos[e.to];if(!a1||!b)return;
    const hot=onChainEdge.has(e.id);
    const mx=(a1.x+b.x)/2,my=(a1.y+b.y)/2-18;
    svg+=`<path d="M${a1.x},${a1.y} Q${mx},${my} ${b.x},${b.y}" fill="none" stroke="${hot?'var(--p0)':'#39414d'}" stroke-width="${hot?2.4:1.2}" marker-end="url(#${hot?'arrR':'arr'})" opacity="${hot?1:.7}"/>`;
  });
  nodes.forEach(n=>{const p=pos[n.id];if(!p)return;const c=col(n);
    const r=n.jewel||n.entry?20:16;
    svg+=`<g>
      <circle cx="${p.x}" cy="${p.y}" r="${r}" fill="${c}" fill-opacity="${a.onShortest.has(n.id)?0.95:0.18}" stroke="${c}" stroke-width="2"/>
      ${n.jewel?`<text x="${p.x}" y="${p.y+4}" text-anchor="middle" font-size="14">★</text>`:''}
      ${n.entry?`<text x="${p.x}" y="${p.y+4}" text-anchor="middle" font-size="12">▶</text>`:''}
      <text x="${p.x}" y="${p.y+r+13}" text-anchor="middle" font-size="11" fill="#c9d4df">${esc(n.name.length>22?n.name.slice(0,21)+'…':n.name)}</text>
    </g>`;});
  svg+='</svg>';
  g.innerHTML=svg;
 }
 function renderChain(a){
  const el=document.getElementById('chain-out');
  if(!a.entryIds.length||!a.jewelIds.size){
    el.innerHTML=`<div class="kc-box"><b>No kill chain yet.</b><div class="note">Mark at least one node as an <span style="color:var(--entry)">entry point</span> and one as a <span style="color:var(--jewel)">crown jewel</span>, then connect them with moves.</div></div>`;return;}
  if(!a.chain.length){
    el.innerHTML=`<div class="kc-box"><b style="color:var(--p2)">No path found from any entry point to a crown jewel.</b><div class="note">Either the estate is genuinely segmented here (good — note it), or you haven't mapped the connecting moves yet. In unknown territory, assume the latter until proven.</div></div>`;return;}
  const nm=id=>nodes.find(n=>n.id===id);
  let html=`<div class="kc-box"><h2 style="color:var(--p0);margin-top:0">⛓ The kill chain<span class="hint">Lowest-effort path from foothold to existential impact. Total adversary effort: ${a.dist[a.best]}.</span></h2>`;
  a.chain.forEach((id,i)=>{
    const n=nm(id);
    html+=`<div class="kc-step"><div class="kc-node">
      <div class="n">${esc(n.name)} ${n.entry?'<span class="pill entry">entry</span>':''} ${n.jewel?'<span class="pill jewel">jewel</span>':''}</div>
      <div class="m">${TYPELBL[n.type]||n.type}${n.tier?' · '+n.tier:''}${n.note?' · '+esc(n.note):''}</div>
    </div></div>`;
    if(i<a.chainEdges.length){const e=a.chainEdges[i];
      html+=`<div class="kc-arrow">↓</div><div class="kc-mech">${esc(e.mech)||'move'} · effort ${e.w}</div>`;}
  });
  html+=`<div class="note" style="margin-top:10px">Every node on this path is a <b style="color:var(--p0)">P0</b>. Fix the chain first — break any single link and the existential path is severed. After the incident, ask: did this chain get <i>shorter</i>?</div></div>`;
  el.innerHTML=html;
 }
 function renderSummary(a){
  const counts={P0:0,P1:0,P2:0};
  nodes.forEach(n=>{counts[priority(n,a)]++;});
  const qc={crit:0,sev:0,std:0,dark:0,house:0};
  nodes.forEach(n=>qc[quantum(n,a)]++);
  document.getElementById('summary').innerHTML=`
    <div class="stat"><span>Nodes mapped</span><b>${nodes.length}</b></div>
    <div class="stat"><span>Attacker moves</span><b>${edges.length}</b></div>
    <div class="stat"><span>Entry points</span><b>${a.entryIds.length}</b></div>
    <div class="stat"><span>Crown jewels</span><b>${a.jewelIds.size}</b></div>
    <div class="stat"><span style="color:var(--p0)">Kill-chain length</span><b style="color:var(--p0)">${a.chain.length||'—'}</b></div>
    <div class="stat"><span style="color:var(--p0)">P0 nodes (on shortest chain)</span><b style="color:var(--p0)">${counts.P0}</b></div>
    <div class="stat"><span style="color:var(--p1)">P1 nodes (on a chain)</span><b style="color:var(--p1)">${counts.P1}</b></div>
    <div class="stat"><span style="color:var(--darkq)">Dark quanta (unsized)</span><b style="color:var(--darkq)">${qc.dark}</b></div>`;
 }
 function renderQuanta(a){
  const buckets={crit:[],sev:[],std:[],dark:[]};
  nodes.forEach(n=>{const q=quantum(n,a);if(buckets[q])buckets[q].push(n);});
  const order=['crit','sev','std','dark'];
  let html='';
  order.forEach(k=>{
    const list=buckets[k];if(!list.length)return;
    const m=QMETA[k];
    html+=`<div class="q ${m.cls}"><div class="qh"><span>${m.label}</span><span class="budget">${m.budget}</span></div>`;
    list.forEach(n=>{
      const action = k==='crit'?'Sever reachability / compensating control now'
        : k==='sev'?'Remediate in next change window, verify enforcement'
        : k==='std'?'Batch into sprint; this is where patch velocity fits'
        : 'Characterise: establish reachability & exploitability';
      html+=`<div class="qi"><div class="qn">${esc(n.name)}</div><div class="qd">${action}${n.note?' — '+esc(n.note):''}</div></div>`;
    });
    html+='</div>';
  });
  if(!html) html='<div class="empty">Quanta appear once nodes sit on a kill chain. Map entries, jewels, and the moves between.</div>';
  document.getElementById('quanta').innerHTML=html;
 }
 /* ---------------- import / export ---------------- */
 function exportJSON(){
  dl('kill-chain-assessment.json', JSON.stringify({nodes,edges,exported:new Date().toISOString()},null,2));
 }
 function importJSON(ev){
  const f=ev.target.files[0];if(!f)return;
  const r=new FileReader();
  r.onload=()=>{try{const s=JSON.parse(r.result);nodes=s.nodes||[];edges=s.edges||[];clearNodeForm();render();}catch(e){alert('Could not read that file.');}};
  r.readAsText(f); ev.target.value='';
 }
 function exportMD(){
  const a=analyse();const nm=id=>{const n=nodes.find(x=>x.id===id);return n?n.name:'?';};
  let md=`# Kill Chain Assessment\n\n_Generated ${new Date().toLocaleString()} · Brownhat / CQRE_\n\n`;
  md+=`## Summary\n\n- Nodes mapped: ${nodes.length}\n- Attacker moves: ${edges.length}\n- Entry points: ${a.entryIds.length}\n- Crown jewels: ${a.jewelIds.size}\n- Kill-chain length: ${a.chain.length||'—'}\n\n`;
  if(a.chain.length){
    md+=`## The kill chain (shortest existential path)\n\nLowest-effort path from foothold to existential impact (total adversary effort ${a.dist[a.best]}):\n\n\`\`\`\n`;
    a.chain.forEach((id,i)=>{md+=`${nm(id)}`;if(i<a.chainEdges.length)md+=`\n    → [${a.chainEdges[i].mech||'move'} · effort ${a.chainEdges[i].w}]\n`;});
    md+=`\n\`\`\`\n\nEvery node on this path is a **P0**. Break any single link to sever the existential path.\n\n`;
  } else {
    md+=`## The kill chain\n\nNo path from an entry point to a crown jewel was mapped. Either the estate is segmented here, or the connecting moves are not yet discovered.\n\n`;
  }
  // quanta
  const buckets={crit:[],sev:[],std:[],dark:[]};nodes.forEach(n=>{const q=quantum(n,a);if(buckets[q])buckets[q].push(n);});
  md+=`## Remediation quanta\n\n`;
  [['crit','Critical quantum — hours (compensating control, not the patch)'],
   ['sev','Severe quantum — days (one change window)'],
   ['std','Standard quantum — sprint (patch velocity fits here)'],
   ['dark','Dark quantum — unsized (route to discovery)']].forEach(([k,t])=>{
     if(!buckets[k].length)return;
     md+=`### ${t}\n\n`;
     buckets[k].forEach(n=>{md+=`- **${n.name}**${n.tier?` (${n.tier})`:''}${n.note?` — ${n.note}`:''} _(reach:${n.reach}, exploit:${n.expl}${n.comp?', compensated':''})_\n`;});
     md+=`\n`;
   });
  // findings table
  md+=`## All nodes by priority\n\n| Node | Layer | Tier | Priority | Quantum | Reach | Exploit |\n|---|---|---|---|---|---|---|\n`;
  const pri=n=>priority(n,a);
  nodes.slice().sort((x,y)=>({P0:0,P1:1,P2:2}[pri(x)]-{P0:0,P1:1,P2:2}[pri(y)])).forEach(n=>{
    md+=`| ${n.name} | ${TYPELBL[n.type]||n.type} | ${n.tier||'—'} | ${(a.onShortest.has(n.id)||a.onAnyChain.has(n.id))?pri(n):'off-chain'} | ${QMETA[quantum(n,a)].label} | ${n.reach} | ${n.expl} |\n`;
  });
  md+=`\n---\n\n_See Book VII — Vulnerability Management and the Quantum Vulnerability Management framework for how to size and drain these quanta._\n`;
  dl('kill-chain-assessment.md', md);
 }
 function dl(name,content){
  const b=new Blob([content],{type:'text/plain'});const u=URL.createObjectURL(b);
  const a=document.createElement('a');a.href=u;a.download=name;a.click();URL.revokeObjectURL(u);
 }
 /* ---------------- sample (repo: mid-market engagement) ---------------- */
 function loadSample(){
  if(nodes.length && !confirm('Replace current assessment with the sample engagement?'))return;
  nodes=[
    mk('Stale contractor credential','identity','',{entry:1,reach:'yes',expl:'yes',note:'Active 6 months after offboarding; no MFA'}),
    mk('Internet-facing VPN (legacy firmware)','entry','',{entry:1,reach:'yes',expl:'yes',note:'Cisco ASA, firmware 18mo stale, no MFA'}),
    mk('M365 / Entra ID','identity','T1',{reach:'yes',expl:'yes',note:'34% sign-ins without MFA; CA in report-only'}),
    mk('SharePoint / Teams / Exchange','data','T1',{reach:'yes',expl:'no',note:'All collaboration data + email'}),
    mk('Entra admin account','privilege','T0',{reach:'yes',expl:'yes',note:'Reachable via password spray'}),
    mk('Entra Connect sync account','privilege','T0',{reach:'yes',expl:'yes',note:'Has DCSync rights on-prem'}),
    mk('On-prem Active Directory','privilege','T0',{jewel:0,reach:'yes',expl:'yes',note:'KRBTGT never rotated (847d)'}),
    mk('SAP ERP','infra','T1',{jewel:1,reach:'unknown',expl:'unknown',note:'Financial + operational; default creds on secondary instance'}),
    mk('Backups (same segment as ERP)','recovery','T1',{jewel:1,reach:'yes',expl:'yes',comp:0,note:'Never restore-tested; reachable from estate'})
  ];
  const id=n=>nodes.find(x=>x.name.startsWith(n)).id;
  edges=[
    ed('Stale contractor','M365','Credential valid, no MFA',1),
    ed('Internet-facing VPN','On-prem','VPN auth → internal network',1),
    ed('M365','SharePoint','Token grants data access',1),
    ed('M365','Entra admin','Password spray → privilege escalation',2),
    ed('Entra admin','Entra Connect','Admin controls sync identity',2),
    ed('Entra Connect','On-prem','DCSync via sync-account rights',2),
    ed('On-prem','SAP ERP','Domain creds reused on ERP',3),
    ed('On-prem','Backups','Backups reachable from domain',1),
    ed('SAP ERP','Backups','Same network segment',1)
  ];
  function mk(name,type,tier,o){return Object.assign({id:uid(),name,type,tier,entry:!!o.entry,jewel:!!o.jewel,reach:o.reach||'unknown',expl:o.expl||'unknown',comp:!!o.comp,note:o.note||''},{});}
  function ed(a,b,mech,w){return {id:uid(),from:id(a),to:id(b),mech,w};}
  clearNodeForm();render();
 }
 /* ---------------- boot ---------------- */
 restore();
 if(!nodes.length) loadSample(); else render();
 </script>
 </body>
 </html>
Author	SHA1	Message	Date
tomas.kracmar	173704eca5	feat: Add vulnerability-management arc — Book VII, quantum framework, ORION, and kill-chain assessment tool	2026-06-15 07:56:50 +02:00
tomas.kracmar	633f82c5a7	feat: Add four consultant assignments (identity, CA, Intune, collaboration)	2026-06-09 16:56:48 +02:00
tomas.kracmar	7ff4fad953	feat: Add management overlay pattern (Nebula T0 / Tailscale T1) and cloud admin VM guidance	2026-06-09 14:40:34 +02:00