feat: Add vulnerability-management arc — Book VII, quantum framework, ORION, and kill-chain assessment tool

2026-06-15 07:56:50 +02:00
parent 633f82c5a7
commit 173704eca5
9 changed files with 1357 additions and 2 deletions
# Quantum Vulnerability Management
> *"You do not have 40,000 critical vulnerabilities. You have ~400 that are real, ~40 that are on fire, and a process that cannot tell them apart. Quantum vulnerability management is the discipline of sizing remediation to the time you actually have — and of admitting that the unit of work was never the vulnerability. It was the path."*
This is the operating framework behind [Book VII — Vulnerability Management](../books/06-vulnerability-management.md). Book VII is the philosophy; this is the model a consultant runs in an engagement. It pairs with the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md) (which sizes the quanta) and the [AI-Assisted TVM Blueprint](../playbooks/ai-assisted-tvm.md) (which automates the hours-lane).
---
## The problem in one paragraph
Time-to-exploit has collapsed to roughly **4 hours** while median remediation sits at **43 days**. CVE volume has passed **59,000/year**, and the public enrichment data (NVD) is degrading. As of the **2026 Verizon DBIR, vulnerability exploitation is the #1 initial-access vector, roughly twice as common as phishing.** A human-paced, CVSS-sorted patch programme cannot close a gap that runs the wrong way by more than two orders of magnitude. The answer is not "patch faster." It is to **stop using the vulnerability list as the unit of work**: size remediation into time-budgeted quanta, contain the few that matter in hours, make the rest not matter through architecture, and feed every exploited path back into a shorter kill chain.
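The size of that gap is simple arithmetic on the two headline figures. A minimal sketch, using the point-in-time numbers quoted above (re-check them before reuse):

```python
import math

# Headline figures quoted above; point-in-time, re-check before reuse.
TIME_TO_EXPLOIT_HOURS = 4
MEDIAN_REMEDIATION_DAYS = 43

remediation_hours = MEDIAN_REMEDIATION_DAYS * 24       # 1032 hours
gap_ratio = remediation_hours / TIME_TO_EXPLOIT_HOURS  # 258.0
orders_of_magnitude = math.log10(gap_ratio)            # about 2.4

print(f"Remediation takes {gap_ratio:.0f}x as long as exploitation "
      f"({orders_of_magnitude:.1f} orders of magnitude).")
```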
---
## What a quantum is
A **quantum** is the smallest unit of remediation that:
1. **Fully closes a specific exploitable path** — not a CVE in the abstract, a path an adversary could actually walk.
2. **Is sized to a time budget it can actually be finished within** (hours, days, or a sprint).
3. **Ends in a verifiable signal** — a test that proves the path is closed, not a ticket marked done.
The word is chosen deliberately:
- **Atomic.** You cannot ship half a quantum and claim half the protection. A patch on 80% of the fleet, or a rule applied but never verified to block, is a *ghost patch* — fully exploitable and now invisible. A quantum is all-or-nothing.
- **Discrete.** Work is packetised into units that fit the time available, not smeared across an infinite backlog. An undifferentiated backlog has no front; quanta give it one.
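A minimal sketch of the all-or-nothing rule, assuming a quantum is tracked as the set of assets on one path plus two hypothetical sets: where remediation was *applied* and where a closing test *passed*. The class and field names are illustrative, not part of any tool. The quantum counts as closed only when every asset on the path has a passing test; anything applied but not fully verified is reported as a ghost.

```python
from dataclasses import dataclass, field
from enum import Enum


class Budget(Enum):
    HOURS = "hours"
    DAYS = "days"
    SPRINT = "sprint"


@dataclass
class Quantum:
    """One exploitable path, sized to a budget, closed only by verified signals."""
    path: str                  # the route an adversary could walk, not a CVE ID
    budget: Budget
    assets: frozenset[str]     # every asset the path runs through
    applied: set[str] = field(default_factory=set)   # remediation deployed here
    verified: set[str] = field(default_factory=set)  # a closing test passed here

    def apply(self, asset: str) -> None:
        self._check(asset)
        self.applied.add(asset)

    def verify(self, asset: str, test_passed: bool) -> None:
        # Only a passing test counts; a failing re-test (a rotted control) reopens the asset.
        self._check(asset)
        if test_passed:
            self.verified.add(asset)
        else:
            self.verified.discard(asset)

    def _check(self, asset: str) -> None:
        if asset not in self.assets:
            raise ValueError(f"{asset} is not on this quantum's path")

    @property
    def closed(self) -> bool:
        # Atomic: closed only when every asset on the path has a passing test.
        return self.assets <= self.verified

    @property
    def ghost(self) -> bool:
        # Work was done but the path is still open: fully exploitable, now invisible.
        return bool(self.applied) and not self.closed


q = Quantum("internet -> vpn-gw -> dc01", Budget.HOURS, frozenset({"vpn-gw-1", "vpn-gw-2"}))
q.apply("vpn-gw-1")
q.apply("vpn-gw-2")
q.verify("vpn-gw-1", test_passed=True)
print(q.closed, q.ghost)  # False True: one gateway still unverified
q.verify("vpn-gw-2", test_passed=True)
print(q.closed, q.ghost)  # True False
```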
---
## The sort key: time-to-existential-impact
Quanta are ordered not by severity but by **time-to-existential-impact**, a function of three things the *environment* determines — not the CVE:
> **time-to-existential-impact = f( kill-chain position, reachability, exploit availability )**
| Factor | Question | Where it comes from |
|--------|----------|---------------------|
| **Kill-chain position** | Does this sit on a path to existential compromise? | [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md), BloodHound, the diagnostic |
| **Reachability** | Can the adversary actually get to it (internet-facing, one hop from T0, behind segmentation)? | Network topology, external scan, [Perimeter Scanning](../playbooks/perimeter-scanning-capability.md) |
| **Exploit availability** | Is there a working exploit in the wild now? | CISA KEV, exploit databases, threat intel |
The same CVE has a different quantum on different assets, because position, not severity, sets the clock. **A 9.8 on a segmented, unreachable, non-privileged host is a sprint quantum. A 7.5 on an internet-facing box one hop from a domain controller is an hours quantum.** This is the Book I principle — kill-chain position changes the priority, not the score — made operational.
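The formula does not fix a particular `f`. One simple, hedged choice is a lexicographic sort key: exploit in the wild first, then external reachability, then attack-graph distance to T0. The `Finding` fields below, including `hops_to_t0`, are hypothetical stand-ins for the three factors. CVSS is carried on the record but deliberately left out of the key, which reproduces the 9.8-versus-7.5 example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve: str
    cvss: float            # kept for reference only; deliberately not in the key
    hops_to_t0: int        # kill-chain position: attack-graph hops to Tier 0 (hypothetical)
    internet_facing: bool  # reachability
    exploit_in_wild: bool  # exploit availability (e.g. listed in CISA KEV)


def impact_key(f: Finding) -> tuple[bool, bool, int]:
    """Smaller key = less time before existential impact = work it first."""
    return (
        not f.exploit_in_wild,  # False sorts before True: exploited-now first
        not f.internet_facing,  # externally reachable before internal-only
        f.hops_to_t0,           # closer to T0 before further away
    )


findings = [
    # Placeholder IDs: a 9.8 on a segmented, unreachable host...
    Finding("CVE-A", 9.8, hops_to_t0=5, internet_facing=False, exploit_in_wild=True),
    # ...and a 7.5 on an internet-facing box one hop from a domain controller.
    Finding("CVE-B", 7.5, hops_to_t0=1, internet_facing=True, exploit_in_wild=True),
]
queue = sorted(findings, key=impact_key)
print([f.cve for f in queue])  # ['CVE-B', 'CVE-A']
```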
---
## The four quanta
| Quantum | Time budget | What's in it | The response | Lane character |
|---------|-------------|--------------|--------------|----------------|
| **Critical** | **Hours** | On the kill chain, reachable, exploit available now | **Compensating control, not the patch** — sever reachability, edge-block, isolate, disable feature. Patch follows later. | Must be partly **autonomous**; human at policy boundary |
| **Severe** | **Days** | Material risk; reachable with friction, or partial compensating cover | Batched, completed and verified inside one short change window | Human-run, tightly scheduled |
| **Standard** | **Sprint** | The long, real, non-urgent tail | Drained in sprint-sized batches that can actually be finished; this is where patch velocity is the right tool | Routine engineering rhythm |
| **Dark** | **Unsized** | Can't see the asset, can't establish reachability, can't determine exploitability | **Route to discovery** — turn an uncharacterised risk into a sized quantum | Discovery, not remediation |
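Bucketing into the four quanta can be sketched the same way. This assumes each factor is either known or `None` (unknown), and that reachability is recorded as one of three hypothetical values. The severe-lane rule is one interpretation of the table, and partial compensating cover is not modelled. The property that matters is that any unknown factor routes to the dark quantum instead of defaulting to a score.

```python
from enum import Enum
from typing import Optional


class Lane(Enum):
    CRITICAL = "hours"
    SEVERE = "days"
    STANDARD = "sprint"
    DARK = "unsized"


def assign_lane(
    on_kill_chain: Optional[bool],
    reachable: Optional[str],      # "direct", "with_friction" or "no"; None = unknown
    exploit_now: Optional[bool],
) -> Lane:
    # Any unknown factor means the finding cannot be sized: route it to discovery.
    if on_kill_chain is None or reachable is None or exploit_now is None:
        return Lane.DARK
    if on_kill_chain and reachable == "direct" and exploit_now:
        return Lane.CRITICAL
    if on_kill_chain and reachable in ("direct", "with_friction"):
        return Lane.SEVERE
    return Lane.STANDARD


print(assign_lane(True, "direct", True))  # Lane.CRITICAL
print(assign_lane(True, None, True))      # Lane.DARK
```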
### Why "compensating control, not the patch" for the critical quantum
You cannot meet an hours budget with a vendor patch cycle, and often the patch does not exist yet. So the critical quantum's job is **not to fix the vulnerability — it is to move the asset out of the hours-window** by the cheapest fast control available: cut the reachability, block at the edge, isolate the host, disable the vulnerable feature, pull it behind the WAF. A 4-hour time-to-impact becomes a non-urgent one, and the actual patch drops into the standard lane on the normal change calendar. Reachability is almost always faster to change than a patch is to ship — which makes **reachability the fastest remediation you own.**
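A sketch of the hours-lane move under stated assumptions: the `Asset` model, the `edge_blocked` flag and the `Backlog` are hypothetical stand-ins for a real edge rule or isolation action. It shows the two halves of the pattern: the control is reversible (the function returns an undo), and the real patch is queued in the standard lane rather than raced. In practice the flag would be backed by a probe that proves the block actually blocks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Asset:
    name: str
    internet_facing: bool
    edge_blocked: bool = False   # stand-in for a real edge block / isolation rule


@dataclass
class Backlog:
    standard: list[str] = field(default_factory=list)   # sprint-lane work items


def in_hours_window(asset: Asset) -> bool:
    # Reachability is what put the asset in the hours lane.
    return asset.internet_facing and not asset.edge_blocked


def contain_then_defer(asset: Asset, cve: str, backlog: Backlog) -> Callable[[], None]:
    """Cut reachability now; queue the actual patch on the normal change calendar."""
    asset.edge_blocked = True
    backlog.standard.append(f"patch {cve} on {asset.name}")

    def undo() -> None:
        # Reversible by construction: lifting the control only clears the block.
        asset.edge_blocked = False

    return undo


gw = Asset("vpn-gw-1", internet_facing=True)
backlog = Backlog()
undo = contain_then_defer(gw, "CVE-0000-0000", backlog)  # placeholder CVE ID
print(in_hours_window(gw), backlog.standard)  # False ['patch CVE-0000-0000 on vpn-gw-1']
```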
### Why the dark quantum is the most dangerous
The old model ignores the dark quantum because it has no score. That is exactly backwards: an uncharacterised risk on an unknown asset is how estates die. A *known* severe is safer than an *unknown* nothing, because you can plan around the known one. The antifragile move is to spend judgement converting dark quanta into sized ones — which is why discovery (the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md), [zero-budget discovery](../playbooks/zero-budget-vulnerability-discovery.md), osquery) is part of vulnerability management, not separate from it.
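Converting dark quanta starts with set arithmetic between what you think you own and what discovery actually sees. A minimal sketch, assuming three hypothetical sets of hostnames: the inventory (e.g. a CMDB export), the hosts observed by discovery (an external scan or osquery results, for example), and the hosts already characterised for reachability and exploitability. Each bucket it returns is a dark quantum waiting to be sized.

```python
def dark_quanta(inventory: set[str], observed: set[str], characterised: set[str]) -> dict[str, set[str]]:
    """Split everything not yet sized into three kinds of dark quantum."""
    return {
        # On the network but not in the inventory: assets nobody knew about.
        "shadow": observed - inventory,
        # In the inventory but never seen by discovery: gone, or a discovery blind spot.
        "silent": inventory - observed,
        # Known and seen, but reachability/exploitability never established.
        "unsized": (inventory & observed) - characterised,
    }


inventory = {"web-1", "db-1", "old-ftp"}
observed = {"web-1", "db-1", "iot-cam-7"}
characterised = {"web-1"}
print(dark_quanta(inventory, observed, characterised))
# {'shadow': {'iot-cam-7'}, 'silent': {'old-ftp'}, 'unsized': {'db-1'}}
```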
---
## The barbell: contain fast or architect away — never the fragile middle
```
CHEAP / FAST / REVERSIBLE                  SLOW / STRUCTURAL / DURABLE
Hours-lane compensating controls           Segmentation, least privilege,
(edge block, isolate, cut reachability)    T0 protection, assume-breach
── wins the time race the patch can't ──   ── makes ~90% of vulns not matter ──

     ◄────────────── THE FRAGILE MIDDLE TO AVOID ──────────────►
     The aging "critical patch backlog": carries hours-lane urgency,
     moves at sprint-lane speed. Max anxiety, min protection,
     and the attacker clears it for you one exploited host at a time.
```
Both ends of the barbell are convex (small cost, large payoff — Pillar 5). The fragile middle is concave (maximum cost, minimum return). The rule: **contain it fast, or architect it away. Never let it age in the middle.**
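The fragile middle can be detected mechanically: an item that carries hours-lane urgency, has neither a verified compensating control nor a structural fix, and has outlived the hours budget. A sketch, assuming a hypothetical `Item` record and an 8-hour budget chosen purely for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CRITICAL_BUDGET = timedelta(hours=8)   # assumed hours-lane budget; set your own


@dataclass
class Item:
    name: str
    critical: bool      # carries hours-lane urgency
    contained: bool     # verified compensating control in place
    architected: bool   # path removed structurally (segmentation, least privilege)
    opened: datetime


def fragile_middle(items: list[Item], now: datetime) -> list[str]:
    """Urgent items neither contained nor architected away that outlived the budget."""
    return [
        i.name
        for i in items
        if i.critical
        and not (i.contained or i.architected)
        and now - i.opened > CRITICAL_BUDGET
    ]


now = datetime(2026, 6, 15, 12, 0)
items = [
    Item("edge-vpn", True, True, False, now - timedelta(days=3)),        # contained
    Item("legacy-app", True, False, False, now - timedelta(days=30)),    # aging in the middle
    Item("new-finding", True, False, False, now - timedelta(hours=2)),   # still inside budget
]
print(fragile_middle(items, now))  # ['legacy-app']
```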
---
## The ~90% subtraction — via negativa applied to the list
The single highest-leverage move, and it is pure subtraction. Industry data suggests **roughly 90% of "critical" vulnerabilities are not exploitable in a given environment** once compensating controls, reachability, and segmentation are mapped. So before adding any work:
1. Map, per asset: internet reachability, EDR coverage, WAF rules, segmentation distance from T0.
2. Delete the false urgency on everything segmented, unreachable, or already neutralised.
3. What remains — the genuinely reachable, genuinely exploitable ~10% — is the only thing the hours- and days-lanes ever touch.
This turns "40,000 criticals" into a few hundred real findings and a few dozen on fire. The compensating-control map that makes it possible is **the single most valuable artefact in the programme** — build it before the incident, because during a zero-day it answers "are we actually exposed?" in minutes instead of days. The caveat (Book I): a mapped control that has rotted into a ghost is a false negative. **Test the controls you are counting on; do not trust the map.**
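The three subtraction steps reduce to a filter over the compensating-control map. A sketch, assuming a hypothetical `Exposure` record built from topology, segmentation and control data. Per the caveat above, a mapped control only counts once a test has shown it blocks, so an untested WAF rule leaves the finding on the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Exposure:
    asset: str
    cve: str
    reachable: bool               # from topology / external scan
    segmented_from_t0: bool       # from the segmentation map
    control: Optional[str]        # e.g. "WAF rule", "EDR block"; None = nothing mapped
    control_tested: bool = False  # did a test actually prove the control blocks?


def neutralised(e: Exposure) -> bool:
    # A mapped control only counts if it was tested; a ghost control is a false negative.
    return e.control is not None and e.control_tested


def subtract(exposures: list[Exposure]) -> list[Exposure]:
    """Drop false urgency; keep the reachable, unsegmented, un-neutralised remainder."""
    return [
        e for e in exposures
        if e.reachable and not e.segmented_from_t0 and not neutralised(e)
    ]


exposures = [
    Exposure("hr-db", "CVE-1", reachable=False, segmented_from_t0=True, control=None),
    Exposure("portal", "CVE-2", reachable=True, segmented_from_t0=False,
             control="WAF rule", control_tested=True),
    Exposure("vpn-gw", "CVE-3", reachable=True, segmented_from_t0=False,
             control="WAF rule", control_tested=False),
]
print([e.asset for e in subtract(exposures)])  # ['vpn-gw']
```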
---
## The feedback loop — the antifragile difference
A vulnerability that was exploited or nearly exploited is the cheapest penetration test you will ever get. Patching the CVE wastes the data. The antifragile move is to **sever the path** the attacker used — boundary the flat segment, collapse the over-privileged service account, pull the reachable management interface behind the bastion — so the *next* vulnerability that lands there is a non-event before it is even disclosed.
**The metric is not MTTR. It is: did the kill chain get shorter?** Ten incidents that produce ten patches and zero severed paths mean you are merely fast. Ten incidents that produce six structurally shortened kill chains mean the estate is getting harder to compromise every time it is tested — the only honest definition of antifragile.
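One hedged way to make "did the kill chain get shorter?" countable, assuming the estate is modelled as a directed attack graph (an edge from A to B means an attacker on A can reach B, the shape BloodHound-style tooling produces), is to compare what an attacker entering at the exploited point can still reach, and whether any chain to T0 survives, before and after a path is severed. The graph, node names and helper functions are illustrative.

```python
from collections import deque
from typing import Optional

Graph = dict[str, set[str]]   # edge A -> B: from A an attacker can reach B


def reachable_from(graph: Graph, entry: str) -> dict[str, Optional[str]]:
    """Breadth-first search from the entry point; maps each reachable node to its parent."""
    parents: dict[str, Optional[str]] = {entry: None}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return parents


def chain_to(graph: Graph, entry: str, target: str) -> Optional[list[str]]:
    """Shortest attacker path from entry to target, or None if no chain survives."""
    parents = reachable_from(graph, entry)
    if target not in parents:
        return None
    path: list[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


def sever(graph: Graph, src: str, dst: str) -> Graph:
    """Copy of the graph with one edge removed (e.g. an over-privileged account collapsed)."""
    return {n: (edges - {dst} if n == src else set(edges)) for n, edges in graph.items()}


estate: Graph = {
    "internet": {"vpn-gw"},
    "vpn-gw": {"file-srv"},
    "file-srv": {"svc-backup"},   # over-privileged service account
    "svc-backup": {"dc01"},
}
after = sever(estate, "file-srv", "svc-backup")
print(chain_to(estate, "internet", "dc01"))  # ['internet', 'vpn-gw', 'file-srv', 'svc-backup', 'dc01']
print(chain_to(after, "internet", "dc01"))   # None
print(len(reachable_from(estate, "internet")), len(reachable_from(after, "internet")))  # 5 3
```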
---
## Running it in an engagement — the sequence
1. **Discover** — run the [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md) to map assets, reachability, and the shortest existential path. Anything you cannot characterise is a dark quantum; route it to deeper discovery.
2. **Subtract** — apply the ~90% reduction using the compensating-control and reachability map. Delete false urgency.
3. **Size** — place every remaining real finding into a quantum (critical / severe / standard) by time-to-existential-impact.
4. **Contain the hours-lane** — apply compensating controls to the critical quantum *today*, autonomously where guardrails allow ([AI-Assisted TVM](../playbooks/ai-assisted-tvm.md)). Verify each closes with a signal.
5. **Batch the rest** — days-lane in the next change window, sprint-lane in the engineering rhythm.
6. **Architect away the middle** — feed the recurring paths into segmentation and least-privilege work (Books IIV) so the same class of vulnerability stops mattering.
7. **Close the loop** — after every exploited-or-near finding, ask what path got shorter, and track that number over time.
---
## What to measure
| Metric | Why it matters | Antifragile target |
|--------|----------------|--------------------|
| Critical-quantum containment time | The hours-lane is the race you must not lose | Hours, trending down |
| % of "criticals" confirmed reachable | Proves the ~90% subtraction is real, not assumed | Known, not "unknown" |
| Ghost-patch rate (closed-but-unverified) | Half-done remediation is hidden full exposure | Zero — every quantum closes with a signal |
| Dark-quantum count | Uncharacterised risk is the dangerous kind | Shrinking; each one converted to sized |
| **Kill-chain length after incidents** | The only measure of getting *stronger* | Shorter after each exploited-or-near event |
| Items aging in the fragile middle | The concave zone the barbell forbids | Zero — contained or architected, never aging |
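Most of these metrics fall out of the same finding records. A sketch of three of them, assuming a hypothetical `Record` with a tracker status, a closing-signal flag and a reachability value that may still be unknown: the ghost-patch rate, the reachability breakdown of scanner-rated "criticals" (with the unknown share reported explicitly, so the "known, not unknown" target is visible), and the dark-quantum count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    lane: str                  # "critical", "severe", "standard" or "dark"
    scanner_critical: bool     # rated critical by the scanner / CVSS
    closed_in_tracker: bool    # ticket marked done
    closing_signal: bool       # a test proved the path is closed
    reachable: Optional[bool]  # None = never established


def ghost_patch_rate(records: list[Record]) -> float:
    """Share of 'done' tickets with no closing signal behind them."""
    closed = [r for r in records if r.closed_in_tracker]
    if not closed:
        return 0.0
    return sum(not r.closing_signal for r in closed) / len(closed)


def critical_reachability(records: list[Record]) -> dict[str, float]:
    """Reachability breakdown of scanner 'criticals', with the unknown share explicit."""
    crit = [r for r in records if r.scanner_critical]
    n = len(crit) or 1   # avoid division by zero; all shares are 0.0 when empty
    return {
        "reachable": sum(r.reachable is True for r in crit) / n,
        "unreachable": sum(r.reachable is False for r in crit) / n,
        "unknown": sum(r.reachable is None for r in crit) / n,
    }


def dark_count(records: list[Record]) -> int:
    return sum(r.lane == "dark" for r in records)


records = [
    Record("critical", True, closed_in_tracker=True, closing_signal=True, reachable=True),
    Record("standard", True, closed_in_tracker=True, closing_signal=False, reachable=False),
    Record("standard", True, closed_in_tracker=False, closing_signal=False, reachable=False),
    Record("dark", True, closed_in_tracker=False, closing_signal=False, reachable=None),
]
print(ghost_patch_rate(records))       # 0.5
print(critical_reachability(records))  # {'reachable': 0.25, 'unreachable': 0.5, 'unknown': 0.25}
print(dark_count(records))             # 1
```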
---
## Honest uncertainty
The headline statistics (the 4-hour, 43-day, ~59,000-CVE, ~90%-not-exploitable, and "#1, ~2× phishing" figures) are point-in-time and churn annually — re-check them against the current DBIR, M-Trends, and FIRST/CVE data before putting them on a slide. The *direction* is the stable signal; the numbers move. The autonomous-execution tooling for the hours-lane is real but immature and fast-moving — verify current capability and failure modes, and start with reversible compensating controls, never irreversible change. What does not churn: kill-chain position beats CVSS, most criticals aren't reachable, a half-done remediation is a hidden full vulnerability, and every exploited path should shorten the chain.
---
*See [Book VII — Vulnerability Management](../books/06-vulnerability-management.md) for the full philosophy, [Kill Chain Assessment app](../playbooks/kill-chain-assessment-app.md) for sizing the quanta in unknown territory, and [AI-Assisted TVM Blueprint](../playbooks/ai-assisted-tvm.md) for automating the hours-lane.*