Quantum Vulnerability Management

"You do not have 40,000 critical vulnerabilities. You have ~400 that are real, ~40 that are on fire, and a process that cannot tell them apart. Quantum vulnerability management is the discipline of sizing remediation to the time you actually have — and of admitting that the unit of work was never the vulnerability. It was the path."

This is the operating framework behind Book VII — Vulnerability Management. Book VII is the philosophy; this is the model a consultant runs in an engagement. It pairs with the Kill Chain Assessment app (which sizes the quanta) and the AI-Assisted TVM Blueprint (which automates the hours-lane).


The problem in one paragraph

Time-to-exploit has collapsed to roughly 4 hours while median remediation sits at 43 days; CVE volume has gone past 59,000/year and the public enrichment data (NVD) is degrading; and as of the 2026 Verizon DBIR, vulnerability exploitation is the #1 initial-access vector, roughly twice phishing. A human-paced, CVSS-sorted patch programme cannot close a gap that runs the wrong way by two orders of magnitude. The answer is not "patch faster." It is to stop using the vulnerability list as the unit of work, size remediation into time-budgeted quanta, contain the few that matter in hours, make the rest not matter through architecture, and feed every exploited path back into a shorter kill chain.
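The "two orders of magnitude" claim can be checked with a few lines of arithmetic. This is only a sanity check on the two figures quoted above (4 hours and 43 days); the variable names are illustrative, and the inputs are point-in-time numbers that should be re-verified before reuse.

```python
import math

# Figures quoted in the paragraph above; treat them as point-in-time inputs.
TIME_TO_EXPLOIT_HOURS = 4
MEDIAN_REMEDIATION_DAYS = 43

remediation_hours = MEDIAN_REMEDIATION_DAYS * 24        # 1032 hours
gap_ratio = remediation_hours / TIME_TO_EXPLOIT_HOURS   # 258.0
orders_of_magnitude = math.log10(gap_ratio)             # a little over 2.4

print(f"remediation is {gap_ratio:.0f}x slower than exploitation "
      f"(~{orders_of_magnitude:.1f} orders of magnitude)")
```

On these inputs the remediation cycle is about 258 times slower than the attacker, which is why halving or even quartering patch time does not change who wins the race.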


What a quantum is

A quantum is the smallest unit of remediation that:

  1. Fully closes a specific exploitable path — not a CVE in the abstract, a path an adversary could actually walk.
  2. Is sized to a time budget it can actually be completed within — hours, days, or a sprint.
  3. Ends in a verifiable signal — a test that proves the path is closed, not a ticket marked done.

The word is chosen deliberately:

  • Atomic. You cannot ship half a quantum and claim half the protection. A patch on 80% of the fleet, or a rule applied but never verified to block, is a ghost patch — fully exploitable and now invisible. A quantum is all-or-nothing.
  • Discrete. Work is packetised into units that fit the time available, not smeared across an infinite backlog. An undifferentiated backlog has no front; quanta give it one.
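One way to make the three-part definition and the all-or-nothing rule concrete is a small record type. This is a minimal sketch, not a schema from Book VII or the Kill Chain Assessment app: the `Quantum` class, its fields, and the host names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Quantum:
    """One unit of remediation: a path, a time budget, and a verification signal."""
    path: str                      # the exploitable path being closed, not a CVE id
    budget_hours: int              # the time budget the work must fit inside
    hosts: set[str]                # every host the path runs through
    remediated: set[str]           # hosts where the control or patch was applied
    verify: Callable[[str], bool]  # per-host test that the path is really closed

    def is_closed(self) -> bool:
        # Atomic: every host must be remediated AND pass verification.
        return self.hosts <= self.remediated and all(self.verify(h) for h in self.hosts)

    def ghost_hosts(self) -> set[str]:
        # Hosts that turn a "done" ticket into a ghost patch:
        # never touched, or touched but failing the verification test.
        return {h for h in self.hosts if h not in self.remediated or not self.verify(h)}

# Hypothetical fleet: the block was applied and confirmed on 4 of 5 hosts (80%).
confirmed_blocked = {"web-01", "web-02", "web-03", "web-04"}
q = Quantum(
    path="internet -> web tier -> app service account",
    budget_hours=4,
    hosts={"web-01", "web-02", "web-03", "web-04", "web-05"},
    remediated={"web-01", "web-02", "web-03", "web-04"},
    verify=lambda host: host in confirmed_blocked,
)
print(q.is_closed())            # False: 80% coverage is not a closed quantum
print(sorted(q.ghost_hosts()))  # ['web-05']
```

The design choice worth noting is that `is_closed` has no partial state. Progress can be tracked elsewhere, but protection is only ever claimed for the whole path.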

The sort key: time-to-existential-impact

Quanta are ordered not by severity but by time-to-existential-impact, a function of three things the environment determines — not the CVE:

time-to-existential-impact = f( kill-chain position, reachability, exploit availability )

| Factor | Question | Where it comes from |
| --- | --- | --- |
| Kill-chain position | Does this sit on a path to existential compromise? | Kill Chain Assessment app, BloodHound, the diagnostic |
| Reachability | Can the adversary actually get to it (internet-facing, one hop from T0, behind segmentation)? | Network topology, external scan, Perimeter Scanning |
| Exploit availability | Is there a working exploit in the wild now? | CISA KEV, exploit databases, threat intel |

The same CVE has a different quantum on different assets, because position, not severity, sets the clock. A 9.8 on a segmented, unreachable, non-privileged host is a sprint quantum. A 7.5 on an internet-facing box one hop from a domain controller is an hours quantum. This is the Book I principle — kill-chain position changes the priority, not the score — made operational.


The four quanta

| Quantum | Time budget | What's in it | The response | Lane character |
| --- | --- | --- | --- | --- |
| Critical | Hours | On the kill chain, reachable, exploit available now | Compensating control, not the patch: sever reachability, edge-block, isolate, disable feature. Patch follows later. | Must be partly autonomous; human at policy boundary |
| Severe | Days | Material risk; reachable with friction, or partial compensating cover | Batched, completed and verified inside one short change window | Human-run, tightly scheduled |
| Standard | Sprint | The long, real, non-urgent tail | Drained in sprint-sized batches that can actually be finished; this is where patch velocity is the right tool | Routine engineering rhythm |
| Dark | Unsized | Can't see the asset, can't establish reachability, can't determine exploitability | Route to discovery: turn an uncharacterised risk into a sized quantum | Discovery, not remediation |
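The sizing above can be sketched as a function of the three sort-key factors. Everything here is an assumption made for illustration: the `Finding` and `Lane` types, the boolean simplification of each factor, and in particular the rule used for the severe lane, which is one possible reading of "material risk; reachable with friction". CVSS is carried along but deliberately never consulted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Lane(Enum):
    CRITICAL = "hours"
    SEVERE = "days"
    STANDARD = "sprint"
    DARK = "unsized"

@dataclass
class Finding:
    cve: str
    cvss: float                      # kept for reference; not used for sizing
    on_kill_chain: Optional[bool]    # None means "not yet established"
    reachable: Optional[bool]
    exploit_in_wild: Optional[bool]

def size(f: Finding) -> Lane:
    factors = (f.on_kill_chain, f.reachable, f.exploit_in_wild)
    if any(x is None for x in factors):
        return Lane.DARK             # cannot characterise: route to discovery
    if all(factors):
        return Lane.CRITICAL         # on the chain, reachable, exploit available now
    if f.reachable and (f.on_kill_chain or f.exploit_in_wild):
        return Lane.SEVERE           # assumed rule for "material risk"
    return Lane.STANDARD

# Placeholder identifiers, not real CVEs.
segmented = Finding("CVE-EXAMPLE-A", 9.8, on_kill_chain=False, reachable=False, exploit_in_wild=True)
exposed = Finding("CVE-EXAMPLE-B", 7.5, on_kill_chain=True, reachable=True, exploit_in_wild=True)
unknown = Finding("CVE-EXAMPLE-C", 9.1, on_kill_chain=None, reachable=None, exploit_in_wild=True)

for f in (segmented, exposed, unknown):
    print(f.cve, f.cvss, "->", size(f).value)
```

Under these assumptions the 9.8 lands in the sprint lane, the 7.5 lands in the hours lane, and the finding with unknown position and reachability is dark regardless of its score.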

Why "compensating control, not the patch" for the critical quantum

You cannot meet an hours budget with a vendor patch cycle, and often the patch does not exist yet. So the critical quantum's job is not to fix the vulnerability — it is to move the asset out of the hours-window by the cheapest fast control available: cut the reachability, block at the edge, isolate the host, disable the vulnerable feature, pull it behind the WAF. A 4-hour time-to-impact becomes a non-urgent one, and the actual patch drops into the standard lane on the normal change calendar. Reachability is almost always faster to change than a patch is to ship — which makes reachability the fastest remediation you own.
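A small sketch of that move, under simplifying assumptions: reachability is modelled as a single boolean, the `Asset` type and `contain` helper are hypothetical, and the asset name is invented. The point it illustrates is that a verified compensating control changes the lane, while the patch itself becomes ordinary follow-up work.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    reachable: bool          # can the adversary get to the vulnerable service?
    on_kill_chain: bool
    exploit_in_wild: bool
    controls: list[str] = field(default_factory=list)

def in_hours_window(a: Asset) -> bool:
    # Hours-lane condition: on the kill chain, reachable, exploit available now.
    return a.on_kill_chain and a.reachable and a.exploit_in_wild

def contain(a: Asset, control: str, verified_blocking: bool) -> list[str]:
    """Apply a fast compensating control; return follow-up work for the standard lane."""
    a.controls.append(control)
    # Only a control verified to block changes reachability.
    # An unverified control leaves the asset in the hours window.
    if verified_blocking:
        a.reachable = False
    return [f"patch {a.name} on the normal change calendar"]

gateway = Asset("vpn-gw-01", reachable=True, on_kill_chain=True, exploit_in_wild=True)
print(in_hours_window(gateway))   # True
follow_up = contain(gateway, "edge block on management interface", verified_blocking=True)
print(in_hours_window(gateway))   # False
print(follow_up)
```

The vulnerability is still present after `contain` runs; what changed is the clock. That is also why the control should be reversible: it is a holding action until the patch ships.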

Why the dark quantum is the most dangerous

The old model ignores the dark quantum because it has no score. That is exactly backwards: an uncharacterised risk on an unknown asset is how estates die. A known severe is safer than an unknown nothing, because you can plan around the known one. The antifragile move is to spend judgement converting dark quanta into sized ones — which is why discovery (the Kill Chain Assessment app, zero-budget discovery, osquery) is part of vulnerability management, not separate from it.


The barbell: contain fast or architect away — never the fragile middle

  CHEAP / FAST / REVERSIBLE                                 SLOW / STRUCTURAL / DURABLE
  Hours-lane compensating controls                          Segmentation, least privilege,
  (edge block, isolate, cut reachability)                   T0 protection, assume-breach
  ── wins the time race the patch can't ──                  ── makes ~90% of vulns not matter ──
            ◄──────────────  THE FRAGILE MIDDLE TO AVOID  ──────────────►
            The aging "critical patch backlog": carries hours-lane urgency,
            moves at sprint-lane speed. Max anxiety, min protection,
            and the attacker clears it for you one exploited host at a time.

Both ends of the barbell are convex (small cost, large payoff — Pillar 5). The fragile middle is concave (maximum cost, minimum return). The rule: contain it fast, or architect it away. Never let it age in the middle.


The ~90% subtraction — via negativa applied to the list

The single highest-leverage move, and it is pure subtraction. Industry data suggests roughly 90% of "critical" vulnerabilities are not exploitable in a given environment once compensating controls, reachability, and segmentation are mapped. So before adding any work:

  1. Map, per asset: internet reachability, EDR coverage, WAF rules, segmentation distance from T0.
  2. Delete the false urgency on everything segmented, unreachable, or already neutralised.
  3. What remains — the genuinely reachable, genuinely exploitable ~10% — is the only thing the hours- and days-lanes ever touch.

The subtraction alone takes "40,000 criticals" down to roughly 4,000; sizing what remains by kill-chain position and exploit availability is what leaves a few hundred real findings and a few dozen on fire. The compensating-control map that makes it possible is the single most valuable artefact in the programme. Build it before the incident, because during a zero-day it answers "are we actually exposed?" in minutes instead of days. The caveat (Book I): a mapped control that has rotted into a ghost is a false negative. Test the controls you are counting on; do not trust the map.
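The three subtraction steps can be sketched as a filter over a per-asset control map. This is an illustrative model only: the `ControlMap` fields, the `max_safe_hops` threshold, and the host names are assumptions, and EDR coverage is left out to keep the rule short. The Book I caveat is encoded directly: nothing is subtracted on the strength of a control that has not been tested.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlMap:
    internet_reachable: bool
    waf_rule_blocks: bool
    hops_from_t0: int        # segmentation distance from Tier 0
    controls_tested: bool    # has the mapped control been tested recently?

def is_real(entry: ControlMap, max_safe_hops: int = 3) -> bool:
    """Keep a finding unless tested controls demonstrably neutralise it."""
    if not entry.controls_tested:
        return True          # an untested control may be a ghost
    if entry.waf_rule_blocks:
        return False         # already neutralised at the edge
    if not entry.internet_reachable and entry.hops_from_t0 >= max_safe_hops:
        return False         # segmented and unreachable
    return True

def subtract(findings, control_map):
    """Split (cve, asset) pairs into real findings and ones with urgency removed."""
    real, deferred = [], []
    for cve, asset in findings:
        entry = control_map.get(asset)
        # An unmapped asset cannot be subtracted; it stays in view.
        if entry is None or is_real(entry):
            real.append((cve, asset))
        else:
            deferred.append((cve, asset))
    return real, deferred

control_map = {
    "dmz-web-01": ControlMap(True, False, 2, True),
    "dmz-web-02": ControlMap(True, True, 2, True),    # WAF rule tested and blocking
    "lab-db-07": ControlMap(False, False, 5, True),   # segmented, far from T0
    "hr-app-03": ControlMap(True, True, 1, False),    # WAF rule exists, never tested
}
findings = [
    ("CVE-EXAMPLE-1", "dmz-web-01"), ("CVE-EXAMPLE-1", "dmz-web-02"),
    ("CVE-EXAMPLE-2", "lab-db-07"), ("CVE-EXAMPLE-3", "hr-app-03"),
    ("CVE-EXAMPLE-3", "unknown-host"),
]
real, deferred = subtract(findings, control_map)
print(len(real), "real;", len(deferred), "deferred")
```

Deferred findings are not deleted from the record; only their urgency is. They drain through the sprint lane, and they move back to `real` the moment a control fails its test.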


The feedback loop — the antifragile difference

A vulnerability that was exploited or nearly exploited is the cheapest penetration test you will ever get. Patching the CVE wastes the data. The antifragile move is to sever the path the attacker used — boundary the flat segment, collapse the over-privileged service account, pull the reachable management interface behind the bastion — so the next vulnerability that lands there is a non-event before it is even disclosed.

The metric is not MTTR. It is: did the kill chain get shorter? Ten incidents that produce ten patches and zero severed paths mean you are merely fast. Ten incidents that produce six structurally shortened kill chains mean the estate is getting harder to compromise every time it is tested — the only honest definition of antifragile.


Running it in an engagement — the sequence

  1. Discover — run the Kill Chain Assessment app to map assets, reachability, and the shortest existential path. Anything you cannot characterise is a dark quantum; route it to deeper discovery.
  2. Subtract — apply the ~90% reduction using the compensating-control and reachability map. Delete false urgency.
  3. Size — place every remaining real finding into a quantum (critical / severe / standard) by time-to-existential-impact.
  4. Contain the hours-lane — apply compensating controls to the critical quantum today, autonomously where guardrails allow (AI-Assisted TVM). Verify each closes with a signal.
  5. Batch the rest — days-lane in the next change window, sprint-lane in the engineering rhythm.
  6. Architect away the middle — feed the recurring paths into segmentation and least-privilege work (Books IIV) so the same class of vulnerability stops mattering.
  7. Close the loop — after every exploited-or-near finding, ask what path got shorter, and track that number over time.

What to measure

| Metric | Why it matters | Antifragile target |
| --- | --- | --- |
| Critical-quantum containment time | The hours-lane is the race you must not lose | Hours, trending down |
| % of "criticals" confirmed reachable | Proves the ~90% subtraction is real, not assumed | Known, not "unknown" |
| Ghost-patch rate (closed-but-unverified) | Half-done remediation is hidden full exposure | Zero; every quantum closes with a signal |
| Dark-quantum count | Uncharacterised risk is the dangerous kind | Shrinking; each one converted to sized |
| Kill-chain length after incidents | The only measure of getting stronger | Shorter after each exploited-or-near event |
| Items aging in the fragile middle | The concave zone the barbell forbids | Zero; contained or architected, never aging |
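Three of these metrics reduce to simple counts over records most programmes already keep. The sketch below assumes a hypothetical record shape (`QuantumRecord`, `Incident`) and uses "share of incidents that ended in a severed path" as a stand-in for the kill-chain measure, which is an interpretation rather than a definition taken from Book VII.

```python
from dataclasses import dataclass

@dataclass
class QuantumRecord:
    lane: str         # "critical", "severe", "standard", or "dark"
    closed: bool
    verified: bool    # closed with a passing verification signal

@dataclass
class Incident:
    patched: bool
    path_severed: bool   # a structural change was made to the exploited path

def ghost_patch_rate(records) -> float:
    """Share of closed quanta that were never verified."""
    closed = [r for r in records if r.closed]
    if not closed:
        return 0.0
    return sum(not r.verified for r in closed) / len(closed)

def dark_count(records) -> int:
    return sum(r.lane == "dark" for r in records)

def severed_path_ratio(incidents) -> float:
    """Share of exploited-or-near incidents that produced a structural fix."""
    if not incidents:
        return 0.0
    return sum(i.path_severed for i in incidents) / len(incidents)

records = [
    QuantumRecord("critical", closed=True, verified=True),
    QuantumRecord("critical", closed=True, verified=False),   # ghost patch
    QuantumRecord("severe", closed=True, verified=True),
    QuantumRecord("standard", closed=True, verified=True),
    QuantumRecord("dark", closed=False, verified=False),
]
incidents = [Incident(True, True)] * 6 + [Incident(True, False)] * 4

print(ghost_patch_rate(records))      # 0.25
print(dark_count(records))            # 1
print(severed_path_ratio(incidents))  # 0.6
```

A programme that patches all ten incidents but severs no paths scores 0.0 on the last metric however fast it was, which is the distinction between fast and antifragile drawn earlier.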

Honest uncertainty

The headline statistics (the 4-hour, 43-day, ~59,000-CVE, ~90%-not-exploitable, and "#1, ~2× phishing" figures) are point-in-time and churn annually — re-check them against the current DBIR, M-Trends, and FIRST/CVE data before putting them on a slide. The direction is the stable signal; the numbers move. The autonomous-execution tooling for the hours-lane is real but immature and fast-moving — verify current capability and failure modes, and start with reversible compensating controls, never irreversible change. What does not churn: kill-chain position beats CVSS, most criticals aren't reachable, a half-done remediation is a hidden full vulnerability, and every exploited path should shorten the chain.


See Book VII — Vulnerability Management for the full philosophy, Kill Chain Assessment app for sizing the quanta in unknown territory, and AI-Assisted TVM Blueprint for automating the hours-lane.