antifragile/antifragile-consulting/core/move-fast-and-fix-things.md
Claude Sonnet 4.6 5c4e91179d feat: Add findings backlog as pragmatic alternative to risk register
New: assessment-templates/findings-backlog.md
  Design principles: lives where client works, every finding has an owner,
  feeds the housekeeping stream, accumulates from all sources.
  Format: 6-field minimal entry (ID, finding, source, priority, owner,
  status) with optional target date/effort/notes/closed date.
  P0/P1/P2 priority using kill chain test.
  Flat file template for Git-based clients.
  Population guide: Day 30 (from Brownhat), subsequent modules, continuous
  tools (ASTRAL drift, PULSAR alerts, Elysium, BloodHound).
  Monthly housekeeping cycle structure.
  Relationship to formal risk register explained.
  Backlog health indicators (warning signs it is not functioning).

Wired into existing framework:
  move-fast-and-fix-things.md: Rule 4 now names the backlog as the queue
  rapid-modernisation-plan.md: Day 30 item 7 and Phase 1 action updated
  engagement-model.md: Section 4 deliverables table updated at all stages
  assessment-templates/README.md: Production-ready templates section added
  index.md: Findings Backlog added to Assessment and Tools table

Co-Authored-By: Tom Kracmar <tom+claude@cat6.cz>
2026-06-05 10:09:08 +00:00


Move Fast and Fix Things

"The best time to plant a tree was 20 years ago. The second best time is now. The worst time is after the storm has already knocked it down."

This document anchors the antifragile consulting practice in a single, actionable posture: move fast and fix things. It is not a contradiction of Taleb's philosophy—it is its operational expression. Antifragility is not achieved by standing still and theorizing. It is earned by rapid iteration, honest repair, and the refusal to let perfect be the enemy of resilient.


The Brownhat Methodology

This practice operates under the Brownhat brand when engaging clients. The name is deliberate:

  • Brownfield — industrial land that has been used, built on, and left with the legacy of past decisions. Every mature organisation's security environment is a brownfield: layers of partially implemented tools, forgotten configurations, and "temporary" solutions that became permanent.
  • Blackhat / Whitehat — the security domain's language for attackers and defenders. Brownhat sits between them: we understand how attackers think, but we are here to recultivate the environment, not exploit it.

"Brownhat is not a methodology for greenfield deployments. It is for organisations that have been building, acquiring, and running for years — and need someone to recultivate what they have before adding anything new."

What Brownhat signals to clients:

  • We are not going to sell you a new platform to replace the one you already own.
  • We are going to understand your environment as it actually is, not as it was designed to be.
  • We are going to extract maximum value from existing investments before recommending anything new.
  • We are going to be honest about what you have, what it costs, and what it would take to fix it.

The Brownhat Diagnostic — a structured NIST CSF 2.0 baseline assessment — is the named entry engagement for new clients. It is how we earn the right to recommend anything.



The Philosophy

Speed Is a Security Control

The organizations that survive are not the ones with the most comprehensive plans. They are the ones that execute fastest against the gaps that actually matter. A realistic engagement delivers 30 to 60% of an ideal posture in 180 days. That is the honest target. It is also, in almost every case, an enormous improvement over what existed before, and infinitely better than the 100% solution that stays in planning and never ships.

The correct comparison is not "30% today vs. 100% in six months." It is "30% today vs. the 0% that will still be there in two years if you wait for the perfect plan." Momentum beats completeness. Imperfect progress beats perfect paralysis.

Fixing Things Is Strategic

Every unfixed vulnerability, orphaned account, and untested backup is a compounding liability. Technical debt in security does not accrue interest linearly. It accrues catastrophically. The longer a gap exists, the more likely it becomes the entry point for an existential incident.

Fixing things is not maintenance. It is risk reduction at velocity.

Work Beats Purchases

Most organizations do not have a tools problem. They have a utilization problem. They own EDR but have 40% coverage. They own a SIEM but log only 20% of critical systems. They own a PAM solution but have not onboarded privileged accounts. They own backup software but have never tested a restore.

The antifragile consultant's first duty is not to recommend new spending. It is to extract the value already paid for.


The Five Rules

Rule 1: Start With What You Own

Before any new purchase is discussed, exhaust the capabilities of existing tooling. This is not cheapness. It is optionality preservation: every dollar not spent on redundant tooling is a dollar available for structural improvement.

| Common Underutilized Asset | What Most Organizations Do | What We Do |
| --- | --- | --- |
| Microsoft E5 / Defender suite | Buy additional EDR, SIEM, CASB | Maximize Defender for Endpoint, Sentinel, Entra ID PIM, Purview |
| Existing firewall / IDS | Buy another "next-gen" platform | Audit rules, enable logging, integrate with SOC workflow |
| Active Directory | Add third-party IAM | Cleanse accounts, implement PAWs, enforce conditional access |
| Backup solution | Buy additional DRaaS | Test restores, document runbooks, automate verification |
| CMDB / ITAM tool | Start a new discovery project | Populate with T0 assets, enforce ownership, feed security workflow |

Rule 2: Fix the Kill Chain First

Not all debt is equal. We identify the shortest sequence of failures that would end the organization—the kill chain—and we fix those nodes with extreme prejudice. Everything else waits.

This requires brutal honesty:

  • If your domain admins are logging in from workstations with email and browsing, that is the kill chain.
  • If your backups have never been restored, that is the kill chain.
  • If your cloud storage bucket is public and contains customer data, that is the kill chain.
  • If your CEO's email has no MFA, that is the kill chain.

We do not fix everything. We fix the existential things. Fast.

Rule 3: Every Fix Must Produce a Signal

A fix that does not generate intelligence is a fix that will rot. Every remediation must produce a signal: a metric, an alert, a log entry, or a structural change that prevents recurrence.

| Bad Fix | Good Fix |
| --- | --- |
| "We disabled the old account." | "We disabled the old account and implemented automated orphan detection." |
| "We patched the server." | "We patched the server and added it to automated vulnerability management." |
| "We rotated the password." | "We rotated the password and vaulted it in the PAM with checkout logging." |
| "We fixed the firewall rule." | "We fixed the firewall rule and added a monthly rule review to the change process." |

Rule 4: Run Housekeeping as a Permanent Stream

This is the rule most often acknowledged and least often followed. In every engagement, cleanup is identified as necessary. In almost no engagement is it ever finished. Stale accounts accumulate. Orphaned permissions persist. Old devices stay enrolled. Legacy protocols remain enabled because removing them requires a change window that never gets scheduled.

The correct response is not to add cleanup to the project backlog. It is to establish housekeeping as a dedicated, permanently resourced stream with its own queue, its own cadence, and its own accountability.

Housekeeping is not janitorial work. It is attack surface reduction at a structural level. Every stale account is a credential that can be compromised without detection. Every orphaned permission is a privilege escalation path that BloodHound will find. Every legacy protocol still enabled is an authentication downgrade waiting to happen. The environment accumulates new objects continuously — every employee, every project, every vendor relationship adds accounts, permissions, and configurations. Almost nothing removes them automatically. Without a permanent housekeeping stream, the attack surface grows without bound regardless of what else you fix.

What housekeeping covers:

  • Stale user accounts: departed employees, contractors, service accounts with no owner
  • Orphaned group memberships and permissions that outlasted the project that created them
  • Old app registrations and service principals — often the most overlooked and most dangerous
  • Enrolled devices that are no longer in use
  • Conditional Access policies with no named owner and no documented purpose
  • Legacy protocols: NTLM, basic authentication, SMBv1, NTLMv1 — things that should have been disabled years ago
  • DNS records for decommissioned services
  • Firewall rules added for temporary access that became permanent
  • Old GPOs, old admin rights, old certificates
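The first item on this list is the easiest to automate. The following is a minimal sketch, assuming the client can export accounts with a last sign-in timestamp and an owner field. The record shape, the 90-day idle threshold, and every name below are hypothetical and do not come from any specific directory API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Account:
    # Hypothetical export format; adapt to whatever the directory provides.
    name: str
    last_sign_in: Optional[datetime]  # None means the account never signed in
    owner: Optional[str]              # None means nobody claims the account


def housekeeping_candidates(accounts, now, max_idle_days=90):
    """Return (account name, reason) pairs for the housekeeping queue."""
    cutoff = now - timedelta(days=max_idle_days)
    findings = []
    for acct in accounts:
        if acct.owner is None:
            findings.append((acct.name, "no owner"))
        elif acct.last_sign_in is None or acct.last_sign_in < cutoff:
            findings.append((acct.name, "stale"))
    return findings


now = datetime(2026, 6, 1)
accounts = [
    Account("alice", datetime(2026, 5, 20), "alice"),
    Account("svc-legacy", datetime(2025, 1, 10), None),
    Account("contractor-7", datetime(2025, 11, 2), "bob"),
]
candidates = housekeeping_candidates(accounts, now)
print(candidates)
```

The output of a check like this is a list of candidates for review, not a list of accounts to delete: each entry still needs an owner's decision before anything is disabled.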

The engagement implication: Every module scoping conversation must include a housekeeping component. It is not optional and not deferrable. The client names a resource, a cadence (minimum monthly), and a queue. The queue is the Findings Backlog — the single place where every finding from every diagnostic and module lands, prioritised, owned, and tracked to closure. The backlog is populated from module findings and from continuous discovery tools (ASTRAL drift, PULSAR alerts, quarterly BloodHound and Elysium runs). Progress is tracked and reviewed at every steering committee. If there is no resourcing for housekeeping, the engagement model must reflect that — because every fix we make will be partially undone within 18 months by new accumulation if the stream does not exist.
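To make "prioritised, owned, and tracked to closure" concrete, here is a small sketch of a backlog held as structured records. It assumes a minimal entry of ID, finding, source, priority (P0/P1/P2), owner, and status. The class, the function names, and the sample findings are illustrative assumptions, not the actual template.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2}


@dataclass
class Finding:
    # Illustrative fields only; the real template may differ.
    finding_id: str
    finding: str
    source: str
    priority: str  # "P0", "P1" or "P2"
    owner: str     # empty string means no named owner yet
    status: str    # e.g. "open" or "closed"


def review_queue(backlog):
    """Open findings, highest priority first, for the housekeeping review."""
    open_items = [f for f in backlog if f.status != "closed"]
    return sorted(open_items, key=lambda f: PRIORITY_ORDER[f.priority])


def unowned(backlog):
    """IDs of open findings with no owner, a warning sign for the backlog."""
    return [f.finding_id for f in backlog if f.status != "closed" and not f.owner]


backlog = [
    Finding("F-003", "SMBv1 enabled on file server", "module", "P1", "infra team", "open"),
    Finding("F-001", "Domain admin without MFA", "diagnostic", "P0", "", "open"),
    Finding("F-002", "Stale DNS record", "drift tool", "P2", "netops", "closed"),
]
print([f.finding_id for f in review_queue(backlog)])
print(unowned(backlog))
```

The same two questions (what is open, in what order, and what has no owner) can be answered from a flat file or a ticketing system; the point of the sketch is only that both should be answerable mechanically at every steering committee.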


Rule 5: Build Toward Greenfield Capability

The cheapest and fastest recovery from a serious breach is often a greenfield deployment — rebuilding the environment from scratch on clean infrastructure rather than remediating a compromised one. Most organisations treat this as a nightmare scenario. The goal is to treat it as a standard operational capability exercised every five years or so — not something that wakes you up at night, but something you have done before and know how to do again.

This is the ultimate defender's power move. An attacker's leverage in a breach depends largely on your inability to walk away from the compromised environment. If you can build the parallel company and burn the old one, that leverage disappears. Ransomware becomes an inconvenience rather than an existential event. The threat model changes fundamentally.

What greenfield capability requires:

  • Everything documented as code: infrastructure configuration, security baselines, identity architecture, network topology. If you cannot rebuild it from documentation in a clean environment, you do not own it — you are renting it from accumulated history.
  • Configuration under version control: M365 policy state in ASTRAL, infrastructure definitions in IaC, runbooks in a repository. The new environment can be provisioned from the same source of truth.
  • Clean data separation: you know where your data is, what form it is in, and how to migrate it. Data that cannot be migrated cleanly is a dependency you have not acknowledged.
  • Tested migration procedures: the greenfield capability is not real until it has been exercised. Partial migrations, parallel-environment tests, and recovery drills build the muscle. Each module completion should leave the client one step closer to a documented, tested rebuild path.
  • Vendor independence at critical layers: you cannot rebuild greenfield if the new environment depends on the same compromised vendor. Optionality (Pillar 2) is the prerequisite.

The cadence target: An organisation that can execute a planned greenfield migration in 90 days — with data integrity, minimal service disruption, and full security posture — is in a structurally different risk position than one for which greenfield is theoretical. This is not a one-time project. It is a capability you build, test, and maintain.

The controlled burn: forests that are never burned accumulate the fuel for catastrophic fires. Organisations that are never greenfield-deployed accumulate technical debt, legacy dependencies, and latent compromise, all of which make eventual failure more severe. Planned greenfield on a 5-year cycle is the controlled burn that prevents the uncontrolled one.

The Critical Infrastructure Adaptation

For organisations operating OT/NT environments — power generation, transmission, water utilities, telecoms network infrastructure — a full greenfield rebuild is often genuinely not possible. Protection relays run for 30 years. PLCs controlling turbines cannot be taken offline for a rebuild exercise. Safety systems require regulatory approval for any change. The controlled burn, taken literally, cannot be applied.

The goal remains the same. The method changes.

The purpose of greenfield capability is to eliminate inherited compromise and return to a known-good operational state. In OT environments, this is achieved through a different set of moves — but the test is identical: "If our control systems were completely compromised and had to be restored, could we maintain critical service delivery and return to full automated operation from a verified baseline?"

IT layer greenfield protects the OT layer. The corporate IT environment, SCADA servers, historian, HMI workstations, and M365 tenant can almost always be made greenfield-capable even when the OT hardware cannot. When the IT layer can be rebuilt clean, an adversary who compromised it loses their persistence and pivot path without a single OT system being touched. IT greenfield is the outer defence of an OT environment that cannot be rebuilt itself.

Configuration as code for OT. PLC logic, IED settings, protection relay configurations, SCADA databases, and DCS configurations belong in version control. The ability to restore a verified configuration to existing hardware is the OT equivalent of greenfield: the hardware stays, but the software state is erased and rebuilt from a known-good baseline. Configuration backup and integrity checking for OT systems are not optional. They are the closest available substitute for the rebuild capability that IT environments take for granted. ASTRAL for M365 is the pattern; the same discipline applied to OT configuration archives is the OT equivalent.
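One way to picture integrity checking for OT configuration archives is a digest comparison against a recorded known-good baseline. The sketch below uses SHA-256 from Python's standard library; the configuration names and contents are invented for illustration, and a real archive would read vendor export files rather than in-memory bytes.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of one exported configuration."""
    return hashlib.sha256(data).hexdigest()


def verify_against_baseline(baseline, current):
    """Compare current config exports with recorded known-good digests.

    baseline: dict of config name -> SHA-256 hex digest
    current:  dict of config name -> raw exported bytes
    Returns a dict of config name -> "ok", "changed", "missing" or "unexpected".
    """
    report = {}
    for name, expected in baseline.items():
        if name not in current:
            report[name] = "missing"
        elif digest(current[name]) == expected:
            report[name] = "ok"
        else:
            report[name] = "changed"
    for name in current:
        if name not in baseline:
            report[name] = "unexpected"
    return report


# Hypothetical device exports, recorded when the baseline was verified.
baseline = {
    "relay-07.cfg": digest(b"trip_delay=1.2s"),
    "plc-turbine-2.prog": digest(b"logic-v14"),
}
current = {
    "relay-07.cfg": b"trip_delay=1.2s",
    "plc-turbine-2.prog": b"logic-v15",
    "hmi-scratch.tmp": b"x",
}
report = verify_against_baseline(baseline, current)
print(report)
```

A "changed" result is not automatically a compromise, since it may be an approved change that never reached the archive, but either way it means the baseline and the plant have diverged and one of them needs correcting.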

Manual operation capability is a form of "drop the compromised layer." A power utility that can maintain 80% of service from manual procedures during a SCADA compromise has a fundamentally different risk profile than one that cannot. The ability to operate without the automation layer is, in effect, the ability to sacrifice the compromised layer and continue. Manual override procedures, validated quarterly, are the OT sector's equivalent of a tested greenfield playbook. If operators have not practised running manually in the past 12 months, the capability does not exist.

Compartmentalisation over total rebuild. OT environments are often sectionable. Grid islanding, corridor isolation, plant-level segmentation, and control centre failover allow the operator to sacrifice a section while maintaining critical service elsewhere. The burn is localised rather than total — but the principle is the same: designed-in ability to contain, recover, and restore in sequence rather than all at once.

Long-cycle planned refresh. OT systems have 20 to 40 year lifetimes, but those lifetimes should be planned, not accidental. A utility with a documented 20-year OT refresh programme (component-by-component replacement milestones, firmware escrow, spare parts inventory) is doing the OT equivalent of periodic greenfield: the environment is continuously re-established in controlled segments. Organisations that do not have this programme are not avoiding greenfield; they are deferring it until a crisis forces it under the worst possible conditions.

What the test looks like for OT: "If our SCADA and IT layers were fully compromised tonight, could we maintain critical service from manual procedures within 4 hours, rebuild the IT layer from clean baselines within 48 hours, and restore full automated operation from verified OT configuration backups within two weeks?" If any of those answers is no, the gap is in manual procedures, IT rebuild capability, or OT configuration management — not in greenfield per se, but in the prerequisites that make any form of recovery possible.
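The three time targets in that test can be checked against drill results. This is a rough sketch: the targets are the ones stated above (4 hours, 48 hours, two weeks), while the step names and the sample drill figures are hypothetical.

```python
# Targets from the test above, expressed in hours.
TARGET_HOURS = {
    "manual_operation": 4,
    "it_rebuild": 48,
    "ot_restore": 14 * 24,
}


def recovery_gaps(measured_hours):
    """List every recovery step that missed its target or was never drilled."""
    gaps = []
    for step, target in TARGET_HOURS.items():
        measured = measured_hours.get(step)
        if measured is None:
            gaps.append(f"{step}: never exercised")
        elif measured > target:
            gaps.append(f"{step}: {measured}h against a {target}h target")
    return gaps


# Hypothetical results from the most recent drills.
drill = {"manual_operation": 3, "it_rebuild": 72}
gaps = recovery_gaps(drill)
print(gaps)
```

Treating an unexercised step as a gap rather than an unknown mirrors the rule stated earlier: a capability that has not been practised does not exist.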

For the full OT/critical infrastructure treatment, see Vertical: Power and Utilities.


Mapping to Antifragile Pillars

| Antifragile Pillar | Move Fast and Fix Things Expression |
| --- | --- |
| Structural Decoupling | Identify and eliminate hidden dependencies before they become fatal. Greenfield capability is the ultimate expression: if you can rebuild cleanly, no single vendor or compromise holds you hostage. |
| Optionality Preservation | Maximize existing investments to preserve budget for strategic optionality. Greenfield deployment requires vendor independence at every critical layer: build and maintain that independence now. |
| Stress-to-Signal Conversion | Every fix must generate telemetry. Incidents are not failures; they are unpaid penetration tests. Convert their lessons into structure. |
| Sovereign Intelligence | Use what you own first. Your data, your configurations, your runbooks: all under version control, all portable, all yours. Housekeeping keeps it clean. Greenfield capability proves it. |
| Asymmetric Payoff Design | Small, fast fixes on the kill chain yield disproportionate risk reduction. Housekeeping and greenfield capability are the highest-leverage long-term investments: small ongoing cost, enormous reduction in catastrophic risk. |

Mapping to Standards

We do not treat compliance as the goal. We treat it as a side effect of doing the right things fast.

| Standard | How We Map |
| --- | --- |
| CIS Controls v8 | IG1 is the floor, not the ceiling. We aim for IG1 completeness in 90 days because it is the minimum viable security posture. See CIS Controls Mapping. |
| NIST CSF 2.0 | We align to Identify, Protect, Detect, Respond, Recover, but we emphasize GOVERN as the missing piece in most organizations. See NIST CSF Mapping. |
| ISO 27001 | Annex A controls are addressed through the kill chain-first methodology, not checklist compliance. |
| DORA / NIS2 | Operational resilience and ICT risk management are natural outcomes of the antifragile rapid-modernisation approach. |

The Consultant's Stance

When you walk into a client environment, bring these assumptions:

  1. They already own enough software. Your job is to configure, integrate, and operationalize—not to shop.
  2. Their technical debt is worse than they admit. Your job is to find the kill chain and fix it without shaming.
  3. Speed builds trust. A visible fix in week one is worth more than a perfect report in week twelve.
  4. Honesty is the product. You are not a reseller. You are an independent advisor. Say what you would do with your own company's data.

The Opening Pitch

"Most consultants will sell you a shopping list. We start with what you already bought. Our job is to find the gaps that matter, fix them fast, and make sure they stay fixed. We move fast. We fix things. And we do it with the tools you already own."


Engagement Principles

Week 1: Brutal Honesty Audit

  • Inventory existing tooling and its utilization rate
  • Identify the kill chain
  • Pick three fixes that can be completed before the next steering committee
  • Execute them
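The first bullet, utilization rate, is simple arithmetic: what is actually covered divided by what should be. A minimal sketch follows, with invented counts chosen to mirror the kind of figures quoted earlier (40% EDR coverage, 20% of critical systems logging); the tool labels and numbers are assumptions, not client data.

```python
def utilisation(covered, in_scope):
    """Percentage of in-scope assets that a tool actually covers."""
    if in_scope == 0:
        raise ValueError("in_scope must be greater than zero")
    return round(100 * covered / in_scope)


# Hypothetical week-one inventory: tool -> (covered, in scope).
inventory = {
    "EDR agents": (400, 1000),
    "SIEM log sources": (12, 60),
    "PAM onboarded accounts": (0, 85),
}

report = {tool: utilisation(c, total) for tool, (c, total) in inventory.items()}
worst_first = sorted(report, key=report.get)
print(report)
print(worst_first)
```

Sorting worst-first gives a starting order for the conversation about which existing tool to fix before any new purchase is discussed.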

Month 1: Momentum Through Visibility

  • Show the client what they could not see before
  • Close the highest-risk gaps
  • Demonstrate value from existing tools
  • Build political capital for harder changes

Quarter 1: Structural Change

  • Convert fixes into process
  • Automate detection and response
  • Establish the antifragile feedback loop: incident → learning → structure

The AI Distraction

There is a recurring pattern in security consulting: a client opens with "we want AI-powered threat detection" or "can AI help us with our security posture?" and the instinct — especially from vendors — is to say yes and start selling.

The correct response is to ask: "Do your domain admins have MFA enforced?"

We call this pattern the AI Mythos: the belief that intelligence-layer tooling is the primary answer to security problems. It is not. AI is a multiplier. A multiplier applied to an absent foundation produces nothing. An AI-powered SOC that generates alerts from a network with no MFA, no patching cadence, and no tested backups is generating expensive noise about a patient who already has a terminal condition.

The Multiplier Principle

Security capabilities stack in layers. Each layer requires the layer below it to function.

Foundation   → Identity hygiene, endpoint coverage, patching, tested backups, basic logging
Signal       → Logging turned on, SIEM ingesting the right sources, alerts with owners
Intelligence → Detection engineering, threat hunting, AI-assisted analysis

AI lives at layer three. Organisations that have not completed layer one do not benefit from layer three — they buy something that has nothing to amplify.
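The layering can be expressed as a gate: spend goes to the lowest layer that still has an unmet control. The sketch below is one possible encoding; the control names are shorthand for the items in the diagram above, and the sample state is hypothetical.

```python
# Layers in dependency order; control names are illustrative shorthand.
LAYERS = [
    ("Foundation", ["mfa_enforced", "endpoints_covered", "patching_cadence",
                    "backups_tested", "basic_logging"]),
    ("Signal", ["siem_ingesting_key_sources", "alerts_have_owners"]),
    ("Intelligence", ["detection_engineering", "threat_hunting"]),
]


def next_layer_to_fund(state):
    """Return the lowest layer with unmet controls, plus those controls."""
    for layer, controls in LAYERS:
        missing = [c for c in controls if not state.get(c, False)]
        if missing:
            return layer, missing
    return None, []


# A client asking for AI detection with MFA and restore testing still open.
state = {
    "mfa_enforced": False,
    "endpoints_covered": True,
    "patching_cadence": True,
    "backups_tested": False,
    "basic_logging": True,
    "siem_ingesting_key_sources": True,
}
print(next_layer_to_fund(state))
```

Because the function stops at the first incomplete layer, progress on the Signal layer in this example does not change the answer: the Foundation gaps come first.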

The test: Ask "what would have stopped this breach?" For the overwhelming majority of incidents — credential theft, ransomware, insider threat, misconfiguration exploitation — the answer is a layer-one control: MFA, patched systems, least-privilege accounts, a working backup. Not AI detection. Not an AI SOC. Not AI-powered SIEM correlation.

The CIS Controls make this explicit. IG1 — 56 safeguards covering basic inventory, secure configuration, data protection, account management, patching, and backup — is the minimum viable security posture. Every organisation should complete IG1 before spending money on anything above it. AI-powered security tools are not IG1 controls. They are IG3 multipliers applied to an IG1 foundation.

What to Do When a Client Leads with AI

The client who opens with AI is not wrong to want it. They are wrong about sequencing. Your job is to redirect without dismissing.

The redirect:

"AI security tools are most valuable when you have a strong signal to amplify. The fastest path to benefiting from AI is making sure the basics are right first — because AI on a broken foundation is just expensive noise. Let's start with the Brownhat Diagnostic, find your kill chain, and close the gaps that AI can't compensate for. Then you'll actually get value from the AI layer on top."

This reframes AI as a reward for good hygiene, not a substitute for it. It respects the client's interest in AI while directing the budget where it produces real risk reduction.

The Sequencing Rule

The antifragile pillars are not equally weighted at the start of an engagement. They are sequenced:

  1. Structural Decoupling (Pillar 1) and Optionality Preservation (Pillar 2) are foundations — you establish these first by mapping and removing dangerous dependencies.
  2. Stress-to-Signal Conversion (Pillar 3) requires having something to instrument — logging, monitoring, telemetry. This is layer two.
  3. Sovereign Intelligence (Pillar 4) — AI sovereignty, local models, owned cognitive infrastructure — presupposes that you have a foundation worth protecting and a signal worth amplifying. It is not the starting point.
  4. Asymmetric Payoff Design (Pillar 5) is the lens applied throughout — concentrate effort where failure is existential.

A client excited about Pillar 4 who has not addressed Pillar 1 is building a sophisticated roof on a house with no walls.

What "Move Fast" Means Here

Moving fast does not mean buying AI tools quickly. It means closing the kill chain quickly — with unglamorous, proven controls that stop breaches:

  • Enforce MFA on every account. Today.
  • Patch internet-facing systems. This week.
  • Verify that backups restore. This month.
  • Remove stale privileged accounts. In week one.
  • Turn on logging where it is off. Before anything else.

These are not interesting. They are not cutting-edge. They are the interventions that would have prevented most of the incidents in the headlines. The AI tools that make headlines did not prevent those incidents.


When the Vulnerability Surface Is Effectively Infinite

Recent AI-assisted security research — including large-scale automated vulnerability discovery across entire software stacks — has surfaced a reality that was always true but is now undeniable: the number of exploitable vulnerabilities in any complex environment exceeds any organisation's capacity to patch them. This is not a new problem. It is a shift in visibility. The vulnerabilities existed before. We can now find them faster than we can fix them.

The vendor response to this is predictable: "You need AI-assisted patching." Faster discovery paired with faster remediation, AI all the way down.

This is the wrong frame. It accepts a race you cannot win.

The Architectural Response

The correct response to an effectively infinite vulnerability surface is not to patch faster. It is to move to a realm where most vulnerabilities matter less — by designing systems architecturally so that the exploitation of any single vulnerability does not lead to existential compromise.

This is not a new idea. It is the fundamental premise of defence in depth, blast radius limitation, and kill chain thinking. What has changed is the urgency: when AI can identify thousands of vulnerabilities across your stack in hours, the "patch-first" strategy is exposed as insufficient. The architectural strategy becomes the only viable long-term position.

The moves:

Kill chain awareness — Not every CVE is existential. The ones that matter are the ones that sit on the path from "nothing bad has happened yet" to "the organisation cannot operate." Concentrate protection there. A critical vulnerability in a segmented, non-privileged system is a low-priority finding. The same vulnerability on a domain controller, a backup server, or an OT control system is P0. The vulnerability is the same; the kill chain position is what changes the priority.

Blast radius limitation — Segmentation, least privilege, and structural decoupling mean that exploiting a vulnerability in one component cannot pivot freely through the environment. A flat network with over-privileged accounts converts every vulnerability into a potential total compromise. A segmented, least-privilege environment converts most vulnerabilities into limited-scope incidents.

Assume breach posture — Design for rapid detection and recovery rather than prevention of every entry. If architectural controls are in place, a compromised component is an isolated incident, not a catastrophe. The question shifts from "how do we keep attackers out?" to "how quickly do we detect, contain, and recover?" This is Pillar 3 (Stress-to-Signal Conversion) applied to the vulnerability layer.

Known-good baseline — Configuration management (ASTRAL) and system state tracking mean that after a compromise, you can restore to a verified baseline. The ability to rebuild rapidly from a known-good state reduces the cost of successful exploitation dramatically.

What This Means for Prioritisation

When clients ask how to respond to the AI vulnerability discovery story, the answer is not a new patching tool. It is a sequenced architectural programme:

  1. Map and close the kill chain — the vulnerabilities that sit on the path to existential compromise get patched first, regardless of CVSS score.
  2. Reduce blast radius — segmentation and least privilege limit the value of any single exploit.
  3. Build detection and recovery capability — assume some vulnerabilities will be exploited; make exploitation detectable and recoverable.
  4. Then consider tooling to accelerate patch velocity for the long tail.

The correct posture is: a well-segmented, least-privilege, T0-protected environment with fast recovery capability survives more CVEs than a flat, over-privileged environment with a fast patch programme. Architecture beats velocity in the vulnerability race. It is the only bet you can actually win.
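A sketch of what "kill chain position changes the priority" can look like in triage follows. The P0 rule comes from the text above; treating an unsegmented, off-kill-chain asset as P1 is an assumption made for this example, and the identifiers are placeholders rather than real CVEs.

```python
from dataclasses import dataclass


@dataclass
class Vuln:
    ref: str             # placeholder identifier, not a real CVE
    cvss: float
    asset: str
    on_kill_chain: bool  # domain controller, backup server, OT control, ...
    segmented: bool      # asset sits in a segmented, least-privilege zone


def priority(v):
    """Rank by kill chain position first; CVSS only breaks ties."""
    if v.on_kill_chain:
        return "P0"
    if not v.segmented:
        return "P1"  # assumption: flat-network exposure ranks above segmented
    return "P2"


vulns = [
    Vuln("EXAMPLE-1", 9.8, "segmented kiosk", False, True),
    Vuln("EXAMPLE-2", 6.5, "domain controller", True, False),
    Vuln("EXAMPLE-3", 7.5, "flat-network file share", False, False),
]

ordered = sorted(vulns, key=lambda v: (priority(v), -v.cvss))
print([(v.ref, priority(v)) for v in ordered])
```

In this example the 6.5 on the domain controller is patched before the 9.8 on the segmented kiosk, which is the point of the first step in the programme: kill chain position first, regardless of CVSS score.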


Contrast With "Move Fast and Break Things"

The Silicon Valley mantra was an excuse for externalizing harm. "Move fast and fix things" is its responsible successor:

| Move Fast and Break Things | Move Fast and Fix Things |
| --- | --- |
| Ship now, fix later | Fix now, ship sustainably |
| Externalize risk to users | Internalize risk and reduce it |
| Growth at all costs | Resilience as the foundation of growth |
| Ignore technical debt | Pay down the highest-interest debt first |
| Disrupt without accountability | Build trust through visible repair |

Next: CIS Controls Mapping
Previous: Antifragile Manifest