feat: Add critical infrastructure adaptation for Rule 5 (greenfield)

move-fast-and-fix-things.md: 'The Critical Infrastructure Adaptation' section in Rule 5. OT/NT environments where full greenfield is impossible. Five-layer adapted stack: IT greenfield protects OT, OT config as code, manual operation as fallback, compartmentalisation as partial burn, long-cycle planned refresh. OT greenfield test with 4h/48h/2w targets.

vertical-power-utilities.md: New 'The Controlled Burn Adaptation' section. Full treatment of when greenfield is not an option. Five-layer OT-adapted stack. Explicit acceptance statement framework for genuinely irreplaceable OT components (name, isolate, monitor, plan replacement). The OT greenfield test. Reference back to Rule 5.

Co-Authored-By: Tom Kracmar <tom+claude@cat6.cz>
2026-06-05 06:58:07 +00:00
parent a337af7ddf
commit bcebf8ebb3
2 changed files with 69 additions and 0 deletions
@@ -279,6 +279,53 @@ See [M365 E3 Hardening](../playbooks/m365-e3-hardening.md) for tactical hardenin

 ---

+## The Controlled Burn Adaptation: When Greenfield Is Not an Option
+
+The antifragile framework holds that organisations should build toward the ability to deploy greenfield — rebuild from scratch, on clean infrastructure, from version-controlled configuration. This is the ultimate expression of structural decoupling: if you can rebuild the environment, no adversary and no vendor holds you hostage.
+
+Power utilities, water suppliers, and telecom network operators frequently view this principle as inapplicable. The grid does not go dark for a rebuild exercise. Protection relays cannot be factory-reset during a fault. OT systems operate under safety cases that require regulatory approval for any configuration change. The controlled burn, taken literally, cannot happen.
+
+This is correct. It is also not the end of the conversation.
+
+**The goal of greenfield capability is to eliminate inherited compromise and return to a known-good operational state.** For IT environments, the method is rebuild. For OT/NT environments, the method is different — but the goal is identical, and it is achievable. The absence of a literal rebuild path does not justify the absence of a recovery plan.
+
+### The OT-Adapted Greenfield Stack
+
+**Layer 1: IT greenfield protects OT.** The corporate IT environment, M365 tenant, SCADA servers, historian, engineering workstations, and HMI layer can almost always be made greenfield-capable even when OT hardware cannot. When the IT layer can be rebuilt from clean baselines, an adversary who compromises it loses both persistence and the pivot path into OT without a single OT device being touched. IT greenfield is the outer perimeter of an OT environment that cannot itself be rebuilt. This is the first investment.
+
+**Layer 2: OT configuration as code.** PLC logic, IED settings files, protection relay configuration archives, SCADA database snapshots, DCS export files — all of these belong in version-controlled backups with integrity verification. The ability to restore a known-good configuration to existing hardware is the OT equivalent of greenfield: the hardware remains, but the software state is wiped and rebuilt from a verified baseline. This is not a backup exercise. It is a discipline — with the same rigour that ASTRAL applies to M365 configuration, applied to OT configuration archives. Every piece of OT configuration that exists only in the device and nowhere else is a single point of failure.
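
As an illustration of "integrity verification" for a configuration archive, the following is a minimal sketch, not a prescribed tool. It assumes the exported OT configuration files (relay settings, PLC logic, SCADA snapshots) sit in one directory and that the manifest of SHA-256 digests is stored outside that directory, for example in version control. The function names `build_manifest` and `verify_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its digest (the recorded baseline)."""
    return {
        p.relative_to(archive_dir).as_posix(): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(archive_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the archive on disk against a previously recorded manifest."""
    current = build_manifest(archive_dir)
    return {
        # Files in the baseline that are no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Files present now that the baseline never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
        # Files whose content no longer matches the baseline digest.
        "modified": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A restore would only proceed from an archive for which all three lists come back empty. Anything else means the "known-good" baseline is not what it claims to be.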
+
+**Layer 3: Manual operation as the fallback layer.** The ability to operate critical systems without the automation layer is, in practice, the ability to drop the compromised layer and continue service. A power utility that can maintain 70–80% of service from manual procedures during a SCADA compromise has a fundamentally different risk profile than one that cannot. Manual override procedures must be:
+- Documented in detail, not just referenced in an emergency plan
+- Tested under realistic conditions, not just reviewed in a tabletop
+- Known by currently assigned operations staff, not just veterans who may have left
+- Validated at least annually — capability that is not practised does not exist when it is needed
+
+**Layer 4: Compartmentalisation as partial burn.** OT environments are typically sectionable. Grid islanding, substation isolation, plant-level control separation, and control centre failover allow the operator to sacrifice and rebuild one section while maintaining critical service in others. This is the OT equivalent of the controlled burn: localised rather than total, sequential rather than simultaneous, but governed by the same principle — designed-in ability to contain, recover, and restore without waiting for a complete environment to be clean.
+
+**Layer 5: Planned long-cycle refresh.** OT systems have 20–40 year operational lifetimes, but those lifetimes should be a programme, not an accident. Organisations without a documented OT refresh schedule — with component-by-component replacement milestones, firmware escrow requirements, spare parts inventory targets, and vendor succession planning — are not avoiding greenfield. They are deferring it until a crisis forces it under the worst possible conditions: compromised hardware, unavailable vendors, missing documentation, and no tested procedures.
+
+### The Acceptance Statement
+
+Some OT components in critical infrastructure genuinely cannot be replaced on any timescale that security planning can influence. Legacy protection relays on operational transmission lines. Nuclear instrumentation systems under active safety cases. Water treatment chemical dosing controllers that predate the organisation's current IT function.
+
+For these systems, the correct position is explicit acceptance, not avoidance:
+
+1. **Name them.** Identify specifically which systems are outside the rebuild envelope and why.
+2. **Isolate them.** Isolation must be proportional to how unrepairable the system is acknowledged to be. A system that cannot be patched, replaced, or rebuilt must be surrounded by compensating controls thorough enough that its compromise cannot propagate.
+3. **Monitor them obsessively.** Configuration integrity monitoring, network traffic baselining, and anomaly detection for these specific systems — because when you cannot fix the asset, detection and containment are the only remaining defences.
+4. **Plan their eventual replacement.** "This system cannot be replaced in the current operational context" is acceptable. "This system will never be replaced" is not a security posture — it is a deferred decision that will be made under worse conditions later.
+
+The acceptance statement is not a sign of weakness. It is the honest foundation of a credible security programme. Regulators, insurers, and incident responders all prefer an organisation that knows exactly where its limits are and has compensating controls in place over one that claims no limits and has no plan.
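
The four-part acceptance statement above can be kept as a structured record, so that an incomplete statement is visible rather than implied. The sketch below is one possible shape under stated assumptions: the class name `AcceptanceStatement`, its field names, and the `gaps` check are all hypothetical and not taken from any standard or existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceStatement:
    """One record per system declared outside the rebuild envelope."""
    system_name: str                     # 1. Name: which system
    reason_outside_rebuild: str          # 1. Name: why it cannot be rebuilt
    isolation_controls: list[str] = field(default_factory=list)   # 2. Isolate
    monitoring_controls: list[str] = field(default_factory=list)  # 3. Monitor
    replacement_plan: str = ""           # 4. Plan eventual replacement

    def gaps(self) -> list[str]:
        """Return the parts of the statement that are still missing."""
        problems = []
        if not self.system_name.strip() or not self.reason_outside_rebuild.strip():
            problems.append("name: system and reason must both be stated")
        if not self.isolation_controls:
            problems.append("isolate: no compensating controls listed")
        if not self.monitoring_controls:
            problems.append("monitor: no monitoring listed")
        if not self.replacement_plan.strip():
            problems.append("plan: no replacement plan recorded")
        return problems
```

A statement with an empty `gaps()` list is not proof that the controls are adequate. It only shows that each of the four questions has been answered explicitly.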
+
+### The OT Greenfield Test
+
+*"If our IT and SCADA layers were fully compromised tonight: could we maintain critical service from manual procedures within 4 hours? Rebuild the IT layer from clean baselines within 48 hours? Restore full automated operation from verified OT configuration backups within two weeks? And have we actually tested each of these in the past 12 months?"*
+
+If any answer is no, the gap is in manual procedures, IT rebuild capability, OT configuration management, or test cadence, not in the impossibility of rebuilding the OT environment itself.
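
The test above reduces to three time targets and one recency check, which can be tracked mechanically. The following sketch assumes each capability is exercised separately and that "the past 12 months" is approximated as 365 days. The names `TARGETS`, `Exercise`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Targets taken from the test question: 4 hours, 48 hours, two weeks.
TARGETS = {
    "manual operation of critical service": timedelta(hours=4),
    "IT layer rebuild from clean baselines": timedelta(hours=48),
    "automated operation restored from OT config backups": timedelta(weeks=2),
}

# Assumption: "past 12 months" approximated as 365 days.
MAX_TEST_AGE = timedelta(days=365)


@dataclass
class Exercise:
    """Outcome of the most recent test of one capability."""
    achieved: timedelta   # how long the exercise actually took
    tested_on: date       # when the exercise was run


def evaluate(results: dict[str, Exercise], today: date) -> list[str]:
    """Return one finding per capability that is untested, stale, or too slow."""
    findings = []
    for name, target in TARGETS.items():
        exercise = results.get(name)
        if exercise is None:
            findings.append(f"{name}: never tested")
        elif today - exercise.tested_on > MAX_TEST_AGE:
            findings.append(f"{name}: last test older than 12 months")
        elif exercise.achieved > target:
            findings.append(f"{name}: took {exercise.achieved}, target {target}")
    return findings
```

An empty result means every answer to the test is currently "yes". Each finding points at one of the gaps named above rather than at the OT environment as a whole.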
+
+---
+
 ## Evidence Package for Regulators

 | Requirement | Evidence from Antifragile Program |