diff --git a/antifragile-consulting/core/move-fast-and-fix-things.md b/antifragile-consulting/core/move-fast-and-fix-things.md index 4ffd183..7661cf7 100644 --- a/antifragile-consulting/core/move-fast-and-fix-things.md +++ b/antifragile-consulting/core/move-fast-and-fix-things.md @@ -129,6 +129,28 @@ This is the ultimate defender's power move. An attacker's leverage in a breach d **The controlled burn**: forests that are never burned accumulate the fuel for catastrophic fires. Organisations that are never greenfield-deployed accumulate technical debt, legacy dependencies, and accumulated compromise that makes eventual failure more severe. Planned greenfield on a 5-year cycle is the controlled burn that prevents the uncontrolled one. +### The Critical Infrastructure Adaptation + +For organisations operating OT/NT environments — power generation, transmission, water utilities, telecoms network infrastructure — a full greenfield rebuild is often genuinely not possible. Protection relays run for 30 years. PLCs controlling turbines cannot be taken offline for a rebuild exercise. Safety systems require regulatory approval for any change. The controlled burn, taken literally, cannot be applied. + +The goal remains the same. The method changes. + +**The purpose of greenfield capability is to eliminate inherited compromise and return to a known-good operational state.** In OT environments, this is achieved through a different set of moves — but the test is identical: *"If our control systems were completely compromised and had to be restored, could we maintain critical service delivery and return to full automated operation from a verified baseline?"* + +**IT layer greenfield protects the OT layer.** The corporate IT environment, SCADA servers, historian, HMI workstations, and M365 tenant can almost always be made greenfield-capable even when the OT hardware cannot. When the IT layer can be rebuilt clean, an adversary who compromised it loses their persistence and pivot path without a single OT system being touched. IT greenfield is the outer defence of an OT environment that cannot be rebuilt itself. + +**Configuration as code for OT.** PLC logic, IED settings, protection relay configurations, SCADA databases, and DCS configurations belong in version control. The ability to restore a verified configuration to existing hardware is the OT equivalent of greenfield: the hardware stays, but the software state is erased and rebuilt from a known-good baseline. Configuration backup and integrity checking for OT systems is not optional — it is the closest available substitute for the rebuild capability that IT environments take for granted. ASTRAL for M365 is the pattern; the same discipline applied to OT configuration archives is the OT equivalent. + +**Manual operation capability is a form of "drop the compromised layer."** A power utility that can maintain 80% of service from manual procedures during a SCADA compromise has a fundamentally different risk profile than one that cannot. The ability to operate without the automation layer is, in effect, the ability to sacrifice the compromised layer and continue. Manual override procedures, validated quarterly, are the OT sector's equivalent of a tested greenfield playbook. If operators have not practised running manually in the past 12 months, the capability does not exist. + +**Compartmentalisation over total rebuild.** OT environments are often sectionable. Grid islanding, corridor isolation, plant-level segmentation, and control centre failover allow the operator to sacrifice a section while maintaining critical service elsewhere. The burn is localised rather than total — but the principle is the same: designed-in ability to contain, recover, and restore in sequence rather than all at once. + +**Long-cycle planned refresh.** OT systems have 20–40 year lifetimes, but those lifetimes should be planned, not accidental. A utility with a documented 20-year OT refresh programme — component-by-component replacement milestones, firmware escrow, spare parts inventory — is doing the OT equivalent of periodic greenfield: the environment is continuously re-established in controlled segments. Organisations that do not have this programme are not avoiding greenfield; they are deferring it until a crisis forces it under the worst possible conditions. + +**What the test looks like for OT**: *"If our SCADA and IT layers were fully compromised tonight, could we maintain critical service from manual procedures within 4 hours, rebuild the IT layer from clean baselines within 48 hours, and restore full automated operation from verified OT configuration backups within two weeks?"* If any of those answers is no, the gap is in manual procedures, IT rebuild capability, or OT configuration management — not in greenfield per se, but in the prerequisites that make any form of recovery possible. + +For the full OT/critical infrastructure treatment, see [Vertical: Power and Utilities](../reference/vertical-power-utilities.md). + --- ## Mapping to Antifragile Pillars diff --git a/antifragile-consulting/reference/vertical-power-utilities.md b/antifragile-consulting/reference/vertical-power-utilities.md index 50a447c..5424c44 100644 --- a/antifragile-consulting/reference/vertical-power-utilities.md +++ b/antifragile-consulting/reference/vertical-power-utilities.md @@ -279,6 +279,53 @@ See [M365 E3 Hardening](../playbooks/m365-e3-hardening.md) for tactical hardenin --- +## The Controlled Burn Adaptation: When Greenfield Is Not an Option + +The antifragile framework holds that organisations should build toward the ability to deploy greenfield — rebuild from scratch, on clean infrastructure, from version-controlled configuration. This is the ultimate expression of structural decoupling: if you can rebuild the environment, no adversary and no vendor holds you hostage. + +Power utilities, water suppliers, and telecom network operators frequently view this principle as inapplicable. The grid does not go dark for a rebuild exercise. Protection relays cannot be factory-reset during a fault. OT systems operate under safety cases that require regulatory approval for any configuration change. The controlled burn, taken literally, cannot happen. + +This is correct. It is also not the end of the conversation. + +**The goal of greenfield capability is to eliminate inherited compromise and return to a known-good operational state.** For IT environments, the method is rebuild. For OT/NT environments, the method is different — but the goal is identical, and it is achievable. The absence of a literal rebuild path does not justify the absence of a recovery plan. + +### The OT-Adapted Greenfield Stack + +**Layer 1: IT greenfield protects OT.** The corporate IT environment, M365 tenant, SCADA servers, historian, engineering workstations, and HMI layer can almost always be made greenfield-capable even when OT hardware cannot. An adversary who compromises the IT layer and finds a clean rebuild path loses their persistence and pivot path without a single OT device being touched. IT greenfield is the outer perimeter of an OT environment that cannot be rebuilt itself. This is the first investment. + +**Layer 2: OT configuration as code.** PLC logic, IED settings files, protection relay configuration archives, SCADA database snapshots, DCS export files — all of these belong in version-controlled backups with integrity verification. The ability to restore a known-good configuration to existing hardware is the OT equivalent of greenfield: the hardware remains, but the software state is wiped and rebuilt from a verified baseline. This is not a backup exercise. It is a discipline — with the same rigour that ASTRAL applies to M365 configuration, applied to OT configuration archives. Every piece of OT configuration that exists only in the device and nowhere else is a single point of failure. + +**Layer 3: Manual operation as the fallback layer.** The ability to operate critical systems without the automation layer is, in practice, the ability to drop the compromised layer and continue service. A power utility that can maintain 70–80% of service from manual procedures during a SCADA compromise has a fundamentally different risk profile than one that cannot. Manual override procedures must be: +- Documented in detail, not just referenced in an emergency plan +- Tested under realistic conditions, not just reviewed in a tabletop +- Known by currently assigned operations staff, not just veterans who may have left +- Validated at least annually — capability that is not practised does not exist when it is needed + +**Layer 4: Compartmentalisation as partial burn.** OT environments are typically sectionable. Grid islanding, substation isolation, plant-level control separation, and control centre failover allow the operator to sacrifice and rebuild one section while maintaining critical service in others. This is the OT equivalent of the controlled burn: localised rather than total, sequential rather than simultaneous, but governed by the same principle — designed-in ability to contain, recover, and restore without waiting for a complete environment to be clean. + +**Layer 5: Planned long-cycle refresh.** OT systems have 20–40 year operational lifetimes, but those lifetimes should be a programme, not an accident. Organisations without a documented OT refresh schedule — with component-by-component replacement milestones, firmware escrow requirements, spare parts inventory targets, and vendor succession planning — are not avoiding greenfield. They are deferring it until a crisis forces it under the worst possible conditions: compromised hardware, unavailable vendors, missing documentation, and no tested procedures. + +### The Acceptance Statement + +Some OT components in critical infrastructure genuinely cannot be replaced on any timescale that security planning can influence. Legacy protection relays on operational transmission lines. Nuclear instrumentation systems under active safety cases. Water treatment chemical dosing controllers that predate the organisation's current IT function. + +For these systems, the correct position is explicit acceptance, not avoidance: + +1. **Name them.** Identify specifically which systems are outside the rebuild envelope and why. +2. **Isolate them.** The isolation must be proportional to the acknowledged unrepairability. A system that cannot be patched, cannot be replaced, and cannot be rebuilt must be surrounded by compensating controls so thorough that its compromise cannot propagate. +3. **Monitor them obsessively.** Configuration integrity monitoring, network traffic baselining, and anomaly detection for these specific systems — because when you cannot fix the asset, detection and containment are the only remaining defences. +4. **Plan their eventual replacement.** "This system cannot be replaced in the current operational context" is acceptable. "This system will never be replaced" is not a security posture — it is a deferred decision that will be made under worse conditions later. + +The acceptance statement is not a sign of weakness. It is the honest foundation of a credible security programme. Regulators, insurers, and incident responders all prefer an organisation that knows exactly where its limits are and has compensating controls in place over one that claims no limits and has no plan. + +### The OT Greenfield Test + +*"If our IT and SCADA layers were fully compromised tonight: could we maintain critical service from manual procedures within 4 hours? Rebuild the IT layer from clean baselines within 48 hours? Restore full automated operation from verified OT configuration backups within two weeks? And have we actually tested each of these in the past 12 months?"* + +If any answer is no, the gap is in manual procedures, IT rebuild capability, OT configuration management, or test cadence — not in the impossibility of the OT environment itself. + +--- + ## Evidence Package for Regulators | Requirement | Evidence from Antifragile Program |