Vertical Reference: Power and Utilities
"The grid does not care about your quarterly targets. It cares whether you understood the boundary between IT and operations before the adversary did."
This document adapts the antifragile rapid modernisation approach for power generation, transmission, distribution, and water utilities. These organizations operate industrial control systems (ICS/SCADA) where safety and availability are paramount, regulatory oversight is intense, and the convergence of IT and OT creates existential attack surfaces.
The Power and Utility Context
What Makes This Sector Different
| Factor | Enterprise Default | Power/Utility Reality |
|---|---|---|
| Downtime tolerance | Hours | Seconds to minutes (protection systems); hours for generation |
| Safety impact | Data loss, financial harm | Physical harm, loss of life, environmental catastrophe |
| System lifetime | 3-5 years | 20-40 years (generation, transmission, protection relays) |
| Regulatory driver | GDPR, industry standards | NIS2, CER, IEC 62351, NERC CIP (North America), national energy regulators |
| OT/IT boundary | Often porous or nonexistent | Legally and physically mandated; convergence is the primary risk |
| Supply chain | Moderate depth | Extreme (multi-vendor, multi-national, obsolete equipment) |
| Remote access | Common, convenient | Heavily restricted; often requires physical presence or dedicated lines |
The IT/OT Convergence Problem
Power utilities historically operated OT networks (SCADA, EMS, DMS, protection relays) as air-gapped systems. Over the past two decades, convergence has introduced:
- Remote diagnostics over internet-connected VPNs
- Centralized patch management through IT SCCM/WSUS
- Business intelligence systems reading OT historian data
- Vendor remote support terminals in control centers
- Smart grid and Advanced Metering Infrastructure (AMI) connecting customer-facing IT to grid operations
Every convergence point is a potential bridge for adversaries from IT to OT.
The executive framing:
"Your control room does not need email. Your protection relays do not need internet access. Every connection between your IT network and your operational technology is a connection an adversary can cross. We are not adding bureaucracy. We are re-establishing the boundary that keeps the lights on."
Regulatory Landscape
EU NIS2 Directive (2023)
Power utilities and water suppliers are classified as essential entities under NIS2.
| NIS2 Requirement | Power/Utility Application |
|---|---|
| Risk management measures | Kill chain analysis for IT→OT bridges; physical security assessment |
| Supply chain security | Vendor access inventory for all OT equipment; firmware provenance tracking |
| Incident reporting (24h early warning, 72h incident notification) | Automated detection and reporting to national CSIRT and energy regulator |
| Business continuity | Black start capability; grid islanding procedures; manual override validation |
| Cryptography | Encrypted communications for all IT/OT integration points |
| MFA | Hardware tokens for all remote access to OT or critical IT systems |
| Vulnerability handling | Risk-based prioritization with safety impact assessment |
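
The 24h and 72h reporting windows in the table above lend themselves to a simple deadline calculator that an incident runbook can call. The following is a minimal sketch, assuming the clock starts when the entity becomes aware of a significant incident and that 24h applies to the early warning and 72h to the incident notification. The function and constant names are illustrative, not part of any regulator's tooling.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the "24h" and "72h" figures in the table above.
# Names are illustrative assumptions.
EARLY_WARNING = timedelta(hours=24)
INCIDENT_NOTIFICATION = timedelta(hours=72)


def reporting_deadlines(aware_at: datetime) -> dict[str, datetime]:
    """Return the latest submission time for each report stage."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return {
        "early_warning": aware_at + EARLY_WARNING,
        "incident_notification": aware_at + INCIDENT_NOTIFICATION,
    }


def overdue(aware_at: datetime, now: datetime, submitted: set[str]) -> list[str]:
    """List stages whose deadline has passed without a submission."""
    return [
        stage
        for stage, due in reporting_deadlines(aware_at).items()
        if stage not in submitted and now > due
    ]


if __name__ == "__main__":
    t0 = datetime(2025, 1, 10, 8, 0, tzinfo=timezone.utc)
    for stage, due in reporting_deadlines(t0).items():
        print(stage, due.isoformat())
```

Timezone-aware timestamps are required on purpose: control centres and regulators may sit in different time zones, and a naive timestamp would make the deadline ambiguous.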
CER Directive (Critical Entities Resilience)
Requires power utilities to demonstrate resilience against:
- Natural disasters
- Cyberattacks
- Supply chain disruptions
- Pandemics and workforce unavailability
Antifragile application: Chaos engineering for non-safety systems; cross-training for manual procedures; distributed spare parts inventory.
Sector-Specific Standards
| Standard | Scope |
|---|---|
| IEC 62351 | Power systems cybersecurity: communications protocols, authentication, encryption |
| IEC 61850 | Substation communication (GOOSE, SV); security extensions for IEC 61850-90-20 |
| NERC CIP | North American electric reliability; mandatory standards with heavy penalties |
| ENTSO-E Cybersecurity Guidance | European transmission system operator requirements |
| BDEW Whitepaper | German energy sector cybersecurity best practices |
The Antifragile Posture for Power and Utilities
Pillar 1: Structural Decoupling — The IT/OT Firewall
Principle: IT and OT must be decoupled to the maximum extent compatible with operational requirements. The air gap is the default. Any bridge must be justified, documented, and monitored.
Antifragile Moves:
| Action | Implementation | Priority |
|---|---|---|
| Network segmentation | Physically separate IT and OT; unidirectional gateway or data diode for OT→IT data flows, so that nothing can traverse back into OT | P0 |
| No AD trust to OT | OT AD (if any) must be a separate forest with one-way trust or no trust | P0 |
| Jump host architecture | All IT-to-OT access via hardened, monitored jump hosts with session recording | P1 |
| Vendor access airlock | Vendor VPNs terminate in dedicated DMZ; no direct OT access; remote hands or on-site escort for OT | P1 |
| Remove internet from OT | OT VLANs have no direct internet egress; updates via offline media or controlled proxy | P0 |
| AMI / Smart Grid isolation | Advanced Metering Infrastructure on dedicated network; no direct path to SCADA or EMS | P1 |
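
The decoupling rules in this table can be expressed as a default-deny zone policy and checked against an exported list of observed or configured flows. This is a sketch under stated assumptions: the zone names, the `Flow` record, and the `ALLOWED` pairs are hypothetical and would need to be replaced with the utility's own architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_zone: str      # e.g. "IT", "DMZ", "OT", "AMI", "INTERNET"
    dst_zone: str
    description: str


# Hypothetical allow-list. Anything not listed is treated as a violation.
ALLOWED = {
    ("IT", "DMZ"),   # IT users reach hardened jump hosts in the DMZ
    ("DMZ", "OT"),   # jump hosts reach OT, session-recorded
    ("OT", "DMZ"),   # historian replication out through the gateway
    ("DMZ", "IT"),   # replicated data served to IT consumers
}


def violations(flows: list[Flow]) -> list[Flow]:
    """Return every flow whose zone pair is not explicitly allowed."""
    return [f for f in flows if (f.src_zone, f.dst_zone) not in ALLOWED]


if __name__ == "__main__":
    observed = [
        Flow("IT", "DMZ", "engineer to jump host"),
        Flow("IT", "OT", "legacy RDP"),
        Flow("OT", "INTERNET", "vendor update check"),
        Flow("AMI", "OT", "meter head-end to SCADA"),
    ]
    for f in violations(observed):
        print(f"VIOLATION {f.src_zone}->{f.dst_zone}: {f.description}")
```

Default-deny keeps the policy aligned with the principle that the air gap is the default: a new bridge only passes the check once someone has added it to the allow-list deliberately.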
Pillar 2: Optionality Preservation — Vendor and Technology Independence
Principle: Power utilities depend on vendors for SCADA, protection relays, turbine control, and substation automation. This dependency must not become a single point of failure.
Antifragile Moves:
- Multi-vendor strategy for critical systems: No single vendor should control >50% of protection, control, or monitoring functions
- Spare parts inventory: Maintain critical spares for legacy OT equipment that vendors no longer support
- Firmware escrow and provenance: Require vendors to deposit firmware; verify cryptographic signatures before deployment
- Local competence: Train internal staff so that they can operate and maintain systems for at least 30 days without vendor support
- Protocol independence: Where possible, support multiple communication protocols to avoid single-vendor lock-in
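
The 50% ceiling in the multi-vendor rule above can be checked mechanically from an inventory that maps each critical function to its supplier. A minimal sketch follows, assuming each function counts equally. A real assessment might weight functions by criticality, and the function and vendor names are made up for illustration.

```python
from collections import Counter


def vendor_shares(functions: dict[str, str]) -> dict[str, float]:
    """Map each vendor to the fraction of listed functions it supplies."""
    counts = Counter(functions.values())
    total = sum(counts.values())
    return {vendor: n / total for vendor, n in counts.items()}


def over_concentrated(functions: dict[str, str], limit: float = 0.5) -> list[str]:
    """Return vendors whose share strictly exceeds the limit."""
    return sorted(v for v, share in vendor_shares(functions).items() if share > limit)


if __name__ == "__main__":
    protection = {
        "feeder protection": "VendorA",
        "busbar protection": "VendorA",
        "transformer protection": "VendorA",
        "line protection": "VendorB",
    }
    print(vendor_shares(protection))
    print(over_concentrated(protection))
```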
Pillar 3: Stress-to-Signal Conversion — OT Incident Learning
Principle: OT incidents are rare but high-impact. The organization must learn from every anomaly, near-miss, and exercise.
Antifragile Moves:
- OT security operations centre (SOC) integration: Feed OT alarms into the SOC with analysts trained on industrial protocols
- Monthly tabletop exercises: Simulate OT-specific scenarios (compromised EMS, rogue protection relay settings, ransomware on engineering workstations)
- Post-incident structural mandate: Every OT incident or near-miss must produce at least one architectural or procedural change
- Red team with bounded OT scope: Annual exercise including OT reconnaissance, constrained by safety requirements
Pillar 4: Sovereign Intelligence — Local AI for the Grid
Principle: Grid data is among the most sensitive an organization possesses. It reveals generation capacity, topology, switching patterns, load profiles, and operational routines.
Antifragile Moves:
- Local AI for OT anomaly detection: Analyze historian data, DCS logs, and protection relay events without cloud exfiltration
- Closed-loop digital twin: Train models on local OT data to predict equipment failures; never export raw telemetry
- Air-gapped AI inference: Deploy inference nodes in OT DMZ with no return path to IT or internet
- Load forecasting sovereignty: Local models for demand prediction using proprietary grid data
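
Local anomaly detection does not have to start with a large model. As an illustration of the idea only, the sketch below flags historian samples that deviate sharply from a trailing window, using nothing beyond the Python standard library so it can run on an isolated node. The window size, threshold, and sample values are assumptions, and a rolling z-score is a stand-in for whatever detector the pilot eventually selects.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=20, threshold=4.0):
    """Yield (index, value, z) for samples far from the trailing-window mean."""
    history = deque(maxlen=window)
    for i, v in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0:
                z = (v - mu) / sigma
                if abs(z) > threshold:
                    yield i, v, z
                    continue  # keep the outlier out of the baseline
        history.append(v)


if __name__ == "__main__":
    # Made-up samples around 50.0 with one injected excursion.
    samples = [50.0 + 0.01 * (i % 5) for i in range(40)]
    samples[30] = 55.0
    for index, value, z in rolling_zscore_anomalies(samples):
        print(f"sample {index}: value={value} z={z:.1f}")
```

Excluding flagged samples from the baseline prevents a sustained manipulation from gradually being learned as normal, at the cost of needing a manual reset after a legitimate operating-point change.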
The executive framing:
"Your grid data tells an adversary exactly when and where to strike. It tells a competitor your capacity constraints. Sending it to a cloud AI for 'optimization' is not a technology decision. It is a national security and competitive intelligence decision. Local models on local hardware. Full stop."
Pillar 5: Asymmetric Payoff — Resilience Over Prevention
Principle: In power utilities, perfect prevention is impossible. The goal is to survive and recover faster than the adversary can exploit.
Antifragile Moves:
- Black start capability: Maintain the ability to restart the grid from shutdown without external power
- Grid islanding: Design systems so that sections can disconnect and operate independently during disturbances
- Manual override procedures: Every automated system must have a documented, tested manual procedure
- Redundant communication paths: Power line carrier, microwave, satellite backup for SCADA and protection communications
- Protection relay independence: Electromechanical or static relays as backup for digital relays in critical paths
The Rapid Modernisation Plan: Power/Utility Variant
Phase 1: Hygiene (Days 0-30)
In addition to standard hygiene:
| Action | Owner | Deliverable |
|---|---|---|
| Inventory all OT assets: DCS, SCADA, EMS, protection relays, RTUs, AMI | OT Security / Engineering | OT asset inventory with vendor and firmware versions |
| Map all IT-to-OT network connections | Network / OT | Connection matrix with business justification per connection |
| Audit vendor remote access: who, how, when, for how long | OT Security / Procurement | Vendor access log and hardened policy |
| Identify OT systems with internet connectivity | Network | List with immediate remediation plan |
| Document manual override procedures for critical systems | OT Engineering | Procedure manual, signed off by operations and safety |
| Validate backup of EMS / DMS configurations | OT Engineering | Backup integrity test report |
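
The connection matrix deliverable is only useful if every row carries a business justification and an owner. The sketch below checks a CSV export for rows that lack either. The column names are hypothetical and should be adapted to whatever template the engagement uses.

```python
import csv
import io

# Hypothetical column names for the connection matrix export.
REQUIRED = ("source", "destination", "protocol", "justification", "owner")


def unjustified_connections(matrix_csv: str) -> list[dict]:
    """Return rows that have no business justification or no named owner."""
    reader = csv.DictReader(io.StringIO(matrix_csv))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"matrix is missing columns: {missing}")
    return [
        row
        for row in reader
        if not (row["justification"] or "").strip() or not (row["owner"] or "").strip()
    ]


if __name__ == "__main__":
    export = (
        "source,destination,protocol,justification,owner\n"
        "bi-server,historian-replica,HTTPS,Monthly loss reporting,Grid Analytics\n"
        "it-sccm,ews-01,SMB,,\n"
    )
    for row in unjustified_connections(export):
        print("needs review:", row["source"], "->", row["destination"])
```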
Phase 2: Control (Days 30-60)
| Action | Owner | Deliverable |
|---|---|---|
| Implement network segmentation: IT/OT DMZ with unidirectional gateway | Network / OT | Segmentation architecture and validated firewall rules |
| Harden vendor access: time-bounded, session-recorded, MFA with hardware tokens | OT Security | Vendor access gateway operational |
| Enable OT logging: historian, DCS, firewall, protection relay events | OT Security | Centralized OT log aggregation (air-gapped SIEM or historian) |
| Patch OT systems: test in lab, deploy in maintenance windows | OT Engineering | Patch management procedure with safety gates |
| Secure engineering workstations (EWS): application whitelisting, no internet | OT Security | EWS hardening standard deployed |
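
The vendor access requirements in this phase (time-bounded, session-recorded, hardware-token MFA) can be validated per access request before a gateway opens the session. This is a sketch under assumptions: the `VendorSession` fields and the four-hour ceiling are hypothetical, and a real gateway would expose its own request schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_SESSION = timedelta(hours=4)  # hypothetical policy ceiling


@dataclass
class VendorSession:
    vendor: str
    start: datetime
    end: datetime
    mfa_hardware_token: bool
    recorded: bool


def policy_findings(session: VendorSession) -> list[str]:
    """Return the list of policy problems; an empty list means compliant."""
    findings = []
    if session.end <= session.start:
        findings.append("window end is not after start")
    elif session.end - session.start > MAX_SESSION:
        findings.append("window exceeds maximum duration")
    if not session.mfa_hardware_token:
        findings.append("hardware-token MFA missing")
    if not session.recorded:
        findings.append("session recording disabled")
    return findings


if __name__ == "__main__":
    request = VendorSession(
        vendor="ExampleCo",
        start=datetime(2025, 3, 1, 9, 0),
        end=datetime(2025, 3, 2, 9, 0),
        mfa_hardware_token=True,
        recorded=False,
    )
    print(policy_findings(request))
```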
Phase 3: Sovereignty (Days 60-90)
| Action | Owner | Deliverable |
|---|---|---|
| Deploy local AI for OT anomaly detection pilot | AI / OT Security | OT anomaly detection with false positive tuning |
| Validate black start / islanding procedures | Operations | Test report with time-to-recovery metrics |
| Conduct OT-specific tabletop exercise | Security / Operations | Exercise report with structural improvements |
| Implement firmware integrity monitoring | OT Security | Baseline hashes for critical OT firmware |
| Test protection relay fail-over to electromechanical backup | Engineering | Fail-over test report |
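
The "baseline hashes for critical OT firmware" deliverable can start as a directory of approved firmware images and a SHA-256 manifest. The sketch below builds such a baseline and compares it with a later snapshot. The directory layout and file names are assumptions, and it covers images held on an engineering share, not firmware read back from the devices themselves.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large firmware images do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(firmware_dir: Path) -> dict[str, str]:
    """Map each file name in the directory to its SHA-256 hex digest."""
    return {
        p.name: sha256_file(p)
        for p in sorted(firmware_dir.iterdir())
        if p.is_file()
    }


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files that changed, disappeared, or appeared since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "changed": sorted(n for n in shared if baseline[n] != current[n]),
        "missing": sorted(baseline.keys() - current.keys()),
        "unexpected": sorted(current.keys() - baseline.keys()),
    }
```

A typical use would be to store the output of `build_baseline` offline after each approved change and run `compare` on a schedule, treating any non-empty result as an event for the OT SOC.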
Phase 4: Antifragility (Days 90-180)
| Action | Owner | Deliverable |
|---|---|---|
| Annual red team with bounded OT scope | Security | Red team report with kill chain analysis |
| Chaos engineering on non-safety IT systems | Resilience | Monthly experiment schedule and findings |
| Vendor exit architecture for critical OT platforms | Procurement / Engineering | 90-day vendor transition plan per critical system |
| Cross-training: operations staff on manual procedures | Operations | Training completion metrics |
| Participate in sector ISAC information sharing | Security | Threat intelligence integration report |
Substation and Protection Specifics
IEC 61850 Security
IEC 61850 (substation communication) uses GOOSE and Sampled Values (SV) that were not designed with security in mind.
Hardening priorities:
- IEC 61850-90-20: Implement cybersecurity recommendations for IEC 61850 networks
- Authentication: Digitally sign GOOSE messages where IEDs support it (IEC 62351-6 defines the security mechanisms for IEC 61850 messaging)
- Network segmentation: GOOSE/SV traffic on dedicated VLAN; no routing to IT networks
- IED hardening: Disable unused services; change default passwords; enable logging
- Configuration management: Version control for SCL files; change detection for IED settings
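
Change detection for SCL files can be sketched as a per-IED fingerprint that ignores formatting noise. SCL is XML, with `IED` elements under the root in the `http://www.iec.ch/61850/2003/SCL` namespace. The example below hashes a canonical form of each `IED` element and lists the IEDs whose content differs between two versions. It assumes Python 3.8 or later for `ElementTree.canonicalize`, uses a heavily simplified SCL fragment, and does not cover settings that live only on the device.

```python
import hashlib
import xml.etree.ElementTree as ET

SCL_NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}


def ied_fingerprints(scl_xml: str) -> dict[str, str]:
    """Map each IED name to a SHA-256 of its canonicalized XML."""
    root = ET.fromstring(scl_xml)
    result = {}
    for ied in root.findall("scl:IED", SCL_NS):
        raw = ET.tostring(ied, encoding="unicode").strip()
        # Canonical form sorts attributes and drops insignificant whitespace.
        canonical = ET.canonicalize(raw, strip_text=True)
        result[ied.get("name")] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return result


def changed_ieds(old_xml: str, new_xml: str) -> list[str]:
    """Return names of IEDs added, removed, or modified between two versions."""
    old, new = ied_fingerprints(old_xml), ied_fingerprints(new_xml)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
```

Hashing the canonical form means that re-indenting a file or reordering attributes in an engineering tool does not raise an alert, while a changed value inside an IED does.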
Protection Relay Security
Protection relays are the safety-critical edge of the grid. Compromise can cause physical damage.
| Control | Implementation |
|---|---|
| Access control | Vaulted credentials; multi-person approval for settings changes |
| Logging | All settings changes logged with before/after values |
| Integrity | Cryptographic checksums for firmware and settings files |
| Redundancy | Independent protection schemes (e.g., distance + differential) |
| Manual backup | Electromechanical or static relay backup for critical digital protections |
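
The access control and logging rows above combine naturally into one change record: a settings change carries its before and after values and is only releasable once enough people other than the requester have approved it. A minimal sketch follows, assuming two independent approvers. The `SettingsChange` structure and the setting names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SettingsChange:
    relay_id: str
    requested_by: str
    before: dict
    after: dict
    approvers: set = field(default_factory=set)

    def diff(self) -> dict:
        """Return {setting: (before, after)} for every value that differs."""
        keys = self.before.keys() | self.after.keys()
        return {
            k: (self.before.get(k), self.after.get(k))
            for k in sorted(keys)
            if self.before.get(k) != self.after.get(k)
        }

    def is_approved(self, required: int = 2) -> bool:
        """Approvals by the requester do not count toward the quorum."""
        independent = self.approvers - {self.requested_by}
        return len(independent) >= required


if __name__ == "__main__":
    change = SettingsChange(
        relay_id="RELAY_1",
        requested_by="alice",
        before={"pickup_current_a": 600, "time_dial": 0.5},
        after={"pickup_current_a": 900, "time_dial": 0.5},
        approvers={"alice", "bob"},
    )
    print(change.diff())
    print(change.is_approved())
```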
Generation-Specific Considerations
Thermal / Nuclear / Hydro / Renewables
| Generation Type | Specific Risk | Control |
|---|---|---|
| Thermal | Turbine control system compromise | Dedicated turbine control network; no IT connectivity |
| Nuclear | Safety system interference | Air-gapped safety systems; regulatory compliance with national nuclear authority |
| Hydro | Dam control / spillway gate manipulation | Physical controls for critical water management; redundant level sensors |
| Renewables | Inverter-based resource (IBR) vulnerability | Secure firmware updates; anti-islanding protection; grid support function validation |
Distributed Energy Resources (DER)
Solar, wind, and battery inverters connect to the distribution grid with varying security maturity.
- Action: DER interconnection standards must include cybersecurity requirements
- Action: Monitor DER communications for anomalous commands or settings changes
- Action: Aggregate DER visibility in DMS/ADMS without direct control paths
Water and Wastewater Utilities
Water utilities share many characteristics with power but have additional concerns:
| Concern | Application |
|---|---|
| Safety | Contamination prevention, pressure management, chemical dosing control |
| SCADA/OT | Treatment plant automation, distribution pump control, reservoir level management |
| Criticality | Water is life-sustaining; outages have immediate public health impact |
| Regulation | EPA (US), Drinking Water Inspectorate (UK), national health authorities |
Additional controls for water utilities:
- Physical security for treatment chemicals (chlorine, fluoride) to prevent intentional contamination
- Redundant water quality sensors with cross-validation
- Manual override capability for all automated chemical dosing systems
- Isolation of IT from operational water quality monitoring
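
Cross-validation of redundant water quality sensors can be as simple as comparing each reading with the median of the group, so that a single faulty or manipulated sensor is outvoted. The sketch below assumes at least three sensors measuring the same quantity. The sensor names, values, and tolerance are made up.

```python
from statistics import median


def cross_validate(readings: dict[str, float], tolerance: float) -> tuple[float, list[str]]:
    """Return (consensus value, sensors deviating from it by more than tolerance)."""
    if len(readings) < 3:
        raise ValueError("need at least three sensors to outvote one faulty reading")
    consensus = median(readings.values())
    suspects = sorted(
        name for name, value in readings.items() if abs(value - consensus) > tolerance
    )
    return consensus, suspects


if __name__ == "__main__":
    chlorine_mg_l = {"cl_a": 0.52, "cl_b": 0.50, "cl_c": 1.90}
    consensus, suspects = cross_validate(chlorine_mg_l, tolerance=0.1)
    print(consensus, suspects)
```

The median is used instead of the mean because one extreme reading shifts a mean but leaves the median of three or more sensors unaffected.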
M365 in Power and Utilities
Corporate IT in power utilities uses M365 but must be strictly separated from OT.
| Consideration | Power/Utility Requirement |
|---|---|
| Data residency | M365 data in EU/national datacenters; verify tenant location |
| Conditional access | Block M365 access from non-corporate devices for privileged users; geo-restrict admin access |
| Guest access | Strictly prohibit in OT-connected tenants; heavily vet in corporate tenant |
| Teams / SharePoint | Never used for OT document sharing or control room communication |
| Mobile device management | Field engineer tablets Intune-managed; restricted app installation |
| Email security | EOP baseline minimum; Defender for Office 365 P2 recommended for critical infrastructure |
See M365 E3 Hardening for the tactical hardening steps, then apply the overlays in the table above.
Evidence Package for Regulators
| Requirement | Evidence from Antifragile Program |
|---|---|
| NIS2 risk management | Kill chain analysis, T0 asset classification, IT/OT connection matrix |
| NIS2 incident handling | IR runbooks, OT-specific response procedures, quarterly drill reports |
| NIS2 business continuity | Black start test reports, islanding validation, manual procedure verification |
| NIS2 supply chain security | Vendor risk register, firmware provenance, vendor exit architectures |
| NIS2 encryption | Data classification with encryption mapping, TLS configuration audits |
| NIS2 vulnerability handling | Vulnerability scan reports with safety-impact prioritization |
| CER resilience | Chaos engineering results, cross-training metrics, spare parts inventory |
Previous: NIST CSF Mapping | Next: Vertical: Telco