# Vertical Reference: Power and Utilities

> *"The grid does not care about your quarterly targets. It cares whether you understood the boundary between IT and operations before the adversary did."*

This document adapts the antifragile rapid modernisation approach for power generation, transmission, distribution, and water utilities. These organizations operate industrial control systems (ICS/SCADA) where safety and availability are paramount, regulatory oversight is intense, and the convergence of IT and OT creates existential attack surfaces.

---

## The Power and Utility Context

### What Makes This Sector Different

| Factor | Enterprise Default | Power/Utility Reality |
|--------|-------------------|----------------------|
| Downtime tolerance | Hours | Seconds to minutes (protection systems); hours for generation |
| Safety impact | Data loss, financial harm | Physical harm, loss of life, environmental catastrophe |
| System lifetime | 3-5 years | 20-40 years (generation, transmission, protection relays) |
| Regulatory driver | GDPR, industry standards | NIS2, CER, IEC 62351, NERC CIP (North America), national energy regulators |
| OT/IT boundary | Often porous or nonexistent | Legally and physically mandated; convergence is the primary risk |
| Supply chain | Moderate depth | Extreme (multi-vendor, multi-national, obsolete equipment) |
| Remote access | Common, convenient | Heavily restricted; often requires physical presence or dedicated lines |

### The IT/OT Convergence Problem

Power utilities historically operated OT networks (SCADA, EMS, DMS, protection relays) as **air-gapped systems**. Over the past two decades, convergence has introduced:

- Remote diagnostics over internet-connected VPNs
- Centralized patch management through IT SCCM/WSUS
- Business intelligence systems reading OT historian data
- Vendor remote support terminals in control centers
- Smart grid and Advanced Metering Infrastructure (AMI) connecting customer-facing IT to grid operations

Every convergence point is a **potential bridge for adversaries** from IT to OT.

**The executive framing**:

> *"Your control room does not need email. Your protection relays do not need internet access. Every connection between your IT network and your operational technology is a connection an adversary can cross. We are not adding bureaucracy. We are re-establishing the boundary that keeps the lights on."*

---

## Regulatory Landscape

### EU NIS2 Directive (2023)

Power utilities and water suppliers fall under NIS2's sectors of high criticality and, above the directive's size thresholds, are classified as **essential entities**.

| NIS2 Requirement | Power/Utility Application |
|-----------------|--------------------------|
| Risk management measures | Kill chain analysis for IT→OT bridges; physical security assessment |
| Supply chain security | Vendor access inventory for all OT equipment; firmware provenance tracking |
| Incident reporting (24h early warning, 72h notification) | Automated detection and reporting to national CSIRT and energy regulator |
| Business continuity | Black start capability; grid islanding procedures; manual override validation |
| Cryptography | Encrypted communications for all IT/OT integration points |
| MFA | Hardware tokens for all remote access to OT or critical IT systems |
| Vulnerability handling | Risk-based prioritization with **safety impact assessment** |

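
The reporting timeline in the table can be tracked mechanically once an incident is detected. The sketch below is a minimal illustration, assuming the clock starts when the entity becomes aware of a significant incident. The names `STAGE_OFFSETS`, `reporting_deadlines`, and `overdue` are hypothetical, and national transposition may impose additional or stricter steps.

```python
from datetime import datetime, timedelta, timezone

# Assumed stage offsets, measured from the moment the entity becomes aware
# of a significant incident (24h early warning, 72h incident notification).
STAGE_OFFSETS = {
    "early_warning": timedelta(hours=24),
    "incident_notification": timedelta(hours=72),
}


def reporting_deadlines(aware_at: datetime) -> dict[str, datetime]:
    """Return the due time of each reporting stage."""
    return {stage: aware_at + offset for stage, offset in STAGE_OFFSETS.items()}


def overdue(deadlines: dict[str, datetime], submitted: set[str], now: datetime) -> list[str]:
    """List stages whose deadline has passed without a submission."""
    return sorted(
        stage
        for stage, due in deadlines.items()
        if stage not in submitted and now > due
    )


aware_at = datetime(2025, 3, 1, 8, 0, tzinfo=timezone.utc)
deadlines = reporting_deadlines(aware_at)
print(deadlines["early_warning"].isoformat())          # 2025-03-02T08:00:00+00:00
print(deadlines["incident_notification"].isoformat())  # 2025-03-04T08:00:00+00:00
```

Using timezone-aware timestamps avoids ambiguity when the control centre, the CSIRT, and the regulator sit in different time zones.
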
### CER Directive (Critical Entities Resilience)

The CER Directive requires power utilities to demonstrate resilience against:

- Natural disasters
- Cyberattacks
- Supply chain disruptions
- Pandemics and workforce unavailability

**Antifragile application**: Chaos engineering for non-safety systems; cross-training for manual procedures; distributed spare parts inventory.

### Sector-Specific Standards

| Standard | Scope |
|----------|-------|
| **IEC 62351** | Power systems cybersecurity: communications protocols, authentication, encryption |
| **IEC 61850** | Substation communication (GOOSE, SV); security extensions for IEC 61850-90-20 |
| **NERC CIP** | North American electric reliability; mandatory standards with heavy penalties |
| **ENTSO-E Cybersecurity Guidance** | European transmission system operator requirements |
| **BDEW Whitepaper** | German energy sector cybersecurity best practices |

---

## The Antifragile Posture for Power and Utilities

### Pillar 1: Structural Decoupling — The IT/OT Firewall

**Principle**: IT and OT must be decoupled to the maximum extent compatible with operational requirements. The air gap is the default. Any bridge must be justified, documented, and monitored.

**Antifragile Moves**:

| Action | Implementation | Priority |
|--------|---------------|----------|
| **Network segmentation** | Physically separate IT and OT; unidirectional gateway or data diode for OT→IT data flows | P0 |
| **No AD trust to OT** | OT AD (if any) must be a separate forest with one-way trust or no trust | P0 |
| **Jump host architecture** | All IT-to-OT access via hardened, monitored jump hosts with session recording | P1 |
| **Vendor access airlock** | Vendor VPNs terminate in dedicated DMZ; no direct OT access; remote hands or on-site escort for OT | P1 |
| **Remove internet from OT** | OT VLANs have no direct internet egress; updates via offline media or controlled proxy | P0 |
| **AMI / Smart Grid isolation** | Advanced Metering Infrastructure on dedicated network; no direct path to SCADA or EMS | P1 |

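
The "no direct internet egress" rule can be checked against an exported firewall rule set. The following is a rough sketch, not a vendor-specific parser: the `Rule` structure, the rule names, and the address ranges in `OT_NETS` and `OT_DMZ` are all assumptions for illustration. It flags any allow rule whose source lies in an OT network and whose destination is not fully contained in an OT or OT-DMZ network.

```python
from dataclasses import dataclass
from ipaddress import ip_network

# Hypothetical address plan; replace with the site's real OT and DMZ ranges.
OT_NETS = [ip_network("10.20.0.0/16")]
OT_DMZ = [ip_network("10.30.0.0/24")]


@dataclass(frozen=True)
class Rule:
    name: str
    src: str      # source network in CIDR notation
    dst: str      # destination network in CIDR notation
    action: str   # "allow" or "deny"


def egress_violations(rules: list[Rule]) -> list[str]:
    """Return names of allow rules that let OT sources reach beyond OT/DMZ."""
    allowed = OT_NETS + OT_DMZ
    flagged = []
    for rule in rules:
        src, dst = ip_network(rule.src), ip_network(rule.dst)
        from_ot = any(src.overlaps(net) for net in OT_NETS)
        contained = any(dst.subnet_of(net) for net in allowed)
        if rule.action == "allow" and from_ot and not contained:
            flagged.append(rule.name)
    return flagged


rules = [
    Rule("historian-to-dmz", "10.20.5.0/24", "10.30.0.0/24", "allow"),
    Rule("ews-any", "10.20.8.0/24", "0.0.0.0/0", "allow"),
    Rule("it-web", "192.168.1.0/24", "0.0.0.0/0", "allow"),
    Rule("ot-deny", "10.20.0.0/16", "0.0.0.0/0", "deny"),
]
print(egress_violations(rules))  # ['ews-any']
```

The check is deliberately conservative: a destination that only partly overlaps an allowed range is still flagged, so the reviewer decides whether the bridge is justified.
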
### Pillar 2: Optionality Preservation — Vendor and Technology Independence

**Principle**: Power utilities depend on vendors for SCADA, protection relays, turbine control, and substation automation. This dependency must not become a single point of failure.

**Antifragile Moves**:

- **Multi-vendor strategy for critical systems**: No single vendor should control >50% of protection, control, or monitoring functions
- **Spare parts inventory**: Maintain critical spares for legacy OT equipment that vendors no longer support
- **Firmware escrow and provenance**: Require vendors to deposit firmware; verify cryptographic signatures before deployment
- **Local competence**: Train internal staff to operate and maintain systems without vendor support for 30 days
- **Protocol independence**: Where possible, support multiple communication protocols to avoid single-vendor lock-in

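
The 50% concentration limit above can be evaluated directly from the OT asset inventory. This is a minimal sketch that assumes share is measured by asset count per function; a real assessment might weight by criticality instead. The function name `vendor_concentration` and the vendor labels are hypothetical.

```python
from collections import Counter, defaultdict


def vendor_concentration(assets, threshold=0.5):
    """Flag functions where one vendor's share of assets exceeds the threshold.

    `assets` is an iterable of (function, vendor) pairs taken from the inventory.
    Returns {function: (dominant_vendor, share)} for each breach.
    """
    per_function = defaultdict(Counter)
    for function, vendor in assets:
        per_function[function][vendor] += 1

    flagged = {}
    for function, counts in per_function.items():
        total = sum(counts.values())
        vendor, count = counts.most_common(1)[0]
        share = count / total
        if share > threshold:
            flagged[function] = (vendor, round(share, 2))
    return flagged


assets = (
    [("protection", "VendorA")] * 3
    + [("protection", "VendorB")]
    + [("control", "VendorA"), ("control", "VendorC")]
)
print(vendor_concentration(assets))  # {'protection': ('VendorA', 0.75)}
```

In this example the control function is split evenly between two vendors, so it sits exactly at 50% and is not flagged.
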
### Pillar 3: Stress-to-Signal Conversion — OT Incident Learning

**Principle**: OT incidents are rare but high-impact. The organization must learn from every anomaly, near-miss, and exercise.

**Antifragile Moves**:

- **OT security operations centre (SOC) integration**: Feed OT alarms into the SOC with analysts trained on industrial protocols
- **Monthly tabletop exercises**: Simulate OT-specific scenarios (compromised EMS, rogue protection relay settings, ransomware on engineering workstations)
- **Post-incident structural mandate**: Every OT incident or near-miss must produce at least one architectural or procedural change
- **Red team with bounded OT scope**: Annual exercise including OT reconnaissance, constrained by safety requirements

### Pillar 4: Sovereign Intelligence — Local AI for the Grid

**Principle**: Grid data is among the most sensitive an organization possesses. It reveals generation capacity, topology, switching patterns, load profiles, and operational routines.

**Antifragile Moves**:

- **Local AI for OT anomaly detection**: Analyze historian data, DCS logs, and protection relay events without cloud exfiltration
- **Closed-loop digital twin**: Train models on local OT data to predict equipment failures; never export raw telemetry
- **Air-gapped AI inference**: Deploy inference nodes in OT DMZ with no return path to IT or internet
- **Load forecasting sovereignty**: Local models for demand prediction using proprietary grid data

**The executive framing**:

> *"Your grid data tells an adversary exactly when and where to strike. It tells a competitor your capacity constraints. Sending it to a cloud AI for 'optimization' is not a technology decision. It is a national security and competitive intelligence decision. Local models on local hardware. Full stop."*

### Pillar 5: Asymmetric Payoff — Resilience Over Prevention

**Principle**: In power utilities, perfect prevention is impossible. The goal is to survive and recover faster than the adversary can exploit.

**Antifragile Moves**:

- **Black start capability**: Maintain the ability to restart the grid from shutdown without external power
- **Grid islanding**: Design systems so that sections can disconnect and operate independently during disturbances
- **Manual override procedures**: Every automated system must have a documented, tested manual procedure
- **Redundant communication paths**: Power line carrier, microwave, satellite backup for SCADA and protection communications
- **Protection relay independence**: Electromechanical or static relays as backup for digital relays in critical paths

---

## The Rapid Modernisation Plan: Power/Utility Variant

### Phase 1: Hygiene (Days 0-30)

In addition to standard hygiene:

| Action | Owner | Deliverable |
|--------|-------|-------------|
| Inventory all OT assets: DCS, SCADA, EMS, protection relays, RTUs, AMI | OT Security / Engineering | OT asset inventory with vendor and firmware versions |
| Map all IT-to-OT network connections | Network / OT | Connection matrix with business justification per connection |
| Audit vendor remote access: who, how, when, for how long | OT Security / Procurement | Vendor access log and hardened policy |
| Identify OT systems with internet connectivity | Network | List with immediate remediation plan |
| Document manual override procedures for critical systems | OT Engineering | Procedure manual, signed off by operations and safety |
| Validate backup of EMS / DMS configurations | OT Engineering | Backup integrity test report |

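
The connection matrix deliverable is easiest to keep honest when it is held as structured data instead of a diagram. The sketch below is one possible shape, with hypothetical field names and example systems; it lists every IT-to-OT connection that lacks a recorded business justification or an accountable owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    source: str              # IT-side system
    destination: str         # OT-side system
    protocol: str
    justification: str = ""  # business reason for the bridge
    owner: str = ""          # accountable team


def needs_review(matrix: list[Connection]) -> list[Connection]:
    """Return connections missing a justification or an owner."""
    return [
        c for c in matrix
        if not c.justification.strip() or not c.owner.strip()
    ]


matrix = [
    Connection("bi-reporting", "ot-historian", "OPC UA",
               "Daily production reporting", "OT Engineering"),
    Connection("sccm-server", "ews-01", "SMB", "", "IT Operations"),
    Connection("vendor-vpn", "scada-hmi", "RDP", "Remote diagnostics", ""),
]
for c in needs_review(matrix):
    print(f"{c.source} -> {c.destination} ({c.protocol})")
# sccm-server -> ews-01 (SMB)
# vendor-vpn -> scada-hmi (RDP)
```
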
### Phase 2: Control (Days 30-60)

| Action | Owner | Deliverable |
|--------|-------|-------------|
| Implement network segmentation: IT/OT DMZ with unidirectional gateway | Network / OT | Segmentation architecture and validated firewall rules |
| Harden vendor access: time-bounded, session-recorded, MFA with hardware tokens | OT Security | Vendor access gateway operational |
| Enable OT logging: historian, DCS, firewall, protection relay events | OT Security | Centralized OT log aggregation (air-gapped SIEM or historian) |
| Patch OT systems: test in lab, deploy in maintenance windows | OT Engineering | Patch management procedure with safety gates |
| Secure engineering workstations (EWS): application whitelisting, no internet | OT Security | EWS hardening standard deployed |

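
The three vendor access conditions in the table (time-bounded, session-recorded, hardware-token MFA) amount to a simple gate that a vendor access gateway would evaluate before opening a session. The following is an illustrative sketch only; `VendorGrant`, `access_decision`, and the vendor name are hypothetical and do not describe any particular gateway product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VendorGrant:
    vendor: str
    not_before: datetime          # start of the approved maintenance window
    not_after: datetime           # end of the approved maintenance window
    hardware_mfa_verified: bool   # hardware token check passed
    session_recording_on: bool    # recording confirmed active


def access_decision(grant: VendorGrant, now: datetime) -> tuple[bool, str]:
    """Permit access only if all three conditions hold; report the first failure."""
    if not (grant.not_before <= now < grant.not_after):
        return (False, "outside approved window")
    if not grant.hardware_mfa_verified:
        return (False, "hardware-token MFA not verified")
    if not grant.session_recording_on:
        return (False, "session recording not active")
    return (True, "permitted")


utc = timezone.utc
grant = VendorGrant(
    vendor="ExampleRelayCo",
    not_before=datetime(2025, 6, 10, 9, 0, tzinfo=utc),
    not_after=datetime(2025, 6, 10, 13, 0, tzinfo=utc),
    hardware_mfa_verified=True,
    session_recording_on=True,
)
print(access_decision(grant, datetime(2025, 6, 10, 10, 30, tzinfo=utc)))  # (True, 'permitted')
print(access_decision(grant, datetime(2025, 6, 10, 14, 0, tzinfo=utc)))   # (False, 'outside approved window')
```

Denying by default and returning the reason keeps the vendor access log useful for the audit described in Phase 1.
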
### Phase 3: Sovereignty (Days 60-90)

| Action | Owner | Deliverable |
|--------|-------|-------------|
| Deploy local AI for OT anomaly detection pilot | AI / OT Security | OT anomaly detection with false positive tuning |
| Validate black start / islanding procedures | Operations | Test report with time-to-recovery metrics |
| Conduct OT-specific tabletop exercise | Security / Operations | Exercise report with structural improvements |
| Implement firmware integrity monitoring | OT Security | Baseline hashes for critical OT firmware |
| Test protection relay fail-over to electromechanical backup | Engineering | Fail-over test report |

### Phase 4: Antifragility (Days 90-180)

| Action | Owner | Deliverable |
|--------|-------|-------------|
| Annual red team with bounded OT scope | Security | Red team report with kill chain analysis |
| Chaos engineering on non-safety IT systems | Resilience | Monthly experiment schedule and findings |
| Vendor exit architecture for critical OT platforms | Procurement / Engineering | 90-day vendor transition plan per critical system |
| Cross-training: operations staff on manual procedures | Operations | Training completion metrics |
| Participate in sector ISAC information sharing | Security | Threat intelligence integration report |

---

## Substation and Protection Specifics

### IEC 61850 Security

IEC 61850 (substation communication) relies on GOOSE and Sampled Values (SV) messaging, neither of which was designed with security in mind.

**Hardening priorities**:

- **IEC 61850-90-20**: Implement cybersecurity recommendations for IEC 61850 networks
- **Authentication**: Digitally sign GOOSE messages where IEDs support it
- **Network segmentation**: GOOSE/SV traffic on dedicated VLAN; no routing to IT networks
- **IED hardening**: Disable unused services; change default passwords; enable logging
- **Configuration management**: Version control for SCL files; change detection for IED settings

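
Change detection for SCL files can start as a hash baseline compared on a schedule. The sketch below is a minimal example under stated assumptions: SCL files are exported to a directory, identified by the suffixes in `SCL_SUFFIXES`, and hashed with SHA-256. The function names and file names are hypothetical, and this complements but does not replace version control.

```python
import hashlib
from pathlib import Path

# Assumed set of SCL file suffixes to track.
SCL_SUFFIXES = {".scd", ".icd", ".cid", ".ssd"}


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def snapshot(directory: Path) -> dict[str, str]:
    """Map each SCL file (path relative to the directory) to its fingerprint."""
    return {
        str(p.relative_to(directory)): fingerprint(p.read_bytes())
        for p in sorted(directory.rglob("*"))
        if p.is_file() and p.suffix.lower() in SCL_SUFFIXES
    }


def diff_snapshots(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or modified relative to the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in shared if baseline[k] != current[k]),
    }


# In-memory demonstration; in practice both sides would come from snapshot().
baseline = {
    "sub1.scd": fingerprint(b"<SCL rev='1'/>"),
    "relay7.cid": fingerprint(b"<SCL ied='relay7'/>"),
}
current = {
    "sub1.scd": fingerprint(b"<SCL rev='2'/>"),
    "relay7.cid": fingerprint(b"<SCL ied='relay7'/>"),
    "relay9.cid": fingerprint(b"<SCL ied='relay9'/>"),
}
print(diff_snapshots(baseline, current))
# {'added': ['relay9.cid'], 'removed': [], 'modified': ['sub1.scd']}
```

Any non-empty result outside an approved change window is a candidate for the post-incident review described under Pillar 3.
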
### Protection Relay Security

Protection relays are the **safety-critical edge** of the grid. Compromise can cause physical damage.

| Control | Implementation |
|---------|---------------|
| Access control | Vaulted credentials; multi-person approval for settings changes |
| Logging | All settings changes logged with before/after values |
| Integrity | Cryptographic checksums for firmware and settings files |
| Redundancy | Independent protection schemes (e.g., distance + differential) |
| Manual backup | Electromechanical or static relay backup for critical digital protections |

---

## Generation-Specific Considerations

### Thermal / Nuclear / Hydro

| Generation Type | Specific Risk | Control |
|----------------|--------------|---------|
| **Thermal** | Turbine control system compromise | Dedicated turbine control network; no IT connectivity |
| **Nuclear** | Safety system interference | Air-gapped safety systems; regulatory compliance with national nuclear authority |
| **Hydro** | Dam control / spillway gate manipulation | Physical controls for critical water management; redundant level sensors |
| **Renewables** | Inverter-based resource (IBR) vulnerability | Secure firmware updates; anti-islanding protection; grid support function validation |

### Distributed Energy Resources (DER)

Solar, wind, and battery inverters connect to the distribution grid with varying security maturity.

- **Action**: DER interconnection standards must include cybersecurity requirements
- **Action**: Monitor DER communications for anomalous commands or settings changes
- **Action**: Aggregate DER visibility in DMS/ADMS without direct control paths

---

## Water and Wastewater Utilities

Water utilities share many characteristics with power but have additional concerns:

| Concern | Application |
|---------|-------------|
| **Safety** | Contamination prevention, pressure management, chemical dosing control |
| **SCADA/OT** | Treatment plant automation, distribution pump control, reservoir level management |
| **Criticality** | Water is life-sustaining; outages have immediate public health impact |
| **Regulation** | EPA (US), Drinking Water Inspectorate (UK), national health authorities |

**Additional controls for water utilities**:

- **Physical security** for treatment chemicals (chlorine, fluoride) to prevent intentional contamination
- **Redundant water quality sensors** with cross-validation
- **Manual override capability** for all automated chemical dosing systems
- **Isolation of IT from operational water quality monitoring**

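
Cross-validation of redundant water quality sensors can be as simple as comparing each reading with the median of the group. The sketch below assumes at least three sensors measuring the same quantity and a fixed tolerance chosen by the operator; the sensor names, the values, and the function `cross_validate` are hypothetical.

```python
from statistics import median


def cross_validate(readings: dict[str, float], tolerance: float) -> tuple[float, list[str]]:
    """Return the median reading and the sensors deviating from it beyond tolerance.

    With fewer than three sensors a single faulty or manipulated reading
    cannot be singled out, so the function refuses to vote.
    """
    if len(readings) < 3:
        raise ValueError("cross-validation needs at least three sensors")
    consensus = median(readings.values())
    suspects = sorted(
        name for name, value in readings.items()
        if abs(value - consensus) > tolerance
    )
    return consensus, suspects


# Free chlorine residual in mg/L from three redundant analysers (example values).
readings = {"cl2_a": 0.52, "cl2_b": 0.49, "cl2_c": 1.90}
print(cross_validate(readings, tolerance=0.1))  # (0.52, ['cl2_c'])
```
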
---

## M365 in Power and Utilities

Corporate IT in power utilities uses M365 but must be strictly separated from OT.

| Consideration | Power/Utility Requirement |
|--------------|--------------------------|
| **Data residency** | M365 data in EU/national datacenters; verify tenant location |
| **Conditional access** | Block M365 access from non-corporate devices for privileged users; geo-restrict admin access |
| **Guest access** | Strictly prohibit in OT-connected tenants; heavily vet in corporate tenant |
| **Teams / SharePoint** | Never used for OT document sharing or control room communication |
| **Mobile device management** | Field engineer tablets Intune-managed; restricted app installation |
| **Email security** | EOP baseline minimum; Defender for Office 365 P2 recommended for critical infrastructure |

See [M365 E3 Hardening](../playbooks/m365-e3-hardening.md) for tactical hardening, and apply these overlays.

---

## Evidence Package for Regulators

| Requirement | Evidence from Antifragile Program |
|------------|----------------------------------|
| NIS2 risk management | Kill chain analysis, T0 asset classification, IT/OT connection matrix |
| NIS2 incident handling | IR runbooks, OT-specific response procedures, quarterly drill reports |
| NIS2 business continuity | Black start test reports, islanding validation, manual procedure verification |
| NIS2 supply chain security | Vendor risk register, firmware provenance, vendor exit architectures |
| NIS2 encryption | Data classification with encryption mapping, TLS configuration audits |
| NIS2 vulnerability handling | Vulnerability scan reports with safety-impact prioritization |
| CER resilience | Chaos engineering results, cross-training metrics, spare parts inventory |

---

*Previous: [NIST CSF Mapping](nist-csf-mapping.md)*
*Next: [Vertical: Telco](vertical-telco.md)*