Initial commit: antifragile cybersecurity consulting blueprint

Complete repository of frameworks, playbooks, and assessment resources
for cybersecurity consultations focused on antifragile enterprise design.

Includes:
- Core philosophy and manifesto (5 pillars)
- 12 modular engagement packages
- AI sovereignty and operations frameworks
- Zero-budget vulnerability discovery and hardening playbooks
- M365 E3 hardening and antifragile project plans
- Osquery sovereign discovery platform blueprint
- Perimeter scanning capability guide
- AI-assisted TVM blueprint for defending against AI-powered adversaries
- Vertical specializations: banking, telco, power/utilities
- CIS Controls v8 and NIST CSF 2.0 mappings
- Risk registers and assessment templates
- C-suite conversation guide and business case templates
2026-05-09 16:53:22 +02:00
commit 763da003d3
35 changed files with 9711 additions and 0 deletions


@@ -0,0 +1,209 @@
# CIS Controls v8 Mapping
> *"CIS IG1 is 56 safeguards that every organization must implement. It is not aspirational. It is the floor."*
This document maps the [Rapid Modernisation Plan](../playbooks/rapid-modernisation-plan.md) and the antifragile workstreams to CIS Controls v8 Implementation Groups. The goal is to show clients that antifragile hardening is not an alternative to standards—it is the fastest path to meeting them while building real resilience.
---
## Implementation Group 1 (IG1): The Minimum Viable Posture
IG1 is the set of **safeguards that every organization should implement to protect against common, known threats**. We treat IG1 as a non-negotiable 90-day target. Most organizations can achieve IG1 primarily through **configuration of existing tools** rather than new procurement.
### Control 1: Inventory and Control of Enterprise Assets
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Active Directory / cloud IAM census | Existing identity provider |
| Hygiene (Days 0-30) | CMDB seeding with T0/T1 assets | Existing ITAM or spreadsheet |
| Control (Days 30-60) | Automated discovery of new assets | Existing EDR or NAC |
**Antifragile Angle**: You cannot defend what you cannot see. But inventory without ownership is just a list. Every asset in the CMDB must have an owner, a criticality rating, and a dependency map.
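
The ownership rule above can be checked mechanically. The following is a minimal sketch, assuming CMDB records are exported as Python dictionaries; the field names (`name`, `owner`, `criticality`, `depends_on`) are hypothetical and not taken from any particular CMDB product.

```python
# Hypothetical required fields; adjust to the CMDB schema actually in use.
REQUIRED_FIELDS = ("owner", "criticality", "depends_on")


def incomplete_assets(cmdb_records):
    """Return {asset name: [missing fields]} for records lacking ownership data.

    A field counts as missing when it is absent, None, or an empty string.
    An empty dependency list is accepted: it is a statement that the asset
    has no known dependencies, not an omission.
    """
    gaps = {}
    for record in cmdb_records:
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            gaps[record["name"]] = missing
    return gaps


# Illustrative sample data only.
records = [
    {"name": "dc01", "owner": "identity-team", "criticality": "T0", "depends_on": ["dns01"]},
    {"name": "fileserver7", "owner": "", "criticality": "T1", "depends_on": []},
    {"name": "legacy-app", "owner": "finance-it"},
]

for asset, missing in incomplete_assets(records).items():
    print(f"{asset}: missing {', '.join(missing)}")
```

Running a check like this at the end of the Hygiene phase gives a concrete count of assets that are "just a list" entries.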
### Control 2: Inventory and Control of Software Assets
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Software inventory via EDR or SCCM | Existing endpoint management |
| Hygiene (Days 0-30) | Unauthorized software detection | Existing EDR |
| Sovereignty (Days 60-90) | AI tool inventory and shadow AI discovery | Proxy logs + interviews |
**Antifragile Angle**: Software inventory is not about license compliance. It is about understanding your **attack surface**. Every unauthorized application is a potential path for an adversary.
### Control 3: Data Protection
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Data classification by criticality | Manual + existing DLP if available |
| Sovereignty (Days 60-90) | Ensure proprietary AI data never leaves perimeter | Local AI infrastructure |
| Antifragility (Days 90-180) | Automated data loss prevention | Existing CASB or DLP |
**Antifragile Angle**: Data protection is not encryption at rest. It is **ensuring your proprietary signal does not train your competitor's model**. Local AI is a data protection control.
### Control 4: Secure Configuration of Enterprise Assets and Software
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Control (Days 30-60) | ASR rule deployment on endpoints | Microsoft Defender (often already owned) |
| Control (Days 30-60) | Secure baseline for cloud resources | Azure Policy / AWS Config / GCP Org Policy |
| Antifragility (Days 90-180) | Automated drift detection and remediation | Existing configuration management |
**Antifragile Angle**: Secure configuration is not a project. It is a **continuous state**. Every deviation from baseline is a fragility. Automate the detection and remediation of drift.
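
Drift detection reduces to comparing observed settings against a declared baseline. The sketch below assumes both are available as flat dictionaries; the setting names are invented for illustration and do not correspond to any specific tool's output.

```python
def detect_drift(baseline, actual):
    """Return {setting: (expected, found)} for every deviation from baseline.

    Settings absent from `actual` are reported with found=None.
    """
    drift = {}
    for key, expected in baseline.items():
        found = actual.get(key)
        if found != expected:
            drift[key] = (expected, found)
    return drift


# Hypothetical baseline and observed state for one endpoint.
baseline = {"smb1_enabled": False, "rdp_nla_required": True, "local_admin_count": 1}
actual = {"smb1_enabled": True, "rdp_nla_required": True}

for setting, (expected, found) in detect_drift(baseline, actual).items():
    print(f"{setting}: expected {expected!r}, found {found!r}")
```

In practice the `actual` dictionary would be populated by the existing configuration management or EDR tooling, and each reported deviation would feed a remediation job rather than a print statement.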
### Control 5: Account Management
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Identity census and orphan elimination | Existing AD / IAM |
| Hygiene (Days 0-30) | Privileged account inventory and rotation | Existing AD / IAM + PAM if owned |
| Control (Days 30-60) | JIT elevation and PAW deployment | Existing PAM or native tools (PIM, AWS IAM Identity Center) |
**Antifragile Angle**: Account management is not about password complexity. It is about **reducing the number of keys that can unlock the kingdom**. Every account is a latent failure mode.
### Control 6: Access Control Management
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Control (Days 30-60) | Least-privilege review across platforms | Existing IAM + manual review |
| Control (Days 30-60) | Conditional access policies | Entra ID / Okta / native cloud IAM |
| Antifragility (Days 90-180) | Automated access reviews and revocation | Existing IAM or GRC tool |
**Antifragile Angle**: Access control is not about denying access. It is about **ensuring every allowed access is known, justified, and temporary**.
### Control 7: Continuous Vulnerability Management
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | External vulnerability scanning | Open-source or existing scanner |
| Control (Days 30-60) | Internal vulnerability scanning | Existing scanner or EDR-integrated |
| Antifragility (Days 90-180) | Risk-based prioritization and SLA | Existing vulnerability management platform |
**Antifragile Angle**: Vulnerability management is not about scanning everything. It is about **finding the shortest path to compromise and closing it first**.
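
"Shortest path to compromise" can be made concrete with a graph search. The sketch below assumes the environment has been modelled as a directed graph where an edge means "an adversary on A can reach or pivot to B"; the node names are hypothetical, and breadth-first search is used here as one simple way to find a fewest-hop path.

```python
from collections import deque


def shortest_attack_path(graph, start, target):
    """Breadth-first search; return the fewest-hop path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical reachability model for illustration.
graph = {
    "internet": ["vpn-gateway", "webmail"],
    "webmail": ["user-workstation"],
    "vpn-gateway": ["jump-host"],
    "user-workstation": ["file-server", "jump-host"],
    "jump-host": ["domain-controller"],
}

print(shortest_attack_path(graph, "internet", "domain-controller"))
```

Vulnerabilities on the nodes of the returned path are the ones to close first; removing an edge and re-running the search shows how much longer the adversary's route becomes.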
### Control 8: Audit Log Management
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Centralized log aggregation for critical systems | Existing SIEM or syslog server |
| Control (Days 30-60) | Log integrity protection | Existing SIEM or file integrity monitoring |
| Antifragility (Days 90-180) | Automated log analysis and anomaly detection | Existing SIEM or local AI pilot |
**Antifragile Angle**: Logs are not compliance artifacts. They are **the raw material of organizational memory**. If an attacker deletes your logs, they delete your ability to learn.
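
One way to make log deletion or alteration detectable is a hash chain, where each entry's digest covers the digest of the previous entry. The following is a minimal sketch using only the standard library; the function names and the seed value are assumptions, and a real deployment would also need to store the latest digest somewhere the attacker cannot rewrite.

```python
import hashlib


def chain_logs(lines, seed=b"genesis"):
    """Return [(line, digest)] where each digest covers the previous digest."""
    prev = hashlib.sha256(seed).hexdigest()
    chained = []
    for line in lines:
        prev = hashlib.sha256((prev + line).encode("utf-8")).hexdigest()
        chained.append((line, prev))
    return chained


def first_tampered_index(chained, seed=b"genesis"):
    """Recompute the chain; return the index of the first mismatch, or None."""
    prev = hashlib.sha256(seed).hexdigest()
    for i, (line, digest) in enumerate(chained):
        prev = hashlib.sha256((prev + line).encode("utf-8")).hexdigest()
        if prev != digest:
            return i
    return None


# Illustrative log lines only.
chained = chain_logs(["login admin", "add user backdoor", "logout admin"])
print(first_tampered_index(chained))            # intact chain
without_middle = [chained[0], chained[2]]       # attacker deletes one entry
print(first_tampered_index(without_middle))     # position where the chain breaks
```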
### Control 9: Email and Web Browser Protections
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Control (Days 30-60) | Anti-phishing and safe links | Microsoft Defender for O365 (often already owned) |
| Control (Days 30-60) | Browser isolation or hardening | Existing endpoint management |
**Antifragile Angle**: Email is the primary initial access vector for most adversaries. Hardening it is not optional. Fortunately, most organizations already own the tools to do so.
### Control 10: Malware Defenses
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | EDR deployment and coverage validation | Existing EDR |
| Control (Days 30-60) | ASR rules and exploit protection | Microsoft Defender (often already owned) |
| Antifragility (Days 90-180) | Behavioural detection tuning | Existing EDR |
**Antifragile Angle**: Malware defence is not about signature updates. It is about **behavioural visibility**: can you see anomalous process execution, lateral movement, and data staging?
### Control 11: Data Recovery
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Backup coverage inventory | Existing backup solution |
| Sovereignty (Days 60-90) | Recovery drill: one critical system | Existing backup solution |
| Antifragility (Days 90-180) | Automated backup verification and recovery testing | Existing backup solution + scripting |
**Antifragile Angle**: Backups that have not been restored are **theological constructs**. They require faith, not evidence. We test.
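
A restore test produces evidence only if the restored data is compared against the source. The sketch below assumes a drill that restores a directory tree to a scratch location; the function names are hypothetical, and the comparison uses SHA-256 checksums from the standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return sorted relative paths that are missing or differ after restore."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    failures = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            failures.append(rel.as_posix())
    return sorted(failures)
```

An empty list from `verify_restore` is the evidence the drill is meant to produce; a non-empty list is the gap report. For live systems the source would typically be a snapshot taken at backup time rather than the current production state.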
### Control 12: Network Infrastructure Management
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Network diagram and firewall rule audit | Existing firewall management |
| Control (Days 30-60) | DNS security and network segmentation | Existing DNS and firewall infrastructure |
| Antifragility (Days 90-180) | Automated network policy validation | Existing configuration management |
**Antifragile Angle**: Network infrastructure is not about speed. It is about **containment**: when one segment fails, how many others can you save?
### Control 13: Network Monitoring and Defense
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Control (Days 30-60) | Network sensor deployment at critical boundaries | Existing IDS/IPS or open-source Zeek/Suricata |
| Antifragility (Days 90-180) | Automated threat detection and response | Existing SIEM + SOAR or scripted response |
**Antifragile Angle**: Network monitoring is not about catching everything. It is about **detecting the anomaly that matters before it becomes the incident that kills you**.
### Control 14: Security Awareness and Skills Training
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Control (Days 30-60) | Phishing simulation and targeted training | Existing security awareness platform |
| Antifragility (Days 90-180) | Security champions program | No tool required—organizational design |
**Antifragile Angle**: Awareness is not about compliance videos. It is about **building a human sensor network** that reports anomalies faster than any technology.
### Control 15: Service Provider Management
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Vendor access audit and inventory | Manual + existing IAM |
| Control (Days 30-60) | Supplier access lockdown and time-bounding | Existing PAM or IAM |
| Sovereignty (Days 60-90) | AI vendor risk assessment and exit planning | Manual + legal review |
**Antifragile Angle**: Supplier management is not about contracts. It is about **ensuring your suppliers cannot become your single point of failure**.
### Control 16: Application Software Security
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Sovereignty (Days 60-90) | AI-assisted code review pilot | Local AI on existing hardware |
| Antifragility (Days 90-180) | SAST/DAST integration into CI/CD | Existing DevOps tooling |
**Antifragile Angle**: Application security is not about finding every bug. It is about **making the development pipeline inhospitable to entire classes of vulnerabilities**.
### Control 17: Incident Response Management
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | IR contact list and escalation paths | Manual + existing ticketing |
| Sovereignty (Days 60-90) | AI-specific incident response runbook | Manual + existing IR framework |
| Antifragility (Days 90-180) | Automated containment playbooks | Existing SOAR or scripted response |
**Antifragile Angle**: Incident response is not about playbooks. It is about **the speed at which you convert an incident into a structural improvement**.
### Control 18: Penetration Testing
| Rapid Modernisation Phase | Action | Typical Tool Investment |
|--------------------------|--------|------------------------|
| Antifragility (Days 90-180) | Red team engagement or adversarial simulation | External provider or internal team |
| Antifragility (Days 90-180) | Continuous purple team exercises | Existing EDR + internal team |
**Antifragile Angle**: Penetration testing is not a compliance checkbox. It is **controlled failure that teaches you where your kill chain lives**.
---
## IG2 and IG3: The Antifragile Extension
We do not stop at IG1. IG2 and IG3 are implemented selectively based on the organization's kill chain and risk profile:
| IG | When We Pursue It | How We Fund It |
|----|-------------------|----------------|
| IG1 | Always. Non-negotiable 90-day target. | Primarily existing tool configuration |
| IG2 | When the organization processes sensitive data or faces targeted threats. | Reallocated savings from IG1 efficiency |
| IG3 | When the organization is critical infrastructure or faces advanced persistent threats. | Strategic security investment, justified by kill chain analysis |
---
## The IG1-as-Foundation Pitch
> *"CIS IG1 is 56 safeguards. Most organizations we assess have implemented fewer than 20. We are not suggesting you buy 36 new products. We are suggesting you configure what you already own to meet the minimum viable security posture. This is not a procurement project. It is a configuration project. And we can prove value in the first 30 days."*
---
*Next: [NIST CSF Mapping](nist-csf-mapping.md)*
*Previous: [Move Fast and Fix Things](../core/move-fast-and-fix-things.md)*


@@ -0,0 +1,163 @@
# NIST Cybersecurity Framework 2.0 Mapping
> *"The CSF is not a checklist. It is a language for talking about risk. We speak it fluently, but we never let it slow us down."*
This document maps the antifragile rapid modernisation approach to the NIST Cybersecurity Framework (CSF) 2.0 functions. It is designed for consultants who must bridge the gap between operational speed and regulatory or stakeholder expectations.
---
## The Six Functions
NIST CSF 2.0 organizes cybersecurity outcomes into six functions: **GOVERN, IDENTIFY, PROTECT, DETECT, RESPOND, RECOVER**. The antifragile approach treats GOVERN as the missing keystone in most organizations and emphasizes continuous learning across all functions.
### GOVERN
**NIST Definition**: Establish and monitor the organization's cybersecurity risk management strategy, expectations, and policy.
**The Gap**: Most organizations have policies. Few have governance that is **alive**—updated by incidents, informed by stress, and capable of adaptation.
**Antifragile Expression**:
| Rapid Modernisation Phase | Action | Existing Tool Leverage |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Establish kill chain risk register | Spreadsheet or existing GRC tool |
| Hygiene (Days 0-30) | Define T0 asset classification policy | Manual + existing asset management |
| Control (Days 30-60) | Integrate security into change management | Existing ITSM (ServiceNow, Jira, etc.) |
| Antifragility (Days 90-180) | Quarterly governance review tied to incident learning | Existing meeting cadence + decision log |
**Key Principle**: Governance is not a document. It is a **feedback loop** between risk, decision, action, and learning.
### IDENTIFY
**NIST Definition**: Understand the organization's current cybersecurity risks.
**The Gap**: Organizations often know their assets but not their **dependencies**. They know their vulnerabilities but not their **kill chain**.
**Antifragile Expression**:
| Rapid Modernisation Phase | Action | Existing Tool Leverage |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Asset inventory with dependency mapping | Existing AD, EDR, cloud IAM |
| Hygiene (Days 0-30) | External attack surface enumeration | Open-source tools + existing vulnerability scanner |
| Control (Days 30-60) | Vendor and supplier dependency mapping | Existing procurement + IAM data |
| Sovereignty (Days 60-90) | AI usage and data flow discovery | Proxy logs + interviews |
**Key Principle**: Identification is not about completeness. It is about **finding the shortest path to failure and illuminating it**.
### PROTECT
**NIST Definition**: Use safeguards to prevent or reduce cybersecurity risk.
**The Gap**: Protection is often equated with purchasing. We equate it with **configuration, reduction, and ownership**.
**Antifragile Expression**:
| Rapid Modernisation Phase | Action | Existing Tool Leverage |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Identity hardening: disable, rotate, enforce hygiene | Existing AD / IAM |
| Control (Days 30-60) | ASR, MFA, conditional access, PAWs | Microsoft Defender / Entra ID (often already owned) |
| Control (Days 30-60) | Network segmentation and DNS security | Existing firewall and DNS infrastructure |
| Sovereignty (Days 60-90) | Local AI deployment with T0 controls | Existing server hardware or sovereign cloud |
| Antifragility (Days 90-180) | Chaos engineering and graceful degradation | Existing infrastructure + open-source tools |
**Key Principle**: The best protection is not a thicker wall. It is **reducing the attack surface that the wall must defend**.
### DETECT
**NIST Definition**: Find and analyze possible cybersecurity attacks and compromises.
**The Gap**: Detection is often about alert volume. We focus on **signal quality** and the speed of conversion from anomaly to understanding.
**Antifragile Expression**:
| Rapid Modernisation Phase | Action | Existing Tool Leverage |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Centralized logging for critical systems | Existing SIEM or syslog infrastructure |
| Control (Days 30-60) | EDR behavioural detection tuning | Existing EDR |
| Control (Days 30-60) | Network anomaly detection at boundaries | Existing IDS/IPS or Zeek/Suricata |
| Antifragility (Days 90-180) | AI-assisted log analysis and threat hunting | Local AI pilot on proprietary data |
**Key Principle**: Detection is not about seeing everything. It is about **seeing the thing that matters before it becomes the thing that kills you**.
### RESPOND
**NIST Definition**: Take action regarding a detected cybersecurity incident.
**The Gap**: Response is often reactive and manual. We build **pre-positioned capability** that activates faster than human coordination.
**Antifragile Expression**:
| Rapid Modernisation Phase | Action | Existing Tool Leverage |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | IR contact matrix and escalation paths | Existing communication tools |
| Control (Days 30-60) | Automated containment for high-confidence alerts | Existing SOAR or scripted playbooks |
| Sovereignty (Days 60-90) | AI-specific incident response runbooks | Existing IR framework + local knowledge |
| Antifragility (Days 90-180) | Red team validation of response speed | Internal or external red team |
**Key Principle**: Response is not about heroics. It is about **the mean time between detection and containment approaching zero**.
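
Tracking this principle requires measuring it. The sketch below computes mean time to contain from incident records, assuming each record carries ISO 8601 `detected` and `contained` timestamps; the field names and sample values are illustrative, not drawn from any ticketing system.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_contain(incidents):
    """Average minutes between detection and containment across incidents."""
    minutes = [
        (
            datetime.fromisoformat(i["contained"])
            - datetime.fromisoformat(i["detected"])
        ).total_seconds() / 60
        for i in incidents
    ]
    return mean(minutes)


# Illustrative records only.
incidents = [
    {"detected": "2026-03-01T10:00:00", "contained": "2026-03-01T10:45:00"},
    {"detected": "2026-03-09T22:10:00", "contained": "2026-03-09T22:25:00"},
]

print(f"Mean time to contain: {mean_time_to_contain(incidents):.1f} minutes")
```

Comparing this figure before and after automated containment is introduced in the Control phase gives the phase gate a measurable result.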
### RECOVER
**NIST Definition**: Restore assets and operations affected by cybersecurity incidents.
**The Gap**: Recovery is often theoretical. Backups exist but have never been tested. Runbooks exist but have never been executed.
**Antifragile Expression**:
| Rapid Modernisation Phase | Action | Existing Tool Leverage |
|--------------------------|--------|------------------------|
| Hygiene (Days 0-30) | Backup coverage inventory and gap analysis | Existing backup solution |
| Sovereignty (Days 60-90) | Live recovery drill: one critical system | Existing backup solution |
| Antifragility (Days 90-180) | Quarterly recovery drills with automation | Existing backup + orchestration scripts |
| Antifragility (Days 90-180) | Chaos engineering: simulate infrastructure failure | Existing infrastructure + open-source tools |
**Key Principle**: Recovery is not about having backups. It is about **knowing—provably—that you can rebuild faster than your adversary can destroy**.
---
## The Antifragile CSF Profile
A CSF Profile describes the organization's current and target state. The antifragile profile is distinctive:
| Function | Typical Organization | Antifragile Organization |
|----------|---------------------|-------------------------|
| **GOVERN** | Annual policy review | Continuous governance updated by every incident |
| **IDENTIFY** | Asset inventory updated quarterly | Real-time dependency mapping with kill chain focus |
| **PROTECT** | Layered defenses purchased annually | Reduced attack surface through ownership and decoupling |
| **DETECT** | SIEM with thousands of daily alerts | High-signal detection with AI-assisted analysis |
| **RESPOND** | Incident response plan in a binder | Automated containment with human oversight |
| **RECOVER** | Backups with annual test | Quarterly validated recovery with chaos engineering |
---
## Communicating to Auditors and Regulators
When auditors ask how the antifragile approach maps to "accepted frameworks":
> *"Our approach is fully aligned with NIST CSF 2.0. We emphasize GOVERN as the enabling function and integrate continuous learning across IDENTIFY, PROTECT, DETECT, RESPOND, and RECOVER. Our 180-day roadmap delivers measurable maturity improvement against every CSF function, with evidence produced at each phase gate."*
**Evidence Package per Phase**:
| Phase | CSF Functions Addressed | Evidence Produced |
|-------|------------------------|-------------------|
| Hygiene (0-30 days) | GOVERN, IDENTIFY | Asset inventory, risk register, kill chain analysis |
| Control (30-60 days) | PROTECT, DETECT | Configuration baselines, detection rule effectiveness, MFA coverage |
| Sovereignty (60-90 days) | PROTECT, GOVERN, RECOVER | Local AI deployment evidence, vendor risk assessments, recovery drill results |
| Antifragility (90-180 days) | All six | Chaos experiment reports, structural fix metrics, maturity assessment |
---
## Crosswalk: NIST CSF ↔ CIS Controls ↔ Antifragile Actions
| NIST CSF Function | CIS Controls v8 | Antifragile Action |
|-------------------|-----------------|-------------------|
| GOVERN | Control 1, 2 (governance integration) | Kill chain risk register, T0 classification |
| IDENTIFY | Control 1, 2, 7 | Asset census, dependency mapping, shadow AI discovery |
| PROTECT | Control 4, 5, 6, 9, 10, 11, 12, 15 | ASR, MFA, PAWs, local AI, backup validation |
| DETECT | Control 8, 13 | Centralized logging, EDR tuning, network sensors |
| RESPOND | Control 17 | Automated containment, IR runbooks, red team validation |
| RECOVER | Control 11, 18 | Recovery drills, chaos engineering, structural improvement |
---
*Previous: [CIS Controls Mapping](cis-controls-mapping.md)*


@@ -0,0 +1,292 @@
# Vertical Reference: Banking and Financial Services
> *"A bank's trust is its only real asset. Technical debt in security is a withdrawal from that account."*
This document adapts the antifragile rapid modernisation approach for banking and financial services—one of the most regulated, most targeted, and most technologically heterogeneous sectors. Banks face adversaries ranging from criminal syndicates to nation-states, while navigating DORA, PSD2, GDPR, NIS2, and national banking regulations.
---
## The Banking Security Context
### What Makes Banking Different
| Factor | Enterprise Default | Banking Reality |
|--------|-------------------|-----------------|
| Regulatory density | Moderate | Extreme (DORA, PSD2, GDPR, NIS2, Basel, national banking laws) |
| Adversary motivation | Financial (ransomware, fraud) | Financial + espionage + destabilization |
| Transaction speed | Batch, daily | Real-time, 24/7, instant payments |
| Legacy systems | 5-10 years old | 20-40 years old (mainframes, COBOL) |
| Third-party reliance | Moderate | High (fintech APIs, payment processors, SWIFT) |
| Data sensitivity | Personal data | Personal + financial + transaction patterns + behavioural biometrics |
### The Legacy Problem
Many banks run core banking systems on mainframes or mid-range systems that predate modern security architecture. These systems:
- Use legacy authentication (no MFA natively)
- Log minimally or opaquely
- Have no API layer; integration occurs via file transfer or terminal emulation
- Run on operating systems with limited patch support
Our approach does not demand legacy replacement. It demands **compensating controls** and **isolation architecture**.
---
## Regulatory Landscape
### DORA (Digital Operational Resilience Act) — EU
Applicable since 17 January 2025, DORA imposes comprehensive ICT risk management requirements on EU financial entities.
| DORA Requirement | Antifragile Application |
|-----------------|------------------------|
| ICT risk management framework (Article 6) | Kill chain analysis as primary risk methodology; T0 asset classification for critical banking systems |
| ICT-related incident management (Article 17) | Sub-hour detection and containment targets; automated reporting to the competent authority |
| Digital operational resilience testing (Articles 24-26) | Quarterly recovery drills for core banking; annual red team; threat-led penetration testing (TLPT) |
| ICT third-party risk (Article 28) | Vendor exit architectures for all critical ICT providers; contract clawbacks for security failures |
| Information sharing (Article 45) | Anonymized incident signals shared via sector ISACs; defensive AI trained on collective threat data |
### PSD2 (Revised Payment Services Directive)
| PSD2 Requirement | Security Implication |
|-----------------|---------------------|
| Strong Customer Authentication (SCA) | MFA for payment initiation and account access |
| Dynamic linking | Authentication code must be specific to transaction amount and payee |
| Secure communication | TLS 1.2+, mutual authentication for TPP APIs |
| Access for TPPs (Third Party Providers) | New API attack surface; strict OAuth scope control |
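
Dynamic linking can be illustrated by binding the authentication code to the transaction details with a keyed hash. This is a sketch of the binding idea only, not an implementation of the PSD2 regulatory technical standards; the function names, the message format, and the sample key and payee identifier are all assumptions.

```python
import hashlib
import hmac


def auth_code(key: bytes, amount: str, payee: str) -> str:
    """Derive a code that is only valid for this exact amount and payee."""
    message = f"{amount}|{payee}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_auth_code(key: bytes, amount: str, payee: str, code: str) -> bool:
    """Constant-time check that the code matches the submitted transaction."""
    return hmac.compare_digest(auth_code(key, amount, payee), code)


# Hypothetical per-session key and payee identifier for illustration.
key = b"demo-session-key"
code = auth_code(key, "100.00", "PAYEE-001")

print(verify_auth_code(key, "100.00", "PAYEE-001", code))   # unchanged transaction
print(verify_auth_code(key, "9100.00", "PAYEE-001", code))  # amount altered in transit
```

If an adversary changes the amount or the payee after the customer has authenticated, the code no longer verifies, which is the property dynamic linking is meant to provide.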
### NIS2 for Systemic Banks
Systemic banks fall under NIS2 as "essential entities" with:
- 24-hour early warning and 72-hour incident notification to the CSIRT or competent authority
- Supply chain security obligations
- Board-level accountability for cybersecurity
### National Regulations
| Jurisdiction | Key Regulation |
|-------------|---------------|
| Germany | BAIT (Bankaufsichtliche Anforderungen an die IT), MaRisk |
| UK | CBEST, STAR, SYSC |
| US | FFIEC guidelines, SOX, GLBA |
| Switzerland | FINMA Circular 2023/1 |
---
## The Antifragile Posture for Banking
### Pillar 1: Structural Decoupling — Core Banking Isolation
**Principle**: The core banking system must be structurally isolated from internet-facing channels, third-party APIs, and general corporate IT.
**Antifragile Moves**:
| Layer | Isolation Requirement |
|-------|----------------------|
| **Channel layer** | Internet banking, mobile apps, open banking APIs → DMZ, WAF, API gateway |
| **Integration layer** | API gateway, middleware, ESB → validates, transforms, rate-limits all traffic |
| **Core layer** | Core banking, payments engine, general ledger → no direct internet; access only via integration layer |
| **Data layer** | Customer databases, transaction history → encrypted at rest; access via service accounts only |
| **Reporting layer** | Data warehouse, BI, regulatory reporting → read-only from core; no write-back |
**The Conversation**:
> *"Your core banking system is a Tier 0 asset. It should not know the internet exists. Every request must pass through an integration layer that validates, logs, and rate-limits. If a mobile app vulnerability is exploited, the adversary should hit the API gateway—not the general ledger."*
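
The layer model above can be expressed as an allow-list of flows and checked against observed traffic. The sketch below is one possible reading of the table: the layer names and the permitted flows are assumptions made for illustration, and observed flows would in practice come from firewall or flow logs.

```python
# Assumed permitted flows (source layer, destination layer).
ALLOWED_FLOWS = {
    ("internet", "channel"),
    ("channel", "integration"),
    ("integration", "core"),
    ("core", "data"),
    ("core", "reporting"),  # one-way feed; reporting never writes back
}


def flow_violations(observed_flows):
    """Return the observed (source, destination) pairs that are not allow-listed."""
    return [flow for flow in observed_flows if flow not in ALLOWED_FLOWS]


# Illustrative observations only.
observed = [
    ("internet", "channel"),
    ("channel", "core"),        # bypasses the integration layer
    ("reporting", "core"),      # write-back attempt
    ("integration", "core"),
]

for src, dst in flow_violations(observed):
    print(f"VIOLATION: {src} -> {dst}")
```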
### Pillar 2: Optionality Preservation — Fintech and TPP Independence
**Principle**: Open banking and fintech integration create dependencies. The bank must retain the option to disconnect, replace, or limit any third party without operational paralysis.
**Antifragile Moves**:
- **API abstraction layer**: All TPP connections via bank-controlled API gateway; no direct TPP-to-core connections
- **Scope-limited OAuth**: TPP tokens granted only for specific accounts, specific data sets, specific time windows
- **Circuit breakers**: Automatic disconnection of TPPs exhibiting anomalous behaviour (high request rates, unusual data access patterns)
- **TPP risk register**: Every connected TPP rated for security maturity with quarterly re-assessment
- **Exit architecture**: Technical and contractual ability to revoke TPP access within 1 hour
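
The circuit-breaker idea from the list above can be sketched as a sliding-window request counter per TPP. The class name, thresholds, and the choice to keep a TPP disconnected until an explicit reset are assumptions for illustration; a production gateway would normally use its own rate-limiting features.

```python
from collections import defaultdict, deque


class TppCircuitBreaker:
    """Trip (disconnect) a TPP that exceeds max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # tpp_id -> request timestamps
        self.tripped = set()

    def allow(self, tpp_id, now):
        """Record a request at time `now` (seconds); return False once tripped."""
        if tpp_id in self.tripped:
            return False
        window = self._recent[tpp_id]
        window.append(now)
        # Drop timestamps that have aged out of the sliding window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        if len(window) > self.max_requests:
            self.tripped.add(tpp_id)
            return False
        return True

    def reset(self, tpp_id):
        """Manual re-enable after review; clears history for the TPP."""
        self.tripped.discard(tpp_id)
        self._recent.pop(tpp_id, None)


breaker = TppCircuitBreaker(max_requests=3, window_seconds=10)
for t in (0, 1, 2, 3):
    print(t, breaker.allow("tpp-demo", now=t))
```

Requiring a manual `reset` keeps a human in the loop before an anomalous TPP is reconnected, which matches the exit-architecture goal of being able to cut access quickly and deliberately restore it.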
### Pillar 3: Stress-to-Signal Conversion — Fraud as Intelligence
**Principle**: Every fraud attempt, successful or not, is free threat intelligence. The bank must learn faster than the adversary adapts.
**Antifragile Moves**:
- **Real-time fraud detection**: Local AI models trained on proprietary transaction data to detect anomalies without cloud exfiltration
- **Fraud-to-structure pipeline**: Every confirmed fraud case must produce at least one control improvement
- **Behavioural biometrics**: Device fingerprinting, typing cadence, mouse movement patterns; these signals improve with volume
- **Mule account detection**: Graph analysis on account opening and transaction patterns to identify money laundering networks
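
As a rough illustration of graph analysis for mule detection, the sketch below flags accounts that receive funds from several distinct senders and forward almost all of it onward. The heuristic, the thresholds, and the sample transfers are assumptions for illustration and not a validated detection model.

```python
from collections import defaultdict


def suspected_mules(transfers, min_senders=3, min_forward_ratio=0.9):
    """Flag accounts with high fan-in that pass most received funds onward.

    `transfers` is an iterable of (source, destination, amount) tuples.
    """
    senders = defaultdict(set)
    received = defaultdict(float)
    sent = defaultdict(float)
    for src, dst, amount in transfers:
        senders[dst].add(src)
        received[dst] += amount
        sent[src] += amount
    return sorted(
        acct
        for acct in received
        if len(senders[acct]) >= min_senders
        and sent[acct] >= min_forward_ratio * received[acct]
    )


# Illustrative transfers only: "m" collects from three accounts and forwards it.
transfers = [
    ("a", "m", 100), ("b", "m", 100), ("c", "m", 100),
    ("m", "x", 290),
    ("a", "shop", 50), ("b", "shop", 20), ("c", "shop", 30),
]

print(suspected_mules(transfers))
```

A merchant account such as `shop` also has high fan-in but retains the funds, which is why the forward ratio is part of the test.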
### Pillar 4: Sovereign Intelligence — Payments Data Never Leaves
**Principle**: Payment transaction data reveals economic behaviour, business relationships, and operational patterns. It must never train a third-party AI.
**Antifragile Moves**:
- **Local fraud models**: Train on transaction history, merchant categories, geolocation, and temporal patterns locally
- **On-premise transaction monitoring**: AML/sanctions screening engines run on bank-controlled hardware
- **Closed-loop analytics**: Customer segmentation, product recommendation, and risk scoring using local models
- **Data residency by design**: Primary data storage in national or EU jurisdiction; encryption keys in HSM under bank control
**The Conversation**:
> *"Your payments data is not just customer data. It is a map of your economy. Sending it to a cloud AI for 'fraud optimization' is not a technology partnership. It is an intelligence transfer. Local models. Local hardware. Local keys."*
### Pillar 5: Asymmetric Payoff — Resilience Over Perfection
**Principle**: Banks cannot prevent all fraud or all attacks. The antifragile bank designs systems where small security investments yield disproportionate reductions in catastrophic risk.
**Antifragile Moves**:
- **Segmented transaction limits**: Real-time limits by channel, geography, time, and customer segment; limits the blast radius of compromised credentials
- **Synthetic account testing**: Maintain honeypot accounts that alert on any access attempt
- **Rapid account freezing**: Sub-60-second ability to freeze accounts, revoke tokens, and block cards
- **Distributed ledger backup**: Critical transaction records replicated to immutable, geographically distributed storage
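
Segmented transaction limits from the list above can be sketched as a lookup keyed by channel and geography, with unknown segments denied by default. The segment names and limit values are invented for illustration.

```python
# Hypothetical daily limits per (channel, geography) segment.
DAILY_LIMITS = {
    ("mobile", "domestic"): 5_000,
    ("mobile", "international"): 1_000,
    ("branch", "domestic"): 50_000,
}


def authorize(channel, geography, amount, spent_today):
    """Approve only if the running daily total stays within the segment limit.

    Segments without a configured limit get a limit of 0, so they are denied.
    """
    limit = DAILY_LIMITS.get((channel, geography), 0)
    return spent_today + amount <= limit


print(authorize("mobile", "domestic", 1_200, spent_today=3_000))
print(authorize("mobile", "international", 1_200, spent_today=0))
```

With limits like these, stolen mobile credentials can move at most the mobile segment's daily cap, regardless of the account balance.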
---
## The Rapid Modernisation Plan: Banking Variant
### Phase 1: Hygiene (Days 0-30) — Banking-Specific Additions
In addition to standard hygiene:
| Action | Owner | Deliverable | Regulatory Link |
|--------|-------|-------------|----------------|
| Inventory all systems processing payment data | Security / Architecture | PCI-DSS / payment system asset inventory | PSD2, PCI-DSS |
| Map all open banking / TPP connections | API Team | TPP connection matrix with data flows | PSD2 |
| Audit SWIFT infrastructure access and messaging | Security / Treasury | SWIFT CSP compliance gap analysis | SWIFT CSP |
| Verify data residency for customer and transaction data | Legal / Cloud | Data residency attestation | GDPR, DORA |
| Inventory cryptographic key material and HSMs | Security | Key management inventory | DORA, national crypto regs |
### Phase 2: Control (Days 30-60) — Banking-Specific Additions
| Action | Owner | Deliverable | Regulatory Link |
|--------|-------|-------------|----------------|
| Implement API gateway security: rate limiting, OAuth scope enforcement, input validation | API / Security | API security configuration audit | PSD2, DORA |
| Harden SWIFT infrastructure: dedicated network, restricted access, CSP controls | Security / Treasury | SWIFT CSP self-assessment | SWIFT CSP |
| Deploy tokenization for card data where not already present | Security / Payments | Tokenization coverage report | PCI-DSS |
| Implement privileged access vaulting for core banking admins | Security | PAM coverage for core banking | DORA, internal audit |
| Encrypt all backup and archive data with HSM-managed keys | Backup / Security | Encryption coverage report | GDPR, DORA |
### Phase 3: Sovereignty (Days 60-90) — Banking-Specific Additions
| Action | Owner | Deliverable | Regulatory Link |
|--------|-------|-------------|----------------|
| Deploy local AI for fraud detection pilot | AI / Fraud | Fraud detection model with false positive/negative rates | DORA (resilience testing) |
| Conduct core banking recovery drill | Operations / Security | Recovery time objective (RTO) validation | DORA Article 11 |
| Test TPP disconnection procedure | API / Security | TPP revocation time measurement | PSD2, DORA |
| Validate incident reporting automation to regulator | Security / Legal | Automated reporting pipeline test | DORA Article 19 |
### Phase 4: Antifragility (Days 90-180) — Banking-Specific Additions
| Action | Owner | Deliverable | Regulatory Link |
|--------|-------|-------------|----------------|
| Threat-led penetration testing (TLPT) | External / Security | TLPT report with remediation | DORA Article 26 |
| Chaos engineering on channel layer (non-production) | Resilience | Chaos experiment findings | DORA resilience testing |
| Red team exercise including TPP exploitation | Security | Red team report with kill chain | DORA, internal audit |
| Board-level cybersecurity briefing with antifragile metrics | CISO / Board | Quarterly board report | DORA governance, NIS2 |
---
## SWIFT Customer Security Programme (CSP)
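For banks using SWIFT messaging. The control numbers below are this blueprint's shorthand and do not follow the numbering of SWIFT's Customer Security Controls Framework (CSCF); map each row to the current CSCF release before attestation: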
| CSP Control | Antifragile Implementation |
|------------|---------------------------|
| 1.1: Restrict Internet Access | SWIFT infrastructure on dedicated VLAN with no internet; jump host access only |
| 1.2: Secure the Operating System | Hardened OS baseline, automated patching, application whitelisting |
| 1.3: Restrict Logical Access | Vaulted credentials, MFA, session recording for all SWIFT access |
| 1.4: Malware Protection | EDR on SWIFT workstations, network segmentation, email security |
| 1.5: Software Integrity | Signed software only, integrity monitoring, change control |
| 2.1: Internal Data Flow Security | Encryption for all SWIFT data in transit within the bank |
| 2.2: Security Event Monitoring | Dedicated logging for SWIFT infrastructure; alerting on anomalous access |
| 2.3: Transaction Business Controls | Dual authorization for high-value messages; anomaly detection on message patterns |
| 2.4: Connection Integrity | Mutual TLS, certificate pinning, connection anomaly detection |
| 2.5: Service Providers | Due diligence on SWIFT service bureaus; exit clauses; audit rights |
| 2.6: Customer Environment Security | Annual self-assessment with independent validation |
| 2.7: Penetration Testing | Annual penetration testing of SWIFT infrastructure |
| 2.8: Cyber Incident Information Sharing | Participation in sector ISACs; anonymized threat sharing |
| 2.9: Transaction Controls for Funds Transfers | Additional validation for high-risk corridors and counterparties |
| 2.10: Operational Risk Management | Integration of SWIFT risk into enterprise operational risk framework |
| 2.11: Security Awareness Training | Role-specific training for SWIFT operators and administrators |
---
## M365 in Banking
Banks often use M365 for corporate functions while maintaining strict separation from payment systems.
| Consideration | Banking Requirement |
|--------------|---------------------|
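| **License tier** | E3 is common; E5 for security/compliance officers. Defender for Office 365 P2 strongly recommended for email security. |
| **Data loss prevention** | E3 covers DLP for Exchange Online, SharePoint, and OneDrive only; Teams chat and endpoint DLP need E5 or the E5 Compliance add-on. Treat the gap as critical for banks and close it with Purview add-ons or third-party DLP. |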
| **Email archiving** | 7+ year immutable retention for regulatory inquiries. Requires Exchange Online Plan 2 or add-on. |
| **eDiscovery** | Legal hold and eDiscovery required for litigation and regulatory requests. Purview required for advanced features. |
| **Customer data in M365** | Strictly prohibit customer PII in Teams/SharePoint unless DLP and encryption are active |
| **Third-party apps** | Disable user consent; require admin approval for all enterprise apps |
| **Mobile access** | Intune-managed devices only; block unmanaged device access to email and SharePoint |
See [M365 E3 Hardening](../playbooks/m365-e3-hardening.md) for tactical guidance, and apply these banking overlays.
---
## Core Banking and Legacy System Security
### Compensating Controls for Legacy
When core banking systems cannot be modernized directly:
| Legacy Limitation | Compensating Control |
|------------------|---------------------|
| No native MFA | Place terminal access behind PAM vault with MFA gate; no direct user login |
| Minimal logging | Deploy screen/session recording for all access; instrument file transfers |
| No encryption in transit | Force all connectivity through TLS-terminating proxy or VPN |
| Weak password policies | Vault all service account passwords; rotate automatically; no human knowledge |
| No patch support | Isolate on dedicated network segment; application whitelisting; intrusion detection |
| File-based integration | Scan all files at transfer points; validate checksums; log all movements |
### The Integration Layer as Security Boundary
For banks with legacy core systems, the integration layer (API gateway, ESB, middleware) becomes the **security control point**:
- All authentication modernized at the integration layer
- All logging enriched at the integration layer
- All rate limiting and circuit breaking enforced at the integration layer
- All input validation performed at the integration layer
The core banking system sees only validated, logged, controlled traffic.
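
As one illustration of enforcement at the integration layer, a token bucket is a common way to implement rate limiting. The sketch below is an assumption about how such a limiter could look, not a description of any particular gateway product; the class name and parameters are hypothetical. Time is passed in explicitly so the behaviour is easy to test.

```python
class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False             # caller rejects or queues the request
```

In practice one bucket would be kept per client or per TPP so that a single noisy caller cannot starve the legacy core.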
---
## Cryptography and Key Management
Banking regulators are increasingly specific about cryptographic controls.
| Control | Implementation |
|---------|---------------|
| **Key generation** | HSM-generated for all production keys; dual control for key ceremonies |
| **Key storage** | HSM or hardware-backed key stores only; no software-only keys for signing or encryption |
| **Key rotation** | Automated rotation for TLS keys; annual rotation for long-term signing keys |
| **Quantum readiness** | Inventory all cryptographic implementations; begin crypto-agility planning |
| **Key escrow** | Split knowledge for backup keys; geographic separation of escrow components |
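
A key inventory becomes actionable once it can answer "which keys are overdue for rotation?". The sketch below assumes a simple inventory of dictionaries with hypothetical `id`, `purpose`, and `created` fields. The 365-day figure reflects the annual rotation in the table; the 90-day TLS figure is an illustrative assumption.

```python
from datetime import date, timedelta

# Maximum key age per purpose. The TLS value is an assumption for illustration.
MAX_AGE = {
    "tls": timedelta(days=90),
    "signing": timedelta(days=365),
}


def overdue_keys(inventory, today):
    """Return the ids of keys older than the maximum age for their purpose."""
    overdue = []
    for key in inventory:
        age = today - key["created"]
        if age > MAX_AGE[key["purpose"]]:
            overdue.append(key["id"])
    return overdue
```

Example: `overdue_keys([{"id": "tls-1", "purpose": "tls", "created": date(2024, 1, 1)}], date(2024, 6, 1))` reports `tls-1`, because 152 days exceeds the 90-day limit.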
---
## Evidence Package for Regulators and Auditors
| Regulatory Request | Evidence from Antifragile Program |
|-------------------|----------------------------------|
| DORA ICT risk framework | Kill chain analysis, T0 asset register, risk-based vulnerability prioritization |
| DORA resilience testing | Quarterly recovery drill reports, annual TLPT/penetration test, chaos engineering results |
| DORA incident reporting | Mean-time-to-detect, mean-time-to-contain, automated reporting pipeline test results |
| DORA third-party risk | Vendor risk register, exit architectures, contract security clauses |
| PSD2 SCA compliance | MFA coverage report, dynamic linking validation, TPP access audit |
| SWIFT CSP | Self-assessment with independent validation, penetration test report |
| GDPR data protection | Data residency attestation, encryption coverage, DLP policy, breach notification test |
| Internal audit | Antifragile maturity assessment, control effectiveness metrics, remediation tracking |
---
*Previous: [Vertical: Power Utilities](vertical-power-utilities.md)*

@@ -0,0 +1,297 @@
# Vertical Reference: Power and Utilities
> *"The grid does not care about your quarterly targets. It cares whether you understood the boundary between IT and operations before the adversary did."*
This document adapts the antifragile rapid modernisation approach for power generation, transmission, distribution, and water utilities. These organizations operate industrial control systems (ICS/SCADA) where safety and availability are paramount, regulatory oversight is intense, and the convergence of IT and OT creates existential attack surfaces.
---
## The Power and Utility Context
### What Makes This Sector Different
| Factor | Enterprise Default | Power/Utility Reality |
|--------|-------------------|----------------------|
| Downtime tolerance | Hours | Seconds to minutes (protection systems); hours for generation |
| Safety impact | Data loss, financial harm | Physical harm, loss of life, environmental catastrophe |
| System lifetime | 3-5 years | 20-40 years (generation, transmission, protection relays) |
| Regulatory driver | GDPR, industry standards | NIS2, CER, IEC 62351, NERC CIP (North America), national energy regulators |
| OT/IT boundary | Often porous or nonexistent | Legally and physically mandated; convergence is the primary risk |
| Supply chain | Moderate depth | Extreme (multi-vendor, multi-national, obsolete equipment) |
| Remote access | Common, convenient | Heavily restricted; often requires physical presence or dedicated lines |
### The IT/OT Convergence Problem
Power utilities historically operated OT networks (SCADA, EMS, DMS, protection relays) as **air-gapped systems**. Over the past two decades, convergence has introduced:
- Remote diagnostics over internet-connected VPNs
- Centralized patch management through IT SCCM/WSUS
- Business intelligence systems reading OT historian data
- Vendor remote support terminals in control centers
- Smart grid and Advanced Metering Infrastructure (AMI) connecting customer-facing IT to grid operations
Every convergence point is a **potential bridge for adversaries** from IT to OT.
**The executive framing**:
> *"Your control room does not need email. Your protection relays do not need internet access. Every connection between your IT network and your operational technology is a connection an adversary can cross. We are not adding bureaucracy. We are re-establishing the boundary that keeps the lights on."*
---
## Regulatory Landscape
### EU NIS2 Directive (2023)
Power utilities and water suppliers are classified as **essential entities** under NIS2.
| NIS2 Requirement | Power/Utility Application |
|-----------------|--------------------------|
| Risk management measures | Kill chain analysis for IT→OT bridges; physical security assessment |
| Supply chain security | Vendor access inventory for all OT equipment; firmware provenance tracking |
| Incident reporting (24h early warning, 72h notification) | Automated detection and reporting to national CSIRT and energy regulator |
| Business continuity | Black start capability; grid islanding procedures; manual override validation |
| Cryptography | Encrypted communications for all IT/OT integration points |
| MFA | Hardware tokens for all remote access to OT or critical IT systems |
| Vulnerability handling | Risk-based prioritization with **safety impact assessment** |
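
Automated reporting starts with knowing the deadlines. A minimal sketch, assuming the two NIS2 windows of 24 hours (early warning) and 72 hours (incident notification) both run from the moment the entity becomes aware of a significant incident; the function name and return keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def nis2_deadlines(aware_at: datetime) -> dict:
    """Compute reporting deadlines from the time the incident became known."""
    if aware_at.tzinfo is None:
        # Naive timestamps are ambiguous across sites and time zones.
        raise ValueError("aware_at must be timezone-aware")
    return {
        "early_warning": aware_at + timedelta(hours=24),
        "incident_notification": aware_at + timedelta(hours=72),
    }
```

Feeding these timestamps into the incident ticket at creation time lets the SOC escalate before a deadline is missed rather than after.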
### CER Directive (Critical Entities Resilience)
Requires power utilities to demonstrate resilience against:
- Natural disasters
- Cyberattacks
- Supply chain disruptions
- Pandemics and workforce unavailability
**Antifragile application**: Chaos engineering for non-safety systems; cross-training for manual procedures; distributed spare parts inventory.
### Sector-Specific Standards
| Standard | Scope |
|----------|-------|
| **IEC 62351** | Power systems cybersecurity: communications protocols, authentication, encryption |
| **IEC 61850** | Substation communication (GOOSE, SV, MMS); security for these protocols is specified in IEC 62351-6 |
| **NERC CIP** | North American electric reliability; mandatory standards with heavy penalties |
| **ENTSO-E Cybersecurity Guidance** | European transmission system operator requirements |
| **BDEW Whitepaper** | German energy sector cybersecurity best practices |
---
## The Antifragile Posture for Power and Utilities
### Pillar 1: Structural Decoupling — The IT/OT Firewall
**Principle**: IT and OT must be decoupled to the maximum extent compatible with operational requirements. The air gap is the default. Any bridge must be justified, documented, and monitored.
**Antifragile Moves**:
| Action | Implementation | Priority |
|--------|---------------|----------|
| **Network segmentation** | Physically separate IT and OT; unidirectional gateway or data diode for OT→IT data flows | P0 |
| **No AD trust to OT** | OT AD (if any) must be a separate forest with one-way trust or no trust | P0 |
| **Jump host architecture** | All IT-to-OT access via hardened, monitored jump hosts with session recording | P1 |
| **Vendor access airlock** | Vendor VPNs terminate in dedicated DMZ; no direct OT access; remote hands or on-site escort for OT | P1 |
| **Remove internet from OT** | OT VLANs have no direct internet egress; updates via offline media or controlled proxy | P0 |
| **AMI / Smart Grid isolation** | Advanced Metering Infrastructure on dedicated network; no direct path to SCADA or EMS | P1 |
### Pillar 2: Optionality Preservation — Vendor and Technology Independence
**Principle**: Power utilities depend on vendors for SCADA, protection relays, turbine control, and substation automation. This dependency must not become a single point of failure.
**Antifragile Moves**:
- **Multi-vendor strategy for critical systems**: No single vendor should control >50% of protection, control, or monitoring functions
- **Spare parts inventory**: Maintain critical spares for legacy OT equipment that vendors no longer support
- **Firmware escrow and provenance**: Require vendors to deposit firmware; verify cryptographic signatures before deployment
- **Local competence**: Train internal staff to operate and maintain systems without vendor support for 30 days
- **Protocol independence**: Where possible, support multiple communication protocols to avoid single-vendor lock-in
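
The 50% concentration threshold can be checked directly against the OT asset inventory. The sketch below assumes the inventory can be reduced to `(function, vendor)` pairs, one per asset; that shape and the function name are hypothetical.

```python
from collections import Counter


def concentrated_vendors(assets, threshold=0.5):
    """Return (function, vendor, share) where a vendor's share exceeds threshold.

    `assets` is an iterable of (function, vendor) pairs, one per asset.
    """
    by_function = {}
    for function, vendor in assets:
        by_function.setdefault(function, Counter())[vendor] += 1

    findings = []
    for function, counts in by_function.items():
        total = sum(counts.values())
        for vendor, n in counts.items():
            share = n / total
            if share > threshold:
                findings.append((function, vendor, share))
    return sorted(findings)
```

Counting assets is the simplest weighting; weighting by criticality or by megawatts protected would be a reasonable refinement.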
### Pillar 3: Stress-to-Signal Conversion — OT Incident Learning
**Principle**: OT incidents are rare but high-impact. The organization must learn from every anomaly, near-miss, and exercise.
**Antifragile Moves**:
- **OT security operations centre (SOC) integration**: Feed OT alarms into the SOC with analysts trained on industrial protocols
- **Monthly tabletop exercises**: Simulate OT-specific scenarios (compromised EMS, rogue protection relay settings, ransomware on engineering workstations)
- **Post-incident structural mandate**: Every OT incident or near-miss must produce at least one architectural or procedural change
- **Red team with bounded OT scope**: Annual exercise including OT reconnaissance, constrained by safety requirements
### Pillar 4: Sovereign Intelligence — Local AI for the Grid
**Principle**: Grid data is among the most sensitive an organization possesses. It reveals generation capacity, topology, switching patterns, load profiles, and operational routines.
**Antifragile Moves**:
- **Local AI for OT anomaly detection**: Analyze historian data, DCS logs, and protection relay events without cloud exfiltration
- **Closed-loop digital twin**: Train models on local OT data to predict equipment failures; never export raw telemetry
- **Air-gapped AI inference**: Deploy inference nodes in OT DMZ with no return path to IT or internet
- **Load forecasting sovereignty**: Local models for demand prediction using proprietary grid data
**The executive framing**:
> *"Your grid data tells an adversary exactly when and where to strike. It tells a competitor your capacity constraints. Sending it to a cloud AI for 'optimization' is not a technology decision. It is a national security and competitive intelligence decision. Local models on local hardware. Full stop."*
### Pillar 5: Asymmetric Payoff — Resilience Over Prevention
**Principle**: In power utilities, perfect prevention is impossible. The goal is to survive and recover faster than the adversary can exploit.
**Antifragile Moves**:
- **Black start capability**: Maintain the ability to restart the grid from shutdown without external power
- **Grid islanding**: Design systems so that sections can disconnect and operate independently during disturbances
- **Manual override procedures**: Every automated system must have a documented, tested manual procedure
- **Redundant communication paths**: Power line carrier, microwave, satellite backup for SCADA and protection communications
- **Protection relay independence**: Electromechanical or static relays as backup for digital relays in critical paths
---
## The Rapid Modernisation Plan: Power/Utility Variant
### Phase 1: Hygiene (Days 0-30)
In addition to standard hygiene:
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Inventory all OT assets: DCS, SCADA, EMS, protection relays, RTUs, AMI | OT Security / Engineering | OT asset inventory with vendor and firmware versions |
| Map all IT-to-OT network connections | Network / OT | Connection matrix with business justification per connection |
| Audit vendor remote access: who, how, when, for how long | OT Security / Procurement | Vendor access log and hardened policy |
| Identify OT systems with internet connectivity | Network | List with immediate remediation plan |
| Document manual override procedures for critical systems | OT Engineering | Procedure manual, signed off by operations and safety |
| Validate backup of EMS / DMS configurations | OT Engineering | Backup integrity test report |
### Phase 2: Control (Days 30-60)
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Implement network segmentation: IT/OT DMZ with unidirectional gateway | Network / OT | Segmentation architecture and validated firewall rules |
| Harden vendor access: time-bounded, session-recorded, MFA with hardware tokens | OT Security | Vendor access gateway operational |
| Enable OT logging: historian, DCS, firewall, protection relay events | OT Security | Centralized OT log aggregation (air-gapped SIEM or historian) |
| Patch OT systems: test in lab, deploy in maintenance windows | OT Engineering | Patch management procedure with safety gates |
| Secure engineering workstations (EWS): application whitelisting, no internet | OT Security | EWS hardening standard deployed |
### Phase 3: Sovereignty (Days 60-90)
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Deploy local AI for OT anomaly detection pilot | AI / OT Security | OT anomaly detection with false positive tuning |
| Validate black start / islanding procedures | Operations | Test report with time-to-recovery metrics |
| Conduct OT-specific tabletop exercise | Security / Operations | Exercise report with structural improvements |
| Implement firmware integrity monitoring | OT Security | Baseline hashes for critical OT firmware |
| Test protection relay fail-over to electromechanical backup | Engineering | Fail-over test report |
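
Firmware integrity monitoring reduces to comparing current image hashes against a trusted baseline. A minimal sketch using SHA-256 from the standard library; the function names and the in-memory `images` mapping are illustrative assumptions (in the field, images would be read from a controlled repository or extracted from the device).

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_firmware(images: dict, baseline: dict) -> dict:
    """Compare firmware images (name -> bytes) with baseline hashes (name -> hex)."""
    status = {}
    for name, expected in baseline.items():
        if name not in images:
            status[name] = "missing"
        elif hmac.compare_digest(sha256_hex(images[name]), expected):
            status[name] = "ok"
        else:
            status[name] = "modified"
    # Images with no baseline entry are suspicious in their own right.
    for name in images.keys() - baseline.keys():
        status[name] = "unbaselined"
    return status
```

The baseline itself must be stored where an attacker with OT access cannot rewrite it, otherwise the comparison proves nothing.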
### Phase 4: Antifragility (Days 90-180)
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Annual red team with bounded OT scope | Security | Red team report with kill chain analysis |
| Chaos engineering on non-safety IT systems | Resilience | Monthly experiment schedule and findings |
| Vendor exit architecture for critical OT platforms | Procurement / Engineering | 90-day vendor transition plan per critical system |
| Cross-training: operations staff on manual procedures | Operations | Training completion metrics |
| Participate in sector ISAC information sharing | Security | Threat intelligence integration report |
---
## Substation and Protection Specifics
### IEC 61850 Security
IEC 61850 (substation communication) uses GOOSE and Sampled Values (SV) that were not designed with security in mind.
**Hardening priorities**:
- **IEC 62351-6**: Apply the security measures it defines for IEC 61850 communications (GOOSE, SV, MMS)
- **Authentication**: Digitally sign GOOSE messages where IEDs support it
- **Network segmentation**: GOOSE/SV traffic on dedicated VLAN; no routing to IT networks
- **IED hardening**: Disable unused services; change default passwords; enable logging
- **Configuration management**: Version control for SCL files; change detection for IED settings
### Protection Relay Security
Protection relays are the **safety-critical edge** of the grid. Compromise can cause physical damage.
| Control | Implementation |
|---------|---------------|
| Access control | Vaulted credentials; multi-person approval for settings changes |
| Logging | All settings changes logged with before/after values |
| Integrity | Cryptographic checksums for firmware and settings files |
| Redundancy | Independent protection schemes (e.g., distance + differential) |
| Manual backup | Electromechanical or static relay backup for critical digital protections |
---
## Generation-Specific Considerations
### Thermal / Nuclear / Hydro / Renewables
| Generation Type | Specific Risk | Control |
|----------------|--------------|---------|
| **Thermal** | Turbine control system compromise | Dedicated turbine control network; no IT connectivity |
| **Nuclear** | Safety system interference | Air-gapped safety systems; regulatory compliance with national nuclear authority |
| **Hydro** | Dam control / spillway gate manipulation | Physical controls for critical water management; redundant level sensors |
| **Renewables** | Inverter-based resource (IBR) vulnerability | Secure firmware updates; anti-islanding protection; grid support function validation |
### Distributed Energy Resources (DER)
Solar, wind, and battery inverters connect to the distribution grid with varying security maturity.
- **Action**: DER interconnection standards must include cybersecurity requirements
- **Action**: Monitor DER communications for anomalous commands or settings changes
- **Action**: Aggregate DER visibility in DMS/ADMS without direct control paths
---
## Water and Wastewater Utilities
Water utilities share many characteristics with power but have additional concerns:
| Concern | Application |
|---------|-------------|
| **Safety** | Contamination prevention, pressure management, chemical dosing control |
| **SCADA/OT** | Treatment plant automation, distribution pump control, reservoir level management |
| **Criticality** | Water is life-sustaining; outages have immediate public health impact |
| **Regulation** | EPA (US), Drinking Water Inspectorate (UK), national health authorities |
**Additional controls for water utilities**:
- **Physical security** for treatment chemicals (chlorine, fluoride) to prevent intentional contamination
- **Redundant water quality sensors** with cross-validation
- **Manual override capability** for all automated chemical dosing systems
- **Isolation of IT from operational water quality monitoring**
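
Cross-validation of redundant sensors can be as simple as comparing each reading with the median of the group. The sketch below assumes at least three sensors measuring the same quantity; sensor names, the tolerance, and the function name are hypothetical.

```python
from statistics import median


def cross_validate(readings: dict, tolerance: float):
    """Return (consensus, outliers) for redundant sensor readings.

    `readings` maps sensor name -> value. A sensor is an outlier when it
    deviates from the median by more than `tolerance`.
    """
    if len(readings) < 3:
        # With fewer than three sensors a disagreement cannot be attributed.
        raise ValueError("need at least three redundant sensors")
    consensus = median(readings.values())
    outliers = sorted(
        name for name, value in readings.items()
        if abs(value - consensus) > tolerance
    )
    return consensus, outliers
```

A flagged sensor may be faulty or manipulated; either way the dosing loop should fall back to the consensus value or to manual control until the discrepancy is explained.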
---
## M365 in Power and Utilities
Corporate IT in power utilities uses M365 but must be strictly separated from OT.
| Consideration | Power/Utility Requirement |
|--------------|--------------------------|
| **Data residency** | M365 data in EU/national datacenters; verify tenant location |
| **Conditional access** | Block M365 access from non-corporate devices for privileged users; geo-restrict admin access |
| **Guest access** | Strictly prohibit in OT-connected tenants; heavily vet in corporate tenant |
| **Teams / SharePoint** | Never used for OT document sharing or control room communication |
| **Mobile device management** | Field engineer tablets Intune-managed; restricted app installation |
| **Email security** | EOP baseline minimum; Defender for Office 365 P2 recommended for critical infrastructure |
See [M365 E3 Hardening](../playbooks/m365-e3-hardening.md) for tactical hardening, and apply these overlays.
---
## Evidence Package for Regulators
| Requirement | Evidence from Antifragile Program |
|------------|----------------------------------|
| NIS2 risk management | Kill chain analysis, T0 asset classification, IT/OT connection matrix |
| NIS2 incident handling | IR runbooks, OT-specific response procedures, quarterly drill reports |
| NIS2 business continuity | Black start test reports, islanding validation, manual procedure verification |
| NIS2 supply chain security | Vendor risk register, firmware provenance, vendor exit architectures |
| NIS2 encryption | Data classification with encryption mapping, TLS configuration audits |
| NIS2 vulnerability handling | Vulnerability scan reports with safety-impact prioritization |
| CER resilience | Chaos engineering results, cross-training metrics, spare parts inventory |
---
*Previous: [NIST CSF Mapping](nist-csf-mapping.md)*
*Next: [Vertical: Telco](vertical-telco.md)*

@@ -0,0 +1,307 @@
# Vertical Reference: Telecommunications
> *"A telco's network is its nervous system. Compromise it, and you do not just steal data—you control the medium through which a nation communicates."*
This document adapts the antifragile rapid modernisation approach for telecommunications providers—mobile network operators, fixed-line operators, internet service providers, and converged operators. These organizations manage national infrastructure, process massive volumes of subscriber data, and face adversaries ranging from criminal fraudsters to nation-state actors seeking communications intelligence.
---
## The Telecommunications Context
### What Makes Telco Different
| Factor | Enterprise Default | Telco Reality |
|--------|-------------------|---------------|
| Scale | Thousands of endpoints | Millions of subscribers, hundreds of thousands of network elements |
| Real-time requirement | Batch acceptable | Call setup, SMS, data sessions are real-time; latency matters |
| Regulatory driver | GDPR, industry standards | GDPR + NIS2 + telecom-specific security frameworks + national licensing conditions |
| Adversary motivation | Financial (ransomware, fraud) | Financial + espionage + surveillance + network disruption |
| Signaling exposure | Minimal | SS7, Diameter, GTP, SIP are exposed to hundreds of partner networks globally |
| Supply chain | Moderate | Extreme (equipment vendors from multiple geopolitical blocs, legacy switches, proprietary protocols) |
| Customer data depth | Personal data | Personal + location + communication patterns + device identity + lawful intercept capability |
### The Convergence Challenge
Telcos are converging previously separate networks:
- **Fixed and mobile** (FMC — Fixed Mobile Convergence)
- **IT and network** (cloud-native 5G core, NFV, SDN)
- **Consumer and enterprise** (unified platforms, shared infrastructure)
- **Communications and content** (streaming, advertising, IoT platforms)
Every convergence multiplies the attack surface and blurs accountability.
---
## Regulatory Landscape
### EU NIS2 Directive (2023)
Telcos are classified as **essential entities** under NIS2 with stringent obligations.
| NIS2 Requirement | Telco Application |
|-----------------|------------------|
| Risk management measures | Network-wide kill chain analysis; signaling security assessment |
| Supply chain security | Equipment vendor risk (especially high-risk vendors); firmware provenance |
| Incident reporting (24h early warning, 72h notification) | Automated detection and reporting to the national CSIRT or competent authority |
| Business continuity | Network resilience testing; disaster recovery for core network functions |
| Cryptography | Encryption for signaling, management, and subscriber data |
| MFA | Hardware tokens for all core network and network management access |
| Vulnerability handling | Rapid patching of network elements with service continuity planning |
### Telecom-Specific Security Frameworks
| Framework | Scope |
|-----------|-------|
| **ETSI EN 303 645** | Cybersecurity for consumer IoT devices (relevant for telco IoT offerings) |
| **GSMA FS.38** | SIP network security guidelines for mobile operators |
| **GSMA Network Equipment Security Assurance Scheme (NESAS)** | Vendor security assessment for 5G equipment |
| **3GPP SA3** | Security architecture and procedures for mobile systems |
### National Telecom Security Frameworks
Many European states have additional national requirements:
- **Germany**: Telekommunikationsgesetz (TKG) security provisions and the Bundesnetzagentur catalogue of security requirements
- **UK**: Telecommunications (Security) Act 2021
- **France**: ANSSI guides for operators of vital importance
---
## The Antifragile Posture for Telecommunications
### Pillar 1: Structural Decoupling — Network Segmentation
**Principle**: The core network must be structurally isolated from internet-facing services, enterprise IT, and third-party APIs.
**Antifragile Moves**:
| Layer | Isolation Requirement |
|-------|----------------------|
| **Core network** | Signaling (MME, AMF, HSS/UDM, PCRF/PCF) on dedicated network; no direct internet access |
| **Radio access network (RAN)** | gNodeB / eNodeB management plane separated from user plane; no direct core access from RAN management |
| **Customer-facing services** | BSS (billing, CRM), OSS (operations), customer portals in DMZ with strict core access controls |
| **Enterprise services** | MPLS, SD-WAN, dedicated APNs on isolated infrastructure segments |
| **IoT platforms** | Dedicated network slice or APN; no direct subscriber data access without API gateway |
| **Interconnect** | SS7, Diameter, SIP, GTP signaling firewalls at every partner boundary |
### Pillar 2: Optionality Preservation — Vendor and Protocol Independence
**Principle**: Telcos depend on a small number of equipment vendors for core network functions. This concentration is a strategic vulnerability.
**Antifragile Moves**:
- **Multi-vendor RAN**: Open RAN architectures reduce dependency on single radio vendors
- **Cloud-native core portability**: 5G core deployed on container platforms portable across cloud providers
- **Protocol abstraction**: API gateways abstract subscriber-facing services from core network protocols
- **Vendor exit architecture**: Technical ability to replace core network vendor within defined timeframe
- **Firmware diversity**: Avoid identical firmware versions across all instances of a network element
### Pillar 3: Stress-to-Signal Conversion — Fraud and Attack Intelligence
**Principle**: Telcos process billions of transactions. Every fraud attempt, signaling anomaly, and attack probe is intelligence that should improve defences.
**Antifragile Moves**:
- **Real-time fraud detection**: Local AI models on call detail records, signaling data, and subscriber behaviour
- **Signaling anomaly detection**: SS7/Diameter/GTP firewalls with behavioural analysis
- **SIM swap detection**: Correlate SIM changes with account access, device fingerprint, and location
- **Wangiri / IRSF detection**: Identify missed-call fraud and international revenue share fraud patterns
- **Fraud-to-structure pipeline**: Every confirmed fraud case produces control improvement
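
SIM swap detection is essentially a time-window correlation between SIM changes and sensitive account activity. A minimal sketch, assuming events are available as plain tuples; the tuple layouts, the 24-hour window, and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def risky_sim_swaps(sim_changes, account_events, window=timedelta(hours=24)):
    """Flag account events from a new device shortly after a SIM change.

    sim_changes:    iterable of (msisdn, changed_at)
    account_events: iterable of (msisdn, event_type, at, new_device)
    """
    alerts = []
    for msisdn, changed_at in sim_changes:
        for ev_msisdn, event_type, at, new_device in account_events:
            same_subscriber = ev_msisdn == msisdn
            in_window = changed_at <= at <= changed_at + window
            if same_subscriber and new_device and in_window:
                alerts.append((msisdn, event_type, at))
    return alerts
```

The nested loop is fine for a sketch; at telco volumes the same join would run in a stream processor keyed by MSISDN.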
### Pillar 4: Sovereign Intelligence — Subscriber Data Never Leaves
**Principle**: Subscriber data (location, communication patterns, device identity, web browsing) is among the most sensitive data a state or criminal actor can access.
**Antifragile Moves**:
- **Local AI for network optimization**: Traffic prediction, energy saving, capacity planning on local infrastructure
- **Closed-loop fraud models**: Train on proprietary CDR and signaling data without cloud exfiltration
- **On-premise lawful intercept management**: Strict control over intercept capabilities; no third-party access
- **Data minimization for analytics**: Aggregate where possible; pseudonymize where individual analysis required
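
Pseudonymization for analytics is often done with a keyed hash, so that records for the same subscriber can still be joined while the identifier cannot be recovered without the key. A minimal sketch using HMAC-SHA-256 from the standard library; the function name is illustrative, and key storage (ideally in an HSM) is outside its scope.

```python
import hashlib
import hmac


def pseudonymize(subscriber_id: str, key: bytes) -> str:
    """Return a stable keyed pseudonym for a subscriber identifier."""
    return hmac.new(key, subscriber_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

A plain unkeyed hash is not sufficient here: MSISDNs and IMSIs have a small enough value space that hashes can be reversed by enumeration. Rotating the key breaks linkability between analysis periods when that is desired.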
**The executive framing**:
> *"Your subscribers' location history, communication patterns, and digital behaviour are a map of your society. Sending that data to a cloud AI for 'network optimization' is not a technology partnership. It is an intelligence transfer. Local models. Local hardware. Local accountability."*
### Pillar 5: Asymmetric Payoff — Resilience at Scale
**Principle**: Telco failures affect millions instantly. Small investments in redundancy and rapid recovery yield massive reductions in societal and financial impact.
**Antifragile Moves**:
- **Distributed core architecture**: 5G core functions geographically distributed; failure of one data centre does not disable a region
- **Automated failover**: Base station controllers, DNS, and authentication functions with sub-minute failover
- **Synthetic monitoring**: Continuous health checks from subscriber perspective (call setup, data throughput, SMS delivery)
- **Chaos engineering on non-real-time systems**: Test resilience of billing, provisioning, and analytics without impacting calls
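
Synthetic monitoring also gives a way to verify the sub-minute failover target from the subscriber's side. A minimal sketch, assuming probe results arrive as time-ordered `(timestamp, ok)` pairs (a hypothetical format): it measures the longest span from the first failed probe to the next successful one. An outage still in progress at the end of the data is deliberately not counted, and the measurement is only as precise as the probe interval.

```python
from datetime import datetime, timedelta

def longest_outage(probes):
    """probes: list of (timestamp, ok) tuples sorted by time.
    Returns the longest span from a first failed probe to the next success."""
    longest = timedelta(0)
    outage_start = None
    for at, ok in probes:
        if not ok and outage_start is None:
            outage_start = at                      # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, at - outage_start)
            outage_start = None                    # service recovered
    return longest

def meets_failover_target(probes, target=timedelta(minutes=1)):
    """True if every completed outage was shorter than the target."""
    return longest_outage(probes) < target
```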
---
## Signaling Security
### SS7 and SIGTRAN
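SS7 is the legacy signaling protocol connecting mobile networks globally; SIGTRAN carries the same SS7 messages over IP. SS7 was designed for a closed group of trusted operators, without peer authentication, and remains vulnerable: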
| Vulnerability | Risk | Control |
|--------------|------|---------|
| Location tracking | Subscriber location exposed to any SS7 peer | SS7 firewall with location query filtering; home routing for SMS |
| Call/SMS interception | Forwarding rules modified remotely | SS7 firewall with message screening; MAP operation filtering |
| Fraud (CLID spoofing) | Caller ID manipulated for fraud | SS7 firewall with consistency checks; whitelist trusted partners |
| Denial of service | Flood of signaling messages | Rate limiting; anomaly detection; SS7 firewall with DDoS mitigation |
**Action**: Deploy SS7/STP firewalls (e.g., Oracle, Procera, Mavenir) with strict filtering rules. Monitor for anomalous signaling patterns.
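
The filtering controls in the table can be sketched as a three-step screen: peer allowlist, MAP operation filtering, then rate limiting. This is an illustration, not a vendor ruleset. The class name, the global title, the list of operations treated as internal-only, and the rate limit are all assumptions; a real policy would be derived from the interconnect matrix and the operator's own traffic profile.

```python
from collections import deque

class MapScreen:
    """Toy SS7 MAP screening: allowlist, operation filter, rate limit."""

    def __init__(self, trusted_peers, internal_only_ops, max_msgs=3, window_s=1.0):
        self.trusted_peers = set(trusted_peers)
        self.internal_only_ops = set(internal_only_ops)
        self.max_msgs = max_msgs
        self.window_s = window_s
        self._recent = {}   # calling global title -> deque of timestamps

    def verdict(self, calling_gt, operation, now):
        """Return 'allow' or 'block:<reason>' for one inbound message."""
        if calling_gt not in self.trusted_peers:
            return "block:unknown-peer"
        if operation in self.internal_only_ops:
            # e.g. location queries that should never arrive from outside
            return "block:internal-only-operation"
        q = self._recent.setdefault(calling_gt, deque())
        while q and now - q[0] > self.window_s:
            q.popleft()                      # drop timestamps outside window
        if len(q) >= self.max_msgs:
            return "block:rate-limit"
        q.append(now)
        return "allow"

# Illustrative policy: one hypothetical partner, two operations blocked.
fw = MapScreen({"441234500000"}, {"anyTimeInterrogation", "sendRoutingInfo"})
print(fw.verdict("441234500000", "anyTimeInterrogation", now=0.0))
```

Every blocked message is also a signal: logging the verdicts feeds the anomaly monitoring called for in the action above.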
### Diameter and GTP
Diameter has replaced SS7/MAP for signaling in LTE, and GTP (GPRS Tunneling Protocol) carries subscriber data sessions within and between networks. Both introduce their own vulnerabilities:
| Vulnerability | Risk | Control |
|--------------|------|---------|
| Diameter impersonation | Fake HSS/PCRF responses | Diameter edge agent with mutual authentication |
| GTP tunnel hijacking | Subscriber session takeover | GTP firewall; tunnel endpoint validation |
| Interconnect bypass | Roaming fraud via fake partner | Roaming hub validation; partner security assessment |
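
Mutual authentication establishes which peer is connected; a complementary check is whether the identities claimed inside each message are consistent with that peer. The sketch below validates the `Origin-Realm` and `Origin-Host` values of a Diameter message against a per-peer allowlist. The peer name and mapping are hypothetical, the realm pattern follows the 3GPP `epc.mnc<MNC>.mcc<MCC>.3gppnetwork.org` naming convention, and the requirement that the host name sits under the realm is a common convention assumed here, not a protocol guarantee.

```python
import re

REALM_RE = re.compile(r"^epc\.mnc(\d{3})\.mcc(\d{3})\.3gppnetwork\.org$")

# Hypothetical mapping: connection-level peer -> realms it may originate.
PEER_REALMS = {
    "dea.partner-a.example": {"epc.mnc001.mcc001.3gppnetwork.org"},
}

def check_origin(peer: str, origin_realm: str, origin_host: str) -> str:
    """Return 'accept' or 'reject:<reason>' for one inbound message."""
    realm = origin_realm.lower()
    if not REALM_RE.match(realm):
        return "reject:malformed-realm"
    if realm not in PEER_REALMS.get(peer, set()):
        return "reject:realm-not-allowed-for-peer"
    if not origin_host.lower().endswith("." + realm):
        return "reject:host-outside-realm"
    return "accept"

print(check_origin(
    "dea.partner-a.example",
    "epc.mnc001.mcc001.3gppnetwork.org",
    "hss01.epc.mnc001.mcc001.3gppnetwork.org",
))
```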
### SIP Security (VoLTE/VoNR / IMS)
The IP Multimedia Subsystem (IMS) enables voice over LTE/5G using SIP.
- **SIP firewall**: Filter malformed messages, prevent enumeration, block unauthorized registration
- **Toll fraud prevention**: Restrict international calling routes; detect anomalous call patterns
- **SPIT (spam over Internet telephony) prevention**: Voice spam detection and filtering
---
## 5G Security Specifics
### 5G Core (5GC) Architecture
5G introduces a cloud-native, service-based architecture (SBA) with new security considerations:
| Element | Security Consideration |
|---------|----------------------|
| **AMF (Access and Mobility Management Function)** | Authentication gateway; compromise enables subscriber impersonation |
| **SMF (Session Management Function)** | Controls data sessions; compromise enables traffic redirection |
| **UPF (User Plane Function)** | Data forwarding; must be distributed and physically secured |
| **AUSF (Authentication Server Function)** | 5G-AKA authentication; keys must be HSM-protected |
| **UDM (Unified Data Management)** | Subscriber database; encryption at rest and strict access control |
| **PCF (Policy Control Function)** | QoS and charging policies; integrity critical for revenue assurance |
| **NRF (NF Repository Function)** | Service discovery; compromise enables man-in-the-middle between network functions |
**Security controls**:
- **TLS 1.3** for all service-based interfaces (SBI)
- **OAuth 2.0** for NF-to-NF authorization (access tokens issued by the NRF), with mutual TLS for authentication
- **Network slice isolation**: Strict separation between enterprise, consumer, and IoT slices
- **Edge security**: MEC (Multi-access Edge Computing) nodes are physically distributed and harder to secure than central data centres
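
As an illustration of the TLS requirement, the sketch below builds a client-side context with Python's standard `ssl` module that refuses anything below TLS 1.3 and can optionally present a client certificate for mutual TLS. The function name and the file parameters are hypothetical; real network functions would take certificates from the operator's PKI.

```python
import ssl

def sbi_client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a client TLS context that refuses anything below TLS 1.3."""
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if cert_file and key_file:
        # Present a client certificate so the server can authenticate us.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

ctx = sbi_client_context()
print(ctx.minimum_version.name, ctx.verify_mode.name, ctx.check_hostname)
```

Pinning the minimum version in code, rather than relying on library defaults, makes the requirement auditable: a configuration scan can assert it per network function.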
### Network Slicing
Network slicing creates logical separation on shared physical infrastructure.
- **Slice isolation is logical, not physical**: A hypervisor compromise can bridge slices
- **Action**: Micro-segmentation between slices; independent encryption keys per slice
- **Action**: Slice-specific monitoring and anomaly detection
- **Action**: Independent security policies per slice (enterprise slice stricter than consumer)
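
The first action above can be turned into an automated policy check. A minimal sketch, assuming hypothetical slice definitions and flow rules: it reports encryption key IDs shared by more than one slice and any rule that allows traffic between different slices. The sample data is deliberately misconfigured so that both checks find something.

```python
from collections import defaultdict

# Hypothetical slice definitions and flow rules.
SLICES = {
    "enterprise": {"key_id": "k-ent-01"},
    "consumer": {"key_id": "k-con-01"},
    "iot": {"key_id": "k-con-01"},   # deliberately wrong: shares a key
}
FLOW_RULES = [
    {"src": "enterprise", "dst": "enterprise", "action": "allow"},
    {"src": "iot", "dst": "enterprise", "action": "allow"},  # cross-slice
    {"src": "consumer", "dst": "iot", "action": "deny"},
]

def shared_keys(slices):
    """Return {key_id: [slice names]} for keys used by more than one slice."""
    by_key = defaultdict(list)
    for name, cfg in slices.items():
        by_key[cfg["key_id"]].append(name)
    return {k: sorted(v) for k, v in by_key.items() if len(v) > 1}

def cross_slice_allows(rules):
    """Return (src, dst) pairs where traffic between slices is allowed."""
    return [(r["src"], r["dst"]) for r in rules
            if r["action"] == "allow" and r["src"] != r["dst"]]

print(shared_keys(SLICES))
print(cross_slice_allows(FLOW_RULES))
```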
---
## The Rapid Modernisation Plan: Telco Variant
### Phase 1: Hygiene (Days 0-30)
In addition to standard hygiene:
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Inventory all network elements: RAN, core, transport, OSS, BSS | Network Engineering | Network asset inventory with vendor and firmware versions |
| Map all signaling interconnects: SS7, Diameter, GTP, SIP | Network Security | Interconnect matrix with partner security assessment |
| Audit roaming partner access and security posture | Roaming / Security | Partner risk register |
| Inventory subscriber data flows and storage locations | Data Protection / Security | Data flow map with residency verification |
| Identify all network management interfaces with internet exposure | Network Security | Exposure list with remediation plan |
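
A first pass at the exposure list can be produced from the asset inventory before any scanning. The sketch below uses the standard `ipaddress` module to pick out management endpoints with globally routable addresses. The endpoint names are hypothetical, and `8.8.8.8` is only a stand-in for a public address. A routable address is not proof of reachability, so the result is a candidate list to confirm with a perimeter scan.

```python
import ipaddress

# Hypothetical management endpoints: (element, address, port).
MGMT_ENDPOINTS = [
    ("core-router-1", "10.20.0.5", 22),
    ("ran-oam-7", "8.8.8.8", 443),        # stand-in for a public address
    ("bss-admin", "192.168.4.10", 8443),
]

def internet_routable(endpoints):
    """Return endpoints whose address is globally routable."""
    return [(name, addr, port) for name, addr, port in endpoints
            if ipaddress.ip_address(addr).is_global]

for name, addr, port in internet_routable(MGMT_ENDPOINTS):
    print(f"candidate exposure: {name} {addr}:{port}")
```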
### Phase 2: Control (Days 30-60)
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Deploy signaling firewalls (SS7, Diameter, GTP, SIP) | Network Security | Firewall ruleset with anomaly detection |
| Implement network slice security policies | 5G Core Team | Slice isolation validation report |
| Harden network management: dedicated NOC access, MFA, session recording | Operations / Security | NOC access control operational |
| Encrypt management traffic across all network layers | Network Engineering | Encryption coverage report |
| Patch critical network elements with service continuity planning | Network Engineering | Patch schedule with rollback procedures |
### Phase 3: Sovereignty (Days 60-90)
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Deploy local AI for fraud detection and network anomaly detection | AI / Security | Fraud detection pilot with false positive tuning |
| Validate core network disaster recovery and failover | Operations | Failover test report with recovery times |
| Conduct signaling security tabletop exercise | Security / Network | Exercise report with structural improvements |
| Implement firmware integrity monitoring for network elements | Network Security | Baseline hashes for critical firmware |
| Test lawful intercept process security and audit | Legal / Security | LI audit report |
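
The firmware integrity deliverable (baseline hashes) can be sketched with the standard library alone. This assumes firmware images can be retrieved as files into a directory, which is not true for every network element; some platforms only expose vendor attestation or signed version reports. The function names and the sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_baseline(image_dir):
    """Return {file name: SHA-256 hex digest} for every file in image_dir."""
    return {p.name: sha256_file(p)
            for p in sorted(Path(image_dir).iterdir()) if p.is_file()}

def drift(baseline, image_dir):
    """Return names that were added, removed, or changed since the baseline."""
    current = build_baseline(image_dir)
    return sorted(name for name in baseline.keys() | current.keys()
                  if baseline.get(name) != current.get(name))

# Demonstration with a temporary directory and a made-up image.
with tempfile.TemporaryDirectory() as d:
    Path(d, "upf.img").write_bytes(b"v1")
    baseline = build_baseline(d)
    Path(d, "upf.img").write_bytes(b"v1-tampered")
    changed = drift(baseline, d)
print(changed)
```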
### Phase 4: Antifragility (Days 90-180)
| Action | Owner | Deliverable |
|--------|-------|-------------|
| Red team exercise including signaling and core network reconnaissance | Security | Red team report with kill chain |
| Chaos engineering on OSS/BSS systems | Resilience | Experiment findings |
| Vendor exit architecture for critical network platforms | Procurement / Engineering | 90-day transition plan per critical vendor |
| Cross-training: NOC staff on manual procedures | Operations | Training completion metrics |
| Participate in sector ISAC and GSMA intelligence sharing | Security | Threat intelligence integration report |
---
## Subscriber Data and Privacy
Telcos hold massive PII datasets with unique sensitivity:
| Data Type | Sensitivity | Control |
|-----------|------------|---------|
| **Location data** | Extreme: real-time and historical location | Strict access control; pseudonymization for analytics; retain only as legally required |
| **Call detail records (CDR)** | High: communication patterns | Encryption at rest; audit all access; data minimization |
| **Internet browsing (DNS, DPI)** | High: digital behaviour | Aggregate where possible; DPI for security only with legal review |
| **Device identity (IMEI, IMSI)** | Moderate: device tracking | Secure storage; restrict access to fraud and network operations |
| **Lawful intercept data** | Extreme: legal and ethical | Strict chain of custody; independent audit; minimal retention |
**GDPR implications**:
- Subscriber data processing must have clear legal basis
- Data retention periods must be justified and enforced
- Subject access requests must be fulfillable across all systems
- Data breach notification: notify the supervisory authority within 72 hours of becoming aware of a breach
---
## M365 in Telecommunications
Corporate telco functions use M365 but must be separated from network operations.
| Consideration | Telco Requirement |
|--------------|------------------|
| **Data residency** | Subscriber data must remain within national/EU boundaries; verify M365 tenant location |
| **Conditional access** | Block admin access from non-corporate devices; geo-restrict privileged accounts |
| **Guest access** | Strictly vet all guests; prohibit in tenant with network engineering data |
| **Teams / SharePoint** | Never use for network topology, subscriber data, or security incident details |
| **Mobile device management** | Sales and field engineer devices Intune-managed; restricted app installation |
| **Email security** | EOP baseline; Defender for Office 365 P2 strongly recommended because telco staff are frequent phishing targets |
See [M365 E3 Hardening](../playbooks/m365-e3-hardening.md) for tactical hardening, and apply these overlays.
---
## Evidence Package for Regulators
| Requirement | Evidence from Antifragile Program |
|------------|----------------------------------|
| NIS2 risk management | Kill chain analysis, T0 asset classification, signaling security assessment |
| NIS2 incident handling | IR runbooks, signaling-specific response procedures, quarterly drill reports |
| NIS2 business continuity | Core network failover test reports, disaster recovery validation |
| NIS2 supply chain security | Vendor risk register (especially high-risk vendors), firmware provenance |
| NIS2 encryption | Encryption coverage for signaling, management, and subscriber data |
| NIS2 vulnerability handling | Vulnerability scan reports with network-impact prioritization |
| Telecom licensing | Lawful intercept audit, subscriber data protection evidence, network resilience metrics |
---
*Previous: [Vertical: Power and Utilities](vertical-power-utilities.md)*
*Next: [Vertical: Banking](vertical-banking.md)*