feat: Rewrite rapid-modernisation-plan and business-case for realism

rapid-modernisation-plan.md:
- Add honest framing section: what 180 days delivers vs. what takes 2-3 years
- Extend Phase 1 from 30 to 60 days; rename to Visibility
- Remove dangerous 'disable all unknown accounts in week 1-2' instruction
- Replace Phase 3 (AI Sovereignty) with Signal and Retained Capability
- Phase 3 now: detection engineering, alert runbooks, knowledge transfer
- Phase 4 made explicitly open-ended (not complete at day 180)
- Fix success metrics: remove unverifiable targets, replace with honest ones
- Remove 'compress Phases 1-2 into 30 days for small orgs' adaptation
- Add 'What This Plan Is Not' practitioner section
- ASTRAL and PULSAR integrated as Phase 1 deliverables
- AI Sovereignty moved to multi-year parallel initiative

business-case-template.md:
- Break-even corrected: Day 90 -> 12-18 months post-programme
- Phase budget table updated: 30/30/30/90 -> 60/60/60/ongoing
- Phase names and deliverables aligned with revised RMP
- AI sovereignty removed from core deliverables
- Sensitivity analysis: 3 scenarios -> 4 including abort condition
- Alternatives table: AI sovereignty removed from Antifragile programme description
- ROI table: cloud AI cost line replaced with audit preparation time saving
- The Ask: 30-day first gate -> 60-day first gate

Co-Authored-By: Tom Kracmar <tom+claude@cat6.cz>
2026-06-05 09:47:25 +00:00
parent 3062e435ca
commit 878fca3f0b
2 changed files with 236 additions and 236 deletions
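The revised break-even row in business-case-template.md (12–18 months post-programme) follows from when each return stream starts paying. The Python below is a toy cash-flow model, not part of either document. Every constant is a hypothetical placeholder except the €4.5M ransomware recovery figure and the 10–20% premium-reduction range, which come from the ROI table. It also simplifies incident avoidance to a flat monthly expected value rather than the compounding value the break-even row describes.

```python
from typing import Optional

# Toy break-even model for a phase-gated security programme.
# All constants are hypothetical placeholders unless noted otherwise.

PROGRAMME_MONTHS = 6             # the 180-day programme
MONTHLY_SPEND = 35_000           # hypothetical programme cost per month (EUR)

AUDIT_SAVING_PER_MONTH = 2_500   # hypothetical: evidence value accrues from month 1
ANNUAL_PREMIUM = 120_000         # hypothetical cyber insurance premium (EUR/year)
PREMIUM_CUT_PCT = 15             # inside the 10-20% range from the ROI table
RENEWAL_MONTHS = {12, 24, 36}    # hypothetical: saving lands only at renewal

INCIDENT_COST = 4_500_000        # ransomware recovery figure from the ROI table
PROB_REDUCTION_PCT = 5           # hypothetical drop in 24-month incident probability
HORIZON_MONTHS = 24              # horizon used in the risk-of-inaction summary


def monthly_return(month: int) -> int:
    """Value realised in one month, counted from programme start (month 1)."""
    value = AUDIT_SAVING_PER_MONTH
    if month in RENEWAL_MONTHS:
        value += ANNUAL_PREMIUM * PREMIUM_CUT_PCT // 100
    if month > PROGRAMME_MONTHS:
        # Expected avoided loss, spread evenly over the risk horizon and only
        # counted once the hardened foundation is in place (flat, not compounding).
        value += INCIDENT_COST * PROB_REDUCTION_PCT // 100 // HORIZON_MONTHS
    return value


def break_even_month(max_months: int = 60) -> Optional[int]:
    """First month in which cumulative return covers cumulative spend."""
    spent = returned = 0
    for month in range(1, max_months + 1):
        if month <= PROGRAMME_MONTHS:
            spent += MONTHLY_SPEND
        returned += monthly_return(month)
        if returned >= spent:
            return month
    return None  # no break-even within the modelled horizon


if __name__ == "__main__":
    month = break_even_month()
    if month is None:
        print("no break-even within the modelled horizon")
    else:
        print(f"break-even in month {month}, "
              f"{month - PROGRAMME_MONTHS} months post-programme")
```

With these placeholder figures the script prints `break-even in month 21, 15 months post-programme`. The insurance saving cannot land before the first renewal, and expected-loss avoidance only starts once the foundation is in place, so break-even sits well after day 180. Substituting an organisation's own spend, premium and probability estimates moves the month, not the mechanics.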
@@ -14,9 +14,9 @@ This template provides a reusable structure for building financial justification

 | Element | Content |
 |---------|---------|
-| **Investment ask** | €[X] over 180 days, phase-gated with go/no-go decisions at days 30, 60, 90 |
-| **Primary return** | Reduction of existential cyber risk; regulatory compliance evidence; competitive differentiation through AI sovereignty |
-| **Break-even** | Day 90 (via avoided regulatory fine exposure, reduced insurance premiums, or operational resilience) |
+| **Investment ask** | €[X] over 180 days, phase-gated with go/no-go decisions at days 60, 120, 180 |
+| **Primary return** | Reduction of existential cyber risk; regulatory compliance evidence; operational resilience demonstrable to auditors and insurers |
+| **Break-even** | 12–18 months post-programme: insurance premium reductions take one renewal cycle; regulatory evidence value accumulates from day 1; incident avoidance value is probabilistic but compounding |
 | **Risk of inaction** | Quantified below; summary: [X]% probability of material incident within 24 months at estimated cost of €[Y] |

 ### Page 2: Cost of Inaction
@@ -58,11 +58,11 @@ Present this as: *"Without intervention, the organization faces an expected loss

 | Phase | Timeline | Primary Activity | Estimated Cost | Go/No-Go Gate |
 |-------|----------|-----------------|----------------|---------------|
-| **1. Hygiene** | Days 0-30 | Configuration of existing tools; identity cleanse; visibility | €[X] (primarily labor) | Day 30: Demonstrate risk reduction or stop |
-| **2. Control** | Days 30-60 | ASR, MFA enforcement, network segmentation, vendor lockdown | €[X] (labor + minimal tooling) | Day 60: Validate control effectiveness |
-| **3. Sovereignty** | Days 60-90 | Local AI pilot; recovery drills; T0 asset protection | €[X] (labor + local inference hardware if needed) | Day 90: Prove local AI viability |
-| **4. Antifragility** | Days 90-180 | Chaos engineering; red team; continuous improvement | €[X] (labor + external testing) | Day 180: Maturity assessment and next-phase planning |
-| **Total** | 180 days | | **€[X]** | |
+| **1. Visibility** | Days 0–60 | Kill chain mapping; T0 identity hardening; ASTRAL/PULSAR deployment; T0 backup verified | €[X] (primarily labor) | Day 60: Kill chain documented and T0 hardening complete |
+| **2. Control** | Days 60–120 | MFA for all users; CA baseline; attack surface reduction; vendor hardening | €[X] (labor + minimal tooling) | Day 120: MFA enforced 100%; P0/P1 vulnerabilities closed |
+| **3. Signal** | Days 120–180 | Detection rules; alert runbooks; knowledge transfer; housekeeping stream operational | €[X] (labor) | Day 180: Client operates independently; housekeeping running |
+| **4. Retained capability** | Ongoing | Quarterly retained scope; detection engineering; housekeeping; structural improvements | €[X]/quarter | Ongoing: measurable queue reduction; annual BloodHound/Elysium re-assessment |
+| **Total (180-day programme)** | 180 days | | **€[X]** | |

 #### Cost Categories

@@ -78,11 +78,11 @@ Present this as: *"Without intervention, the organization faces an expected loss

 | Alternative Approach | Cost | Timeline | Risk |
 |---------------------|------|----------|------|
-| **Do nothing** | €0 | — | Expected loss €[X] over 24 months |
-| **Traditional security audit** | €[X] | 90 days | Produces report; no structural change |
-| **Full E5 licensing upgrade** | €[X]/user/year | 30 days | Solves some gaps; does not address architecture or AI sovereignty |
-| **Managed security service (MSSP)** | €[X]/month | Ongoing | Outsources detection; does not reduce structural fragility |
-| **Antifragile program (this proposal)** | €[X] | 180 days | Structural change, regulatory evidence, AI sovereignty, measurable resilience |
+| **Do nothing** | €0 | — | Expected loss €[X] over 24 months; growing regulatory exposure |
+| **Traditional security audit** | €[X] | 90 days | Produces report; no structural change; findings age immediately |
+| **Full E5 licensing upgrade** | €[X]/user/year | 30 days | Solves tooling gaps; does not address architecture, process, or accumulated technical debt |
+| **Managed security service (MSSP)** | €[X]/month | Ongoing | Outsources detection; does not reduce structural fragility; dependency without capability transfer |
+| **Antifragile programme (this proposal)** | €[X] | 180 days + retained | Structural change, regulatory evidence, measurable kill chain closure, client operational independence |

 ---

@@ -97,7 +97,7 @@ Present this as: *"Without intervention, the organization faces an expected loss
 | Avoided ransomware recovery | Probability reduction × €4.5M | €[X] | €[Y] |
 | Avoided regulatory fine | Probability reduction × % GT | €[X] | €[Y] |
 | Insurance premium reduction | 10-20% reduction on cyber premium | €[X] | €[Y] |
-| Cloud AI cost stabilization | Shift from variable API costs to fixed infra | €[X] | €[Y] |
+| Audit preparation time reduction | ASTRAL Git trail replaces manual evidence gathering for ISO 27001, NIS2, DORA | €[X] | €[Y] |
 | Reduced incident response cost | Faster detection and containment | €[X] | €[Y] |
 | **Total Quantifiable Return** | | **€[X]** | **€[Y]** |

@@ -105,7 +105,7 @@ Present this as: *"Without intervention, the organization faces an expected loss

 | Return Category | Description |
 |----------------|-------------|
-| **Competitive moat** | Proprietary data improves only your models; competitors cannot replicate your operational intelligence |
+| **Audit readiness** | Demonstrable continuous controls accelerate certification audits and partnership due diligence |
 | **Regulatory agility** | Demonstrable resilience accelerates regulatory approvals, market entries, and partnership discussions |
 | **Talent retention** | Engineers and security professionals prefer organizations that invest in durability over firefighting |
 | **M&A readiness** | Clean identity architecture, tested recovery, and documented controls increase valuation and reduce due-diligence friction |
@@ -139,17 +139,18 @@ Present as: *"This program delivers a [X]% return in year one, rising to [Y]% in

 | Scenario | Investment Adjustment | Outcome |
 |----------|----------------------|---------|
-| **Best case** | No additional tooling needed | Program completes under budget; all value from configuration |
-| **Base case** | Local AI hardware required for pilot | Slight budget increase; sovereign intelligence proven |
-| **Worst case** | Deeper technical debt than anticipated | Extend Phase 1 by 30 days; additional labor cost; still cheaper than incident |
+| **Best case** | No additional tooling needed; client IT team engaged and responsive | Programme completes on timeline; all value from configuration; client operational independence achieved at day 180 |
+| **Base case** | Minor tooling additions; moderate IT team availability; some change management friction | Programme completes with 2–4 week slippage on Phase 2 (MFA rollout change management is the usual bottleneck); strong kill chain closure and detection capability |
+| **Challenging** | Significant technical debt discovered in Phase 1; IT team constrained; change windows infrequent | Phase 1 extended by 4–6 weeks; Phase 2 scope narrowed to kill chain critical path; programme value is still genuine — the findings alone are worth the investment; honest client conversation required at day 60 gate |
+| **Abort condition** | Executive sponsor departure; IT team fully occupied by another major project; scope fundamentally different from discovery call | Programme paused or stopped at the next gate. Partial phases produce partial value — ASTRAL/PULSAR deployed, kill chain documented. Better to stop honestly than to produce a report that nobody acts on. |

 ---

 ### Page 6: Recommendation and Next Steps

-**The Ask (Full Program)**:
+**The Ask (Full Programme)**:

-> *"We recommend approval of a 180-day antifragile enterprise program, structured in four 30-60-90-180 day phases with hard go/no-go gates. The initial 30-day investment is €[X] with a defined deliverable: identification and initial closure of the organizational kill chain. If measurable risk reduction is not demonstrated by Day 30, the program stops with no further obligation."*
+> *"We recommend approval of a 180-day antifragile enterprise programme, structured in three 60-day phases with hard go/no-go gates. The initial 60-day investment is €[X] with a defined deliverable: the kill chain documented, T0 accounts hardened, and ASTRAL/PULSAR deployed. If that deliverable is not met by day 60, the programme stops with no further obligation. The 180-day programme produces a hardened foundation and a client team that can operate it independently; it is not a complete transformation. What comes after that is a retained capability engagement, scoped separately."*

 **The Ask (Modular Alternative)**:

@@ -4,20 +4,24 @@

 ## For the Executive Reader

-This is not a three-year digital transformation. It is a **180-day strategic reset** with measurable business outcomes at each phase gate.
+This is not a three-year digital transformation. It is a **180-day foundation programme** with measurable progress at each phase gate.

 | Phase | Timeline | What the Board Sees |
 |-------|----------|---------------------|
-| **Hygiene** | Days 0-30 | Visibility. For the first time, we know every identity, asset, and gap that could end the company. |
-| **Control** | Days 30-60 | Containment. The highest-risk exposures are closed using tools already owned. |
-| **Sovereignty** | Days 60-90 | Ownership. Proprietary intelligence is reclaimed. Recovery from disaster is proven, not assumed. |
-| **Antifragility** | Days 90-180 | Advantage. The organization learns faster from disruption than competitors do. |
+| **Visibility** | Days 0–60 | We know the kill chain. T0 assets are identified, critical privileges are mapped, and logging is operational. |
+| **Control** | Days 60–120 | The highest-risk kill chain nodes are closed. MFA is enforced for all users. Critical gaps have evidence-backed remediation. |
+| **Signal** | Days 120–180 | Detection capability is built on the hardened foundation. Housekeeping is running as a permanent stream. The organisation can operate and maintain what was built. |
+| **Antifragility** | Ongoing | Structural improvement, retained capability, and progressive reduction of technical debt. This phase does not end. |
+
+**What 180 days delivers**: A hardened foundation, closed kill chain, operational detection capability, and the processes to sustain them. Not a complete transformation — a credible, maintained starting point.
+
+**What 180 days does not deliver**: Elimination of all technical debt (that takes years), full AI sovereignty (that is a multi-year journey), or zero vendor dependencies (that is an ongoing programme). Promising otherwise is dishonest and destroys client trust when reality arrives.

 **Investment principle**: Configuration first. Procurement only if justified. Most value is extracted from existing tools before any new purchase is discussed.

-**Governance**: Weekly steering committee. Monthly board update. Quarterly antifragility assessment. Hard go/no-go gates at days 30, 60, and 90.
+**Governance**: Weekly check-in with named client lead. Monthly steering committee. Hard go/no-go gates at days 60, 120, and 180.

-**Modularity**: While this document presents the full 180-day program, every phase can be delivered as an independent, fixed-scope module. See [Modular Engagements](../core/modular-engagements.md) for the menu of standalone engagements.
+**Modularity**: Every phase can be delivered as an independent, fixed-scope module. See [Modular Engagements](../core/modular-engagements.md) for the standalone engagement menu.

 *For the business case and financial justification, see [Business Case Template](business-case-template.md).*
 *For board conversation guidance, see [C-Suite Conversation Guide](../core/c-suite-conversation-guide.md).*
@@ -26,295 +30,290 @@ This is not a three-year digital transformation. It is a **180-day strategic res

 ## For the Practitioner

-This playbook provides a **time-boxed, phase-gated roadmap** for transforming a fragile enterprise into an antifragile one. It is designed for immediate deployment in consulting engagements and can be adapted to organizational size, industry, and regulatory context.
+### What This Plan Is Not

-The plan is structured in **four phases**: Hygiene (30 days), Control (60 days), Sovereignty (90 days), and Antifragility (180 days). Each phase builds on the previous. Skipping phases creates the illusion of progress while leaving structural fragility intact.
+Before using this roadmap with a client, be honest about what it commits to.

-> **Core tenet**: Before any new purchase is discussed, exhaust the capabilities of existing tooling. See the [Zero-Budget Hardening Playbook](zero-budget-hardening.md) for the tactical expression of this principle.
+**Not a sprint.** The most common failure mode is treating security modernisation as a project that ends. It does not end. The 180-day programme establishes processes and capabilities that must run permanently. If the client does not have the internal resources to continue what we build, we need to have that conversation before we start.
+
+**Not a full audit.** Phase 1 does not produce a complete identity inventory, a comprehensive vulnerability assessment, or an exhaustive compliance gap analysis. It produces a kill chain map and enough visibility to close existential risks. The full audit takes months and tends to produce reports that paralyse rather than mobilise.
+
+**Not compatible with an overstretched organisation.** Organisations dealing with active incidents, leadership changes, or major concurrent projects cannot execute this plan on the stated timeline. The timeline is predicated on a named client lead with 30–40% availability and access provisioned before day 1.
+
+**Not vendor-agnostic in execution.** The plan references Microsoft 365 environments as the primary context because that is most clients' reality. Non-Microsoft environments follow the same logic but need different tooling. See the Platform Adaptation appendix in [Modular Engagements](../core/modular-engagements.md).

 ---

-## Phase 1: Hygiene (Days 0–30)
+## Phase 1: Visibility (Days 0–60)

-**Theme**: *You cannot defend what you cannot see.*
+**Theme**: *You cannot defend what you cannot see. You cannot fix what you cannot prioritise.*

-The first 30 days are aggressive, disruptive, and non-negotiable. The goal is not perfection; it is **visibility**. Every unknown identity, unmapped dependency, and unmonitored access path is a latent failure waiting to happen.
+The first 60 days are about **kill chain mapping and critical visibility** — not about fixing everything. The goal is a clear, ranked picture of what would end the organisation, and initial closure of the most accessible existential gaps.

-### Week 1-2: Identity and Access Blitz
+> **Why 60 days, not 30**: A 30-day identity blitz sounds fast. It is also the fastest path to disabling a service account that runs payroll at 2 AM on Friday. Week 1 is documentation and baseline. Fixes require understanding the environment first. See the engagement model's week 1 discipline — it applies to every phase of this plan.

-**Tool strategy**: Use existing AD / Entra ID / IAM. No new purchases.
+### Weeks 1–2: Baseline and Kill Chain Mapping

-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Aggressive identity audit | IAM / Security | Complete inventory of all human and non-human identities | ADUC, Entra ID portal, AWS IAM console |
-| Disable all unknown / unused accounts | IAM | List of disabled accounts with business justification for exceptions | Existing IAM + PowerShell / CLI scripts |
-| Rotate all critical passwords and shared secrets | Security Ops | Rotation log with verification | Existing IAM + LAPS (free from Microsoft) |
-| Target: admin accounts, service accounts, krbtgt equivalents | AD / Cloud IAM | Documentation of every privileged account | Existing directory services |
-| Implement password hygiene (minimum: audit) | IAM | Baseline report on password policy compliance | Native password policies + audit logs |
+**No changes in week 1.** Document and understand.

-### Week 2-3: Perimeter and Communication Mapping
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Export current identity state: all accounts, groups, privilege assignments | IAM / Security | Identity inventory — stale, active, privileged, service |
+| Run BloodHound collection; run Elysium password audit | Security | AD attack path map; compromised credential list |
+| Run CAExporter for Conditional Access documentation | Security | Human-readable CA policy register with gaps highlighted |
+| Deploy ASTRAL for M365 configuration baseline | Security | Committed tenant baseline; first drift detection operational |
+| Map all public-facing assets | Security | External attack surface register with P0 classification |
+| Identify the kill chain: shortest path from "nothing bad" to "organisation fails" | Security Architect | Kill chain document — maximum 2 pages; reviewed with executive sponsor |

-**Tool strategy**: Use native firewall management, open-source scanners, and manual audit before purchasing new NDR/VM platforms.
+### Weeks 3–4: T0 Identity Hardening

-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Audit all vendor / supplier access paths | Security / Procurement | Inventory of VPN, RDP, Citrix, SSH, FTP, SCP, API keys | Existing IAM, VPN logs, firewall logs |
-| Review and document firewall rules | Network Team | Rule set with business justification for each | Native firewall management interfaces |
-| Map public-facing assets from external perspective | Security | Attack surface report with P0 classification | Free/open-source: Shodan, certificate transparency logs, nmap |
-| Implement aggressive vulnerability scanning | Security | Weekly scan results with trending | Existing scanner, Microsoft Defender Vulnerability Management, or OpenVAS |
+Target: privileged accounts only. Not all accounts.

-### Week 3-4: Visibility and Monitoring Baseline
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Force-reset accounts identified as compromised by Elysium (P0) | IAM | Password reset log with verification |
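+| Enforce MFA on all T0 human accounts: Global Admins, Domain Admins, backup admins. High-privilege service principals cannot complete MFA: review their credentials and restrict their permissions instead | IAM | MFA coverage report for T0 accounts; service principal credential review |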
+| Review and disable accounts that are clearly orphaned: departed employees confirmed by HR | IAM | Disable log — only accounts with confirmed ownership resolution |
+| Rotate KRBTGT (two resets, waiting for AD replication and the maximum Kerberos ticket lifetime between them) and critical service account passwords | AD | Rotation log; tested without service disruption |
+| Review and remove direct Global Admin assignments; move toward PIM or named individual accounts | IAM | Privilege assignment review |

-**Tool strategy**: Maximize existing EDR/SIEM before considering new platforms. A spreadsheet CMDB is infinitely better than no CMDB.
+> **What we do not do in weeks 3–4**: We do not attempt to disable all unknown accounts. We do not attempt to resolve all service account ownership. We do not attempt to achieve 100% MFA on all users. These are Phase 2 activities, started after the kill chain is closed and the environment is understood.

-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Deploy endpoint detection on all managed devices | SOC / MDE | Coverage report: % of estate monitored | Existing EDR (Defender, CrowdStrike, SentinelOne) |
-| Establish log aggregation for critical systems | Security | Centralized logging for T0 and T1 assets | Existing SIEM, syslog server, or cloud native logging (Sentinel, CloudWatch, Cloud Logging) |
-| Create initial CMDB seed for critical systems | IT / Security | CMDB populated with crown jewels | Existing ITAM, ServiceNow, or spreadsheet |
-| Document "kill chain": shortest path to organizational failure | Security Architect | Threat model and mitigation map | Manual analysis + stakeholder interviews |
+### Weeks 5–6: Logging, Perimeter, and Critical Asset Inventory
+
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Deploy PULSAR for M365 audit log ingestion | Security | Audit events ingested; watermarks established; search operational |
+| Enable logging for T0 systems where it is missing | Security | Logging coverage report for T0/T1 assets |
+| Audit all vendor and third-party remote access paths | Security / Procurement | Vendor access inventory with remove/restrict list |
+| Scan public-facing assets for critical CVEs | Security | Prioritised findings: P0 (internet-facing, critical CVE), P1, P2 |
+| Seed CMDB with T0 assets | IT / Security | T0 asset register with ownership, backup status, recovery procedure |
+| Validate backup integrity for T0 assets | Backup Admin | Backup test report — at least one successful restore per T0 system |
+
+### Weeks 7–8: Kill Chain Closure and Phase 1 Wrap
+
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Close P0 vulnerabilities identified in the weeks 5–6 scan | Security | Remediation log with verification |
+| Restrict or close the highest-risk vendor access paths | Security / Procurement | Vendor access changes confirmed |
+| Implement basic network segmentation between IT and OT (if applicable) | Network / OT | Segmentation policy; validated firewall rules |
+| Phase 1 review: re-run BloodHound and Elysium against week 1 baseline | Security | Before/after comparison; revised kill chain assessment |
+| Establish housekeeping queue: stale accounts, orphaned permissions, legacy protocols | IAM / Security | Queue populated; named owner; monthly cadence confirmed |

 ### Phase 1 Exit Criteria

-- [ ] 100% of identities known and validated
-- [ ] 100% of privileged access reviewed
-- [ ] All public-facing assets identified and scanned
-- [ ] Centralized logging operational for critical systems
-- [ ] CMDB seeded with T0/T1 assets
-- [ ] Initial "kill chain" documented
+- [ ] Kill chain documented and reviewed with executive sponsor
+- [ ] T0 accounts: MFA enforced, privilege reviewed, compromised credentials reset
+- [ ] P0 vulnerabilities (internet-facing, critical CVE) closed
+- [ ] ASTRAL deployed; M365 baseline committed
+- [ ] PULSAR deployed; M365 audit logs ingesting
+- [ ] T0 asset CMDB complete with backup integrity verified
+- [ ] Vendor access inventory complete; highest-risk paths closed
+- [ ] Housekeeping stream established: named owner, cadence, populated queue

-### Phase 1 Mantra
-
-> *"Do not be afraid to break things temporarily. Disable first, justify second. Visibility before permission."*
+**What "complete" does not mean at day 60**: All identities validated. All shared accounts eliminated. MFA on 100% of users. Zero legacy protocols. These are legitimate targets — they belong in the housekeeping queue and Phase 2 work, tracked, resourced, and given realistic timescales.

 ---

-## Phase 2: Control (Days 30–60)
+## Phase 2: Control (Days 60–120)

-**Theme**: *What we have seen, we must now contain.*
+**Theme**: *Close the kill chain. Build on what is understood, not what is assumed.*

-With visibility established, the next 30 days focus on **closing the highest-risk gaps** without introducing operational paralysis. This is the phase of quick wins and surface reduction.
+Phase 2 takes the kill chain map from Phase 1 and systematically closes the structural gaps. The work is less about discovery and more about verified remediation with proper change management.

-### Week 5-6: Attack Surface Reduction (ASR)
+### Weeks 9–10: MFA and Identity Hardening (Broad Rollout)

-**Tool strategy**: ASR rules and PAWs are native Microsoft capabilities. For non-Microsoft environments, use existing endpoint management.
+Phase 1 hardened T0. Phase 2 extends to all users — with proper change management.

-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Eliminate shared accounts where possible | IAM | Reduction metric: % of shared accounts decommissioned | Existing IAM + access review process |
-| Implement Attack Surface Reduction rules on endpoints | Endpoint Security | ASR policy deployed and compliance measured | Microsoft Defender ASR (already owned in E3/E5) |
-| Harden admin access: dedicated PAWs, no browsing, no email | Security | PAW architecture documented and deployed | Existing Windows / Intune / GPO |
-| Review and minimize permissions across all platforms | IAM / App Owners | Permission matrix with least-privilege gaps identified | Native IAM interfaces + scripts |
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Enforce MFA on all remote access: not just T0, but all users | IAM | MFA coverage report (% of users) — target 100% enforced, not just enrolled |
+| Block legacy authentication protocols tenant-wide | IAM | Legacy auth block confirmed via CAExporter and sign-in log review |
+| Deploy Conditional Access baseline: location and sign-in risk; device-compliance policies in report-only mode until Intune compliance is enforced in weeks 11–12 | IAM | CA policy set deployed and tested; rollback documented |
+| Continue housekeeping queue: first monthly cycle | IAM | Accounts resolved this cycle; queue status report |

-### Week 6-7: Network and DNS Security
+> **Change management is the constraint here, not technical complexity.** MFA rollout for 500 users requires helpdesk preparation, communication, exception handling, and at minimum two weeks of lead time. Scope this honestly. A rollout that generates 200 support tickets and forces an exception for the CEO because his phone broke is a rollout that gets walked back.

-**Tool strategy**: Use existing DNS infrastructure, firewall segmentation, and open-source sensors (Zeek/Suricata) before buying NDR.
+### Weeks 11–12: Attack Surface Reduction

-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Deploy DNS security (filtering, logging, anomaly detection) | Network | DNS security coverage report | Existing DNS infrastructure, Quad9/Cloudflare free tiers, Microsoft DNS security |
-| Segment IT/OT networks where they intersect | Network / OT | Network segmentation diagram and policy | Existing firewalls and VLANs |
-| Deploy network sensors at critical boundaries | SOC | Sensor coverage map with alerting validated | Zeek or Suricata (open-source) or existing IDS/IPS |
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Deploy Intune compliance policies; enforce device compliance in CA | Endpoint / IAM | Compliance policy set; non-compliant device access blocked |
+| Harden admin access: dedicated admin accounts, PAW where feasible | Security | Admin account architecture; PAW deployed for T0 admins |
+| Implement ASR rules on all managed endpoints | Endpoint Security | ASR policy deployed; compliance measured |
+| Review and remove excessive application permissions (OAuth grants, service principals) | IAM | App permission audit; high-risk grants reviewed and reduced |

-### Week 7-8: Multi-Factor Authentication and Conditional Access
+### Weeks 13–14: Network Hardening and Vendor Governance

-**Tool strategy**: MFA and conditional access are native capabilities of Entra ID, Okta, and cloud IAM. No additional purchase required.
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Implement DNS security: filtering and logging | Network | DNS security coverage report |
+| Harden vendor remote access: time-bounded, MFA, session recording | Security / Procurement | Vendor access gateway operational; access policy enforced |
+| Patch P1 vulnerabilities from Phase 1 scan | Security | Remediation log; rescan confirming closure |
+| Establish change window discipline: all production changes go through an approved process | IT / Security | Change management process documented and operational |

-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Enforce MFA on all remote access paths | IAM | MFA coverage: 100% of remote access | Entra ID, Okta, Duo, or native cloud IAM MFA |
-| Implement conditional access policies | IAM / Cloud | Policy set: device compliance, location, risk score | Entra ID Conditional Access, AWS IAM, GCP IAM |
-| Review and harden M365 / Google Workspace security | Cloud Team | Cloud security posture report | Microsoft Secure Score, Google Security Health Analytics |
+### Weeks 15–16: Verification and Phase 2 Wrap
+
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Re-run BloodHound, Elysium, and CAExporter against Phase 1 baseline | Security | Attack path reduction report; before/after metrics |
+| Run Purple Knight / E8-CAT against AD and M365 | Security | Security score comparison; residual findings list |
+| Review ASTRAL drift log for the Phase 1–2 period | Security | Configuration change audit; unauthorised drift incidents |
+| Review PULSAR audit log: anomalous events flagged, investigated, resolved | Security | Audit review report |
+| Update risk register: what Phase 1–2 closed, what remains open, what Phase 3 addresses | Security | Updated risk register signed off by client lead |
+| Housekeeping queue: second monthly cycle | IAM | Queue status; cumulative accounts resolved |

 ### Phase 2 Exit Criteria

-- [ ] Shared accounts reduced by minimum 50%
+- [ ] MFA enforced for 100% of users (not just enrolled — enforced via CA policy)
+- [ ] Legacy authentication blocked tenant-wide
+- [ ] CA baseline deployed and tested
 - [ ] ASR rules active on all managed endpoints
-- [ ] MFA enforced on 100% of remote and privileged access
-- [ ] DNS security operational
-- [ ] Network segmentation policy defined and initial segments implemented
-- [ ] Conditional access policies active for cloud workloads
-
-### Phase 2 Mantra
-
-> *"The goal is not to block everything. It is to ensure that every allowed path is known, justified, and monitored."*
+- [ ] P1 vulnerabilities from Phase 1 scan closed
+- [ ] Vendor remote access hardened and inventoried
+- [ ] Attack path reduction measurable against Phase 1 BloodHound baseline
+- [ ] Housekeeping queue running; two cycles completed

 ---

-## Phase 3: Sovereignty (Days 60–90)
+## Phase 3: Signal and Retained Capability (Days 120–180)

-**Theme**: *Reclaim what should never have been rented.*
+**Theme**: *Build detection on the hardened foundation. Build the capability to sustain what was built.*

-This is where the antifragile approach diverges sharply from conventional hardening. The focus shifts from defending the perimeter to **owning the intelligence** that drives the organization.
+Phase 3 starts only after Phase 2 exit criteria are met. Detection engineering on an unhardened environment is waste — the signal-to-noise ratio is too low to produce actionable intelligence.

-### Week 9-10: AI Sovereignty Assessment
+> **Why not AI Sovereignty in Phase 3**: AI sovereignty (local models, owned inference infrastructure, sovereign cognitive capability) is a multi-year programme, not a deliverable for a single 60-day phase. Hardware procurement alone typically takes 6–12 weeks. Claiming it as a Phase 3 deliverable sets up the engagement to fail. AI sovereignty begins with the audit work in Phase 1 (AI usage inventory, classification, assessment of vendor terms) and continues as a separate parallel initiative. The Azure OpenAI Sovereignty Bridge is the appropriate near-term stepping stone. See [AI Sovereignty Framework](../core/ai-sovereignty-framework.md) and [Azure OpenAI Sovereignty Bridge](../core/azure-openai-sovereignty-bridge.md).

-**Tool strategy**: Discovery requires interviews and proxy log analysis. No purchase needed for assessment.
+### Weeks 17–18: Detection Engineering Foundation

-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Inventory all AI usage: approved and shadow | Security / AI Lead | AI usage map with data classification | Proxy logs, SaaS billing review, employee interviews |
-| Classify AI workloads by sovereignty requirement | Security Architect | T0/T1/T2 AI asset classification | Existing data classification framework |
-| Identify highest-value local AI pilot candidate | AI Lead / Business | Pilot scope document with success criteria | Business stakeholder interviews |
-| Assess vendor AI terms: data usage, training, termination | Legal / Security | Risk register for each AI provider | Legal review of existing contracts |
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Write initial PULSAR alert rules: CA policy changes, new Global Admin assignments, bulk mailbox export, app permission grants outside change window | Security | Alert rule set deployed; test-triggered and validated |
+| Review SIEM coverage: which T0 events generate alerts, which do not | Security | Detection coverage map against the ten highest-priority MITRE ATT&CK techniques for the client's M365 environment |
+| Tune ASTRAL rolling PRs: configure reviewer notification, test reject/restore flow | Security | ASTRAL review workflow operational; first restore test completed |
+| Establish alert response runbooks: who gets notified, what they do, what they escalate | Security / Client Lead | Runbooks for top 5 alert types |

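Before the first four alert rules are written in whatever rule format PULSAR uses (not shown here), they can be prototyped as simple predicates over normalised audit events. In the sketch below, the `AuditEvent` fields, the operation strings, and the weekday 09:00 to 17:00 change window are all assumptions; confirm the exact operation names against the client tenant's own audit records before relying on them.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, List, Tuple


@dataclass
class AuditEvent:
    # Loosely modelled on M365 audit records; field names are assumptions.
    operation: str
    user_id: str
    created: datetime
    target: str = ""  # e.g. the role name for a role assignment


def in_change_window(ts: datetime) -> bool:
    """Illustrative change window: Monday to Friday, 09:00 to 17:00 local time."""
    return ts.weekday() < 5 and time(9, 0) <= ts.time() < time(17, 0)


# (rule name, predicate). Operation strings are placeholders to verify per tenant.
RULES: List[Tuple[str, Callable[[AuditEvent], bool]]] = [
    ("ca_policy_change",
     lambda e: "conditional access policy" in e.operation.lower()),
    ("global_admin_assignment",
     lambda e: e.operation == "Add member to role." and "Global Administrator" in e.target),
    ("bulk_mailbox_export",
     lambda e: e.operation == "New-MailboxExportRequest"),
    ("app_consent_outside_window",
     lambda e: e.operation == "Consent to application." and not in_change_window(e.created)),
]


def evaluate(event: AuditEvent) -> List[str]:
    """Return the names of all rules the event triggers."""
    return [name for name, matches in RULES if matches(event)]


# A consent grant late on a Saturday falls outside the illustrative window.
print(evaluate(AuditEvent("Consent to application.", "user@contoso.com",
                          datetime(2026, 6, 6, 23, 15))))
```

Keeping each rule as a named predicate makes it straightforward to test-trigger rules one at a time, which is what the "test-triggered and validated" deliverable asks for.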
-### Week 10-11: Local AI Infrastructure Deployment
+### Weeks 19–20: Endpoint and Identity Detection

-**Tool strategy**: Start with existing hardware or low-cost sovereign cloud. Use open-source inference servers (Ollama, vLLM, llama.cpp).
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Deploy Wazuh or verify existing EDR coverage for on-premise systems | Security | Endpoint detection coverage report |
+| Write custom detection rules for the TTPs on the kill chain identified in Phase 1 | Security | Custom rule set tuned to client environment |
+| Establish weekly threat review cadence: PULSAR event summary + ASTRAL drift review | Security / Client Lead | First weekly review completed; format agreed |
+| AI usage audit: classify current AI workflows by data sensitivity and vendor agreement | Security / Legal | AI usage register; high-risk workflows flagged for remediation |

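The classification step in the AI usage audit can be written down as an explicit rule, so that "high-risk" means the same thing for every workflow reviewed. The sketch below assumes a four-level sensitivity scale and two yes/no facts per vendor agreement; the scale, the field names, the vendor names, and the rule itself are illustrative and should be replaced by the client's own data classification scheme.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Sensitivity(IntEnum):
    # Hypothetical four-level scale; substitute the client's classification.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class AIWorkflow:
    name: str
    vendor: str
    sensitivity: Sensitivity
    has_agreement: bool     # enterprise / data processing terms in place
    vendor_may_train: bool  # terms allow the vendor to train on submitted data


def is_high_risk(workflow: AIWorkflow) -> bool:
    """Illustrative rule: confidential-or-higher data sent without an agreement,
    or to a vendor whose terms allow training on it."""
    if workflow.sensitivity < Sensitivity.CONFIDENTIAL:
        return False
    return not workflow.has_agreement or workflow.vendor_may_train


def flag_high_risk(register: List[AIWorkflow]) -> List[str]:
    """Return the names of workflows to flag for remediation."""
    return [w.name for w in register if is_high_risk(w)]


register = [
    AIWorkflow("marketing copy", "Vendor A", Sensitivity.PUBLIC, False, True),
    AIWorkflow("contract review", "Vendor B", Sensitivity.CONFIDENTIAL, False, False),
    AIWorkflow("HR case summaries", "Vendor C", Sensitivity.RESTRICTED, True, True),
    AIWorkflow("finance Q&A", "Vendor D", Sensitivity.CONFIDENTIAL, True, False),
]
print(flag_high_risk(register))
```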
-| Action | Owner | Deliverable | Existing / Low-Cost Tool Leverage |
-|--------|-------|-------------|----------------------------------|
-| Deploy local inference infrastructure (on-prem or sovereign cloud) | Infrastructure | Operational inference cluster | Underutilized servers, retired workstations, or sovereign cloud VM |
-| Establish model versioning and artifact management | MLOps / Security | Model registry with provenance tracking | Git + DVC or simple artifact storage |
-| Implement access controls for model weights and training data | Security | T0-class protection for AI assets | Existing file servers, encryption, IAM |
-| Deploy initial pilot: RAG or fine-tuned model on proprietary data | AI Team | Working pilot with performance baseline | Ollama, llama.cpp, or vLLM (open-source) + quantized open models |
+### Weeks 21–24: Knowledge Transfer and Handover

-### Week 11-12: Backup, Recovery, and Validation
+The most important deliverable of Phase 3 is **the client's ability to operate everything without us.**

-**Tool strategy**: Use existing backup and DR infrastructure. The goal is to test and document, not to buy.
-
-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Perform full recovery drill of one critical system from backup | IT / Security | Recovery time documented, gaps identified | Existing backup solution |
-| Validate backup integrity for all T0 assets | Backup Admin | Integrity report with sample restorations | Existing backup solution + integrity scripts |
-| Test local AI pilot under degraded network conditions | AI / Infrastructure | Resilience validation report | Existing network infrastructure + manual testing |
-| Document and exercise incident response for AI-specific threats | SOC / Security | Runbook: model poisoning, data exfiltration, adversarial input | Existing IR framework + internal knowledge |
+| Action | Owner | Deliverable |
+|--------|-------|-------------|
+| Runbook completion: every system built or modified has an operating runbook | Security / Client Team | Runbook set reviewed and signed off by client IT lead |
+| Client training: ASTRAL drift review workflow, PULSAR event search, alert response | Security | Training delivered; client IT lead can demonstrate competency |
+| Housekeeping queue: third and fourth monthly cycles | IAM | Queue status; cumulative resolution metrics |
+| Document what was built: configuration baseline document for every module | Security | Module completion package delivered |
+| Phase 3 review: risk register update, metrics summary, Phase 4 / retained capability recommendation | Security | Final 180-day programme review with executive sponsor |

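Cumulative resolution metrics are easier to defend when newly discovered items are tracked separately from resolved ones, because a cycle can resolve many accounts and still end with a larger queue. A minimal sketch, assuming the queue is summarised once per monthly cycle as (items added, items resolved) counts; `queue_report` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def queue_report(initial: int, cycles: List[Tuple[int, int]]) -> List[Dict[str, int]]:
    """Walk monthly (added, resolved) counts forward from an initial queue size."""
    report, size = [], initial
    for number, (added, resolved) in enumerate(cycles, start=1):
        end = size + added - resolved
        report.append({
            "cycle": number,
            "start": size,
            "added": added,
            "resolved": resolved,
            "end": end,
            "net_reduction": size - end,  # negative if the queue grew
        })
        size = end
    return report


for row in queue_report(400, [(20, 60), (15, 50), (30, 25), (10, 45)]):
    print(row)
```

In this example the third cycle resolved 25 items but the queue still grew by 5, which points at inflow (newly stale accounts) rather than at the resolution effort.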
 ### Phase 3 Exit Criteria

- [ ] All AI usage inventoried and classified
- [ ] Local inference infrastructure operational
- [ ] One high-value AI pilot deployed and measured
- [ ] T0 protection applied to model weights and training data
- [ ] Critical system recovery drill completed successfully
- [ ] AI-specific incident response runbook created
-
-### Phase 3 Mantra
-
-> *"We are moving from being consumers of intelligence to manufacturers of our own. The vault is built; now we fill it."*
+- [ ] PULSAR alert rules operational for top 5 M365 risk scenarios
+- [ ] ASTRAL drift review workflow operational; first restore tested
+- [ ] Custom detection rules written for client-specific TTPs
+- [ ] Weekly threat review cadence established and running
+- [ ] All runbooks completed and signed off by client IT lead
+- [ ] Client IT lead can operate ASTRAL and PULSAR without consultant support
+- [ ] AI usage register completed; high-risk workflows flagged
+- [ ] Housekeeping queue: four consecutive cycles completed

 ---

-## Phase 4: Antifragility (Days 90–180)
+## Phase 4: Antifragility (Ongoing)

-**Theme**: *Build systems that grow stronger from disruption.*
+**Theme**: *The programme does not end. The organisation learns faster from disruption than competitors do.*

-The final phase converts the hardened foundation into an adaptive, learning organization. This is where antifragility becomes operational reality.
+Phase 4 is not a time-boxed phase. It is an ongoing operational posture. The 180-day programme establishes the foundation; Phase 4 is what happens when that foundation is maintained and extended over months and years.

-### Month 4: Structural Decoupling and Optionality
+**Phase 4 activities** (initiated at day 180; sustained indefinitely):

-**Tool strategy**: Documentation, architecture, and open-source chaos tools (Chaos Mesh, Gremlin free tier, custom scripts). Work, not purchases.
+- **Retained capability**: Monthly ASTRAL drift review, PULSAR event summaries, quarterly Elysium/BloodHound scans, housekeeping queue advancement
+- **Detection engineering**: Progressive extension of alert rule coverage; tuning based on real events; quarterly rule review
+- **Structural improvement**: Exit architectures for vendor dependencies, progressive elimination of legacy systems, planned OT technology refresh
+- **Chaos engineering**: Controlled failure exercises — starting with non-production, progressing to production once detection and recovery capability is confirmed
+- **Red team exercises**: Annual structured adversarial testing — not before Phase 2 is complete and detection is operational
+- **AI sovereignty programme**: Local inference infrastructure, where justified by workload and capability; AURORA deployment for M365 governance intelligence; sovereign AI as a parallel multi-year initiative
+- **Greenfield capability building**: Configuration as code for all managed systems; tested migration procedures; documented rebuild path

-| Action | Owner | Deliverable | Existing / Free Tool Leverage |
-|--------|-------|-------------|------------------------------|
-| Document exit architecture for all major platform dependencies | Enterprise Architecture | 90-day exit plan per critical vendor | Architecture documentation, existing runbooks |
-| Implement abstraction layers for proprietary integrations | Engineering | Interface documentation and migration test | Existing development tools and frameworks |
-| Establish dual-vendor readiness for one critical category | Procurement / Engineering | Technical proof of capability | Existing engineering capacity, open standards |
-| Deploy chaos engineering: simulate critical dependency failure | Resilience Team | Chaos experiment report with findings | Chaos Mesh (open-source), custom scripts, Gremlin free tier |
-
-### Month 5: Stress-to-Signal Conversion
-
-**Tool strategy**: Process and culture changes require no licensing. Use existing EDR/SIEM for detection validation.
-
-| Action | Owner | Deliverable | Existing Tool Leverage |
-|--------|-------|-------------|------------------------|
-| Implement blameless post-mortem process with structural mandates | Culture / Security | Post-mortem template and governance | Existing collaboration tools (Confluence, SharePoint, Notion) |
-| Deploy production chaos engineering with automated rollback | Resilience Team | Monthly chaos experiment schedule | Existing orchestration + open-source chaos tools |
-| Create feedback loop: incident findings → architecture changes | Security Architect | Closed-loop metrics: mean time to structural fix | Existing ticketing system (Jira, ServiceNow) |
-| Launch "red team as a service": continuous adversarial testing | Security | Monthly red team report | Internal team + existing EDR/SIEM for detection validation |
-
-### Month 6: Defensive AI and Continuous Modernisation
-
-**Tool strategy**: Defensive AI runs on the local inference infrastructure already deployed. Posture measurement uses existing APIs and open-source dashboards.
-
-| Action | Owner | Deliverable | Existing / Low-Cost Tool Leverage |
-|--------|-------|-------------|----------------------------------|
-| Expand local AI to defensive use cases: anomaly detection, code review, vulnerability prioritization | AI / Security | Defensive AI capability map | Local AI cluster deployed in Phase 3 |
-| Implement automated security posture measurement | Security | Continuous compliance dashboard | Existing APIs (Microsoft Graph, AWS APIs) + Grafana or open-source dashboard |
-| Evaluate and migrate additional AI workloads to local infrastructure | AI Lead | Migration roadmap with quarterly targets | Local AI infrastructure + business case templates |
-| Conduct first antifragility maturity assessment | Consultant / Security | Baseline maturity score with gap analysis | Spreadsheet or existing GRC tool |
-| Pilot organizational integration: embed security in one product team | Consultant / Engineering | Shift-left pilot metrics | Existing team structure + collaboration tools |
-| **Deploy AI-assisted TVM operationalization** | AI / Security | AI TVM dashboard; <48h critical CVE response | Defender Exposure Management + Azure OpenAI or local LLM; see [AI-Assisted TVM Blueprint](ai-assisted-tvm.md) |
-
-### Phase 4 Exit Criteria
-
- [ ] Exit architectures documented for top 5 vendor dependencies
- [ ] Chaos engineering operational in production
- [ ] Mean time to structural fix < 14 days from incident
- [ ] Defensive AI pilot operational
- [ ] First antifragility maturity assessment completed
- [ ] Quarterly antifragility review calendar established
-
-### Phase 4 Mantra
-
-> *"We do not want fewer incidents. We want incidents that teach us something we could not have learned any other way."*
-
---
-
-## Governance and Cadence
-
-### Weekly Steering Committee
-
- Review blockers and escalations
- Validate phase exit criteria
- Adjust scope based on organizational readiness
-
-### Monthly Board Update
-
- Risk reduction metrics
- Antifragility maturity trend
- Investment vs. risk-exposure reduction
- Strategic narrative: "This is not a cost centre; it is optionality insurance"
-
-### Quarterly Retrospective
-
- What failed that taught us something?
- What assumptions have been invalidated?
- What new dependencies have emerged?
- What can be simplified or removed?
+**What makes Phase 4 real**: A named person who owns the housekeeping queue. A calendar-blocked weekly threat review. A quarterly retained capability scope. Without these, Phase 4 does not happen — and everything built in 180 days begins to rot.

 ---

 ## Success Metrics

-| Dimension | Metric | Target |
-|-----------|--------|--------|
-| **Visibility** | % of assets in CMDB | 100% of T0/T1 within 30 days |
-| **Control** | Mean time to contain new identity | < 1 hour |
-| **Sovereignty** | % of proprietary AI workloads local | 100% of T0-class within 90 days |
-| **Resilience** | Recovery time for critical system | < 4 hours |
-| **Learning** | Structural fixes per incident | ≥ 1 |
-| **Optionality** | Vendor dependencies without exit plan | 0 |
+| Dimension | Metric | Realistic Target |
+|-----------|--------|-----------------|
+| **Kill chain** | Kill chain nodes closed | 100% of P0 nodes closed by day 120 |
+| **Identity** | MFA enforcement, verified via CA sign-in logs | 100% of T0 accounts by day 60; 100% of all accounts by day 120 |
+| **Configuration** | ASTRAL drift detected and reviewed | Weekly; 100% of unauthorised drift investigated within 48h |
+| **Audit trail** | PULSAR retention operational | 12+ months of M365 audit events retained by day 60 |
+| **Housekeeping** | Stale accounts resolved per quarter | Measurable queue reduction each cycle; not a fixed % target |
+| **Recovery** | T0 system recovery test completed | At least one per T0 system within 180 days |
+| **Handover** | Client IT lead operational independence | All built systems operable without consultant by day 180 |
+
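The 48-hour drift target is only verifiable if items that are still open count against it as well as closed ones. A minimal sketch of that check follows; the `DriftItem` record and its fields are assumptions about how ASTRAL drift findings might be exported, not ASTRAL's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

SLA = timedelta(hours=48)


@dataclass
class DriftItem:
    setting: str
    detected: datetime
    authorised: bool                         # matched to an approved change
    investigated: Optional[datetime] = None  # None while still open


def sla_breaches(items: List[DriftItem], now: datetime) -> List[str]:
    """Return unauthorised drift items not investigated within the SLA.

    Open items are measured against `now`, so an old open item is a breach.
    """
    breaches = []
    for item in items:
        if item.authorised:
            continue
        finished = item.investigated or now
        if finished - item.detected > SLA:
            breaches.append(item.setting)
    return breaches


now = datetime(2026, 6, 5, 12, 0)
items = [
    DriftItem("CA: legacy auth block", datetime(2026, 6, 1, 9, 0), False, datetime(2026, 6, 2, 9, 0)),
    DriftItem("Exchange: external forwarding", datetime(2026, 6, 1, 9, 0), False, datetime(2026, 6, 4, 9, 0)),
    DriftItem("Teams: guest access", datetime(2026, 6, 2, 9, 0), False),
    DriftItem("SharePoint: sharing level", datetime(2026, 6, 4, 9, 0), False),
    DriftItem("Intune: compliance policy", datetime(2026, 5, 20, 9, 0), True),
]
print(sla_breaches(items, now))
```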
+> **On metrics and honesty**: Avoid targets that sound like achievements but are not verifiable. "100% of identities validated" cannot be verified in 180 days in any organisation with meaningful history. "All T0 accounts with MFA enforced and verified via CA sign-in logs" is verifiable. Write metrics you can prove, not metrics that sound ambitious.
+
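As an illustration of the verifiable version of the MFA metric, the sketch below classifies each T0 account from exported Entra ID sign-in records. It assumes each record carries `userPrincipalName`, `status.errorCode` (0 for success), and `authenticationRequirement` (`singleFactorAuthentication` or `multiFactorAuthentication`), as in the Microsoft Graph sign-in resource; check the field names against the client's actual export. An account with no successful sign-ins in the window is reported as having no evidence rather than counted as compliant.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def mfa_verification(signins: Iterable[dict], t0_users: Set[str]) -> Dict[str, str]:
    """Classify each T0 account as 'verified', 'gap', or 'no_evidence'.

    verified     every successful sign-in in the export required MFA
    gap          at least one successful sign-in was single-factor
    no_evidence  no successful sign-ins in the export window
    """
    targets = {u.lower() for u in t0_users}
    requirements: Dict[str, List[str]] = defaultdict(list)
    for s in signins:
        upn = s["userPrincipalName"].lower()
        if upn in targets and s["status"]["errorCode"] == 0:
            requirements[upn].append(s["authenticationRequirement"])

    result = {}
    for upn in sorted(targets):
        reqs = requirements.get(upn)
        if not reqs:
            result[upn] = "no_evidence"
        elif all(r == "multiFactorAuthentication" for r in reqs):
            result[upn] = "verified"
        else:
            result[upn] = "gap"
    return result


signins = [
    {"userPrincipalName": "Admin1@contoso.com", "status": {"errorCode": 0},
     "authenticationRequirement": "multiFactorAuthentication"},
    {"userPrincipalName": "admin2@contoso.com", "status": {"errorCode": 0},
     "authenticationRequirement": "singleFactorAuthentication"},
    # Failed sign-in (non-zero error code): ignored as evidence either way.
    {"userPrincipalName": "admin3@contoso.com", "status": {"errorCode": 50126},
     "authenticationRequirement": "singleFactorAuthentication"},
]
print(mfa_verification(signins, {"admin1@contoso.com", "admin2@contoso.com", "admin3@contoso.com"}))
```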
+---
+
+## Governance and Cadence
+
+### Weekly Check-In (30 minutes, every week)
+
+- Change log review: what was completed, what is blocked
+- Client decisions required this week
+- Risks and open items
+
+*If this meeting is consistently cancelled by the client, the engagement pauses until it resumes.*
+
+### Monthly Steering Committee (60 minutes)
+
+- Phase progress against exit criteria
+- Risk register review
+- Housekeeping queue status
+- Budget and scope review
+- Next phase / retained capability planning
+
+### Phase Gate Reviews (Days 60, 120, 180)
+
+Hard go/no-go decisions. Not formalities. If phase exit criteria are not met, the programme does not advance — it addresses the gaps.

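To keep a gate review from becoming a formality, each exit criterion can be required to carry an evidence reference (a report, export, or signed-off document) before it counts as met. A minimal sketch of that rule; the criterion names and evidence strings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def gate_decision(evidence: Dict[str, Optional[str]]) -> Tuple[str, List[str]]:
    """Return ("go", []) only if every criterion has a non-empty evidence reference.

    Otherwise return ("no-go", gaps), where gaps lists the unevidenced criteria.
    """
    if not evidence:
        raise ValueError("a gate review needs at least one exit criterion")
    gaps = [criterion for criterion, ref in evidence.items() if not (ref and ref.strip())]
    return ("go" if not gaps else "no-go"), gaps


decision, gaps = gate_decision({
    "MFA enforced via CA policy": "CA sign-in log export, 2026-06-01",
    "Legacy authentication blocked": "",
    "CA baseline deployed and tested": None,
})
print(decision, gaps)
```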
 ---

 ## Adaptation Guide

-### Small Organizations (< 100 employees)
+### Small Organisations (< 100 employees)

- Compress Phases 1-2 into 30 days
- Use managed sovereign cloud for local AI instead of on-premises hardware
- Focus on identity, backup, and one high-value AI pilot
- Leverage Microsoft Business Premium or Google Workspace security features fully before any additional purchase
+- Phase 1 focus: kill chain, T0 accounts, ASTRAL/PULSAR deployment. Skip the broad identity audit; it is not necessary for small populations.
+- Phase 2 focus: MFA for all users (achievable quickly at small scale), basic CA, device compliance.
+- Phase 3 focus: runbooks and handover. Detection engineering is proportional to environment complexity.
+- **Do not compress the timeline further.** The bottleneck at small organisations is almost always IT resource availability and change management, not technical complexity.

 ### Regulated Industries (Finance, Healthcare, Critical Infrastructure)

- Extend Phase 1 to 45 days for compliance mapping
- Integrate regulatory requirements into T0 classification
- Add compliance validation gates at each phase exit
+- Extend Phase 1 to 90 days where regulatory mapping and OT inventory are required.
+- Add compliance validation gates at each phase exit — specific evidence requirements for NIS2/DORA/GDPR.
+- The housekeeping stream is non-negotiable for regulators who require demonstrable continuous control.

-### Highly Distributed Organizations
+### Organisations with Heavy Technical Debt

- Prioritize network segmentation and DNS security in Phase 1
- Deploy edge inference nodes in Phase 3 instead of central cluster
- Emphasize operational resilience and disconnected operations
+- Accept explicitly, in writing, that 20 years of debt will not be cleared in 180 days.
+- Phase 1 focus is kill chain only. The full debt picture goes into the housekeeping queue and the Phase 4 backlog.
+- The rapid modernisation plan addresses existential risk. The housekeeping stream addresses accumulated risk over time. Both are necessary; neither replaces the other.
+- Adjust Phase 2 exit criteria to reflect the realistic pace of MFA rollout in high-debt environments — legacy systems often require extended exception handling.

-### Organizations with Heavy Technical Debt
+### OT/Critical Infrastructure Environments

- Accept that 20 years of debt cannot be cleared in 180 days
- Use defensive AI in Phase 4 to accelerate debt identification and prioritization
- Focus on "kill chain" protection rather than comprehensive cleanup
- Map every action to CIS IG1 to show standards alignment without additional framework investment
+- Phase 1 must include OT asset inventory and IT/OT connection map.
+- Phase 2 segmentation work (IT/OT boundary) is the primary kill chain closure, not identity hardening.
+- See [Vertical: Power and Utilities](../reference/vertical-power-utilities.md) and the Critical Infrastructure Adaptation in [Move Fast and Fix Things](move-fast-and-fix-things.md#the-critical-infrastructure-adaptation).

 ---