Implementation Playbook

"This is not an upgrade. It is an insurance policy against the obsolescence of your own company."

This playbook provides tactical, step-by-step guidance for delivering the Rapid Modernisation Plan in a client environment. It is organized by workstream and intended for hands-on consultants, security architects, and technical leads.


Table of Contents

  1. Engagement Kickoff
  2. Workstream: Identity and Access
  3. Workstream: Perimeter and Visibility
  4. Workstream: AI Sovereignty
  5. Workstream: Resilience and Recovery
  6. Workstream: Culture and Governance
  7. Common Failure Modes
  8. Tools and Templates

Engagement Kickoff

Pre-Engagement Checklist

Before arriving on-site or starting the remote engagement:

  • Client has signed SOW with explicit scope, authority, and escalation paths
  • Key stakeholders identified: CISO, CIO, legal, business unit sponsors
  • Initial data room access granted: AD exports, cloud IAM, network diagrams, CMDB if one exists
  • Emergency contact list established with authority to disable accounts / block access
  • Backup verification: confirm backups exist and have been tested within last 90 days
  • "Get out of jail free" letter: written executive authorization for disruptive security actions

Day 0: Stakeholder Interviews

Interview each stakeholder for 30 minutes. Ask the same five questions:

  1. What is the shortest path to a business-ending incident here?
  2. What are you most worried about that you are not telling the board?
  3. What is the one system whose failure would stop revenue for 24 hours?
  4. Where is your proprietary data going that you cannot fully track?
  5. If you had to replace your primary cloud vendor in 90 days, could you?

Document answers. Look for contradictions between stakeholders—these reveal hidden dependencies.

Day 0: Establish the War Room

  • Physical or virtual space for daily standups
  • Shared dashboard: tasks, blockers, risks
  • Direct escalation path to executive sponsor
  • Decision log: every major decision recorded with rationale and owner

Workstream: Identity and Access

Objective

Eliminate unknown identities, reduce privilege, and establish just-in-time access before attackers exploit standing permissions.

Week 1: Identity Census

Step 1: Export all identities

  • Active Directory: all users, groups, computers, service accounts
  • Cloud IAM: AWS IAM, Azure AD / Entra ID, GCP IAM
  • SaaS platforms with local identity stores
  • Non-human identities: API keys, service principals, OAuth apps, managed identities

Step 2: Deduplicate and correlate

  • Match cloud identities to on-premises identities
  • Identify orphaned accounts: no owner, no recent use, no documented purpose
  • Identify over-privileged accounts: admin rights without justification
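The correlation step above can be sketched in a few lines. This is a minimal illustration, assuming identities have already been exported as UPN/email strings; the function names are ours, not from any specific tool:

```python
def normalize(identity: str) -> str:
    """Canonicalize a UPN/email so on-premises and cloud exports compare equal."""
    return identity.strip().lower()

def correlate(ad_identities, cloud_identities):
    """Split cloud identities into those matched to AD and potential orphans."""
    ad_index = {normalize(i) for i in ad_identities}
    matched, orphans = [], []
    for identity in cloud_identities:
        (matched if normalize(identity) in ad_index else orphans).append(identity)
    return matched, orphans
```

Anything landing in the orphans list is a candidate for the "no owner, no recent use, no documented purpose" review, not for automatic deletion.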

Step 3: Categorize by risk

| Category | Action | Timeline |
|---|---|---|
| Orphaned, unused > 90 days | Disable immediately | Day 1-2 |
| Shared accounts | Target for elimination or vaulting | Week 1-2 |
| Admin / privileged | Force password rotation + MFA enforcement | Day 3-5 |
| Service accounts with interactive logon | Review and restrict | Week 1-2 |
| External / vendor access | Audit and time-bound | Week 1-2 |
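The "orphaned, unused > 90 days" test is worth automating against the identity export. A minimal sketch, assuming last-logon timestamps have already been parsed into timezone-aware datetimes:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)

def is_stale(last_logon, now=None):
    """An account with no recorded logon, or none within 90 days,
    is a disable candidate per the risk table above."""
    now = now or datetime.now(timezone.utc)
    return last_logon is None or (now - last_logon) > STALE_AFTER
```

Accounts that have never logged on (last_logon is None) deserve the same treatment as stale ones: they are often forgotten service or test identities.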

Week 2: Privilege Reduction

Step 1: Implement Privileged Access Workstations (PAWs)

  • Dedicated machines for admin tasks
  • No internet browsing, no email, no non-admin applications
  • Physical or strongly virtualized separation

Step 2: Deploy Just-in-Time (JIT) elevation where possible

  • Azure AD PIM, AWS IAM Identity Center, or third-party PAM
  • Maximum elevation duration: 4 hours
  • Require approval for standing admin roles

Step 3: Password hygiene enforcement

  • Minimum 14 characters, no complexity requirements (NIST 800-63B)
  • Audit against known-breached password lists
  • Eliminate password rotation mandates unless compromise suspected
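One common way to audit against breached lists without exposing passwords is the k-anonymity scheme used by Have I Been Pwned's Pwned Passwords range API: only the first five hex characters of the SHA-1 leave the network, and the suffix is matched locally. A sketch, assuming that service is the data source (the network call itself is left to the operator):

```python
import hashlib

def hibp_parts(password: str):
    """Split the SHA-1 of a password into the 5-char prefix sent to the API
    and the suffix matched locally. The password itself never leaves."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def breach_count(suffix: str, range_response: str) -> int:
    """Parse the 'SUFFIX:COUNT' lines returned by
    https://api.pwnedpasswords.com/range/<prefix>; 0 means not found."""
    for line in range_response.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0
```

Run the audit offline against the downloadable corpus if policy forbids even prefix queries to an external service.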

Week 3-4: MFA and Conditional Access

  • Enforce MFA on all remote access: VPN, cloud admin, RDP gateways
  • Implement risk-based conditional access:
    • Unmanaged device → require MFA + compliant device
    • Impossible travel → block or step-up
    • Legacy authentication → block entirely
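The impossible-travel rule reduces to a speed check between consecutive logons. A minimal sketch of the heuristic, assuming logon events already carry geolocated coordinates (thresholds are illustrative, not from any vendor's implementation):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def impossible_travel(loc_a, loc_b, hours_between, max_kmh=900.0):
    """Flag two logons whose implied ground speed exceeds an airliner's."""
    distance = haversine_km(*loc_a, *loc_b)
    if hours_between <= 0:
        return distance > 0
    return distance / hours_between > max_kmh
```

Tune max_kmh and add an allowlist for known VPN egress points before enforcing a block, or the rule will flag every user behind a geo-shifting proxy.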

Common Pitfalls

  • Over-scoping: Do not attempt to fix every identity in 30 days. Focus on privileged and external first.
  • Breaking automation: Service account password rotations can break CI/CD. Coordinate with application owners. Test in non-production first.
  • Shadow IT identities: SaaS platforms with standalone accounts (Slack, Zoom, etc.) are often missed. Use email domain scanning or CASB tools.

Workstream: Perimeter and Visibility

Objective

Know exactly what the organization looks like from the outside, and monitor every path that crosses the trust boundary.

Week 1-2: External Attack Surface Mapping

Step 1: Passive reconnaissance

  • Enumerate subdomains: certificate transparency logs, DNS brute force, search engine dorks
  • Identify exposed services: Shodan, Censys, custom port scanning from external vantage points
  • Map cloud assets: public S3 buckets, open storage accounts, exposed databases
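Certificate transparency is usually the highest-yield starting point. A parsing sketch, assuming the crt.sh JSON output format (query `https://crt.sh/?q=%25.<domain>&output=json`; the HTTP fetch is left to the operator, and the field name `name_value` is crt.sh's):

```python
import json

def subdomains_from_crtsh(body: str) -> set[str]:
    """Extract unique host names from a crt.sh JSON response body."""
    names = set()
    for entry in json.loads(body):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().removeprefix("*.")
            if name:
                names.add(name)
    return names
```

De-duplicate the result against DNS brute-force and search-engine findings before handing the combined list to the client for ownership confirmation.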

Step 2: Active validation

  • Confirm ownership of discovered assets with client
  • Test for default credentials on exposed management interfaces
  • Document findings with risk ratings: P0 (immediate), P1 (urgent), P2 (planned)

Week 2-3: Internal Visibility

Step 1: Deploy endpoint detection

  • Microsoft Defender for Endpoint, CrowdStrike, SentinelOne, or equivalent
  • Target: 100% of managed Windows, macOS, Linux endpoints
  • Validate: can you see process execution, network connections, and file modifications?

Step 2: Network monitoring

  • Deploy sensors at:
    • Internet boundary
    • Internal network segments (especially IT/OT boundaries)
    • Critical server VLANs
  • Enable DNS query logging and analysis

Step 3: Log aggregation

  • Centralize logs from: identity systems, endpoints, firewalls, cloud control planes, critical applications
  • Minimum retention: 90 days hot, 1 year cold
  • Ensure tamper protection: attackers delete logs

Week 3-4: CMDB Seeding

  • Populate CMDB with T0 and T1 assets first
  • For each asset: owner, criticality, dependencies, recovery requirements
  • Accept imperfection. A partially correct CMDB is infinitely better than no CMDB.
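A seed record does not need a full CMDB schema. A minimal sketch of the per-asset fields listed above (field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    """Minimal CMDB seed record; extend fields as the register matures."""
    name: str
    tier: str                         # "T0" or "T1" for the initial pass
    owner: str
    criticality: str                  # e.g. "revenue-stopping"
    dependencies: list[str] = field(default_factory=list)
    rto_hours: int = 24               # recovery time objective

def seed_order(assets):
    """T0 before T1, so the most critical records are populated first."""
    return sorted(assets, key=lambda a: a.tier)
```

Even a spreadsheet with these six columns satisfies the "partially correct CMDB" bar.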

Common Pitfalls

  • Scanning without authorization: Ensure written approval for active scanning. Some jurisdictions treat unauthorized scanning as criminal.
  • Alert fatigue: Do not enable every detection rule on day one. Start with high-confidence, high-impact alerts. Tune before expanding.
  • Log storage costs: Centralized logging is expensive. Prioritize critical systems. Use tiered storage.

Workstream: AI Sovereignty

Objective

Convert intelligence from a rented commodity into an owned, protected, T0-class asset.

Week 1-2: AI Usage Discovery

Step 1: Survey

  • Interview department heads: engineering, legal, marketing, operations, finance
  • Ask: "What AI tools are you using? What data are you putting into them?"
  • Expect 30-50% shadow usage. Employees use personal ChatGPT accounts, browser extensions, and mobile apps.

Step 2: Technical discovery

  • Review proxy logs for AI API traffic: OpenAI, Anthropic, Google, Azure OpenAI
  • Review SaaS billing for AI-enabled tools
  • Review browser extensions and endpoint software inventories
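The proxy-log review can be a simple domain match. A sketch, assuming plain-text log lines and a starter list of well-known AI API hosts (extend the tuple with whatever your proxy actually observes):

```python
# Assumption: a starter list of well-known AI API hosts; not exhaustive.
AI_API_DOMAINS = (
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
    "openai.azure.com",
)

def ai_traffic_summary(proxy_log_lines):
    """Count proxy log lines that touch a known AI API endpoint, per domain."""
    hits = {}
    for line in proxy_log_lines:
        for domain in AI_API_DOMAINS:
            if domain in line:
                hits[domain] = hits.get(domain, 0) + 1
    return hits
```

Correlate hits with the survey answers: a domain with heavy traffic that no department admitted to using is the shadow usage you were told to expect.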

Step 3: Data flow mapping

For each discovered AI tool, document:

  • Data types entering the tool
  • Data residency and processing location
  • Vendor terms: training use, retention, deletion, subprocessing
  • Regulatory implications: GDPR, DORA, NIS2, industry-specific

Week 3-4: Local AI Infrastructure

Step 1: Select hardware or sovereign cloud

| Option | When to Use |
|---|---|
| On-premises GPU servers | High volume, strict air-gap, existing data centre capacity |
| Sovereign cloud (EU, national) | Regulatory requirements, no on-premises GPU expertise |
| Edge inference nodes | Distributed organization, OT environments, low-latency requirements |

Step 2: Select initial model

For most organizations, start with:

  • Base model: Llama 3, Mistral, or Qwen (7B-13B parameters, quantized to 4-bit)
  • Deployment: Ollama, vLLM, or llama.cpp for inference
  • Orchestration: LangChain or custom RAG pipeline for proprietary data integration
  • Fine-tuning: QLoRA for domain adaptation on proprietary datasets

Step 3: Deploy with T0 controls

  • Network segmentation: inference hosts have no direct internet egress
  • Access control: model weights encrypted at rest; access requires multi-party approval
  • Audit: log all prompts, responses, and model access
  • Backup: immutable backups of weights, configurations, and vector databases
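The prompt/response audit control can sit directly in the client code path. A sketch against Ollama's default local endpoint (`/api/generate` on port 11434); storing hashes rather than raw text in the general log stream is our design choice here, not a requirement of the tool:

```python
import hashlib
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def audit_record(user: str, model: str, prompt: str, response: str) -> dict:
    """Audit entry with content hashes: access is logged and correlatable
    without copying sensitive text into the log stream."""
    return {
        "ts": time.time(),
        "user": user,
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }

def generate(model: str, prompt: str) -> str:
    """Single non-streaming completion from the local inference host."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Ship the audit records to the same tamper-protected log pipeline as the rest of the T0 estate; the full prompt text, if retained at all, belongs in a separately access-controlled store.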

Week 5-8: Pilot and Measure

Select one high-value, low-risk workflow:

| Workflow | Why It Works |
|---|---|
| Internal code review assistant | Proprietary code never leaves the perimeter; measurable quality improvement |
| Security log analysis | Feeds defensive AI directly; reduces analyst workload |
| Policy / compliance document drafting | High volume, repetitive, proprietary domain knowledge |
| Customer support triage | Reduces response time; training data is historical tickets |

Measurement criteria:

  • Accuracy vs. cloud baseline (human-evaluated on a sample)
  • Cost per inference (compute + personnel)
  • Data leakage incidents: zero
  • User satisfaction: qualitative survey

Common Pitfalls

  • Over-engineering the first deployment: Do not build a full MLOps platform for the pilot. Start simple. Prove value. Then scale.
  • Ignoring GPU availability: GPU procurement can take months. Have a cloud fallback for the pilot if on-premises hardware is delayed.
  • Neglecting prompt injection: Local models are not immune to adversarial prompts. Implement input validation and output filtering.
  • Forgetting the human loop: AI augments decisions; it does not replace accountability. Design workflows where humans retain final authority.

Workstream: Resilience and Recovery

Objective

Ensure that when—not if—a critical system fails, recovery is fast, tested, and deterministic.

Week 1-4: Backup Validation

Step 1: Inventory backup coverage

  • For every T0 and T1 asset: what is backed up, how often, where, by what mechanism
  • Identify gaps: databases without point-in-time recovery, VMs without application-consistent snapshots

Step 2: Test restoration

  • Select one critical system per week
  • Perform full restoration to isolated environment
  • Document: time to restore, data loss window, manual steps required, blockers encountered
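Capture each weekly drill in a uniform record so restore times can be trended across the engagement. A minimal sketch (field names are ours):

```python
from datetime import datetime

def drill_report(system, started, finished, data_loss_minutes, manual_steps, blockers):
    """One record per restoration test; trend restore_minutes over time."""
    return {
        "system": system,
        "restore_minutes": round((finished - started).total_seconds() / 60, 1),
        "data_loss_minutes": data_loss_minutes,
        "manual_steps": manual_steps,
        "blockers": blockers,
    }
```

A growing manual_steps list is itself a finding: every manual step is a place recovery stalls when the one person who knows it is unavailable.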

Step 3: Fix what breaks

  • If a backup cannot be restored, the backup does not exist
  • Update procedures, fix tooling, re-test

Month 2-3: Recovery Automation

  • Automate the most common recovery scenarios: VM restore, database point-in-time recovery, Active Directory forest recovery
  • Document runbooks for scenarios that cannot be fully automated
  • Train multiple team members on each runbook

Month 3-6: Chaos Engineering

Step 1: Game days

  • Scheduled, announced simulations of failure scenarios
  • Example: simulate domain controller failure during business hours
  • Measure: detection time, escalation time, resolution time, communication quality

Step 2: Chaos experiments

  • Unannounced, bounded experiments in non-production
  • Example: terminate API service instances, block DNS resolution, fill disk space
  • Validate: auto-scaling, alerting, runbook accuracy

Step 3: Production chaos

  • Only after months of successful game days and non-production experiments
  • Start with low-risk failures: single instance termination, network latency injection
  • Always have automated rollback and a human kill switch
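The kill-switch-plus-rollback pattern can be enforced in the experiment harness itself rather than left to operator discipline. A minimal sketch, assuming injection and rollback are supplied as callables (the kill-switch path is illustrative):

```python
import os

def run_chaos(inject, rollback, kill_switch="/tmp/chaos-kill"):
    """Run one bounded experiment: skip entirely if the human kill switch
    file exists, and always roll back, even if injection raises."""
    if os.path.exists(kill_switch):
        return "aborted"
    try:
        inject()
        return "completed"
    finally:
        rollback()
```

Because rollback runs in a finally block, a failed injection still restores the system; creating the kill-switch file lets any on-call human stop the next experiment without touching the harness.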

Common Pitfalls

  • Assuming backups work: Untested backups are prayers, not plans.
  • Recovery without validation: A restored system that cannot authenticate users or connect to databases is not recovered.
  • Chaos without guardrails: Never run chaos experiments when the organization is already under stress (active incident, change freeze, key personnel on leave).

Workstream: Culture and Governance

Objective

Embed antifragile principles into decision-making, hiring, and organizational habits.

Tactics

Blameless Post-Mortems

  • Within 48 hours of significant incident
  • Focus: what about the system allowed this mistake? Not: who made the mistake?
  • Mandatory output: at least one structural change (policy, architecture, or procedure)
  • Publish internally: transparency builds trust and disseminates learning

Security Champions Program

  • Identify one volunteer per team who acts as security liaison
  • Monthly 1-hour meeting: new threats, policy changes, team-specific concerns
  • Champions feed team context up and security guidance down

Red Team as a Service

  • Monthly or quarterly adversarial simulations
  • Report to CISO and board, not just IT
  • Measure: time to detect, time to contain, time to evict
  • Trend over time: the organization should get faster, not just more compliant

Antifragile Metrics Review

  • Monthly steering committee reviews:
    • Mean time to structural fix (from incident)
    • Number of chaos experiments run and lessons learned
    • % of vendor dependencies with documented exit plan
    • AI sovereignty maturity score

Common Pitfalls

  • Post-mortems without action: If findings are not tracked to completion, they become theater.
  • Security champions without authority: Champions need time allocation and executive backing, or they become scapegoats.
  • Metrics without narrative: Numbers alone do not persuade boards. Pair metrics with stories: "Here is what we learned, here is what we changed, here is why we are safer."

Common Failure Modes

| Failure Mode | Symptom | Remedy |
|---|---|---|
| Scope creep | 30-day phase stretches to 90 days | Time-box ruthlessly. Document deferred items for next phase. |
| Tool obsession | Team debates SIEM vendor for 3 weeks | Pick the good-enough tool. Implementation beats selection. |
| Perfectionism | CMDB project stalls waiting for completeness | Seed with critical assets. Expand iteratively. |
| Vendor capture | Recommendations always favor one provider | Disclose relationships. Maintain independence. Document alternatives. |
| Executive fatigue | Board stops attending updates | Lead with business risk, not technical detail. Show cost of inaction. |
| Operational resistance | IT refuses to disable legacy accounts | Use the "get out of jail free" letter. Escalate to executive sponsor. |
| Pilot purgatory | Local AI pilot runs forever without production migration | Define hard success criteria and a production migration date before starting. |

Tools and Templates

Templates Included in This Repository

  • T0 Asset Classification Worksheet
  • AI Usage Discovery Interview Guide (see Workstream: AI Sovereignty)
  • Blameless Post-Mortem Template (to be added)
  • Chaos Experiment Planning Template (to be added)
  • Vendor Exit Architecture Template (to be added)

Recommended Tools

| Category | Options | Notes |
|---|---|---|
| Endpoint Detection | Microsoft Defender, CrowdStrike, SentinelOne | Choose based on existing Microsoft footprint |
| SIEM / Log Analysis | Sentinel, Splunk, Elastic, Wazuh | Wazuh is open-source and sufficient for many environments |
| Identity Governance | Azure AD / Entra ID, Okta, Saviynt | Match to primary cloud identity provider |
| PAM / Vault | CyberArk, Delinea, HashiCorp Vault | Essential for service account and secret management |
| CMDB | ServiceNow, Device42, GLPI, or spreadsheet | Any CMDB is better than no CMDB |
| Local AI Inference | Ollama, vLLM, llama.cpp, TGI | Start simple; scale to TGI or vLLM for production load |
| Chaos Engineering | Gremlin, Chaos Mesh, custom scripts | Gremlin for enterprise; Chaos Mesh for Kubernetes |

This playbook is a living document. Update it with lessons from every engagement.

Previous: Rapid Modernisation Plan