tomas.kracmar 7ff4fad953 feat: Add management overlay pattern (Nebula T0 / Tailscale T1) and cloud admin VM guidance

2026-06-09 14:40:34 +02:00

Privileged Access Architecture

"Your VPN authenticates people to your network. Your PAM authenticates people to specific resources inside it. Most organisations deploy neither correctly. The result is a flat network where a compromised laptop reaches every server, and a stolen VPN credential reaches everything else."

For the Executive Reader

Every organisation has two access control problems hiding behind the label "VPN":

Who can reach the network? The VPN problem — getting authorised people onto the network at all.
Who can touch which specific systems? The PAM problem — ensuring that once inside the network, users can only reach what they need, nothing more, and every action is recorded.

Most organisations solve the first problem badly (legacy VPN, IP whitelisting, overlapping access methods nobody remembers creating) and ignore the second entirely. An attacker who compromises a VPN credential in this configuration has access to everything.

The antifragile answer is a two-layer architecture: network access (Tailscale or Headscale) sitting in front of protocol-aware privileged access (Teleport). Each layer can be deployed independently. Together they close the most common kill chain: a stolen VPN credential that opens a flat network and, from there, every server on it.

For module selection, see Modular Engagements. For the asset classification that determines which systems require PAM, see T0 Asset Framework.

When overlay management networks help — and when they don't

Enterprises with their own data centres already have the physical substrate for a proper management network: dedicated VLANs, hardware segmentation, jump boxes. Adding an overlay management network introduces a new Tier 0 component (the coordinator) on top of infrastructure that already solves the problem. The complexity cost outweighs the benefit. Traditional management VLAN segmentation, done properly, is the right answer.

SME clients with multi-cloud resources, containers, and DevOps workloads have a different problem: there is no physical network to segment. Resources are scattered across Azure, AWS, a colo, and maybe on-prem. The management plane does not exist yet — you are building it. An overlay is how you build it, and it is the right answer for this context.

The T0/T1 split — applying the tier model to the overlay itself:

T0 systems (domain controllers, ADCS, Entra Connect sync server — the identity control plane): use Nebula. No coordinator in the runtime path — once certificates are distributed, the overlay functions with zero external dependencies. The Nebula CA is the only Tier 0 component, and it can be kept offline. This means no coordinator to compromise, no external API call, no cloud service availability dependency for reaching your most critical systems.
T1 systems (member servers, cloud workloads, Kubernetes clusters, multi-cloud management): use Tailscale (or Headscale for sovereign requirements). Per-node ACLs, Entra OIDC integration, and periodic MFA re-authentication enforced through node key expiry and IdP policy. The coordinator trust concern is more acceptable at T1: a compromised coordinator affects T1 access, not T0.

The T0 node count is not scary. For a 5,000-person organisation, the realistic T0 Nebula population is:

Component	Count
Domain Controllers	4–8
Entra Connect / Cloud Sync server	1–2
ADCS issuing CA	1–2
AD FS servers (if not yet removed)	0–4
Cloud admin VMs / PAWs	5–10
Total	~15–25 nodes

Certificate management for 15–25 nodes is a documented procedure, not an operational burden. The CA signing ceremony happens a few times a year when a PAW is replaced or an admin leaves. This is tractable.
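
As a quick sanity check on the table above, the per-component ranges can be summed directly. This is a minimal sketch; the dictionary simply restates the table rows, and the function name is illustrative.

```python
# Node-count ranges copied from the table above: (low, high) per component.
T0_COMPONENTS = {
    "Domain Controllers": (4, 8),
    "Entra Connect / Cloud Sync server": (1, 2),
    "ADCS issuing CA": (1, 2),
    "AD FS servers (if not yet removed)": (0, 4),
    "Cloud admin VMs / PAWs": (5, 10),
}


def t0_population(components):
    """Return the (low, high) total node count across all T0 components."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


low, high = t0_population(T0_COMPONENTS)
print(f"T0 Nebula nodes: {low}-{high}")
```

The strict row sum is 11 to 26 nodes, which brackets the rounded ~15–25 figure quoted in the table. Either way the population stays in the low tens.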

The PAW problem and the cloud admin VM

Physical PAWs are the right principle. They almost never get deployed. Hardware procurement, second device on the desk, behaviour change — the project dies before it starts.

The cloud-hosted admin workstation preserves the essential security properties without the hardware problem:

A Windows 365 or Azure Virtual Desktop VM provisioned from a hardened template
Used only for privileged tasks (no email, no general browsing)
Connected to the Nebula T0 overlay (for DC access) and Tailscale T1 overlay (for server/cloud access)
Accessed by the admin from their normal device via browser or RDP client
Privileged credentials live in the cloud VM, not on the admin's local device
Compromise response: wipe the VM, reprovision from template in 20 minutes

The security property that matters (privileged credentials do not touch the device used for email and browsing) is preserved. An attacker who compromises the admin's local device gets a browser session to a cloud VM that requires phishing-resistant MFA to reach. They do not get cached credentials, session tokens, or the keys and certificates for the management overlays.

When to use a physical PAW instead: clients with a strong security culture and genuine appetite for the operational overhead, OT/ICS environments where the management workstation may need to be air-gapped, or engagements where the threat model includes a sophisticated attacker who would attempt to compromise the RDP session interactively.

The Two Layers

Layer 1: Network Access — Tailscale / Headscale + WireGuard

What it solves: Replace the legacy VPN sprawl. Admins and remote workers get secure, identity-aware access to internal networks without exposing services to the internet.

How it works: WireGuard mesh VPN managed by a control plane (Tailscale as a service, or Headscale self-hosted). Every device gets a node identity. Access is controlled by ACL policies, not IP rules. No open firewall ports required on servers.

Why it matters for security:

Eliminates the "VPN = everything" flat-network problem via ACL policies
Every connection is mutually authenticated (WireGuard node key bound to a user or tag identity)
Audit log of who connected to what, when
Access can be revoked instantly by removing a node from the control plane
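
To make "ACL policies, not IP rules" concrete, here is a minimal sketch of a policy in the general shape of a Tailscale policy file (`groups`, `tagOwners`, `acls`), built as a Python dictionary. The users, groups, and tags are hypothetical, and `is_allowed` is a deliberately simplified default-deny check written for illustration; it is not how the control plane actually evaluates policy.

```python
import json

# Hypothetical users, groups, and tags. Only the overall shape
# (groups / tagOwners / acls) follows the Tailscale policy-file format.
policy = {
    "groups": {
        "group:admins": ["alice@example.com"],
        "group:suppliers": ["vendor@example.net"],
    },
    "tagOwners": {
        "tag:server": ["group:admins"],
        "tag:supplier-app": ["group:admins"],
    },
    "acls": [
        {"action": "accept", "src": ["group:admins"], "dst": ["tag:server:22,3389"]},
        {"action": "accept", "src": ["group:suppliers"], "dst": ["tag:supplier-app:443"]},
    ],
}


def is_allowed(policy, user, dst_tag, port):
    """Toy default-deny check: True only if some accept rule matches."""
    user_groups = {g for g, members in policy["groups"].items() if user in members}
    for rule in policy["acls"]:
        if rule["action"] != "accept":
            continue
        if user not in rule["src"] and not (user_groups & set(rule["src"])):
            continue
        for dst in rule["dst"]:
            # "tag:server:22,3389" splits into target "tag:server" and ports "22,3389"
            target, _, ports = dst.rpartition(":")
            if target == dst_tag and (ports == "*" or str(port) in ports.split(",")):
                return True
    return False


print(json.dumps(policy, indent=2))
print(is_allowed(policy, "alice@example.com", "tag:server", 22))
print(is_allowed(policy, "vendor@example.net", "tag:server", 22))
```

The supplier can reach only the one tagged application on port 443. Anything not explicitly accepted is denied, which is the property that removes the "VPN = everything" problem.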

Layer 2: Protocol-Aware PAM — Teleport

What it solves: Once someone is on the network, enforce which specific servers, databases, and Kubernetes clusters they can access — and record every session in a tamper-evident audit trail.

How it works: Teleport proxies connections to SSH servers, Windows hosts (RDP), Kubernetes clusters, and databases. Users authenticate once (SSO/MFA); Teleport issues short-lived certificates. Sessions are recorded and searchable. No static credentials stored on servers.

Why it matters for security:

Eliminates shared/static credentials on servers (root, administrator)
Just-in-time access: permissions expire, removing standing access
Session recording: every sudo, every SQL query, every RDP session
Auditor-ready evidence: access logs that regulators actually accept
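
The just-in-time idea can be sketched independently of any product. The model below is purely illustrative: the class, field names, four-hour default, and the rule that a requester cannot approve their own request are assumptions for the example, not Teleport's API or behaviour.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """A time-boxed permission: nothing remains once the TTL has passed."""
    user: str
    resource: str
    approved_by: str
    issued_at: datetime
    ttl: timedelta

    @property
    def expires_at(self):
        return self.issued_at + self.ttl

    def is_active(self, now):
        return self.issued_at <= now < self.expires_at


def approve(user, resource, approver, now, ttl=timedelta(hours=4)):
    """Issue a grant; the requester may not approve their own request (assumed rule)."""
    if approver == user:
        raise ValueError("requester cannot approve their own access")
    return AccessGrant(user, resource, approver, now, ttl)


start = datetime(2026, 6, 9, 9, 0, tzinfo=timezone.utc)
grant = approve("vendor@example.net", "db-prod-01", "engineer@example.com", start)
print(grant.is_active(start + timedelta(hours=1)))  # inside the window
print(grant.is_active(start + timedelta(hours=5)))  # after expiry
```

The point of the sketch is that expiry is a property of the grant itself. No clean-up job has to remember to remove standing access.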

Tool Details

Teleport

Attribute	Detail
What it does	Protocol-aware privileged access proxy for SSH, RDP, Kubernetes, databases, and internal web applications. Short-lived certificates. Full session recording.
Antifragile pillar	Sovereign Intelligence, Structural Decoupling
Open-source status	Community Edition (CE) is open-source and self-hosted

CE Eligibility — Be Honest With Clients

Teleport CE is an excellent, capable product. The licensing constraint is important to communicate clearly:

Teleport CE is free for organisations with fewer than 100 employees AND less than $10M annual revenue. Both conditions must be met.

This catches more clients than it appears. A manufacturing company with 800 employees and 6 administrators who would touch Teleport cannot legally deploy CE, even though it would work perfectly for their use case. When in doubt, check with the client's legal team before deploying CE at scale.

Scenario	Recommendation
< 100 employees, < $10M revenue	Teleport CE — free, self-hosted, full feature set for this scale
≥ 100 employees OR ≥ $10M revenue	Teleport Enterprise (commercial) or see Alternatives below
Client needs vendor support	Teleport Enterprise regardless of size
Client has sovereign data mandate	Teleport CE or Enterprise self-hosted (both are self-hosted)
OT/SCADA vendor remote access at scale	Teleport Enterprise — session recording and just-in-time access are critical
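
The eligibility rule is a strict AND on whole-organisation figures, which is easy to get wrong in conversation. A minimal sketch of the check as stated above follows; the thresholds are taken from this document and should be verified against the current licence text, and the function name is illustrative.

```python
EMPLOYEE_LIMIT = 100             # organisation-wide headcount, not Teleport users
REVENUE_LIMIT_USD = 10_000_000   # annual revenue


def teleport_ce_eligible(employees: int, annual_revenue_usd: float) -> bool:
    """Both conditions must hold; failing either one disqualifies the client."""
    return employees < EMPLOYEE_LIMIT and annual_revenue_usd < REVENUE_LIMIT_USD


# The manufacturing example: 800 employees, only 6 administrators.
# The revenue figure here is an arbitrary placeholder; headcount alone disqualifies.
print(teleport_ce_eligible(employees=800, annual_revenue_usd=5_000_000))
print(teleport_ce_eligible(employees=40, annual_revenue_usd=3_000_000))
```

Note that the number of administrators never appears as an input: the test is on the size of the organisation.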

Teleport CE vs Enterprise Feature Comparison

Feature	CE	Enterprise
SSH, RDP, K8s, DB access proxying	✅	✅
Session recording	✅	✅
Short-lived certificates	✅	✅
SSO integration	✅	✅
Just-in-time (JIT) access	✅	✅
Access request workflows	✅	✅
Device trust (trusted devices only)	Limited	✅
Access monitoring & alerts	Limited	✅
FedRAMP / compliance reports	❌	✅
Commercial support SLA	❌	✅
High availability clustering	Limited	✅
License restriction	< 100 employees AND < $10M revenue	None

The conversation for non-qualifying clients:

"Teleport CE would work technically — your admins would love it. The license terms prohibit it for organisations your size. We can deploy Teleport Enterprise (priced per protected resource, not per user), or we can architect the network access layer with Tailscale and use certificate-based SSH access for the protocol layer. Both are valid paths. The right choice depends on whether session recording and JIT workflows are on your auditor's checklist."

Tailscale — Commercial Partnership

Attribute	Detail
What it does	Managed WireGuard mesh VPN. Every device gets a node identity. Access controlled by ACL policies. Works on any device, any OS, any cloud.
Why we partner	Tailscale provides the managed control plane, commercial support, and SSO integrations that make enterprise deployment painless. Per-user pricing is predictable.
Sovereign alternative	Headscale (open-source, self-hosted implementation of the Tailscale control server); see below
Antifragile pillar	Structural Decoupling, Optionality Preservation
Engagement modules	Module 2 (Identity Security), Module 6 (AD Hardening), Module 8 (OT Security), Module 13 (this module)

When to recommend Tailscale (commercial):

Client wants commercial support and SLA
Client needs Tailscale's SSO integrations (Okta, Microsoft Entra ID, Google)
Client has a mixed-device estate that benefits from Tailscale's client apps
Client's procurement requires a vendor contract

The conversation:

"You currently have a legacy VPN that requires a specific client, routes all traffic through your data centre, and gives everyone access to the same network. Tailscale replaces it with a mesh that puts every authorised device directly in contact with every authorised resource — no central bottleneck, no broad network exposure. An admin in Prague connects to the server in Vienna as if they are on the same LAN. A supplier accesses only the one application they need, nothing else. When you revoke access, it is immediate and complete."

Headscale + WireGuard — Sovereign Alternative

Attribute	Detail
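What it does	Self-hosted, open-source implementation of the Tailscale control server (Headscale), used with the standard Tailscale clients over WireGuard. Covers the core Tailscale feature set without the external control plane, so coordination data (network topology, device identities) never leaves client infrastructure.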
Why we use it	For clients with sovereign-data mandates, air-gapped environments, or regulated industries where data about network topology and device identities cannot reside with a third party.
Trade-off vs Tailscale	More engineering overhead; clients must be pointed at the self-hosted control server; SSO integration requires custom OIDC configuration; no commercial support
Antifragile pillar	Sovereign Intelligence, Structural Decoupling
When to deploy	Clients with NIS2/DORA requirements on data residency; utilities/OT environments; clients who have explicitly declined SaaS control planes

Deployment model: Headscale server on client infrastructure or CQRE-managed VM; Tailscale clients on all devices, configured to use the Headscale server as their control server. Managed by us as a retained service or handed over to the client's infrastructure team.

Nebula — T0 Management Overlay

Attribute	Detail
What it does	Overlay mesh built on the Noise Protocol Framework (not WireGuard), with no coordinator in the runtime path. Nodes authenticate via pre-distributed certificates signed by a local CA. Lighthouse nodes handle peer discovery and NAT traversal only; they are not in the authentication path.
Why it is right for T0	No external runtime dependency. A compromised or unavailable coordinator cannot affect T0 access. The CA (the actual trust anchor) can be kept offline and brought up only for certificate issuance.
Trade-off vs Tailscale	No dynamic node management (adding/removing a node requires a CA operation and cert redistribution); no cloud-managed control plane; higher initial setup complexity; certificate revocation requires distributing an updated blocklist
Why the trade-off is acceptable for T0	T0 node population is small (15–25 nodes) and stable. Revocation events (lost PAW, departing admin) are rare and known immediately. The operational overhead is a documented ceremony run a few times a year, not a recurring burden.
Antifragile pillar	Structural Decoupling, Sovereign Intelligence
When to deploy	T0 systems (DCs, sync server, ADCS) in any estate; air-gapped or restricted environments; clients where the management plane must have zero external runtime dependencies

Nebula CA management — the one non-trivial operation:

The Nebula CA private key is the trust anchor for the entire T0 overlay. It must be treated accordingly:

Air-gapped machine (a dedicated laptop that is never networked, or a hardware security module)
Documented signing ceremony: who is authorised to sign a new certificate, what approval is required, what the procedure is
Named individuals (minimum two) who know the procedure and can perform it
CA key backup: encrypted, stored separately from the signing machine, tested
Short certificate lifetimes (90–180 days) so revocation is handled implicitly by non-renewal as much as by explicit blocklist distribution

This is the same discipline as an offline root CA — because that is functionally what it is.
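
With short lifetimes, the recurring operational question is which certificates need a signing ceremony soon and which have already lapsed. The sketch below tracks that from an issuance register. The 90-day lifetime is the lower end of the range above; the 14-day renewal window, the node names, and the dates are assumptions for illustration.

```python
from datetime import date, timedelta

CERT_LIFETIME = timedelta(days=90)  # lower end of the 90-180 day range above
RENEW_WINDOW = timedelta(days=14)   # assumed lead time for scheduling a ceremony


def cert_status(issued_on, today, lifetime=CERT_LIFETIME, window=RENEW_WINDOW):
    """Classify a certificate as 'expired', 'renew-soon', or 'valid'."""
    expires = issued_on + lifetime
    if today >= expires:
        return "expired"
    if expires - today <= window:
        return "renew-soon"
    return "valid"


# Hypothetical issuance register kept alongside the ceremony records.
issued = {
    "dc01": date(2026, 3, 20),
    "dc02": date(2026, 5, 1),
    "paw-retired": date(2026, 1, 10),  # deliberately not renewed
}

today = date(2026, 6, 9)
for node, issued_on in issued.items():
    print(node, cert_status(issued_on, today))
```

A node that is deliberately not renewed simply ages out, which is the implicit revocation described above. Explicit blocklist distribution is then only needed for the window before natural expiry.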

Smallstep — Certificate-Based SSH Access

Attribute	Detail
What it does	SSH certificate authority. Issues short-lived SSH certificates tied to identity (SSO/OIDC). Eliminates static SSH keys. No agent required on target servers.
Why we use it	For clients who need certificate-based SSH access control but cannot justify Teleport. Covers the most common privileged access vector (SSH) at low cost and complexity.
Limitation vs Teleport	No session recording; no RDP/Kubernetes/DB proxying; no GUI
Antifragile pillar	Sovereign Intelligence, Structural Decoupling
When to deploy	Linux-heavy clients; DevOps teams; as a stepping stone before Teleport

The Decision Framework

Does the client have their own data centre with physical network infrastructure?
├── YES → Traditional management VLAN segmentation + jump box
│          Overlay adds complexity without proportional benefit here
└── NO / Multi-cloud / Scattered resources → Overlay is the right management plane

Does the client need a T0 management overlay (DC, ADCS, sync server access)?
├── YES → Nebula (no external runtime dependency, CA offline)
│   └── Admin workstation: cloud admin VM (W365/AVD) or physical PAW, enrolled in Nebula
│
Does the client need a T1 overlay (servers, cloud workloads, K8s, DevOps)?
├── YES → Layer 1 (network access)
│   ├── Wants managed service + commercial support → Tailscale + Entra OIDC + key expiry MFA
│   └── Wants full sovereignty / data residency → Headscale + WireGuard
│
Does the client need protocol-aware session recording / JIT / DB access?
├── YES → Add Layer 2 (PAM)
│   ├── < 100 employees AND < $10M revenue → Teleport CE (free, self-hosted)
│   ├── Larger org / needs support → Teleport Enterprise (commercial, verify current pricing)
│   └── SSH-only, budget-constrained → Smallstep (certificates only, no session recording)
│
Typical SME multi-cloud client:
├── T0: Nebula + cloud admin VMs
├── T1: Tailscale + Entra OIDC
└── Session recording: Teleport CE if eligible, otherwise accept the gap and compensate with
    cloud VM audit logging and Tailscale connection logs

OT / Critical infrastructure:
└── Headscale (sovereign T1) + Nebula (T0 where applicable) + Teleport (vendor session recording)
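
The tree above can also be read as a small function, which is useful for checking that a recommendation follows from the stated inputs. This is a sketch under stated assumptions: the `Client` fields and `recommend` are hypothetical names, and it assumes that an own-data-centre client skips the overlay questions but is still assessed for PAM. It follows the general branches and does not model the "accept the gap" option listed for the typical SME.

```python
from dataclasses import dataclass


@dataclass
class Client:
    own_datacentre: bool           # has physical network infrastructure
    needs_t0_overlay: bool         # DC, ADCS, sync server access
    needs_t1_overlay: bool         # servers, cloud workloads, K8s, DevOps
    wants_sovereignty: bool        # full sovereignty / data residency
    needs_pam: bool                # session recording / JIT / DB access
    employees: int
    annual_revenue_usd: float
    needs_vendor_support: bool = False
    ssh_only_budget: bool = False  # SSH-only, budget-constrained


def recommend(c: Client) -> list[str]:
    """Walk the decision framework top to bottom and collect the stack."""
    stack = []
    if c.own_datacentre:
        stack.append("Management VLAN segmentation + jump box")
    else:
        if c.needs_t0_overlay:
            stack.append("Nebula (T0) + cloud admin VM or physical PAW")
        if c.needs_t1_overlay:
            stack.append("Headscale (T1)" if c.wants_sovereignty
                         else "Tailscale + Entra OIDC (T1)")
    if c.needs_pam:
        ce_eligible = c.employees < 100 and c.annual_revenue_usd < 10_000_000
        if c.ssh_only_budget:
            stack.append("Smallstep (SSH certificates, no session recording)")
        elif ce_eligible and not c.needs_vendor_support:
            stack.append("Teleport CE")
        else:
            stack.append("Teleport Enterprise")
    return stack


sme = Client(own_datacentre=False, needs_t0_overlay=True, needs_t1_overlay=True,
             wants_sovereignty=False, needs_pam=True,
             employees=60, annual_revenue_usd=4_000_000)
print(recommend(sme))
```

For the typical SME multi-cloud client this yields Nebula for T0, Tailscale for T1, and Teleport CE, matching the summary at the end of the tree.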

OT and Critical Infrastructure Considerations

This module is especially valuable for Module 8 (OT Security Assessment) clients. The most common and dangerous finding in OT environments is uncontrolled vendor remote access: SCADA vendors, maintenance contractors, and automation engineers with persistent VPN credentials and no session recording.

The OT-specific requirements:

Requirement	Solution
Vendor access without standing credentials	Teleport JIT access: vendor requests access, engineer approves, session recorded, credential expires
No persistent VPN for OT networks	Tailscale/Headscale ACL policy: vendor node can reach only the specific OT asset, nothing else
Auditability for regulators (NIS2, CER)	Teleport session recordings: complete record of every vendor action on every OT system
Air-gapped or restricted networks	Headscale on-premise: no outbound control plane dependency
Separation of IT and OT access	Separate Tailscale/Headscale networks with explicit, audited bridge points

The executive pitch for utilities and telco:

"Your SCADA vendor has a VPN credential that gives them access to your control network. It has not been rotated in three years. You do not know when they last used it or what they did. If that credential is compromised, an attacker has access to your control systems without ever touching your IT network. We replace that with a session that the vendor requests on the day they need it, that an engineer approves, that is recorded start to finish, and that expires the moment the maintenance window closes. This is not extra bureaucracy. This is the audit evidence your regulator will ask for under NIS2."

CQRE Deployment Tiers

Tier	Description	Best for
Assessment & Design	Architecture review, tool selection, design document, implementation roadmap	Clients with existing VPN/PAM debt; pre-deployment planning
Managed Deployment	CQRE installs and configures the chosen stack; hands over to client team	Clients without internal infrastructure expertise
Fully Managed Service	CQRE operates the network access and PAM layer as a managed service	Clients who want the capability without the operational burden
Retained Advisory	Quarterly reviews, policy updates, incident support	Clients who have deployed and want ongoing assurance

Per-Module Tool Pairing

Module	Access Architecture Role
Module 2: M365 Identity Security	Tailscale/Headscale for admin access to cloud management plane; Teleport for server access
Module 6: On-Premise AD Hardening	Teleport CE as PAW replacement for domain controller access; recorded sessions for all Tier 0 admin activity
Module 8: OT Security Assessment	Headscale for sovereign OT network access; Teleport for vendor access with full session recording
Module 10: Red Team & Validation	Verify that Tailscale ACLs actually enforce segmentation; test Teleport JIT bypass scenarios
Module 13: This module	Full deployment of chosen network + PAM stack

Integration With Existing Frameworks

Document	Integration
T0 Asset Framework	T0 assets (domain controllers, key servers, OT controllers) require Teleport session recording; the Nebula overlay isolates T0 management access, and Tailscale ACLs isolate T1 network segments
AD and Endpoint Hardening	PAW architecture is enhanced by Teleport; privileged accounts should authenticate through PAM, not direct RDP
Sovereign Tool Stack	Tailscale/Headscale extends the network access layer; Teleport extends the identity and session intelligence layer
Vertical: Power and Utilities	Vendor remote access to OT is addressed directly by this module
Vertical: Telco	Network operations centre access, vendor access to network elements

For the OT security context, see Vertical: Power and Utilities. For identity and T0 asset protection, see T0 Asset Framework. For the full module menu, see Modular Engagements.
