cqrenet/antifragile

tomas.kracmar 5264f7b439 feat: Add Antifragile Handbook for M365 & AD (6 books + 2 field guides)

2026-06-09 11:48:11 +02:00


The Antifragile Handbook for M365 & Active Directory

Book III — Privileged Access

Privilege is blast radius with a time axis. Standing privilege reaches everything, forever. The whole job is to collapse both: less reach, less time.

The governing question

Book I asked you to draw the wall. Book II built it between on-prem and cloud. This book is about the credentials that can knock any wall down. Ask of every privileged identity — human, service account, or app:

If this credential leaks tonight, how long does it stay useful, and how far does it reach?

A permanent Domain Admin answers "forever, everything." A permanent Global Admin answers "forever, the whole tenant." A JIT, scoped, time-boxed role answers "for one hour, for one task." Every technique in this book exists to turn the first kind of answer into the second. That's it. That's the whole craft of privileged access: shrink the reach, shrink the time.

Compliance asks whether you "have a PAM solution." Wrong question. The question is whether privilege evaporates when not in use, and whether a leaked credential hits a wall in minutes instead of owning the estate forever.

1. Fragility inventory — where privilege rots

Standing privilege (the original sin)

An account that is always an admin is a loaded gun left on the table, every hour of every day, whether anyone's using it or not. Its blast radius is constant and maximal. Permanent Domain Admins, permanent Enterprise Admins, permanent Global Admins — every one of them is a credential whose value to an attacker never drops to zero. The single most important number in this book is: how many identities hold standing privilege? In most estates it's an order of magnitude too high, and nobody has ever counted.
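Producing that number is mechanical once the assignments are exported. A minimal sketch, assuming a hypothetical export in which each record names the identity, its kind, the role, and whether the assignment is active and permanent; the field names and sample data are illustrative, not any product's schema:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    identity: str
    kind: str        # "human", "service_account", or "service_principal"
    role: str
    active: bool     # True = active right now; False = eligible only (JIT)
    permanent: bool  # True = the assignment has no end date


def standing(assignments):
    """Return the assignments that are active and have no end date."""
    return [a for a in assignments if a.active and a.permanent]


def standing_count_by_kind(assignments):
    """Count distinct identities holding standing privilege, per identity kind."""
    identities = {(a.identity, a.kind) for a in standing(assignments)}
    return Counter(kind for _, kind in identities)


# Hypothetical sample data.
SAMPLE = [
    Assignment("alice", "human", "Domain Admins", True, True),
    Assignment("alice", "human", "Global Administrator", True, True),
    Assignment("bob", "human", "Global Administrator", False, True),
    Assignment("svc-backup", "service_account", "Domain Admins", True, True),
    Assignment("app-sync", "service_principal", "Directory Writers", True, False),
]

counts = standing_count_by_kind(SAMPLE)
print(dict(counts))
```

Identities are de-duplicated on purpose: the metric is how many credentials are worth stealing at rest, not how many role rows exist.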

Service accounts and service principals (the dark matter)

This is where the bodies are buried, on both sides of the wall:

On-prem service accounts: over-permissioned ("we made it Domain Admin to make it work"), static passwords that haven't changed since 2016, an SPN attached so they're Kerberoastable (any authenticated user can request the service ticket, then crack the weak password offline at leisure), owned by nobody, documented nowhere, and impossible to turn off because something unknown will break.
Cloud service principals / app registrations: the same disease in a new body. Client secrets that never expire, tenant-wide admin consent, and Microsoft Graph permissions that are quietly catastrophic: RoleManagement.ReadWrite.Directory, AppRoleAssignment.ReadWrite.All, Application.ReadWrite.All, any of which is a privilege-escalation path to Global Admin. Service principals cannot do MFA, usually hold standing privilege, and live in a blind spot no benchmark looks at hard enough.

Service identities are dark matter: most of the privileged mass of the estate, invisible in the usual diagrams, and gravitationally dominant when something goes wrong.
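The cloud half of that inventory can be triaged with a few lines once app registrations are exported. The sketch below assumes a hypothetical record shape (`AppRegistration`) and a hypothetical 365-day secret-lifetime threshold; the scope list is the one named above and should be re-verified against current Microsoft documentation before use:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set

# Escalation-grade scopes named in the text; verify against current docs.
ESCALATION_SCOPES = {
    "RoleManagement.ReadWrite.Directory",
    "AppRoleAssignment.ReadWrite.All",
    "Application.ReadWrite.All",
}


@dataclass
class AppRegistration:
    name: str
    owner: Optional[str] = None
    scopes: Set[str] = field(default_factory=set)
    # One entry per client secret; None models a secret with no expiry.
    secret_expiries: List[Optional[date]] = field(default_factory=list)


def findings(app, today, max_secret_days=365):
    """Return human-readable findings for one app registration."""
    out = []
    risky = sorted(app.scopes & ESCALATION_SCOPES)
    if risky:
        out.append("escalation-grade scopes: " + ", ".join(risky))
    for expiry in app.secret_expiries:
        if expiry is None or (expiry - today).days > max_secret_days:
            out.append("long-lived or non-expiring secret")
            break
    if not app.owner:
        out.append("no owner")
    return out


# Hypothetical sample data.
APPS = [
    AppRegistration("hr-sync", "hr-team", {"User.Read.All"}, [date(2026, 9, 1)]),
    AppRegistration("legacy-provisioner", None,
                    {"Application.ReadWrite.All", "User.Read.All"}, [None]),
]
TODAY = date(2026, 6, 9)

report = {app.name: findings(app, TODAY) for app in APPS}
for name, issues in report.items():
    print(name, issues or "clean")
```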

Tier violations (the wall with a hole kicked in it)

The Lindy core of on-prem security is the tier model (Tier 0 = identity control plane: DCs, AD, ADCS, the sync server from Book II; Tier 1 = servers; Tier 2 = workstations). Microsoft has since reframed it as the Enterprise Access Model reaching into the cloud, but the rule never changed:

A higher-tier credential must never be exposed on a lower-tier system.

Every Domain Admin who RDPs into a workstation, every admin whose daily-driver laptop also touches a DC, every shared jump box used for both Tier 0 and Tier 1 — that's a tier violation, and it's how pass-the-hash / pass-the-ticket turns one phished workstation into domain dominance. The clean-source principle is absolute: you cannot securely manage a system from a less-secure one.
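The rule reduces to a single comparison once accounts and hosts each carry a tier label. A minimal sketch, assuming hypothetical tier maps and logon records (in practice these would come from your own asset inventory and logon telemetry):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Logon:
    account: str
    host: str


# Hypothetical classification: lower number = higher tier.
ACCOUNT_TIER = {"da-alice": 0, "srv-admin-bob": 1, "carol": 2}
HOST_TIER = {"DC01": 0, "APP01": 1, "WKS-042": 2}


def tier_violations(logons, account_tier, host_tier):
    """Return logons where a higher-tier credential landed on a lower-tier host."""
    out = []
    for logon in logons:
        a = account_tier.get(logon.account)
        h = host_tier.get(logon.host)
        if a is None or h is None:
            continue  # unclassified account or host; review these separately
        if a < h:  # e.g. a Tier 0 account (0) on a Tier 2 workstation (2)
            out.append(logon)
    return out


LOGONS = [
    Logon("da-alice", "DC01"),
    Logon("da-alice", "WKS-042"),
    Logon("srv-admin-bob", "APP01"),
    Logon("srv-admin-bob", "WKS-042"),
    Logon("carol", "WKS-042"),
]

violations = tier_violations(LOGONS, ACCOUNT_TIER, HOST_TIER)
for v in violations:
    print(f"tier violation: {v.account} on {v.host}")
```

The check only covers credential exposure downward. A lower-tier account reaching a higher-tier host is a different problem (access control), and this sketch deliberately ignores it.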

The escalation plumbing nobody maps

AD ACL backdoors: who can reset whose password, who has WriteDACL / GenericAll on what. Privilege hides in object permissions, not just group membership. Attackers map this in minutes; defenders rarely map it at all.
Delegation: unconstrained delegation makes a host cache the TGTs of principals that authenticate to it, so compromising that host is a standing route to impersonating whoever touches it, Domain Admins included; constrained/RBCD misconfigurations are escalation paths.
ADCS: the certificate services escalation paths (the ESC-series misconfigurations) turn a forgotten CA template into domain compromise. ADCS is Tier 0 and is almost always treated as Tier 1 or forgotten entirely.
KRBTGT: the master key behind golden tickets. Rarely rotated; if an attacker ever had it, they may still have it.
LAPS absent: without per-machine local admin password randomisation, one stolen local admin password or hash unlocks lateral movement across every machine sharing it.
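For the ACL item, the defender's version of the attacker's map can start as a simple filter over exported access-control entries. This sketch assumes a hypothetical flattened export (`Ace` records with plain-string rights) and hypothetical lists of sensitive targets and expected trustees; real ACL data is richer and needs proper parsing:

```python
from dataclasses import dataclass

# Rights named in the text; the string labels are illustrative.
DANGEROUS_RIGHTS = {"GenericAll", "WriteDACL", "ResetPassword"}


@dataclass(frozen=True)
class Ace:
    trustee: str  # who holds the right
    right: str    # what they can do
    target: str   # the object they can do it to


def unexpected_control(aces, sensitive_targets, expected_trustees):
    """Return dangerous ACEs on sensitive objects held by unexpected trustees."""
    return [
        a for a in aces
        if a.right in DANGEROUS_RIGHTS
        and a.target in sensitive_targets
        and a.trustee not in expected_trustees
    ]


# Hypothetical sample data.
ACES = [
    Ace("Domain Admins", "GenericAll", "da-alice"),
    Ace("helpdesk", "ResetPassword", "da-alice"),
    Ace("helpdesk", "ResetPassword", "carol"),
    Ace("svc-print", "WriteDACL", "AdminSDHolder"),
    Ace("Authenticated Users", "ReadProperty", "da-alice"),
]

flagged = unexpected_control(
    ACES,
    sensitive_targets={"da-alice", "AdminSDHolder"},
    expected_trustees={"Domain Admins"},
)
for ace in flagged:
    print(f"{ace.trustee} holds {ace.right} on {ace.target}")
```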

The recovery paradox

The accounts that can rebuild the estate after a disaster are, by definition, the most powerful — and therefore the most valuable to an attacker. Break-glass done carelessly is just standing privilege with a heroic name. (Handled in §4.)

2. Via negativa — what to remove (in priority order)

Privilege is the domain where deletion is the entire strategy. Adding "privileged access controls" on top of unmanaged standing privilege is rearranging furniture in a burning room.

Eliminate standing privilege. Roles become eligible, not active. Cloud-side this is PIM (§3). On-prem it's harder and the tooling is weaker (be honest about that; see "Honest uncertainty" below), but time-bound group membership and JIT elevation tooling exist; use them. The target state: at rest, almost nobody is an admin.
Empty the top groups toward the irreducible minimum. Drive Domain Admins, Enterprise Admins, and standing Global Admins down to the smallest number that reality permits (plus break-glass). Delegate specific rights instead of handing out god-mode. "Empty Domain Admins" is an achievable goal, not a fantasy.
Kill, convert, or constrain service identities. Remove the ones nobody can justify (apply the 90-day-scream test). Convert the rest to managed identities: gMSA on-prem (the established, Lindy fix, with automatic password rotation, no static secret, and a long random password that makes Kerberoasting impractical), managed identities in Azure where possible. Strip every excess right. For app registrations: remove the dangerous Graph permissions, expire and rotate secrets, prefer certificate credentials or managed identities over secrets, and delete unused registrations and stale consent grants.
Remove tier violations. No high-tier credential on a low-tier box, ever. This is mostly subtraction — taking admin rights off daily-driver machines and shared boxes.
Fix the escalation plumbing by removal. Decommission unused ADCS templates, remove unconstrained delegation, prune dangerous ACLs, deploy LAPS so standing shared local admin passwords cease to exist.
Remove standing local admin from users. Most don't need it. The ones who think they do usually need it for ten minutes a month — which is a JIT problem, not a standing-rights problem.
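Deciding which service accounts to kill, convert, or constrain first is easier with a worst-first list. A rough sketch, assuming a hypothetical inventory record (`ServiceAccount`) and a hypothetical 365-day password-age threshold; the scoring is simply the number of findings, which is a placeholder for whatever weighting fits the estate:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ServiceAccount:
    name: str
    has_spn: bool
    is_gmsa: bool
    password_last_set: date
    privileged: bool
    owner: Optional[str]


def triage(acct, today, max_password_age_days=365):
    """Return the reasons this account needs attention (empty list = none found)."""
    reasons = []
    if acct.has_spn and not acct.is_gmsa:
        reasons.append("Kerberoastable candidate (SPN on a non-gMSA account)")
    age_days = (today - acct.password_last_set).days
    if not acct.is_gmsa and age_days > max_password_age_days:
        reasons.append("static password older than threshold")
    if acct.privileged:
        reasons.append("member of a privileged group")
    if acct.owner is None:
        reasons.append("no owner")
    return reasons


def worst_first(accounts, today):
    """Return (account, reasons) pairs with findings, most findings first."""
    flagged = [(a, triage(a, today)) for a in accounts]
    flagged = [(a, r) for a, r in flagged if r]
    return sorted(flagged, key=lambda pair: len(pair[1]), reverse=True)


# Hypothetical sample data.
TODAY = date(2026, 6, 9)
ACCOUNTS = [
    ServiceAccount("svc-web", True, True, date(2026, 5, 20), False, "web-team"),
    ServiceAccount("svc-backup", False, False, date(2025, 12, 1), True, "infra"),
    ServiceAccount("svc-sql", True, False, date(2016, 3, 1), True, None),
]

ranked = worst_first(ACCOUNTS, TODAY)
for acct, reasons in ranked:
    print(acct.name, reasons)
```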

3. The barbell — paranoia for the control plane, cheap for the rest

The irreplaceable few (paranoid, redundant, monitored):

Tier 0 — DCs, AD, ADCS, KRBTGT, and the sync server from Book II. This is the control plane; if it falls, everything falls.
The handful of break-glass Global Admins (§4).
The PIM / role-management configuration itself — because whoever controls who can become admin is effectively admin. Privileged Role Administrator and Privileged Authentication Administrator are crown roles; treat them as such.

Paranoid protection for privileged work means, non-negotiably:

PAWs — privileged access workstations. All Tier 0 / Global Admin work happens from a clean, hardened, single-purpose device that never reads email or browses the web. The admin's normal laptop is Tier 2 and stays there.
Phishing-resistant MFA only for admins: FIDO2 / passkeys / certificate-based. SMS and push-approve are not admin-grade; they're phishable, and admins are the phishing prize.
Separate, cloud-only privileged identities for cloud admin (the Book II firebreak, enforced here). On-prem admin identity must not be the cloud admin identity.
JIT for everything via PIM: eligible-not-active, time-boxed, MFA on activation, justification logged, and approval workflow on the crown roles.
Conditional Access scoped to admins — privileged roles usable only from PAWs / compliant devices / named locations.

Everything else stays cheap. Standard RBAC, normal user access, ordinary app permissions — don't pour the privileged-access budget evenly across the whole directory. Concentrate it ferociously on the tiny set of identities that own the control plane. A thousand hardened standard users won't save you if one permanent Domain Admin uses Password1! on a Kerberoastable SPN.
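The "eligible, not active" idea is easiest to see as a tiny state machine. The class below is a toy model of the behaviour described above (time-boxed, MFA on activation, justification logged); it is not the PIM API, and every name in it is hypothetical:

```python
from datetime import datetime, timedelta


class EligibleRole:
    """Toy model of a JIT role: inactive at rest, active only inside a time box."""

    def __init__(self, role, max_duration=timedelta(hours=1)):
        self.role = role
        self.max_duration = max_duration
        self.expires_at = None
        self.audit_log = []

    def activate(self, now, mfa_passed, justification, requested=None):
        if not mfa_passed:
            raise PermissionError("MFA required on activation")
        if not justification.strip():
            raise ValueError("justification required")
        if requested is None:
            requested = self.max_duration
        # Never grant more time than the policy ceiling allows.
        duration = min(requested, self.max_duration)
        self.expires_at = now + duration
        self.audit_log.append((now, self.role, justification))
        return self.expires_at

    def is_active(self, now):
        return self.expires_at is not None and now < self.expires_at


role = EligibleRole("Global Administrator")
t0 = datetime(2026, 6, 9, 9, 0)

at_rest = role.is_active(t0)
role.activate(t0, mfa_passed=True, justification="rotate app secret",
              requested=timedelta(hours=8))
mid_task = role.is_active(t0 + timedelta(minutes=30))
after_box = role.is_active(t0 + timedelta(minutes=61))
print(at_rest, mid_task, after_box)
```

The eight-hour request is capped at the one-hour ceiling, so the role has already evaporated at minute 61. A credential stolen at rest carries no privilege at all.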

4. Optionality & recovery — escape hatches, tested

Break-glass done right. This is the deliberate exception to "no standing privilege": you need an account that works when PIM, MFA infrastructure, or the IdP is down. So it's standing by necessity, which means it is protected differently: cloud-only, phishing-resistant credential stored offline/split, excluded from the Conditional Access policies that would otherwise lock it out, and wired so that any use at all triggers a screaming alert. Standing privilege you can't remove, you watch like a hawk. And you test it: an untested break-glass account is Schrödinger's recovery.
KRBTGT rotation on demand. Can you rotate KRBTGT (twice, with an interval long enough for replication to complete and existing tickets to age out) the moment you suspect golden tickets, without taking the forest down? Is it rehearsed? If not, you have a theoretical control, not a real one.
Fast session revocation / admin disable. A one-move way to kill a compromised admin's sessions and tokens and disable the account, on both sides of the wall. Rehearse it; the breach is not the time to discover the command.
No single human as the only recovery path — balanced against blast radius. You want enough redundancy that one person under a bus (or under coercion) doesn't end recovery, without so many standing admins that you've recreated the problem. The barbell, again.
Tier 0 / forest rebuild path — links forward to Book V (Recovery). Know it exists, know it's been tested, know it doesn't secretly depend on a credential that the incident just compromised.
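Two of the break-glass properties above, "any use at all triggers an alert" and "tested, not just created", are simple enough to express as checks. A minimal sketch, assuming hypothetical account names, a hypothetical sign-in event shape, and a hypothetical 90-day drill interval:

```python
from datetime import date

# Hypothetical break-glass account names.
BREAK_GLASS_ACCOUNTS = {
    "bg-admin-01@example.com",
    "bg-admin-02@example.com",
}


def break_glass_alerts(sign_ins, break_glass=BREAK_GLASS_ACCOUNTS):
    """Every sign-in attempt by a break-glass account alerts, success or failure."""
    return [e for e in sign_ins if e["user"].lower() in break_glass]


def drill_overdue(last_tested, today, max_days=90):
    """True if the account was never tested or the last drill is too old."""
    return last_tested is None or (today - last_tested).days > max_days


# Hypothetical sample events.
SIGN_INS = [
    {"user": "carol@example.com", "result": "success"},
    {"user": "BG-Admin-01@example.com", "result": "failure"},
    {"user": "bg-admin-02@example.com", "result": "success"},
]

alerts = break_glass_alerts(SIGN_INS)
for event in alerts:
    print("ALERT: break-glass sign-in attempt:", event["user"], event["result"])

print(drill_overdue(date(2026, 1, 5), date(2026, 6, 9)))
```

Failed attempts alert too: a failed sign-in on an account nobody should be touching is at least as interesting as a successful one.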

5. Stressor — break it on purpose

Pull an admin's standing access and route them through PIM for a week. Does real work still flow? If JIT activation is too slow or broken, people will route around it — and you'll have found that in a drill instead of discovering the shadow standing-admin account they created in revenge.
Kerberoast yourself. Run the attack against your own directory. Which service accounts crack? Did anything detect the ticket requests? Two findings in one cheap test.
Attempt a tier violation in a test window. Try to use a Tier 0 credential on a Tier 2 box. Is it blocked? Detected? Silent? Silence is the worst answer and the most common.
Run attack-path analysis as routine, not as a once-a-year pentest. Tools that map "who can reach Domain Admin / Global Admin in N hops" turn privilege escalation into a number you can track over time. The count of paths to domain/tenant dominance is a better security metric than any compliance percentage. Drive it down; watch it not creep back up.
Simulate a malicious consent grant / over-permissioned app. Register an app requesting a dangerous Graph scope. Does anything flag it? Can you find every existing app holding those scopes today? (You should be able to. Most can't.)
Break-glass drill — yes, again, and on a schedule. The recurring test in this whole handbook.

Per Book I principle 6: each of these must yield a structural change — a removed right, a severed path, a new alert — not a note that says "be careful."
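The "who can reach Domain Admin in N hops" metric from the attack-path item is a bounded graph search. The sketch below runs a breadth-first search backwards from the target over a hypothetical "can take control of" edge list; it counts the principals that have a path, a simpler proxy for the number of paths, and it stands in for dedicated attack-path tooling rather than replacing it:

```python
from collections import deque


def principals_reaching(edges, target, max_hops):
    """Map each node that can reach `target` within `max_hops` to its hop count."""
    # Reverse the edges so the search can walk backwards from the target.
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)

    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # hop budget spent; do not expand further
        for pred in reverse.get(node, ()):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)

    del dist[target]
    return dist


# Hypothetical edges: "A -> B" means A can take control of B.
EDGES = {
    "helpdesk-user": ["svc-sql"],   # can reset its password
    "svc-sql": ["SQL01"],           # local admin on the server
    "SQL01": ["da-alice"],          # a Domain Admin session lives here
    "da-alice": ["Domain Admins"],  # group membership
    "carol": ["WKS-042"],
}

reach3 = principals_reaching(EDGES, "Domain Admins", max_hops=3)
reach4 = principals_reaching(EDGES, "Domain Admins", max_hops=4)
print(len(reach3), len(reach4))
```

Tracking `len(...)` for a fixed hop budget over time gives the trend line the text asks for. Removing the Domain Admin session from SQL01 severs every path in this sample at once, which is the kind of structural change a drill should produce.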

Honest uncertainty (the moving parts — verify, don't trust this book)

Stable and Lindy (teach with confidence): standing privilege is the core risk; the tier / clean-source model; Kerberoasting, pass-the-hash, golden/silver tickets, DCSync; the gMSA pattern; JIT/eligibility as the goal. These don't churn.

What moves, and what you must verify against current Microsoft documentation:

PIM capabilities, role definitions, and the risk classification of specific Graph permissions evolve continually. Confirm which scopes are escalation-grade today rather than trusting a 2026 list.
On-prem JIT/PAM tooling is genuinely weaker and more fragmented than the cloud story. Native time-bound group membership, MIM PAM, and third-party PAM all have trade-offs that shift. Don't promise a client a clean AD-native JIT experience without checking current reality — and be honest that on-prem eligibility is harder than PIM makes cloud look.
gMSA vs dMSA. gMSA is the established, Lindy answer for managed service accounts. dMSA (delegated managed service accounts, introduced with the Windows Server 2025 generation) targets the real gap — migrating a standing service account and disabling the original — but newer mechanisms carry newer attack surface, and there has been published privilege-escalation research against the dMSA migration path. Verify current patch and hardening guidance before you recommend dMSA; this is exactly the kind of new-and-shiny that Book I principle 8 warns about. gMSA until you've checked dMSA's current state.
Enterprise Access Model vs the classic three-tier model — same logic, evolving names and cloud extensions. Use whichever vocabulary the client knows; don't get religious about the label.

If a client's safety hinges on a current specific, look it up and cite it. "I need to verify the current Graph permission classification" beats confidently quoting a stale one. That posture is the independence this handbook is trying to build.

Consolidated judgement prompts

How many identities hold standing privilege — human, service account, and service principal — counted, named, and owned? (If you can't produce the number, that's finding #1.)
For each privileged credential: leaked tonight, how long is it useful and how far does it reach? Where's the wall?
Where are the tier violations? Which high-tier credentials touch low-tier systems? Does any admin's daily laptop reach Tier 0?
Which service accounts are Kerberoastable? Which app registrations hold escalation-grade Graph permissions or non-expiring secrets?
Are cloud admins cloud-only and phishing-resistant, or synced and push-MFA'd? (Book II firebreak — verify it's actually enforced here.)
Does privilege evaporate when idle (PIM/JIT) or sit loaded on the table?
Is ADCS treated as Tier 0? When was KRBTGT last rotated? Is LAPS deployed?
Break-glass: does it exist, is it monitored to scream on use, and when was it last tested — not created, tested?
How many paths to Domain Admin / Global Admin exist right now, and is that number going up or down?

Book III of the Antifragile Handbook. Privilege is blast radius with a clock on it. Shrink the reach, shrink the time, and watch the credentials that can rebuild the world. Move fast and fix things.
