The companion role was tightly coupled to Synapse through shared tags, worker routing, and lifecycle ordering. Keeping them separate added coordination overhead without practical benefits, especially for parallelized execution.
This merges the role into matrix-synapse while keeping companion logic organized under dedicated reverse_proxy_companion task/template subdirectories.
Compatibility is preserved:
- matrix_synapse_reverse_proxy_companion_* variable names remain unchanged
- install/setup companion-specific tags remain available
Cross-role/global wiring is now in group_vars (matrix-synapse section), while role defaults provide sensible standalone defaults and self-wiring for Synapse-owned values.
Only queue matrix-goofys.service for restart when Synapse is enabled. Goofys is installed from the Synapse role, so non-Synapse homeserver configurations should not try to restart this unit. This mirrors the fix for issue https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/4959.
Only queue matrix-synapse-s3-storage-provider-migrate.timer for restart when Synapse is actually enabled. This prevents setup/install failures when a Synapse-only extension flag is set while using another homeserver implementation, as reported in https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/4959.
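The guard described in the two changes above can be sketched roughly as follows. This is an illustration only, not the playbook's actual Jinja/YAML: the function name and its boolean parameters are hypothetical, and only the two unit names come from the text above.

```python
# Hypothetical sketch: the real logic lives in Ansible group_vars, not Python.
def synapse_extension_units_to_restart(synapse_enabled, goofys_enabled, s3_migrate_enabled):
    """Return the Synapse-owned extension units that should be queued for restart."""
    units = []
    # Both units are only meaningful when Synapse itself is enabled, so an
    # extension flag left on under another homeserver must be ignored.
    if synapse_enabled and goofys_enabled:
        units.append("matrix-goofys.service")
    if synapse_enabled and s3_migrate_enabled:
        units.append("matrix-synapse-s3-storage-provider-migrate.timer")
    return units
```

With Synapse disabled, the result is an empty list regardless of the extension flags, which is the failure case the linked issue describes.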
The startup issue came from a timing dependency around coturn TLS certs:
- `matrix-coturn.service` depends on
`matrix-traefik-certs-dumper-wait-for-domain@<matrix-fqdn>.service`
- That waiter succeeds only after Traefik has obtained and dumped a cert for
the Matrix hostname (typically driven by homeserver labels/routes becoming
active)
- If coturn is started too early, it can block/fail waiting for cert files
that are not yet present
Historically, coturn priority was mode-dependent:
- `one-by-one`: coturn at 1500 (delayed after homeserver)
- other modes: coturn at 900 (before homeserver)
This could still trigger undesirable startup ordering and confusing behavior
in non-`one-by-one` modes, especially during initial bootstrap/restart flows
where cert availability lags service startup.
This change makes ordering explicit and consistent:
1. Introduce `matrix_homeserver_systemd_service_manager_priority` (default 1000)
in `roles/custom/matrix-base/defaults/main.yml`.
2. Use that variable for the homeserver service entry in
`group_vars/matrix_servers`.
3. Set coturn priority relative to homeserver priority in all modes:
`matrix_homeserver_systemd_service_manager_priority + 500`.
4. Update inline documentation comments in `group_vars/matrix_servers` to
match the new behavior and rationale.
Result:
- Homeserver/coturn ordering is deterministic and mode-agnostic.
- Coturn is intentionally started later than the homeserver by default,
reducing first-start certificate wait/fail races.
- Priority intent is now centralized and configurable via a dedicated
homeserver priority variable.
- Coturn may still be started earlier, because the homeserver typically
has a `Wants` "dependency" on it, but that is acceptable.
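For comparison, the old and new priority rules can be sketched as below. This is a simplified illustration in Python, not the actual group_vars expression; the function and constant names are hypothetical, while the numbers (900, 1000, 1500, +500) are taken from the description above.

```python
# Hypothetical names; the values come from the commit description.
HOMESERVER_PRIORITY_DEFAULT = 1000  # matrix_homeserver_systemd_service_manager_priority
COTURN_PRIORITY_OFFSET = 500


def coturn_priority_old(restart_mode):
    """Historical behavior: the priority depended on the restart mode."""
    return 1500 if restart_mode == "one-by-one" else 900


def coturn_priority_new(homeserver_priority=HOMESERVER_PRIORITY_DEFAULT):
    """New behavior: always relative to the homeserver, regardless of mode."""
    return homeserver_priority + COTURN_PRIORITY_OFFSET
```

With the default homeserver priority this yields 1500 in every mode, and overriding the homeserver priority moves coturn along with it.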
These three roles have multiple variable prefixes each:
- kakaotalk: matrix_appservice_kakaotalk + matrix_appservice_kakaotalk_node
- telegram: matrix_mautrix_telegram + matrix_mautrix_telegram_lottieconverter
- synapse: matrix_synapse + matrix_synapse_customized + matrix_synapse_rust_synapse_compress_state
For each: renamed _docker_image* to _container_image* (and _docker_src*,
_docker_repo* where applicable), added deprecation entries in
validate_config.yml, updated group_vars references, and moved
deprecation tasks to the front of validate_config.yml.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add matrix_continuwuity_version with container_image_tag inheriting from it.
Rename all _docker_image* variables to _container_image* with deprecation notices.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add change-tracking and restart_necessary computation for:
- matrix-authentication-service (custom role in this repo)
- container-socket-proxy, traefik-certs-dumper, postgres, exim-relay,
cinny, livekit-server (external roles, bumped in requirements.yml)
Wire all 7 services in group_vars to use their _restart_necessary variable
instead of hardcoded true.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Track config/image/systemd changes via register: directives and compute
a _restart_necessary variable for each service role, allowing the
systemd_service_manager to skip unnecessary restarts during install-* runs.
Covers 22 service roles: alertmanager-receiver, appservice-draupnir-for-all,
bridge-mautrix-wsproxy (+ syncproxy), cactus-comments, cactus-comments-client,
corporal, element-admin, ldap-registration-proxy, livekit-jwt-service, matrixto,
pantalaimon, prometheus-nginxlog-exporter, rageshake, registration, static-files,
sygnal, synapse-admin, synapse-auto-compressor, synapse-reverse-proxy-companion,
synapse-usage-exporter, and user-verification-service.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For each of the 34 roles (3 clients, 9 bots, 22 bridges), this commit:
- Adds `_restart_necessary: false` default variable
- Adds `register:` directives to config/image/systemd tasks
- Computes `_restart_necessary` via set_fact (OR of all .changed results)
- Wires `(_restart_necessary | bool)` in group_vars/matrix_servers
This allows the systemd service manager to skip unnecessary restarts
when running install-* tags and nothing actually changed.
Service roles and complex multi-service roles will follow separately.
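The `_restart_necessary` computation amounts to a logical OR over the registered results. A minimal Python sketch of that idea, assuming each registered result behaves like a dict with an optional `changed` key (the function name is hypothetical; the roles do this via set_fact):

```python
def restart_necessary(registered_results):
    """True if any registered task result reports a change."""
    # A result without a "changed" key is treated as unchanged here
    # (an assumption made for this sketch).
    return any(bool(result.get("changed", False)) for result in registered_results)
```

When every config, image, and systemd task reports no change, the result is false and the service manager can skip the restart.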
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Override devture_systemd_service_manager_conditional_restart_enabled in
group_vars based on ansible_run_tags: disabled when setup-* tags are used,
enabled otherwise. This replaces the --extra-vars hack in the justfile and
ensures consistent behavior for both `just` and raw `ansible-playbook` users.
- Revert justfile setup-all to its original form (no --extra-vars needed).
- Update docs/just.md to reflect tag-agnostic behavior.
- Add CHANGELOG.md entry documenting the conditional restart feature.
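The tag-based override can be illustrated with the following Python sketch. It assumes a simple prefix match on `setup-`; the actual group_vars expression may be written differently, and the function name is hypothetical.

```python
def conditional_restart_enabled(run_tags):
    """Disable conditional restarts whenever any setup-* tag is requested."""
    # Assumption: a plain "setup-" prefix check on each entry of ansible_run_tags.
    return not any(tag.startswith("setup-") for tag in run_tags)
```

Under this sketch, a run with `setup-all` restarts everything unconditionally, while `install-all` only restarts services whose `_restart_necessary` value is true.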
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Traefik's service list entry now uses the `traefik_restart_necessary`
variable (computed by the Traefik role) instead of hardcoded `true`,
so it is only restarted when its config, systemd unit, or image changed.
- `just setup-all` now passes
`devture_systemd_service_manager_conditional_restart_enabled=false`
to force unconditional restarts, matching its "full setup" semantics.
- Document the conditional restart behavior in docs/just.md.
Some benchmarks follow for `just install-service traefik -l matrix.example.com`
when Traefik settings did not change and a restart is not really necessary:
- Before:
  - total time: 56 seconds 🐌
  - Traefik restarted: yes ❌
  - Services that depend on Traefik restarted: yes; all of them restarted ❌
- After:
  - total time: 27 seconds ⚡
  - Traefik restarted: no ✅
  - Services that depend on Traefik restarted: no; none restarted ✅
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After Synapse's systemd health check passes, Traefik still needs time
(providers.providersThrottleDuration) to register routes. Derive the
post-start delay from this setting (+1s for healthcheck polling gap)
instead of using a hardcoded value. Defaults to 0 when no Traefik
reverse proxy is used.
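A rough sketch of the derivation, assuming the throttle duration is already available as whole seconds (the function and parameter names are hypothetical; the real value is computed in the playbook's variables):

```python
def post_start_delay_seconds(traefik_enabled, throttle_duration_seconds):
    """Delay to wait after the health check passes before treating routes as ready."""
    if not traefik_enabled:
        # No Traefik reverse proxy in use: nothing to wait for.
        return 0
    # +1s covers the healthcheck polling gap mentioned above.
    return throttle_duration_seconds + 1
```

For example, a throttle duration of 2 seconds gives a 3 second post-start delay.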
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When both services restart simultaneously (e.g. in all-at-once mode),
Traefik may momentarily truncate or reinitialize acme.json, causing
the certs dumper to read an empty file and panic. By adding
Requires/After on the Traefik service, the certs dumper only starts
after Traefik is fully ready and acme.json is stable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addons typically access the homeserver via Traefik, but requests
ultimately reach the homeserver, so it needs to be up; otherwise Traefik
serves a "404 Not Found" error.
This is an attempt (one of many pieces) to make services more reliable,
especially when `devture_systemd_service_manager_service_restart_mode: all-at-once` is used
(which is the default).
- `install-service` no longer forces `one-by-one` restart mode
- the coturn priority condition is flipped: only `one-by-one` mode
needs the delayed priority (1500); all other modes (including
the new `all-at-once` default) use the normal priority (900)
Ref:
- d42cd92045
- f3e658cca3/docs/restart-mode-comparison.md
- 36445fb419
- 750cb7e29e