Only queue matrix-goofys.service for restart when Synapse is enabled. Goofys is installed from the Synapse role, so non-Synapse homeserver configurations should not try to restart this unit. This mirrors the fix for issue https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/4959.
Only queue matrix-synapse-s3-storage-provider-migrate.timer for restart when Synapse is actually enabled. This prevents setup/install failures when a Synapse-only extension flag is set while using another homeserver implementation, as reported in https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/4959.
The startup issue came from a timing dependency around coturn TLS certs:
- `matrix-coturn.service` depends on
  `matrix-traefik-certs-dumper-wait-for-domain@<matrix-fqdn>.service`
- That waiter succeeds only after Traefik has obtained and dumped a cert for
  the Matrix hostname (typically driven by homeserver labels/routes becoming
  active)
- If coturn is started too early, it can block/fail waiting for cert files
  that are not yet present
Historically, coturn priority was mode-dependent:
- `one-by-one`: coturn at 1500 (delayed after homeserver)
- other modes: coturn at 900 (before homeserver)
In non-`one-by-one` modes, this started coturn before the homeserver, so
coturn could end up blocking or failing on certificates that had not been
dumped yet, especially during initial bootstrap/restart flows where cert
availability lags service startup.
This change makes ordering explicit and consistent:
1. Introduce `matrix_homeserver_systemd_service_manager_priority` (default 1000)
   in `roles/custom/matrix-base/defaults/main.yml`.
2. Use that variable for the homeserver service entry in
   `group_vars/matrix_servers`.
3. Set coturn priority relative to homeserver priority in all modes:
   `matrix_homeserver_systemd_service_manager_priority + 500`.
4. Update inline documentation comments in `group_vars/matrix_servers` to
   match the new behavior and rationale.
Result:
- Homeserver/coturn ordering is deterministic and mode-agnostic.
- Coturn is intentionally started later than the homeserver by default,
  reducing first-start certificate wait/fail races.
- Priority intent is now centralized and configurable via a dedicated
  homeserver priority variable.
- Coturn may still be started earlier, because the homeserver typically
  has a `Wants` "dependency" on it, but that's alright.
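The resulting ordering can be illustrated with a minimal Python sketch. It assumes that services are started in ascending priority order (lower number first), as implied by the priorities above. The `startup_order` helper and the two-entry service list are hypothetical and only model the idea; the playbook itself expresses this in `group_vars/matrix_servers`.

```python
# Hypothetical model of priority-based ordering; not the playbook's actual code.

# Default of matrix_homeserver_systemd_service_manager_priority.
HOMESERVER_PRIORITY = 1000

# Coturn is ordered relative to the homeserver, in all restart modes.
COTURN_PRIORITY = HOMESERVER_PRIORITY + 500


def startup_order(services):
    """Return service names sorted by ascending priority (lower starts first)."""
    return [name for name, _priority in sorted(services, key=lambda s: s[1])]


# Example entries only; the real service list is much longer.
services = [
    ("matrix-coturn.service", COTURN_PRIORITY),
    ("matrix-synapse.service", HOMESERVER_PRIORITY),
]

order = startup_order(services)
print(order)
```

Overriding the homeserver priority shifts coturn along with it, because coturn's value is derived rather than hardcoded.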
These three roles have multiple variable prefixes each:
- kakaotalk: matrix_appservice_kakaotalk + matrix_appservice_kakaotalk_node
- telegram: matrix_mautrix_telegram + matrix_mautrix_telegram_lottieconverter
- synapse: matrix_synapse + matrix_synapse_customized + matrix_synapse_rust_synapse_compress_state
For each: renamed _docker_image* to _container_image* (and _docker_src*,
_docker_repo* where applicable), added deprecation entries in
validate_config.yml, updated group_vars references, and moved
deprecation tasks to the front of validate_config.yml.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add matrix_continuwuity_version with container_image_tag inheriting from it.
Rename all _docker_image* variables to _container_image* with deprecation notices.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add change-tracking and restart_necessary computation for:
- matrix-authentication-service (custom role in this repo)
- container-socket-proxy, traefik-certs-dumper, postgres, exim-relay,
cinny, livekit-server (external roles, bumped in requirements.yml)
Wire all 7 services in group_vars to use their _restart_necessary variable
instead of hardcoded true.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Track config/image/systemd changes via register: directives and compute
a _restart_necessary variable for each service role, allowing the
systemd_service_manager to skip unnecessary restarts during install-* runs.
Covers 22 service roles: alertmanager-receiver, appservice-draupnir-for-all,
bridge-mautrix-wsproxy (+ syncproxy), cactus-comments, cactus-comments-client,
corporal, element-admin, ldap-registration-proxy, livekit-jwt-service, matrixto,
pantalaimon, prometheus-nginxlog-exporter, rageshake, registration, static-files,
sygnal, synapse-admin, synapse-auto-compressor, synapse-reverse-proxy-companion,
synapse-usage-exporter, and user-verification-service.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For each of the 34 roles (3 clients, 9 bots, 22 bridges), this commit:
- Adds `_restart_necessary: false` default variable
- Adds `register:` directives to config/image/systemd tasks
- Computes `_restart_necessary` via set_fact (OR of all .changed results)
- Wires `(_restart_necessary | bool)` in group_vars/matrix_servers
This allows the systemd service manager to skip unnecessary restarts
when running install-* tags and nothing actually changed.
Service roles and complex multi-service roles will follow separately.
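As a rough illustration of the computation described above, here is a minimal Python sketch of "OR of all .changed results". The dictionaries stand in for task results captured via `register:`; their names and the `restart_necessary` helper are hypothetical, and the roles themselves do this with `set_fact` rather than Python.

```python
# Hypothetical stand-in for the set_fact logic; names are illustrative only.

def restart_necessary(*results):
    """Return True if any registered task result reports a change.

    A result without a "changed" key (e.g. a skipped task) counts as unchanged.
    """
    return any(result.get("changed", False) for result in results)


# Example results, shaped like what `register:` would capture.
config_result = {"changed": False}
image_result = {"changed": True}
systemd_result = {"changed": False}

needs_restart = restart_necessary(config_result, image_result, systemd_result)
print(needs_restart)
```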
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Override devture_systemd_service_manager_conditional_restart_enabled in
group_vars based on ansible_run_tags: disabled when setup-* tags are used,
enabled otherwise. This replaces the --extra-vars hack in the justfile and
ensures consistent behavior for both `just` and raw `ansible-playbook` users.
- Revert justfile setup-all to its original form (no --extra-vars needed).
- Update docs/just.md to reflect tag-agnostic behavior.
- Add CHANGELOG.md entry documenting the conditional restart feature.
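The tag-based override from the first point can be sketched in Python as follows. This is only a model of the described behavior (disabled when any `setup-*` tag is used, enabled otherwise); the `conditional_restart_enabled` function is hypothetical, and the exact expression used in group_vars is not reproduced here.

```python
# Hypothetical model of the ansible_run_tags-based override.

def conditional_restart_enabled(run_tags):
    """Disable conditional restarts when any setup-* tag was requested."""
    return not any(tag.startswith("setup-") for tag in run_tags)


# setup-* runs restart everything unconditionally.
print(conditional_restart_enabled(["setup-all", "start"]))

# install-* runs may skip restarts for unchanged services.
print(conditional_restart_enabled(["install-all", "start"]))
```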
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Traefik's service list entry now uses the `traefik_restart_necessary`
variable (computed by the Traefik role) instead of hardcoded `true`,
so it is only restarted when its config, systemd unit, or image changed.
- `just setup-all` now passes
`devture_systemd_service_manager_conditional_restart_enabled=false`
to force unconditional restarts, matching its "full setup" semantics.
- Document the conditional restart behavior in docs/just.md.
Some benchmarks follow for `just install-service traefik -l matrix.example.com`
when Traefik settings did not change and a restart is not really necessary:
- Before:
- total time: 56 seconds 🐌
- Traefik restarted: yes ❌
- Services that depend on Traefik restarted: yes; all of them restarted ❌
- After:
- total time: 27 seconds ⚡
- Traefik restarted: no ✅
- Services that depend on Traefik restarted: no; none restarted ✅
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After Synapse's systemd health check passes, Traefik may still take up to
providers.providersThrottleDuration to register the routes. Derive the
post-start delay from this setting (+1s for the healthcheck polling gap)
instead of using a hardcoded value. The delay defaults to 0 when no Traefik
reverse proxy is used.
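The derivation amounts to simple arithmetic, sketched below in Python. The function name and its parameters are hypothetical, and the throttle duration is assumed to be already available as a number of seconds; the value 2 is just an example input.

```python
# Hypothetical model of the post-start delay derivation.

def post_start_delay_seconds(traefik_enabled, throttle_duration_seconds):
    """Throttle duration plus 1s for the healthcheck polling gap, or 0 without Traefik."""
    if not traefik_enabled:
        return 0
    return throttle_duration_seconds + 1


# With Traefik and an example throttle duration of 2 seconds.
print(post_start_delay_seconds(True, 2))

# Without a Traefik reverse proxy, no delay is needed.
print(post_start_delay_seconds(False, 2))
```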
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When both services restart simultaneously (e.g. in all-at-once mode),
Traefik may momentarily truncate or reinitialize acme.json, causing
the certs dumper to read an empty file and panic. By adding
Requires/After on the Traefik service, the certs dumper only starts
after Traefik is fully ready and acme.json is stable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addons typically access the homeserver via Traefik, but requests
ultimately lead to the homeserver, so it had better be up. Otherwise,
Traefik serves a "404 Not Found" error.
This is an attempt (one of many pieces) to make services more reliable,
especially when `devture_systemd_service_manager_service_restart_mode: all-at-once` is used
(which is the default).
- `install-service` no longer forces `one-by-one` restart mode
- the coturn priority condition is flipped: only `one-by-one` mode
needs the delayed priority (1500); all other modes (including
the new `all-at-once` default) use the normal priority (900)
Ref:
- d42cd92045
- f3e658cca3/docs/restart-mode-comparison.md
- 36445fb419
- 750cb7e29e