Mirror of https://github.com/spantaleev/matrix-docker-ansible-deploy.git (synced 2026-02-28 09:53:09 +00:00)
Align homeserver/coturn service priorities to avoid first-start cert race
The startup issue came from a timing dependency around coturn TLS certs:

- `matrix-coturn.service` depends on `matrix-traefik-certs-dumper-wait-for-domain@<matrix-fqdn>.service`
- That waiter succeeds only after Traefik has obtained and dumped a cert for the Matrix hostname (typically driven by homeserver labels/routes becoming active)
- If coturn is started too early, it can block/fail waiting for cert files that are not yet present

Historically, coturn priority was mode-dependent:

- `one-by-one`: coturn at 1500 (delayed after homeserver)
- other modes: coturn at 900 (before homeserver)

This could still trigger undesirable startup ordering and confusing behavior in non-`one-by-one` modes, especially during initial bootstrap/restart flows where cert availability lags service startup.

This change makes ordering explicit and consistent:

1. Introduce `matrix_homeserver_systemd_service_manager_priority` (default 1000) in `roles/custom/matrix-base/defaults/main.yml`.
2. Use that variable for the homeserver service entry in `group_vars/matrix_servers`.
3. Set coturn priority relative to homeserver priority in all modes: `matrix_homeserver_systemd_service_manager_priority + 500`.
4. Update inline documentation comments in `group_vars/matrix_servers` to match the new behavior and rationale.

Result:

- Homeserver/coturn ordering is deterministic and mode-agnostic.
- Coturn is intentionally started later than the homeserver by default, reducing first-start certificate wait/fail races.
- Priority intent is now centralized and configurable via a dedicated homeserver priority variable.
- Coturn may still be started earlier, because the homeserver typically has a `Wants` "dependency" on it, but that's alright.
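A minimal sketch of overriding the new variable, assuming the playbook's usual per-host configuration file (`inventory/host_vars/matrix.example.com/vars.yml`, where `matrix.example.com` is a placeholder) and an illustrative value of 1100. Because coturn's priority is now derived as the homeserver priority plus 500, moving the homeserver moves coturn with it:

```yaml
# inventory/host_vars/matrix.example.com/vars.yml
# (matrix.example.com is a placeholder for your Matrix hostname)

# Illustrative value: start the homeserver slightly later than the default (1000).
# Per this change, coturn's priority is derived as this value + 500 (here: 1600),
# so coturn stays behind the homeserver in the start order.
matrix_homeserver_systemd_service_manager_priority: 1100
```

Since nothing else is needed to keep the two in step, there is no separate coturn priority setting to remember.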
--- a/group_vars/matrix_servers
+++ b/group_vars/matrix_servers
@@ -246,15 +246,14 @@ matrix_addons_homeserver_systemd_services_list: |
 # - so that addon services (starting later) can communicte with the homeserver via Traefik's internal entrypoint
 # (see `matrix_playbook_internal_matrix_client_api_traefik_entrypoint_enabled`)
 # - core services (the homeserver) get a level of ~1000
-# - services that the homeserver depends on (database, Redis, ntfy, coturn, etc.) get a lower level — between 500 and 1000
-# - coturn gets a higher priority level (= starts later) if `devture_systemd_service_manager_service_restart_mode == 'one-by-one'` to intentionally delay it, because:
-# - starting services one by one means that the service manager role waits for each service to fully start before proceeding to the next one
+# - services that the homeserver depends on (database, Redis, ntfy, etc.) get a lower level — between 500 and 1000
+# - coturn gets a higher priority level (= starts later) in all cases, to intentionally delay it in relation to the homeserver, because:
+# - when starting services one by one, the service manager waits for each service to fully start before proceeding to the next one
 # - if coturn has a lower priority than the homeserver, it would be started before it
-# - since coturn is started before the homeserver, there's no container label telling Traefik to get a `matrix.example.com` certificate
+# - if coturn is started before the homeserver, there'd be no container label (usually on the homeserver) telling Traefik to get a `matrix.example.com` certificate
 # - thus, coturn would spin and wait for a certificate until it fails. We'd get a playbook failure due to it, but service manager will proceed to start all other services anyway.
 # - only later, when the homeserver actually starts, would that certificate be fetched and dumped
-# - this is not a problem with `all-at-once` (default) or `priority-batched` (services start concurrently),
-# or with `clean-stop-start` (everything stops first, then starts in priority order — coturn at 900 is fine)
+# - this is a problem for `one-by-one`, `clean-stop-start` (which behaves like one-by-one initially) and possibly other modes, except `all-at-once`
 # - reverse-proxying services get level 3000
 # - Matrix utility services (bridges, bots) get a level of 2000/2200, so that:
 # - they can start before the reverse-proxy
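The reworked comments tie the certificate race to the service manager's restart mode. A minimal sketch of selecting the mode that used to be the special case, assuming `devture_systemd_service_manager_service_restart_mode` is overridden in the same per-host `vars.yml` as other playbook settings (placeholder path). After this change, coturn's priority (homeserver priority + 500) is the same whichever mode is chosen:

```yaml
# inventory/host_vars/matrix.example.com/vars.yml (placeholder hostname)

# Start services one at a time, waiting for each to fully start before the next.
# Other modes named in the comments: all-at-once (described there as the default),
# clean-stop-start and priority-batched.
devture_systemd_service_manager_service_restart_mode: one-by-one
```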
@@ -607,7 +606,7 @@ devture_systemd_service_manager_services_list_auto: |
 +
 ([{
 'name': ('matrix-' + matrix_homeserver_implementation + '.service'),
-'priority': 1000,
+'priority': matrix_homeserver_systemd_service_manager_priority,
 'restart_necessary': true,
 'groups': ['matrix', 'homeservers', matrix_homeserver_implementation],
 }] if matrix_homeserver_enabled else [])
@@ -635,7 +634,7 @@ devture_systemd_service_manager_services_list_auto: |
 +
 ([{
 'name': (coturn_identifier + '.service'),
-'priority': (1500 if devture_systemd_service_manager_service_restart_mode == 'one-by-one' else 900),
+'priority': (matrix_homeserver_systemd_service_manager_priority + 500),
 'restart_necessary': (coturn_restart_necessary | bool),
 'groups': ['matrix', 'coturn'],
 }] if coturn_enabled else [])
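To make the new arithmetic concrete, here is roughly what the homeserver and coturn entries of `devture_systemd_service_manager_services_list_auto` evaluate to with defaults. This is an illustration, assuming `matrix_homeserver_implementation: synapse`, the new priority default of 1000, and a `coturn_identifier` of `matrix-coturn` (as the `matrix-coturn.service` unit name in the commit message suggests). Coturn's `restart_necessary` is left out because it depends on `coturn_restart_necessary`:

```yaml
# Approximate evaluated entries with default values (illustration only).
- name: matrix-synapse.service
  priority: 1000          # matrix_homeserver_systemd_service_manager_priority
  restart_necessary: true
  groups: ['matrix', 'homeservers', 'synapse']

- name: matrix-coturn.service
  priority: 1500          # matrix_homeserver_systemd_service_manager_priority + 500
  # restart_necessary: depends on the value of coturn_restart_necessary
  groups: ['matrix', 'coturn']
```

Before this change, the coturn entry evaluated to 900 in every mode except `one-by-one`, where it was 1500.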
--- a/roles/custom/matrix-base/defaults/main.yml
+++ b/roles/custom/matrix-base/defaults/main.yml
@@ -92,6 +92,10 @@ matrix_homeserver_enabled: true
 # Note that the homeserver implementation of a server will not be able to be changed without data loss.
 matrix_homeserver_implementation: synapse
 
+# The priority that the homeserver starts with (lower = starts earlier).
+# Related to the systemd_service_manager role and `devture_systemd_service_manager_services_list*` variables.
+matrix_homeserver_systemd_service_manager_priority: 1000
+
 # This contains a secret, which is used for generating various other secrets later on.
 matrix_homeserver_generic_secret_key: ''
 
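One way to inspect the effective order on a real deployment is a temporary debug task. This is a hypothetical sketch, not something the playbook ships. It assumes it runs in a play against the `matrix_servers` hosts where every variable referenced by `devture_systemd_service_manager_services_list_auto` is defined (for example, added temporarily alongside the playbook's roles, whose defaults supply some of those variables), and that coturn is enabled. It keeps only the two unit names built in the diff and sorts them by `priority`:

```yaml
# Hypothetical, temporary verification task; not part of the playbook.
- name: Show homeserver and coturn units sorted by start priority
  ansible.builtin.debug:
    msg: >-
      {{
        devture_systemd_service_manager_services_list_auto
        | selectattr('name', 'in', [
            'matrix-' + matrix_homeserver_implementation + '.service',
            coturn_identifier + '.service'
          ])
        | sort(attribute='priority')
        | map(attribute='name')
        | list
      }}
```

With the defaults, the homeserver unit should be listed before the coturn unit, in every restart mode.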