fix(api): bound Celery worker concurrency to a configurable default #11075
Open
b-abderrahmane wants to merge 1 commit into prowler-cloud:master from
Conversation
Without an explicit `worker_concurrency` setting, Celery falls back to
`os.cpu_count()` of the host. On Kubernetes nodes with many CPUs (16+)
the worker container spawns one prefork child per CPU, each loading the
full SDK working set, and is routinely OOMKilled under the typical 4 GiB
limit even when idle.
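The fallback described above can be sketched as follows. This is a minimal stand-in illustrating Celery's documented default (one prefork child per CPU when `worker_concurrency` is unset); `effective_concurrency` is a hypothetical helper, not part of Celery's API:

```python
import os

def effective_concurrency(worker_concurrency=None):
    # Hypothetical helper: when worker_concurrency is unset, Celery's
    # prefork pool falls back to the host's CPU count.
    if worker_concurrency:
        return worker_concurrency
    return os.cpu_count() or 1

# On a 16+ CPU Kubernetes node, an unset value means 16+ prefork
# children, each loading the full SDK working set.
print(effective_concurrency())   # host-dependent: os.cpu_count()
print(effective_concurrency(4))  # bounded: 4
```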
Add `CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)`
in `config/settings/celery.py`. The Celery app already loads Django
settings via `config_from_object("django.conf:settings", namespace="CELERY")`,
so the setting flows through to `app.conf.worker_concurrency` and the
prefork pool size with no entrypoint changes. The default of 4 matches the
per-process SDK memory footprint in typical deployments; operators size
up via `DJANGO_CELERY_WORKER_CONCURRENCY`.
Closes prowler-cloud#10968
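The parsing behavior of the new setting can be sketched with a small stand-in for environs' `env.int()` (the `env_int` helper below is hypothetical; the real code uses the project's existing `env` object):

```python
import os

def env_int(name, default):
    # Stand-in for environs' env.int(): integer env var with a default.
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

# Mirrors the added setting (names taken from the PR):
CELERY_WORKER_CONCURRENCY = env_int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)
print(CELERY_WORKER_CONCURRENCY)  # 4 when the env var is unset

# Operators size up by exporting the env var:
os.environ["DJANGO_CELERY_WORKER_CONCURRENCY"] = "8"
print(env_int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4))  # 8
```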
Context
Fix #10968.
Without an explicit `worker_concurrency` setting, Celery falls back to `os.cpu_count()` of the host. On Kubernetes nodes with many CPUs (16+) the worker container spawns one prefork child per CPU, each loading the full SDK working set, and is routinely OOMKilled under typical 4 GiB pod limits even when idle. The existing `--max-tasks-per-child 1` recycles each child after one task — it bounds per-task leaks but not the number of concurrent slots.

Description
Adds `CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)` in `config/settings/celery.py`. The Celery app already loads Django settings via `config_from_object("django.conf:settings", namespace="CELERY")`, so the setting flows through to `app.conf.worker_concurrency` and the prefork pool size with no entrypoint changes.

A default of `4` matches the per-process SDK memory footprint observed in typical deployments; operators size up via `DJANGO_CELERY_WORKER_CONCURRENCY` (also added to `api/.env.example` next to the sibling `DJANGO_CELERY_DEADLOCK_ATTEMPTS`).

Changes:
- `api/src/backend/config/settings/celery.py` — one Django setting, matching the existing `env.int("DJANGO_*", default=...)` convention used by `DJANGO_CELERY_DEADLOCK_ATTEMPTS`.
- `api/.env.example` — document the new env var alongside its siblings.
- `api/CHANGELOG.md` — entry under `[1.27.0] (Prowler UNRELEASED)` → `🐞 Fixed`.

Steps to review
- `CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5)` (existing)
- `CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)` (this PR)

Evidence
Validated end-to-end on a 20-core AKS node with the worker image rebuilt from this branch (`prowler-api`, dockerized from `api/`). Worker Deployment patched to use the test image; same pod spec (`requests.memory=1Gi`, `limits.memory=4Gi`).

Before vs after
| | Before | After (default `4`) |
| --- | --- | --- |
| `settings.CELERY_WORKER_CONCURRENCY` | unset | `4` |
| `app.conf.worker_concurrency` | `os.cpu_count()` = 20 | `4` |
| ForkPool children | 20 | 4 |
| Worker RSS (idle) | ~3.4 GiB | ~880 MiB |
| Restarts (OOMKilled) | 4 and 8 | 0 |

Process tree (after) — 1 parent + 4 ForkPool children:
Wiring proof — Django setting → Celery config:
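The captured shell output is not reproduced in this extract. The mechanism it demonstrates, `config_from_object(..., namespace="CELERY")` picking up prefixed Django settings, can be sketched in plain Python (an illustration of Celery's documented namespace behavior, not its implementation; `FakeDjangoSettings` and `namespaced_conf` are hypothetical):

```python
class FakeDjangoSettings:
    # Only CELERY_-prefixed names are picked up under namespace="CELERY".
    CELERY_WORKER_CONCURRENCY = 4
    CELERY_DEADLOCK_ATTEMPTS = 5
    DEBUG = True  # ignored: no CELERY_ prefix

def namespaced_conf(settings_obj, namespace="CELERY"):
    # Illustration of the namespace mechanism Celery documents for
    # config_from_object: strip the prefix, lowercase the remainder.
    prefix = namespace + "_"
    return {
        name[len(prefix):].lower(): getattr(settings_obj, name)
        for name in dir(settings_obj)
        if name.startswith(prefix)
    }

conf = namespaced_conf(FakeDjangoSettings)
print(conf["worker_concurrency"])  # 4, what app.conf.worker_concurrency sees
```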
Override demonstration — `DJANGO_CELERY_WORKER_CONCURRENCY=2` propagates through: process count goes to 1 parent + 2 children (with brief overlaps from `--max-tasks-per-child 1` recycling, expected).

Checklist
- Setting parsed with `env.int()` (the same pattern already exercised for `DJANGO_CELERY_DEADLOCK_ATTEMPTS`, with no dedicated test in `api/src/backend/api/tests/test_celery_settings.py`); validated in a real cluster as documented above.
- CHANGELOG entry under `[1.27.0]` → `🐞 Fixed`.

API
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.