
fix(api): bound Celery worker concurrency to a configurable default#11075

Open
b-abderrahmane wants to merge 1 commit into prowler-cloud:master from b-abderrahmane:fix/celery-worker-concurrency-default

Conversation

@b-abderrahmane

Context

Fix #10968.

Without an explicit worker_concurrency setting, Celery falls back to os.cpu_count() of the host. On Kubernetes nodes with many CPUs (16+) the worker container spawns one prefork child per CPU, each loading the full SDK working set, and routinely OOMKills under typical 4 GiB pod limits even when idle. The existing --max-tasks-per-child 1 recycles each child after one task — it bounds per-task leaks but not the number of concurrent slots.
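
For reference, the fallback is easy to observe with a bare Celery app (a minimal sketch, assuming only celery is installed; the app name is arbitrary):

import os

from celery import Celery

app = Celery("probe")  # hypothetical app name, only used to read the default conf
print(app.conf.worker_concurrency)  # unset -> stays at the default (None/0 depending on Celery version)
print(os.cpu_count())               # ...which the worker falls back to at startup, e.g. 20 here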

Description

Adds CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4) in config/settings/celery.py. The Celery app already loads Django settings via config_from_object("django.conf:settings", namespace="CELERY"), so the setting flows through to app.conf.worker_concurrency and the prefork pool size with no entrypoint changes.
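
Concretely, the change plus the pre-existing wiring it relies on (a sketch; in the real module, env is the django-environ instance already defined at the top of the settings file):

import environ

env = environ.Env()  # already exists in config/settings/celery.py

CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5)    # existing sibling
CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)  # this PR

# Existing wiring in config/celery.py: every CELERY_*-prefixed Django setting
# maps to the lower-cased Celery option of the same name, so the value above
# lands in celery_app.conf.worker_concurrency:
#
#   celery_app.config_from_object("django.conf:settings", namespace="CELERY")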

A default of 4 matches the per-process SDK memory footprint observed in typical deployments; operators size up via DJANGO_CELERY_WORKER_CONCURRENCY (also added to api/.env.example next to the sibling DJANGO_CELERY_DEADLOCK_ATTEMPTS).
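
The documented lines in api/.env.example look like this (a sketch; exact placement and surrounding comments may differ):

# Celery worker tuning
DJANGO_CELERY_DEADLOCK_ATTEMPTS=5
DJANGO_CELERY_WORKER_CONCURRENCY=4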

Changes:

  • api/src/backend/config/settings/celery.py — one Django setting, matching the existing env.int("DJANGO_*", default=...) convention used by DJANGO_CELERY_DEADLOCK_ATTEMPTS.
  • api/.env.example — document the new env var alongside its siblings.
  • api/CHANGELOG.md — entry under the 🐞 Fixed heading of [1.27.0] (Prowler UNRELEASED).

Steps to review

  1. Read the 1-line settings change and the changelog/env-example additions — three files, +7 lines.
  2. Confirm the setting name/namespace is consistent with existing siblings:
    • CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5) (existing)
    • CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4) (this PR)
  3. Optionally reproduce the validation below in any K8s deployment.

Evidence

Validated end-to-end on a 20-core AKS node with the worker image rebuilt from this branch (prowler-api, dockerized from api/). Worker Deployment patched to use the test image; same pod spec (requests.memory=1Gi, limits.memory=4Gi).

Before vs after

Metric                                       Before (master, no setting)   After (this PR, default 4)
Django settings.CELERY_WORKER_CONCURRENCY    (unset → falls through)       4
app.conf.worker_concurrency                  os.cpu_count() = 20           4
Prefork children per pod                     20                            4
Idle resident memory                         ~3.4 GiB                      ~880 MiB
Restart count (over 26h soak)                4 and 8 (OOMKilled)           0
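
(The memory rows are mutually consistent: ~3.4 GiB across 1 parent + 20 children and ~880 MiB across 1 parent + 4 children both imply a resident footprint on the order of 170 MiB per process, matching the per-process SDK working set described above.)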

Process tree (after) — 1 parent + 4 ForkPool children:

$ kubectl exec prowler-worker-7c56d9dd6d-4lcdt -- sh -c \
    'for d in /proc/[0-9]*; do echo "pid=${d#/proc/}  $(tr "\0" " " < "$d/cmdline")"; done' | grep celery
pid=10  python -m celery -A config.celery worker -n c5...
pid=38  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-1
pid=39  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-2
pid=40  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-3
pid=41  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-4

Wiring proof — Django setting → Celery config:

$ kubectl exec prowler-worker-... -- python manage.py shell -c "
from django.conf import settings
from config.celery import celery_app
print(settings.CELERY_WORKER_CONCURRENCY)
print(celery_app.conf.worker_concurrency)
"
4
4

Override demonstration: DJANGO_CELERY_WORKER_CONCURRENCY=2 propagates through:

$ kubectl set env deploy/prowler-worker DJANGO_CELERY_WORKER_CONCURRENCY=2
$ kubectl exec <new-pod> -- printenv DJANGO_CELERY_WORKER_CONCURRENCY
2
$ kubectl exec <new-pod> -- python manage.py shell -c "..."
settings.CELERY_WORKER_CONCURRENCY=2
celery_app.conf.worker_concurrency=2
$ kubectl top pod <new-pod>
NAME                              CPU(cores)   MEMORY(bytes)
prowler-worker-5cb6bb6bc4-2b9q7   521m         564Mi          # was 3.4 GiB

Process count drops to 1 parent + 2 children (brief overlaps from --max-tasks-per-child 1 recycling are expected).

Checklist

  • Review if the code is being covered by tests.
    • The change is a single Django settings line wrapping env.int() (the same pattern already used for DJANGO_CELERY_DEADLOCK_ATTEMPTS, which likewise has no dedicated test in api/src/backend/api/tests/test_celery_settings.py). It was instead validated in a real cluster as documented above; a minimal wiring test could look like the sketch after this checklist.
  • Review if the code is documented following pyguide §3.8.
  • Review if backport is needed.
  • Review if the Readme.md needs to be changed — N/A.
  • Ensure new entries are added to CHANGELOG.md — added under [1.27.0]🐞 Fixed.
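
If a dedicated unit test is wanted later, it could look like this (a hypothetical sketch, not part of this PR; assumes a configured Django test environment, e.g. via pytest-django, with the env var unset):

from django.conf import settings

from config.celery import celery_app


def test_worker_concurrency_flows_into_celery_conf():
    # Default when DJANGO_CELERY_WORKER_CONCURRENCY is not set in the environment.
    assert settings.CELERY_WORKER_CONCURRENCY == 4
    # config_from_object(..., namespace="CELERY") must carry it into Celery.
    assert celery_app.conf.worker_concurrency == settings.CELERY_WORKER_CONCURRENCY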

API

  • All issue/task requirements work as expected on the API — see Evidence above.
  • Endpoint response output (if applicable) — N/A (worker-side change, no endpoints touched).
  • EXPLAIN ANALYZE output for new/modified queries or indexes (if applicable) — N/A.
  • Performance test results (if applicable) — included above (memory + process-count).
  • Any other relevant evidence of the implementation (if applicable) — included above.
  • Verify if API specs need to be regenerated — N/A.
  • Check if version updates are required — N/A (no dependency or spec changes).
  • Ensure new entries are added to CHANGELOG.md.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Without an explicit `worker_concurrency` setting, Celery falls back to
`os.cpu_count()` of the host. On Kubernetes nodes with many CPUs (16+)
the worker container spawns one prefork child per CPU, each loading the
full SDK working set, and routinely OOMKills under the typical 4 GiB
limit even when idle.

Add `CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)`
in `config/settings/celery.py`. The Celery app already loads Django
settings via `config_from_object("django.conf:settings", namespace="CELERY")`,
so the setting flows through to `app.conf.worker_concurrency` and the
prefork pool size with no entrypoint changes. Default of 4 matches the
per-process SDK memory footprint in typical deployments; operators size
up via `DJANGO_CELERY_WORKER_CONCURRENCY`.

Closes prowler-cloud#10968
@b-abderrahmane b-abderrahmane requested a review from a team as a code owner May 7, 2026 19:41
@github-actions github-actions Bot added component/api community Opened by the Community labels May 7, 2026
@github-actions

github-actions Bot commented May 7, 2026

Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.



Development

Successfully merging this pull request may close these issues.

Celery workers OOM on hosts with many CPUs — --concurrency is unbounded by default
