Fixes #39277 - Fix TopbarSweeper thread-safety race condition under Puma#10969
Fixes #39277 - Fix TopbarSweeper thread-safety race condition under Puma#10969pablomh wants to merge 1 commit intotheforeman:developfrom
Conversation
f0b2351 to
e64332d
Compare
chris1984
left a comment
There was a problem hiding this comment.
Couple small comments, will let someone else look it over as well. @ShimShtein
TopbarSweeper uses a Singleton with a shared controller attribute across all Puma threads. Under concurrent requests, one thread ensure block nils the controller while another thread is still using it, causing NoMethodError on expire_fragment. Replace the Singleton with Thread.current storage, following the same pattern used by ThreadSession for User.current/Organization.current. Each thread gets its own controller reference - no cross-thread clobbering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e64332d to
1ccd744
Compare
chris1984
left a comment
There was a problem hiding this comment.
Thanks for adding the things I suggested and the test. Will let @ShimShtein give a final look.
|
Overall the PR looks good. If I understand correctly, you have moved the controller cache handling into each thread. |
|
The We initially thought this was only observable under extreme concurrency, but looking more closely at the data, that's not quite right. We found it during concurrent host registration testing on a 16-CPU Satellite. At the default Puma config (24 workers × 5 threads = 120 slots), the race triggered at 760+ concurrent hosts. When we increased to 16 workers × 12 threads = 192 slots, it triggered at just 456 concurrent — more threads per worker widens the race window. What's interesting is why this hasn't been seen before, given the Puma config hasn't changed. We think it's because our other registration performance patches (fact import reduction, Candlepin caching) made Puma significantly more efficient — requests complete faster, so threads cycle through the Regarding memory: comparing Puma RSS across multiple runs with and without this fix, we see a ~0.7 GB Puma RSS increase — but that's the combined effect of all patches in the batch (caching layers, connection pooling, this fix). The Thread.current change itself adds one pointer per thread that's nil between requests, so its contribution is negligible. |
TopbarSweeper uses
include Singletonwith a sharedcontrollerattributeacross all Puma threads within each worker. Under concurrent requests, one
thread ensure block nils the controller while another thread is actively
using it:
The existing
if controller.present?guard has a TOCTOU (time-of-checkto time-of-use) race: the check passes, then another thread nils the
value before
expire_fragmentis called.This bug is hard to reproduce under typical load because the Singleton is
per-process (each Puma worker has its own), so only the threads within a
single worker compete. With the default 5 threads per worker, the race
window is extremely narrow. It becomes observable under high concurrency
(many simultaneous registrations) or with increased threads per worker,
where more threads share the same Singleton and the interleaving becomes
likely. When it does trigger, it produces a single 500 error that is
easily dismissed as transient noise in production logs.
Discovered during concurrent host registration performance testing.
The error appears in production.log at
topbar_sweeper.rb:6.Changes
Replace the Singleton with
Thread.currentstorage, following the samepattern already used by
ThreadSessionforUser.current,Organization.current, andLocation.current. Each Puma thread getsits own controller reference - no cross-thread clobbering.
app/services/topbar_sweeper.rb- Removeinclude Singleton, storecontroller in
Thread.current[:topbar_sweeper_controller], use safenavigation (
&.) instead of TOCTOUif controller.present?guardapp/controllers/concerns/foreman/controller/topbar_sweeper.rb-.instance.controller->.controllerapp/models/concerns/topbar_cache_expiry.rb- sametest/unit/topbar_sweeper_test.rb- NEW: thread isolation and nilsafety tests
How to test
Prerequisites
Foreman environment with Puma in multi-threaded mode (default).
Steps
bundle exec rake test TEST=test/unit/topbar_sweeper_test.rbbundle exec rake test TEST=test/integration/top_bar_test.rbconcurrently and grep for
expire_fragmenterrors in production.log