Use RTLD_NODELETE when loading the render engine plugin (backport #1280) #1288

Merged
iche033 merged 1 commit into gz-rendering8 from mergify/bp/gz-rendering8/pr-1280
Apr 30, 2026

@mergify mergify Bot commented Apr 29, 2026

🦟 Bug fix

Fixes #1265

Summary

Repeated unloadEngine / engine cycles dlopen and dlclose the engine plugin. A transitive dependency in the plugin's chain (libgomp on Ubuntu Noble + rotary nightly packages, via libgz-common-graphics → libassimp and friends) uses thread-local storage. Without RTLD_NODELETE, each reload allocates from glibc's static-TLS surplus, which is not reliably reclaimed on dlclose. After ~10 cycles the surplus is exhausted and the next dlopen fails with:

Error while loading the library [/usr/local/lib/libgz-rendering-ogre2.so]: /lib/x86_64-linux-gnu/libgomp.so.1: cannot allocate memory in static TLS block

This regression appeared on main after #1246 switched CI to the rotary alias packages, which (via the gz-common rebuild that swapped FreeImage for vendored STB in gz-common#803) no longer transitively pull libfreeimage → libraw → libgomp into the test binary's startup DT_NEEDED. The static-TLS exhaustion was latent before that change: libfreeimage's dependency chain was anchoring libgomp in the main binary's TLS region as a side effect, so libgomp was never loaded through dlopen.

The fix passes _noDelete=true to gz::plugin::Loader::LoadLib, which gates the RTLD_NODELETE flag that's already supported by gz-plugin's loader. With this flag, dlclose keeps the library mapped, finalizers don't run, and TLS slots aren't released. TLS is allocated once on first load and reused on every subsequent reload, eliminating the surplus leak regardless of which library in the plugin's chain is the TLS hog.
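
The accounting described above can be sketched with a toy model. This is not glibc's allocator: `StaticTlsModel`, `SuccessfulCycles`, and every byte count are hypothetical names and placeholder numbers, chosen only to show why a per-load allocation that is never reclaimed eventually fails, while a load-once allocation does not.

```cpp
#include <cassert>
#include <cstddef>

// Toy accounting model of a static-TLS surplus. NOT glibc's implementation;
// all names and sizes here are hypothetical.
struct StaticTlsModel
{
  std::size_t surplusBytes;  // spare static-TLS space still available
  bool resident;             // true once a no-delete load has succeeded

  // Returns true if the load succeeds in this model.
  bool Load(std::size_t tlsBytes, bool noDelete)
  {
    if (resident)
      return true;  // library never left memory, so its TLS block is reused
    if (tlsBytes > surplusBytes)
      return false;  // models "cannot allocate memory in static TLS block"
    surplusBytes -= tlsBytes;
    resident = noDelete;
    return true;
  }

  // Models a dlclose after which the TLS space is not reclaimed.
  void Unload() {}
};

// Counts how many consecutive load/unload cycles succeed, up to maxCycles.
int SuccessfulCycles(std::size_t surplus, std::size_t tlsBytes,
                     bool noDelete, int maxCycles)
{
  assert(maxCycles >= 0);
  StaticTlsModel model{surplus, false};
  int cycles = 0;
  while (cycles < maxCycles && model.Load(tlsBytes, noDelete))
  {
    model.Unload();
    ++cycles;
  }
  return cycles;
}
```

With a placeholder surplus of 1000 bytes and 100 bytes per load, `SuccessfulCycles(1000, 100, false, 50)` stops after 10 cycles, whereas `SuccessfulCycles(1000, 100, true, 50)` completes all 50.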

Reproduction

REGRESSION_reload_engine_ogre2_gl3plus reproduces the failure 100% of the time inside a fresh ubuntu:noble docker container with gzdev repository enable --project=rotary (matching what gazebo-tooling/action-gz-ci@noble does). Before this PR: 5 cases fail with the static-TLS error. After this PR: 8/8 cases pass.

Trade-off

The engine plugin and its transitive dependencies remain mapped for the lifetime of the process. For a rendering engine this is the expected lifetime anyway. Ogre::Root is created and destroyed by Ogre2RenderEngine at the C++ level, separately from library load/unload, so calling unloadEngine followed by engine() still produces a fresh Ogre::Root. What changes:

  • Static constructors in the plugin (and its deps) run once per process, not once per load.
  • Static destructors run only at process exit.
  • Mapped memory is never returned: once mapped, the plugin is never unmapped (~50 MB, one-time).
  • Hot-reload from disk (recompile + reload without restart) would no longer work for engine plugins.
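
The "remains mapped" behaviour can be observed directly with plain dlopen calls. The sketch below does not use gz-plugin's loader, and `StillMappedAfterClose` is a hypothetical helper name. It assumes a glibc-style `<dlfcn.h>` that provides RTLD_NODELETE and RTLD_NOLOAD.

```cpp
#include <cassert>
#include <dlfcn.h>

// Loads `path`, closes it again, then reports whether it is still mapped.
// `extraFlags` is OR-ed into the dlopen mode (for example RTLD_NODELETE).
bool StillMappedAfterClose(const char *path, int extraFlags)
{
  assert(path != nullptr);
  void *handle = dlopen(path, RTLD_LAZY | extraFlags);
  if (handle == nullptr)
    return false;  // load failed; dlerror() describes why
  dlclose(handle);

  // RTLD_NOLOAD never loads anything: it returns a handle only if the
  // library is already resident in the process.
  void *probe = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
  if (probe == nullptr)
    return false;
  dlclose(probe);  // drop the reference taken by the probe
  return true;
}
```

Calling `StillMappedAfterClose("libm.so.6", RTLD_NODELETE)` should report the library as resident. With `extraFlags` set to 0 the result depends on whether anything else in the process still references the library. On older glibc versions the program may need to be linked with `-ldl`.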

Alternatives considered

  • Linking -Wl,--no-as-needed -lgomp into the test executable (initial attempt): works, but it only patches the test, doesn't fix the underlying reload bug for downstream consumers, is GCC-specific, and only handles libgomp.
  • GLIBC_TUNABLES=glibc.rtld.optional_static_tls=N: runtime-only, delays exhaustion rather than fixing it, and requires CI-environment plumbing.
  • Trimming the plugin's transitive NEEDED chain: the chain is largely load-bearing (the plugin uses ~80 symbols from libgz-common-graphics); not a productive direction.

RTLD_NODELETE is the only option that fixes the bug for every gz::rendering::engine() consumer and not just this one test.

Checklist

  • Signed all commits for DCO
  • Added a screen capture or video to the PR description that demonstrates the fix (as needed)
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage): verified end-to-end on Ubuntu Noble x86_64 in a fresh ubuntu:noble docker container with gzdev rotary packages; the previously failing REGRESSION_reload_engine_ogre2_gl3plus now passes (5.18 s, 8/8 cases).
  • Updated Bazel files (if adding new files). Created an issue otherwise.
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers
  • Was GenAI used to generate this PR? If so, make sure to add "Generated-by" to your commits. (See this policy for more info.)

Generated-by: Claude Code

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by and Generated-by messages.


This is an automatic backport of pull request #1280 done by Mergify.

Repeated unloadEngine/engine cycles dlopen and dlclose the engine
plugin.  Some transitive dependency (libgomp on Ubuntu Noble + rotary
nightly packages) uses thread-local storage; without RTLD_NODELETE,
each reload allocates from glibc's static-TLS surplus, which is not
reliably reclaimed on dlclose.  After ~10 cycles the surplus is
exhausted and the next dlopen fails with
  "cannot allocate memory in static TLS block"

Pass _noDelete=true to gz-plugin's Loader::LoadLib so dlclose keeps
the library mapped.  TLS is allocated once on first load and reused
on every subsequent reload, eliminating the leak.

Trade-off: the plugin and its transitive deps remain resident for
the lifetime of the process.  For a rendering engine this is the
expected lifetime anyway.

Fixes #1265

Generated-by: Claude Opus 4.7

Signed-off-by: Taylor Howard <taylorhoward@me.com>
(cherry picked from commit fb9dd4c)
@github-project-automation github-project-automation Bot moved this from Inbox to In review in Core development Apr 30, 2026
@iche033 iche033 merged commit ff0c5b8 into gz-rendering8 Apr 30, 2026
10 checks passed
@github-project-automation github-project-automation Bot moved this from In review to Done in Core development Apr 30, 2026
@iche033 iche033 deleted the mergify/bp/gz-rendering8/pr-1280 branch April 30, 2026 23:19

Labels

🎵 harmonic Gazebo Harmonic
