
Add tool calling support to RLOOTrainer #5395

Draft
qgallouedec wants to merge 10 commits into main from rloo-tool-call

Conversation

@qgallouedec
Member

@qgallouedec qgallouedec commented Mar 27, 2026

Mirrors the tool calling integration already present in GRPOTrainer:

  • RLOOTrainer.__init__ accepts a tools argument (list of sync/async callables)
  • _tool_call_loop and _get_tool_suffix_ids copied verbatim from GRPOTrainer
  • tool_mask (1 = model-generated, 0 = tool result) applied to completion mask in loss and KL computations
  • max_tool_calling_iterations moved from training section to generation section in both RLOOConfig and GRPOConfig
  • Two tests added to test_rloo_trainer.py: test_training_with_tools (sync + async variants) and test_training_with_malformed_tool_calls

The key RLOO-specific difference: logprobs are discarded (`_generate_single_turn` returns None), so the `logprobs is not None` guards in `_tool_call_loop` are no-ops.
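To make the masking concrete, here is a minimal pure-Python sketch (not the PR's actual code, which operates on PyTorch tensors) of how multiplying the completion mask by tool_mask drops tool-result tokens from a masked mean loss:

```python
def masked_mean_loss(per_token_loss, completion_mask, tool_mask):
    """Illustrative sketch: average the per-token loss only over
    model-generated completion tokens; tool-result tokens (tool_mask == 0)
    drop out of both the numerator and the normalizer."""
    keep = [c * t for c, t in zip(completion_mask, tool_mask)]
    total = sum(l * k for l, k in zip(per_token_loss, keep))
    denom = sum(keep) or 1.0  # avoid division by zero on all-masked rows
    return total / denom

# Five completion tokens; tokens 2-3 come from a tool result and are masked
# out, so the (deliberately large) losses there do not leak into training.
loss = masked_mean_loss([0.5, 0.5, 9.0, 9.0, 0.5], [1, 1, 1, 1, 1], [1, 1, 0, 0, 1])
print(loss)  # 0.5
```

In the trainer, the same product mask would also gate the KL term, matching the bullet list above.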


Note

Medium Risk
Adds multi-turn tool execution and new masking paths in RLOOTrainer generation/loss/KL calculations, which can affect training dynamics and sequence handling. Also introduces new runtime requirements (Transformers>=5 and jmespath) when tools are enabled.

Overview
Adds tool-calling support to RLOOTrainer. The trainer now accepts a tools list (sync or async callables), validates transformers>=5.0.0 and jmespath, and runs a multi-turn tool execution loop during generation (with optional max_tool_calling_iterations).

Tool-result tokens are tracked via a new tool_mask and are excluded from completion-length stats, logprob/KL calculations, and loss/entropy masking; new tool call/failure frequency metrics are logged. Config docs/fields move max_tool_calling_iterations into the generation section for both RLOOConfig and GRPOConfig, and tests add coverage for tool calling (including malformed calls) plus minor changes to reduce memory usage in several trainer tests.
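As an illustrative sketch of the kind of call/failure frequency metrics mentioned above (the metric keys and normalization below are assumptions, not the actual TRL names):

```python
def tool_metrics(num_tool_calls, num_failures, num_completions):
    """Hypothetical sketch of tool call/failure frequency metrics; the key
    names are made up for illustration, not TRL's actual logged keys."""
    return {
        "tools/call_frequency": num_tool_calls / max(num_completions, 1),
        "tools/failure_frequency": num_failures / max(num_tool_calls, 1),
    }

metrics = tool_metrics(num_tool_calls=12, num_failures=3, num_completions=8)
print(metrics)  # {'tools/call_frequency': 1.5, 'tools/failure_frequency': 0.25}
```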

Written by Cursor Bugbot for commit 2297245. This will update automatically on new commits. Configure here.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread trl/trainer/rloo_trainer.py

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26c4f9bbed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread trl/trainer/rloo_trainer.py
Comment on lines +408 to +410
generation_batch_size = args.per_device_train_batch_size * args.steps_per_generation
self._sync_tool_dicts = [{} for _ in range(generation_batch_size)]
self._async_tool_dicts = [{} for _ in range(generation_batch_size)]


P2 Badge Size tool lookup storage to active batch size

Tool lookup tables are preallocated using per_device_train_batch_size * steps_per_generation, but _tool_call_loop indexes them by the current batch position (idx_with_tool). During evaluation, per_device_eval_batch_size may legitimately be larger than the train generation batch, so a tool call in a higher index will hit IndexError at lookup. This makes tool-enabled eval fragile for valid configs; build per-call dicts or size storage by the active batch.
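A sketch of the suggested fix, sizing the lookup tables by the batch actually being processed (the helper name and structure here are hypothetical, not the PR's code):

```python
import inspect

def build_tool_dicts(batch_size, tools):
    """Hypothetical helper: build per-sequence sync/async tool lookup tables
    sized to the active batch (train *or* eval), so indexing by the current
    batch position can never overrun the storage."""
    sync_tools = {t.__name__: t for t in tools if not inspect.iscoroutinefunction(t)}
    async_tools = {t.__name__: t for t in tools if inspect.iscoroutinefunction(t)}
    return ([dict(sync_tools) for _ in range(batch_size)],
            [dict(async_tools) for _ in range(batch_size)])

def lookup(x):            # toy sync tool
    return x

async def alookup(x):     # toy async tool
    return x

# Rebuilt per generation call with the current batch size, instead of a
# fixed per_device_train_batch_size * steps_per_generation preallocation.
sync_dicts, async_dicts = build_tool_dicts(4, [lookup, alookup])
print(len(sync_dicts), sorted(sync_dicts[0]), sorted(async_dicts[0]))
```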


Comment thread trl/trainer/rloo_trainer.py Outdated
Comment thread tests/test_rloo_trainer.py Outdated
Comment thread trl/trainer/rloo_trainer.py

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


# When we compute `suffix_ids` by slicing `full_ids`, we must align the slicing boundary to
# EOS (not EOS + newline).
last_eos_idx = max(i for i, tok_id in enumerate(prefix_ids) if tok_id == self.eos_token_id)
prefix_ids = prefix_ids[: last_eos_idx + 1]


Missing EOS guard crashes _get_tool_suffix_ids for some models

High Severity

_get_tool_suffix_ids uses max() on a generator that yields nothing when prefix_ids contains no EOS token, raising ValueError: max() arg is an empty sequence. The corresponding GRPOTrainer code safely builds a list and guards with if eos_positions: before trimming, with a comment explaining that "Templates that don't use EOS as end-of-turn (e.g. Gemma uses <end_of_turn>) skip this trimming." This guard is missing in the RLOO copy, so any model whose chat template doesn't include the EOS token in turn boundaries will crash during tool calling.
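The guarded trimming the review points to can be sketched like this (illustrative, not the GRPOTrainer source verbatim):

```python
def trim_to_last_eos(prefix_ids, eos_token_id):
    """Only trim when an EOS token is actually present; templates that don't
    use EOS as end-of-turn (e.g. Gemma) leave the ids untouched instead of
    crashing on max() over an empty sequence."""
    eos_positions = [i for i, tok in enumerate(prefix_ids) if tok == eos_token_id]
    if eos_positions:  # the guard missing from the RLOO copy
        prefix_ids = prefix_ids[: eos_positions[-1] + 1]
    return prefix_ids

trimmed = trim_to_last_eos([5, 2, 7, 2, 9], eos_token_id=2)   # -> [5, 2, 7, 2]
untouched = trim_to_last_eos([5, 7, 9], eos_token_id=2)       # no EOS: unchanged
```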


Triggered by project rule: ../.ai/AGENTS.md

if name in sync_tool_dict:
tool_call_results.append((name, sync_tool_dict[name](**function["arguments"])))
elif name in async_tool_dict:
async_coros.append((name, async_tool_dict[name](**function["arguments"])))


Missing string-arguments parsing in tool call loop

High Severity

_tool_call_loop passes function["arguments"] directly to **kwargs without checking if it's a string. GRPOTrainer's version (lines 1597–1616) includes argument normalization that handles models (e.g., Gemma) returning arguments as strings instead of dicts — including JSON parsing, brace-wrapping, and regex fallback. Without this, **function["arguments"] on a string raises TypeError, which gets silently caught as a tool failure, producing incorrect results.
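The missing normalization can be approximated like this (a sketch of the behavior the review attributes to GRPOTrainer, not a verbatim copy; the regex fallback in particular is an assumption):

```python
import json
import re

def normalize_arguments(arguments):
    """Coerce tool-call arguments to a dict. Some models (e.g. Gemma) emit
    arguments as a string, and **arguments on a string raises TypeError."""
    if isinstance(arguments, dict):
        return arguments
    if isinstance(arguments, str):
        text = arguments.strip()
        for candidate in (text, "{" + text + "}"):  # also try brace-wrapping
            try:
                parsed = json.loads(candidate)
                if isinstance(parsed, dict):
                    return parsed
            except json.JSONDecodeError:
                pass
        # Regex fallback for key="value" style strings
        pairs = re.findall(r'(\w+)\s*=\s*"([^"]*)"', text)
        if pairs:
            return dict(pairs)
    raise ValueError(f"Could not parse tool arguments: {arguments!r}")

parsed_json = normalize_arguments('{"city": "Paris"}')   # already valid JSON
parsed_wrapped = normalize_arguments('"city": "Paris"')  # fixed by brace-wrapping
```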


Triggered by project rule: ../.ai/AGENTS.md

@qgallouedec qgallouedec marked this pull request as draft April 7, 2026 12:24
