Skip to content

moq-rtc: abort sessions that never receive ICE candidates#1951

Merged
kixelated merged 1 commit into
mainfrom
claude/webrtc-ice-establish-timeout
Jun 30, 2026
Merged

moq-rtc: abort sessions that never receive ICE candidates#1951
kixelated merged 1 commit into
mainfrom
claude/webrtc-ice-establish-timeout

Conversation

@kixelated

@kixelated kixelated commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Problem

str0m already ends a WebRTC connection whose candidate pairs were tried and failed: once every pair exhausts its STUN retransmits the ICE agent (is) transitions to IceConnectionState::Disconnected (~21s at the default StunTiming), which Session::run maps to SessionClosed.

What it does not end -- deliberately, for trickle ICE -- is a session that never received any remote candidate. evaluate_state forces any_still_possible = true whenever remote_candidates is empty ("more candidates may yet arrive"), so a peer that answers the SDP and then sends nothing -- an abandoned WHIP/WHEP offer, or a probe that only exercises the signalling contract -- sits in Checking forever. The task lives indefinitely holding its broadcast announcement + mux registration; accumulated stuck sessions peg the CPU. (str0m's own chat.rs/http-post.rs examples have the same gap -- they only react to Disconnected -- so the application is expected to bound this.)

Fix

Add a backstop: track whether ICE ever connected and bail with a new Error::IceTimeout if not within ICE_ESTABLISH_TIMEOUT. Set it (30s) above str0m's ~21s pair-exhaustion so a connection that actually started checks is still ended by str0m's native Disconnected path (and a slow-but-real TURN/lossy peer isn't clipped) -- this only fires for the never-any-candidate case. Clamp the select sleep to the deadline so the check fires even when str0m scheduled a far-off timeout. Logged as ordinary churn (debug).

Verification

cargo build/clippy/test -p moq-rtc clean. Against a live dev edge: six abandoned WHIP offers + three WHEP offers were each torn down with session ended: ICE never connected ~the deadline after creation; before the change they never ended.

🤖 Generated with Claude Code

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @kixelated, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Error gains a new IceTimeout variant. Session::run now tracks a start time and connection state, applies a 30-second ICE establishment deadline, returns IceTimeout if ICE stays unconnected past the deadline, and limits waiting time while still connecting. IceConnectionStateChange updates the connection flag, and log_session_end logs IceTimeout at debug level.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly reflects the main change: timing out sessions that never complete ICE connection.
Description check ✅ Passed The description is directly related to the session timeout fix and its verification details.
✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/webrtc-ice-establish-timeout

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rs/moq-rtc/src/error.rs`:
- Around line 17-18: The new public error variant IceTimeout in the Error enum
is missing rustdoc, so it won’t appear in the public API docs. Add a doc comment
directly above Error::IceTimeout in error.rs, describing when this variant is
returned and keeping it aligned with the existing #[error(...)] display text.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8ac1d962-b7bd-4ea8-98ad-f7fefe70b16e

📥 Commits

Reviewing files that changed from the base of the PR and between 370dfd0 and 3591445.

📒 Files selected for processing (2)
  • rs/moq-rtc/src/error.rs
  • rs/moq-rtc/src/session.rs

Comment thread rs/moq-rtc/src/error.rs
Comment on lines +17 to +18
#[error("ICE did not connect before the establishment deadline")]
IceTimeout,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add rustdoc for the new public error variant.

#[error(...)] controls Display, but it doesn't document Error::IceTimeout in the public API docs.

📝 Proposed fix
+	/// ICE did not reach a connected state before the establishment deadline.
 	#[error("ICE did not connect before the establishment deadline")]
 	IceTimeout,

As per coding guidelines, "**/*.{rs,ts,tsx,js,jsx}: Document every exported public API symbol: each exported Rust item and each exported JS/TS symbol, including notable public members, must have a doc comment."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[error("ICE did not connect before the establishment deadline")]
IceTimeout,
/// ICE did not reach a connected state before the establishment deadline.
#[error("ICE did not connect before the establishment deadline")]
IceTimeout,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-rtc/src/error.rs` around lines 17 - 18, The new public error variant
IceTimeout in the Error enum is missing rustdoc, so it won’t appear in the
public API docs. Add a doc comment directly above Error::IceTimeout in error.rs,
describing when this variant is returned and keeping it aligned with the
existing #[error(...)] display text.

Source: Coding guidelines

str0m ends a WebRTC connection whose candidate pairs were tried and
failed: once every pair exhausts its STUN retransmits the ICE agent
(the `is` crate) transitions to `IceConnectionState::Disconnected`
(~21s at the default StunTiming), which `Session::run` already maps to
`SessionClosed`.

What it does NOT end is a session that never received any remote
candidate. `evaluate_state` forces `any_still_possible = true` whenever
`remote_candidates` is empty (trickle ICE: more candidates may yet
arrive), so a peer that answers the SDP and then sends nothing -- an
abandoned WHIP/WHEP offer, or a probe that only exercises the signalling
contract -- sits in `Checking` forever. The task lives indefinitely
holding its broadcast announcement and mux registration; accumulated
stuck sessions are what pegged the dev gateway's CPU. str0m's own
examples have the same gap (they only react to `Disconnected`), so the
application is expected to bound this.

Add a backstop: track whether ICE ever connected and bail with a new
`Error::IceTimeout` if not within ICE_ESTABLISH_TIMEOUT. Set it (30s)
ABOVE str0m's ~21s pair-exhaustion so a connection that actually started
checks is still ended by str0m's native `Disconnected` path (and a
slow-but-real TURN/lossy peer isn't clipped); this only fires for the
never-any-candidate case. Clamp the select sleep to the deadline so the
check fires even when str0m scheduled a far-off timeout. Logged as
ordinary churn (debug), not a warning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kixelated kixelated force-pushed the claude/webrtc-ice-establish-timeout branch from 3591445 to ba22678 Compare June 30, 2026 02:26
@kixelated kixelated changed the title moq-rtc: abort sessions that never finish connecting moq-rtc: abort sessions that never receive ICE candidates Jun 30, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rs/moq-rtc/src/session.rs (1)

174-203: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Move the ICE deadline check after draining RTC output. rs/moq-rtc/src/session.rs:174-203 A packet processed right before the deadline can queue IceConnectionStateChange(Connected), but this branch returns IceTimeout before poll_output() surfaces it. Reorder the deadline check to run after pending Transmit/Event output is drained to Output::Timeout.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-rtc/src/session.rs` around lines 174 - 203, Move the ICE timeout check
in session::Session handling so pending RTC output is drained first. In the loop
around self.rtc.poll_output(), process any Output::Transmit and Output::Event
values (including IceConnectionStateChange) before returning Error::IceTimeout,
then apply the ICE_ESTABLISH_TIMEOUT check only after Output::Timeout is reached
so a late Connected event is not missed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@rs/moq-rtc/src/session.rs`:
- Around line 174-203: Move the ICE timeout check in session::Session handling
so pending RTC output is drained first. In the loop around
self.rtc.poll_output(), process any Output::Transmit and Output::Event values
(including IceConnectionStateChange) before returning Error::IceTimeout, then
apply the ICE_ESTABLISH_TIMEOUT check only after Output::Timeout is reached so a
late Connected event is not missed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2673968f-113b-4b93-83c9-c9ef49bfd88e

📥 Commits

Reviewing files that changed from the base of the PR and between 3591445 and ba22678.

📒 Files selected for processing (2)
  • rs/moq-rtc/src/error.rs
  • rs/moq-rtc/src/session.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • rs/moq-rtc/src/error.rs

@kixelated kixelated merged commit a34b7c3 into main Jun 30, 2026
2 checks passed
@kixelated kixelated deleted the claude/webrtc-ice-establish-timeout branch June 30, 2026 02:46
@moq-bot moq-bot Bot mentioned this pull request Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant