moq-rtc: abort sessions that never receive ICE candidates#1951
Conversation
There was a problem hiding this comment.
Sorry @kixelated, you have reached your weekly rate limit of 500000 diff characters.
Please try again later or upgrade to continue using Sourcery
Walkthrough
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rs/moq-rtc/src/error.rs`:
- Around line 17-18: The new public error variant IceTimeout in the Error enum
is missing rustdoc, so it won’t appear in the public API docs. Add a doc comment
directly above Error::IceTimeout in error.rs, describing when this variant is
returned and keeping it aligned with the existing #[error(...)] display text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8ac1d962-b7bd-4ea8-98ad-f7fefe70b16e
📒 Files selected for processing (2)
rs/moq-rtc/src/error.rsrs/moq-rtc/src/session.rs
| #[error("ICE did not connect before the establishment deadline")] | ||
| IceTimeout, |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Add rustdoc for the new public error variant.
#[error(...)] controls Display, but it doesn't document Error::IceTimeout in the public API docs.
📝 Proposed fix
+ /// ICE did not reach a connected state before the establishment deadline.
#[error("ICE did not connect before the establishment deadline")]
IceTimeout,As per coding guidelines, "**/*.{rs,ts,tsx,js,jsx}: Document every exported public API symbol: each exported Rust item and each exported JS/TS symbol, including notable public members, must have a doc comment."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[error("ICE did not connect before the establishment deadline")] | |
| IceTimeout, | |
| /// ICE did not reach a connected state before the establishment deadline. | |
| #[error("ICE did not connect before the establishment deadline")] | |
| IceTimeout, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rs/moq-rtc/src/error.rs` around lines 17 - 18, The new public error variant
IceTimeout in the Error enum is missing rustdoc, so it won’t appear in the
public API docs. Add a doc comment directly above Error::IceTimeout in error.rs,
describing when this variant is returned and keeping it aligned with the
existing #[error(...)] display text.
Source: Coding guidelines
str0m ends a WebRTC connection whose candidate pairs were tried and failed: once every pair exhausts its STUN retransmits the ICE agent (the `is` crate) transitions to `IceConnectionState::Disconnected` (~21s at the default StunTiming), which `Session::run` already maps to `SessionClosed`. What it does NOT end is a session that never received any remote candidate. `evaluate_state` forces `any_still_possible = true` whenever `remote_candidates` is empty (trickle ICE: more candidates may yet arrive), so a peer that answers the SDP and then sends nothing -- an abandoned WHIP/WHEP offer, or a probe that only exercises the signalling contract -- sits in `Checking` forever. The task lives indefinitely holding its broadcast announcement and mux registration; accumulated stuck sessions are what pegged the dev gateway's CPU. str0m's own examples have the same gap (they only react to `Disconnected`), so the application is expected to bound this. Add a backstop: track whether ICE ever connected and bail with a new `Error::IceTimeout` if not within ICE_ESTABLISH_TIMEOUT. Set it (30s) ABOVE str0m's ~21s pair-exhaustion so a connection that actually started checks is still ended by str0m's native `Disconnected` path (and a slow-but-real TURN/lossy peer isn't clipped); this only fires for the never-any-candidate case. Clamp the select sleep to the deadline so the check fires even when str0m scheduled a far-off timeout. Logged as ordinary churn (debug), not a warning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
3591445 to
ba22678
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rs/moq-rtc/src/session.rs (1)
174-203: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winMove the ICE deadline check after draining RTC output.
rs/moq-rtc/src/session.rs:174-203A packet processed right before the deadline can queueIceConnectionStateChange(Connected), but this branch returnsIceTimeoutbeforepoll_output()surfaces it. Reorder the deadline check to run after pendingTransmit/Eventoutput is drained toOutput::Timeout.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rs/moq-rtc/src/session.rs` around lines 174 - 203, Move the ICE timeout check in session::Session handling so pending RTC output is drained first. In the loop around self.rtc.poll_output(), process any Output::Transmit and Output::Event values (including IceConnectionStateChange) before returning Error::IceTimeout, then apply the ICE_ESTABLISH_TIMEOUT check only after Output::Timeout is reached so a late Connected event is not missed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@rs/moq-rtc/src/session.rs`:
- Around line 174-203: Move the ICE timeout check in session::Session handling
so pending RTC output is drained first. In the loop around
self.rtc.poll_output(), process any Output::Transmit and Output::Event values
(including IceConnectionStateChange) before returning Error::IceTimeout, then
apply the ICE_ESTABLISH_TIMEOUT check only after Output::Timeout is reached so a
late Connected event is not missed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2673968f-113b-4b93-83c9-c9ef49bfd88e
📒 Files selected for processing (2)
rs/moq-rtc/src/error.rsrs/moq-rtc/src/session.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- rs/moq-rtc/src/error.rs
Problem
str0m already ends a WebRTC connection whose candidate pairs were tried and failed: once every pair exhausts its STUN retransmits the ICE agent (
is) transitions toIceConnectionState::Disconnected(~21s at the defaultStunTiming), whichSession::runmaps toSessionClosed.What it does not end -- deliberately, for trickle ICE -- is a session that never received any remote candidate.
evaluate_stateforcesany_still_possible = truewheneverremote_candidatesis empty ("more candidates may yet arrive"), so a peer that answers the SDP and then sends nothing -- an abandoned WHIP/WHEP offer, or a probe that only exercises the signalling contract -- sits inCheckingforever. The task lives indefinitely holding its broadcast announcement + mux registration; accumulated stuck sessions peg the CPU. (str0m's ownchat.rs/http-post.rsexamples have the same gap -- they only react toDisconnected-- so the application is expected to bound this.)Fix
Add a backstop: track whether ICE ever connected and bail with a new
Error::IceTimeoutif not withinICE_ESTABLISH_TIMEOUT. Set it (30s) above str0m's ~21s pair-exhaustion so a connection that actually started checks is still ended by str0m's nativeDisconnectedpath (and a slow-but-real TURN/lossy peer isn't clipped) -- this only fires for the never-any-candidate case. Clamp theselectsleep to the deadline so the check fires even when str0m scheduled a far-off timeout. Logged as ordinary churn (debug).Verification
cargo build/clippy/test -p moq-rtcclean. Against a live dev edge: six abandoned WHIP offers + three WHEP offers were each torn down withsession ended: ICE never connected~the deadline after creation; before the change they never ended.🤖 Generated with Claude Code