Reject execPodStep when the exec WebSocket closes without a status response#376
Draft
jeanschmidt wants to merge 1 commit into
Draft
Conversation
- Track whether the V1Status callback fired in execPodStep's Promise scope
- Attach a one-shot ws.once('close') listener that rejects with an
explanatory error and stops the heartbeat when status was never received
- Add tests covering the close-without-status reject path and the
Success-then-close regression (no double-settle)
The heartbeat introduced in actions#333 only catches the pong-timeout case
(~61s by default). A clean WebSocket close mid-exchange — apiserver
restart, network drop, sidecar exit — left execPodStep's Promise
unsettled, so the step hung until the runner-level job timeout with no
actionable error in logs. The close-without-status guard turns that
silent hang into an immediate, named failure.
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds handling (and tests) for the case where the Kubernetes exec WebSocket closes before the status callback returns, avoiding hangs and preventing a second settle after successful completion.
Changes:
- Add a
statusReceivedguard and aclosehandler to reject when the WebSocket closes without a status response. - Add Jest tests that cover (1) close-without-status rejection and (2) success-then-close not causing a second rejection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/k8s/tests/exec-pod-step-close-test.ts | Adds regression tests for close-without-status and success-then-close behavior. |
| packages/k8s/src/k8s/index.ts | Adds WebSocket close handling guarded by statusReceived to avoid hangs and double-settling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+42
to
+55
| beforeEach(() => { | ||
| jest.clearAllMocks() | ||
| process.env['ACTIONS_RUNNER_KUBERNETES_NAMESPACE'] = 'test-namespace' | ||
| // Keep heartbeat dormant during the test so the only settle path is the | ||
| // close-without-status handler under test. | ||
| process.env['ACTIONS_RUNNER_HEARTBEAT_PERIOD_MS'] = '60000' | ||
| process.env['ACTIONS_RUNNER_HEARTBEAT_DEADLINE_MS'] = '60000' | ||
| }) | ||
|
|
||
| afterEach(() => { | ||
| delete process.env['ACTIONS_RUNNER_KUBERNETES_NAMESPACE'] | ||
| delete process.env['ACTIONS_RUNNER_HEARTBEAT_PERIOD_MS'] | ||
| delete process.env['ACTIONS_RUNNER_HEARTBEAT_DEADLINE_MS'] | ||
| }) |
Comment on lines
+343
to
+350
| ws.once('close', () => { | ||
| if (!statusReceived) { | ||
| heartbeat.stop() | ||
| reject( | ||
| new Error('WebSocket closed without status response') | ||
| ) | ||
| } | ||
| }) |
| return new Promise<number>((resolve, reject) => { | ||
| core.debug('[execPodStep] About to call exec.exec') | ||
| let ws: HeartbeatWebSocket | null = null | ||
| let statusReceived = false |
| stdin ?? null, | ||
| false /* tty */, | ||
| async resp => { | ||
| statusReceived = true |
Comment on lines
+343
to
+350
| ws.once('close', () => { | ||
| if (!statusReceived) { | ||
| heartbeat.stop() | ||
| reject( | ||
| new Error('WebSocket closed without status response') | ||
| ) | ||
| } | ||
| }) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reject execPodStep when the exec WebSocket closes without a status response
Impact: all consumers of the k8s hook's
execPodStep(every container-step execution on a Kubernetes runner)Risk: low
What
Make
execPodStepsettle deterministically when the underlying exec WebSocket closes before aV1Statusis delivered, instead of hanging until the runner-level job timeout.Why
The heartbeat added in #333 covers the pong-timeout case, but a clean WebSocket close mid-exchange (kube-apiserver restart/rollout, network drop, sidecar exit, LB idle reset, etc.) is not a pong timeout — no ping is in flight, so the heartbeat never fires. The status callback also never fires, because the server-sent
V1Statusframe is exactly what got cut off. The returned Promise then never settles, and the Actions step hangs until the job-level timeout with no actionable error in the logs.How
Inside
execPodStep's Promise scope, track whether the status callback fired with a localstatusReceivedflag. Afterexec.execresolves with the WebSocket, register a one-shotws.once('close')listener that — only ifstatusReceivedis still false — stops the heartbeat and rejects withWebSocket closed without status response. The status callback's existing close-and-resolve path is unchanged; once it has run, the guarded close handler is a no-op, so there is no double-settle.Behavior when a status is received is bit-for-bit identical to before. The new code path only runs when the socket dies without a status — the exact case that previously hung.