Skip to content

Reject execPodStep when the exec WebSocket closes without a status response#376

Draft
jeanschmidt wants to merge 1 commit into
actions:mainfrom
jeanschmidt:jeanschmidt/upstream-pr01-close-handler-reject
Draft

Reject execPodStep when the exec WebSocket closes without a status response#376
jeanschmidt wants to merge 1 commit into
actions:mainfrom
jeanschmidt:jeanschmidt/upstream-pr01-close-handler-reject

Conversation

@jeanschmidt

@jeanschmidt jeanschmidt commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Reject execPodStep when the exec WebSocket closes without a status response

Impact: all consumers of the k8s hook's execPodStep (every container-step execution on a Kubernetes runner)
Risk: low

What

Make execPodStep settle deterministically when the underlying exec WebSocket closes before a V1Status is delivered, instead of hanging until the runner-level job timeout.

Why

The heartbeat added in #333 covers the pong-timeout case, but a clean WebSocket close mid-exchange (kube-apiserver restart/rollout, network drop, sidecar exit, LB idle reset, etc.) is not a pong timeout — no ping is in flight, so the heartbeat never fires. The status callback also never fires, because the server-sent V1Status frame is exactly what got cut off. The returned Promise then never settles, and the Actions step hangs until the job-level timeout with no actionable error in the logs.

How

Inside execPodStep's Promise scope, track whether the status callback fired with a local statusReceived flag. After exec.exec resolves with the WebSocket, register a one-shot ws.once('close') listener that — only if statusReceived is still false — stops the heartbeat and rejects with WebSocket closed without status response. The status callback's existing close-and-resolve path is unchanged; once it has run, the guarded close handler is a no-op, so there is no double-settle.

Behavior when a status is received is bit-for-bit identical to before. The new code path only runs when the socket dies without a status — the exact case that previously hung.

- Track whether the V1Status callback fired in execPodStep's Promise scope
- Attach a one-shot ws.once('close') listener that rejects with an
  explanatory error and stops the heartbeat when status was never received
- Add tests covering the close-without-status reject path and the
  Success-then-close regression (no double-settle)

The heartbeat introduced in actions#333 only catches the pong-timeout case
(~61s by default). A clean WebSocket close mid-exchange — apiserver
restart, network drop, sidecar exit — left execPodStep's Promise
unsettled, so the step hung until the runner-level job timeout with no
actionable error in logs. The close-without-status guard turns that
silent hang into an immediate, named failure.
Copilot AI review requested due to automatic review settings June 11, 2026 15:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds handling (and tests) for the case where the Kubernetes exec WebSocket closes before the status callback returns, avoiding hangs and preventing a second settle after successful completion.

Changes:

  • Add a statusReceived guard and a close handler to reject when the WebSocket closes without a status response.
  • Add Jest tests that cover (1) close-without-status rejection and (2) success-then-close not causing a second rejection.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
packages/k8s/tests/exec-pod-step-close-test.ts Adds regression tests for close-without-status and success-then-close behavior.
packages/k8s/src/k8s/index.ts Adds WebSocket close handling guarded by statusReceived to avoid hangs and double-settling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +42 to +55
beforeEach(() => {
jest.clearAllMocks()
process.env['ACTIONS_RUNNER_KUBERNETES_NAMESPACE'] = 'test-namespace'
// Keep heartbeat dormant during the test so the only settle path is the
// close-without-status handler under test.
process.env['ACTIONS_RUNNER_HEARTBEAT_PERIOD_MS'] = '60000'
process.env['ACTIONS_RUNNER_HEARTBEAT_DEADLINE_MS'] = '60000'
})

afterEach(() => {
delete process.env['ACTIONS_RUNNER_KUBERNETES_NAMESPACE']
delete process.env['ACTIONS_RUNNER_HEARTBEAT_PERIOD_MS']
delete process.env['ACTIONS_RUNNER_HEARTBEAT_DEADLINE_MS']
})
Comment on lines +343 to +350
ws.once('close', () => {
if (!statusReceived) {
heartbeat.stop()
reject(
new Error('WebSocket closed without status response')
)
}
})
return new Promise<number>((resolve, reject) => {
core.debug('[execPodStep] About to call exec.exec')
let ws: HeartbeatWebSocket | null = null
let statusReceived = false
stdin ?? null,
false /* tty */,
async resp => {
statusReceived = true
Comment on lines +343 to +350
ws.once('close', () => {
if (!statusReceived) {
heartbeat.stop()
reject(
new Error('WebSocket closed without status response')
)
}
})
@jeanschmidt jeanschmidt marked this pull request as draft June 11, 2026 15:42
@jeanschmidt jeanschmidt reopened this Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants