fix(cli): fail attempt on uncaught exception instead of hanging to maxDuration (TRI-9117) (#3529)
When a Node EventEmitter (e.g. node-redis) emits an "error" event with no listener attached, Node escalates it to process.on("uncaughtException") in the task worker. The worker reported the error via the UNCAUGHT_EXCEPTION IPC event but did not exit, and the supervisor-side handler in taskRunProcess only logged the message at debug level, leaving the run() promise orphaned until maxDuration fired and producing empty attempts (durationMs=0, costInCents=0).

The supervisor now rejects the in-flight attempt with an UncaughtExceptionError and gracefully terminates the worker (preserving the OTEL flush window) on UNCAUGHT_EXCEPTION. The attempt fails fast with TASK_EXECUTION_FAILED, surfacing the original error name, message, and stack trace, and falls under the normal retry policy. This mirrors the existing indexing-side behavior in indexWorkerManifest. Apply the same handling to unhandled promise rejections, which Node already routes through uncaughtException by default.
Fail attempts on uncaught exceptions instead of hanging to `MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`) emitting `"error"` with no `.on("error", ...)` listener escalates to `uncaughtException`, which the worker previously reported but did not act on — runs drifted to maxDuration with empty attempts. They now fail fast with the original error and status `FAILED`, and respect the task's normal retry policy. You should still attach `.on("error", ...)` listeners to long-lived clients to handle errors gracefully.
docs/troubleshooting.mdx (49 additions, 0 deletions)
@@ -278,6 +278,55 @@ You could also offload the CPU-heavy work to a Node.js worker thread, but this i

If the above doesn't work, then we recommend you try increasing the machine size of your task. See our [machines guide](/machines) for more information.

### Uncaught exceptions

If you see a `TASK_RUN_UNCAUGHT_EXCEPTION` error, an exception escaped your task's `run()` function without being thrown through your `await` chain, and the runtime caught it via Node's `process.on("uncaughtException")` handler. The dashboard surfaces this as a regular task failure (status `Failed`) and the run will retry according to your task's retry policy, but the exception still indicates a bug worth fixing.

The most common cause is a Node `EventEmitter` emitting an `"error"` event with no listener attached. When this happens, Node escalates the event into an `uncaughtException`. Long-lived clients like `node-redis`, `pg`, `kafkajs`, and `mongodb` all surface socket-level errors this way.

For example, a `node-redis` client with no error listener will fail your run with an `Error: read ECONNRESET` (or similar TCP error) the next time the socket is reset:

```typescript
// ...
// GOOD: the listener catches socket-level errors. The awaited command
// (e.g. .get) will still reject if the connection is broken, and that
// rejection propagates through your run() and fails the attempt cleanly.
client.on("error", (err) => {
  logger.warn("Redis client error", { err });
});

await client.connect();
return await client.get("foo");
```

The same fix applies to any library that emits `"error"` events. As a rule, attach an `.on("error", ...)` listener to every long-lived client you create inside a task.
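
To illustrate why the listener matters for any emitter, here is a minimal sketch that uses only Node's built-in `node:events` module. The `emitSocketError` helper and the error message are hypothetical and stand in for whatever a real client does internally when its socket fails.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: emits an "error" event the way a long-lived client
// might when its socket is reset. Returns the error that emit() threw, or
// undefined if a listener handled the event.
function emitSocketError(withListener: boolean): Error | undefined {
  const emitter = new EventEmitter();
  const handled: Error[] = [];

  if (withListener) {
    // With a listener attached, emit() delivers the error here and returns.
    emitter.on("error", (err: Error) => {
      handled.push(err);
    });
  }

  try {
    // With no "error" listener, emit() throws the error itself. In a real
    // client this call happens inside a socket callback with no try/catch
    // around it, so the throw surfaces as an uncaughtException.
    emitter.emit("error", new Error("read ECONNRESET"));
    return undefined;
  } catch (err) {
    return err as Error;
  }
}
```

Calling `emitSocketError(false)` returns the thrown error, while `emitSocketError(true)` returns `undefined` because the listener absorbed the event.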

<Note>

Unhandled promise rejections (e.g. `Promise.reject(...)` with no `.catch`) take the same path: Node routes them through `uncaughtException` by default, and the runtime treats them as `TASK_RUN_UNCAUGHT_EXCEPTION` for the same reasons. Make sure every promise either gets `await`ed or has a `.catch(...)` handler.

</Note>
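
As a rough sketch of the advice on promises, the example below handles a fire-and-forget promise with `.catch(...)`. The `flushMetrics` function, the `handledErrors` array, and the plain `run` function are hypothetical placeholders, not part of any SDK.

```typescript
// Hypothetical background job that always rejects, standing in for any
// promise-returning call made inside run() without being awaited.
async function flushMetrics(): Promise<void> {
  throw new Error("metrics endpoint unreachable");
}

// Collected only so the example can show which errors were handled.
const handledErrors: string[] = [];

async function run(): Promise<string> {
  // Fire-and-forget: the .catch handler means the rejection is never left
  // unhandled, so it cannot escalate to uncaughtException.
  void flushMetrics().catch((err: Error) => {
    handledErrors.push(err.message);
  });

  // Awaited work: a rejection here would propagate through run() and fail
  // the attempt through the normal error path instead.
  await Promise.resolve();

  return "done";
}
```

Removing the `.catch(...)` call would leave the rejection unhandled, which is the situation the note describes.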
Errors mentioning `sendBatchNonBlocking`, `@s2-dev/streamstore`, or `S2AppendSession` (often with `code: undefined`) can occur when you close a stream and then await `waitUntilComplete()`, or when a stream runs for a long time (e.g. 20+ minutes). Wrap `waitUntilComplete()` in try/catch so Transport/closed-stream errors don't fail your task: