fix: Action Not Found Race Condition #36
Adds `compact-uuids` library and switches frontend-facing IDs from hyphen (`-`) to underscore (`_`) separator (snake case).

## Rationale

- **Snake case:** allow double-click selection of entire ID in browser devtools
- **Compact UUIDs:** 26 chars vs 36 chars (roughly 28% smaller), URL-safe, no ambiguous characters (0/O, 1/l/I)

## Changes

1. **Compact UUIDs:** All UUID usages replaced with compact encoding
2. **Snake case:** Only frontend-facing IDs modified: action-id and tab-id
4884393 to 32d62c8
Problem: cleanup-tab-actions! created a gap where :actions was empty between cleanup and re-registration. In-flight HTTP action requests during this gap would fail with "Action not found". Solution: Use ReentrantReadWriteLock (FIFO) per tab to serialize cleanup+render (write lock) with action execution (read lock).
32d62c8 to 900e7bb

Thanks for the detailed write-up and detective work. Been traveling for the weekend but will review ASAP.

Hey @alekseysotnikov thanks again for finding this bug! I've fixed it in dc317a3. Since the renderer is a dedicated virtual thread and action IDs are deterministic, the issue really only existed because we deleted actions before re-registering them. Rather than introducing locks around it, I was able to just render first (which re-registers all live actions in place), then atomically swap out any stale IDs that weren't re-registered. Thanks again for finding the bug!
Replace the cleanup-before-render pattern that created a gap where actions were missing between cleanup and re-registration. Instead, collect registered action IDs during render via a new *registered-action-ids* dynamic var, then atomically sweep only stale actions after render completes. No window where actions are absent, no locks needed. Closes #36
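
A minimal sketch of the render-then-sweep idea described in this commit message, under stated assumptions. Only `*registered-action-ids*` is named in the source; `actions`, `register-action!`, and `render-and-sweep!` are hypothetical stand-ins and not hyper's actual API.

```clojure
;; Named in the commit message; everything else below is hypothetical.
(def ^:dynamic *registered-action-ids* nil)

;; Stand-in for a tab's :actions map.
(defonce actions (atom {}))

(defn register-action!
  "Registers (or overwrites in place) an action and records its id
  when called during a render."
  [id f]
  (when *registered-action-ids*
    (swap! *registered-action-ids* conj id))
  (swap! actions assoc id f)
  id)

(defn render-and-sweep!
  "Renders first, so live actions are re-registered in place, then
  drops only the ids this render did not touch in a single swap!."
  [render-fn]
  (let [seen (atom #{})
        html (binding [*registered-action-ids* seen]
               (render-fn))]
    (swap! actions select-keys @seen)
    html))
```

Because existing entries are overwritten rather than deleted, a lookup that arrives mid-render still finds the previous handler. The sketch assumes `render-fn` does all of its registration eagerly, inside the dynamic binding.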

Thanks. I have reviewed your commit, and it aligns with my initial approach. However, I later realized there's a second important point to consider: since a single action can update multiple cursors, the renderer might run mid-execution and broadcast an unintended intermediate state, i.e. inconsistent HTML, partly rendered with old state and partly with new. For example:
This means the renderer could emit a partial state that wasn't designed to be exposed. Local signals or reactive handlers might then respond incorrectly to this intermediate state. The only way to prevent this would be if actions guaranteed atomic state updates, which they currently don't. That's why locks were added to the renderer.
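
As an illustration only (not the example from the original comment), the interleaving can be simulated with two plain atoms standing in for cursors. All names are hypothetical, and the watches stand in for a renderer that happens to run after every write.

```clojure
;; Two plain atoms stand in for two cursors.
(def first-name* (atom "Ada"))
(def last-name*  (atom "Lovelace"))

(defn snapshot []
  (str @first-name* " " @last-name*))

;; Every frame the simulated renderer would have broadcast.
(def frames (atom []))

;; Watches run synchronously after each write, simulating a render
;; that fires between the two updates of one action.
(add-watch first-name* :render (fn [_ _ _ _] (swap! frames conj (snapshot))))
(add-watch last-name*  :render (fn [_ _ _ _] (swap! frames conj (snapshot))))

(defn rename-action! []
  (reset! first-name* "Grace")   ; a render here sees "Grace Lovelace"
  (reset! last-name*  "Hopper"))

(rename-action!)
;; @frames is ["Grace Lovelace" "Grace Hopper"]
```

The first frame mixes new and old state, which is the partial picture the comment warns about.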

That's a great thought. I wonder if either some sort of batching mechanism or STM could be used to help ensure consistent rendering. I will give it some more thought. In reality due to throttling of the render loop it is less likely to be hit, but certainly could be. That being said, a user could get atomicity guarantees by storing both pieces of state inside the cursor instead of using two cursors - and in a way this is very similar to regular Clojure as well.
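
A small sketch of the single-cursor approach, assuming a plain atom behaves like a cursor for this purpose. The names `page*` and `frames` are hypothetical.

```clojure
;; One atom holds both pieces of state.
(def page* (atom {:loading? true :data nil}))

;; Records every state an observer (e.g. a renderer) would see.
(def frames (atom []))
(add-watch page* :render (fn [_ _ _ new-state] (swap! frames conj new-state)))

;; Both keys change in one swap!, so there is no intermediate state.
(swap! page* assoc :data [1 2 3] :loading? false)
;; @frames holds exactly one entry: {:loading? false, :data [1 2 3]}
```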
Sure, but that is a workaround in the current implementation. It isn't obvious up front, so a user could get hurt here. It also forces more gymnastics with keywords.
I also think that in the worst cases this could have a cumulative effect that strikes too rarely and is hard to debug.

@alekseysotnikov I saw some of the stuff around the PR on your fork and it prompted me to think about this a bit more. I really think that globally locking ends up being the wrong choice:
Instead I introduced some new code in 3c6e730 that causes a consistent picture of all cursors to be presented at the time that we start the render cycle. Let me know what your thoughts are and thanks again for your investigation into this to begin with!

I agree with your concerns about the consequences of locking. This has been bothering me lately as well, and it's pushing me to think in the direction of something like an optional lock-free action mode (which complicates usability). Your approach is really good, but there's still a case where intermediate state can appear on the client side:

@alekseysotnikov I think the solution will be to introduce a

This is 100% reasonable. If So this way locks are not needed for sure

@alekseysotnikov Just shipped

```clojure
(h/action
  (reset! loading* true)             ;; immediate: renderer shows spinner
  (let [data (fetch-from-api!)]
    (h/batch                         ;; atomic: lands as one swap
      (reset! data* data)
      (reset! loading* false))))
```

It's opt-in, so actions that want intermediate state (progress bars, loading spinners) keep working exactly as before, just don't wrap those writes in

Under the hood it reuses the same overlay mechanism we already had for render snapshots, so there's no new concurrency primitive, just a dynamic binding and a flush function. Thanks again for all of the back and forth on this. Hopefully this covers your use case!
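
To make "a dynamic binding and a flush function" concrete, here is a rough sketch of how such a batch could work. This is an assumption-laden illustration, not hyper's implementation: `state`, `*overlay*`, `write!`, `flush!`, and this `batch` macro are all hypothetical names.

```clojure
;; Hypothetical single backing store for all cursors.
(defonce state (atom {}))

;; Bound to an atom of pending writes while inside a batch.
(def ^:dynamic *overlay* nil)

(defn write!
  "Buffers the write inside a batch, applies it immediately otherwise."
  [k v]
  (if *overlay*
    (swap! *overlay* assoc k v)
    (swap! state assoc k v)))

(defn flush!
  "Applies all buffered writes in one swap!."
  [overlay]
  (swap! state merge @overlay))

(defmacro batch
  "Runs body with writes buffered, then flushes them together."
  [& body]
  `(let [overlay# (atom {})
         result#  (binding [*overlay* overlay#] ~@body)]
     (flush! overlay#)
     result#))

;; Usage: the watch stands in for a renderer observing the store.
(def frames (atom []))
(add-watch state :render (fn [_ _ _ new-state] (swap! frames conj new-state)))

(write! :loading? true)            ; immediate, visible on its own
(batch
  (write! :data [1 2 3])           ; buffered
  (write! :loading? false))        ; buffered, lands with :data
```

In this sketch the observer sees two states: the spinner state, then the final state with both keys updated together. Writes made in a body that throws are simply discarded, which is a design choice of the sketch rather than a documented behavior.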

WOW, that’s really amazing! The project is designed incredibly well. Thank you again
This fix builds on PR #34, which helped identify the root cause.
Originally, I encountered the problem when an action received CustomEvents from a third-party JS component (similar to this datastar example) "concurrently" with another action trigger.
How to reproduce
Load `example.app` ns and start typing characters quickly until you get an HTTP 500.
Symptom
Intermittent errors:
The `:actions` map was completely empty, despite actions being registered on every render.
Root Cause: The Cleanup Gap
The render loop in `server.clj` performed two operations sequentially:
Between Step A and Step B, `:actions` is `{}`. This gap lasts ~1-50ms depending on render complexity.
The Broken Flow (Race Condition Timeline)
The action handler's lookup at T3 happens during the gap — the action existed at T0 (from the previous render), was wiped at T0, and hasn't been re-registered yet at T4.
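
The gap can be reproduced deterministically in a small sketch. The names below (`actions`, `run-action!`, `broken-render!`) are hypothetical and only mirror the cleanup-then-register order described above; they are not the code from `server.clj`.

```clojure
;; Stand-in for a tab's :actions map after a previous render.
(def actions (atom {"a1" (fn [] :clicked)}))

(defn run-action!
  "Looks up and runs an action, failing when the id is missing."
  [id]
  (if-let [f (get @actions id)]
    (f)
    (throw (ex-info "Action not found" {:action-id id}))))

(defn broken-render!
  "Cleanup first, then re-register. `during-gap` simulates an in-flight
  request that arrives between the two steps."
  [during-gap]
  (reset! actions {})                              ; Step A: cleanup
  (let [seen (during-gap)]                         ; request lands in the gap
    (swap! actions assoc "a1" (fn [] :clicked))    ; Step B: re-registration
    seen))

(def result
  (broken-render!
    #(try (run-action! "a1")
          (catch clojure.lang.ExceptionInfo e (ex-message e)))))
;; result is "Action not found"; the same lookup succeeds after Step B
```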
Why Simpler Fixes Don't Work
The Correct Fix: ReentrantReadWriteLock (FIFO)
Why RwLock specifically:
The lock must cover the entire action lifecycle — lookup AND execution. If the lookup happens outside the lock, the race still exists.
The Correct Flow (With RwLock)
The action handler blocks at T1 until the render completes at T3. By then, new actions are registered and the lookup succeeds.
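
A minimal sketch of the locking scheme, assuming one fair `ReentrantReadWriteLock` per tab. The tab map shape and the function names are hypothetical; only the lock class and the write-lock/read-lock split come from the description.

```clojure
(import '(java.util.concurrent.locks ReentrantReadWriteLock))

(defn new-tab
  "Hypothetical per-tab state: a fair lock plus the :actions map."
  []
  {:lock    (ReentrantReadWriteLock. true) ; true selects the fair ordering policy
   :actions (atom {})})

(defn render-tab!
  "Cleanup and re-registration run under the write lock, so no action
  handler can observe the empty map in between."
  [tab fresh-actions]
  (let [^ReentrantReadWriteLock lock (:lock tab)
        w (.writeLock lock)]
    (.lock w)
    (try
      (reset! (:actions tab) {})             ; cleanup
      (reset! (:actions tab) fresh-actions)  ; re-registration during render
      (finally (.unlock w)))))

(defn run-action!
  "Lookup AND execution both happen under the read lock."
  [tab action-id]
  (let [^ReentrantReadWriteLock lock (:lock tab)
        r (.readLock lock)]
    (.lock r)
    (try
      (if-let [f (get @(:actions tab) action-id)]
        (f)
        (throw (ex-info "Action not found" {:action-id action-id})))
      (finally (.unlock r)))))
```

Multiple actions can hold the read lock at once, while a render waits for them to finish and then excludes new ones until it completes.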
Files Changed
`hyper/server.clj`: `ReentrantReadWriteLock` import; created lock per tab in-start-renderer!; wrapped cleanup+render in write lock in-renderer-loop!
`hyper/actions.clj`
Key Design Decisions