Skip to content

dds-map: subdir identity + keyset lifetime + ack metadata identity + side-maps + keysets list#27413

Draft
anthony-murphy wants to merge 8 commits into
microsoft:mainfrom
anthony-murphy:prep-map
Draft

dds-map: subdir identity + keyset lifetime + ack metadata identity + side-maps + keysets list#27413
anthony-murphy wants to merge 8 commits into
microsoft:mainfrom
anthony-murphy:prep-map

Conversation

@anthony-murphy
Copy link
Copy Markdown
Contributor

@anthony-murphy anthony-murphy commented May 27, 2026

Description

Bundles five structural changes in packages/dds/map/:

  1. Subdirectory pending-op resubmit by reference identity — adds pendingEntry back-pointers on ICreateSubDirLocalOpMetadata / IDeleteSubDirLocalOpMetadata, threads them through the submit helpers, and rewrites resubmitSubDirectoryMessage to use the back-pointer instead of findLast(name && type).
  2. PendingKeySet.lifetime back-pointer — adds a readonly lifetime: PendingKeyLifetime back-pointer on PendingKeySet in both mapKernel.ts and directory.ts, populates it at construction sites, and converts rollback consumers to use the back-pointer for O(1) parent lookup instead of findLastIndex over pendingData.
  3. Local set/delete ack via metadata identity — replaces pendingData.findIndex(... entry.key === op.key) at the 4 local-op-ack sites with reference-identity (delete: localOpMetadata IS the PendingKeyDelete; set: uses the lifetime back-pointer from Address re-entrancy when playing ops (#3242) #2 directly).
  4. pendingSubDirectoryByName side-map — adds Map<string, DoublyLinkedList<PendingSubDirectoryEntry>> next to pendingSubDirectoryData, maintained symmetrically at every mutation. Converts 10+ scan sites (findIndex/findLast/some for subdirName === name) to O(1) / O(per-name-length). New private helpers on SubDirectory: pushPendingSubDirectoryEntry, findLastPendingSubDirectoryNode, removePendingSubDirectoryEntry.
  5. PendingKeyLifetime.keySets: DoublyLinkedList<PendingKeySet> — replaces the array with a doubly-linked list in both files. ACK shift() is O(1) instead of Array.shift()'s O(k); rollback pop() similarly; last-element reads use list.last!.data.

Why

  • The subdir-identity change fixes a latent 0xc31 assert on same-name lifecycle cycles (delete → create → delete with the same subdirectory name), where findLast(name && type) could pair the resubmit with the wrong pending entry. Also removes two O(N) scans per resubmit.
  • The lifetime back-pointer makes parent–child relationships between PendingKeyLifetime and its PendingKeySets explicit, eliminates findLastIndex scans during rollback, and lets the local-ack lookups (Sample: type-race switch to use webpack-dev-server from prague-dev-server #3) be O(1) instead of linear key scans.
  • subdirectories() was O(n²) today (n entries × findLast per name). The side-map collapses those scans without changing observable behavior or insertion-order semantics. The flat pendingSubDirectoryData array is preserved to minimize blast radius; only the lookup-scans (the hot paths) become O(1).
  • Array.shift() is O(k) per ack on a lifetime with k queued keySets — quadratic when draining a hot key written k times before any ack. DoublyLinkedList.shift() is O(1).

Notes

Pure structural prep — no reSubmitSquashed, no squash logic, no tests touched, no api-report changes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (641 lines, 4 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

@anthony-murphy anthony-murphy changed the title dds-map: subdir pending-op identity + PendingKeySet lifetime back-pointer dds-map: subdir identity + keyset lifetime + ack metadata identity + side-maps + keysets list May 27, 2026
0xc72 /* Unexpected pending data for deleteSubDirectory op */,
);
this.pendingSubDirectoryData.splice(pendingEntryIndex, 1);
this.removePendingSubDirectoryEntry(pendingEntry, pendingNode);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Subdirectory rollback ignores localOpMetadata.pendingEntry and still removes by name+type.

This PR plumbs pendingEntry through ICreateSubDirLocalOpMetadata / IDeleteSubDirLocalOpMetadata specifically to disambiguate same-name lifecycle ops, and resubmitSubDirectoryMessage now uses it — pendingSubDirectoryData.includes(localOpMetadata.pendingEntry) plus resubmission of that exact entry. Rollback here calls findLastPendingSubDirectoryNode(subdirName, "createSubDirectory" | "deleteSubDirectory") and then removePendingSubDirectoryEntry(pendingEntry, pendingNode) without ever consulting localOpMetadata.pendingEntry.

On a pending delete("x") → create("x") → delete("x") sequence, rolling back the first delete removes the last delete entry instead of the metadata's own entry — the exact same-name ambiguity the resubmit-side fix was authored to prevent, and the failure mode Abe27342 flagged on #15493 (2023-05-26, inline on directory.ts:1116). The asymmetry is logically indefensible: both rollback and resubmit must use identity, or neither does.

Fix: in both create-rollback and delete-rollback branches, locate and remove localOpMetadata.pendingEntry directly. Mirror what resubmitSubDirectoryMessage already does. At minimum, add assert(pendingNode.data === localOpMetadata.pendingEntry) before removePendingSubDirectoryEntry so the asymmetry fails loud rather than silently corrupting the pending queue. Storing the owning ListNode on pendingEntry at push time makes removal O(1) without an indexOf scan.

…k regressions

Add two regressions to lock in the prep-map fixes:

- Reconnection: delete -> create -> delete the same SharedDirectory subdir
  while disconnected, then reconnect. Guards the reference-identity
  resubmit path against the 0xc31 assert.
- Rollback: set -> delete -> set the same SharedMap key, then rollback.
  Exercises the new PendingKeySet.lifetime back-pointer and the
  per-type rollback branches.
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review

Reviewed commit 63b260b on 2026-05-27.

Readiness: 3/10 — GETTING STARTED

Not ready for sign-off. The structural refactor — identity back-pointers on subdir pendingEntry and on PendingKeySet.lifetime, the pendingSubDirectoryByName side-map, and the DLL-based keySets — is sound. One Tier 1 holds readiness in GETTING STARTED: subdirectory rollback still removes pending entries by findLast(name, type) while resubmitSubDirectoryMessage was pivoted to identity, re-exposing the same same-name-lifecycle ambiguity the pendingEntry back-pointer was authored to prevent. The unresolved inline thread on directory.ts:2685 (raised on the prior SHA) still applies — the affected code at lines 2637–2698 is unchanged at 63b260b.

Path to Ready

  • Resolve inline threads
  • Convert subdirectory rollback to identity-based removal at directory.ts ~2637–2698 (both create-rollback and delete-rollback branches): use localOpMetadata.pendingEntry directly instead of findLastPendingSubDirectoryNode(name, type), mirroring resubmitSubDirectoryMessage. At minimum add assert(pendingNode.data === localOpMetadata.pendingEntry, "…") before removePendingSubDirectoryEntry so the asymmetry fails loud.
  • Add a focused mocha regression under packages/dds/map/src/test/mocha/ for the disconnected SharedDirectory delete("a") → create("a") → delete("a") cycle that rolls back the first delete and asserts the later create("a") / delete("a") entries survive. Mirror the same-key SharedMap regression in map.rollback.spec.ts:330-390.
  • Add assert(pendingEntryIndex !== -1, "…") at the set-ack splice sites — mapKernel.ts:836-838 and directory.ts:2143-2145 — to match the sibling delete-ack and rollback paths.
  • Update the PR description to acknowledge the two new regression tests (map.rollback.spec.ts same-key set → delete → set; reconnection.spec.ts same-subdir delete → create → delete) and the latent bugs they pin — "no tests touched" no longer fits. Also tighten the asymptotic wording: key iteration (internalIterator, getOptimisticValue, optimisticallyHas) remains effectively O(n²) until a per-key side index lands as follow-up.
  • Run the SharedMap / SharedDirectory fuzz suites (directoryFuzzTests.spec.ts) and the tinylicious / odsp / odsp-df stress stages on 63b260bRevert "Map rollback across remote ops" #25066 reverted the prior change here because stress, not unit tests, caught the regression.
  • Run pnpm run policy-check:asserts and decide on 0xc050xc07 / 0xbf70xbf9 continuity: either re-mint short codes on the equivalent post-PR predicates, or document the retirement so telemetry owners can migrate.

Context for Reviewers

For human reviewer
Review history (3 prior reviews)
  • 56348dc 2026-05-27 · 4/10 — rollback-by-identity asymmetry persists as Tier 1
  • 2e1932d 2026-05-27 · 4/10 — structural refactor sound; rollback-by-identity asymmetry preserved as Tier 1
  • 0b901c9 2026-05-27 · 3/10 — structural refactor sound; missing regression test for the named 0xc31 lifecycle fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant