Skip to content

perf(matrix): use DoublyLinkedList + side-map for pending cell changes#27422

Draft
anthony-murphy wants to merge 4 commits into
microsoft:mainfrom
anthony-murphy:prep-matrix-pending-cell-list
Draft

perf(matrix): use DoublyLinkedList + side-map for pending cell changes#27422
anthony-murphy wants to merge 4 commits into
microsoft:mainfrom
anthony-murphy:prep-matrix-pending-cell-list

Conversation

@anthony-murphy
Copy link
Copy Markdown
Contributor

Description

Replaces PendingCellChanges<T>.local: { localSeq, value }[] with DoublyLinkedList<{ localSeq, value }> and adds a sibling side-map localByLocalSeq: Map<number, ListNode<{ localSeq, value }>> for O(1) findIndex-by-localSeq lookup.

Sites converted:

  • sendSetCellOp push: insert into list, register node in side-map.
  • reSubmitCore: was findIndex + splice per pending op; now localByLocalSeq.get(localSeq)list.remove(node)localByLocalSeq.delete(localSeq). O(1).
  • ACK shift: list.shift()?.data + side-map delete.
  • Rollback pop() + peek-tail list.last!.data.value.
  • Length probes use list.length.

Why

For a hot cell with k pending writes, the previous reSubmit-per-op was O(k) find + O(k) splice (O(k²) total) and per-ACK was O(k) shift. For n ops to a single hot cell, the whole rebase pass was O(n²). The list-plus-side-map makes every per-op step O(1).

Notes

Snapshot/wire format unaffected (_pendingCliSeqData is unknown). Pure perf prep — no reSubmitSquashed, no squash logic, no tests touched, no api-report changes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (131 lines, 2 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Comment thread packages/dds/matrix/src/matrix.ts
Comment thread packages/dds/matrix/src/matrix.ts Outdated
const change = pendingCell.local.pop();
const change = pendingCell.local.pop()?.data;
assert(change?.localSeq === setMetadata.localSeq, 0xbaa /* must have change */);
pendingCell.localByLocalSeq.delete(setMetadata.localSeq);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: pop() here is correct — rollback is LIFO, the operation being rolled back is always the tail, and assert 0xbaa enforces the match. But on #24604 the Copilot bot flagged this exact construct ("Using pop() to remove the pending operation may remove the most recently added op rather than the one corresponding to the rollback"), and now that this PR adds the localByLocalSeq index — making targeted removal trivial and applying it in reSubmitCore — a reader will reasonably ask why rollback didn't switch too. Switching would be semantically wrong (rollback must restore the previous cell value, which requires popping the tail — see matrix.rollback.spec.ts:163-187, :232-290), so the code is right; only the intent is implicit.

A one-line comment above the pop() pre-empts the same #24604 round-trip.

Suggested change
pendingCell.localByLocalSeq.delete(setMetadata.localSeq);
// Rollback is LIFO: the operation being rolled back is always the tail. `pop()` is correct; assert 0xbaa enforces the match.
const change = pendingCell.local.pop()?.data;

@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review

Reviewed commit abd04d4 on 2026-05-27.

Readiness: 8/10 — ALMOST READY

Faithful, well-scoped perf prep: PendingCellChanges<T>.local is now a private PendingLocalCellChanges<T> wrapper around DoublyLinkedList + Map<localSeq, ListNode>, turning the per-cell reSubmit hot path from O(k²) to O(k). The hot-cell reconnect regression asked for in the prior review landed at matrix.reconnect.spec.ts:234-283 (N=200), and the lock-step-invariant thread is resolved by the wrapper-class adoption that defines this commit. No correctness defects. Five Tier 3 polish items remain — the last-leaks-ListNode ergonomic gap (the PR's only eslint-disable), an untagged assert, the rollback pop()-is-LIFO-only note, undocumented per-member contracts on the wrapper (esp. the push uniqueness precondition), and no micro-bench guarding the asymptotic claim.

Path to Ready

  • Resolve inline threads
  • File a follow-up issue for a hot-cell micro-bench in packages/dds/matrix/src/test/matrix.bench.ts (N disconnected setCell to the same (row, col) + reconnect/ACK drain) so a future regression that reintroduces an O(N) per-op scan is caught

Context for Reviewers

For human reviewer
  • Encapsulation taste call (Josmithr) — the wrapper-class direction was already adopted at the design-owner's request (thread 3314510112, resolved). The remaining ergonomic leak (last returning ListNode) is matrix-area taste; if peekNewest() is acceptable, the inline thread's recommendation lands cleanly.
  • Perf-direction validation — the pipeline cannot measure runtime. Once the bench follow-up lands, confirm the measured improvement matches the O(k²) → O(k) claim under Matrix: Enable rollback and local server stress #24604's stress workloads.
  • FWW ACK glance (jatgarg) — the ACK-path conversion from array shift() to list.shift()?.data (matrix.ts:1142-1146) is mechanically equivalent but worth a domain-expert look given Update shared matrix FWW policy to immediately switch mode on switch rather than on ack #18709's pending-write sequencing history.
  • Resubmit-over-deleted-row/col (Abe27342)fix(matrix): Properly detect row/cell deletion when resubmitting a set op #19862's fix sits adjacent to the new removeByLocalSeq → list.remove(node) in reSubmitCore. The local-ref-position dance is unchanged but worth a confirming glance.
Review history (3 prior reviews)
  • 42d0e76 2026-05-27 · 7/10 — perf prep verified; four polish items, hot-cell regression test most material
  • b065a1c 2026-05-27 · 7/10 — perf prep verified; three polish items, hot-cell regression test most material
  • 09b3415 2026-05-27 · 6/10 — perf prep with a single hot-cell regression-test blocker

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant