perf(tree): binary-search anchor removeChild + O(1) revision lookup in editManager#27428
perf(tree): binary-search anchor removeChild + O(1) revision lookup in editManager#27428anthony-murphy wants to merge 5 commits into
Conversation
|
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (233 lines, 6 files), I've queued these reviewers:
How this works
|
| } | ||
| } | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
Deep Review: binaryFindIndex here is a textbook [min, max) search with no fast-path. The sibling binaryFind immediately above (anchorSet.ts:1284-1310) opens with const guess = sorted[index]; if (guess?.parentIndex === index) return guess;, and its privateRemarks (anchorSet.ts:1273-1283) explicitly warns "current usages tends to fully populate the anchor tree leading the correct array index to be the requested parent index … care and benchmarking should be used when messing with this function."
The PR description says the fast-path "doesn't apply to the index-by-parentIndex case", but when the field is dense sorted[index].parentIndex === index, so the array index of the hit equals index — one read instead of log n iterations. removeChild runs from considerDispose on every anchor free, which is exactly the path CraigMacomber tuned in #22717 ("I started with a more generic binary search which returned the index, but this was so dominated by small data sizes (0 and 1 are the common array sizes here) that it had a lot of overhead…", 2024-10-03).
Suggest adding the guess fast-path before the binary loop and updating binaryFind's privateRemarks to also cover the new index variant. Re-run the BubbleBench-style anchor benchmarks #22717 used to confirm parity.
| } | |
| function binaryFindIndex(sorted: readonly PathNode[], index: number): number { | |
| const guess = sorted[index]; | |
| if (guess !== undefined && guess.parentIndex === index) { | |
| return index; | |
| } | |
| let min = 0; | |
| let max = sorted.length; | |
| while (min < max) { | |
| const mid = (min + max) >>> 1; | |
| const current = sorted[mid].parentIndex; | |
| if (current === index) { | |
| return mid; | |
| } else if (current < index) { | |
| min = mid + 1; | |
| } else { | |
| max = mid; | |
| } | |
| } | |
| return -1; | |
| } |
| } | ||
| } | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
Deep Review: Same missing fast-path as the anchorSet.ts copy. When the field is dense sorted[index].parentIndex === index, so the answer is index — one read instead of log n iterations. The sibling binaryFind (and binaryFind in anchorSet.ts) opens with exactly this guess to avoid the loop on the common dense-array shape that #22717's benchmarks were tuned against. Suggest mirroring the fast-path here and re-running the relevant benchmark to confirm parity.
| } | |
| function binaryFindIndex<TData>(sorted: readonly SparseNode<TData>[], index: number): number { | |
| const guess = sorted[index]; | |
| if (guess !== undefined && guess.parentIndex === index) { | |
| return index; | |
| } | |
| let min = 0; | |
| let max = sorted.length; | |
| while (min < max) { | |
| const mid = (min + max) >>> 1; | |
| const current = sorted[mid].parentIndex; | |
| if (current === index) { | |
| return mid; | |
| } else if (current < index) { | |
| min = mid + 1; | |
| } else { | |
| max = mid; | |
| } | |
| } | |
| return -1; | |
| } |
| const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex); | ||
| assert(childIndex !== -1, 0x35c /* child must be parented to be removed */); | ||
| field?.splice(childIndex, 1); | ||
| if (field?.length === 0) { |
There was a problem hiding this comment.
Deep Review: The previous field?.indexOf(child) implicitly cross-checked object identity (it returns -1 if the exact child reference is absent). The replacement binaryFindIndex(field, child.parentIndex) returns a hit whenever some node with that parentIndex exists — if the sort/uniqueness invariant ever drifted (duplicate parentIndex, stale child reference), the PR would silently splice the wrong node.
Extend the assert to also verify identity. Zero runtime cost in non-debug builds; restores the implicit invariant indexOf provided.
| if (field?.length === 0) { | |
| assert(childIndex !== -1 && field![childIndex] === child, 0x35c /* child must be in the field */); |
| // Note that this is the index in the list of child paths, not the index within the field. | ||
| const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex); | ||
| assert(childIndex !== -1, 0x4a5 /* child must be parented to be removed */); | ||
| field?.splice(childIndex, 1); |
There was a problem hiding this comment.
Deep Review: Same identity-loss concern as the anchorSet.ts assert. The prior field?.indexOf(child) returned -1 unless the exact child reference was present; binaryFindIndex(field, child.parentIndex) matches any node with that parentIndex. If the sorted-by-parentIndex invariant ever drifted (duplicate, stale reference), this would splice the wrong node silently.
| field?.splice(childIndex, 1); | |
| assert(childIndex !== -1 && field![childIndex] === child, 0x4a5 /* child must be in the field */); |
| public getLocalCommitIndexByRevision( | ||
| branchId: BranchId, | ||
| revision: RevisionTag, | ||
| ): number | undefined { |
There was a problem hiding this comment.
Deep Review: getLocalCommitIndexByRevision(branchId, revision): number | undefined leaks the storage shape — callers need to know the result is an index into localCommits and must reason about the new headOffset index space the side-map maintains. The only call site (sharedTreeCore.ts:558-566) immediately does localCommits.slice(revisionIndex), so the index is a means to a suffix slice.
Both solution-space reviewers independently designed a getLocalCommitsFrom(branchId, revision): readonly GraphCommit<TChangeset>[] returning the suffix slice (or undefined/empty). That matches the symmetry of the existing getLocalCommits(branchId) accessor at editManager.ts:422-425, removes the undefined-vs--1 sentinel split the PR's own "Risks and tradeoffs" calls out, and keeps EditManager's storage substitutable.
The fair counter is that getLocalCommits() already exposes the array shape — but it does not expose headOffset, which this accessor effectively does. If you'd rather keep the index API for callers we haven't seen yet, a one-line // returns an index into getLocalCommits(branchId); undefined if not present comment here would be enough to ward off the next reader.
| * @param index - parentIndex being looked for. | ||
| * @returns array index of the child with the requested parentIndex, or -1 if not found. | ||
| */ | ||
| function binaryFindIndex(sorted: readonly PathNode[], index: number): number { |
There was a problem hiding this comment.
Deep Review: This binaryFindIndex and the copy in sparseTree.ts:150-172 are byte-identical apart from the element type parameter (PathNode vs SparseNode<TData>), and both rely only on .parentIndex — a property on both via UpPath. Unlike binaryFind, this helper has no tuned fast-path or benchmark surface, so the "don't extract benchmarked code" rationale in the PR description does not apply here. Drift risk is real: a fast-path or bug-fix added to one copy (see the adjacent thread asking for the dense-array guess) will silently miss the other.
Suggest extracting function binaryFindIndexByParentIndex<T extends { parentIndex: number }>(sorted: readonly T[], index: number): number to a shared module alongside UpPath (or core/tree/) and calling it from both removeChild sites. If the fast-path lands, the shared helper carries it once — one code change, two beneficiaries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deep ReviewReviewed commit Readiness: 7/10 — ALMOST READY Two contained perf-prep changes — binary-search Path to Ready
Context for Reviewers
For human reviewer
Review history (2 prior reviews)
|
Description
Bundles two tree structural changes:
Binary-search by
parentIndexinremoveChild(anchorSet.tsandsparseTree.ts) — replaces the linearindexOfwith binary search bychild.parentIndex(). The field arrays are already kept sorted byparentIndex(the existingbinaryFindhelpers on the same structs rely on it). The existing TODOs at both sites are removed.Map<RevisionTag, number>side-map onSharedBranch(editManager.ts+sharedTreeCore.ts) — sosharedTreeCore.ts:getEnrichedCommit'slocalCommits.findIndex(c => c.revision === revision)becomes O(1). ThelocalCommitsarray andgetLocalCommits()shape are unchanged. Maintenance uses a head-offset counter on shift so other map entries don't need to be touched (avoids O(m) decrement-all per ACK).A local
binaryFindIndexhelper is added in each removeChild file. The existing benchmarkedbinaryFindis left untouched — it returns the element (not the index) and has a tuned dense-array fast-path that doesn't apply to the index-by-parentIndex case.Why
removeChildsites called this out: "should do more optimized search (ex: binary search or better) usingchild.parentIndex()." This is invoked from anchor lifecycle (considerDispose) — every anchor free in workloads with many siblings paid the scan.findIndexper resubmit was O(n) → O(n²) per full reSubmit. The side-map keeps the lookup O(1) without amortizing the shift cost over every other entry.Notes
getLocalCommits()return shape unchanged (readonly GraphCommit[]). The sort-by-parentIndexinvariant on anchor fields is already maintained by every field mutator (getOrCreateChildsorts after push;coupleNodes/decoupleNodes/offsetChildrenpreserve order). Pure perf prep — no squash logic, no tests touched, no api-report changes.