Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 28 additions & 3 deletions packages/dds/tree/src/core/tree/anchorSet.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1226,9 +1226,8 @@ class PathNode extends ReferenceCountedBase implements AnchorNode {
assert(this.status === Status.Alive, 0x40e /* PathNode must be alive */);
const key = child.parentField;
const field = this.children.get(key);
// TODO: should do more optimized search (ex: binary search or better) using child.parentIndex()
// Note that this is the index in the list of child paths, not the index within the field
const childIndex = field?.indexOf(child) ?? -1;
// Note that this is the index in the list of child paths, not the index within the field.
const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex);
assert(childIndex !== -1, 0x35c /* child must be parented to be removed */);
field?.splice(childIndex, 1);
if (field?.length === 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: The previous field?.indexOf(child) implicitly cross-checked object identity (it returns -1 if the exact child reference is absent). The replacement binaryFindIndex(field, child.parentIndex) returns a hit whenever some node with that parentIndex exists — if the sort/uniqueness invariant ever drifted (duplicate parentIndex, stale child reference), the PR would silently splice the wrong node.

Extend the assert to also verify identity. Zero runtime cost in non-debug builds; restores the implicit invariant indexOf provided.

Suggested change
if (field?.length === 0) {
assert(childIndex !== -1 && field![childIndex] === child, 0x35c /* child must be in the field */);

Expand Down Expand Up @@ -1310,6 +1309,32 @@ function binaryFind(sorted: readonly PathNode[], index: number): PathNode | unde
return undefined; // If we reach here, target is not in array (or array was not sorted)
}

/**
* Find the array index of a child PathNode by its parentIndex using a binary search.
* @param sorted - array of PathNode's sorted by parentIndex.
* @param index - parentIndex being looked for.
* @returns array index of the child with the requested parentIndex, or -1 if not found.
*/
function binaryFindIndex(sorted: readonly PathNode[], index: number): number {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: This binaryFindIndex and the copy in sparseTree.ts:150-172 are byte-identical apart from the element type parameter (PathNode vs SparseNode<TData>), and both rely only on .parentIndex — a property on both via UpPath. Unlike binaryFind, this helper has no tuned fast-path or benchmark surface, so the "don't extract benchmarked code" rationale in the PR description does not apply here. Drift risk is real: a fast-path or bug-fix added to one copy (see the adjacent thread asking for the dense-array guess) will silently miss the other.

Suggest extracting function binaryFindIndexByParentIndex<T extends { parentIndex: number }>(sorted: readonly T[], index: number): number to a shared module alongside UpPath (or core/tree/) and calling it from both removeChild sites. If the fast-path lands, the shared helper carries it once — one code change, two beneficiaries.

// inclusive
let min = 0;
// exclusive
let max = sorted.length;

while (min !== max) {
const mid = Math.floor((min + max) / 2);
const found = sorted[mid]!.parentIndex;
if (found === index) {
return mid;
} else if (found > index) {
max = mid;
} else {
min = mid + 1;
}
}
return -1;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: binaryFindIndex here is a textbook [min, max) search with no fast-path. The sibling binaryFind immediately above (anchorSet.ts:1284-1310) opens with const guess = sorted[index]; if (guess?.parentIndex === index) return guess;, and its privateRemarks (anchorSet.ts:1273-1283) explicitly warns "current usages tends to fully populate the anchor tree leading the correct array index to be the requested parent index … care and benchmarking should be used when messing with this function."

The PR description says the fast-path "doesn't apply to the index-by-parentIndex case", but when the field is dense sorted[index].parentIndex === index, so the array index of the hit equals index — one read instead of log n iterations. removeChild runs from considerDispose on every anchor free, which is exactly the path CraigMacomber tuned in #22717 ("I started with a more generic binary search which returned the index, but this was so dominated by small data sizes (0 and 1 are the common array sizes here) that it had a lot of overhead…", 2024-10-03).

Suggest adding the guess fast-path before the binary loop and updating binaryFind's privateRemarks to also cover the new index variant. Re-run the BubbleBench-style anchor benchmarks #22717 used to confirm parity.

Suggested change
}
function binaryFindIndex(sorted: readonly PathNode[], index: number): number {
const guess = sorted[index];
if (guess !== undefined && guess.parentIndex === index) {
return index;
}
let min = 0;
let max = sorted.length;
while (min < max) {
const mid = (min + max) >>> 1;
const current = sorted[mid].parentIndex;
if (current === index) {
return mid;
} else if (current < index) {
min = mid + 1;
} else {
max = mid;
}
}
return -1;
}


interface BufferedEvent {
node: PathNode;
event: keyof AnchorEvents;
Expand Down
39 changes: 33 additions & 6 deletions packages/dds/tree/src/core/tree/sparseTree.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
* Licensed under the MIT License.
*/

import { assert } from "@fluidframework/core-utils/internal";
import { assert, oob } from "@fluidframework/core-utils/internal";

import type { FieldKey } from "../schema-stored/index.js";

Expand Down Expand Up @@ -99,8 +99,8 @@ export class SparseNode<TData> implements UpPath {
if (field === undefined) {
return undefined;
}
// TODO: should do more optimized search (ex: binary search or better) using index.
return field.find((c) => c.parentIndex === index);
const childIndex = binaryFindIndex(field, index);
return childIndex === -1 ? undefined : field[childIndex];
}

/**
Expand All @@ -111,9 +111,8 @@ export class SparseNode<TData> implements UpPath {
public removeChild(child: SparseNode<TData>): void {
const key = child.parentField;
const field = this.children.get(key);
// TODO: should do more optimized search (ex: binary search or better) using child.parentIndex()
// Note that this is the index in the list of child paths, not the index within the field
const childIndex = field?.indexOf(child) ?? -1;
// Note that this is the index in the list of child paths, not the index within the field.
const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex);
assert(childIndex !== -1, 0x4a5 /* child must be parented to be removed */);
field?.splice(childIndex, 1);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Same identity-loss concern as the anchorSet.ts assert. The prior field?.indexOf(child) returned -1 unless the exact child reference was present; binaryFindIndex(field, child.parentIndex) matches any node with that parentIndex. If the sorted-by-parentIndex invariant ever drifted (duplicate, stale reference), this would splice the wrong node silently.

Suggested change
field?.splice(childIndex, 1);
assert(childIndex !== -1 && field![childIndex] === child, 0x4a5 /* child must be in the field */);

if (field?.length === 0) {
Expand Down Expand Up @@ -145,6 +144,34 @@ export class SparseNode<TData> implements UpPath {
}
}

/**
* Find the array index of a child SparseNode by its parentIndex using a binary search.
* @param sorted - array of SparseNode's sorted by parentIndex.
* @param index - parentIndex being looked for.
* @returns array index of the child with the requested parentIndex, or -1 if not found.
*/
function binaryFindIndex<TData>(sorted: readonly SparseNode<TData>[], index: number): number {
// inclusive
let min = 0;
// exclusive
let max = sorted.length;

while (min !== max) {
const mid = Math.floor((min + max) / 2);
// mid < max <= sorted.length, so this access is always defined.
const midNode = sorted[mid] ?? oob();
const found = midNode.parentIndex;
if (found === index) {
return mid;
} else if (found > index) {
max = mid;
} else {
min = mid + 1;
}
}
return -1;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Same missing fast-path as the anchorSet.ts copy. When the field is dense sorted[index].parentIndex === index, so the answer is index — one read instead of log n iterations. The sibling binaryFind (and binaryFind in anchorSet.ts) opens with exactly this guess to avoid the loop on the common dense-array shape that #22717's benchmarks were tuned against. Suggest mirroring the fast-path here and re-running the relevant benchmark to confirm parity.

Suggested change
}
function binaryFindIndex<TData>(sorted: readonly SparseNode<TData>[], index: number): number {
const guess = sorted[index];
if (guess !== undefined && guess.parentIndex === index) {
return index;
}
let min = 0;
let max = sorted.length;
while (min < max) {
const mid = (min + max) >>> 1;
const current = sorted[mid].parentIndex;
if (current === index) {
return mid;
} else if (current < index) {
min = mid + 1;
} else {
max = mid;
}
}
return -1;
}


export function getDescendant<TData>(
ancestor: SparseNode<TData>,
path: UpPath | undefined,
Expand Down
55 changes: 55 additions & 0 deletions packages/dds/tree/src/shared-tree-core/editManager.ts
Original file line number Diff line number Diff line change
Expand Up @@ -424,6 +424,19 @@ export class EditManager<
return branch.getLocalCommits();
}

/**
* @returns the index of the local commit with the given revision in {@link getLocalCommits} for the given branch,
* or `undefined` if no such commit exists.
* @remarks O(1) alternative to scanning the array returned by {@link getLocalCommits}.
*/
public getLocalCommitIndexByRevision(
branchId: BranchId,
revision: RevisionTag,
): number | undefined {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: getLocalCommitIndexByRevision(branchId, revision): number | undefined leaks the storage shape — callers need to know the result is an index into localCommits and must reason about the new headOffset index space the side-map maintains. The only call site (sharedTreeCore.ts:558-566) immediately does localCommits.slice(revisionIndex), so the index is a means to a suffix slice.

Both solution-space reviewers independently designed a getLocalCommitsFrom(branchId, revision): readonly GraphCommit<TChangeset>[] returning the suffix slice (or undefined/empty). That matches the symmetry of the existing getLocalCommits(branchId) accessor at editManager.ts:422-425, removes the undefined-vs--1 sentinel split the PR's own "Risks and tradeoffs" calls out, and keeps EditManager's storage substitutable.

The fair counter is that getLocalCommits() already exposes the array shape — but it does not expose headOffset, which this accessor effectively does. If you'd rather keep the index API for callers we haven't seen yet, a one-line // returns an index into getLocalCommits(branchId); undefined if not present comment here would be enough to ward off the next reader.

const branch = this.getSharedBranch(branchId);
return branch.getLocalCommitIndexByRevision(revision);
}

/**
* Gets the length of the longest branch maintained by this `EditManager`.
* This may be the length of a peer branch or the local branch.
Expand Down Expand Up @@ -676,6 +689,21 @@ class SharedBranch<TEditor extends ChangeFamilyEditor, TChangeset> {
*/
private readonly localCommits: GraphCommit<TChangeset>[] = [];

/**
* Side-map for O(1) revision -\> {@link localCommits} index lookup.
* @remarks
* Stored values are "absolute" indices that are NOT affected by shifts off the front of {@link localCommits}.
* The actual array index is computed as `value - localCommitsHeadOffset`.
* This avoids having to update every entry in the map when a commit is sequenced (shifted) off the front.
*/
private readonly localCommitsByRevision = new Map<RevisionTag, number>();

/**
* The number of commits that have been shifted off the front of {@link localCommits} since the last bulk rebuild.
* Used together with {@link localCommitsByRevision} to translate absolute indices into current array indices.
*/
private localCommitsHeadOffset = 0;

/**
* Records extra data associated with sequenced commits.
* This does not include an entry for the {@link trunkBase}.
Expand Down Expand Up @@ -709,6 +737,10 @@ class SharedBranch<TEditor extends ChangeFamilyEditor, TChangeset> {
this.localBranch.events.on("afterChange", (event) => {
if (event.type === "append") {
for (const commit of event.newCommits) {
this.localCommitsByRevision.set(
commit.revision,
this.localCommits.length + this.localCommitsHeadOffset,
);
this.localCommits.push(commit);
}
} else {
Expand All @@ -717,6 +749,11 @@ class SharedBranch<TEditor extends ChangeFamilyEditor, TChangeset> {
[this.localBranch.getHead(), this.localCommits],
this.trunk.getHead(),
);
this.localCommitsByRevision.clear();
this.localCommitsHeadOffset = 0;
for (const [i, commit] of this.localCommits.entries()) {
this.localCommitsByRevision.set(commit.revision, i);
}
}
});
}
Expand Down Expand Up @@ -890,6 +927,18 @@ class SharedBranch<TEditor extends ChangeFamilyEditor, TChangeset> {
return this.localCommits;
}

/**
* @returns the index of the local commit with the given revision in {@link getLocalCommits}, or `undefined` if none exists.
* @remarks This is an O(1) alternative to calling `findIndex` on the array returned by {@link getLocalCommits}.
*/
public getLocalCommitIndexByRevision(revision: RevisionTag): number | undefined {
const absoluteIndex = this.localCommitsByRevision.get(revision);
if (absoluteIndex === undefined) {
return undefined;
}
return absoluteIndex - this.localCommitsHeadOffset;
}

/**
* Promote the oldest un-sequenced commit on the local branch to the head of the trunk.
* @param sequenceId - The sequence id of the new trunk commit
Expand Down Expand Up @@ -918,6 +967,12 @@ class SharedBranch<TEditor extends ChangeFamilyEditor, TChangeset> {
firstLocalCommit !== undefined,
0x6b5 /* Received a sequenced change from the local session despite having no local changes */,
);
this.localCommitsByRevision.delete(firstLocalCommit.revision);
this.localCommitsHeadOffset += 1;
if (this.localCommits.length === 0) {
// Reset the offset when the array drains so it cannot grow unboundedly across sessions.
this.localCommitsHeadOffset = 0;
}

const prevSequenceId = this.getCommitSequenceId(this.trunk.getHead().revision);
this.pushGraphCommitToTrunk(sequenceId, firstLocalCommit, sessionId);
Expand Down
10 changes: 8 additions & 2 deletions packages/dds/tree/src/shared-tree-core/sharedTreeCore.ts
Original file line number Diff line number Diff line change
Expand Up @@ -555,8 +555,14 @@ export class SharedTreeCore<TEditor extends ChangeFamilyEditor, TChange>

const getLocalCommits = (): GraphCommit<TChange>[] => {
const localCommits = this.editManager.getLocalCommits(branchId);
const revisionIndex = localCommits.findIndex((c) => c.revision === revision);
assert(revisionIndex >= 0, 0xbdb /* revision must exist in local commits */);
const revisionIndex = this.editManager.getLocalCommitIndexByRevision(
branchId,
revision,
);
assert(
revisionIndex !== undefined,
0xbdb /* revision must exist in local commits */,
);
return localCommits.slice(revisionIndex);
};

Expand Down
32 changes: 32 additions & 0 deletions packages/dds/tree/src/test/core/tree/anchorSet.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -554,6 +554,38 @@ describe("AnchorSet", () => {
assert.equal(listenerFired, true);
});

it("removeChild locates the child by parentIndex in a sparse, non-monotonic field", () => {
// This is a regression test for `PathNode.removeChild`'s binary search.
// When the child array is sparse (so the array index is not equal to the parentIndex)
// and contains children with non-monotonic parentIndex values relative to the array
// indices, the binary search must still locate the correct child to remove.
// The order in which we forget anchors guarantees that no `removeChild` invocation
// hits the trivial case where the child sits at array index 0.
const anchors = new AnchorSet();
// Sparse parentIndex values: array indices 0, 1, 2, 3 hold parentIndex 3, 7, 42, 100.
const anchorAt3 = anchors.track(makePath([fieldFoo, 3]));
const anchorAt7 = anchors.track(makePath([fieldFoo, 7]));
const anchorAt42 = anchors.track(makePath([fieldFoo, 42]));
const anchorAt100 = anchors.track(makePath([fieldFoo, 100]));

// Remove from the middle of the field, exercising a real binary search.
anchors.forget(anchorAt42);
assert.equal(anchors.locate(anchorAt3)?.parentIndex, 3);
assert.equal(anchors.locate(anchorAt7)?.parentIndex, 7);
assert.equal(anchors.locate(anchorAt100)?.parentIndex, 100);

// Remove from the end of the field. The remaining array now holds parentIndex 3 and 7,
// so the search target (100) is past the last child and must terminate correctly.
anchors.forget(anchorAt100);
assert.equal(anchors.locate(anchorAt3)?.parentIndex, 3);
assert.equal(anchors.locate(anchorAt7)?.parentIndex, 7);

// Remove the trailing remaining child, then the leading one.
anchors.forget(anchorAt7);
assert.equal(anchors.locate(anchorAt3)?.parentIndex, 3);
anchors.forget(anchorAt3);
});

// Simple scenario using just anchorSets to validate if cache implementation of the FlexTree.treeStatus api works.
it("AnchorNode cache can be set and retrieved.", () => {
const anchors = new AnchorSet();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -846,6 +846,72 @@ export function testCorrectness(): void {
assert.equal(manager.getLongestBranchLength(), 2);
});
});

describe("getLocalCommitIndexByRevision", () => {
it("returns the correct index after all local commits are sequenced and new commits are appended", () => {
// This is a regression test for the side-map "headOffset drain-reset" branch.
// When all local commits have been sequenced (drained from the front), the head offset
// is reset to zero. New local commits appended afterward must still be locatable by
// revision.
const { manager } = testChangeEditManagerFactory({});
const local1 = applyLocalCommit(manager, [], 1);
const local2 = applyLocalCommit(manager, [1], 2);
assert.equal(manager.getLocalCommitIndexByRevision("main", local1.revision), 0);
assert.equal(manager.getLocalCommitIndexByRevision("main", local2.revision), 1);

// Sequence all local commits, draining the array and triggering the head-offset reset.
manager.addSequencedChanges([local1], local1.sessionId, brand(1), brand(0), "main");
manager.addSequencedChanges([local2], local2.sessionId, brand(2), brand(1), "main");
assert.deepEqual(manager.getLocalCommits("main"), []);
assert.equal(
manager.getLocalCommitIndexByRevision("main", local1.revision),
undefined,
);
assert.equal(
manager.getLocalCommitIndexByRevision("main", local2.revision),
undefined,
);

// Append new local commits. With the head offset reset to 0, the absolute indices
// stored for these commits should equal their actual array indices.
const local3 = applyLocalCommit(manager, [1, 2], 3);
const local4 = applyLocalCommit(manager, [1, 2, 3], 4);
assert.equal(manager.getLocalCommitIndexByRevision("main", local3.revision), 0);
assert.equal(manager.getLocalCommitIndexByRevision("main", local4.revision), 1);
});

it("returns the correct index after a non-append afterChange event rebuilds the side-map", () => {
// This is a regression test for the side-map "clear-and-rebuild" branch.
// A non-append afterChange event (e.g. one triggered by rebasing the local branch
// over incoming peer commits) clears the side-map and rebuilds it from the new
// local commit set. Lookups after such an event must still be accurate.
const { manager } = testChangeEditManagerFactory({});
const local1 = applyLocalCommit(manager, [], 1);
const local2 = applyLocalCommit(manager, [1], 2);
assert.equal(manager.getLocalCommitIndexByRevision("main", local1.revision), 0);
assert.equal(manager.getLocalCommitIndexByRevision("main", local2.revision), 1);

// A peer commit that the local commits must be rebased over triggers a non-append
// afterChange on the local branch.
const peerCommit1 = peerCommit(peer1, [], 10);
manager.addSequencedChanges(
[peerCommit1],
peerCommit1.sessionId,
brand(1),
brand(0),
"main",
);

// After the rebase, the local commits remain (with the same revisions) but the
// side-map has been cleared and rebuilt by the clear-and-rebuild branch.
const localCommitsAfterRebase = manager.getLocalCommits("main");
assert.equal(localCommitsAfterRebase.length, 2);
assert.equal(localCommitsAfterRebase[0].revision, local1.revision);
assert.equal(localCommitsAfterRebase[1].revision, local2.revision);
assert.equal(manager.getLocalCommitIndexByRevision("main", local1.revision), 0);
assert.equal(manager.getLocalCommitIndexByRevision("main", local2.revision), 1);
});
});
});

/**
Expand Down
Loading