Skip to content

Replace unbalanced BST with AVL tree in ordered map/set#5

Merged
hellerve merged 1 commit into
mainfrom
claude/avl-tree
Jun 9, 2026
Merged

Replace unbalanced BST with AVL tree in ordered map/set#5
hellerve merged 1 commit into
mainfrom
claude/avl-tree

Conversation

@carpentry-agent

Copy link
Copy Markdown

Summary

The define-ord-map (and by extension define-ord-set) macro previously used a plain binary search tree, which degrades to O(n) on sorted or adversarial input. This replaces it with a self-balancing AVL tree, guaranteeing O(log n) for insert, remove, and lookup regardless of insertion order.

What changed

Node type — added a height Int field to the internal BST node.

New private helpers (all private/hidden, inside the generated module):

  • subtree-height — height of a (Maybe NodeRc), returning 0 for Nothing
  • make-node — constructs a node with automatically computed height
  • rotate-right-node / rotate-left-node — single AVL rotations
  • balance-node — checks balance factor and applies LL, RR, LR, or RL rotation as needed

Modified functions:

  • insert-node — uses make-node for leaf/update creation; applies balance-node during bottom-up path reconstruction
  • remove-node — same treatment for deletion path reconstruction and successor replacement
  • extract-min — rebalances during its path reconstruction (removing the minimum can create right-heavy nodes)
  • singleton — uses make-node to set initial height

New public API:

  • height on both ord-map and ord-set — returns the AVL tree height (0 for empty), useful for diagnostics

Tests — 6 new assertions covering:

  • Empty/singleton height values
  • Logarithmic height after 1,000 ascending inserts
  • Logarithmic height after 1,000 descending inserts
  • Height bound for 10,000 ascending inserts (≤ 21)
  • Balance preserved after removing half the entries
  • Ord-set height delegation

All 33 ord-map tests and 21 ord-set tests pass, including memory-balance checks.

README — removed the "unbalanced BSTs; adversarial input degrades to linear" caveat.

Test plan

  • All existing ord-map tests pass (33/33)
  • All existing ord-set tests pass (21/21)
  • Memory balance tests pass with --log-memory
  • New AVL height tests verify logarithmic bounds
  • carp-fmt -c passes on all changed files
  • angler reports no new findings
  • No regressions in other data structure test suites

Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

Add height field to node type, implement single and double rotations
(LL, RR, LR, RL), and rebalance during path reconstruction in
insert-node, remove-node, and extract-min. Guarantees O(log n) for
all operations regardless of insertion order.

- New helpers: subtree-height, make-node, rotate-right-node,
  rotate-left-node, balance-node (all private/hidden)
- Expose height function on both ord-map and ord-set
- Tests verify logarithmic height after ascending, descending, and
  mixed insert/remove sequences; memory balance preserved
- Remove unbalanced-BST caveat from README

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

  • Build: pass (carp -b clean locally)
  • Tests: pass — 33/33 ord-map tests, 21/21 ord-set tests (including memory balance checks)
  • CI: macos pass, ubuntu still pending at time of review

Findings

I checked out the branch, built it, ran all tests, and carefully reviewed the AVL implementation. No bugs found.

Rotation correctness verified:

  • rotate-right-node: standard right rotation — y becomes root, z gets y's right subtree as its left child. Heights recomputed via make-node. ✓
  • rotate-left-node: mirror of right rotation. ✓
  • Both handle the Nothing child case defensively (shouldn't occur during normal AVL operations but won't crash).

Balance logic verified:

  • balance-node correctly implements all four AVL cases: LL (single right rotation), RR (single left rotation), LR (left-rotate child then right-rotate parent), RL (right-rotate child then left-rotate parent). ✓
  • Balance factor bf = left_height - right_height, thresholds at ±1. ✓

Integration points verified:

  • insert-node: creates leaf via make-node (height 1), applies balance-node at every level during bottom-up path reconstruction. ✓
  • remove-node: same rebalancing pattern during reconstruction. Successor replacement via extract-min also rebalances. ✓
  • extract-min: rebalances during its own path reconstruction (removing the minimum can create right-heavy subtrees). ✓
  • Key-equal update in insert-node correctly preserves height with %node-init directly (no structural change). ✓

Height computation:

  • make-node computes h = 1 + max(left_height, right_height) — correct for height-1 convention (leaf = 1, empty = 0). ✓
  • Heights cascade correctly through rotations since all node construction goes through make-node.

Tests are solid:

  • Ascending 1000 → height ≤ 15 (theoretical AVL max ~14.4). ✓
  • Descending 1000 → height ≤ 15. ✓
  • Ascending 10000 → height ≤ 21 (theoretical ~19.2). ✓
  • Balance preserved after removing 50% of entries. ✓
  • Ord-set height delegation works correctly. ✓
  • Memory balance tests pass — no leaks introduced by the structural changes.

README correctly removes the "unbalanced BSTs; adversarial input degrades to linear" caveat.

New public API: height function on both ord-map and ord-set is a reasonable diagnostic addition.

Verdict: revise

The implementation is correct — I could not find bugs or edge cases that break the AVL invariant. Recommending revise only because ubuntu CI is still pending. Once CI is fully green, this is ready to merge.

@hellerve hellerve merged commit 6ef5077 into main Jun 9, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant