Add set algebra operations (union, intersection, difference)#1
Conversation
hellerve
left a comment
There was a problem hiding this comment.
Review: set algebra operations
CI failure analysis
The CI failure on both macOS and Ubuntu is pre-existing and unrelated to this PR. The crash is a UBSan null-pointer dereference in the hash-map infrastructure (out/main.c:13651:20: runtime error: member access within null pointer of type 'Lambda'). It reproduces identically on main — the hash-map test crashes before the hash-set test even runs, and validate.sh uses set -euo pipefail so it aborts immediately.
Verified locally:
mainhash-map test: same UBSan crashmainhash-set test: same UBSan crash (hash-set is backed by hash-map)- PR branch hash-map test file: zero diff from main
Local build/test results
- Ord-set: all 14 tests pass (7 existing + 7 new), including memory balance check with
--log-memory - Hash-set: UBSan crash before any tests execute (same as main)
Code correctness
The implementation is clean and correct.
union (reduce &(fn [acc x] (insert x &acc)) @a-ref b-ref): copies a, folds insert over b's elements. Duplicates handled by insert idempotency. Ownership is correct — @a-ref produces an owned copy for the initial accumulator, each iteration consumes acc via &acc borrow and replaces it with the insert return value.
intersection (reduce &(fn [acc x] (if (contains? ... b-ref) (insert x &acc) acc)) (empty) a-ref): iterates a, inserts into accumulator only if b also contains the element. The if branches correctly handle ownership — both branches return owned sets (either the new insert result or the unchanged acc).
difference: same pattern as intersection with inverted condition. Correct.
Ord-set vs hash-set contains? convention: ord-set contains? takes value by-value, so the lambda uses @&x to copy x for the membership check while preserving the original for potential insert. Hash-set contains? takes a reference, so the lambda uses &x to borrow without copying. Both are correct for their respective APIs.
Edge case coverage
The 7 new tests per set type cover the important cases well:
- Union with overlapping sets (dedup)
- Union with empty (identity)
- Intersection with partial overlap
- Intersection of disjoint sets (empty result)
- Difference with partial overlap
- Difference with empty (identity)
- Difference with self (empty result)
Not tested but not a blocker: union(A, A) = A and commutativity union(A, B) = union(B, A). Both follow from insert idempotency which is already tested separately.
Formatting noise
The diff is ~4000 lines but the actual new code is ~48 lines (6 functions at ~8 lines each) plus ~120 lines of tests. The rest is formatter reformatting of the entire persistent.carp file. This makes the diff hard to review on GitHub but is harmless — it is purely cosmetic (indentation normalization, line wrapping). Worth noting for future PRs that reformatting should ideally be a separate commit.
No CHANGELOG
The repo has no CHANGELOG file, so nothing to update.
Verdict
Approve. The code is correct, well-tested, follows existing patterns, and the CI failure is pre-existing. The only nit is the large formatting diff — a future cleanup could split that into its own commit.
Add
union,intersection, anddifferenceto bothdefine-ord-setanddefine-hash-setmacros.What changed
Three new functions are generated for every set type:
union— returns a set containing all elements from either input setintersection— returns a set containing only elements present in both input setsdifference— returns a set containing elements in the first set but not the secondAll three take two set references and return a new set, preserving the persistent/immutable semantics of the library.
Implementation
Each operation is built on
reduce, avoiding any new C code or changes to the underlying map layer:union: folds over one set, inserting each element into a copy of the otherintersection/difference: fold over the first set, checking membership in the second viacontains?The ord-set and hash-set versions differ only in how
contains?is called (by value vs. by reference), matching the existing API conventions.Tests
Seven new test cases per set type covering:
All ord-set tests pass. Hash-set tests cannot be verified due to a pre-existing UBSan crash in the hash-map/hash-set infrastructure (reproducible on
mainwithout these changes).Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.