Add set algebra operations (union, intersection, difference) by carpentry-agent[bot] · Pull Request #1 · carpentry-org/persistent

carpentry-agent · 2026-05-20T05:50:39Z

Add union, intersection, and difference to both define-ord-set and define-hash-set macros.

What changed

Three new functions are generated for every set type:

union — returns a set containing all elements from either input set
intersection — returns a set containing only elements present in both input sets
difference — returns a set containing elements in the first set but not the second

All three take two set references and return a new set, preserving the persistent/immutable semantics of the library.

Implementation

Each operation is built on reduce, avoiding any new C code or changes to the underlying map layer:

union: folds over one set, inserting each element into a copy of the other
intersection/difference: fold over the first set, checking membership in the second via contains?

The ord-set and hash-set versions differ only in how contains? is called (by value vs. by reference), matching the existing API conventions.

Tests

Seven new test cases per set type covering:

Union merging elements from both sets
Union with empty set is identity
Intersection keeping only shared elements
Intersection of disjoint sets is empty
Difference removing elements in second set
Difference with empty set is identity
Difference with self is empty

All ord-set tests pass. Hash-set tests cannot be verified due to a pre-existing UBSan crash in the hash-map/hash-set infrastructure (reproducible on main without these changes).

Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

hellerve

Review: set algebra operations

CI failure analysis

The CI failure on both macOS and Ubuntu is pre-existing and unrelated to this PR. The crash is a UBSan null-pointer dereference in the hash-map infrastructure (out/main.c:13651:20: runtime error: member access within null pointer of type 'Lambda'). It reproduces identically on main — the hash-map test crashes before the hash-set test even runs, and validate.sh uses set -euo pipefail so it aborts immediately.

Verified locally:

main hash-map test: same UBSan crash
main hash-set test: same UBSan crash (hash-set is backed by hash-map)
PR branch hash-map test file: zero diff from main

Local build/test results

Ord-set: all 14 tests pass (7 existing + 7 new), including memory balance check with --log-memory
Hash-set: UBSan crash before any tests execute (same as main)

Code correctness

The implementation is clean and correct.

union (reduce &(fn [acc x] (insert x &acc)) @a-ref b-ref): copies a, folds insert over b's elements. Duplicates handled by insert idempotency. Ownership is correct — @a-ref produces an owned copy for the initial accumulator, each iteration consumes acc via &acc borrow and replaces it with the insert return value.

intersection (reduce &(fn [acc x] (if (contains? ... b-ref) (insert x &acc) acc)) (empty) a-ref): iterates a, inserts into accumulator only if b also contains the element. The if branches correctly handle ownership — both branches return owned sets (either the new insert result or the unchanged acc).

difference: same pattern as intersection with inverted condition. Correct.

Ord-set vs hash-set contains? convention: ord-set contains? takes value by-value, so the lambda uses @&x to copy x for the membership check while preserving the original for potential insert. Hash-set contains? takes a reference, so the lambda uses &x to borrow without copying. Both are correct for their respective APIs.

Edge case coverage

The 7 new tests per set type cover the important cases well:

Union with overlapping sets (dedup)
Union with empty (identity)
Intersection with partial overlap
Intersection of disjoint sets (empty result)
Difference with partial overlap
Difference with empty (identity)
Difference with self (empty result)

Not tested but not a blocker: union(A, A) = A and commutativity union(A, B) = union(B, A). Both follow from insert idempotency which is already tested separately.

Formatting noise

The diff is ~4000 lines but the actual new code is ~48 lines (6 functions at ~8 lines each) plus ~120 lines of tests. The rest is formatter reformatting of the entire persistent.carp file. This makes the diff hard to review on GitHub but is harmless — it is purely cosmetic (indentation normalization, line wrapping). Worth noting for future PRs that reformatting should ideally be a separate commit.

No CHANGELOG

The repo has no CHANGELOG file, so nothing to update.

Verdict

Approve. The code is correct, well-tested, follows existing patterns, and the CI failure is pre-existing. The only nit is the large formatting diff — a future cleanup could split that into its own commit.

Add union, intersection, and difference to set macros

ba05a8e

hellerve approved these changes May 20, 2026

View reviewed changes

hellerve merged commit 46ed023 into main May 21, 2026
0 of 2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add set algebra operations (union, intersection, difference)#1

Add set algebra operations (union, intersection, difference)#1
hellerve merged 1 commit into
mainfrom
claude/set-algebra

carpentry-agent Bot commented May 20, 2026

Uh oh!

hellerve left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carpentry-agent Bot commented May 20, 2026

What changed

Implementation

Tests

Uh oh!

hellerve left a comment

Choose a reason for hiding this comment

Review: set algebra operations

CI failure analysis

Local build/test results

Code correctness

Edge case coverage

Formatting noise

No CHANGELOG

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant