Skip to content

Add set algebra operations (union, intersection, difference)#1

Merged
hellerve merged 1 commit into
mainfrom
claude/set-algebra
May 21, 2026
Merged

Add set algebra operations (union, intersection, difference)#1
hellerve merged 1 commit into
mainfrom
claude/set-algebra

Conversation

@carpentry-agent

Copy link
Copy Markdown

Add union, intersection, and difference to both define-ord-set and define-hash-set macros.

What changed

Three new functions are generated for every set type:

  • union — returns a set containing all elements from either input set
  • intersection — returns a set containing only elements present in both input sets
  • difference — returns a set containing elements in the first set but not the second

All three take two set references and return a new set, preserving the persistent/immutable semantics of the library.

Implementation

Each operation is built on reduce, avoiding any new C code or changes to the underlying map layer:

  • union: folds over one set, inserting each element into a copy of the other
  • intersection/difference: fold over the first set, checking membership in the second via contains?

The ord-set and hash-set versions differ only in how contains? is called (by value vs. by reference), matching the existing API conventions.

Tests

Seven new test cases per set type covering:

  • Union merging elements from both sets
  • Union with empty set is identity
  • Intersection keeping only shared elements
  • Intersection of disjoint sets is empty
  • Difference removing elements in second set
  • Difference with empty set is identity
  • Difference with self is empty

All ord-set tests pass. Hash-set tests cannot be verified due to a pre-existing UBSan crash in the hash-map/hash-set infrastructure (reproducible on main without these changes).


Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

@hellerve hellerve left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: set algebra operations

CI failure analysis

The CI failure on both macOS and Ubuntu is pre-existing and unrelated to this PR. The crash is a UBSan null-pointer dereference in the hash-map infrastructure (out/main.c:13651:20: runtime error: member access within null pointer of type 'Lambda'). It reproduces identically on main — the hash-map test crashes before the hash-set test even runs, and validate.sh uses set -euo pipefail so it aborts immediately.

Verified locally:

  • main hash-map test: same UBSan crash
  • main hash-set test: same UBSan crash (hash-set is backed by hash-map)
  • PR branch hash-map test file: zero diff from main

Local build/test results

  • Ord-set: all 14 tests pass (7 existing + 7 new), including memory balance check with --log-memory
  • Hash-set: UBSan crash before any tests execute (same as main)

Code correctness

The implementation is clean and correct.

union (reduce &(fn [acc x] (insert x &acc)) @a-ref b-ref): copies a, folds insert over b's elements. Duplicates handled by insert idempotency. Ownership is correct — @a-ref produces an owned copy for the initial accumulator, each iteration consumes acc via &acc borrow and replaces it with the insert return value.

intersection (reduce &(fn [acc x] (if (contains? ... b-ref) (insert x &acc) acc)) (empty) a-ref): iterates a, inserts into accumulator only if b also contains the element. The if branches correctly handle ownership — both branches return owned sets (either the new insert result or the unchanged acc).

difference: same pattern as intersection with inverted condition. Correct.

Ord-set vs hash-set contains? convention: ord-set contains? takes value by-value, so the lambda uses @&x to copy x for the membership check while preserving the original for potential insert. Hash-set contains? takes a reference, so the lambda uses &x to borrow without copying. Both are correct for their respective APIs.

Edge case coverage

The 7 new tests per set type cover the important cases well:

  • Union with overlapping sets (dedup)
  • Union with empty (identity)
  • Intersection with partial overlap
  • Intersection of disjoint sets (empty result)
  • Difference with partial overlap
  • Difference with empty (identity)
  • Difference with self (empty result)

Not tested but not a blocker: union(A, A) = A and commutativity union(A, B) = union(B, A). Both follow from insert idempotency which is already tested separately.

Formatting noise

The diff is ~4000 lines but the actual new code is ~48 lines (6 functions at ~8 lines each) plus ~120 lines of tests. The rest is formatter reformatting of the entire persistent.carp file. This makes the diff hard to review on GitHub but is harmless — it is purely cosmetic (indentation normalization, line wrapping). Worth noting for future PRs that reformatting should ideally be a separate commit.

No CHANGELOG

The repo has no CHANGELOG file, so nothing to update.

Verdict

Approve. The code is correct, well-tested, follows existing patterns, and the CI failure is pre-existing. The only nit is the large formatting diff — a future cleanup could split that into its own commit.

@hellerve hellerve merged commit 46ed023 into main May 21, 2026
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant