Typed LayoutId, reader.layout package, vortex.zoned compat#194

Merged: dfa1 merged 12 commits into main from layout-id · Jul 4, 2026

@dfa1 dfa1 commented Jul 4, 2026

Owner

What

Follow-up to #193, completing the typed-id arc on the layout side:

  1. 7df3a0db: core.model.LayoutId is a sealed interface with the same WellKnown/Custom shape as EncodingId. Layouts stay open: the Rust team's guidance (vortex-data/vortex#8250, "New third-party implementation: vortex-java") and the reference implementation both treat layouts as runtime-pluggable. Layout's misnamed String encodingId component becomes LayoutId layoutId; unknown layouts keep failing loudly (Rust's default; there is no allowUnknown for layouts).
  2. b08ace79: Layout and ZonedStatsSchema move to the new reader.layout package, mirroring reader.decode and giving the eventual LayoutDecoder SPI a landing zone. FlatSegmentDecoder deliberately stays in the reader root (its only callers are there; moving it would force it back to public). Pitest targetClasses FQN updated.
  3. 7588aa31: UnknownArray.encodingId becomes a typed EncodingId (the String was a fossil from the closed-enum era).
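
For readers unfamiliar with the WellKnown/Custom shape, here is a minimal sketch of what such a sealed id could look like. It is not the project's code: the `wire()` accessor, the `parse` helper, and the choice of an enum for `WellKnown` are assumptions, and only the wire strings that appear in this PR are listed.

```java
import java.util.Objects;

// Illustrative sketch only: wire() and parse() are assumed names, not the project's API.
final class LayoutIdSketch {

    /** A typed layout id: either a well-known constant or an open-ended custom id. */
    sealed interface LayoutId permits WellKnown, Custom {

        /** The string form stored in the file footer. */
        String wire();

        /** Maps a wire string to a WellKnown constant, falling back to Custom. */
        static LayoutId parse(String wire) {
            Objects.requireNonNull(wire, "wire");
            for (WellKnown known : WellKnown.values()) {
                if (known.wire().equals(wire)) {
                    return known;
                }
            }
            return new Custom(wire);
        }
    }

    /** Only ids whose wire strings appear in this PR; the other constants are omitted. */
    enum WellKnown implements LayoutId {
        FLAT("vortex.flat"),
        ZONED("vortex.zoned"),
        STATS("vortex.stats");

        private final String wire;

        WellKnown(String wire) {
            this.wire = wire;
        }

        @Override
        public String wire() {
            return wire;
        }
    }

    /** Any id the reader has no constant for, such as a third-party layout. */
    record Custom(String wire) implements LayoutId {
        Custom {
            Objects.requireNonNull(wire, "wire");
        }
    }

    public static void main(String[] args) {
        System.out.println(LayoutId.parse("vortex.zoned"));
        System.out.println(LayoutId.parse("acme.tiled")); // hypothetical third-party id
    }
}
```

Because the interface is sealed, a `switch` over a `LayoutId` can be checked for exhaustiveness, while `Custom` keeps the id space open for layouts the reader has never seen.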

Compat fix (found by checking the Rust reference)

Rust renamed the zone-map layout id to vortex.zoned, keeping vortex.stats as a legacy alias. Our reader only knew vortex.stats, so files from current Rust writers would fail layout dispatch. Now both ids route through the zoned path (parse and zone-map pruning — pinned by a parameterized test over both aliases). The writer keeps emitting vortex.stats, which old and new Rust readers both accept; the integration oracle confirms byte-identical output.
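
The alias handling described above can be sketched roughly as follows. The class and method names are hypothetical; only the two wire strings and the "read both, write the legacy id" policy come from this PR.

```java
import java.util.Set;

// Illustrative sketch only: class and method names are assumptions.
final class ZonedAliasSketch {

    static final String ZONED = "vortex.zoned"; // id emitted by current Rust writers
    static final String STATS = "vortex.stats"; // legacy alias, still accepted by Rust

    private static final Set<String> ZONED_IDS = Set.of(ZONED, STATS);

    /** Reader side: both ids take the zoned path (parse and zone-map pruning). */
    static boolean isZoned(String layoutId) {
        return layoutId != null && ZONED_IDS.contains(layoutId);
    }

    /** Writer side: keep emitting the legacy id, which old and new readers accept. */
    static String writerId() {
        return STATS;
    }

    public static void main(String[] args) {
        System.out.println(isZoned(ZONED));
        System.out.println(isZoned(STATS));
        System.out.println(writerId());
    }
}
```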

Verification

  • ./mvnw verify green after every commit — all 15 modules including the failsafe Rust-interop suite.
  • ./mvnw javadoc:javadoc -pl core — zero output.
  • Adversarial review pass: no blockers, no should-fixes; its one cosmetic nit (import grouping) taken.

🤖 Generated with Claude Code

dfa1 and others added 8 commits July 4, 2026 11:43
LayoutId mirrors the sealed EncodingId shape — WellKnown constants
(FLAT, CHUNKED, STRUCT, ZONED, STATS, DICT) plus Custom — because
layouts are runtime-pluggable in the Rust reference (two separate
footer spec namespaces sharing the string wire form; vortex.flat is
layout-only). Layout's misnamed String encodingId component becomes
LayoutId layoutId; unknown layouts still fail loudly (Rust default,
no allowUnknown for layouts), now with a typed id in the error.

Compat fix uncovered by the reference check: Rust renamed the
zone-map layout id to vortex.zoned, keeping vortex.stats as legacy
alias — the reader now routes BOTH through the zoned path, so files
from current Rust writers scan and prune correctly. The writer
keeps emitting vortex.stats, which old and new Rust readers accept;
integration oracle confirms byte-identical output.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Layout and ZonedStatsSchema get their own package, mirroring
reader.decode on the encoding side — and giving the future
LayoutDecoder SPI a landing zone. FlatSegmentDecoder stays in the
reader root: its only callers live there and moving it would force
it back to public. Pitest targetClasses updated for the new
Layout FQN.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
String encodingId predated the sealed EncodingId — a closed enum
could not represent an unknown id, so the raw string was the only
option. Now the component is typed: a Custom, or a WellKnown whose
decoder is not registered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
LayoutDecoder + LayoutRegistry in reader.layout mirror the
ReadRegistry idiom (builder final-freeze, string-keyed dispatch,
duplicate registration throws, no service file — programmatic
registration like ExtensionDecoder). The four built-ins move out
of ScanIterator verbatim: Flat, Chunked, Zoned (claims both the
canonical vortex.zoned and legacy vortex.stats ids via the
layoutIds() set), Dict. ScanIterator.decodeLayout is now one
registry call; zone-map pruning and chunk planning keep inspecting
built-ins only — the SPI covers full-column subtree decode.

Wired end-to-end per the no-decorative-flags rule: VortexHandle
gains layoutRegistry(), both readers take open(..., LayoutRegistry)
overloads, and a scan through a custom registry is proven by test.
Unknown layouts still fail loudly (Rust default). Reverses the
"Layout is a fixed set, no SPI" design decision — the reference
implementation treats layouts as runtime-pluggable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review findings on the LayoutDecoder SPI:

- ChunkedLayoutDecoder decoded its leaves via a direct static
  FlatLayoutDecoder call, silently bypassing the registry — a custom
  decoder registered for a leaf id was not honored under a chunked
  parent, making the SPI partially decorative. Leaves now route
  through ctx.decodeChild; the end-to-end test asserts the flat
  delegator itself fires during a real scan. Integration oracle
  confirms identical behavior for built-ins (dict leaves under
  chunked included).
- ScanLayoutContext.segmentSpec and DictLayoutDecoder child access
  now guard malformed indexes/arity with VortexException instead of
  leaking IndexOutOfBoundsException from untrusted input.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
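
A rough illustration of the untrusted-input guard from the second finding, assuming a simple list of children. The `VortexException` below is a local stand-in for the project's exception, and the helper names are hypothetical.

```java
import java.util.List;

// Illustrative sketch only: helper names are assumptions.
final class UntrustedIndexSketch {

    /** Local stand-in for the project's VortexException. */
    static final class VortexException extends RuntimeException {
        VortexException(String message) {
            super(message);
        }
    }

    /** Returns the child at index, rejecting out-of-range values read from the file. */
    static <T> T childAt(List<T> children, int index) {
        if (index < 0 || index >= children.size()) {
            throw new VortexException(
                "child index " + index + " out of range for " + children.size() + " children");
        }
        return children.get(index);
    }

    /** Rejects a layout whose child count does not match what its decoder expects. */
    static void requireArity(List<?> children, int expected, String layoutName) {
        if (children.size() != expected) {
            throw new VortexException(
                layoutName + " expects " + expected + " children, found " + children.size());
        }
    }

    public static void main(String[] args) {
        List<String> children = List.of("values", "codes");
        requireArity(children, 2, "dict");
        System.out.println(childAt(children, 1));
    }
}
```

The point of the explicit check is that a malformed file surfaces as a domain error the caller already handles, rather than as an `IndexOutOfBoundsException` from deep inside the decoder.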
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
dfa1 commented Jul 4, 2026

Owner Author

Extended the PR with the LayoutDecoder SPI (fc488d0 + dd196f1): layout decode is now pluggable via LayoutRegistry mirroring the ReadRegistry idiom, with the four built-ins extracted verbatim from ScanIterator. Review pass verified extraction fidelity line-by-line; its findings (registry bypass under chunked parents, two untrusted-input guards) are fixed in dd196f1. Reverses the "Layout is a fixed set, no SPI" design decision per the Rust team's runtime-pluggability guidance. Full verify green after every commit, integration oracle included.
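
As a rough picture of the registry idiom (builder that freezes on build, duplicate registration throws, one decoder may claim several ids, unknown ids fail loudly), here is a self-contained sketch. It is keyed by wire string for brevity; the `Named` record, `lookup`, and the exception types are assumptions rather than the project's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: everything except the LayoutDecoder, LayoutRegistry
// and layoutIds() names is an assumption.
final class LayoutRegistrySketch {

    /** A decoder declares every wire id it handles. */
    interface LayoutDecoder {
        Set<String> layoutIds();
    }

    /** Trivial decoder used for the demo. */
    record Named(String name, Set<String> layoutIds) implements LayoutDecoder {}

    static final class LayoutRegistry {
        private final Map<String, LayoutDecoder> byId;

        private LayoutRegistry(Map<String, LayoutDecoder> byId) {
            this.byId = Map.copyOf(byId); // frozen: no registration after build()
        }

        static Builder builder() {
            return new Builder();
        }

        /** Unknown layouts fail loudly instead of being skipped. */
        LayoutDecoder lookup(String layoutId) {
            LayoutDecoder decoder = byId.get(layoutId);
            if (decoder == null) {
                throw new IllegalStateException("unknown layout: " + layoutId);
            }
            return decoder;
        }

        static final class Builder {
            private final Map<String, LayoutDecoder> byId = new HashMap<>();

            Builder register(LayoutDecoder decoder) {
                for (String id : decoder.layoutIds()) {
                    if (byId.putIfAbsent(id, decoder) != null) {
                        throw new IllegalArgumentException("duplicate layout id: " + id);
                    }
                }
                return this;
            }

            LayoutRegistry build() {
                return new LayoutRegistry(byId);
            }
        }
    }

    public static void main(String[] args) {
        LayoutDecoder zoned = new Named("zoned", Set.of("vortex.zoned", "vortex.stats"));
        LayoutRegistry registry = LayoutRegistry.builder().register(zoned).build();
        System.out.println(registry.lookup("vortex.stats") == registry.lookup("vortex.zoned"));
    }
}
```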

dfa1 and others added 4 commits July 4, 2026 13:48
Renamed after its exact Rust counterpart (SerializedArray in
vortex-array/src/serde.rs: "a parsed but not-yet-decoded
deserialized array" whose decode() resolves the encoding id against
the spec table and consults the registry). "Flat" is a layout
concept and "segment" a byte-range concept — the unit this class
decodes is one serialized array message. VortexHandle's
decodeFlatSegment follows as decodeSegment, next to rawSegment.
Pitest target FQN and living docs updated; released changelog
entries and ADR 0001 stay as written.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ReadRegistry and LayoutRegistry map keys become EncodingId/LayoutId:
since ArrayNode and Layout carry parsed typed ids, string-keyed
dispatch just round-tripped through the wire form. Strings at the
boundary, types inside. TreeMap orders by wire string via
comparator — the sealed ids are not Comparable, and a Custom key
must not throw.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
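
The comparator point can be shown with a small sketch. A `TreeMap` with natural ordering would throw `ClassCastException` as soon as a key that does not implement `Comparable` (here a record) is inserted; ordering by wire string avoids that. The id types below are simplified stand-ins, and `wire()` and `newIdMap` are assumed names.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Illustrative sketch only: simplified id types, assumed accessor and factory names.
final class TypedKeySketch {

    sealed interface LayoutId permits WellKnown, Custom {
        String wire();
    }

    enum WellKnown implements LayoutId {
        FLAT("vortex.flat"),
        ZONED("vortex.zoned");

        private final String wire;

        WellKnown(String wire) {
            this.wire = wire;
        }

        @Override
        public String wire() {
            return wire;
        }
    }

    /** Records are not Comparable, so natural ordering would fail for this key. */
    record Custom(String wire) implements LayoutId {}

    /** Typed keys, deterministic iteration order taken from the wire string. */
    static <V> TreeMap<LayoutId, V> newIdMap() {
        Comparator<LayoutId> byWire = Comparator.comparing(LayoutId::wire);
        return new TreeMap<>(byWire);
    }

    public static void main(String[] args) {
        TreeMap<LayoutId, String> decoders = newIdMap();
        decoders.put(WellKnown.ZONED, "zoned decoder");
        decoders.put(new Custom("acme.tiled"), "third-party decoder"); // hypothetical id
        decoders.put(WellKnown.FLAT, "flat decoder");
        decoders.keySet().forEach(id -> System.out.println(id.wire()));
    }
}
```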
Relative links rewritten: docs pages point at ../adr/, ADR upward
references drop one level, ADR links into docs/ gain the prefix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@dfa1 dfa1 merged commit 8012c13 into main Jul 4, 2026
6 checks passed
@dfa1 dfa1 deleted the layout-id branch July 4, 2026 11:59