fix(java): expose updatedFragmentOffsets on Update operation for RewriteColumns#6748
Open
jerryjch wants to merge 1 commit into
Open
fix(java): expose updatedFragmentOffsets on Update operation for RewriteColumns#6748jerryjch wants to merge 1 commit into
jerryjch wants to merge 1 commit into
Conversation
…iteColumns commits
c116d01 to
456e6dc
Compare
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650 (fix: propagate update_columns offsets and partial last_updated for
RewriteColumns). Required by lance-spark#528.
Update.java: addsMap<Long, long[]> updatedFragmentOffsetsfield, 7-arg constructor,accessor
updatedFragmentOffsets(), andBuilder.updatedFragmentOffsets(...)setter.Defaults to
Collections.emptyMap().java/lance-jni/src/transaction.rs— two JNI directions updated:FromJava (
convert_to_rust_operationfor"Update"): readsupdatedFragmentOffsets()from the Java object, deserializes
Map<Long, long[]>entries intoHashMap<u64, RoaringBitmap>, and setsupdated_fragment_offsets: Some(UpdatedFragmentOffsets(..))when non-empty,Nonewhen empty or null.IntoJava (
convert_to_java_operation_innerforOperation::Update): stops ignoringupdated_fragment_offsets(wasupdated_fragment_offsets: _). SerializesOption<UpdatedFragmentOffsets>to a JavaHashMap<Long, long[]>— empty map whenNone, onelong[]per fragment whenSome. Uses per-iterationwith_local_frame(4, ..)with direct
HashMap.putviacall_methodto bound local-ref growth and avoidJMapwrapper lifetime conflicts. Updates the
new_objectconstructor from the 6-arg form tothe 7-arg form
(..Ljava/util/Optional;Ljava/util/Map;)Vto match the Java class.Update.javaequals/hashCode:equalsusesoffsetMapsEqualhelper thatdeep-compares
long[]values viaArrays.equalsrather than reference equality.hashCodeadded; per-entry contribution isLong.hashCode(key) ^ Arrays.hashCode(value),summed across entries so the result is insertion-order-independent.
Background
PR #6650 added
updated_fragment_offsetson the RustOperation::Update(proto field 9),build_manifestpartial refresh logic, andFragmentUpdateResult.getUpdatedRowOffsets().Two gaps remained:
The Java
Updateclass had no field for these offsets andconvert_to_rust_operationalways set
updated_fragment_offsets: None, so the lance-spark commit path(UpdateColumnsBackfillBatchWrite) had no way to pass offsets to Rust and the partial
refresh in
build_manifestcould never activate from a JVM caller.convert_to_java_operation_innerstill used the old 6-arg constructor signature fornew_object. With the 6-arg constructor removed fromUpdate.java(replaced by the7-arg form), any Rust→Java materialization of
Operation::Update(e.g. reading back atransaction) would fail at runtime with
NoSuchMethodError.Implementation notes
with_local_frame(4, ..)per bitmap entry in IntoJava bounds local-ref growth on largeoffset maps.
JMapwas avoided inside the frame because it holds aJObjectwith theouter frame's lifetime, causing borrow-checker conflicts;
call_methodon the outerjava_mapreference is used instead.Vec<i64>for bitmap values is allocated in Rust before entering the frame, so itslifetime is independent of JNI frame scope.
UpdatedFragmentOffsetsadded to thelance::dataset::transactionimport.Why the protobuf field alone is not enough
lance-spark commits by calling
CommitBuilder.execute(transaction), which passes the JavaTransactionobject tonativeCommitToDatasetvia JNI. The JNI handler callsconvert_to_rust_transaction→convert_to_rust_operation, which reflects on the JavaUpdateobject to build the RustOperation::Updatestruct. The protobuf field (field 9)is only used when a Transaction is serialized as a proto blob; it has no effect on the
reflection-based JNI path unless the Java
Updateclass exposes the field and the JNIdeserialization reads it.
Test plan
UpdateTest#testUpdatedFragmentOffsetsRoundTrip— commits anUpdatewith a non-emptyupdatedFragmentOffsetsmap throughCommitBuilder.execute(exercises the FromJava JNIpath), reads the transaction back via
Dataset.readTransaction()(exercises the IntoJavaJNI path), and asserts the offsets match.
Compatibility
Updateconstructor isprivate; all external callers gothrough
Builder. The newupdatedFragmentOffsets(...)setter is optional and defaultsto
Collections.emptyMap(), so existingUpdate.builder()...build()call sites compileand behave identically. The new
updatedFragmentOffsets()accessor is additive.equals/hashCode:equalsnow deep-compareslong[]values viaArrays.equalsinstead of reference equality;hashCodeis added. These are correctnessfixes — the previous
equalswas broken for distinct-but-equal arrays, and omittinghashCodewhenequalsis overridden violated the Java contract. No well-formed callerdepends on the broken behavior.
new_objectcall is updated from the 6-arg tothe 7-arg form in the same PR. Both files must ship together; within that atomic change
there is no compatibility gap.
updated_fragment_offsetsproto field and Rust structfield were already added in fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650.