Skip to content

fix(java): expose updatedFragmentOffsets on Update operation for RewriteColumns#6748

Open
jerryjch wants to merge 1 commit into
lance-format:mainfrom
jerryjch:update-with-rewrite-columns-fix-follow-up
Open

fix(java): expose updatedFragmentOffsets on Update operation for RewriteColumns#6748
jerryjch wants to merge 1 commit into
lance-format:mainfrom
jerryjch:update-with-rewrite-columns-fix-follow-up

Conversation

@jerryjch
Copy link
Copy Markdown
Contributor

@jerryjch jerryjch commented May 12, 2026

Summary

  • Follow-up to fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650 (fix: propagate update_columns offsets and partial last_updated for
    RewriteColumns). Required by lance-spark#528.

  • Update.java: adds Map<Long, long[]> updatedFragmentOffsets field, 7-arg constructor,
    accessor updatedFragmentOffsets(), and Builder.updatedFragmentOffsets(...) setter.
    Defaults to Collections.emptyMap().

  • java/lance-jni/src/transaction.rs — two JNI directions updated:

    FromJava (convert_to_rust_operation for "Update"): reads updatedFragmentOffsets()
    from the Java object, deserializes Map<Long, long[]> entries into
    HashMap<u64, RoaringBitmap>, and sets updated_fragment_offsets: Some(UpdatedFragmentOffsets(..)) when non-empty, None when empty or null.

    IntoJava (convert_to_java_operation_inner for Operation::Update): stops ignoring
    updated_fragment_offsets (was updated_fragment_offsets: _). Serializes
    Option<UpdatedFragmentOffsets> to a Java HashMap<Long, long[]> — empty map when
    None, one long[] per fragment when Some. Uses per-iteration with_local_frame(4, ..)
    with direct HashMap.put via call_method to bound local-ref growth and avoid JMap
    wrapper lifetime conflicts. Updates the new_object constructor from the 6-arg form to
    the 7-arg form (..Ljava/util/Optional;Ljava/util/Map;)V to match the Java class.

  • Update.java equals / hashCode: equals uses offsetMapsEqual helper that
    deep-compares long[] values via Arrays.equals rather than reference equality.
    hashCode added; per-entry contribution is Long.hashCode(key) ^ Arrays.hashCode(value),
    summed across entries so the result is insertion-order-independent.

Background

PR #6650 added updated_fragment_offsets on the Rust Operation::Update (proto field 9),
build_manifest partial refresh logic, and FragmentUpdateResult.getUpdatedRowOffsets().
Two gaps remained:

  1. The Java Update class had no field for these offsets and convert_to_rust_operation
    always set updated_fragment_offsets: None, so the lance-spark commit path
    (UpdateColumnsBackfillBatchWrite) had no way to pass offsets to Rust and the partial
    refresh in build_manifest could never activate from a JVM caller.

  2. convert_to_java_operation_inner still used the old 6-arg constructor signature for
    new_object. With the 6-arg constructor removed from Update.java (replaced by the
    7-arg form), any Rust→Java materialization of Operation::Update (e.g. reading back a
    transaction) would fail at runtime with NoSuchMethodError.

Implementation notes

  • with_local_frame(4, ..) per bitmap entry in IntoJava bounds local-ref growth on large
    offset maps. JMap was avoided inside the frame because it holds a JObject with the
    outer frame's lifetime, causing borrow-checker conflicts; call_method on the outer
    java_map reference is used instead.
  • The Vec<i64> for bitmap values is allocated in Rust before entering the frame, so its
    lifetime is independent of JNI frame scope.
  • UpdatedFragmentOffsets added to the lance::dataset::transaction import.

Why the protobuf field alone is not enough

lance-spark commits by calling CommitBuilder.execute(transaction), which passes the Java
Transaction object to nativeCommitToDataset via JNI. The JNI handler calls
convert_to_rust_transactionconvert_to_rust_operation, which reflects on the Java
Update object to build the Rust Operation::Update struct. The protobuf field (field 9)
is only used when a Transaction is serialized as a proto blob; it has no effect on the
reflection-based JNI path unless the Java Update class exposes the field and the JNI
deserialization reads it.

Test plan

  • UpdateTest#testUpdatedFragmentOffsetsRoundTrip — commits an Update with a non-empty
    updatedFragmentOffsets map through CommitBuilder.execute (exercises the FromJava JNI
    path), reads the transaction back via Dataset.readTransaction() (exercises the IntoJava
    JNI path), and asserts the offsets match.

Compatibility

  • Java — builder callers: the Update constructor is private; all external callers go
    through Builder. The new updatedFragmentOffsets(...) setter is optional and defaults
    to Collections.emptyMap(), so existing Update.builder()...build() call sites compile
    and behave identically. The new updatedFragmentOffsets() accessor is additive.
  • Java — equals / hashCode: equals now deep-compares long[] values via
    Arrays.equals instead of reference equality; hashCode is added. These are correctness
    fixes — the previous equals was broken for distinct-but-equal arrays, and omitting
    hashCode when equals is overridden violated the Java contract. No well-formed caller
    depends on the broken behavior.
  • JNI — constructor signature: the IntoJava new_object call is updated from the 6-arg to
    the 7-arg form in the same PR. Both files must ship together; within that atomic change
    there is no compatibility gap.
  • Rust / proto: no changes. The updated_fragment_offsets proto field and Rust struct
    field were already added in fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added bug Something isn't working java labels May 12, 2026
@jerryjch jerryjch force-pushed the update-with-rewrite-columns-fix-follow-up branch from c116d01 to 456e6dc Compare May 12, 2026 18:57
@jerryjch
Copy link
Copy Markdown
Contributor Author

cc @jackye1995 @hamersaw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working java

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant