Support direct Arrow RecordBatch to Parquet conversion #2963

@luoyuxia

Search before asking

  • I searched in the issues and found nothing similar.

Description

This issue tracks the Arrow-to-Parquet conversion part of parent task #437, which is being split into smaller sub-tasks.

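We need a reusable utility or writer path that converts an Arrow RecordBatch directly into Parquet, without first converting it into a row-oriented representation. Since Arrow and Parquet are both columnar formats, skipping the intermediate row conversion avoids an unnecessary columnar-to-row-to-columnar round trip. This would make Arrow-native data flows more efficient and easier to follow, especially for tiering and other lakehouse-related write paths.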

Possible scope:

  • provide a reusable Arrow RecordBatch to Parquet conversion utility;
  • define the supported Arrow/Parquet type mapping and failure behavior;
  • make the conversion path suitable for direct reuse by tiering writers and other lake components.
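To make the second scope item (type mapping and failure behavior) more concrete, here is a minimal Java sketch of a mapping table that fails fast on unsupported types. It is an illustration under stated assumptions, not a proposed implementation: `ArrowParquetTypeMapping`, `ArrowTypeId`, `ParquetColumnType`, and `toParquet` are all hypothetical names, and the enums are simplified stand-ins rather than the real Arrow Java or parquet-java classes. The Parquet physical type names and logical annotations follow the Parquet format specification.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: Arrow-to-Parquet column type mapping with fail-fast errors.
final class ArrowParquetTypeMapping {

    // Hypothetical, simplified subset of Arrow types (NOT the real Arrow Java API).
    public enum ArrowTypeId {
        BOOL, INT8, INT16, INT32, INT64, FLOAT32, FLOAT64,
        UTF8, BINARY, DATE32, TIMESTAMP_MICROS, UNION
    }

    // Parquet primitive (physical) types, as named in the Parquet format spec.
    public enum ParquetPhysicalType {
        BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY
    }

    // Target column type: a physical type plus an optional logical annotation.
    public static final class ParquetColumnType {
        public final ParquetPhysicalType physical;
        public final String logicalAnnotation; // null when no annotation is needed

        ParquetColumnType(ParquetPhysicalType physical, String logicalAnnotation) {
            this.physical = physical;
            this.logicalAnnotation = logicalAnnotation;
        }
    }

    private static final Map<ArrowTypeId, ParquetColumnType> MAPPING;

    static {
        Map<ArrowTypeId, ParquetColumnType> m = new EnumMap<>(ArrowTypeId.class);
        m.put(ArrowTypeId.BOOL, new ParquetColumnType(ParquetPhysicalType.BOOLEAN, null));
        // Narrow integers are stored as INT32 and annotated with their bit width.
        m.put(ArrowTypeId.INT8, new ParquetColumnType(ParquetPhysicalType.INT32, "INT(8, true)"));
        m.put(ArrowTypeId.INT16, new ParquetColumnType(ParquetPhysicalType.INT32, "INT(16, true)"));
        m.put(ArrowTypeId.INT32, new ParquetColumnType(ParquetPhysicalType.INT32, null));
        m.put(ArrowTypeId.INT64, new ParquetColumnType(ParquetPhysicalType.INT64, null));
        m.put(ArrowTypeId.FLOAT32, new ParquetColumnType(ParquetPhysicalType.FLOAT, null));
        m.put(ArrowTypeId.FLOAT64, new ParquetColumnType(ParquetPhysicalType.DOUBLE, null));
        m.put(ArrowTypeId.UTF8, new ParquetColumnType(ParquetPhysicalType.BYTE_ARRAY, "STRING"));
        m.put(ArrowTypeId.BINARY, new ParquetColumnType(ParquetPhysicalType.BYTE_ARRAY, null));
        m.put(ArrowTypeId.DATE32, new ParquetColumnType(ParquetPhysicalType.INT32, "DATE"));
        // Simplified: the isAdjustedToUTC flag would depend on the Arrow time zone.
        m.put(ArrowTypeId.TIMESTAMP_MICROS,
              new ParquetColumnType(ParquetPhysicalType.INT64, "TIMESTAMP(MICROS)"));
        // UNION is deliberately absent: Parquet has no union type.
        MAPPING = Collections.unmodifiableMap(m);
    }

    private ArrowParquetTypeMapping() {}

    // Returns the Parquet type for a column, or fails fast with a descriptive
    // error instead of silently coercing an unsupported type.
    public static ParquetColumnType toParquet(String column, ArrowTypeId arrowType) {
        ParquetColumnType target = MAPPING.get(arrowType);
        if (target == null) {
            throw new UnsupportedOperationException(
                "Column '" + column + "': Arrow type " + arrowType + " has no Parquet mapping");
        }
        return target;
    }
}
```

Throwing an exception that names the offending column is only one possible failure behavior; a writer could instead validate the whole Arrow schema up front before writing any data. The issue leaves this choice open.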

This is intended to be one sub-task of #437; enabling the tiering source to read data as Arrow RecordBatch is tracked separately.

Willingness to contribute

  • I'm willing to submit a PR!
