Implement ParquetFormatModel and update write_file to use the format API #3381

Open
nssalian wants to merge 3 commits into apache:main from nssalian:file-format-parquet-impl

@nssalian
Contributor

Continued work on #3100

PR Description

Follow-up to #3119. Implements ParquetFormatWriter and ParquetFormatModel, registers Parquet in the FileFormatFactory, and rewrites write_file to dispatch through the factory using the write.format.default table property. Future formats can be added in a similar way.
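To make the dispatch concrete, here is a minimal sketch of a registry keyed by format name. `SketchFormatModel`, `SketchFormatFactory`, `register`, and `for_table` are hypothetical names used for illustration only and are not the API added in this PR. The case-insensitive lookup is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Mapping

WRITE_FORMAT_DEFAULT = "write.format.default"  # property name from the PR description


@dataclass(frozen=True)
class SketchFormatModel:
    """Hypothetical stand-in for a format model such as ParquetFormatModel."""

    name: str


class SketchFormatFactory:
    """Hypothetical registry; the real FileFormatFactory API may differ."""

    def __init__(self) -> None:
        self._models: Dict[str, SketchFormatModel] = {}

    def register(self, model: SketchFormatModel) -> None:
        # Assumption: format names are matched case-insensitively.
        self._models[model.name.lower()] = model

    def for_table(self, properties: Mapping[str, str]) -> SketchFormatModel:
        # Fall back to Parquet when the property is unset, so defaults do not change.
        name = properties.get(WRITE_FORMAT_DEFAULT, "parquet").lower()
        try:
            return self._models[name]
        except KeyError:
            raise ValueError(f"No writer registered for file format: {name}") from None


factory = SketchFormatFactory()
factory.register(SketchFormatModel("parquet"))
```

With only Parquet registered, `factory.for_table({})` resolves to the Parquet model, while a table whose `write.format.default` is `orc` raises a `ValueError` until an ORC model is registered.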

Rationale for this change

The write.format.default table property was never read; the write path was hardcoded to Parquet. This PR makes the property functional. It also threads file_format through _to_requested_schema / ArrowProjectionVisitor / _construct_field so that field ID metadata keys are correct per format (PARQUET:field_id for Parquet, iceberg.id plus iceberg.required for ORC), which prepares the write path for ORC support without changing default behavior.
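The per-format key selection could look roughly like the sketch below. The key names come from the description above. The helper `field_id_metadata`, the byte encoding of the values, and the `true`/`false` spelling of the required flag are assumptions, not the code in this PR.

```python
from typing import Dict


def field_id_metadata(file_format: str, field_id: int, required: bool) -> Dict[bytes, bytes]:
    """Hypothetical helper: build Arrow field metadata for the given file format."""
    fmt = file_format.lower()
    if fmt == "parquet":
        # Parquet carries only the field ID in field metadata.
        return {b"PARQUET:field_id": str(field_id).encode()}
    if fmt == "orc":
        # ORC carries the field ID and the required flag as separate keys.
        # Assumption: the flag is serialized as "true" or "false".
        return {
            b"iceberg.id": str(field_id).encode(),
            b"iceberg.required": str(required).lower().encode(),
        }
    raise ValueError(f"Unsupported file format: {file_format}")
```

A dictionary of this shape can be passed as the `metadata` argument of `pyarrow.field(...)`, which is why the keys and values are bytes here.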

Are these changes tested?

  • tests/io/test_format_writers.py adds parametrized tests modeled after Java's BaseFormatModelTests covering round-trip, statistics, null handling, context manager caching, close idempotency, close-without-write, and ORC vs Parquet field ID dispatch.
  • tests/io/test_pyarrow.py adds test_write_file_parquet_round_trip and test_write_file_dispatches_on_write_format_default exercising the full write_file path.
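For reviewers unfamiliar with the close semantics being tested, the following is a rough sketch of a writer whose `close()` is idempotent and safe to call without any prior write. `SketchWriter` and its result dictionary are hypothetical and do not mirror ParquetFormatWriter; the sketch only illustrates the properties listed above.

```python
import io
from typing import List, Optional


class SketchWriter:
    """Hypothetical writer illustrating idempotent close and close-without-write."""

    def __init__(self, sink: io.BytesIO) -> None:
        self._sink = sink
        self._rows = 0
        self._result: Optional[dict] = None  # cached on first close()

    def write(self, rows: List[bytes]) -> None:
        if self._result is not None:
            raise ValueError("writer is closed")
        for row in rows:
            self._sink.write(row)
        self._rows += len(rows)

    def close(self) -> dict:
        # The first call computes the result; later calls return the same cached object.
        if self._result is None:
            self._result = {"record_count": self._rows, "size": self._sink.tell()}
        return self._result

    def __enter__(self) -> "SketchWriter":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


with SketchWriter(io.BytesIO()) as writer:
    writer.write([b"a", b"bc"])

# close() after the context manager exits returns the cached result.
result = writer.close()
```

Caching the result on the first `close()` keeps the context-manager exit and an explicit `close()` from finalizing the sink twice.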

Are there any user-facing changes?

No. Default behavior is unchanged. Setting write.format.default to an unregistered format now raises a ValueError.

@nssalian nssalian changed the title from "Implement ParquetFormatModel and wire write_file to use the format API" to "Implement ParquetFormatModel and update write_file to use the format API" May 19, 2026
@nssalian nssalian marked this pull request as ready for review May 19, 2026 03:51
@nssalian
Contributor Author

@kevinjqliu @Fokko @geruh PTAL when you can

Comment thread pyiceberg/io/pyarrow.py Outdated
Comment thread tests/io/test_format_writers.py Outdated
Comment thread tests/io/test_format_writers.py Outdated
Comment thread pyiceberg/io/pyarrow.py Outdated
Comment thread pyiceberg/io/pyarrow.py Outdated
