Backport 0.15 zstd fixes#848
Open
mkitti wants to merge 7 commits into
Open
Conversation
, #782, #757) Backport three Zstandard improvements from main to the 0.15.x branch, adapted to use the v0.15.x Buffer API from compat_ext: - Add streaming decompression when frame content size is unknown (#707) - Fix negative size issue on 32-bit platforms (#782) - Support decompression of multiple concatenated frames (#757) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pytest 9.1 raises PytestRemovedIn10Warning for non-Collection iterables passed to parametrize, which becomes an error under filterwarnings=error and aborts test collection. Wrap in tuple() to match main. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The numcodecs.zarr3 module in this branch targets the zarr 3.0.x API. zarr 3.1+ moved these codecs into zarr itself and changed internal APIs (e.g. the new dtype system), breaking test_zarr3.py. The previous `--pre zarr>=3.0.0b2` install resolved to zarr 3.2.1, causing 26 failures on Python 3.12/3.13. Pin the test dependency to a compatible zarr release. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
macos-13 runners are stalling. main already moved to macos-15-large (still an Intel runner, preserving x86 coverage). Update the matrix and the clang-install conditional to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The conda-forge compilers inject -Wl,-rpath,$PREFIX/lib three times per link, producing duplicate LC_RPATH load commands. macOS 15's dyld rejects duplicate LC_RPATH at load time, so the freshly switched macos-15-large runners failed every test with an ImportError on dlopen. macos-14's older dyld tolerated the duplicates. Add a macOS-only post-build step that uses install_name_tool to remove the redundant LC_RPATH entries from numcodecs/*.so. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## v0.15.x #848 +/- ##
============================================
+ Coverage 99.74% 100.00% +0.25%
============================================
Files 63 26 -37
Lines 2753 1063 -1690
============================================
- Hits 2746 1063 -1683
+ Misses 7 0 -7
🚀 New features to boost your workflow:
|
- .readthedocs.yaml: bump build image to ubuntu-24.04 (ubuntu-20.04 was removed by Read the Docs) and cap docs zarr to <3.1, matching the 0.15.x compatibility range. - .pre-commit-config.yaml: pin the mypy hook's zarr to >=3.0.0rc1,<3.1 so type checking runs against the zarr 3.0 API that numcodecs.zarr3 targets (the unpinned hook pulled zarr 3.2, whose ZDType broke mypy). - zarr3.py: correct two `# type: ignore[arg-type]` to `[call-overload]` for np.dtype(astype) under current numpy. - wheel.yaml: swap the stalled macos-13 intel runner for macos-15-large, matching ci.yaml. - codecov.yml / pyproject.toml: exclude test files from coverage (ignore: tests/** and [tool.coverage.run] omit), matching main. - test_zarr3.py: cover AsType.evolve_from_array_spec when decode_dtype is unset, bringing zarr3.py to 100%. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The previous commit copied main's config verbatim, but this branch differs
from main in two ways that broke CI:
- Tests live at numcodecs/tests/ here (main moved them to top-level tests/),
so codecov `ignore: tests/**` and coverage `omit = ["tests/*"]` matched
nothing. Point both at numcodecs/tests so the subprocess/skipped-test lines
stop counting against the 100% project target.
- Read the Docs strips the quotes around post_install commands, so the `<`
in 'zarr>=3.0.0b2,<3.1' was parsed as a shell input redirection
("cannot open 3.1"). Use 'zarr~=3.0.0' instead, which pins to zarr 3.0.x
with no shell metacharacters.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Backports several Zstandard-related fixes to the 0.15.x line, extending numcodecs.zstd decompression to handle unknown content sizes (streaming mode), concatenated frames, and 32-bit size safety, with accompanying tests and CI/docs updates.
Changes:
- Update
numcodecs/zstd.pyxto (a) compute total decompressed size across concatenated frames, (b) stream-decompress when frame content size is unknown, and (c) guard against oversize content on 32-bit platforms. - Add/expand tests for streaming + multi-frame behavior (including cross-checks against
pyzstd) and adjust a couple of ancillary tests. - Update CI/docs tooling to pin zarr 3.0.x usage on this branch and tweak macOS runner/RTD configuration.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Adds pyzstd to test extras; configures coverage to omit numcodecs/tests/*. |
| numcodecs/zstd.pyx | Implements multi-frame sizing, streaming decompression for unknown sizes, and 32-bit size checks. |
| numcodecs/zarr3.py | Adjusts mypy ignore code for np.dtype(...) calls. |
| numcodecs/tests/test_zstd.py | Adds tests for streaming decompression and mixed known/unknown multi-frame cases. |
| numcodecs/tests/test_zarr3.py | Adds a parameterization case for AsType without decode_dtype. |
| numcodecs/tests/test_pyzstd.py | New tests comparing behavior against pyzstd for multi-frame + streaming cases. |
| numcodecs/tests/test_checksum32.py | Materializes itertools.product(...) for parametrization. |
| docs/release.rst | Adds release note entries for the three Zstd fixes. |
| .readthedocs.yaml | Updates RTD OS image and pins zarr to ~3.0.0 for docs builds. |
| .pre-commit-config.yaml | Pins mypy hook’s zarr dependency to <3.1. |
| .github/workflows/wheel.yaml | Switches Intel macOS runner label used for wheel builds. |
| .github/workflows/ci.yaml | Switches Intel macOS runner label; pins zarr <3.1; adds macOS rpath deduplication step. |
| .github/codecov.yml | Ignores numcodecs/tests/** in Codecov config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+317
to
+323
| with nogil: | ||
|
|
||
| status = ZSTD_initDStream(zds) | ||
| if ZSTD_isError(status): | ||
| error = ZSTD_getErrorName(status) | ||
| ZSTD_freeDStream(zds); | ||
| raise RuntimeError('Zstd stream decompression error on ZSTD_initDStream: %s' % error) |
Comment on lines
+319
to
+363
| status = ZSTD_initDStream(zds) | ||
| if ZSTD_isError(status): | ||
| error = ZSTD_getErrorName(status) | ||
| ZSTD_freeDStream(zds); | ||
| raise RuntimeError('Zstd stream decompression error on ZSTD_initDStream: %s' % error) | ||
|
|
||
| input = ZSTD_inBuffer(source_ptr, source_size, 0) | ||
| output = ZSTD_outBuffer(dest_ptr, dest_size, 0) | ||
|
|
||
| # Initialize to 1 to force a loop iteration | ||
| status = 1 | ||
| while(status > 0 or input.pos < input.size): | ||
| # Possible returned values of ZSTD_decompressStream: | ||
| # 0: frame is completely decoded and fully flushed | ||
| # error (<0) | ||
| # >0: suggested next input size | ||
| status = ZSTD_decompressStream(zds, &output, &input) | ||
|
|
||
| if ZSTD_isError(status): | ||
| error = ZSTD_getErrorName(status) | ||
| raise RuntimeError('Zstd stream decompression error on ZSTD_decompressStream: %s' % error) | ||
|
|
||
| # There is more to decompress, grow the buffer | ||
| if status > 0 and output.pos == output.size: | ||
| new_size = output.size + DEST_GROWTH_SIZE | ||
|
|
||
| if new_size < output.size or new_size < DEST_GROWTH_SIZE: | ||
| raise RuntimeError('Zstd stream decompression error: output buffer overflow') | ||
|
|
||
| new_dst = realloc(output.dst, new_size) | ||
|
|
||
| if new_dst == NULL: | ||
| # output.dst freed in finally block | ||
| raise RuntimeError('Zstd stream decompression error on realloc: could not expand output buffer') | ||
|
|
||
| output.dst = new_dst | ||
| output.size = new_size | ||
|
|
||
| # Copy the output to a bytes object | ||
| dest = PyBytes_FromStringAndSize(<char *>output.dst, output.pos) | ||
|
|
||
| finally: | ||
| ZSTD_freeDStream(zds) | ||
| free(output.dst) | ||
|
|
Comment on lines
+223
to
+234
| if content_size == ZSTD_CONTENTSIZE_UNKNOWN and dest is None: | ||
| return stream_decompress(source_ptr, source_size) | ||
| elif content_size == ZSTD_CONTENTSIZE_UNKNOWN: | ||
| # dest is not None | ||
| # set dest_size based on dest | ||
| pass | ||
| elif content_size == ZSTD_CONTENTSIZE_ERROR or content_size == 0: | ||
| raise RuntimeError('Zstd decompression error: invalid input data') | ||
| elif content_size > (<unsigned long long>SIZE_MAX): | ||
| raise RuntimeError('Zstd decompression error: content size too large for platform') | ||
|
|
||
| dest_size = <size_t>content_size |
Comment on lines
+390
to
+392
| if ZSTD_isError(frame_compressed_size): | ||
| error = ZSTD_getErrorName(frame_compressed_size) | ||
| raise RuntimeError('Could not set determine zstd frame size: %s' % error) |
Comment on lines
+191
to
+194
| def zstd_cli_available() -> bool: | ||
| return not subprocess.run( | ||
| ["zstd", "-V"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL | ||
| ).returncode |
Contributor
Author
|
Copilot has some good points, but these probably should a distinct pull request on main. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport #707, #757, and #782 to 0.15.x branch.
TODO: