Skip to content

Backport 0.15 zstd fixes#848

Open
mkitti wants to merge 7 commits into
v0.15.xfrom
backport-0.15-zstd-fixes
Open

Backport 0.15 zstd fixes#848
mkitti wants to merge 7 commits into
v0.15.xfrom
backport-0.15-zstd-fixes

Conversation

@mkitti

@mkitti mkitti commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Backport #707, #757, and #782 to 0.15.x branch.

TODO:

  • Unit tests and/or doctests in docstrings
  • Tests pass locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • Docs build locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Codecov passes)

mkitti and others added 5 commits June 17, 2026 22:55
, #782, #757)

Backport three Zstandard improvements from main to the 0.15.x branch,
adapted to use the v0.15.x Buffer API from compat_ext:

- Add streaming decompression when frame content size is unknown (#707)
- Fix negative size issue on 32-bit platforms (#782)
- Support decompression of multiple concatenated frames (#757)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pytest 9.1 raises PytestRemovedIn10Warning for non-Collection iterables
passed to parametrize, which becomes an error under filterwarnings=error
and aborts test collection. Wrap in tuple() to match main.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The numcodecs.zarr3 module in this branch targets the zarr 3.0.x API.
zarr 3.1+ moved these codecs into zarr itself and changed internal APIs
(e.g. the new dtype system), breaking test_zarr3.py. The previous
`--pre zarr>=3.0.0b2` install resolved to zarr 3.2.1, causing 26
failures on Python 3.12/3.13. Pin the test dependency to a compatible
zarr release.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
macos-13 runners are stalling. main already moved to macos-15-large
(still an Intel runner, preserving x86 coverage). Update the matrix and
the clang-install conditional to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The conda-forge compilers inject -Wl,-rpath,$PREFIX/lib three times per
link, producing duplicate LC_RPATH load commands. macOS 15's dyld
rejects duplicate LC_RPATH at load time, so the freshly switched
macos-15-large runners failed every test with an ImportError on dlopen.
macos-14's older dyld tolerated the duplicates.

Add a macOS-only post-build step that uses install_name_tool to remove
the redundant LC_RPATH entries from numcodecs/*.so.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (3cf8ab1) to head (02dae08).

Additional details and impacted files
@@             Coverage Diff              @@
##           v0.15.x      #848      +/-   ##
============================================
+ Coverage    99.74%   100.00%   +0.25%     
============================================
  Files           63        26      -37     
  Lines         2753      1063    -1690     
============================================
- Hits          2746      1063    -1683     
+ Misses           7         0       -7     
Files with missing lines Coverage Δ
numcodecs/zarr3.py 100.00% <100.00%> (+0.49%) ⬆️

... and 37 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

mkitti and others added 2 commits June 20, 2026 17:25
- .readthedocs.yaml: bump build image to ubuntu-24.04 (ubuntu-20.04 was
  removed by Read the Docs) and cap docs zarr to <3.1, matching the
  0.15.x compatibility range.
- .pre-commit-config.yaml: pin the mypy hook's zarr to >=3.0.0rc1,<3.1 so
  type checking runs against the zarr 3.0 API that numcodecs.zarr3 targets
  (the unpinned hook pulled zarr 3.2, whose ZDType broke mypy).
- zarr3.py: correct two `# type: ignore[arg-type]` to `[call-overload]`
  for np.dtype(astype) under current numpy.
- wheel.yaml: swap the stalled macos-13 intel runner for macos-15-large,
  matching ci.yaml.
- codecov.yml / pyproject.toml: exclude test files from coverage
  (ignore: tests/** and [tool.coverage.run] omit), matching main.
- test_zarr3.py: cover AsType.evolve_from_array_spec when decode_dtype is
  unset, bringing zarr3.py to 100%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The previous commit copied main's config verbatim, but this branch differs
from main in two ways that broke CI:

- Tests live at numcodecs/tests/ here (main moved them to top-level tests/),
  so codecov `ignore: tests/**` and coverage `omit = ["tests/*"]` matched
  nothing. Point both at numcodecs/tests so the subprocess/skipped-test lines
  stop counting against the 100% project target.
- Read the Docs strips the quotes around post_install commands, so the `<`
  in 'zarr>=3.0.0b2,<3.1' was parsed as a shell input redirection
  ("cannot open 3.1"). Use 'zarr~=3.0.0' instead, which pins to zarr 3.0.x
  with no shell metacharacters.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Backports several Zstandard-related fixes to the 0.15.x line, extending numcodecs.zstd decompression to handle unknown content sizes (streaming mode), concatenated frames, and 32-bit size safety, with accompanying tests and CI/docs updates.

Changes:

  • Update numcodecs/zstd.pyx to (a) compute total decompressed size across concatenated frames, (b) stream-decompress when frame content size is unknown, and (c) guard against oversize content on 32-bit platforms.
  • Add/expand tests for streaming + multi-frame behavior (including cross-checks against pyzstd) and adjust a couple of ancillary tests.
  • Update CI/docs tooling to pin zarr 3.0.x usage on this branch and tweak macOS runner/RTD configuration.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
pyproject.toml Adds pyzstd to test extras; configures coverage to omit numcodecs/tests/*.
numcodecs/zstd.pyx Implements multi-frame sizing, streaming decompression for unknown sizes, and 32-bit size checks.
numcodecs/zarr3.py Adjusts mypy ignore code for np.dtype(...) calls.
numcodecs/tests/test_zstd.py Adds tests for streaming decompression and mixed known/unknown multi-frame cases.
numcodecs/tests/test_zarr3.py Adds a parameterization case for AsType without decode_dtype.
numcodecs/tests/test_pyzstd.py New tests comparing behavior against pyzstd for multi-frame + streaming cases.
numcodecs/tests/test_checksum32.py Materializes itertools.product(...) for parametrization.
docs/release.rst Adds release note entries for the three Zstd fixes.
.readthedocs.yaml Updates RTD OS image and pins zarr to ~3.0.0 for docs builds.
.pre-commit-config.yaml Pins mypy hook’s zarr dependency to <3.1.
.github/workflows/wheel.yaml Switches Intel macOS runner label used for wheel builds.
.github/workflows/ci.yaml Switches Intel macOS runner label; pins zarr <3.1; adds macOS rpath deduplication step.
.github/codecov.yml Ignores numcodecs/tests/** in Codecov config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread numcodecs/zstd.pyx
Comment on lines +317 to +323
with nogil:

status = ZSTD_initDStream(zds)
if ZSTD_isError(status):
error = ZSTD_getErrorName(status)
ZSTD_freeDStream(zds);
raise RuntimeError('Zstd stream decompression error on ZSTD_initDStream: %s' % error)
Comment thread numcodecs/zstd.pyx
Comment on lines +319 to +363
status = ZSTD_initDStream(zds)
if ZSTD_isError(status):
error = ZSTD_getErrorName(status)
ZSTD_freeDStream(zds);
raise RuntimeError('Zstd stream decompression error on ZSTD_initDStream: %s' % error)

input = ZSTD_inBuffer(source_ptr, source_size, 0)
output = ZSTD_outBuffer(dest_ptr, dest_size, 0)

# Initialize to 1 to force a loop iteration
status = 1
while(status > 0 or input.pos < input.size):
# Possible returned values of ZSTD_decompressStream:
# 0: frame is completely decoded and fully flushed
# error (<0)
# >0: suggested next input size
status = ZSTD_decompressStream(zds, &output, &input)

if ZSTD_isError(status):
error = ZSTD_getErrorName(status)
raise RuntimeError('Zstd stream decompression error on ZSTD_decompressStream: %s' % error)

# There is more to decompress, grow the buffer
if status > 0 and output.pos == output.size:
new_size = output.size + DEST_GROWTH_SIZE

if new_size < output.size or new_size < DEST_GROWTH_SIZE:
raise RuntimeError('Zstd stream decompression error: output buffer overflow')

new_dst = realloc(output.dst, new_size)

if new_dst == NULL:
# output.dst freed in finally block
raise RuntimeError('Zstd stream decompression error on realloc: could not expand output buffer')

output.dst = new_dst
output.size = new_size

# Copy the output to a bytes object
dest = PyBytes_FromStringAndSize(<char *>output.dst, output.pos)

finally:
ZSTD_freeDStream(zds)
free(output.dst)

Comment thread numcodecs/zstd.pyx
Comment on lines +223 to +234
if content_size == ZSTD_CONTENTSIZE_UNKNOWN and dest is None:
return stream_decompress(source_ptr, source_size)
elif content_size == ZSTD_CONTENTSIZE_UNKNOWN:
# dest is not None
# set dest_size based on dest
pass
elif content_size == ZSTD_CONTENTSIZE_ERROR or content_size == 0:
raise RuntimeError('Zstd decompression error: invalid input data')
elif content_size > (<unsigned long long>SIZE_MAX):
raise RuntimeError('Zstd decompression error: content size too large for platform')

dest_size = <size_t>content_size
Comment thread numcodecs/zstd.pyx
Comment on lines +390 to +392
if ZSTD_isError(frame_compressed_size):
error = ZSTD_getErrorName(frame_compressed_size)
raise RuntimeError('Could not set determine zstd frame size: %s' % error)
Comment on lines +191 to +194
def zstd_cli_available() -> bool:
return not subprocess.run(
["zstd", "-V"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
).returncode
@mkitti

mkitti commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

Copilot has some good points, but these probably should a distinct pull request on main.

@mkitti mkitti requested a review from d-v-b June 20, 2026 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants