perf: unwrap returns a zero-copy memoryview instead of copying the payload (#162) #184

Merged
27Bslash6 merged 1 commit into main from perf/unwrap-zero-copy-memoryview on Jun 17, 2026
@27Bslash6 27Bslash6 commented Jun 17, 2026

Contributor

What

SerializationWrapper.unwrap returns a zero-copy memoryview slice of the cache frame instead of copying the payload out with bytes(mv[header_end:]).
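The difference between the two extraction styles can be sketched with the standard library alone. The 4-byte header, the `header_end` value, and the frame contents below are assumptions made for illustration; the real v3 frame layout is not shown in this description.

```python
# Hypothetical frame: an assumed 4-byte header followed by the payload.
frame = bytearray(b"HDR0" + b"payload-bytes")
header_end = 4  # assumed header length, for illustration only

mv = memoryview(frame)

copied = bytes(mv[header_end:])  # old style: allocates a new bytes object
view = mv[header_end:]           # new style: a slice sharing frame's memory

print(type(copied).__name__, type(view).__name__)
print(copied == view)     # the content is identical either way
print(view.obj is frame)  # the slice still references the original buffer
```

Slicing a memoryview produces another memoryview over the same underlying object, which is why no payload bytes are moved in the second form.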

Closes #162. Prerequisite for the mmap read path (epic #171).

Why

unwrap copied the full payload on every read — every L1 hit, every backend read. For a ~300 MB Arrow frame that one copy doubled read peak RSS, and it sat on the only place a memoryview could otherwise flow uncopied into pa.py_buffer / pa.memory_map. The advertised "zero-copy" Arrow read could never actually be zero-copy while unwrap re-materialized the payload.
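A rough way to see the cost of that copy is to compare peak allocations with `tracemalloc`. This is a sketch under stated assumptions, not the project's benchmark: the 8 MiB payload stands in for the ~300 MB frame, and `HEADER_END` and `peak_allocation` are hypothetical names.

```python
import tracemalloc

HEADER_END = 16                  # assumed header size, illustration only
PAYLOAD_SIZE = 8 * 1024 * 1024   # 8 MiB stand-in for the ~300 MB frame
frame = bytes(HEADER_END + PAYLOAD_SIZE)


def peak_allocation(extract):
    """Return the peak bytes tracemalloc observed while extract() ran."""
    mv = memoryview(frame)
    tracemalloc.start()
    try:
        payload = extract(mv)  # keep the result alive while measuring
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Copying re-materializes the whole payload; slicing only creates a view object.
copy_peak = peak_allocation(lambda mv: bytes(mv[HEADER_END:]))
view_peak = peak_allocation(lambda mv: mv[HEADER_END:])

print(f"copy: {copy_peak:,} bytes, view: {view_peak:,} bytes")
```

The copying variant should report a peak of at least the payload size, while the view variant allocates only the small memoryview object itself.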

How

  • unwrap returns the v3-frame payload as a memoryview aliasing the input (the legacy base64+JSON path still returns bytes). The view keeps the source buffer alive, so it never dangles.
  • The deserialize protocol + all serializers accept bytes | memoryview.
  • Arrow consumes the view zero-copy (memoryview → pa.py_buffer).
  • msgpack / orjson / encrypted paths coerce bytes(data) at their Rust/C boundary (ByteStorage.retrieve, AES-GCM decrypt_with_keys, .startswith). bytes(b) is a no-op when b is already bytes, so the hot path pays nothing — and those paths can't be zero-copy regardless (msgpack rebuilds Python objects; AES-GCM decrypt owns its buffer).

Net effect: Arrow reads drop one full-payload copy on any backend; every other path is copy-neutral.

Tests

TestZeroCopyRead proves the contract two ways: the v3 payload is a memoryview, and mutating the source frame shows through the returned payload (a copy would not). Plus the full existing wrapper/encryption/legacy suite as regression.

  • Unit: 1549 passed
  • Critical (real Redis): 255 passed
  • basedpyright: 0 errors · ruff format+lint clean · doctests pass
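The aliasing contract described under Tests can be illustrated with a minimal sketch. `unwrap_payload` is a hypothetical stand-in that only slices past an assumed 4-byte header; it is not the real `SerializationWrapper.unwrap` or the actual `TestZeroCopyRead` code.

```python
def unwrap_payload(frame, header_end: int) -> memoryview:
    """Hypothetical stand-in: return the payload as a view, not a copy."""
    return memoryview(frame)[header_end:]


source = bytearray(b"HDR0" + b"abcdef")  # assumed 4-byte header + payload
payload = unwrap_payload(source, 4)

# Contract 1: the payload is a memoryview.
is_view = isinstance(payload, memoryview)

# Contract 2: mutating the source shows through the view (a copy would not).
source[4] = ord("Z")
first_byte = payload[0]

# The view holds a reference to the buffer, so dropping the original
# name does not invalidate it.
del source
still_readable = bytes(payload)

print(is_view, chr(first_byte), still_readable)
```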

No protocol/Rust/TS/SaaS change — the envelope is Python-SDK-internal and the cross-SDK wire format (ByteStorage) is untouched. Backward-compatible reads (legacy base64+JSON entries still decode).

Summary by CodeRabbit

Release Notes

  • Performance Enhancements

    • Improved serialisation efficiency with support for zero-copy memory operations during deserialisation.
  • Compatibility

    • Deserialisation now accepts flexible input formats, enabling better integration with different data sources whilst maintaining backward compatibility.

…yload (#162)

SerializationWrapper.unwrap copied the full payload on every read
(`bytes(mv[header_end:])`) — including every L1 hit and every Arrow read.
For a ~300MB Arrow frame that copy alone doubled read peak RSS.

unwrap now returns a memoryview slice that aliases the input frame. The
deserialize protocol accepts `bytes | memoryview`: Arrow flows the view
zero-copy into `pa.py_buffer`, while the msgpack / orjson / encrypted paths
coerce `bytes()` at their Rust/C boundary (a no-op when the input is already
bytes, and those paths can't be zero-copy anyway — msgpack rebuilds Python
objects, AES-GCM decrypt owns its buffer).

This is the prerequisite for the mmap read path (epic #171): the payload can
now travel uncopied from unwrap into pa.memory_map.

coderabbitai Bot commented Jun 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6da20c83-8936-4c72-880a-45ba557e3b86

📥 Commits

Reviewing files that changed from the base of the PR and between 83b5693 and ab40d37.

📒 Files selected for processing (8)
  • src/cachekit/serializers/arrow_serializer.py
  • src/cachekit/serializers/auto_serializer.py
  • src/cachekit/serializers/base.py
  • src/cachekit/serializers/encryption_wrapper.py
  • src/cachekit/serializers/orjson_serializer.py
  • src/cachekit/serializers/standard_serializer.py
  • src/cachekit/serializers/wrapper.py
  • tests/unit/test_serialization_wrapper.py

Walkthrough

SerializationWrapper.unwrap is changed to return a zero-copy memoryview slice for v3 frames instead of allocating a new bytes object. All concrete serializer deserialize methods (ArrowSerializer, AutoSerializer, OrjsonSerializer, StandardSerializer, EncryptionWrapper) and the SerializerProtocol base contract are updated to accept bytes | memoryview, with each implementation coercing to bytes where downstream libraries require it.

Changes

Zero-copy deserialisation pipeline

Layer / File(s) Summary
Zero-copy unwrap in SerializationWrapper
src/cachekit/serializers/wrapper.py
unwrap now accepts bytes, bytearray, and memoryview as input; v3 payload extraction returns mv[header_end:] (a memoryview slice) instead of a bytes copy. Docstrings and return type updated accordingly. Legacy base64+JSON path is unchanged.
SerializerProtocol contract widened
src/cachekit/serializers/base.py
SerializerProtocol.deserialize parameter type broadened from bytes to `bytes | memoryview`.
Concrete serialiser signature and coercion updates
src/cachekit/serializers/arrow_serializer.py, src/cachekit/serializers/auto_serializer.py, src/cachekit/serializers/orjson_serializer.py, src/cachekit/serializers/standard_serializer.py, src/cachekit/serializers/encryption_wrapper.py
Each serialiser's deserialize signature is widened to `bytes | memoryview`.
Zero-copy aliasing tests
tests/unit/test_serialization_wrapper.py
TestZeroCopyRead asserts that v3 unwrapping returns a memoryview and that the returned view aliases the source bytearray (mutations to the source are reflected in the view).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • cachekit-io/cachekit-py#152: Directly overlaps with this PR's changes to SerializationWrapper.unwrap v3 frame parsing and ArrowSerializer deserialization logic operating on raw frame bytes.
  • cachekit-io/cachekit-py#172: Modifies SerializationWrapper.unwrap at the same v3 frame parsing location, adjusting structural bounds validation and typing that intersects with this PR's payload extraction change.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarises the main change: returning zero-copy memoryview instead of copying payload in unwrap, directly corresponding to the primary objective.
Description check ✅ Passed The PR description addresses all major template sections: What (unwrap returns memoryview), Why (avoids repeated payload copies), How (implementation details), and Tests (TestZeroCopyRead validation).
Linked Issues check ✅ Passed The PR fully addresses issue #162: returns v3-frame payload as memoryview slice instead of bytes copy, propagates bytes|memoryview through deserialize protocol, and adds tracemalloc-validated zero-copy tests.
Out of Scope Changes check ✅ Passed All changes directly support the zero-copy unwrap optimisation: wrapper.py returns memoryview, all serializers accept bytes|memoryview for compatibility, and tests validate the new behaviour.
Docstring Coverage ✅ Passed Docstring coverage is 88.24% which is sufficient. The required threshold is 80.00%.

codecov Bot commented Jun 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@27Bslash6 27Bslash6 merged commit 0901732 into main Jun 17, 2026
32 checks passed
@27Bslash6 27Bslash6 deleted the perf/unwrap-zero-copy-memoryview branch June 17, 2026 12:42
Development

Successfully merging this pull request may close these issues.

SerializationWrapper.unwrap copies the full payload on every read (incl. every L1 hit)

1 participant