perf: unwrap returns a zero-copy memoryview instead of copying the payload (#162) #184
…yload (#162) SerializationWrapper.unwrap copied the full payload on every read (`bytes(mv[header_end:])`) — including every L1 hit and every Arrow read. For a ~300MB Arrow frame that copy alone doubled read peak RSS. unwrap now returns a memoryview slice that aliases the input frame. The deserialize protocol accepts `bytes | memoryview`: Arrow flows the view zero-copy into `pa.py_buffer`, while the msgpack / orjson / encrypted paths coerce `bytes()` at their Rust/C boundary (a no-op when the input is already bytes, and those paths can't be zero-copy anyway — msgpack rebuilds Python objects, AES-GCM decrypt owns its buffer). This is the prerequisite for the mmap read path (epic #171): the payload can now travel uncopied from unwrap into pa.memory_map.
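A rough way to see the copy the old `unwrap` made is to compare peak Python-heap allocation for the copying and the aliasing variants. This is a sketch under stated assumptions: `HEADER_END`, the 8 MiB frame, and both helper functions are hypothetical, and `tracemalloc` traces Python allocator calls rather than process RSS. It shows the extra payload-sized allocation, not the RSS figure quoted in the commit message.

```python
import tracemalloc

HEADER_END = 16  # hypothetical header size, for illustration only
FRAME = bytes(HEADER_END) + b"x" * (8 * 1024 * 1024)  # ~8 MiB payload, built before tracing starts


def unwrap_copy(frame: bytes) -> bytes:
    # Old behaviour (hypothetical stand-in): materialize the payload as a new bytes object.
    return bytes(memoryview(frame)[HEADER_END:])


def unwrap_view(frame: bytes) -> memoryview:
    # New behaviour (hypothetical stand-in): a slice that aliases the frame's buffer.
    return memoryview(frame)[HEADER_END:]


def peak_traced_bytes(unwrap) -> int:
    """Peak traced Python-heap allocation while unwrapping FRAME (tracemalloc, not RSS)."""
    tracemalloc.start()
    try:
        payload = unwrap(FRAME)
        _, peak = tracemalloc.get_traced_memory()
        del payload
    finally:
        tracemalloc.stop()
    return peak
```

The copying variant peaks at roughly the payload size; the view variant only allocates the small `memoryview` objects themselves.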
Codecov Report: ✅ All modified and coverable lines are covered by tests.
What
`SerializationWrapper.unwrap` returns a zero-copy `memoryview` slice of the cache frame instead of copying the payload out with `bytes(mv[header_end:])`.
Closes #162. Prerequisite for the mmap read path (epic #171).
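A minimal sketch of the change, assuming a made-up frame layout (the real v3 header format is not shown in this PR): `MAGIC`, `make_frame`, and this `unwrap` are hypothetical stand-ins, not the SDK's code. The line that matters is the return: slicing a `memoryview` yields another view over the same buffer, whereas `bytes(...)` of that slice would copy the whole payload.

```python
import struct
from typing import Union

# Hypothetical frame layout, for illustration only (not the SDK's real v3 format):
#   4-byte magic | 4-byte big-endian header length | header | payload
MAGIC = b"FRM3"


def make_frame(header: bytes, payload: bytes) -> bytes:
    """Build a frame in the hypothetical layout above."""
    return MAGIC + struct.pack(">I", len(header)) + header + payload


def unwrap(frame: Union[bytes, bytearray, memoryview]) -> memoryview:
    """Return the payload as a view that aliases `frame` rather than a copy."""
    mv = memoryview(frame)
    if mv[:4] != MAGIC:  # memoryview compares by value against bytes
        raise ValueError("not a v3 frame")
    (header_len,) = struct.unpack_from(">I", mv, 4)
    header_end = 8 + header_len
    # Old: return bytes(mv[header_end:])  -> copies the entire payload.
    # New: the slice shares the frame's buffer and keeps it alive while the view exists.
    return mv[header_end:]
```

Because the returned view holds a reference to the frame's buffer, dropping every other reference to the frame is safe and the payload stays readable. The flip side is that a view over a mutable `bytearray` frame sees later writes to it.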
Why
`unwrap` copied the full payload on every read: every L1 hit, every backend read. For a ~300 MB Arrow frame that one copy doubled read peak RSS, and it sat on the one path where a `memoryview` could otherwise flow uncopied into `pa.py_buffer` / `pa.memory_map`. The advertised "zero-copy" Arrow read could never actually be zero-copy while `unwrap` re-materialized the payload.
How
- `unwrap` returns the v3-frame payload as a `memoryview` aliasing the input (the legacy base64+JSON path still returns `bytes`). The view keeps the source buffer alive, so it never dangles.
- The `deserialize` protocol and all serializers accept `bytes | memoryview`.
- Arrow passes the view straight through (`memoryview` → `pa.py_buffer`), so nothing is copied.
- msgpack, orjson, and the encrypted path coerce with `bytes(data)` at their Rust/C boundary (`ByteStorage.retrieve`, AES-GCM `decrypt_with_keys`, `.startswith`). `bytes(b)` is a no-op when `b` is already `bytes`, and when `b` is a `memoryview` it is the same single copy `unwrap` used to make, so these paths pay nothing extra. They can't be zero-copy regardless (msgpack rebuilds Python objects; AES-GCM decrypt owns its buffer).

Net effect: Arrow reads drop one full-payload copy on any backend; every other path is copy-neutral.
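The per-serializer split can be sketched with two stand-ins. Everything here is hypothetical: `Deserializer`, `JsonStandIn`, and `ArrowStandIn` are illustrative names, stdlib `json` stands in for msgpack/orjson, and the Arrow stand-in returns a plain `memoryview` instead of calling `pa.py_buffer` so the sketch has no pyarrow dependency. (`pa.py_buffer` accepts any object exposing the buffer protocol, including a `memoryview`, and wraps it without copying.)

```python
import json
from typing import Protocol, Union

Buffer = Union[bytes, memoryview]


class Deserializer(Protocol):
    """Hypothetical protocol shape: every deserializer accepts bytes or a view."""

    def deserialize(self, data: Buffer) -> object: ...


class JsonStandIn:
    """Stand-in for the msgpack/orjson paths (hypothetical; uses stdlib json).

    These decoders build fresh Python objects, so they cannot be zero-copy.
    bytes() at the boundary returns the same object for bytes input (CPython)
    and makes one copy for a memoryview: the copy unwrap used to make.
    """

    def deserialize(self, data: Buffer) -> object:
        return json.loads(bytes(data))


class ArrowStandIn:
    """Stand-in for the Arrow path (hypothetical; no pyarrow dependency).

    The real path hands the view to pa.py_buffer; here a new view over the
    same memory is returned to show that no bytes are copied.
    """

    def deserialize(self, data: Buffer) -> memoryview:
        return memoryview(data)
```

The design choice is to coerce only where a copy is unavoidable anyway, so the protocol can widen to `bytes | memoryview` without adding a copy to any path.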
Tests
`TestZeroCopyRead` proves the contract two ways: the v3 payload is a `memoryview`, and mutating the source frame shows through the returned payload (a copy would not). The full existing wrapper/encryption/legacy suite also runs as a regression check.

No protocol/Rust/TS/SaaS change: the envelope is Python-SDK-internal and the cross-SDK wire format (ByteStorage) is untouched. Reads stay backward-compatible (legacy base64+JSON entries still decode).
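The two checks described for `TestZeroCopyRead` can be sketched with `unittest`. This is not the PR's test: `HEADER_END`, the stand-in `unwrap`, and `TestZeroCopyReadSketch` are hypothetical, and the frame is a `bytearray` so it can be mutated after unwrapping.

```python
import unittest

HEADER_END = 8  # hypothetical fixed header size, for illustration only


def unwrap(frame):
    """Hypothetical stand-in for SerializationWrapper.unwrap on a v3 frame."""
    return memoryview(frame)[HEADER_END:]


class TestZeroCopyReadSketch(unittest.TestCase):
    def _frame(self) -> bytearray:
        # bytearray so the test can write to the source after unwrapping
        return bytearray(b"\x00" * HEADER_END + b"payload")

    def test_payload_is_memoryview(self):
        self.assertIsInstance(unwrap(self._frame()), memoryview)

    def test_mutation_shows_through(self):
        frame = self._frame()
        payload = unwrap(frame)
        # In-place write without resizing, which is allowed while a view exists.
        frame[HEADER_END] = ord("P")
        # A bytes copy taken before the write would still read b"payload".
        self.assertEqual(bytes(payload), b"Payload")
```

Run it with `python -m unittest`. The second test is the stronger one: an `isinstance` check alone would also pass for a `memoryview` wrapped around a fresh copy.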