backup: Redis set encoder (Phase 0a) #758
Adds the !st|meta| + !st|mem| → sets/<key>.json encoder per the Phase 0
design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
Wire format mirrors store/set_helpers.go:
- !st|meta|<userKeyLen(4)><userKey> → 8-byte BE Len
- !st|mem|<userKeyLen(4)><userKey><member> → empty value (member bytes
  live in the key, binary-safe)
- !st|meta|d|<userKeyLen(4)><userKey>... → skipped silently (same policy
  as hash and list deltas)
Output JSON shape matches the design's other wide-column types:
{"format_version": 1,
"members": [..., {"base64":"..."}, ...],
"expire_at_ms": null | <ms>}
Members are emitted as a sorted array (not a JSON object) for the
same binary-safety reason the hash encoder uses an array for fields:
distinct binary member names can collide under JSON's percent-encoded
object-key path, and base64-envelope encoding for non-UTF-8 members
keeps each record byte-faithful. Duplicate HandleSetMember calls
collapse via a map[string]struct{} buffer so a snapshot iterator that
re-emits the same !st|mem| key is harmless (Redis sets are
mathematical sets, not multisets).
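The dedup-sort-envelope pipeline described above can be illustrated with a minimal sketch. The function name encodeMembers is hypothetical; the PR's real encoder lives in the backup package and may differ in detail.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"sort"
	"unicode/utf8"
)

// encodeMembers collapses duplicate members via a set-typed buffer, sorts
// the survivors by raw byte order, and wraps any non-UTF-8 member in a
// {"base64": ...} envelope so distinct binary names can never collide as
// JSON object keys.
func encodeMembers(seen map[string]struct{}) ([]byte, error) {
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic byte order across runs

	out := make([]any, 0, len(keys))
	for _, k := range keys {
		if utf8.ValidString(k) {
			out = append(out, k)
		} else {
			out = append(out, map[string]string{
				"base64": base64.StdEncoding.EncodeToString([]byte(k)),
			})
		}
	}
	return json.Marshal(out)
}

func main() {
	seen := map[string]struct{}{}
	// Re-emitting the same member is harmless: the map collapses it.
	for _, m := range []string{"b", "a", "a", "\xff\x00"} {
		seen[m] = struct{}{}
	}
	b, _ := encodeMembers(seen)
	fmt.Println(string(b)) // ["a","b",{"base64":"/wA="}]
}
```

Emitting an array rather than an object sidesteps JSON's requirement that object keys be valid (and unique) strings, which is the collision risk the description names.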
TTL records on !redis|ttl| route into the set JSON's expire_at_ms
field via the HandleTTL switch — same fold-into-record policy as
hash + list encoders; no separate sidecar.
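The fold-into-record policy can be shown with a toy handler. The decoder and setRecord types and the handleTTL method here are from-scratch stand-ins for the PR's HandleTTL switch, reduced to the set case plus the orphan counter.

```go
package main

import "fmt"

type setRecord struct {
	FormatVersion int      `json:"format_version"`
	Members       []string `json:"members"`
	ExpireAtMs    *int64   `json:"expire_at_ms"`
}

// decoder holds pending per-key set records; names are illustrative.
type decoder struct {
	pendingSets    map[string]*setRecord
	orphanTTLCount int
}

// handleTTL folds a TTL into the owning set record instead of writing a
// sidecar file; keys with no owning record count as orphans (the zset
// and stream encoders have not landed yet).
func (d *decoder) handleTTL(userKey string, expireAtMs int64) {
	if rec, ok := d.pendingSets[userKey]; ok {
		rec.ExpireAtMs = &expireAtMs
		return
	}
	d.orphanTTLCount++
}

func main() {
	d := &decoder{pendingSets: map[string]*setRecord{
		"colors": {FormatVersion: 1, Members: []string{"red"}},
	}}
	d.handleTTL("colors", 1700000000000)
	d.handleTTL("unknown", 1)
	fmt.Println(*d.pendingSets["colors"].ExpireAtMs, d.orphanTTLCount) // 1700000000000 1
}
```

The pointer-typed ExpireAtMs distinguishes "no TTL" (JSON null) from an explicit expiry, matching the null | <ms> shape above.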
The uint64 → int64 overflow guard for the declared-length field is
applied symmetrically (matching the hash + list encoders) so a
corrupted store with the high bit set fails closed at meta-record
ingest rather than silently wrapping to negative declaredLen and
firing spurious redis_set_length_mismatch warnings.
Refactor: leverages the existing flushWideColumnDir generic helper
introduced for the list encoder (PR #755) — no further changes to
redis_string.go's shared infra.
Tests cover:
- sorted byte-order round-trip
- empty set still emits a file (SCARD==0 observable)
- TTL inlined from scan index
- length-mismatch warning shape
- binary member base64 envelope
- meta-delta key silently skipped
- malformed meta length rejected
- overflow guard rejected
- members-without-meta still emits without false-positive mismatch warning
- duplicate members collapse idempotently
- parser-level delta-key rejection
- math.MaxInt64 boundary accepted
Phase 0a remaining after this PR: redis_zset / redis_stream
encoders, cmd/elastickv-snapshot-decode CLI, cmd/elastickv-snap-token
helper, docs/operations/snapshot_restore.md runbook.
Code Review
This pull request implements a Redis set encoder that translates snapshot records into a per-set JSON format, including support for TTL inlining and deterministic member sorting. The review feedback identifies the use of a hash-specific constant in the set encoder and suggests using a local value instead. Additionally, it emphasizes the importance of correctly handling all wide-column types in the HandleTTL and Finalize logic to prevent data loss or state inconsistencies.
    if !ok {
        return nil, nil, false
    }
    member := rest[hashUserKeyLenSize+len(userKey):]
The constant hashUserKeyLenSize appears to be a leftover from the hash encoder implementation. While it likely holds the correct value (4 bytes), using a hash-specific constant in the set encoder is misleading and creates an unnecessary dependency on the hash implementation details. Please define a local constant for the set encoder or use a more generic shared constant if one exists in the backup package.
Suggested change:

    - member := rest[hashUserKeyLenSize+len(userKey):]
    + member := rest[4+len(userKey):]
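Rather than the bare magic number, the suggestion can also be applied by giving the set encoder its own constant. setUserKeyLenSize is a hypothetical name; a shared generic constant in the backup package, if one exists, would serve equally well.

```go
package main

import "fmt"

// setUserKeyLenSize replaces the hash-specific hashUserKeyLenSize the
// review flagged: same value, no dependency on the hash encoder.
const setUserKeyLenSize = 4 // big-endian uint32 user-key length prefix

func main() {
	userKey := []byte("foo")
	// rest = <userKeyLen(4)><userKey><member>, already prefix-stripped.
	rest := append([]byte{0, 0, 0, 3}, []byte("foobar")...)
	member := rest[setUserKeyLenSize+len(userKey):]
	fmt.Println(string(member)) // bar
}
```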
      r.warn("redis_orphan_ttl",
          "count", r.orphanTTLCount,
    -     "hint", "remaining wide-column encoders (set/zset/stream) have not landed yet")
    +     "hint", "remaining wide-column encoders (zset/stream) have not landed yet")
The hint message for redis_orphan_ttl has been updated to remove set, which is correct as the set encoder is now implemented. However, ensure that the corresponding logic in HandleTTL and Finalize correctly handles all wide-column types. This is crucial to avoid silently dropping entries or creating state inconsistencies during normalization, ensuring all data is preserved on disk.
References
- Avoid silently dropping entries during serialization or normalization. Provide specific handling for all valid contexts to ensure they are preserved on disk and avoid state inconsistencies.
Summary
- !st|meta| + !st|mem| → sets/<key>.json encoder per the Phase 0 design
  doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
- Members are emitted as a sorted array; non-UTF-8 members get a
  {"base64":"..."} envelope without colliding under percent-encoded JSON
  object keying — same shape as the hash encoder's fields array.
- Duplicate HandleSetMember calls collapse via a map[string]struct{}
  buffer; Redis sets are mathematical sets, so a snapshot iterator that
  re-emits an !st|mem| key is harmless.
- TTL records on !redis|ttl| for a set user key fold into the set JSON's
  expire_at_ms field via the HandleTTL switch — same policy as hash +
  list encoders, no separate sidecar.
- The uint64 → int64 overflow guard applied symmetrically to the hash +
  list encoders (see "backup: Redis list encoder (Phase 0a)" PR #755,
  round 2) is also enforced here on the set's declared-length field.
- !st|meta|d| (LenDelta) records are silently skipped: !st|mem| keys are
  the source of truth at backup time.
Test plan
- go test -race ./internal/backup/... — all 11 new set tests + existing
  hash/list/string suites pass (1.4s)
- golangci-lint run ./internal/backup/... — 0 issues
- math.MaxInt64 boundary accepted
Self-review (5 lenses)
- writeFileAtomic (tmp+rename). Only meta-deltas are skipped, which is
  consistent with the hash/list policy and the live read path's
  source-of-truth invariant.
- RedisDB.Handle* methods documented not goroutine-safe; decoder pipeline
  is sequential per scope. No shared mutable state.
- map[string]struct{}; single allocation per member. Sort at flush is
  O(n log n); set member count bounded by maxWideColumnItems on the live
  side.
- redis_set_length_mismatch warning fires on declared-vs-observed drift;
  sorted byte order is deterministic across runs.
Phase 0a remaining after this PR
- redis_zset.go (!zs|...) encoder — sorted-set with (member, score)
  records, JSON shape {"format_version": 1, "members": [{"member":...,
  "score":...}], "expire_at_ms": null}
- redis_stream.go (!stream|...) encoder — JSONL output
  (streams/<key>.jsonl with _meta trailer)
- cmd/elastickv-snapshot-decode/ CLI binary
- cmd/elastickv-snap-token helper
- docs/operations/snapshot_restore.md runbook