backup: Redis list encoder (Phase 0a) #755

Merged

bootjp merged 2 commits into main from feat/backup-phase0a-redis-list on May 15, 2026

Conversation

Owner

@bootjp bootjp commented May 15, 2026

Summary

  • Adds the !lst|meta| + !lst|itm| → lists/<key>.json encoder per the Phase 0 design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
  • Item order in the dump is ascending sortable-int64 seq, matching the live store's LPUSH-leftmost-to-RPUSH-rightmost read order (verified by submitting items out-of-order in the round-trip test).
  • TTL records on !redis|ttl| for a list user key fold into the list JSON's expire_at_ms field via the HandleTTL switch — same policy as the hash encoder (no separate sidecar).
  • !lst|meta|d|... deltas and !lst|claim|... POP tombstones are silently skipped: !lst|itm| keys are the source of truth at backup time, and the live read path (fetchListRange in adapter/redis.go:3281) doesn't consult claims either — POPs delete the item key in the same OCC commit.
  • Extracted flushWideColumnDir generic helper into redis_string.go. Both flushHashes and flushLists now call it, eliminating the ~30-line "create subdir + sort user keys + iterate" duplication that would otherwise be repeated for every wide-column encoder.

Test plan

  • go test -race ./internal/backup/... (all 5 list tests + existing hash/string suite pass, 1.3s)
  • golangci-lint run ./internal/backup/... (0 issues)
  • Tests cover: LPUSH ordering preserved across out-of-order HandleListItem submissions; empty list still emits a file (LLEN==0 observable); TTL inlined from scan index; length-mismatch warning shape; binary item base64 envelope; meta-delta key silently skipped; claim key silently skipped; malformed meta length rejected; truncated item key rejected.

Self-review (5 lenses)

  1. Data loss: encoder writes via writeFileAtomic (tmp+rename). No write path drops records — only !lst|meta|d| / !lst|claim| records are skipped, and both are non-source-of-truth in the live read path.
  2. Concurrency / distributed: RedisDB.Handle* methods are documented not goroutine-safe; the decoder pipeline is sequential per scope. No new locks or shared state.
  3. Performance: items buffered in a map[int64][]byte keyed by seq. Single allocation per item via bytes.Clone. Sort at flush is O(n log n) per list; lists are bounded by maxWideColumnItems on the live side.
  4. Data consistency: meta declared_len mismatch warns via redis_list_length_mismatch; ascending-seq order is deterministic across runs (Go's randomised map iteration is mitigated by the sort).
  5. Test coverage: 9 named tests cover every public handler plus the parse-error paths.

Phase 0a remaining after this PR

  • redis_set.go / redis_zset.go / redis_stream.go encoders
  • cmd/elastickv-snapshot-decode/ CLI binary
  • cmd/elastickv-snap-token helper
  • docs/operations/snapshot_restore.md runbook

Summary by CodeRabbit

  • New Features
    • Added Redis list backup output (per-list JSON), preserving LPUSH order and inline TTL; binary/non-UTF-8 items are serialized safely.
  • Bug Fixes
    • Prevented integer overflow when reading hash metadata, avoiding spurious length-mismatch warnings.
    • Emit length-mismatch warnings when declared vs. decoded counts differ.
  • Tests
    • Expanded tests for list round-trips, TTL inlining, binary items, length-mismatch, and malformed-meta handling.
  • Refactor
    • Generalized wide-column flush logic for deterministic per-type directory output.

Review Change Stack

Adds the !lst|meta| + !lst|itm| → lists/<key>.json encoder per the
Phase 0 design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
Wire format mirrors store/list_helpers.go:

  - !lst|meta|<userKey>          -> 24-byte (Head, Tail, Len) blob
  - !lst|itm|<userKey><seq(8)>   -> raw item bytes (binary-safe)
  - !lst|meta|d|...              -> meta delta. Skipped silently
                                    (same policy as hash deltas):
                                    !lst|itm| keys are the source of
                                    truth at backup time and the
                                    delta arithmetic does not need
                                    to be replayed.
  - !lst|claim|...               -> POP tombstone for OCC uniqueness.
                                    The live read path (rangeList →
                                    fetchListRange in redis.go:3281)
                                    does NOT consult claims; POPs
                                    Del the underlying item key in
                                    the same OCC commit, so a
                                    snapshot taken after a POP has
                                    no item record for the popped
                                    seq. The encoder therefore skips
                                    claim keys entirely.

Output JSON shape matches the design's other wide-column types:

  {"format_version": 1,
   "items": [..., {"base64":"..."}, ...],
   "expire_at_ms": null | <ms>}

Items are emitted in ascending sortable-int64 seq order, which is the
live store's left-to-right read order (LPUSH writes seqs at decreasing
positions, RPUSH at increasing). Non-UTF-8 item bytes round-trip via
the typed {"base64":"..."} envelope, sharing marshalRedisBinaryValue
with the hash encoder. TTL records on !redis|ttl| route into the list
JSON's expire_at_ms field via the HandleTTL switch (mirrors the hash
fold-into-record policy; lists carry no separate sidecar).
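The TTL fold-in described above can be sketched as a kind-keyed switch; the db type and its fields are illustrative stand-ins for the encoder's real RedisDB state.

```go
package main

import "fmt"

type redisKind int

const (
	redisKindString redisKind = iota
	redisKindHash
	redisKindList
)

// db is a hypothetical slice of the encoder's state: which kind each
// user key was last seen as, plus per-kind expire_at_ms buffers.
type db struct {
	kindByKey    map[string]redisKind
	listExpireMs map[string]int64
	hashExpireMs map[string]int64
}

// HandleTTL routes a TTL record into the owning record's
// expire_at_ms slot instead of emitting a separate sidecar file.
func (d *db) HandleTTL(userKey string, expireAtMs int64) {
	switch d.kindByKey[userKey] {
	case redisKindList:
		d.listExpireMs[userKey] = expireAtMs
	case redisKindHash:
		d.hashExpireMs[userKey] = expireAtMs
	}
}

func main() {
	d := &db{
		kindByKey:    map[string]redisKind{"k": redisKindList},
		listExpireMs: map[string]int64{},
		hashExpireMs: map[string]int64{},
	}
	d.HandleTTL("k", 1747267200000)
	fmt.Println(d.listExpireMs["k"])
}
```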

Refactor: extract flushWideColumnDir generic helper into redis_string.go.
Both flushHashes and flushLists now call it, eliminating ~30 lines of
"sort user keys + iterate" duplication that dupl would otherwise flag
on every new wide-column encoder. flushHashes diff is the inline
substitution; behaviour is unchanged.
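The extracted helper might look roughly like this; the signature is a guess at the shape described in the PR, not the actual flushWideColumnDir in redis_string.go.

```go
package main

import (
	"fmt"
	"sort"
)

// flushWideColumnDir sketches the shared "sort user keys, then flush
// each one" loop: generic over the per-key state type so hash, list,
// and future set/zset/stream encoders can all reuse it. It fails
// fast on the first per-key error by design.
func flushWideColumnDir[T any](dir string, states map[string]T,
	flushOne func(dir, userKey string, st T) error) error {
	userKeys := make([]string, 0, len(states))
	for k := range states {
		userKeys = append(userKeys, k)
	}
	sort.Strings(userKeys) // deterministic output order across runs
	for _, uk := range userKeys {
		if err := flushOne(dir, uk, states[uk]); err != nil {
			return err // fail fast within this type
		}
	}
	return nil
}

func main() {
	states := map[string][]string{"b": {"2"}, "a": {"1"}}
	_ = flushWideColumnDir("lists", states, func(dir, uk string, st []string) error {
		fmt.Printf("%s/%s.json %v\n", dir, uk, st)
		return nil
	})
}
```

Type inference at the call site picks T from the states map, so flushHashes and flushLists can call it without explicit type arguments.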

Tests cover: LPUSH ordering preserved across out-of-order
HandleListItem submissions; empty list still emits a file (LLEN==0 is
observable); TTL inlined from scan index; length-mismatch warning
shape; binary item base64 envelope; meta-delta key silently skipped;
claim key silently skipped; malformed meta length rejected;
truncated item key rejected.

Phase 0a remaining after this PR: redis_set / redis_zset /
redis_stream encoders, the cmd/elastickv-snapshot-decode CLI, the
cmd/elastickv-snap-token helper, and the docs/operations runbook.

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c754529d-4671-4cae-aa45-47fe5ad1d46b

📥 Commits

Reviewing files that changed from the base of the PR and between abb8149 and 5cafc8a.

📒 Files selected for processing (4)
  • internal/backup/redis_hash.go
  • internal/backup/redis_hash_test.go
  • internal/backup/redis_list_test.go
  • internal/backup/redis_string.go

📝 Walkthrough

Walkthrough

Adds a Redis list backup encoder producing per-list JSON files, inlines list TTLs, introduces a generic wide-column directory flush helper shared by hashes and lists, refactors hash flushing to use it, and adds tests including a hash-meta overflow regression.

Changes

Redis list backup encoder with wide-column flushing

  • Wide-column directory flushing infrastructure (internal/backup/redis_string.go, internal/backup/redis_hash.go): adds redisKindList, RedisDB.lists, TTL routing for lists, the flushWideColumnDir generic helper, and refactors flushHashes to use the helper.
  • Redis list codec — handlers, parsing, and output (internal/backup/redis_list.go, internal/backup/redis_string.go): implements the list handlers (HandleListMeta, HandleListItem, claim/delta no-ops), per-list buffering, sortable-seq decoding, length-mismatch warning emission, marshalListJSON, and atomic per-list JSON writes with optional expire_at_ms.
  • Redis list test suite and hash regression (internal/backup/redis_list_test.go, internal/backup/redis_hash_test.go): adds tests validating LPUSH ordering, empty-list file emission, TTL inlining, length-mismatch warnings, the binary-item base64 envelope, delta/claim skipping, malformed meta/key errors, and a test ensuring an overflowing hash meta returns ErrRedisInvalidHashMeta.

Sequence Diagram(s)

sequenceDiagram
  participant Snapshot
  participant RedisDB
  participant Handler (ListMeta/ListItem)
  participant listState
  participant flushWideColumnDir
  participant writeListJSON
  Snapshot->>RedisDB: stream records (meta/item/TTL)
  RedisDB->>Handler: dispatch to HandleListMeta or HandleListItem
  Handler->>listState: create/update per-list buffer (declaredLen, items, TTL)
  RedisDB->>flushWideColumnDir: finalize lists -> iterate keys
  flushWideColumnDir->>writeListJSON: per-key callback to marshal and write JSON
  writeListJSON-->>Snapshot: atomic lists/<encoded>.json written

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • bootjp/elastickv#713: prior work on Redis wide-column encoder wiring that this change builds upon.

Poem

🐰 I hop through bytes and sort each seq,
lists align where items seek,
Wide-column paths now flush with care,
TTLs tucked safely in JSON there. 🎋

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'backup: Redis list encoder (Phase 0a)' directly and accurately reflects the main change: implementing a Redis list encoder as part of Phase 0a of the backup feature, which is clearly the primary focus of all changes.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 85.71%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Owner Author

bootjp commented May 15, 2026

@claude review


claude Bot commented May 15, 2026

Claude finished @bootjp's task in 5m 45s.


Code Review

  • Read changed files
  • Review redis_string.go (flushWideColumnDir helper)
  • Review redis_hash.go (refactored flushing)
  • Review redis_list.go (new list encoder)
  • Review redis_list_test.go (test coverage)
  • Check alignment with design doc
  • Post review findings

Overall this is a well-structured PR. The list encoder faithfully mirrors the hash encoder's patterns, the key format aligns correctly with the live store, and the JSON shape matches the design doc ({"format_version": 1, "items": [...], "expire_at_ms": null}). Below are my findings in priority order.

Findings

1. Inconsistent meta-value overflow handling between hash and list encoders (Minor)

redis_hash.go:75 silently wraps a uint64 field-count into int64 with no overflow check:

st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design

redis_list.go:112-115 explicitly guards against the same overflow:

if rawLen > math.MaxInt64 {
    return cockroachdberr.Wrapf(ErrRedisInvalidListMeta, "declared len %d overflows int64", rawLen)
}

The list encoder is more defensive (good), but the inconsistency is worth backporting. With the hash encoder, a live store that somehow stores a field count with the high bit set would produce a negative declaredLen, causing every redis_hash_length_mismatch warning to fire incorrectly.


2. flushWideColumnDir aborts the entire type on first write error (Observation)

redis_string.go:341-345:

for _, uk := range userKeys {
    if err := flushOne(dir, uk, states[uk]); err != nil {
        return err  // ← subsequent user keys are not written
    }
}

If writing list "bar" fails, lists "baz" through "zoo" are never written. Finalize continues with other types (flushHashes, etc.), but within a single type, subsequent keys are skipped silently. This is not a regression — the pre-refactoring inline loop had identical semantics — but it's worth a comment noting the intentional "fail-fast per type" policy so future readers don't "fix" it to continue (which could produce a partially-written dump that's harder to detect as corrupt).

3. Missing test: items-without-meta edge case (Minor)

The test suite covers meta-without-items (TestRedisDB_ListEmptyListStillEmitsFile) but not items-without-meta — where !lst|itm| records arrive without a !lst|meta| record. In that case listState creates an entry with metaSeen=false, declaredLen=0, flushLists writes a JSON file, and no mismatch warning fires (because st.metaSeen is false). This is intentional and consistent with the hash encoder, but given the 9-test suite's explicit coverage of parse-error paths, a one-liner round-trip test for this case would close the gap.


4. flushWideColumnDir lives in redis_string.go (Organizational)

The helper is shared by flushHashes and flushLists (and will be reused by the set/zset/stream encoders), but it's defined in the string encoder file. A file named redis_widecolumn.go (or redis_common.go) would make the dependency clearer to future contributors. That said, the PR description calls this placement out explicitly and it's a purely organizational concern — not blocking.

Correctness Checks Passed

  • Key format alignment
    !lst|meta|<userKey> (no length prefix, matches store.ListMetaKey) and !lst|itm|<userKey><seq(8)> (matches store.ListItemKey). The decoder's parseListItemKey correctly inverts the encoder by taking the last 8 bytes as seq — the mapping (userKey, seq) → item_key is a bijection, so no ambiguity.

  • Meta value layout
    value[16:24] reads Len from the 24-byte [Head(8)][Tail(8)][Len(8)] blob, matching store.MarshalListMeta.

  • Sortable-int64 decoding
    int64(rawSeq) ^ math.MinInt64 correctly undoes the live store's sign-flip encoding; ascending sort on int64 seq yields LPUSH-leftmost-first order.

  • Delta and claim records
    ListMetaDeltaPrefix = "!lst|meta|d|" is correctly checked before ListMetaPrefix = "!lst|meta|" (since delta is a superset prefix, the order matters). Claim keys are discarded via the no-op HandleListClaim.

  • TTL inlining
    HandleTTL routes redisKindList to listState(userKey).expireAtMs. The test calls HandleTTL([]byte("k"), ...) after HandleListMeta(listMetaKey("k"), ...), and listState sets kindByKey["k"] = redisKindList when meta is processed — so the routing works correctly.

  • JSON shape vs design doc
    {"format_version": 1, "items": [...], "expire_at_ms": null} matches line 330 of the design doc.

  • Generic flushWideColumnDir
    Type inference correctly resolves T as *redisHashState / *redisListState for the two call sites.
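The prefix-ordering point in the delta-and-claim bullet can be made concrete with a small dispatch sketch. The classify function is illustrative, though the prefix strings match the wire format quoted earlier: because every delta key also matches the plain meta prefix, testing the longer prefix first is what keeps deltas out of the meta handler.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	listMetaDeltaPrefix = "!lst|meta|d|" // superset of listMetaPrefix
	listMetaPrefix      = "!lst|meta|"
)

// classify dispatches a raw key by prefix. The delta check MUST come
// before the meta check: swapping them would route every delta key
// into the meta handler, since "!lst|meta|d|..." also starts with
// "!lst|meta|".
func classify(key string) string {
	switch {
	case strings.HasPrefix(key, listMetaDeltaPrefix):
		return "delta (skipped)"
	case strings.HasPrefix(key, listMetaPrefix):
		return "meta"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify("!lst|meta|d|mylist"))
	fmt.Println(classify("!lst|meta|mylist"))
}
```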


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a Redis list encoder to the backup system. It includes the implementation of HandleListMeta, HandleListItem, and flushLists to process and store Redis list data as JSON. Additionally, it refactors the existing hash flushing logic into a generic flushWideColumnDir helper to be shared across wide-column types. Comprehensive tests are added to verify the list encoding logic, including TTL handling and error cases. I have no feedback to provide as there were no review comments to assess.

…rage

Claude bot round-1 review surfaced two follow-ups worth backporting
from the list encoder to the hash encoder for cross-encoder symmetry,
plus one coverage gap and one design-intent comment.

1. HandleHashMeta now rejects uint64 declared-field counts with the
   high bit set instead of silently wrapping them to a negative int64
   declaredLen. Without the guard, a corrupted store would fire
   spurious `redis_hash_length_mismatch` warnings on every flush
   regardless of how many fields actually arrived. Mirrors the list
   encoder's symmetric guard so both wide-column encoders fail closed
   on the same shape of corruption.

   Caller audit: HandleHashMeta has zero production callers in the
   repo (the decoder driver pipeline is not yet in tree); all test
   callers feed encodeHashMetaValue(N) with non-negative small ints,
   so none trip the new check. The new error case wraps the same
   ErrRedisInvalidHashMeta sentinel the existing length-check returns,
   so caller-side error-classification stays unchanged — purely a
   fail-closed expansion of an existing error class.

2. New regression test TestRedisDB_HashRejectsOverflowingMetaValue
   pins the new guard with a 1<<63 input.

3. New regression test TestRedisDB_ListItemsWithoutMetaStillEmitsFile
   closes the round-1 coverage gap: !lst|itm| records may arrive
   without a paired !lst|meta| in dump order, and the encoder must
   still emit the per-list JSON without firing the length-mismatch
   warning (metaSeen=false means no "declared" baseline to compare
   against).

4. Documented flushWideColumnDir's fail-fast-per-type error policy.
   The earlier "continue past per-key flush errors" alternative trades
   a noisy hard failure for a quiet soft failure that survives
   `find -name '*.json'` scrutiny — calling that out so future
   contributors don't "fix" the early return to a continue.

go test -race ./internal/backup, golangci-lint, and go vet all clean.
Owner Author

bootjp commented May 15, 2026

@claude review

Round 2 — addressed two of the claude bot's four round-1 findings (#1, #3):

#1 Hash encoder overflow guard backport: HandleHashMeta was silently wrapping a uint64 declared field count with the high bit set into a negative int64; fixed with a guard of the same shape as the list encoder's. A corrupted store no longer produces false-positive mismatch warnings.

Caller audit: HandleHashMeta has zero production callers (the decoder driver is not yet in tree), and every test caller passes a small positive integer. The new error wraps the existing ErrRedisInvalidHashMeta class, so caller-side error classification is unchanged; this is a fail-closed expansion only.

#3 items-without-meta regression test: added TestRedisDB_ListItemsWithoutMetaStillEmitsFile, pinning that when !lst|itm| records arrive without a !lst|meta| record, the JSON file is still written and no length-mismatch warning fires.

Skipped:

  • Finding #4 (moving flushWideColumnDir to its own file): a purely organizational preference; the PR description already states the placement rationale.

go test -race ./internal/backup, golangci-lint, and go vet are all clean (1.4s).


claude Bot commented May 15, 2026

Claude finished @bootjp's task in 1m 34s.


Code Review (Round 2)


Round 2 is clean. All four cited findings are addressed correctly and the implementations are sound.

Finding #1 — Hash overflow guard (redis_hash.go:83-89)

The backport mirrors the list encoder exactly:

rawLen := binary.BigEndian.Uint64(value)
if rawLen > math.MaxInt64 {
    return cockroachdberr.Wrapf(ErrRedisInvalidHashMeta,
        "declared len %d overflows int64", rawLen)
}
st.declaredLen = int64(rawLen) //nolint:gosec // bounds-checked above

✅ Correct. The //nolint:gosec // bounds-checked above annotation is in the right place (on the now-safe cast, not the check itself). The new test (TestRedisDB_HashRejectsOverflowingMetaValue) uses 1<<63 = 0x8000_0000_0000_0000 > math.MaxInt64 — the minimal distinguishing input.

Minor style note: The inline comment block (redis_hash.go:76-82) includes "Round-2 review on PR #755". The CLAUDE.md convention explicitly asks not to reference tasks, fixes, or issues in code comments because they rot. The PR description is the right home for that attribution. Worth a cleanup pass before merge.

Finding #3 — Items-without-meta test (redis_list_test.go:357-378)

func TestRedisDB_ListItemsWithoutMetaStillEmitsFile(t *testing.T) {
    ...
    if err := db.HandleListItem(listItemKey("q", 0), []byte("v")); err != nil { ... }
    if err := db.Finalize(); err != nil { ... }
    // asserts JSON file written AND no redis_list_length_mismatch
}

✅ Covers both required invariants: the file is produced, and no mismatch warning fires (metaSeen=false means there is no declared baseline). listItemKey("q", 0) encodes seq=0 as the sortable-int64 value 0 ^ math.MinInt64 — the smallest valid sequence, correctly triggering the same parse path as a real store snapshot.

Finding #2 — flushWideColumnDir fail-fast comment (redis_string.go:329-337)

✅ The comment is accurate and well-targeted. The "fail-fast per type" framing precisely describes the semantics: Finalize continues across types but aborts within a type on first error. The "survives find -name '*.json' scrutiny" rationale is the right thing to call out for future maintainers.

Finding #4 — File placement (flushWideColumnDir in redis_string.go)

Skipped as agreed. No concerns.


Summary: All four findings addressed. The only thing worth a cleanup before merge is the "Round-2 review on PR #755" attribution in the code comment (redis_hash.go:82) — move it to the commit message / PR description to keep the source free of time-decaying references.

@bootjp bootjp merged commit b59e587 into main May 15, 2026
9 checks passed
@bootjp bootjp deleted the feat/backup-phase0a-redis-list branch May 15, 2026 20:57
