backup: Redis list encoder (Phase 0a) #755

Merged

bootjp merged 2 commits into main from feat/backup-phase0a-redis-list on May 15, 2026

Conversation

Owner

@bootjp bootjp commented May 15, 2026

Summary

  • Adds the !lst|meta| + !lst|itm| → lists/<key>.json encoder per the Phase 0 design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
  • Item order in the dump is ascending sortable-int64 seq, matching the live store's LPUSH-leftmost-to-RPUSH-rightmost read order (verified by submitting items out-of-order in the round-trip test).
  • TTL records on !redis|ttl| for a list user key fold into the list JSON's expire_at_ms field via the HandleTTL switch — same policy as the hash encoder (no separate sidecar).
  • !lst|meta|d|... deltas and !lst|claim|... POP tombstones are silently skipped: !lst|itm| keys are the source of truth at backup time, and the live read path (fetchListRange in adapter/redis.go:3281) doesn't consult claims either — POPs delete the item key in the same OCC commit.
  • Extracted flushWideColumnDir generic helper into redis_string.go. Both flushHashes and flushLists now call it, eliminating the ~30-line "create subdir + sort user keys + iterate" duplication that would otherwise be repeated for every wide-column encoder.

Test plan

  • go test -race ./internal/backup/... (all 5 list tests + existing hash/string suite pass, 1.3s)
  • golangci-lint run ./internal/backup/... (0 issues)
  • Tests cover: LPUSH ordering preserved across out-of-order HandleListItem submissions; empty list still emits a file (LLEN==0 observable); TTL inlined from scan index; length-mismatch warning shape; binary item base64 envelope; meta-delta key silently skipped; claim key silently skipped; malformed meta length rejected; truncated item key rejected.

Self-review (5 lenses)

  1. Data loss: encoder writes via writeFileAtomic (tmp+rename). No write path drops records — only !lst|meta|d| / !lst|claim| records are skipped, and both are non-source-of-truth in the live read path.
  2. Concurrency / distributed: RedisDB.Handle* methods are documented not goroutine-safe; the decoder pipeline is sequential per scope. No new locks or shared state.
  3. Performance: items buffered in a map[int64][]byte keyed by seq. Single allocation per item via bytes.Clone. Sort at flush is O(n log n) per list; lists are bounded by maxWideColumnItems on the live side.
  4. Data consistency: meta declared_len mismatch warns via redis_list_length_mismatch; ascending-seq order is deterministic across runs (Go's randomised map iteration is mitigated by the sort).
  5. Test coverage: 9 named tests cover every public handler plus the parse-error paths.

Phase 0a remaining after this PR

  • redis_set.go / redis_zset.go / redis_stream.go encoders
  • cmd/elastickv-snapshot-decode/ CLI binary
  • cmd/elastickv-snap-token helper
  • docs/operations/snapshot_restore.md runbook

Summary by CodeRabbit

  • New Features
    • Added Redis list backup output (per-list JSON), preserving LPUSH order and inline TTL; binary/non-UTF-8 items are serialized safely.
  • Bug Fixes
    • Prevented integer overflow when reading hash metadata, avoiding spurious length-mismatch warnings.
    • Emit length-mismatch warnings when declared vs. decoded counts differ.
  • Tests
    • Expanded tests for list round-trips, TTL inlining, binary items, length-mismatch, and malformed-meta handling.
  • Refactor
    • Generalized wide-column flush logic for deterministic per-type directory output.

Review Change Stack

Adds the !lst|meta| + !lst|itm| → lists/<key>.json encoder per the
Phase 0 design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
Wire format mirrors store/list_helpers.go:

  - !lst|meta|<userKey>          -> 24-byte (Head, Tail, Len) blob
  - !lst|itm|<userKey><seq(8)>   -> raw item bytes (binary-safe)
  - !lst|meta|d|...              -> meta delta. Skipped silently
                                    (same policy as hash deltas):
                                    !lst|itm| keys are the source of
                                    truth at backup time and the
                                    delta arithmetic does not need
                                    to be replayed.
  - !lst|claim|...               -> POP tombstone for OCC uniqueness.
                                    The live read path (rangeList →
                                    fetchListRange in redis.go:3281)
                                    does NOT consult claims; POPs
                                    Del the underlying item key in
                                    the same OCC commit, so a
                                    snapshot taken after a POP has
                                    no item record for the popped
                                    seq. The encoder therefore skips
                                    claim keys entirely.

Output JSON shape matches the design's other wide-column types:

  {"format_version": 1,
   "items": [..., {"base64":"..."}, ...],
   "expire_at_ms": null | <ms>}

Items are emitted in ascending sortable-int64 seq order, which is the
live store's left-to-right read order (LPUSH writes seqs at decreasing
positions, RPUSH at increasing). Non-UTF-8 item bytes round-trip via
the typed {"base64":"..."} envelope, sharing marshalRedisBinaryValue
with the hash encoder. TTL records on !redis|ttl| route into the list
JSON's expire_at_ms field via the HandleTTL switch (mirrors the hash
fold-into-record policy; lists carry no separate sidecar).
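The TTL fold-in described above can be sketched as a kind-keyed switch; the db type and its fields are illustrative stand-ins for the encoder's real RedisDB state.

```go
package main

import "fmt"

type redisKind int

const (
	redisKindString redisKind = iota
	redisKindHash
	redisKindList
)

// db is a hypothetical slice of the encoder's state: which kind each
// user key was last seen as, plus per-kind expire_at_ms buffers.
type db struct {
	kindByKey    map[string]redisKind
	listExpireMs map[string]int64
	hashExpireMs map[string]int64
}

// HandleTTL routes a TTL record into the owning record's
// expire_at_ms slot instead of emitting a separate sidecar file.
func (d *db) HandleTTL(userKey string, expireAtMs int64) {
	switch d.kindByKey[userKey] {
	case redisKindList:
		d.listExpireMs[userKey] = expireAtMs
	case redisKindHash:
		d.hashExpireMs[userKey] = expireAtMs
	}
}

func main() {
	d := &db{
		kindByKey:    map[string]redisKind{"k": redisKindList},
		listExpireMs: map[string]int64{},
		hashExpireMs: map[string]int64{},
	}
	d.HandleTTL("k", 1747267200000)
	fmt.Println(d.listExpireMs["k"])
}
```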

Refactor: extract flushWideColumnDir generic helper into redis_string.go.
Both flushHashes and flushLists now call it, eliminating ~30 lines of
"sort user keys + iterate" duplication that dupl would otherwise flag
on every new wide-column encoder. flushHashes diff is the inline
substitution; behaviour is unchanged.
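The extracted helper might look roughly like this; the signature is a guess at the shape described in the PR, not the actual flushWideColumnDir in redis_string.go.

```go
package main

import (
	"fmt"
	"sort"
)

// flushWideColumnDir sketches the shared "sort user keys, then flush
// each one" loop: generic over the per-key state type so hash, list,
// and future set/zset/stream encoders can all reuse it. It fails
// fast on the first per-key error by design.
func flushWideColumnDir[T any](dir string, states map[string]T,
	flushOne func(dir, userKey string, st T) error) error {
	userKeys := make([]string, 0, len(states))
	for k := range states {
		userKeys = append(userKeys, k)
	}
	sort.Strings(userKeys) // deterministic output order across runs
	for _, uk := range userKeys {
		if err := flushOne(dir, uk, states[uk]); err != nil {
			return err // fail fast within this type
		}
	}
	return nil
}

func main() {
	states := map[string][]string{"b": {"2"}, "a": {"1"}}
	_ = flushWideColumnDir("lists", states, func(dir, uk string, st []string) error {
		fmt.Printf("%s/%s.json %v\n", dir, uk, st)
		return nil
	})
}
```

Type inference at the call site picks T from the states map, so flushHashes and flushLists can call it without explicit type arguments.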

Tests cover: LPUSH ordering preserved across out-of-order
HandleListItem submissions; empty list still emits a file (LLEN==0 is
observable); TTL inlined from scan index; length-mismatch warning
shape; binary item base64 envelope; meta-delta key silently skipped;
claim key silently skipped; malformed meta length rejected;
truncated item key rejected.

Phase 0a remaining after this PR: redis_set / redis_zset /
redis_stream encoders, the cmd/elastickv-snapshot-decode CLI, the
cmd/elastickv-snap-token helper, and the docs/operations runbook.

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c754529d-4671-4cae-aa45-47fe5ad1d46b

📥 Commits

Reviewing files that changed from the base of the PR and between abb8149 and 5cafc8a.

📒 Files selected for processing (4)
  • internal/backup/redis_hash.go
  • internal/backup/redis_hash_test.go
  • internal/backup/redis_list_test.go
  • internal/backup/redis_string.go

📝 Walkthrough

Walkthrough

Adds a Redis list backup encoder producing per-list JSON files, inlines list TTLs, introduces a generic wide-column directory flush helper shared by hashes and lists, refactors hash flushing to use it, and adds tests including a hash-meta overflow regression.

Changes

Redis list backup encoder with wide-column flushing

  • Wide-column directory flushing infrastructure (internal/backup/redis_string.go, internal/backup/redis_hash.go): adds redisKindList, RedisDB.lists, TTL routing for lists, the flushWideColumnDir generic helper, and refactors flushHashes to use the helper.
  • Redis list codec — handlers, parsing, and output (internal/backup/redis_list.go, internal/backup/redis_string.go): implements the list handlers (HandleListMeta, HandleListItem, claim/delta no-ops), per-list buffering, sortable-seq decoding, length-mismatch warning emission, marshalListJSON, and atomic per-list JSON writes with optional expire_at_ms.
  • Redis list test suite and hash regression (internal/backup/redis_list_test.go, internal/backup/redis_hash_test.go): adds tests validating LPUSH ordering, empty-list file emission, TTL inlining, length-mismatch warnings, the binary-item base64 envelope, delta/claim skipping, malformed meta/key errors, and a test ensuring an overflowing hash meta returns ErrRedisInvalidHashMeta.

Sequence Diagram(s)

sequenceDiagram
  participant Snapshot
  participant RedisDB
  participant Handler (ListMeta/ListItem)
  participant listState
  participant flushWideColumnDir
  participant writeListJSON
  Snapshot->>RedisDB: stream records (meta/item/TTL)
  RedisDB->>Handler: dispatch to HandleListMeta or HandleListItem
  Handler->>listState: create/update per-list buffer (declaredLen, items, TTL)
  RedisDB->>flushWideColumnDir: finalize lists -> iterate keys
  flushWideColumnDir->>writeListJSON: per-key callback to marshal and write JSON
  writeListJSON-->>Snapshot: atomic lists/<encoded>.json written

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • bootjp/elastickv#713: prior work on Redis wide-column encoder wiring that this change builds upon.

Poem

🐰 I hop through bytes and sort each seq,
lists align where items seek,
Wide-column paths now flush with care,
TTLs tucked safely in JSON there. 🎋

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'backup: Redis list encoder (Phase 0a)' directly and accurately reflects the main change: implementing a Redis list encoder as part of Phase 0a of the backup feature, which is clearly the primary focus of all changes.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 85.71%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Owner Author

bootjp commented May 15, 2026

@claude review


claude Bot commented May 15, 2026

Claude finished @bootjp's task in 5m 45s.


Code Review

  • Read changed files
  • Review redis_string.go (flushWideColumnDir helper)
  • Review redis_hash.go (refactored flushing)
  • Review redis_list.go (new list encoder)
  • Review redis_list_test.go (test coverage)
  • Check alignment with design doc
  • Post review findings

Overall this is a well-structured PR. The list encoder faithfully mirrors the hash encoder's patterns, the key format aligns correctly with the live store, and the JSON shape matches the design doc ({"format_version": 1, "items": [...], "expire_at_ms": null}). Below are my findings in priority order.

Findings

1. Inconsistent meta-value overflow handling between hash and list encoders (Minor)

redis_hash.go:75 silently wraps a uint64 field-count into int64 with no overflow check:

st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design

redis_list.go:112-115 explicitly guards against the same overflow:

if rawLen > math.MaxInt64 {
    return cockroachdberr.Wrapf(ErrRedisInvalidListMeta, "declared len %d overflows int64", rawLen)
}

The list encoder is more defensive (good), but the inconsistency is worth backporting. With the hash encoder, a live store that somehow stores a field count with the high bit set would produce a negative declaredLen, causing every redis_hash_length_mismatch warning to fire incorrectly.


2. flushWideColumnDir aborts the entire type on first write error (Observation)

redis_string.go:341-345:

for _, uk := range userKeys {
    if err := flushOne(dir, uk, states[uk]); err != nil {
        return err  // ← subsequent user keys are not written
    }
}

If writing list "bar" fails, lists "baz" through "zoo" are never written. Finalize continues with other types (flushHashes, etc.), but within a single type, subsequent keys are skipped silently. This is not a regression — the pre-refactoring inline loop had identical semantics — but it's worth a comment noting the intentional "fail-fast per type" policy so future readers don't "fix" it to continue (which could produce a partially-written dump that's harder to detect as corrupt).

3. Missing test: items-without-meta edge case (Minor)

The test suite covers meta-without-items (TestRedisDB_ListEmptyListStillEmitsFile) but not items-without-meta — where !lst|itm| records arrive without a !lst|meta| record. In that case listState creates an entry with metaSeen=false, declaredLen=0, flushLists writes a JSON file, and no mismatch warning fires (because st.metaSeen is false). This is intentional and consistent with the hash encoder, but given the 9-test suite's explicit coverage of parse-error paths, a one-liner round-trip test for this case would close the gap.


4. flushWideColumnDir lives in redis_string.go (Organizational)

The helper is shared by flushHashes and flushLists (and will be reused by the set/zset/stream encoders), but it's defined in the string encoder file. A file named redis_widecolumn.go (or redis_common.go) would make the dependency clearer to future contributors. That said, the PR description calls this placement out explicitly and it's a purely organizational concern — not blocking.

Correctness Checks Passed

  • Key format alignment
    !lst|meta|<userKey> (no length prefix, matches store.ListMetaKey) and !lst|itm|<userKey><seq(8)> (matches store.ListItemKey). The decoder's parseListItemKey correctly inverts the encoder by taking the last 8 bytes as seq — the mapping (userKey, seq) → item_key is a bijection, so no ambiguity.

  • Meta value layout
    value[16:24] reads Len from the 24-byte [Head(8)][Tail(8)][Len(8)] blob, matching store.MarshalListMeta.

  • Sortable-int64 decoding
    int64(rawSeq) ^ math.MinInt64 correctly undoes the live store's sign-flip encoding; ascending sort on int64 seq yields LPUSH-leftmost-first order.

  • Delta and claim records
    ListMetaDeltaPrefix = "!lst|meta|d|" is correctly checked before ListMetaPrefix = "!lst|meta|" (since delta is a superset prefix, the order matters). Claim keys are discarded via the no-op HandleListClaim.

  • TTL inlining
    HandleTTL routes redisKindList to listState(userKey).expireAtMs. The test calls HandleTTL([]byte("k"), ...) after HandleListMeta(listMetaKey("k"), ...), and listState sets kindByKey["k"] = redisKindList when meta is processed — so the routing works correctly.

  • JSON shape vs design doc
    {"format_version": 1, "items": [...], "expire_at_ms": null} matches line 330 of the design doc.

  • Generic flushWideColumnDir
    Type inference correctly resolves T as *redisHashState / *redisListState for the two call sites.
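The prefix-ordering point in the delta-and-claim bullet can be made concrete with a small dispatch sketch. The classify function is illustrative, though the prefix strings match the wire format quoted earlier: because every delta key also matches the plain meta prefix, testing the longer prefix first is what keeps deltas out of the meta handler.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	listMetaDeltaPrefix = "!lst|meta|d|" // superset of listMetaPrefix
	listMetaPrefix      = "!lst|meta|"
)

// classify dispatches a raw key by prefix. The delta check MUST come
// before the meta check: swapping them would route every delta key
// into the meta handler, since "!lst|meta|d|..." also starts with
// "!lst|meta|".
func classify(key string) string {
	switch {
	case strings.HasPrefix(key, listMetaDeltaPrefix):
		return "delta (skipped)"
	case strings.HasPrefix(key, listMetaPrefix):
		return "meta"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify("!lst|meta|d|mylist"))
	fmt.Println(classify("!lst|meta|mylist"))
}
```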


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a Redis list encoder to the backup system. It includes the implementation of HandleListMeta, HandleListItem, and flushLists to process and store Redis list data as JSON. Additionally, it refactors the existing hash flushing logic into a generic flushWideColumnDir helper to be shared across wide-column types. Comprehensive tests are added to verify the list encoding logic, including TTL handling and error cases. I have no feedback to provide as there were no review comments to assess.

…rage

Claude bot round-1 review surfaced two follow-ups worth backporting
from the list encoder to the hash encoder for cross-encoder symmetry,
plus one coverage gap and one design-intent comment.

1. HandleHashMeta now rejects uint64 declared-field counts with the
   high bit set instead of silently wrapping them to a negative int64
   declaredLen. Without the guard, a corrupted store would fire
   spurious `redis_hash_length_mismatch` warnings on every flush
   regardless of how many fields actually arrived. Mirrors the list
   encoder's symmetric guard so both wide-column encoders fail closed
   on the same shape of corruption.

   Caller audit: HandleHashMeta has zero production callers in the
   repo (the decoder driver pipeline is not yet in tree); all test
   callers feed encodeHashMetaValue(N) with non-negative small ints,
   so none trip the new check. The new error case wraps the same
   ErrRedisInvalidHashMeta sentinel the existing length-check returns,
   so caller-side error-classification stays unchanged — purely a
   fail-closed expansion of an existing error class.

2. New regression test TestRedisDB_HashRejectsOverflowingMetaValue
   pins the new guard with a 1<<63 input.

3. New regression test TestRedisDB_ListItemsWithoutMetaStillEmitsFile
   closes the round-1 coverage gap: !lst|itm| records may arrive
   without a paired !lst|meta| in dump order, and the encoder must
   still emit the per-list JSON without firing the length-mismatch
   warning (metaSeen=false means no "declared" baseline to compare
   against).

4. Documented flushWideColumnDir's fail-fast-per-type error policy.
   The earlier "continue past per-key flush errors" alternative trades
   a noisy hard failure for a quiet soft failure that survives
   `find -name '*.json'` scrutiny — calling that out so future
   contributors don't "fix" the early return to a continue.

go test -race ./internal/backup, golangci-lint, and go vet all clean.
Owner Author

bootjp commented May 15, 2026

@claude review

Round 2 — addressed two of the claude bot's four round-1 findings (#1, #3):

#1 Hash encoder overflow guard backport: HandleHashMeta was silently wrapping a uint64 declared field count with the high bit set into a negative int64; fixed with a guard of the same shape as the list encoder's. A corrupted store no longer produces false-positive mismatch warnings.

Caller audit: HandleHashMeta has zero production callers (the decoder driver is not yet in tree), and every test caller passes a small positive integer. The new error wraps the existing ErrRedisInvalidHashMeta class, so caller-side error classification is unchanged; this is a fail-closed expansion only.

#3 items-without-meta regression test: added TestRedisDB_ListItemsWithoutMetaStillEmitsFile, pinning that when !lst|itm| records arrive without a !lst|meta| record, the JSON file is still written and no length-mismatch warning fires.

Skipped:

  • Finding #4 (moving flushWideColumnDir to its own file): a purely organizational preference; the PR description already states the placement rationale.

go test -race ./internal/backup, golangci-lint, and go vet are all clean (1.4s).


claude Bot commented May 15, 2026

Claude finished @bootjp's task in 1m 34s.


Code Review (Round 2)


Round 2 is clean. All four cited findings are addressed correctly and the implementations are sound.

Finding #1 — Hash overflow guard (redis_hash.go:83-89)

The backport mirrors the list encoder exactly:

rawLen := binary.BigEndian.Uint64(value)
if rawLen > math.MaxInt64 {
    return cockroachdberr.Wrapf(ErrRedisInvalidHashMeta,
        "declared len %d overflows int64", rawLen)
}
st.declaredLen = int64(rawLen) //nolint:gosec // bounds-checked above

✅ Correct. The //nolint:gosec // bounds-checked above annotation is in the right place (on the now-safe cast, not the check itself). The new test (TestRedisDB_HashRejectsOverflowingMetaValue) uses 1<<63 = 0x8000_0000_0000_0000 > math.MaxInt64 — the minimal distinguishing input.

Minor style note: The inline comment block (redis_hash.go:76-82) includes "Round-2 review on PR #755". The CLAUDE.md convention explicitly asks not to reference tasks, fixes, or issues in code comments because they rot. The PR description is the right home for that attribution. Worth a cleanup pass before merge.

Finding #3 — Items-without-meta test (redis_list_test.go:357-378)

func TestRedisDB_ListItemsWithoutMetaStillEmitsFile(t *testing.T) {
    ...
    if err := db.HandleListItem(listItemKey("q", 0), []byte("v")); err != nil { ... }
    if err := db.Finalize(); err != nil { ... }
    // asserts JSON file written AND no redis_list_length_mismatch
}

✅ Covers both required invariants: the file is produced, and no mismatch warning fires (metaSeen=false means there is no declared baseline). listItemKey("q", 0) encodes seq=0 as the sortable-int64 value 0 ^ math.MinInt64 — the smallest valid sequence, correctly triggering the same parse path as a real store snapshot.

Finding #2 — flushWideColumnDir fail-fast comment (redis_string.go:329-337)

✅ The comment is accurate and well-targeted. The "fail-fast per type" framing precisely describes the semantics: Finalize continues across types but aborts within a type on first error. The "survives find -name '*.json' scrutiny" rationale is the right thing to call out for future maintainers.

Finding #4 — File placement (flushWideColumnDir in redis_string.go)

Skipped as agreed. No concerns.


Summary: All four findings addressed. The only thing worth a cleanup before merge is the "Round-2 review on PR #755" attribution in the code comment (redis_hash.go:82) — move it to the commit message / PR description to keep the source free of time-decaying references.

@bootjp bootjp merged commit b59e587 into main May 15, 2026
9 checks passed
@bootjp bootjp deleted the feat/backup-phase0a-redis-list branch May 15, 2026 20:57
