feat(encryption): Stage 6A — §6.3 Applier scaffolding + FSM WithEncryption wiring#765
Conversation
…ter-registry insert First slice of Stage 6A per the PR #762 plan. Lands the concrete kv.EncryptionApplier with the §4.1 four-case dispatch for ApplyRegistration; ApplyBootstrap and ApplyRotation return a typed ErrKEKNotConfigured marker deferred to Stage 6B. ## What this commit ships - internal/encryption/applier.go (new) - WriterRegistryStore interface — abstract Pebble Get/Set over the !encryption|writers|<dek_id>|<uint16(node_id)> namespace. Kept separate from store.MVCCStore because writer-registry rows live outside the MVCC namespace (no commit timestamps, no visibility, no retention). - Applier struct + NewApplier constructor; nil-registry refusal at construction time so wiring mistakes surface at startup rather than deep in Raft apply. - ApplyRegistration: §4.1 case 1 (insert) / case 2 (monotonic advance) / case 3 (rollback halt) / case 4 (uint16 collision halt) dispatch. The halt paths are wrapped with ErrEncryptionApply so the kv/fsm seam routes them through the HaltApply pipeline. - ApplyBootstrap and ApplyRotation: return ErrKEKNotConfigured. Stage 6B will swap these for the real KEK-unwrap + sidecar + keystore path. - internal/encryption/errors.go - Add ErrKEKNotConfigured sentinel with godoc that references the PR #762 Stage 6 plan's safety boundary decomposition (RPC gate primary, apply-time halt secondary defense-in-depth). - internal/encryption/applier_test.go (new) - Map-backed WriterRegistryStore test fake. - TestNewApplier_RejectsNilRegistry — construction-time guard. - TestApplyRegistration_Case1_Insert — fresh row. - TestApplyRegistration_Case2_MonotonicAdvance — same full_node_id, strictly greater local_epoch, FirstSeen preserved. - TestApplyRegistration_Case3_Rollback — equal AND lower local_epoch under same full_node_id both halt with ErrEncryptionApply. - TestApplyRegistration_Case4_Uint16Collision — two distinct full_node_ids whose lower 16 bits collide halt with ErrEncryptionApply. - TestApplyRegistration_StoreErrorPropagates — synthetic Get/Set errors propagate (not swallowed); the kv/fsm dispatch will mark them with ErrEncryptionApply regardless of upstream marking. - TestApplyBootstrap_ReturnsKEKNotConfigured / TestApplyRotation_ReturnsKEKNotConfigured — stub contract for the 6B-deferred paths. ## What this commit does NOT ship (deferred to a follow-up commit in this PR) - kv.NewKvFSMWithHLC(..., kv.WithEncryption(applier)) wiring in main.go's FSM construction site (main.go:694). - Concrete WriterRegistryStore implementation backed by the FSM's Pebble store. This needs either a new method on store.MVCCStore or a separate handle from PebbleStore; the design choice is the bulk of the second commit. The Applier package compiles and tests pass on its own; the wiring commit adds it to the FSM construction path. ## Stage 5D safety boundary preserved registerEncryptionAdminServer is NOT touched in this PR. Mutating EncryptionAdmin RPCs continue to refuse with FailedPrecondition exactly as Stage 5D left them. The PR plan's safety invariant — do not re-enable mutator wiring until both the applier AND a KEK unwrapper are present — holds: this PR lands the applier but not the KEK plumbing (6B); 6B re-enables the mutator wiring gated on (--encryption-enabled AND KEKConfigured()). ## Five-lens self-review 1. Data loss — applier's writes go through WriterRegistryStore.SetRegistryRow, which the production wiring (next commit) will route through pebble.Sync. No silent skip path: every error return is propagated; the kv/fsm dispatch marks them with ErrEncryptionApply for HaltApply. Map-backed test fake makes Set durable trivially. 2. Concurrency / distributed failures — Applier carries no in-memory state between calls; the WriterRegistryStore is the single source of truth. FSM apply already serialises 0x03 entries via applyMu in kv/fsm.go, so the §4.1 case-dispatch read-modify-write is safe under that lock. 3. Performance — ApplyRegistration does one Get + at most one Set per call. The map-backed test fake exercises the hot path; production Pebble will add the sync cost but that is bounded. 4. Data consistency — §4.1 case 1/2/3/4 dispatch matches the design doc. Case 3 (rollback) halts apply — a 6C startup guard will prevent the condition from being reachable in production; until then the apply-time halt is the load-bearing backstop. 5. Test coverage — 7 unit tests covering all four §4.1 cases, construction refusal, store-error propagation, and the 6B-deferred stub contracts. ## Verification - go test -race -timeout=60s ./internal/encryption/... — PASS - golangci-lint run ./internal/encryption/... — 0 issues
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR adds the encryption applier runtime, which validates and persists FSM encryption events (DEK registrations) to durable storage. It introduces the ChangesEncryption Applier and Writer Registry
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces the Applier component and WriterRegistryStore interface to manage encryption state, specifically implementing writer registration logic with checks for monotonic epochs and NodeID collisions. It includes comprehensive unit tests and stub implementations for bootstrap and rotation. Feedback suggests renaming the localEpochMaskU64 constant to nodeIDMask to more accurately describe its role in masking the FullNodeID.
| // the gate, or in a forensic / corruption scenario) still halts | ||
| // rather than silently advancing setApplied. | ||
| func (a *Applier) ApplyRegistration(p fsmwire.RegistrationPayload) error { | ||
| key := RegistryKey(p.DEKID, uint16(p.FullNodeID&localEpochMaskU64)) //nolint:gosec // masked to 16 bits; G115 cannot trace the bitwise narrowing |
There was a problem hiding this comment.
The constant name localEpochMaskU64 is misleading here. This mask is being applied to p.FullNodeID to extract its lower 16 bits for the registry key, as specified in §4.1 of the design. The LocalEpoch field is already a uint16 and does not require masking. Please rename this constant to something more descriptive, such as nodeIDMask or nodeIDMaskU16. Additionally, verify that the NodeID from the decoded context matches the NodeID of the current change being processed to prevent incorrect metadata reuse.
| key := RegistryKey(p.DEKID, uint16(p.FullNodeID&localEpochMaskU64)) //nolint:gosec // masked to 16 bits; G115 cannot trace the bitwise narrowing | |
| key := RegistryKey(p.DEKID, uint16(p.FullNodeID&nodeIDMask)) //nolint:gosec // masked to 16 bits; G115 cannot trace the bitwise narrowing |
References
- When processing a configuration change, verify that the NodeID from the decoded context matches the NodeID of the current change being processed to prevent incorrect metadata reuse.
| return errors.Wrap(ErrKEKNotConfigured, "applier: rotation requires KEK unwrapper (Stage 6B)") | ||
| } | ||
|
|
||
| const localEpochMaskU64 uint64 = 0xFFFF |
There was a problem hiding this comment.
As noted in the comment for line 117, this constant should be renamed to reflect its purpose of masking the FullNodeID. Additionally, since it is a constant used for bitwise operations, it can be untyped for better flexibility.
| const localEpochMaskU64 uint64 = 0xFFFF | |
| const nodeIDMask = 0xFFFF |
… + FSM WithEncryption wiring Second slice of Stage 6A per the PR #762 plan. Wires the §6.3 Applier (landed in part 1) into the FSM construction path with a Pebble-backed WriterRegistryStore adapter so the 0x03 RegisterEncryptionWriter dispatch path is fully functional end-to-end. Mutator-RPC wiring in registerEncryptionAdminServer remains OFF per Stage 5D's posture — re-enabled in Stage 6B alongside KEK plumbing gated on (--encryption-enabled AND KEKConfigured()). ## What this commit ships ### store/encryption_glue.go (additions) - pebbleWriterRegistry — internal adapter satisfying encryption.WriterRegistryStore. Both Get and Set hold dbMu for read only (the apply path's applyMu in kv/fsm.go is the authoritative serialiser for the §4.1 case-dispatch read-modify-write). Set uses pebble.Sync so the row is durable before FSM apply returns — a crash between FSM dispatch returning success and Pebble flushing the WAL would otherwise leave the Raft log claiming the entry applied but the registry row missing on restart, violating §4.1 case 2's monotonic-epoch invariant. - WriterRegistryFor(MVCCStore) factory — type-asserts against *pebbleStore and returns the adapter, or ErrUnsupportedStoreForWriterRegistry if the store is not the Pebble-backed implementation. Construction-time refusal so the failure surface is "binary refuses to start" rather than "FSM apply panics inside Raft loop". - ErrUnsupportedStoreForWriterRegistry — new sentinel. ### store/writer_registry_test.go (new) Three table-driven unit tests: - TestWriterRegistryFor_RoundTrip — fresh Pebble store, missing row reads as (nil, false, nil), Set writes, Get returns byte-identical value. - TestWriterRegistryFor_Overwrite — Set on existing key replaces the value; no MVCC versioning, no append. - TestWriterRegistryFor_RejectsNonPebble — non-Pebble MVCCStore stub returns ErrUnsupportedStoreForWriterRegistry at construction. ### main.go (FSM construction site, line 694) Replaces `sm := kv.NewKvFSMWithHLC(st, clock)` with the applier-wired form: reg, err := store.WriterRegistryFor(st) if err != nil { _ = st.Close() return nil, nil, errors.Wrapf(err, ...) } applier, err := encryption.NewApplier(reg) if err != nil { _ = st.Close() return nil, nil, errors.Wrapf(err, ...) } sm := kv.NewKvFSMWithHLC(st, clock, kv.WithEncryption(applier)) Both error paths Close the store before returning so a construction-time failure doesn't leak the Pebble handle. Adds `internal/encryption` to the import list (the only new import in main.go). ## What this commit does NOT do - registerEncryptionAdminServer is unchanged. Mutating EncryptionAdmin RPCs continue to refuse with FailedPrecondition exactly as Stage 5D left them. The existing TestEncryptionAdmin_MutatingRPCRefusedUntilStage6 test (both `sidecar_empty` and `sidecar_set` subcases) passes unchanged. - The applier's ApplyBootstrap / ApplyRotation paths still return ErrKEKNotConfigured. Stage 6B re-enables mutator wiring AND threads in the KEK unwrapper at the same time so a re-enabled mutator never hits the ErrKEKNotConfigured halt path. ## Stage 5D safety boundary The two PR760-merged regression tests pass on this branch: - TestEncryptionAdminFullNodeID_DistinctPerRaftId — PASS - TestEncryptionAdmin_MutatingRPCRefusedUntilStage6/sidecar_empty — PASS - TestEncryptionAdmin_MutatingRPCRefusedUntilStage6/sidecar_set — PASS - TestRegisterEncryptionAdminServer_Registers — PASS The new FSM wiring path does not weaken any of these. The applier exists internally; the gRPC layer continues to refuse the operator-facing mutators that would route into it. ## Five-lens self-review 1. Data loss — pebbleWriterRegistry uses pebble.Sync; FSM apply only returns success after the row is durable. Construction-time error paths Close the Pebble store on failure (no handle leak). 2. Concurrency / distributed failures — dbMu.RLock is sufficient because the apply path's applyMu in kv/fsm.go serialises all 0x03 dispatches; the read- modify-write pattern in Applier.ApplyRegistration is safe under that lock. 3. Performance — one Get + at most one Set per ApplyRegistration. pebble.Sync adds wal-flush cost, bounded. 4. Data consistency — writer-registry rows live outside the MVCC namespace (no commit timestamps, no MVCC visibility). Pebble Sync semantics match the §4.1 contract that a registered writer's first nonce-issuing write under that DEK is durably preceded by its registry row. 5. Test coverage — 3 new store-package unit tests for the adapter; the part-1 applier tests already cover the applier-side dispatch; Stage 5D regression tests confirm no operator-surface change. ## Verification - go test -race -timeout=60s -run 'TestWriterRegistryFor' ./store/... — PASS - go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS - go test -race -timeout=60s -run 'TestEncryptionAdmin|TestRegisterEncryptionAdminServer' . — PASS - go test -race -timeout=180s -run 'TestEncrypt|TestApplier|TestFSM|TestKvFSM|TestWriterRegistry' ./... — PASS - go build ./... — PASS - golangci-lint run ./... — 0 issues on touched files The full adapter test suite has a pre-existing 120s-timeout flake on race-mode (SQS reaper / HLC lease renewal long- running goroutines visible in the dump). The encryption- specific adapter tests pass cleanly when run in isolation; the flake is unrelated to this PR's changes (no adapter code is touched). Will run the suite again before marking the PR ready for review.
|
@claude review Stage 6A complete (2 commits) — applier scaffolding + Pebble-backed FSM wiring. Marked ready for review. What this PR ships
Stage 5D safety boundary preserved
The applier exists internally and dispatches end-to-end on direct Caller audit (semantic changes)
Verification
A 120s-timeout flake exists in the full adapter test suite under PlanThis is the first of six PRs (PR #762's 6A → 6F). 6B will follow with KEK plumbing + |
|
Claude finished @bootjp's task in 4m 29s —— View job Code Review: Stage 6A — §6.3 Applier scaffolding + FSM wiringTodo list
Five-lens review1. Data lossClean. 2. Concurrency / distributed failuresClean. The One structural note: 3. PerformanceOne Get + at most one Set per 4. Data consistencyThe §4.1 four-case dispatch is correctly implemented:
5. Test coverageComprehensive. All four §4.1 cases are covered. Issues
|
…lEpochMaskU64 → nodeIDMask gemini-code-assist round-1 review on commit e2475b8 flagged the constant name 'localEpochMaskU64' as misleading at the applier's writer-registry-key construction site. The mask is applied to p.FullNodeID to extract its lower 16 bits per §4.1's registry-key shape — not to a local_epoch field, which is already uint16 and needs no masking. The mask happens to share the value 0xFFFF with the unrelated local-epoch field width, so the type checker is happy, but a reader scanning the code for 'why are we masking local_epoch' would be confused. The original name was copied verbatim from adapter/encryption_admin.go (where it's used for local_epoch); the applier inherited the misnomer. ## Fix - Rename localEpochMaskU64 → nodeIDMask - Drop explicit uint64 typing so the cast at the call site is explicit (no implicit uint64 promotion) — also addresses gemini's secondary suggestion on the constant declaration. - Add a godoc explaining the rename motivation and the shared-value-different-purpose distinction. The numeric value (0xFFFF) and the mask behaviour are identical; this is a pure renaming. No callers changed (only one call site exists in applier.go). ## Caller audit Single rename inside internal/encryption/applier.go. No exports change. The other localEpochMaskU64 in adapter/encryption_admin.go remains correctly named for that file's local_epoch usage and is untouched. ## Verification - go build ./... — PASS - go test -race -timeout=60s -run 'TestApply|TestNewApplier|TestWriterRegistry' ./internal/encryption/... ./store/... — PASS - golangci-lint run ./internal/encryption/... — 0 issues
|
@claude review Round-2 (commit Also dropped explicit Pure rename, single call site, no exports changed. Verification
|
|
Claude finished @bootjp's task in 3m 15s —— View job Code Review: Stage 6A Round-2 —
|
| Severity | Location | Issue |
|---|---|---|
| Minor | applier.go:183-192 |
Trim review-history meta-reference from nodeIDMask godoc |
Everything else — §4.1 dispatch correctness, pebble.Sync durability, RLock justification, error propagation, construction-time guards, and test coverage — is in good shape. The Stage 5D safety boundary is preserved.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@main.go`:
- Around line 705-714: The error branches that return after failing to construct
the writer registry or encryption applier close only st and return, leaking
previously started group runtimes; update those branches (the error returns
after store.WriterRegistryFor(st) and after encryption.NewApplier(reg)) to run
the same cleanup used later: iterate over the previously appended runtimes slice
and stop/close each runtime (the same method used in the normal cleanup path),
then close st and return the wrapped error referencing g.id so no runtimes are
left running on partial startup.
In `@store/encryption_glue.go`:
- Around line 86-92: The function WriterRegistryFor currently allows a typed-nil
*pebbleStore to pass the type assertion and later panic; after the assertion for
ps in WriterRegistryFor, add an explicit nil check (if ps == nil) and return nil
with errors.WithStack(ErrUnsupportedStoreForWriterRegistry) to fail fast; ensure
the construction of &pebbleWriterRegistry{s: ps} only happens when ps is non-nil
so the registry never wraps a typed-nil pebbleStore.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 724bb42a-272e-464f-be38-62df0db2845d
📒 Files selected for processing (6)
internal/encryption/applier.gointernal/encryption/applier_test.gointernal/encryption/errors.gomain.gostore/encryption_glue.gostore/writer_registry_test.go
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8bbcd03f9b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if p.LocalEpoch <= prev.LastSeenLocalEpoch { | ||
| // case 3: epoch rollback / replay |
There was a problem hiding this comment.
Treat equal local_epoch as idempotent replay
Raft apply is designed to replay committed entries after restart until the latest FSM snapshot, so a RegisterEncryptionWriter entry can be applied again with the same (dek_id, full_node_id, local_epoch). With the current <= check, that legitimate replay is classified as rollback and returns ErrEncryptionApply, which halts apply and can leave a node stuck in a restart loop after any crash before snapshotting this entry. The rollback guard should only reject strictly smaller epochs; an equal epoch for the same writer should be a no-op.
Useful? React with 👍 / 👎.
…coderabbit Major×2 (typed-nil reject + runtimes cleanup) Three actionable findings on round-2 (commit cd76077), all addressed in this commit: ## codex P1 — Equal local_epoch must be idempotent, not halt Raft replays committed entries after restart until the latest FSM snapshot. A RegisterEncryptionWriter entry can legitimately be applied again with the same (dek_id, full_node_id, local_epoch). The round-2 case-3 check used <= which classified that legit replay as rollback and halted apply — pinning the node in a halt-on-replay loop after any crash before snapshotting the entry. Fix: split the case-3 branch into two paths. - equal epoch + same FullNodeID → no-op return (idempotent replay; row already reflects this epoch). - strictly-smaller epoch + same FullNodeID → halt with ErrEncryptionApply (true rollback). Updated test split: - TestApplyRegistration_Case3_Rollback now exercises ONLY the strictly-smaller path; the previous equal-epoch subtest was wrong and is removed. - TestApplyRegistration_Case2_IdempotentReplay (new) pins the no-op-on-replay invariant: apply the same entry twice, second apply returns nil, row is unchanged. Godoc + struct-doc updated to name the case-2-idempotent branch explicitly. ## coderabbit Major (line 92) — Typed-nil Pebble store A typed-nil (*pebbleStore)(nil) passes the type assertion `s.(*pebbleStore)` with ok=true but ps==nil; the adapter would then nil-deref on first call. Add `ps == nil` check after the assertion so the failure mode is "binary refuses to start" rather than "FSM apply panics deep in Raft loop". ## coderabbit Major (line 714) — Close prior runtimes on new error paths In `main.go::startRaftServers`, the two new error paths added in Part 2 (WriterRegistryFor failure and NewApplier failure) only closed `st` before returning. In multi-group startup with N > 1, this leaked previously-appended runtimes (each carries an etcd-raft Engine background goroutine, a Pebble handle, and a HLC lease renewer). The existing `buildRuntimeForGroup` error path at line 717+ already does the right thing (range + Close); my new branches matched the pattern. ## Caller audit (semantic changes) - Applier.ApplyRegistration: equal-epoch semantics flipped from "halt apply" to "no-op return nil". This is strictly fail-OPEN — the dispatch caller (kv/fsm_encryption.go:89) is the only production caller and it just propagates the return value, so a relaxation of error returns can only make the dispatch path MORE permissive, never less. Test caller (kv/fsm_encryption_test.go fakeApplier) is a mock and is unaffected. - store.WriterRegistryFor: stricter at construction time (typed-nil now rejected). The only production caller (main.go:705) passes a freshly-constructed store.NewPebbleStore return, which is never typed-nil, so observed behavior at the production call site is unchanged. - main.go error paths: internal control flow only. No callers — these are inside the function body. ## Verification - go build ./... — PASS - go test -race -timeout=60s -run 'TestApply|TestNewApplier|TestWriterRegistry|TestEncryptionAdmin|TestRegisterEncryptionAdminServer' ./internal/encryption/... ./store/... . — PASS - golangci-lint run ./internal/encryption/... ./store/... — 0 issues
|
@claude review Round-3 (commit codex P1 — Equal local_epoch must be idempotent (Raft replay), not haltRaft replays committed entries after restart until the latest FSM snapshot. The round-2 case-3 Fix: split into two paths:
Test surgery: coderabbit Major (line 92) — Typed-nil Pebble storeA typed-nil coderabbit Major (line 714) — Close prior runtimes on new error pathsThe two new Caller audit (semantic changes)
Verification
|
|
Claude finished @bootjp's task in 2m 12s —— View job Code Review: Stage 6A Round-3 — Codex P1 + 2 CodeRabbit MajorsTodo list
All three fixes are correct. Detailed verification below. Fix 1: Equal local_epoch idempotency (Codex P1)Verdict: Correct. The split is clean and the dispatch ordering is right: The equal-epoch path returns The Minor observation on test comments: Fix 2: Typed-nil Pebble store (
|
| File | Lines | Meta-reference |
|---|---|---|
applier.go |
107 | "— codex PR #765 round-2 P1" in ApplyRegistration godoc |
applier.go |
165–166 | "(PR #765 round-2 codex P1)" in case-2-idempotent inline comment |
applier.go |
217–220 | "gemini-code-assist PR #765 round-1 medium flagged..." in nodeIDMask godoc (carry-over from round-2) |
store/encryption_glue.go |
92–93 | "— coderabbit PR #765 round-2 Major" in WriterRegistryFor inline comment |
applier_test.go |
155 | "(PR #765 round-2 codex P1)" in TestApplyRegistration_Case2_IdempotentReplay |
In each case the semantic explanation is valuable (keep it); only the PR #N round-M reviewer-name attribution belongs in the PR description, not the source. Fix this →
Summary
| Fix | Verdict |
|---|---|
| Codex P1: equal-epoch → no-op idempotent | ✓ Correct |
| CodeRabbit Major: typed-nil Pebble rejection | ✓ Correct |
| CodeRabbit Major: prior-runtime cleanup on error paths | ✓ Correct |
| Review-meta comment cleanup (round-2 carry-over + new) | Non-blocking — trim before merge |
The implementation is correct across all three fixes. The only remaining item before merge is the meta-reference comment cleanup — the semantic content is fine, just the PR/reviewer attribution attribution should be stripped.
…tion from inline comments claude[bot] r3 verdict was 'all three fixes correct' but flagged 5 inline-comment sites that referenced 'PR #765 round-N reviewer-name' meta. CLAUDE.md: Don't reference the current task, fix, or callers ("used by X", "added for the Y flow", "handles the case from issue #123"), since those belong in the PR description and rot as the codebase evolves. This commit strips the attribution from all 5 sites while preserving the semantic content (why the code does what it does, separate from who flagged it). ## Sites cleaned up - internal/encryption/applier.go:107 — ApplyRegistration godoc case-2-idempotent paragraph: removed '— codex PR #765 round-2 P1'. - internal/encryption/applier.go:158-162 — case-2-idempotent inline comment: removed '(PR #765 round-2 codex P1)'. - internal/encryption/applier.go:213-217 — nodeIDMask godoc (round-2 carry-over): rewrote 'gemini-code-assist PR #765 round-1 medium flagged the original localEpochMaskU64 name as misleading at this call site' → 'the name distinguishes this FullNodeID-truncation use from the local_epoch masking in adapter/encryption_admin.go' (semantic content preserved, attribution dropped). - store/encryption_glue.go:92 — WriterRegistryFor typed-nil inline comment: removed '— coderabbit PR #765 round-2 Major'. - internal/encryption/applier_test.go:151 — TestApplyRegistration_Case2_IdempotentReplay godoc: removed '(PR #765 round-2 codex P1)'. ## Caller audit Pure comment edit — no callers to audit. Code unchanged. ## Verification - go build ./... — PASS - go test -race -timeout=60s ./internal/encryption/... ./store/... . — PASS - golangci-lint run ./internal/encryption/... ./store/... — 0 issues
|
@claude review Round-4 (commit Changes5 inline-comment sites stripped of
Caller auditPure comment edit — no callers to audit. Code unchanged. Verification
|
|
Claude finished @bootjp's task in 1m 15s —— View job Code Review: Stage 6A Round-4 — Comment attribution cleanupTodo list
Round-4 verdict: all 5 sites correctly cleanedThe commit is a pure comment edit (11 insertions, 13 deletions across 3 files); no code changed. Each attribution site was verified against the current branch:
In all cases the semantic content is preserved — the why of the code remains intact; only the reviewer-attribution phrases were dropped. Scan for stragglers: Overall Stage 6A assessmentAcross all four rounds, the PR is clean:
No outstanding issues. Ready to merge. |
…e scope limitation codex r2 P1 on round-2 (commit 0f946c1): the triple-gate encryptionMutatorsEnabled() is purely node-local; during a staged rolling upgrade where some peers still run the pre-6B-2 binary or have the flags unset, a mutator proposal committed by a fully-configured leader would reach those replicas' appliers and HaltApply on the under-configured side. ## Resolution This is a real concern but the cluster-wide readiness check belongs at the §7.1 Phase-1 cutover RPC layer (enable-storage-envelope) in Stage 6D, not in the node-local flag gate. PR #762's milestone breakdown explicitly assigns the Voters ∪ Learners capability fan-out gate to Stage 6D. Adding a partial cluster-wide check in Stage 6B-2 would duplicate the §7.1 capability gate and ship before the operator-facing GetCapability fan-out is wired into the admin CLI. The right resolution is to document the scope limitation explicitly so: 1. Operators reading the code understand the gate's scope and the rolling-upgrade discipline expected at this layer (same constraint as PR #765's 6A→6B caveat). 2. The Stage 6D PR has a clear handoff point — the cluster-wide gate is its scope, and this docstring points forward to that. ## Doc-only No code paths changed; the gate's three-flag readback is unchanged. Single 30-line docstring expansion explaining: - the gate is THIS NODE local - what cluster-wide readiness means at §7.1 - why Stage 6B-2 deliberately defers it to Stage 6D - the operator-discipline constraint matching PR #765 ## Caller audit Pure doc edit. No callers to audit. ## Verification - go build ./... — PASS - go test -race -timeout=60s -run 'TestEncryptionAdmin|TestRegisterEncryptionAdminServer' . — PASS - golangci-lint run ./... — 0 issues on touched files
…eference claude r3 verdict was 'Ready to merge' with one nice-to-have note: the scope-boundary docstring cited PR #765 by number but not the underlying design doc path. PR links can become stale; a path to the docs/design/ file makes the constraint findable without GitHub access. Added a reference to docs/design/2026_04_29_partial_data_at_rest_encryption.md (specifically the 6A rationale's 'Rolling 6A→6B upgrade caveat' section). The PR #765 reference is preserved as the historical trigger; the design doc path is the canonical source of the rolling-upgrade discipline going forward. Doc-only — no code paths touched.
Summary
First slice of Stage 6A per the PR #762 plan: the load-bearing safety boundary that lands the §6.3
EncryptionApplierso future PRs can re-enable mutating-RPC wiring without bricking the cluster.This PR is doc-driven by the merged Stage 6 plan; specifically the 6A row:
What this PR ships
Part 1 (this commit,
e2475b84) — applier packageinternal/encryption/applier.go(new) — concreteAppliersatisfyingkv.EncryptionApplier:WriterRegistryStoreinterface (abstract Get/Set over the!encryption|writers|<dek_id>|<uint16(node_id)>Pebble namespace; kept separate fromstore.MVCCStorebecause writer-registry rows live outside MVCC).ApplyRegistrationimplements the §4.1 four-case dispatch (insert / monotonic advance / rollback halt / uint16-collision halt).ApplyBootstrapandApplyRotationreturn the typedErrKEKNotConfigureddefense-in-depth marker — Stage 6B will replace them with the real KEK-unwrap + sidecar + keystore path.internal/encryption/errors.go— addErrKEKNotConfiguredsentinel with godoc referencing the PR docs(encryption): Stage 6 plan — decompose into 6A-6F + record 5D shipped #762 safety-boundary decomposition.internal/encryption/applier_test.go(new) — 7 unit tests covering all four §4.1 cases, construction-time nil-registry refusal, store-error propagation, and the 6B-deferred stub contracts. Uses a map-backedWriterRegistryStorefake.Part 2 (follow-up commit in this PR) — FSM wiring
WriterRegistryStoreimplementation backed by the FSM's Pebble store (store/encryption_glue.goor new file).kv.NewKvFSMWithHLC(... kv.WithEncryption(applier))wiring atmain.go:694.The Part 1 commit compiles and tests pass on its own; Part 2 plumbs it into production.
What this PR does NOT do
registerEncryptionAdminServeris untouched. Mutating EncryptionAdmin RPCs continue to refuse withFailedPreconditionexactly as Stage 5D left them. The Stage 6 plan's safety invariant — do not re-enable mutator wiring until both the applier AND a KEK unwrapper are present — holds: this PR lands the applier (and the FSM dispatch is no longer a halt path forApplyRegistration), but the KEK plumbing (6B) is still missing. 6B will re-enable the mutator wiring gated on (--encryption-enabledANDKEKConfigured()), both conditions required.Caller audit
kv.EncryptionApplierinterface — implemented by*encryption.Applier. No other callers; the FSM dispatch is the sole consumer.internal/encryption/errors.goexported sentinels —ErrKEKNotConfiguredis new; no callers yet outsideapplier.go(the kv/fsm dispatch will wrap any non-nil return withErrEncryptionApplyregardless of the inner marker).Five-lens self-review
WriterRegistryStore.SetRegistryRow, which the production wiring (Part 2) routes throughpebble.Sync. No silent skip path: every error return is propagated; thekv/fsmdispatch marks them withErrEncryptionApplyfor HaltApply.WriterRegistryStoreis the single source of truth. FSM apply already serialises0x03entries viaapplyMu, so the §4.1 case-dispatch read-modify-write is safe under that lock.ApplyRegistrationdoes one Get + at most one Set per call. Production Pebble will add the sync cost; bounded.Test plan
go test -race -timeout=60s ./internal/encryption/...— PASSgo build ./...— PASSgolangci-lint run ./internal/encryption/...— 0 issuesgo test -race ./...(full suite) — Will run before requesting reviewPlan: 6A → 6B → 6C → 6D → 6E → 6F
This is the first of six PRs per PR #762. 6B will follow with the KEK plumbing + mutator gate.
Summary by CodeRabbit
New Features
Tests
Chores