Skip to content

feat(encryption): Stage 6B part 1 — real ApplyBootstrap + ApplyRotation with KEK plumbing#768

Merged
bootjp merged 6 commits into
mainfrom
feat/encryption-stage6b
May 16, 2026
Merged

feat(encryption): Stage 6B part 1 — real ApplyBootstrap + ApplyRotation with KEK plumbing#768
bootjp merged 6 commits into
mainfrom
feat/encryption-stage6b

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 16, 2026

Summary

Stage 6B part 1 per the PR #762 plan. Lands the real ApplyBootstrap and ApplyRotation paths in internal/encryption.Applier — KEK-unwrap + Keystore install + crash-durable sidecar mutate + writer-registry batch insert. Replaces the Stage 6A ErrKEKNotConfigured stubs.

Mutator-RPC wiring in registerEncryptionAdminServer stays OFF (Stage 5D posture) so the operator surface is unchanged. The follow-up 6B-2 PR threads --kekFile / --encryption-enabled flags + startup KEK loader + keystore re-populate + re-enables mutator wiring gated on (--encryption-enabled AND KEKConfigured()).

API additions

  • KEKUnwrapper interface — local declaration so the encryption package does not pick up a transitive dependency on the kek package. kek.Wrapper satisfies it structurally.
  • ApplierOption functional-options: WithKEK, WithKeystore, WithSidecarPath, WithNowFunc. The Stage 6A caller NewApplier(reg) continues to work byte-for-byte.

Real ApplyBootstrap (§5.6 step 1a)

  1. KEK-unwrap storage + raft wrapped DEKs.
  2. Keystore.Set both (idempotent on matching bytes; ErrKeyConflict on mismatch → halts).
  3. WriteSidecar with Active.{Storage,Raft} set + keys[] map updated; crash-durable via §5.1 atomic write.
  4. Batch-insert every BatchRegistry row via ApplyRegistration (§4.1 case-2-idempotent makes replays no-op).

Returns ErrKEKNotConfigured if any of WithKEK / WithKeystore / WithSidecarPath is missing — the three are an indivisible trio.

Crash-recovery ordering: Keystore.Set (cheap, in-memory) → WriteSidecar (durable IO) → batch insert. Recovery is Raft replaying the same entry; every step is idempotent for matching inputs.

Real ApplyRotation (§5.2 RotateDEK)

Handles RotateSubRotateDEK only; other sub-tags (rewrap-deks, retire-dek, enable-storage-envelope, enable-raft-envelope) halt with ErrEncryptionApply so 6B-1 cannot silently mis-apply later sub-tags.

For RotateSubRotateDEK:

  1. KEK-unwrap the proposed wrapped DEK.
  2. Keystore.Set at the new DEK ID.
  3. WriteSidecar: Active slot updated for Purpose (Storage or Raft only — unknown Purpose halts); keys[] map updated; old DEKs preserved (retire is a separate Stage 6E sub-tag).
  4. ApplyRegistration on the proposer's row.

Tests

8 new applier tests covering happy paths + fail-closed boundaries:

Test What it pins
TestApplyBootstrap_ReturnsKEKNotConfigured No-options default returns ErrKEKNotConfigured
TestApplyBootstrap_FullyWired Happy path; factor'd per-aspect assertions
TestApplyBootstrap_PartialWiring (4 sub-cases) All-or-nothing trio invariant
TestApplyBootstrap_KEKUnwrapFailure Synthetic Unwrap error propagated; no partial install
TestApplyBootstrap_Replay_Idempotent Same payload twice succeeds
TestApplyRotation_ReturnsKEKNotConfigured No-options default
TestApplyRotation_FullyWired Happy path with pre-bootstrap
TestApplyRotation_UnknownSubTag Halts on 0xFE sub-tag
TestApplyRotation_UnknownPurpose Halts on unrecognised Purpose enum

Tests use a fakeKEK that pads/truncates wrapped input to encryption.KeySize so Keystore.Set's size check passes; same wrapped input always produces the same DEK bytes (deterministic), exercising idempotency.

Caller audit (semantic change — NewApplier signature)

NewApplier(registry)NewApplier(registry, opts...). Variadic; the no-options call (Stage 6A test caller) still compiles and behaves identically. Audited:

grep -rn 'NewApplier(' --include='*.go'
  • main.go:711 — production caller; unchanged in this PR (still passes no opts). 6B-2 will add WithKEK + WithKeystore + WithSidecarPath.
  • internal/encryption/applier_test.go — test callers; existing NewApplier(newMapRegistryStore()) calls still work; new tests pass opts.

No other callers.

Five-lens self-review

  1. Data lossWriteSidecar uses the existing §5.1 crash-durable write protocol. Keystore.Set is in-memory only but the FSM apply path is replayed from Raft on restart. No silent skip: every error propagates through the HaltApply seam.
  2. Concurrency / distributed failures — Applier carries no per-call mutable state. FSM apply already serialises 0x03/0x04/0x05 dispatches under applyMu. Keystore is internally locked. Sidecar IO is single-writer per §5.1.
  3. PerformanceApplyBootstrap: 2 KEK unwraps + 2 Keystore.Set + 1 WriteSidecar + N writer-registry inserts (N = bootstrap-time cluster size). ApplyRotation: 1+1+1+1. Bounded.
  4. Data consistency — sidecar mutations crash-durable; install order deterministic on replay; case-2-idempotent makes registry replay safe.
  5. Test coverage — 8 unit tests; both happy paths and fail-closed boundaries exercised. The fakeKEK is deterministic so replay tests are reproducible.

Test plan

  • go test -race -timeout=60s ./internal/encryption/... — PASS
  • go build ./... — PASS
  • golangci-lint run ./internal/encryption/... — 0 issues
  • Existing kv + store + main encryption tests — PASS
  • Re-run full suite (kv + store + adapter + main + Jepsen) once 6B-2 wires it in production — Will follow with 6B-2

Plan

This is Stage 6B part 1. Stage 6B-2 will:

  • Add --kekFile / --kekUri flags
  • Add --encryption-enabled flag (default off)
  • Load KEK at startup, populate Keystore from sidecar
  • Re-enable Proposer + LeaderView in registerEncryptionAdminServer gated on (--encryption-enabled AND KEKConfigured())

Summary by CodeRabbit

  • New Features

    • Encryption key management bootstrap and rotation operations are now fully operational, including proper key unwrapping, secure keystore installation, and configurable options for encryption setup.
  • Tests

    • Comprehensive test coverage added for bootstrap and rotation scenarios, including idempotency validation, failure handling, and configuration verification.

Review Change Stack

…on with KEK / Keystore / Sidecar plumbing

First slice of Stage 6B per the PR #762 plan. Threads
KEKUnwrapper + Keystore + sidecar path into the Applier
via functional options and replaces the Stage 6A
ErrKEKNotConfigured stubs with real §5.6 step 1a +
§5.2 RotateDEK apply paths.

The main.go wiring (flag introduction + RPC mutator gate +
startup KEK loader + startup keystore re-populate) lands in
the follow-up PR (6B-2); this commit ships only the applier
internals + a fakeKEK-based test surface so the apply path
is fully exercised end-to-end without touching main.go.

## API additions (internal/encryption/applier.go)

- KEKUnwrapper interface — local declaration so encryption
  does not pick up a transitive dependency on the kek
  package. kek.Wrapper satisfies it structurally.

- ApplierOption functional-option type plus four options:
    WithKEK(KEKUnwrapper)
    WithKeystore(*Keystore)
    WithSidecarPath(string)
    WithNowFunc(func() time.Time)
  WithNowFunc keeps tests deterministic; the production
  default is time.Now and feeds the sidecar's Created
  diagnostic field (not used for ordering — see §5.1).

- NewApplier signature extended to variadic opts. The
  Stage 6A test caller NewApplier(reg) continues to work
  byte-for-byte; the production caller in main.go (6B-2)
  will pass WithKEK + WithKeystore + WithSidecarPath.

## ApplyBootstrap (real, §5.6 step 1a)

Replaces the ErrKEKNotConfigured stub with:

  1. KEK-unwrap storage + raft wrapped DEKs.
  2. Keystore.Set both (idempotent on matching bytes;
     ErrKeyConflict on mismatch — caught by the dispatch).
  3. WriteSidecar with Active.Storage / Active.Raft set +
     keys[] map updated for both DEK IDs. Crash-durable
     via the existing §5.1 atomic write protocol.
  4. Batch-insert every BatchRegistry row via
     ApplyRegistration (the §4.1 case-2-idempotent path
     makes second-pass replays no-op).

Returns ErrKEKNotConfigured if any of WithKEK / WithKeystore /
WithSidecarPath are missing — the three are an indivisible
trio (a partial wiring would either compute DEK bytes that
get discarded, install bytes that vanish on restart, or
record wrapped DEKs the local node cannot decrypt).

Crash-recovery ordering: Keystore.Set fires first (in-memory,
cheap), then WriteSidecar (durable IO), then batch insert.
A crash anywhere is recovered by Raft replaying the same
entry — Keystore.Set is idempotent for matching bytes,
WriteSidecar is idempotent for the same payload, and
ApplyRegistration's case-2-idempotent path makes the batch
re-insert no-op.

## ApplyRotation (real, §5.2 RotateDEK)

Handles RotateSubRotateDEK only; other sub-tags
(rewrap-deks, retire-dek, enable-storage-envelope,
enable-raft-envelope) halt with ErrEncryptionApply so
Stage 6B-1 cannot silently mis-apply a later sub-tag.

For RotateSubRotateDEK:

  1. KEK-unwrap the proposed wrapped DEK.
  2. Keystore.Set at the new DEK ID.
  3. WriteSidecar: Active slot updated for Purpose
     (Storage or Raft only — unknown Purpose halts with
     ErrEncryptionApply); keys[] map updated for the new
     DEK ID; old DEKs preserved (retire is a separate
     sub-tag in Stage 6E).
  4. ApplyRegistration on the proposer's row so §4.1
     case-2 monotonicity covers their first encrypted
     write under the new DEK.

Same fail-closed wiring-check as ApplyBootstrap.

## Tests (internal/encryption/applier_test.go)

8 new tests covering both happy paths and fail-closed
boundaries:

- TestApplyBootstrap_ReturnsKEKNotConfigured (existing,
  reframed) — no-options default still returns the
  defense-in-depth marker.
- TestApplyBootstrap_FullyWired — happy path with
  factor'd per-aspect assertions (keystore / sidecar /
  registry rows).
- TestApplyBootstrap_PartialWiring — 4 sub-cases pinning
  the all-or-nothing trio.
- TestApplyBootstrap_KEKUnwrapFailure — synthetic Unwrap
  error propagated; no partial install.
- TestApplyBootstrap_Replay_Idempotent — same payload
  twice succeeds.

- TestApplyRotation_ReturnsKEKNotConfigured (existing,
  reframed) — no-options default.
- TestApplyRotation_FullyWired — happy path with a
  pre-bootstrap; verifies Active slot for the rotated
  purpose advances while the other stays put + old DEK
  retained in keys[].
- TestApplyRotation_UnknownSubTag — halts with
  ErrEncryptionApply on 0xFE sub-tag.
- TestApplyRotation_UnknownPurpose — same halt for
  unrecognised Purpose enum values.

Tests use a fakeKEK implementation that pads/truncates
wrapped input to encryption.KeySize so Keystore.Set's
size check passes; same wrapped input always produces
the same DEK bytes (deterministic), exercising the
Keystore.Set idempotency path.

## Stage 5D safety boundary preserved

registerEncryptionAdminServer is not touched in this PR.
Mutating EncryptionAdmin RPCs continue to refuse with
FailedPrecondition exactly as Stage 5D left them.

## Five-lens self-review

1. Data loss — WriteSidecar uses the existing §5.1
   crash-durable write protocol. Keystore.Set is in-memory
   only but the FSM apply path is replayed from the Raft
   log on restart so the in-memory state is reconstructed.
   No silent skip: every error propagates through the
   HaltApply seam.

2. Concurrency / distributed failures — Applier carries
   no per-call mutable state; the WriterRegistryStore +
   Keystore + sidecar IO are the single sources of truth.
   FSM apply already serialises 0x03/0x04/0x05 dispatches
   under applyMu. Keystore is internally locked. Sidecar
   IO is single-writer per the §5.1 protocol.

3. Performance — ApplyBootstrap does 2 KEK unwraps + 2
   Keystore.Set + 1 WriteSidecar + N writer-registry
   inserts where N is the cluster size at bootstrap.
   ApplyRotation does 1 KEK unwrap + 1 Set + 1
   WriteSidecar + 1 row insert. Bounded.

4. Data consistency — sidecar mutations are crash-durable
   via WriteSidecar. Keystore install order (storage
   before raft) is deterministic on every replay. Registry
   rows replay through the §4.1 case-2-idempotent path so
   double-apply produces the same final state.

5. Test coverage — 8 new unit tests covering happy paths
   and fail-closed boundaries (partial wiring, KEK
   failure, unknown sub-tag, unknown purpose, replay
   idempotency). The fakeKEK is deterministic so replay
   tests are reproducible.

## Verification

- go test -race -timeout=60s ./internal/encryption/... — PASS
- go build ./... — PASS
- golangci-lint run ./internal/encryption/... — 0 issues
- Existing kv + store + main encryption tests — PASS

## Plan

This is Stage 6B-1. The follow-up Stage 6B-2 PR will:
  - Add --kekFile / --kekUri flags
  - Add --encryption-enabled flag (default off)
  - Load KEK at startup, populate Keystore from sidecar
  - Re-enable Proposer + LeaderView in
    registerEncryptionAdminServer gated on
    (--encryption-enabled AND KEKConfigured())
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 51 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a14a88a7-1269-40a0-b87e-e7e97a320966

📥 Commits

Reviewing files that changed from the base of the PR and between 24a7e76 and c1ecd45.

📒 Files selected for processing (2)
  • internal/encryption/applier.go
  • internal/encryption/applier_test.go
📝 Walkthrough

Walkthrough

The PR implements Stage 6B encryption bootstrap and Stage 6B-1 rotation workflows. The applier gains KEK unwrapper, keystore, and sidecar path dependencies via functional options. ApplyBootstrap unwraps both DEKs, installs them into the keystore, persists sidecar state with active slots, and registers initial writers. ApplyRotation validates the sub-tag, unwraps and installs the proposed DEK, updates the sidecar for the rotated purpose, and registers the proposer. Both flows are comprehensively tested for happy paths, partial wiring failures, unwrap failures, idempotency, and dispatch errors on unknown sub-tags or purposes.

Changes

Stage 6B Encryption Bootstrap & Rotation Implementation

Layer / File(s) Summary
Applier Dependency Injection & Configuration
internal/encryption/applier.go
Applier struct adds registry, kek, keystore, sidecarPath, and now fields. New KEKUnwrapper interface abstracts unwrap behavior. ApplierOption functional type with WithKEK, WithKeystore, WithSidecarPath, and WithNowFunc enables configurable initialization. NewApplier accepts options, applies them to the applier, and initializes a default time.Now clock.
Bootstrap Flow & Sidecar Persistence
internal/encryption/applier.go, internal/encryption/applier_test.go
ApplyBootstrap unwraps both storage and raft DEKs via KEK, installs them into keystore, writes bootstrap sidecar with active slots and wrapped DEK entries keyed by DEKID (including created timestamp and LocalEpoch: 0), and batch-inserts initial registration rows. bootstrapAndRotationConfigured gates the flow; writeBootstrapSidecar persists sidecar state durably. Tests verify fully-wired happy path (keystore and sidecar state), partial wiring fails with ErrKEKNotConfigured, KEK unwrap failure prevents partial keystore installation, and replaying the same bootstrap payload is idempotent.
Rotation Flow & Purpose-Based Updates
internal/encryption/applier.go, internal/encryption/applier_test.go
ApplyRotation rejects non-RotateSubRotateDEK sub-tags by halting apply, validates wiring, unwraps the proposed DEK via KEK, sets it into keystore under DEKID, updates sidecar for the rotated purpose (storage/raft) with the new wrapped DEK entry and created timestamp, and inserts the proposer registration row. writeRotationSidecar updates only the target purpose's active slot while preserving other slots and retaining prior sidecar keys. sidecarPurposeFor maps Purpose enum to sidecar string key and returns wrapped ErrEncryptionApply on unknown purpose. Tests verify fully-wired rotation updates the target active slot unchanged on non-rotated purposes, and dispatch fails with ErrEncryptionApply on unknown sub-tag or Purpose.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • bootjp/elastickv#748: Introduces fsmwire types (Purpose, BootstrapPayload, RotationPayload) and opcodes that Stage 6B now consumes for bootstrap/rotation parameter passing and fail-closed dispatch.
  • bootjp/elastickv#765: Stage 6A scaffolding PR that adds the Applier stub, WriterRegistryStore, and initial test structure that this PR extends with full Stage 6B bootstrap/rotation implementation and comprehensive coverage.

Poem

🐰 Beneath the vault, a rabbit hops with care,
Unwraps the secrets hidden there,
Bootstrap seedlings, rotations dance,
Keys persist through every glance—
Stage 6B blooms, secure and fair! 🔐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically describes the main change: implementing real ApplyBootstrap and ApplyRotation methods with KEK plumbing in Stage 6B part 1, replacing earlier stubs.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/encryption-stage6b

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 16, 2026

@claude review

Stage 6B-1 — applier internals only (real ApplyBootstrap + ApplyRotation). Mutator-RPC wiring stays OFF in this PR per the Stage 6 plan; 6B-2 will thread the flags + RPC gate.

Looking for review on:

  1. Crash-recovery ordering — Keystore.Set (in-memory) → WriteSidecar (durable IO) → batch registry insert. Argued that every step is idempotent for matching inputs so replay is safe regardless of crash point. Sound?

  2. KEKUnwrapper interface placement — declared locally in internal/encryption rather than importing the kek package to avoid a future cycle when KMS providers (Stage 9) want to import encryption types. kek.Wrapper satisfies the interface structurally. Reasonable?

  3. Replay idempotency — TestApplyBootstrap_Replay_Idempotent exercises the same payload twice. Audited each step: Keystore.Set is idempotent for matching bytes, WriteSidecar is byte-overwrite, ApplyRegistration's case-2-idempotent makes batch replays no-op. Concerns?

  4. Fail-closed trio — WithKEK + WithKeystore + WithSidecarPath argued indivisible. Partial wiring returns ErrKEKNotConfigured (TestApplyBootstrap_PartialWiring covers 4 sub-cases). Is there a meaningful partial-wiring use case I'm missing?

  5. Unknown sub-tag / Purpose halt — RotateSubRotateDEK is the only sub-tag implemented; others halt with ErrEncryptionApply. Same for unknown Purpose enum. The intent is that 6E (cutover RPCs) will land enable-storage-envelope and enable-raft-envelope sub-tags; until then, halting prevents silent mis-apply. Right call?

@claude
Copy link
Copy Markdown

claude Bot commented May 16, 2026

Claude finished @bootjp's task in 4m 30s —— View job


Stage 6B-1 Applier Review

Tasks

  • Read trigger comment and understand review requests
  • Read internal/encryption/applier.go
  • Read internal/encryption/applier_test.go
  • Read related encryption files for context (keystore.go, sidecar.go, errors.go, fsmwire, kek package)
  • Answer the 5 specific questions
  • Post full review

Overall Assessment

The implementation is solid. The crash-recovery ordering, interface design, and fail-closed behaviors are all sound. I found one documentation bug and one gap in test coverage worth addressing.


Addressing Your 5 Questions

1. Crash-recovery ordering — Sound.

Keystore.Set (in-memory) → WriteSidecar (durable IO) → batch registry insert is correct. Walking each crash point:

  • Crash before Keystore.Set: Raft replays from committed entry, full sequence re-runs. ✅
  • Crash after Keystore.Set, before WriteSidecar: Keystore is lost on restart, but Raft replay re-runs the full sequence. Keystore.Set is idempotent for matching bytes (verified in checkExistingEntry). ✅
  • Crash after WriteSidecar, before batch inserts: WriteSidecar is O_TRUNC | O_EXCL on the tmp, then atomic rename — an overwrite with the same content is safe. Batch registry inserts use case-2-idempotent, making them no-op on replay. ✅

One subtle thing worth noting in the code comments: the stated reason "an in-memory DEK only escapes to disk after the keystore install succeeded" is slightly misleading — the sidecar stores the wrapped DEK, not cleartext. The real invariant is that a Keystore.Set failure (e.g., ErrKeyConflict) aborts before any disk mutation, which is what the ordering actually achieves.

2. KEKUnwrapper interface placement — Correct.

Local declaration at the consumer is idiomatic Go and the right call for exactly the reasons stated. I confirmed kek.Wrapper satisfies it structurally — kek.go:24 declares Unwrap(wrapped []byte) ([]byte, error) and FileWrapper implements it at file.go:184. The interface-on-consumer pattern is the standard approach for this exact import-cycle prevention scenario.

3. Replay idempotency — Sound with one note.

All three claims verify:

  • Keystore.Set idempotency: confirmed via checkExistingEntry in keystore.go:175 — matching DEK bytes returns (true, nil).
  • WriteSidecar as overwrite: confirmed — creates tmp with O_TRUNC | O_EXCL, renames atomically.
  • ApplyRegistration case-2-idempotent: confirmed at applier.go:246.

One edge case that isn't tested: if Keystore.Set(StorageDEKID) succeeds but Keystore.Set(RaftDEKID) then fails (step 7 in the sequence), StorageDEKID is in the keystore without RaftDEKID. This partial state is recovered correctly by Raft replay (the fakeKEK is deterministic, so the replay produces the same bytes → idempotent Set), but there is no test exercising second-Set failure. In practice ErrKeyConflict at step 4 would require a non-deterministic KEK which violates the design contract, so this is low risk.

4. Fail-closed trio — Sound, but test coverage has two gaps.

I cannot identify a meaningful partial-wiring use case — the three are genuinely indivisible for the reasons stated. However, TestApplyBootstrap_PartialWiring covers 4 of 6 possible partial combinations:

Combination Covered
kek only
keystore only
sidecar_path only
kek + keystore
kek + sidecar_path (no keystore)
keystore + sidecar_path (no kek)

The implementation correctly handles all 6 via bootstrapAndRotationConfigured()a.kek != nil && a.keystore != nil && a.sidecarPath != "". The two missing cases would still return ErrKEKNotConfigured but they're not pinned by a test.

5. Unknown sub-tag / Purpose halt — Right call.

Halting with ErrEncryptionApply is essential for FSM correctness. Silently advancing setApplied on an unrecognized sub-tag would create state divergence: a future leader's enable-storage-envelope entry would be skipped on stale followers, leaving them unaware that envelope mode is active. Halting forces operator attention and replay from a correct state. This is the correct pattern for any deterministic state machine apply where the semantics of the entry matter.


Issues Found

Bug: Stale comment fragment on ApplyBootstrapapplier.go:279

The doc comment has a leftover line from the Stage 6A stub:

// ApplyBootstrap returns ErrKEKNotConfigured as the Stage 6A    // ← stale fragment
// ApplyBootstrap implements §5.6 step 1a's initial bootstrap apply:

The first line is the beginning of the old Stage 6A comment that was never removed when the real implementation replaced it. This reads as two conflicting doc sentences: one claiming it "returns ErrKEKNotConfigured" and the next claiming it "implements §5.6 step 1a". Fix this →

Test coverage gap: 2 missing partial-wiring sub-cases — applier_test.go:407

TestApplyBootstrap_PartialWiring is missing two of the six possible partial combinations. Suggested additions:

{name: "kek_and_sidecar_path", opts: []encryption.ApplierOption{
    encryption.WithKEK(&fakeKEK{}),
    encryption.WithSidecarPath("/tmp/never-written"),
}},
{name: "keystore_and_sidecar_path", opts: []encryption.ApplierOption{
    encryption.WithKeystore(encryption.NewKeystore()),
    encryption.WithSidecarPath("/tmp/never-written"),
}},

Fix this →


Minor Observations (no action needed)

  • writeRotationSidecar doesn't handle missing sidecar with IsNotExist (applier.go:434): Unlike writeBootstrapSidecar, a missing sidecar during rotation returns a generic "read sidecar for rotation" error. This is semantically correct (rotation without bootstrap is invalid), but an IsNotExist check with "sidecar not found; bootstrap must precede rotation" would be a clearer operator message. Fits naturally in 6B-2 when the startup sequence is wired.

  • LocalEpoch: 0 in writeRotationSidecar (applier.go:455): Hardcoding 0 is correct — the epoch counter for a freshly installed DEK starts at 0 on this node. The running counter tracking is a separate concern (Stage 6E or later).

  • The fakeKEK.override map is defined but unused in the test file. Not a problem, just dead setup capacity.


Verdict

Two concrete items worth landing before merge: the stale comment fragment (trivial fix) and the two missing partial-wiring test cases (completes coverage of the all-or-nothing invariant you argued). All five design questions are answered affirmatively — the implementation is correct.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/encryption/applier.go`:
- Around line 309-337: In ApplyBootstrap, detect and reject bootstrap payloads
where p.StorageDEKID == p.RaftDEKID by returning a wrapped error (e.g.,
errors.Errorf or errors.Wrap) instead of proceeding; this prevents the second
key write from overwriting the first (check the equality early in
Applier.ApplyBootstrap before Unwrap/Set/writeBootstrapSidecar). Also add the
same guard in the code path that writes sidecar keys (where
writeBootstrapSidecar or the sidecar sc.Keys assignments occur) to ensure the
sidecar writer also refuses payloads with identical DEK IDs.
- Around line 130-132: Guard against nil injections by validating both the
provided clock and option functions: in WithNowFunc(now func() time.Time) add an
early check and panic (or return a clear error) immediately if now == nil (e.g.,
panic("WithNowFunc: nil now func")), and also harden the Applier options
application loop (where ApplierOption funcs are iterated, e.g., in NewApplier or
the method that applies options) to check for nil options and fail fast (panic
or return an error) if any opt == nil so a nil ApplierOption cannot later cause
a runtime panic when a.now() is invoked.
- Around line 403-424: ApplyRotation must verify that the proposer registration
targets the same DEK ID as the rotated DEK: extract the DEK ID from
p.ProposerRegistration and compare it to p.DEKID (after you've unwrapped the dek
and persisted it via a.keystore.Set and/or a.writeRotationSidecar), and if they
differ return a wrapped error (e.g. errors.Wrapf(ErrEncryptionApply, "applier:
proposer-registration DEKID %s does not match rotation DEKID %s", regDEKID,
p.DEKID)). Perform this check before calling
a.ApplyRegistration(p.ProposerRegistration) so mismatched registrations are
rejected and the “first writer row for new DEK” monotonicity is preserved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5371d0e-ac95-4130-9d8c-67ed5d34df7b

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc931d and 24a7e76.

📒 Files selected for processing (2)
  • internal/encryption/applier.go
  • internal/encryption/applier_test.go

Comment thread internal/encryption/applier.go
Comment thread internal/encryption/applier.go
Comment thread internal/encryption/applier.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24a7e7607c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +421 to +423
if err := a.ApplyRegistration(p.ProposerRegistration); err != nil {
return errors.Wrap(err, "applier: rotation proposer-registration insert")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate rotation registration matches rotated DEK

ApplyRotation installs and activates p.DEKID in keystore/sidecar, but then blindly applies p.ProposerRegistration without checking that its DEKID is the same key. A malformed or buggy rotation payload can therefore mutate writer-registry state for an unrelated DEK while leaving the new DEK unregistered in this atomic step, silently violating the intended rotate-dek invariant and potentially causing later false rollback/collision behavior on the wrong DEK.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the core logic for bootstrapping and rotating encryption keys within the Applier. It introduces a functional options pattern for configuring the Applier with a KEK unwrapper, keystore, and sidecar path, enabling the transition from the previous stub implementation to a fully functional state. The changes include ApplyBootstrap for initial key setup and ApplyRotation for updating Data Encryption Keys (DEKs), both designed with crash-durability and Raft-replay idempotency in mind. Feedback was provided regarding the synchronous nature of batch registry inserts in ApplyBootstrap, suggesting that this blocking behavior be explicitly documented and potentially optimized into an atomic batch operation to minimize Raft loop latency.

Comment on lines +330 to +335
for i, reg := range p.BatchRegistry {
if err := a.ApplyRegistration(reg); err != nil {
return errors.Wrapf(err, "applier: bootstrap batch registry insert at index %d (dek_id=%d, full_node_id=%#x)",
i, reg.DEKID, reg.FullNodeID)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ApplyBootstrap method performs synchronous writes for each entry in p.BatchRegistry. While synchronous I/O within the Raft loop is used in this repository to ensure consistency during critical operations (like configuration changes), this blocking startup behavior should be explicitly documented in the method description. Additionally, consider if these sequential calls can be optimized into a single atomic batch to minimize the duration the Raft loop is blocked for large clusters.

References
  1. For limited-scope implementations, explicitly document blocking startup behavior to prevent misuse.
  2. When a method is intended to block until a condition is met or an operation completes, explicitly state this blocking semantic in the documentation.
  3. Synchronous I/O in the raft loop is acceptable when necessary to ensure consistency during membership or configuration changes.

Four actionable findings on round-1 (commit 24a7e76), all
addressed in this commit:

## coderabbit Major (line 132) — Guard nil clock injection

WithNowFunc(nil) would silently overwrite the default time.Now
and nil-deref at apply time when a.now() is invoked. NewApplier
now rejects:

  - any nil ApplierOption (each opt iterated and checked)
  - a final a.now == nil (sentinel for WithNowFunc(nil) having
    overwritten the default)

Both checks return a typed error at construction so the
misconfiguration surfaces at startup rather than deep inside a
Raft apply loop.

## coderabbit Major (line 337) — Reject identical bootstrap DEK IDs

If StorageDEKID == RaftDEKID, the second sc.Keys[...] assignment
in writeBootstrapSidecar overwrites the first, silently
mis-labelling the surviving key's purpose. Halts apply with
ErrEncryptionApply at the top of ApplyBootstrap (before the
keystore install) so the sidecar is never corrupted.

## codex P2 + coderabbit Major (line 423/424) — Proposer DEK mismatch

ApplyRotation installs p.DEKID into the keystore + sidecar but
then blindly applied p.ProposerRegistration even when its DEKID
targeted a different key. The whole point of bundling the
proposer-registration atomically with rotate-dek (§5.2) is to
cover the proposer's first encrypted write under the NEW DEK
with a §4.1 case-1 first-seen row — a mismatch would mutate
writer-registry state for an unrelated DEK and leave the new
DEK unregistered.

Halts apply with ErrEncryptionApply at the top of ApplyRotation
(after the sub_tag check, before the keystore install) so an
inconsistent rotation payload cannot atomically commit.

## Tests

4 new tests pin the guards:

- TestNewApplier_RejectsNilOption
- TestNewApplier_RejectsNilNowFunc
- TestApplyBootstrap_RejectsIdenticalDEKIDs — additionally
  verifies the sidecar is NOT written after the rejection
  (no side effects).
- TestApplyRotation_RejectsProposerDEKMismatch

## Caller audit

All three apply-time guards are stricter input validation —
strictly fail-CLOSED at the input layer. The single production
caller (kv/fsm dispatch) propagates the returned error through
the HaltApply seam, so the halt surface is identical to any
other ErrEncryptionApply path. The 4 partial-wiring sub-cases
in TestApplyBootstrap_PartialWiring still pass because they all
use valid (non-nil) options.

The NewApplier signature did not change; existing callers that
pass valid (non-nil) options remain unaffected.

## Verification

- go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (16 tests)
- golangci-lint run ./internal/encryption/... — 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 16, 2026

@claude review

Round-2 (commit 15681ca1): 4 actionable findings on round-1 addressed.

# Finding Resolution
coderabbit Major (line 132) WithNowFunc(nil) overwrites default → apply-time panic NewApplier rejects nil opts + nil-resulting a.now at construction
coderabbit Major (line 337) Identical StorageDEKID==RaftDEKID corrupts sidecar (second sc.Keys[] overwrites first) Reject at top of ApplyBootstrap before keystore install
codex P2 + coderabbit Major (line 423/424) ApplyRotation accepts proposer-reg DEK mismatch — mutates wrong DEK's registry Reject at top of ApplyRotation after sub_tag check

All three apply-time guards are stricter input validation — fail-CLOSED at input layer. Single production caller (kv/fsm dispatch) propagates via HaltApply, so the surface is identical to other ErrEncryptionApply paths.

4 new tests:

  • TestNewApplier_RejectsNilOption
  • TestNewApplier_RejectsNilNowFunc
  • TestApplyBootstrap_RejectsIdenticalDEKIDs — verifies sidecar NOT written after rejection (no side effects)
  • TestApplyRotation_RejectsProposerDEKMismatch

Verification

  • go test -race -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (16 tests)
  • golangci-lint run ./internal/encryption/... — 0 issues

@claude
Copy link
Copy Markdown

claude Bot commented May 16, 2026

Claude finished @bootjp's task in 1m 31s —— View job


Stage 6B-1 Round-2 Review

Tasks

  • Read trigger comment and understand review scope
  • Read internal/encryption/applier.go (current HEAD)
  • Read internal/encryption/applier_test.go (current HEAD)
  • Verify each round-1 fix is correctly implemented
  • Check for new issues introduced in round-2 changes
  • Post full review

Round-2 Fix Verification

All four round-2 fixes are correctly implemented.

Fix 1 — WithNowFunc(nil) guard (applier.go:151-163)

The nil-option check fires per-index (opts[%d]) before opt(a) runs, so a nil function pointer can never overwrite the default clock. The post-loop a.now == nil check is the backstop for any future option that could zero the field through a different code path. Both test cases (TestNewApplier_RejectsNilOption, TestNewApplier_RejectsNilNowFunc) correctly exercise the two distinct paths.

Fix 2 — Identical DEK IDs guard (applier.go:328-331)

Guard fires before the first a.kek.Unwrap call, so no keystore mutation can occur. TestApplyBootstrap_RejectsIdenticalDEKIDs verifies both the error type and the no-side-effects invariant (sidecar file must not exist). ✅

Fix 3 — Proposer-registration DEK mismatch guard (applier.go:437-441)

Guard placement is correct: after the sub-tag check but before a.kek.Unwrap, so a mismatch can never leave the keystore in a partial state. TestApplyRotation_RejectsProposerDEKMismatch exercises the path end-to-end and verifies ErrEncryptionApply. ✅


Remaining Items from Round-1 (Not in the Round-2 Fix Table)

Stale comment fragment — applier.go:289 (my round-1 finding, still present)

// ApplyBootstrap returns ErrKEKNotConfigured as the Stage 6A   ← stale fragment
// ApplyBootstrap implements §5.6 step 1a's initial bootstrap apply:

The first line is the beginning of the Stage 6A stub comment and was never removed when the real implementation replaced it. The two sentences directly contradict each other. Fix this →

Two missing partial-wiring test cases (applier_test.go:407) (my round-1 finding, still 4 of 6 cases)

TestApplyBootstrap_PartialWiring still covers only 4 of the 6 possible 2-of-3 partial combinations:

Combination Covered
kek only
keystore only
sidecar_path only
kek + keystore
kek + sidecar_path (no keystore)
keystore + sidecar_path (no kek)

Suggested additions (no new test logic, same pattern):

{name: "kek_and_sidecar_path", opts: []encryption.ApplierOption{
    encryption.WithKEK(&fakeKEK{}),
    encryption.WithSidecarPath("/tmp/never-written"),
}},
{name: "keystore_and_sidecar_path", opts: []encryption.ApplierOption{
    encryption.WithKeystore(encryption.NewKeystore()),
    encryption.WithSidecarPath("/tmp/never-written"),
}},

Fix this →


No New Issues in Round-2 Code

The three new guards are all positioned before any state mutation, the new tests are correct, and the fakeKEK's override field being unpopulated in tests is harmless (nil map read in Go returns zero value, no panic).

Verdict

Two carry-forward items remain: the one-line stale comment and the two missing partial-wiring cases. The round-2 fixes themselves are clean and all four were addressed correctly. The implementation is sound for merge modulo those two.

claude[bot] r1 verdict (post-r2 push, ungated by timing) was
'implementation is solid' but flagged 2 carry-over items, and
gemini-code-assist posted a separate medium on the same
commit. All addressed here:

## claude[bot] r1 #1 — Stale comment fragment at ApplyBootstrap godoc

Line 289 had a leftover line from the Stage 6A stub:

  // ApplyBootstrap returns ErrKEKNotConfigured as the Stage 6A
  // ApplyBootstrap implements §5.6 step 1a's initial bootstrap apply:

The first line is the head of the old Stage 6A stub doc that
was never removed when the real implementation replaced it.
Reads as two conflicting sentences. Removed.

## claude[bot] r1 #1b — Crash-recovery wording precision

The 'an in-memory DEK only escapes to disk after the keystore
install succeeded' phrasing was slightly misleading: the
sidecar stores the WRAPPED DEK (KEK-output bytes), not
cleartext. The actual invariant the ordering achieves is:
ErrKeyConflict from Keystore.Set aborts the apply before any
disk mutation. Reworded to state that invariant directly.

## claude[bot] r1 #2 — Missing partial-wiring sub-cases (test gap)

TestApplyBootstrap_PartialWiring covered 4 of 6 possible
partial combinations. Added the missing two:

  - kek_and_sidecar_path (no keystore)
  - keystore_and_sidecar_path (no kek)

Both correctly hit bootstrapAndRotationConfigured() == false
and return ErrKEKNotConfigured. Test now exhaustively covers
the 6-combination partial-wiring matrix.

## gemini-code-assist medium — Blocking behaviour documentation

gemini flagged that ApplyBootstrap's synchronous per-row
inserts (one pebble.Sync per BatchRegistry entry) block the
Raft apply loop. The synchronous IO is part of the §6.3
contract — the engine's applyMu already serialises the
dispatch, so blocking IS the expected shape — but the godoc
did not call it out explicitly.

Added a 'Blocking behaviour' section to the ApplyBootstrap
godoc:
  - KEK.Unwrap may dial a KMS in production providers.
  - WriteSidecar fsyncs to disk.
  - Every BatchRegistry row is one pebble.Sync.
  - BootstrapBatchRowCap = 1<<14 bounds the worst case.
  - A future optimisation could replace per-row Set with
    pebble.Batch.Commit(pebble.Sync); current shape matches
    Stage 6A ApplyRegistration semantics row-for-row.

This documents the contract; the optimisation is deferred —
not a blocker for 6B-1.

## Caller audit

Doc-only edits (1 stale-line removal + 1 wording precision fix
+ blocking-behaviour godoc addition) + 2 new test sub-cases.
No code paths changed; no callers to audit.

## Verification

- go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (18 tests, +2 sub-cases)
- golangci-lint run ./internal/encryption/... — 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 16, 2026

@claude review

Round-3 (commit 99c8b760): your r1 verdict's 2 carry-overs + gemini medium addressed.

# Source Finding Fix
1 claude[bot] r1 Stale comment at ApplyBootstrap godoc (Stage 6A stub leftover) Removed
1b claude[bot] r1 Crash-recovery wording: 'in-memory DEK escapes to disk' was misleading (sidecar stores WRAPPED DEK, not cleartext) Reworded to state the actual invariant: ErrKeyConflict from Keystore.Set aborts before any disk mutation
2 claude[bot] r1 TestApplyBootstrap_PartialWiring missing 2 of 6 combinations Added kek_and_sidecar_path + keystore_and_sidecar_path
3 gemini medium ApplyBootstrap synchronous per-row inserts block apply loop Added 'Blocking behaviour' section to godoc documenting the contract; pebble.Sync per BatchRegistry entry, BootstrapBatchRowCap=1<<14 bounds worst case, optimisation (pebble.Batch.Commit) deferred

Verification

  • go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (18 tests)
  • golangci-lint run ./internal/encryption/... — 0 issues

@claude
Copy link
Copy Markdown

claude Bot commented May 16, 2026

Claude finished @bootjp's task in 2m 11s —— View job


Stage 6B-1 Round-3 Review

Tasks

  • Read trigger comment and understand review scope
  • Read internal/encryption/applier.go (current HEAD)
  • Read internal/encryption/applier_test.go (current HEAD)
  • Verify each round-3 fix is correctly implemented
  • Check for any new issues introduced
  • Post full review

Round-3 Fix Verification

All four round-3 items are correctly addressed.

Fix 1 — Stale comment removed (applier.go:289)

The godoc now starts cleanly with // ApplyBootstrap implements §5.6 step 1a's initial bootstrap apply:. The Stage 6A stub fragment is gone. ✅

Fix 1b — Crash-recovery wording (applier.go:305-312)

The reworded invariant is accurate: "Keystore.Set fires before WriteSidecar so an ErrKeyConflict from Set aborts the apply before any disk mutation." This correctly describes what the ordering achieves — the abort-before-disk-mutation guarantee — without the previous misleading reference to cleartext DEKs escaping to disk. ✅

Fix 2 — Two missing partial-wiring test cases (applier_test.go:420-427)

Both kek_and_sidecar_path and keystore_and_sidecar_path are present. Each uses the correct assertion pattern (err != nil + errors.Is(err, ErrKEKNotConfigured)). All 6 of 6 possible two-of-three partial combinations are now pinned. ✅

Fix 3 — Blocking behaviour documented (applier.go:319-333)

The "Blocking behaviour" section correctly characterises each I/O touch point (KMS dial, fsync, pebble.Sync per registry row), explains why synchronous IO is the right contract for the FSM apply path, names the BootstrapBatchRowCap = 1<<14 bound, and defers the pebble.Batch.Commit optimisation appropriately. ✅


One New Issue Introduced by the Round-2 Guard

TestApplyRotation_UnknownPurpose is no longer testing what it claims (applier_test.go:599-629)

The round-2 proposer-registration DEK mismatch guard (p.ProposerRegistration.DEKID != p.DEKID) now fires before sidecarPurposeFor in ApplyRotation. The test payload uses a zero-value ProposerRegistration (DEKID 0) with p.DEKID = 9, so 0 != 9 triggers the mismatch guard and the function returns ErrEncryptionApply from there — sidecarPurposeFor is never reached. The test still passes (it checks ErrEncryptionApply), but it is testing the wrong path. The sidecarPurposeFor code path is now unreachable by any test in the suite.

Fix: set ProposerRegistration.DEKID to match p.DEKID so the payload passes the mismatch guard and reaches the purpose dispatch:

err := app.ApplyRotation(fsmwire.RotationPayload{
    SubTag:  fsmwire.RotateSubRotateDEK,
    DEKID:   9,
    Purpose: fsmwire.Purpose(0xFE), // not Storage / Raft
    Wrapped: []byte("y"),
    ProposerRegistration: fsmwire.RegistrationPayload{DEKID: 9, FullNodeID: 0x01, LocalEpoch: 0}, // must match DEKID
})

Fix this →


Verdict

Three carry-overs from rounds 1 and 2 are closed. One new issue: TestApplyRotation_UnknownPurpose was broken by the round-2 proposer-DEK mismatch guard — sidecarPurposeFor is now untested. Single-line payload fix, then the suite is clean for merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c8b76037

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +364 to +366
for i, reg := range p.BatchRegistry {
if err := a.ApplyRegistration(reg); err != nil {
return errors.Wrapf(err, "applier: bootstrap batch registry insert at index %d (dek_id=%d, full_node_id=%#x)",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject bootstrap rows that target non-bootstrap DEKs

ApplyBootstrap writes every BatchRegistry row without checking that each row’s DEKID matches either StorageDEKID or RaftDEKID. A malformed 0x04 payload can therefore persist writer-registry state for unrelated DEKs while the sidecar/keystore only install the two bootstrap keys, violating the bootstrap invariant and potentially triggering false rollback/collision behavior on those stray DEKs later. The applier should fail closed here (similar to the rotation proposer-DEK consistency check) before calling ApplyRegistration.

Useful? React with 👍 / 👎.

…K) + claude[bot] r3 test correctness

Two findings on round-3 (commit 99c8b76), both addressed:

## codex P2 — Reject bootstrap rows that target non-bootstrap DEKs

ApplyBootstrap was writing every BatchRegistry row without
checking that each row's DEKID matches either StorageDEKID or
RaftDEKID. A malformed 0x04 payload could persist
writer-registry state for unrelated DEKs while the bootstrap
installs only the two declared keys — silently breaking the
§4.1 first-writer invariant on the next post-bootstrap write
under that foreign DEK.

This is the same class of input-validation gap as the round-2
proposer-DEK-mismatch guard on ApplyRotation.

Fix: validate every BatchRegistry[i].DEKID matches one of the
two bootstrap DEKs before any state mutation. Halt with
ErrEncryptionApply on mismatch. The check is placed in a new
validateBootstrap helper alongside the existing wiring-check
+ identical-DEK-ID guard so the dispatch hot path stays below
the cyclop complexity budget.

## claude[bot] r3 — TestApplyRotation_UnknownPurpose tests wrong path

The round-2 proposer-DEK-mismatch guard (added in
15681ca) fires before sidecarPurposeFor in ApplyRotation.
The test payload used a zero-value ProposerRegistration
(DEKID=0) with p.DEKID=9, so 0 != 9 triggered the mismatch
guard and the function returned ErrEncryptionApply from
THERE — sidecarPurposeFor was never reached. The test still
passed (it checked ErrEncryptionApply) but it was testing the
wrong code path.

Fix: set ProposerRegistration.DEKID: 9 in the test payload so
the mismatch guard passes and sidecarPurposeFor is reached.

## New test

TestApplyBootstrap_RejectsForeignBatchRegistryDEK pins the new
guard. The payload includes one valid row (DEKID matching
StorageDEKID) and one foreign row (DEKID 99). The applier
halts with ErrEncryptionApply BEFORE any state mutation, so
the test also asserts:
  - keystore.Has(1)/Has(2) are both false (no keystore mutation)
  - sidecar file does not exist (no disk write)

## Caller audit

Both fixes are stricter input validation — fail-CLOSED at the
input layer. Single production caller (kv/fsm dispatch) is
unaffected; the halt surface is identical to other
ErrEncryptionApply paths. The 4 partial-wiring sub-cases in
TestApplyBootstrap_PartialWiring still pass — they all use
empty BatchRegistry slices so the new loop is trivially
satisfied.

## Verification

- go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (19 tests)
- golangci-lint run ./internal/encryption/... — 0 issues (extracted validateBootstrap helper keeps ApplyBootstrap under the cyclop budget)
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 16, 2026

@claude review

Round-4 (commit 88da2b21): 2 findings on round-3 addressed.

# Source Finding Fix
1 codex P2 (new on r3 HEAD) ApplyBootstrap doesn't validate that each BatchRegistry row's DEKID matches StorageDEKID or RaftDEKID New guard in extracted validateBootstrap helper; halts with ErrEncryptionApply before any state mutation
2 claude[bot] r3 TestApplyRotation_UnknownPurpose exercised the wrong code path (round-2 proposer-DEK-mismatch guard fired before sidecarPurposeFor) Set ProposerRegistration.DEKID: 9 to match p.DEKID so the test reaches the intended path

New test TestApplyBootstrap_RejectsForeignBatchRegistryDEK exercises the new guard + verifies no side effects (no keystore mutation, no sidecar file written). The validation logic was extracted into a validateBootstrap helper to keep ApplyBootstrap below the cyclop complexity budget.

Caller audit

Both fixes are stricter input validation, fail-CLOSED at input layer, single production caller (kv/fsm dispatch) unaffected. 4 partial-wiring sub-cases still pass (empty BatchRegistry trivially satisfies the new loop).

Verification

  • go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (19 tests)
  • golangci-lint run ./internal/encryption/... — 0 issues

@claude
Copy link
Copy Markdown

claude Bot commented May 16, 2026

Claude finished @bootjp's task in 2m 15s —— View job


Stage 6B-1 Round-4 Review

Tasks

  • Read trigger comment and understand review scope
  • Read internal/encryption/applier.go (current HEAD)
  • Read internal/encryption/applier_test.go (current HEAD)
  • Verify round-3 fix 1: validateBootstrap guard for foreign BatchRegistry DEKIDs
  • Verify round-3 fix 2: TestApplyRotation_UnknownPurpose now reaches sidecarPurposeFor
  • Check for new issues introduced in round-4 changes
  • Post full review

Round-4 Fix Verification

Both round-4 fixes are correctly implemented.

Fix 1 — Foreign BatchRegistry DEKID guard (applier.go:381-397)

validateBootstrap is a clean extraction. It runs all three input invariants in sequence:

  1. bootstrapAndRotationConfigured() — KEK+Keystore+SidecarPath trio.
  2. StorageDEKID == RaftDEKID — distinct IDs required.
  3. Loop over BatchRegistry: each row's DEKID must match StorageDEKID or RaftDEKID.

Critically, validateBootstrap is called at the very top of ApplyBootstrap before the first kek.Unwrap, before any keystore.Set, and before writeBootstrapSidecar. The test TestApplyBootstrap_RejectsForeignBatchRegistryDEK correctly exercises the path where reg[0].DEKID=1 passes and reg[1].DEKID=99 trips the guard — and then verifies both ks.Has(1) and ks.Has(2) are false, and ReadSidecar returns an error. No-side-effects assertions are sound. ✅

The TestApplyBootstrap_PartialWiring cases with BootstrapPayload{StorageDEKID: 1, RaftDEKID: 2} (empty BatchRegistry) still return ErrKEKNotConfigured because the trio check fires first, before the BatchRegistry loop is reached. Partial-wiring coverage remains intact. ✅

Fix 2 — TestApplyRotation_UnknownPurpose exercises correct path (applier_test.go:617-632)

Setting ProposerRegistration: fsmwire.RegistrationPayload{DEKID: 9, ...} to match p.DEKID = 9 means the mismatch guard at applier.go:478 evaluates 9 != 9 → false, passes. Execution then reaches writeRotationSidecarsidecarPurposeFor(fsmwire.Purpose(0xFE))ErrEncryptionApply. The test comment documents exactly why ProposerRegistration.DEKID must match, which is the right information for future readers. ✅


One Design Observation (no action needed)

TestApplyRotation_UnknownPurpose — keystore side effect not verified (applier_test.go:620)

With the mismatch guard passing, the execution path is now: kek.Unwrapkeystore.Set(9, dek) ✅ → writeRotationSidecar fails at sidecarPurposeFor → error returned. This means keystore gets DEK 9 installed as a side effect even though the overall apply halts.

This is behaviorally correct: the FSM halts, the operator must fix the malformed entry before the cluster can resume, and on restart the keystore is rebuilt from the sidecar (6B-2 startup sequence), so the orphaned DEK 9 entry vanishes. The test doesn't verify this side effect exists but that's fine — the test is pinning the fail-closed behavior of sidecarPurposeFor, not the crash-recovery path. Worth knowing for 6B-2 when startup keystore reconstruction is wired.

Compare with TestApplyBootstrap_RejectsForeignBatchRegistryDEK which explicitly checks no-side-effects — that's possible because validateBootstrap fires before any mutations. The unknown-purpose path doesn't have that property because sidecarPurposeFor is called after keystore.Set.


No New Issues Found

The validateBootstrap extraction is clean. All 19 tests pass per the reported verification. The codex P2 finding is addressed correctly and the round-3 test path bug is fixed. The implementation is ready for merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88da2b21f0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +420 to +421
sc.Active.Storage = p.StorageDEKID
sc.Active.Raft = p.RaftDEKID
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject repeated bootstrap once active DEKs exist

ApplyBootstrap currently reads any existing sidecar and then unconditionally overwrites Active.Storage/Active.Raft, so a second committed bootstrap entry with different DEK IDs will succeed and re-point the cluster’s active keys instead of failing as an already-bootstrapped state. This bypasses the intended one-time bootstrap invariant and allows key-state transitions outside the rotation path, which can leave writer-registry and key lifecycle semantics inconsistent with the bootstrap/rotation contract.

Useful? React with 👍 / 👎.

…h different DEKs

codex P1 on round-4 (commit 88da2b2): ApplyBootstrap was
reading the existing sidecar and unconditionally overwriting
Active.{Storage,Raft}, so a second committed bootstrap entry
with different DEK IDs would succeed and re-point the
cluster's active keys instead of failing as already-bootstrapped.

This bypasses the §5.6 one-time-bootstrap invariant and lets
key-state transitions happen outside the rotation path, leaving
writer-registry and key-lifecycle semantics inconsistent.

## Fix

validateBootstrap now calls checkBootstrapIdempotency after
the per-row BatchRegistry validation. The idempotency check:

  - sidecar absent → first bootstrap, fall through
  - sidecar present but Active=(0,0) → never bootstrapped (or
    a malformed sidecar), fall through. The §9.1 startup guard
    in Stage 6C will reject the malformed-sidecar shape; this
    apply-time check focuses on the bootstrap invariant.
  - Active matches the payload's (StorageDEKID, RaftDEKID) →
    idempotent replay of the SAME bootstrap (e.g., crash after
    apply but before snapshot). Allow.
  - Active differs from the payload → second bootstrap with
    different DEKs. Halt with ErrEncryptionApply.

checkBootstrapIdempotency is split out of validateBootstrap to
keep the dispatch path under the cyclop complexity budget
(validateBootstrap was 13, max is 10).

## Test

TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs:

  - First bootstrap with (Storage=1, Raft=2) succeeds.
  - Second bootstrap with (Storage=7, Raft=8) is rejected with
    ErrEncryptionApply.
  - Post-rejection sidecar Active is verified to STILL be
    (1, 2) — the rejected bootstrap must not mutate Active.

The existing TestApplyBootstrap_Replay_Idempotent continues to
pass: same payload twice → second apply hits the matching-DEK
branch of checkBootstrapIdempotency → falls through → succeeds.

## Caller audit

Stricter input validation, fail-CLOSED. Single production
caller (kv/fsm dispatch) propagates via HaltApply. Idempotent
replay (matching DEK IDs) still works, so no change to crash-
recovery semantics. The new rejected case is only a second
bootstrap with DIFFERENT DEK IDs — operationally only reachable
from a malformed or malicious BootstrapEncryption RPC.

## Verification

- go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (20 tests)
- golangci-lint run ./internal/encryption/... — 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 16, 2026

@claude review

Round-5 (commit f27a8a0f): codex r4 P1 (reject second bootstrap with different DEKs) addressed.

codex P1

ApplyBootstrap currently reads any existing sidecar and then unconditionally overwrites Active.Storage/Active.Raft, so a second committed bootstrap entry with different DEK IDs will succeed and re-point the cluster's active keys instead of failing as an already-bootstrapped state. This bypasses the intended one-time bootstrap invariant and allows key-state transitions outside the rotation path.

Fix

New checkBootstrapIdempotency (split from validateBootstrap to stay under cyclop budget) enforces §5.6 one-time-bootstrap:

Sidecar state Action
absent first bootstrap → fall through
present, Active=(0,0) never bootstrapped → fall through
present, Active matches payload idempotent replay → fall through
present, Active differs from payload second bootstrap with different DEKs → halt with ErrEncryptionApply

Test

TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs:

  • First bootstrap (1, 2) succeeds
  • Second bootstrap (7, 8) rejected
  • Post-rejection sidecar Active verified to still be (1, 2)

TestApplyBootstrap_Replay_Idempotent continues to pass (matching-DEK replay falls through).

Caller audit

Stricter fail-CLOSED at input layer. Single production caller (kv/fsm dispatch) propagates via HaltApply. Idempotent replay invariant preserved.

Verification

  • go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (20 tests)
  • golangci-lint run ./internal/encryption/... — 0 issues

@claude
Copy link
Copy Markdown

claude Bot commented May 16, 2026

Claude finished @bootjp's task in 1m 43s —— View job


Stage 6B-1 Round-5 Review

Tasks

  • Read trigger comment and understand review scope
  • Read internal/encryption/applier.go (current HEAD)
  • Read internal/encryption/applier_test.go (current HEAD)
  • Verify checkBootstrapIdempotency implementation
  • Check all 4 sidecar state transitions
  • Verify new test coverage
  • Check for new issues introduced
  • Post full review

Round-5 Fix Verification

checkBootstrapIdempotency implementation — Correct.

All four state transitions are correctly handled (applier.go:412-429):

Sidecar state Path Correct?
Absent (IsNotExist) err != nil && !IsNotExist(err) is false → sc == nilreturn nil
Present, Active=(0,0) sc.Active.Storage == 0 && sc.Active.Raft == 0return nil
Present, Active matches payload sc.Active.Storage == p.StorageDEKID && sc.Active.Raft == p.RaftDEKIDreturn nil
Present, Active differs from payload Falls through to ErrEncryptionApply with descriptive message

The function is correctly positioned at the tail of validateBootstrap, meaning the sidecar read only fires after all structural validation (trio check, distinct IDs, BatchRegistry DEKID check) has already passed. No redundant I/O on malformed payloads. ✅

Guard fires before any state mutation — correct ordering confirmed.

checkBootstrapIdempotency is called from validateBootstrap at line 335, before the first a.kek.Unwrap call at line 338. A rejected second bootstrap cannot pollute the keystore (DEKs 7, 8 never reach keystore.Set) or the sidecar (WriteSidecar is not called). ✅

TestApplyBootstrap_Replay_Idempotent still works.

On the second apply of the same payload: checkBootstrapIdempotency reads the sidecar, finds Active.Storage==1 == p.StorageDEKID and Active.Raft==2 == p.RaftDEKID, and falls through. Replay continues; all subsequent steps are idempotent. ✅

Note: writeBootstrapSidecar does overwrite sc.Keys[…].Created with a fresh a.now() timestamp on replay, but §5.1 explicitly does not require byte-equal sidecars across replicas or replays. The test correctly does not assert sidecar byte equality.

TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs — mostly correct.

The test correctly:

  • Verifies ErrEncryptionApply is returned.
  • Reads the sidecar post-rejection and asserts Active == (1, 2) (not re-pointed).

One no-side-effects assertion is missingTestApplyBootstrap_RejectsForeignBatchRegistryDEK set the precedent by also checking !ks.Has(1) && !ks.Has(2). The second-bootstrap test should follow the same pattern and assert !ks.Has(7) && !ks.Has(8). The omission is low-risk (the guard provably fires before kek.Unwrap), but it's an inconsistency with the existing pattern in the suite. Suggested addition after the sidecar assertion:

if ks.Has(7) || ks.Has(8) {
    t.Error("keystore mutated despite rejection of second bootstrap")
}

Fix this →


Minor Observations (no action needed)

Double sidecar read on idempotent replay (applier.go:413, applier.go:442): checkBootstrapIdempotency and writeBootstrapSidecar both call ReadSidecar. Since ApplyBootstrap runs under applyMu (serialized), the two reads are guaranteed to see the same content. No correctness issue — worth noting for 6B-2's startup sequence where the double read may matter for observability/metrics.

The Active=(0,0) sentinel (applier.go:420-422): Treating DEK ID 0 as "never bootstrapped" is the right choice given the existing struct zero value. Worth noting this as a reserved/invalid DEK ID in the fsmwire type comments when 6B-2 adds the protobuf/wire encoding validation.


Verdict

Round-5 is clean. The checkBootstrapIdempotency fix is correctly implemented and all four state transitions are handled. The single carry-over item is one missing no-side-effects assertion in TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs (add !ks.Has(7) && !ks.Has(8) to match the pattern established by TestApplyBootstrap_RejectsForeignBatchRegistryDEK). All prior round findings are closed. The implementation is ready for merge modulo that single-line addition.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f27a8a0fc7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/encryption/applier.go Outdated
Comment on lines +423 to +424
if sc.Active.Storage == p.StorageDEKID && sc.Active.Raft == p.RaftDEKID {
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject bootstrap replays unless full payload matches

The idempotency check currently treats any bootstrap as valid replay when Active.Storage and Active.Raft match, then ApplyBootstrap still executes sidecar writes and the full BatchRegistry apply loop. That means a second committed bootstrap with the same DEK IDs but different BatchRegistry (or re-wrapped bytes that unwrap to the same DEK) can mutate writer-registry/sidecar state after the cluster is already bootstrapped, which violates the one-time bootstrap invariant. Fresh evidence in this revision is that checkBootstrapIdempotency returns early on ID match while ApplyBootstrap continues with for i, reg := range p.BatchRegistry.

Useful? React with 👍 / 👎.

…ay check) + claude r5 test gap

Two findings on round-5 (commit f27a8a0), both addressed:

## codex P1 — Reject bootstrap replays unless full payload matches

Round-5's checkBootstrapIdempotency matched on DEK IDs alone.
That allowed a second bootstrap entry committed with the SAME
DEK IDs but DIFFERENT wrapped DEK bytes (or BatchRegistry rows)
to flow past the idempotency check, then re-run
writeBootstrapSidecar (overwriting keys[].Wrapped with new
bytes) and iterate the new BatchRegistry — potentially
inserting writer-registry rows for previously-unregistered
nodes through the §4.1 case-1 first-seen path.

## Fix

checkBootstrapIdempotency now compares wrapped DEK bytes
against the sidecar's existing keys[id].Wrapped entries
after the DEK-ID match passes. Raft determinism couples
wrapped bytes to the entry hash — so:

  - Legitimate replay: same Raft entry, same payload bytes,
    wrapped bytes match → fall through to idempotent apply.
  - Malformed/malicious second bootstrap with different
    wrapped material at the same DEK IDs: wrapped bytes
    differ → halt with ErrEncryptionApply.

If the wrapped bytes match, BatchRegistry must also match
(Raft entry hash couples ALL fields), so the registry
batch insert is provably idempotent.

The new logic is split into checkBootstrapWrappedMatches to
keep the dispatch path under the cyclop complexity budget.

## Defense-in-depth for sidecar inconsistency

If sc.Active matches the payload but sc.Keys[id] is missing
for either DEK ID, the sidecar is internally inconsistent —
halt with a descriptive error rather than nil-deref on the
missing map entry.

## claude r5 — Missing no-side-effects assertion

TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs
verified the sidecar Active was not re-pointed but did not
check that the rejected DEKs (7, 8) were absent from the
keystore. Pattern set by
TestApplyBootstrap_RejectsForeignBatchRegistryDEK requires
both. Added the ks.Has(7) || ks.Has(8) assertion.

## New test

TestApplyBootstrap_RejectsSecondBootstrapWithSameIDsDifferentWrapped:

  - First bootstrap with wrapped 's1-orig' / 'r2-orig' succeeds.
  - Second bootstrap with same DEK IDs but wrapped
    's1-tampered' / 'r2-tampered' AND a previously-
    unregistered node 0x99 in BatchRegistry is rejected.
  - Post-rejection: sidecar keys[].Wrapped still has ORIGINAL
    bytes (not tampered); writer-registry row for node 0x99
    was NOT inserted (the rejection fired before
    ApplyRegistration could case-1-insert it).

## Caller audit

Stricter input validation, fail-CLOSED at the input layer.
Same as previous rounds: single production caller (kv/fsm
dispatch) propagates via HaltApply. Legitimate Raft replay
(matching wrapped bytes) still succeeds, so no change to
crash-recovery semantics.

## Verification

- go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (21 tests)
- golangci-lint run ./internal/encryption/... — 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 16, 2026

@claude review

Round-6 (commit c1ecd450): codex r5 P1 (stricter replay check) + your r5 test gap addressed.

codex P1

Round-5's idempotency check matched on DEK IDs alone. A second bootstrap with SAME DEK IDs but DIFFERENT wrapped bytes (or BatchRegistry rows) would have flowed past — re-running writeBootstrapSidecar with new wrapped bytes + iterating new BatchRegistry rows (potentially case-1-inserting unregistered nodes).

Fix

checkBootstrapIdempotency now also compares wrapped DEK bytes against sc.Keys[id].Wrapped (split into checkBootstrapWrappedMatches for cyclop). Raft determinism couples wrapped bytes to the entry hash, so matching wrapped bytes is the stable proof of legitimate replay (BatchRegistry must also match).

Added defense-in-depth: if Active matches the payload but keys[id] is missing for either DEK ID, the sidecar is internally inconsistent → halt with a descriptive error.

claude r5 gap

Added ks.Has(7) || ks.Has(8) no-side-effects assertion to TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs (matches the pattern set by TestApplyBootstrap_RejectsForeignBatchRegistryDEK).

New test

TestApplyBootstrap_RejectsSecondBootstrapWithSameIDsDifferentWrapped:

  • First bootstrap with wrapped 's1-orig' / 'r2-orig' succeeds.
  • Second bootstrap with same DEK IDs but tampered wrapped + previously-unregistered node 0x99 in BatchRegistry → rejected.
  • Verifies: sidecar keys[].Wrapped still has ORIGINAL bytes; writer-registry row for 0x99 was NOT inserted.

Caller audit

Stricter fail-CLOSED. Legitimate Raft replay (matching wrapped bytes) still succeeds.

Verification

  • go test -race -timeout=60s -run 'TestApply|TestNewApplier' ./internal/encryption/... — PASS (21 tests)
  • golangci-lint run ./internal/encryption/... — 0 issues

@claude
Copy link
Copy Markdown

claude Bot commented May 16, 2026

Claude finished @bootjp's task in 2m 29s —— View job


Stage 6B-1 Round-6 Review

Tasks

  • Read trigger comment and understand review scope
  • Read internal/encryption/applier.go (current HEAD)
  • Read internal/encryption/applier_test.go (current HEAD)
  • Verify checkBootstrapWrappedMatches implementation and placement
  • Verify defense-in-depth missing-keys guard
  • Verify TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs now has ks.Has(7)||ks.Has(8) check
  • Verify TestApplyBootstrap_RejectsSecondBootstrapWithSameIDsDifferentWrapped exercises the right path
  • Check for new issues introduced
  • Post final review

Round-6 Fix Verification

Fix 1 — checkBootstrapWrappedMatches + missing-keys guard

Both the wrapped-bytes check and the defense-in-depth guard are correctly implemented.

Call chain: ApplyBootstrapvalidateBootstrapcheckBootstrapIdempotencycheckBootstrapWrappedMatches. All of this fires before the first a.kek.Unwrap at applier.go:339, so a rejected second bootstrap cannot touch the keystore or sidecar.

Walking checkBootstrapIdempotency's state table (with the new wrapped-bytes branch added):

Sidecar state Path Correct?
Absent (IsNotExist) sc == nilreturn nil
Present, Active=(0,0) return nil
Present, Active differs from payload ErrEncryptionApply
Present, Active matches, key missing checkBootstrapWrappedMatches!hasStorage || !hasRaftErrEncryptionApply "sidecar inconsistent"
Present, Active matches, keys present, wrapped bytes differ !bytes.Equal(...)ErrEncryptionApply "different wrapped DEK material"
Present, Active matches, keys present, wrapped bytes match return nil → fall through to idempotent replay

The missing-keys guard is correctly positioned and catches the pathological case where Active is set but Keys has been partially truncated (e.g., a crash between the two sc.Keys[...] assignments in writeBootstrapSidecar). Halting with a descriptive error is the right operator signal there.

Fix 2 — TestApplyBootstrap_RejectsSecondBootstrapWithDifferentDEKs keystore assertion (applier_test.go:683-685)

if ks.Has(7) || ks.Has(8) is present and correctly positioned after the sidecar Active check. The guard in checkBootstrapIdempotency fires at the Active.Storage != p.StorageDEKID branch (1 != 7), before kek.Unwrap, so DEKs 7 and 8 can never reach keystore.Set. The assertion completes the no-side-effects proof for this test, matching the pattern set by TestApplyBootstrap_RejectsForeignBatchRegistryDEK. ✅

Fix 3 — TestApplyBootstrap_RejectsSecondBootstrapWithSameIDsDifferentWrapped (applier_test.go:701-753)

The test exercises the new code path correctly:

  1. First bootstrap ("s1-orig" / "r2-orig") succeeds, writing sc.Keys["1"].Wrapped = []byte("s1-orig") to the sidecar.
  2. Second bootstrap (same IDs, "s1-tampered" / "r2-tampered") → checkBootstrapIdempotency: Active.Storage==1==p.StorageDEKID and Active.Raft==2==p.RaftDEKID → both match → calls checkBootstrapWrappedMatchesbytes.Equal([]byte("s1-orig"), []byte("s1-tampered")) → false → ErrEncryptionApply. ✅
  3. Post-rejection assertions: sc.Keys["1"].Wrapped == "s1-orig" ✅, sc.Keys["2"].Wrapped == "r2-orig" ✅, registry row for node 0x99 not inserted ✅.

The Raft-determinism argument in the PR description is sound: if wrapped bytes match, the entire log entry is the same, so BatchRegistry must match too. The wrapped-bytes check is the correct proxy for "legitimate replay."

TestApplyBootstrap_Replay_Idempotent still valid

On the second apply: checkBootstrapWrappedMatches compares sc.Keys["1"].Wrapped (which is []byte("storage") after the JSON round-trip) against p.WrappedStorage = []byte("storage")bytes.Equal returns true. checkBootstrapIdempotency returns nil → full sequence runs → all steps are idempotent. ✅


No New Issues Found

The implementation is clean. All five state transitions in checkBootstrapIdempotency are handled, the wrapped-bytes check is correctly implemented and positioned, the r5 gap is closed, and the new test exercises the intended path with appropriate assertions.

One observation for completeness (no action needed): TestApplyBootstrap_RejectsSecondBootstrapWithSameIDsDifferentWrapped could additionally assert ks.Has(1) && ks.Has(2) to confirm the original DEKs survived the rejected second apply. Since the guard fires before kek.Unwrap, this is trivially true — the test already proves it implicitly via the sidecar assertion, and the tampered material could never reach keystore.Set to trigger an ErrKeyConflict. Not worth a round-7.


Verdict

All six rounds of findings are closed. No new issues in round-6. The implementation is ready for merge.

@bootjp bootjp merged commit cae5bf8 into main May 16, 2026
9 checks passed
@bootjp bootjp deleted the feat/encryption-stage6b branch May 16, 2026 21:03
bootjp added a commit that referenced this pull request May 17, 2026
…ag + mutator-RPC gate (#776)

## Summary

Stage 6B-2 per the [PR #762
plan](#762). Completes Stage 6B
with the operator-facing surface: `--kekFile` / `--encryption-enabled`
flags, KEK loader at startup, applier threading, and the re-enabled
mutator-RPC wiring in `registerEncryptionAdminServer` gated on
(`--encryption-enabled` AND `--kekFile` non-empty).

Builds on Stage 6B-1 (PR #768) which landed the real `ApplyBootstrap` /
`ApplyRotation` paths. This PR wires those paths into production by
threading KEK + Keystore + sidecar path through the FSM construction and
re-enabling the EncryptionAdmin mutator RPCs under a double gate.

## New flags

| Flag | Default | Purpose |
|---|---|---|
| `--kekFile` | `""` | §5.1 KEK file path (32 raw bytes, owner-only
mode). Constructs `kek.FileWrapper` at startup; threaded into the §6.3
EncryptionApplier. `--kekUri` deferred to Stage 9 (KMS providers). |
| `--encryption-enabled` | `false` | §6.5 opt-in to the mutating
EncryptionAdmin RPCs. Required (alongside `--kekFile`) for
`BootstrapEncryption`/`RotateDEK`/`RegisterEncryptionWriter` to wire. |

## Double-gate

Both flags are independently necessary for the mutator wiring:

```go
func encryptionMutatorsEnabled() bool {
    return *encryptionEnabled && *kekFile != ""
}
```

- `--encryption-enabled` alone → no KEK loaded → mutators wired but
ApplyBootstrap would HaltApply with `ErrKEKNotConfigured` on every
replica. The RPC-layer gate refuses BEFORE the proposal commits.
- `--kekFile` alone → no operator opt-in → mutators stay refused at the
RPC boundary.
- BOTH set → `WithEncryptionAdminProposer` +
`WithEncryptionAdminLeaderView` installed and mutators reach the §6.3
applier through the shard's engine.

## Caller audit (semantic changes — signatures)

- **`buildShardGroups`** gains 3 new tail params `(kekWrapper
*kek.FileWrapper, keystore *encryption.Keystore, sidecarPath string)`. 1
production caller (`main.go run()`), 3 test callers
(`multiraft_runtime_test.go` ×2, `main_bootstrap_e2e_test.go` ×1) — all
updated to pass `nil, nil, ""` for the no-encryption posture.

- **`registerEncryptionAdminServer`** gains 2 new tail params
`(enableMutators bool, engine encryptionAdminEngine)`. 1 production
caller (`main.go startRaftServers`), 2 test callers
(`main_encryption_admin_test.go`) — all updated.

- The mutator gate is fail-CLOSED by default at every site that does not
explicitly opt in.

## Test surface

| Test | Pins |
|---|---|
| `TestEncryptionAdmin_MutatingRPCRefusedWhenGateOff` (3 sub-cases) |
All 3 corners of (enableMutators, engine) with at least one false →
`FailedPrecondition` |
| `TestEncryptionAdmin_MutatingRPCEnabledWhenGateOn` | The (true,
non-nil) corner — asserts the `FailedPrecondition` gate is NO LONGER
firing |

A `stubEncryptionAdminEngine` satisfies the `encryptionAdminEngine`
interface for the gate-on test (no actual Raft proposal — the stub
`Propose()` returns success).

## Stage 5D safety boundary preserved

The Stage 5D regression — operators who set `--encryptionSidecarPath`
alone must not see mutators wired — is preserved by the double gate.
`sidecarPath` alone is strictly the capability surface; the mutator
wiring is gated on the AND of two SEPARATE flags.

## Five-lens self-review

1. **Data loss** — no data path touched. The applier remains fail-closed
under HaltApply for any `ErrEncryptionApply` return. Pebble Sync
semantics unchanged.
2. **Concurrency / distributed failures** — shared Keystore internally
locked; KEK wrapper documented safe for concurrent use; per-shard
applier reads from shared instances.
3. **Performance** — no hot-path change. KEK loading is a single file
read at startup. Per-shard applier construction adds one slice
allocation per group (0–3 element slice).
4. **Data consistency** — mutator gate is the load-bearing safety
boundary. With both gate inputs true, the applier still has Stage 6B-1's
input-validation guards (bootstrap idempotency, foreign-DEK batch
rejection, proposer-DEK mismatch, etc.).
5. **Test coverage** — 4 gate corners + the existing applier tests from
6B-1 + the `buildShardGroups` callers.

## Test plan

- [x] `go test -race -timeout=120s -run
'TestEncryptionAdmin|TestRegisterEncryptionAdminServer|TestApply|TestNewApplier|TestWriterRegistry'
./internal/encryption/... ./store/... ./kv/... .` — PASS
- [x] `go build ./...` — PASS
- [x] `golangci-lint run ./...` — 0 issues on touched files (cyclop /
mnd under budget via factored `loadKEKWrapperFromFlag` +
`applierOptionsFor` + `encryptionMutatorsEnabled` helpers)
- [ ] Full Jepsen suite — not run in this PR (no envelope cutover
happens until Stage 6D; the mutator path is reachable but the operator
must hand-craft a `BootstrapEncryption` RPC, which is not part of the
Jepsen workload set)

## Plan

Stage 6B is now complete. The next stages per [PR
#762](#762):
- **6C** — §9.1 startup refusal guards
- **6D** — `enable-storage-envelope` admin RPC + §7.1 Phase-1 cutover
- **6E** — `enable-raft-envelope` admin RPC + §7.1 Phase-2 cutover
- **6F** — `--encryption-rotate-on-startup` ergonomics


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added encryption opt-in configuration with new command-line flags for
enabling encryption support.
* Encryption admin mutations are now conditionally gated and fail safely
when encryption is not properly configured.

* **Tests**
  * Updated test coverage for encryption admin server gating behavior.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/776?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant