FROST/ROAST readiness branch by mswilkison · Pull Request #3866 · threshold-network/keep-core

mswilkison · 2026-02-19T20:58:37Z

Current State (as of 2026-05-17)

This draft PR is the umbrella readiness branch for feat/frost-schnorr-migration-scaffold.
It is being kept current with main so it can become a direct merge target if the FROST/ROAST stack is approved for activation.

It remains in draft until the remaining phase-gate, governance, and cross-repository readiness items are closed.

Canonical Status Sources

Cross-repo migration tracker: docs/frost-migration/external-repository-tracking.md (in tlabs-xyz/tbtc)
Companion tBTC umbrella draft: https://github.com/tlabs-xyz/tbtc/pull/10
Latest readiness audit: docs/reviews/frost-roast-production-readiness-2026-05-16.md (in tlabs-xyz/tbtc)

Latest Refresh

Merged current main into this branch.
Local verification passed for the FROST signing package and tBTC signer backend paths, with and without frost_native.
Local verification also passed the native TBTC signer-path tests covering the FFI signing primitive and signing executor.

Remaining Cross-Repo Closure Items

Wait for CI from the latest refresh to complete.
Capture the first post-fix funded nightly live run artifact for Phase 4.
Record final approver signoff in the Phase 4 decision/packet docs.
Execute external org archive/redirect mapping and record results.

Notes

Keep this PR in draft until the activation decision is explicit.
Treat it as the readiness branch for the integrated keep-core side of the stack, not only a historical index.

## Summary Operational documentation for the ROAST-driven retry path introduced across RFC-21 Phases 1-7. Intended for node operators and release engineers planning a rollout of the new retry semantics. Doc-only; no code changes; ~130 lines AsciiDoc. Stacked on #3986 (Phase 7.2). ## What it covers - **Activation prerequisites:** build tag + env var + coordinator registration - **Behavioural matrix:** all 5 combinations of build tag / env var / registry / bundle and what each one runs - **Error handling discipline:** static-vs-runtime taxonomy per the RFC-21 Phase-6 resolution - **Production rollout sequencing:** build with tag → verify V1 migration → stage operator opt-in → monitor → roll back via env var - **Cross-references:** every file the multi-phase implementation touched ## What it doesn't include The **readiness manifest itself** -- the cross-repo evidence ledger that gates production enablement -- lives in the \`tlabs-xyz/tbtc\` monorepo's \`docs/operations/\` directory, not in keep-core. This document is the keep-core-side operational guide; the manifest is the operational gate. The manifest stays in \`missing-no-go\` until real testnet evidence is attached (per the standing constraint that readiness manifests are machine-checked evidence, not aspirational documents). ## Phase 7 status | PR | Scope | State | |---|---|---| | 7.1 (#3985) | AggregateBundle + bundle registry | open | | 7.2 (#3986) | ROAST-driven signingParticipantSelector | open | | **7.3 (this)** | **Operational rollout guide (docs)** | **open** | The remaining Phase-7 work is *not in code*: integration testnet evidence, manifest flip in the cross-repo monorepo, and post-rollout build-tag removal (optional). Those happen on the operations side, not in this repository. ## Test plan - [ ] CI green (AsciiDoc renders cleanly via the docs build). - [ ] Reviewer confirms the prerequisites and behavioural matrix match the actual code paths. - [ ] Reviewer confirms the cross-references are accurate.

Closes the M4 gap from the original PR #3866 review by adding the two evidence categories the RFC-21 Phase-2 work left as future work: validation-rejection evidence and first-write-wins-conflict evidence. With this PR, the NextAttempt policy can permanently exclude misbehaving peers on all four ROAST blame channels -- transport-overflow, validation-reject, equivocation-conflict, and silence -- instead of just overflow + silence. Why this matters: a peer that only sends malformed messages (validation rejects, never overflows the channel) was previously indistinguishable from a silent peer. The transient silence- parking policy would bench-and-reinstate them indefinitely, never permanently excluding the malicious behaviour. Same for a peer equivocating mid-attempt: the existing first-write-wins assembly correctly dropped the conflicting retransmission but only logged the event -- the bundle carried no structured evidence the coordinator's policy could act on. * pkg/frost/roast/attempt/evidence_recorder.go - EvidenceRecorder interface gains RecordReject(sender, reason) and RecordConflict(sender). - RejectQuotaDefault = 8, ConflictQuotaDefault = 4 (matches categoryQuota in RFC-21 Layer A). - Evidence struct extended with Rejects (map[MemberIndex][]RejectEntry: per-(sender, reason)) and Conflicts (map[MemberIndex]uint). - boundedRecorder: per-reason quota counter keeps each reason bucket independent so a peer cannot saturate one reason to mask another. Conflicts counter saturates at the conflict quota. - noOpRecorder: every category discards. - NewBoundedRecorderWithQuotas(overflow, reject, conflict) constructor for tests; existing NewBoundedRecorderWithQuota preserved for backward compat (defaults reject + conflict quotas). * pkg/frost/roast/transition_message.go - RejectEntry (Sender + Reason + Count) and ConflictEntry (Sender + Count) wire types added. - LocalEvidenceSnapshot gains Rejects []RejectEntry and Conflicts []ConflictEntry, both omitempty. - NewLocalEvidenceSnapshot canonicalises into sorted slices: rejects ascending by Sender then by Reason; conflicts ascending by Sender. - Evidence() reconstructs the map form for downstream consumption. - Validate() enforces sorted-ascending invariants on both new slices. * pkg/frost/roast/next_attempt.go - RejectExclusionThreshold = 1; ConflictExclusionThreshold = 1 (per RFC-21 Layer B). - computeNextAttempt now consults rejectBlamedSenders and conflictBlamedSenders alongside the existing overflowBlamed set. All three feed into the permanent ExcludedSet. - blamedSenders helper factored to share the threshold-comparison + sort logic across the three category helpers. * pkg/frost/signing/native_frost_protocol_frost_native.go and * pkg/frost/signing/native_ffi_primitive_transitional_frost_native.go - Three reject sites: in each of the three receive loops, the shouldAcceptNativeFROSTMessage failure path now calls evidence.RecordReject(senderID, "validation_gate_rejected") before returning. (Previously the message was just dropped.) - Three conflict sites: the first-write-wins assembly loop's "dropping conflicting" branch now calls evidence.RecordConflict(senderID) immediately before the existing log line. (Previously only the log line.) Tests (15 new cases): * pkg/frost/roast/attempt/evidence_recorder_categories_test.go (7) - RecordReject accumulates by reason - RecordReject per-reason quota saturates - Per-reason quotas independent across reasons - RecordConflict accumulates and saturates - All three categories present in Snapshot after mixed input - NoOp recorder inert across all categories - RFC-quota constants match documented values * pkg/frost/roast/next_attempt_categories_test.go (5) - Single reject crosses threshold -> permanent exclusion - Single conflict crosses threshold -> permanent exclusion - Reject and conflict on different senders -> both excluded - Empty rejects+conflicts -> no exclusion (sanity) - Threshold constants match RFC-21 * Receive-loop wiring is covered by existing send/recv tests combined with the recorder unit tests; no new behaviour test added at the integration level because the NoOp default keeps pre-RFC-21 receive semantics observably unchanged. Verification: * go build ./... + go build -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./... -- both clean * go test ./pkg/frost/... + go test -race ./pkg/frost/roast/... + go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/... -- all pass (5 packages) * staticcheck -checks '-SA1019' ./pkg/frost/... -- silent * go vet ./pkg/frost/... + gofmt -l ./pkg/frost/ -- clean This PR completes M4 from the original PR #3866 review. All four ROAST evidence categories (overflow, reject, conflict, silence) are now operational; the NextAttempt policy excludes on the first three and parks transiently on the fourth, matching RFC-21 Layer B exactly.

…ase-6 milestone) Closes the Phase-6 milestone the RFC named but the implementation skipped: receive callbacks now reject messages whose AttemptContextHash does not match the session's bound AttemptContext. Default builds and sessions without a ROAST- attempt binding skip enforcement entirely, so the change is observationally identical to pre-Phase-6 behaviour outside the ROAST path. The Phase 1B AttemptContextHash field was structural-only (present, 32 bytes) until now. Senders could populate it but receivers ignored the value -- meaning a peer could send a message bound to attempt N to a receiver running attempt N+1 of the same session and the receiver would accept it as long as SessionID matched. This PR closes that gap. * pkg/frost/signing/attempt_context_binding_validation_frost_native.go (new, gated frost_native) - attemptContextHashCarrier interface so the helper covers all three FROST/tbtc-signer message types via their existing GetAttemptContextHash methods. - verifyMessageAttemptContextHash: looks up the session's handle binding via currentAttemptHandleForCollect. No binding -> return nil (legacy / default build). Binding present + matching hash -> return nil. Binding present + missing hash -> ErrAttemptContextHashMissing. Binding present + mismatched hash -> ErrAttemptContextHashMismatch. * pkg/frost/signing/native_frost_protocol_frost_native.go and * pkg/frost/signing/native_ffi_primitive_transitional_frost_native.go Three receive callbacks updated. After the existing shouldAcceptNativeFROSTMessage gate, each callback now calls verifyMessageAttemptContextHash. Failure paths call evidence.RecordReject(senderID, "attempt_context_hash_mismatch") so the policy can permanently exclude peers that consistently send stale-attempt messages. Tests: * attempt_context_binding_validation_frost_native_test.go (gated frost_native && frost_roast_retry, 5 cases) - No binding -> any message passes - Binding + matching hash -> passes - Binding + missing hash -> ErrAttemptContextHashMissing - Binding + mismatched hash -> ErrAttemptContextHashMismatch - Integration with a real nativeFROSTRoundOneCommitmentMessage via SetAttemptContextHash; rebinding to a different context produces a mismatch * attempt_context_binding_validation_default_build_test.go (gated frost_native && !frost_roast_retry, 1 case) - In the default build the helper always passes regardless of message contents, matching the rollback promise. Verification: * go build ./... + go build -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./... -- both clean * go test ./pkg/frost/... -- pass * go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/... -- pass * go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/... -- pass (5 packages) * staticcheck -checks '-SA1019' ./pkg/frost/... -- silent * gofmt -l ./pkg/frost/signing/ -- silent * go vet ./pkg/frost/... -- clean Migration path: * Phase 1B (already shipped): AttemptContextHash is structurally validated when present, optional otherwise. * This PR: the field is enforced *only* when the session has a ROAST-attempt binding. Sessions without a binding -- including every default-build session and every non-ROAST tagged-build session -- continue to ignore the field. * Future PR: once production has rolled out a version that populates the field on every outbound message, enforcement can be made unconditional (binding-or-not).

Adds process-wide cumulative counters for the three evidence categories (overflow / reject / conflict) and exposes them through keep-core's clientinfo registry so operators can observe per- category event rates via the standard Prometheus scrape. The counters increment whenever a metrics-emitting recorder records an event. In default builds and in unregistered-coordinator states the recorder is NoOp, so the counters stay at zero. Operators only see non-zero values once the ROAST-retry registry is populated and live signing flows record evidence -- the "do I have ROAST retry running?" smoke test. * pkg/frost/signing/roast_retry_metrics.go (new, untagged) - Cumulative atomic counters: roastRetryOverflowEvents, roastRetryRejectEvents, roastRetryConflictEvents. - RegisterRoastRetryMetrics(*clientinfo.Registry) registers Source functions under the "frost_roast_retry" application prefix, producing metrics named: - frost_roast_retry_overflow_events_total - frost_roast_retry_reject_events_total - frost_roast_retry_conflict_events_total via the existing ObserveApplicationSource mechanism. - metricsEmittingRecorder wraps an attempt.EvidenceRecorder and bumps the matching counter on each Record* call before delegating to the inner recorder. - Nil-safe: a nil inner recorder collapses to NoOp; a nil clientinfo.Registry is a no-op registration. * pkg/frost/signing/roast_retry_recorder.go (modified) - roastRetryRecorderForCollect now wraps the bounded recorder with newMetricsEmittingRecorder when the registry is populated. NoOp path is unchanged (no metrics emission). Tests (6 cases in roast_retry_metrics_test.go): * Counters increment on Record* (with different per-category counts). * Snapshot delegates to the inner recorder. * Nil inner falls back to NoOp without panicking. * Unregistered coordinator -> NoOp recorder -> no counter bumps. * Concurrent counter increments are race-safe. * RegisterRoastRetryMetrics(nil) is a no-op (defensive guard). Operator wiring: The keep-core node's startup sequence should call RegisterRoastRetryMetrics(&clientinfo.Registry) alongside the existing registry observation calls. Documentation will be added in a follow-up to the rollout guide (docs/development/frost-roast-retry-rollout.adoc). Verification: * go build ./... -- clean * go test ./pkg/frost/... -- pass (5 packages) * go test -race ./pkg/frost/signing/... -- pass * go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/... -- pass (5 packages) * staticcheck -checks '-SA1019' ./pkg/frost/... -- silent * go vet ./pkg/frost/... -- clean * gofmt -l ./pkg/frost/signing/ -- silent Stacked on the AttemptContextHash enforcement PR.

…3988) ## Summary Closes the **M4 gap** from the original PR #3866 review by adding the two evidence categories the RFC-21 Phase-2 work left as future work: **validation-rejection evidence** and **first-write-wins-conflict evidence**. With this PR, the \`NextAttempt\` policy can permanently exclude misbehaving peers on all four ROAST blame channels -- transport-overflow, validation-reject, equivocation-conflict, and silence -- instead of just overflow + silence. ## Why this matters A peer that only sends **malformed messages** (validation rejects, never overflows the channel) was previously indistinguishable from a silent peer. The transient silence-parking policy would bench-and-reinstate them indefinitely, never permanently excluding the malicious behaviour. Same for a peer **equivocating mid-attempt**: the existing first-write-wins assembly correctly dropped the conflicting retransmission but only logged the event -- the bundle carried no structured evidence the coordinator's policy could act on. ## What lands ### Recorder API | Surface | Notes | |---|---| | \`RecordReject(sender, reason)\` | reason captured verbatim; per-reason quota counter | | \`RecordConflict(sender)\` | saturates at conflict quota | | \`RejectQuotaDefault = 8\`, \`ConflictQuotaDefault = 4\` | matches RFC-21 Layer A categoryQuota | | Per-reason quotas independent | peer cannot saturate one reason to mask another | ### Wire types | Type | Sort order | Cap | |---|---|---| | \`RejectEntry{Sender, Reason, Count}\` | asc by Sender, then asc by Reason | per-attempt evidence size bounded by Σ quotas | | \`ConflictEntry{Sender, Count}\` | asc by Sender | per-attempt evidence size bounded by Σ quotas | Both fields use \`omitempty\` so pre-PR snapshots round-trip without the new fields. \`Validate()\` enforces sorted-ascending invariants. ### NextAttempt policy | Threshold | Value | Source | |---|---|---| | \`RejectExclusionThreshold\` | 1 | RFC-21 Layer B ("any non-transport reject is sufficient cause") | | \`ConflictExclusionThreshold\` | 1 | A single conflict is byzantine evidence | \`computeNextAttempt\` merges \`overflowBlamed\`, \`rejectBlamed\`, \`conflictBlamed\` into the permanent ExcludedSet. The \`blamedSenders\` helper is factored out so all three categories share the deterministic sort + threshold-comparison logic. ### Receive-loop wiring Three reject sites and three conflict sites updated across the two files that house the three FROST/tbtc-signer receive loops: | Site | Was | Now | |---|---|---| | \`shouldAcceptNativeFROSTMessage\` returns false | silent drop | \`evidence.RecordReject(senderID, "validation_gate_rejected")\` + drop | | First-write-wins conflict in assembly loop | warn log only | \`evidence.RecordConflict(senderID)\` + warn log | ## Test coverage (15 new cases) - 7 recorder tests: accumulation, per-reason quota saturation, per-reason independence, conflict saturation, all-categories-present, NoOp-inert, RFC-constant assertions - 5 policy tests: single reject excludes, single conflict excludes, reject+conflict on different senders, empty evidence (sanity), threshold-constant assertions - Receive-loop wiring is covered indirectly by the recorder unit tests; the NoOp default keeps pre-RFC-21 receive semantics observably unchanged so no integration-level test is required. ## Verification | Command | Result | |---|---| | \`go build ./...\` + \`go build -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./...\` | both clean | | \`go test ./pkg/frost/...\` + race | pass | | \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...\` | pass (5 packages) | | \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent | | \`go vet ./pkg/frost/...\` + \`gofmt -l ./pkg/frost/\` | clean | ## RFC-21 status With this PR, all four ROAST evidence categories are operational. M4 from the original PR #3866 review is **fully closed**. The keep-core code arc for RFC-21 is now feature-complete; remaining work is operations-side (integration testnet, manifest flip). ## Test plan - [ ] CI green. - [ ] Reviewer confirms the per-reason quota independence is the right semantics (alternative: single per-sender reject counter). - [ ] Reviewer confirms threshold = 1 for both reject and conflict (alternative: higher to absorb noise; trade-off is faster vs slower exclusion of misbehaving peers).

…ase-6 milestone) (#3989) ## Summary Closes the Phase-6 milestone the RFC named but the implementation skipped: receive callbacks now reject messages whose \`AttemptContextHash\` does not match the session's bound \`AttemptContext\`. Default builds and sessions without a ROAST-attempt binding skip enforcement entirely, so the change is observationally identical to pre-Phase-6 behaviour outside the ROAST path. Stacked on #3988 (M4 closure). ## Why this matters The Phase 1B \`AttemptContextHash\` field was structural-only (present, 32 bytes) until now. Senders could populate it but receivers ignored the value -- meaning a peer could send a message bound to attempt N to a receiver running attempt N+1 of the same session and the receiver would accept it as long as \`SessionID\` matched. This PR closes that gap. ## What lands | Surface | Behaviour | |---|---| | \`verifyMessageAttemptContextHash(msg, sessionID)\` | No binding → pass (legacy/default). Binding + matching hash → pass. Binding + missing hash → \`ErrAttemptContextHashMissing\`. Binding + mismatch → \`ErrAttemptContextHashMismatch\`. | | \`attemptContextHashCarrier\` interface | One implementation covers all three FROST/tbtc-signer message types via their existing \`GetAttemptContextHash\` methods. | | 3 receive callbacks updated | After \`shouldAcceptNativeFROSTMessage\`, call \`verifyMessageAttemptContextHash\`. Failure → \`evidence.RecordReject(senderID, "attempt_context_hash_mismatch")\` so the policy can permanently exclude peers that consistently send stale-attempt messages. | ## Test coverage | File | Build | Cases | |---|---|---| | \`attempt_context_binding_validation_frost_native_test.go\` | \`frost_native && frost_roast_retry\` | 5 (no-binding, matching, missing, mismatch, real-message integration with rebind) | | \`attempt_context_binding_validation_default_build_test.go\` | \`frost_native && !frost_roast_retry\` | 1 (default build always passes; rollback promise upheld) | ## Migration path 1. **Phase 1B (already shipped):** field structurally validated when present, optional otherwise. 2. **This PR:** enforced *only* when the session has a ROAST-attempt binding. Default builds and non-ROAST tagged sessions continue to ignore the field. 3. **Future PR:** once production has rolled out a version that populates the field on every outbound message, enforcement can be made unconditional. ## Verification | Command | Result | |---|---| | \`go build ./...\` + tagged | both clean | | \`go test ./pkg/frost/...\` | pass | | \`go test -tags 'frost_native frost_tbtc_signer'\` | pass | | \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry'\` | pass | | \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent | | \`go vet ./pkg/frost/...\` | clean | | \`gofmt -l ./pkg/frost/signing/\` | silent | ## Test plan - [ ] CI green. - [ ] Reviewer confirms the "no binding = skip enforcement" gate is acceptable (alternative: always enforce when build tag set, regardless of binding -- riskier during transitions). - [ ] Reviewer confirms the failure-mode rejects record evidence rather than just dropping (so misbehaving peers accumulate exclusion-worthy counts).

…nfo (#3990) ## Summary Process-wide cumulative counters for the three evidence categories (overflow / reject / conflict), exposed through keep-core's \`clientinfo\` registry so operators can observe per-category event rates via the standard Prometheus scrape. In default builds and unregistered-coordinator states, the metrics-emitting recorder is bypassed entirely (the receive loops use \`attempt.NoOpRecorder\`), so the counters stay at zero. Once the ROAST-retry registry is populated and live signing flows record evidence, the counters increment -- providing the "do I have ROAST retry running?" smoke test from operator dashboards. Stacked on #3989 (AttemptContextHash enforcement). ## What lands | File | Role | |---|---| | \`roast_retry_metrics.go\` (new, untagged) | Cumulative atomic counters; \`RegisterRoastRetryMetrics(*clientinfo.Registry)\` registers Source functions under the \`frost_roast_retry\` application prefix; \`metricsEmittingRecorder\` wraps the bounded recorder and bumps the counter on each Record* call. | | \`roast_retry_recorder.go\` (modified) | \`roastRetryRecorderForCollect\` now wraps the bounded recorder with \`newMetricsEmittingRecorder\` when the registry is populated. | ## Metrics exposed Via \`clientinfo.Registry.ObserveApplicationSource\`: | Metric name | Description | |---|---| | \`frost_roast_retry_overflow_events_total\` | Cumulative count of receive-channel overflow events | | \`frost_roast_retry_reject_events_total\` | Cumulative count of validation-gate rejections (incl. \`attempt_context_hash_mismatch\` from #3989) | | \`frost_roast_retry_conflict_events_total\` | Cumulative count of first-write-wins equivocation events | ## Test coverage (6 cases) - Counters increment on `Record*` (different per-category counts) - Snapshot delegates to inner recorder - Nil inner falls back to NoOp without panicking - Unregistered coordinator → NoOp recorder → no counter bumps - Concurrent counter increments are race-safe (16 workers × 100 calls) - `RegisterRoastRetryMetrics(nil)` is a no-op (defensive guard) ## Operator wiring The keep-core node's startup sequence should call: \`\`\`go signing.RegisterRoastRetryMetrics(clientinfoRegistry) \`\`\` alongside the existing registry observation calls. A follow-up to \`docs/development/frost-roast-retry-rollout.adoc\` will document this step. ## Verification | Command | Result | |---|---| | \`go build ./...\` | clean | | \`go test ./pkg/frost/...\` | pass (5 packages) | | \`go test -race ./pkg/frost/signing/...\` | pass | | \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...\` | pass | | \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent | | \`go vet ./pkg/frost/...\` | clean | | \`gofmt -l ./pkg/frost/signing/\` | silent | ## Test plan - [ ] CI green. - [ ] Reviewer confirms the process-wide cumulative counter shape (alternative: per-session gauges, more granular but harder to query at a glance). - [ ] Reviewer confirms the \`frost_roast_retry\` application prefix is acceptable (alternative: more specific prefix like \`frost_roast_retry_evidence\`).

Adds a top-of-file design-rationale block to roast_retry_orchestration.go that captures the load-bearing decision (from RFC-21 Phase 6 review) about which orchestration errors are fallback-eligible and which must hard-fail. The decision had been distributed across commit messages, the RFC text, and inline comments on individual sentinel definitions. The block centralises it next to the code that enforces it, so future maintainers can find the rationale without having to reconstruct it from spelunking history. Key statements captured: STATIC errors -> safe to fall back to the legacy retry path. Every honest signer observes the same node-local config at startup so fallback decisions are deterministic across the group. Sentinel: ErrNoRoastRetryCoordinatorRegistered, detected via errors.Is in signing_loop_roast_dispatcher.go. RUNTIME errors -> HARD FAIL. Per-attempt protocol state errors can be observed by some participants and not others within the same attempt; falling back to legacy under those conditions creates split-brain (some operators running new code, others running legacy on the same attempt). The orchestration layer returns these as bare errors that the dispatcher treats as terminal. The block also notes the historical redirect: the earlier design had BeginAttempt failures fall back, on the assumption that BeginAttempt was cheap idempotent setup. Review identified BeginAttempt mutates per-attempt state and can fail from races with concurrent receives, which the static-error fallback can't safely handle. Documenting the "why" prevents the regression from being re-introduced by a maintainer who reads only the code. Pure documentation -- no behaviour change, no test changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors the cross-repo HASH160-based wallet pubkey hash derivation fixture from the tbtc bridge repo (companion PR tlabs-xyz/tbtc#432, fixture at docs/test-vectors/wallet-pubkey-hash-derivation-vectors-v1.json). The byte-identical JSON is checked into both repos; each side's test reads its local copy and asserts its own derivation function reproduces the expected output. If frost.WalletPublicKeyHashCompat- ibilityAlias drifts from BitcoinTx.deriveWalletPubKeyHashFromXOnly on the bridge side, at least one repo's test fails. The drift this catches is the silent killer for the bridge-protocol identity contract: if keep-core derives a different 20-byte alias than the bridge for the same input, FROST wallets registered by the DKG coordinator land at addresses the bridge doesn't recognize, or vice versa. The failure mode is invisible until a wallet is actually created in production. Test cases TestFrostWalletPubKeyHashDerivationVectors Asserts frost.WalletPublicKeyHashCompatibilityAlias produces the expected 20-byte alias for every FROST vector (HASH160(0x02 || xOnlyOutputKey)). TestEcdsaCompressedPubKeyHash160Vectors Asserts HASH160 of the compressed pubkey matches the expected value for every ECDSA vector. The bridge performs this derivation implicitly during registerNewWallet (compress then hash160); this test pins the algorithm on the keep-core side using the same vectors the bridge pins on its side. TestDriftCheckMetadata Pins the drift_check.tbtc_path / drift_check.keep_core_path / drift_check.rule fields, so a future cross-repo CI sync check has stable references. TestFixtureFileShouldExistAtMirrorPath Documents the convention that the file lives at the path the fixture self-declares; a nudge for anyone moving the file. Companion PR tlabs-xyz/tbtc#432 lands the same JSON fixture and the bridge-side test against TestBitcoinTx.deriveWalletPubKeyHashFromXOnly. Both PRs ship together; landing only one provides no drift protection. Lineage Surfaced in the cross-PR review re-evaluation, originally flagged as "Cross-repo walletID derivation test fixture -- separate effort." Priority was raised when the Phase B-2 keep-core DKG coordination protocol became part of the active roadmap; that phase produces walletIDs the bridge must accept, and this fixture validates the contract before B-2's implementation goes through end-to-end testing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The derivationFixture type declaration had one extra space between each field name and its type, which gofmt's alignment rule rejects. The five fields share the same column-aligned formatting; trim the extra space per field so `gofmt -l .` returns clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…file via runtime.Caller Codex round-2 validation caught that the previous walletPubKeyHashDerivationVectorsPath constant equalled "testdata/..." (package-relative, used by os.ReadFile because go test runs with the package dir as cwd), but the fixture's drift_check.keep_core_path declares "pkg/frost/testdata/..." (repo-root-relative, the canonical location for cross-repo sync tooling). These two paths refer to the same file but are intentionally different representations. TestDriftCheckMetadata compared them directly via != and failed unconditionally; TestFixtureFileShouldExist- AtMirrorPath called filepath.Abs(fixture.DriftCheck.KeepCorePath) from the package cwd and stat'd pkg/frost/pkg/frost/testdata/... which doesn't exist. Fix - Split the constant into two named pairs that document each convention: walletPubKeyHashDerivationVectorsTestPath = "testdata/..." (package-relative, for os.ReadFile from the test's cwd) walletPubKeyHashDerivationVectorsRepoPath = "pkg/frost/testdata/..." (repo-root-relative, matches the fixture metadata) - TestDriftCheckMetadata: assert against the repo-relative constant, not the package-relative one. Now compares apples to apples. - TestFixtureFileShouldExistAtMirrorPath: resolve fixture.DriftCheck.KeepCorePath relative to the repo root, obtained by walking two directories up from this test file's location via runtime.Caller(0). This stat's the right path regardless of which cwd `go test` ran with. Local verification go test ./pkg/frost -run "PubKeyHash|DriftCheck|FixtureFile" -v ... --- PASS: TestFrostWalletPubKeyHashDerivationVectors (0.00s) --- PASS: TestEcdsaCompressedPubKeyHash160Vectors (0.00s) --- PASS: TestDriftCheckMetadata (0.00s) --- PASS: TestFixtureFileShouldExistAtMirrorPath (0.00s) PASS ok github.com/keep-network/keep-core/pkg/frost 0.225s gofmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#3993) ## Why The RFC-21 Phase 6 review decided which orchestration errors are fallback-eligible (static config errors → safe to fall back to legacy retry path) and which must hard-fail (runtime per-attempt errors → no fallback, since per-participant divergence creates split-brain group fracture). The rationale lived in commit messages, the RFC text, and inline comments on individual sentinels — distributed enough that a future maintainer reading just \`roast_retry_orchestration.go\` could miss the load-bearing constraint. This PR adds a top-of-file design-rationale block that centralises the decision in the place that enforces it. ## What changed - One file changed: \`pkg/frost/signing/roast_retry_orchestration.go\` - Pure documentation: no behavior change, no test changes, no API change - 49 lines added (one comment block) ## What it captures 1. **STATIC vs RUNTIME classification** — explicit definitions, with the sentinel (\`ErrNoRoastRetryCoordinatorRegistered\`) and detection mechanism (\`errors.Is\` in \`signing_loop_roast_dispatcher.go\`) named. 2. **Why static-error fallback is safe** — every honest signer observes the same node-local config at startup, so the fallback decision is deterministic across the group. 3. **Why runtime-error fallback is unsafe** — per-attempt protocol state errors can be observed by some participants and not others within the same attempt; fallback would put some operators on new code and others on legacy for the same attempt. 4. **Enforcement rule** — any error surfaced from this package that is intended to permit fallback MUST be the sentinel; wrapping ANY runtime error in the sentinel is a safety regression that PR reviewers should reject. 5. **Historical redirect** — the earlier design had \`BeginAttempt\` failures fall back, on the assumption that BeginAttempt was cheap idempotent setup. Review identified that BeginAttempt mutates per-attempt state and can fail from races with concurrent receives; the taxonomy was tightened so only true configuration errors are fallback-eligible. ## Lineage Surfaced in the cross-PR review re-evaluation following PR #3866 follow-up landings. Originally tracked as "Document static-vs-runtime classification canonically" — initially flagged as "available if you want," now elevated because the rationale was the most important architectural decision in the RFC-21 stack and is currently the easiest piece of design context to lose. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Why The bridge contracts (tlabs-xyz/tbtc) and the keep-core FROST protocol (this repo) both derive the 20-byte walletPubKeyHash from cryptographic inputs: - **Bridge** (\`BitcoinTx.deriveWalletPubKeyHashFromXOnly\`) computes \`bytes20(HASH160(0x02 || xOnlyKey))\` for FROST wallets, and HASH160 of the compressed ECDSA pubkey for legacy wallets. - **keep-core** (\`frost.WalletPublicKeyHashCompatibilityAlias\` in \`pkg/frost/types.go\`) computes the same HASH160(0x02 || outputKey). Drift between these two derivations silently breaks the bridge-protocol identity contract for any wallet whose canonical identity is established cross-repo. Today the two agree (both invoke HASH160 the same way) — but there's no enforcement, and a future refactor on either side could introduce silent drift that only surfaces when production traffic doesn't match. The Phase B-2 work in this repo (the FROST DKG coordination protocol) will produce walletIDs that the bridge must accept. Pre-staging the fixture now lets B-2's implementation be validated as it's built. ## What Mirrors the cross-repo fixture and adds a Go test that consumes it. The fixture lives byte-identically in both repos: - tbtc: \`docs/test-vectors/wallet-pubkey-hash-derivation-vectors-v1.json\` (companion PR [tlabs-xyz/tbtc#432](tlabs-xyz/tbtc#432)) - keep-core: \`pkg/frost/testdata/wallet-pubkey-hash-derivation-vectors-v1.json\` (this PR) Each repo's test reads its local copy and asserts its own derivation function reproduces the expected output. If either side drifts, at least one repo's test fails. ## Test cases - **\`TestFrostWalletPubKeyHashDerivationVectors\`** — asserts \`frost.WalletPublicKeyHashCompatibilityAlias\` produces the expected 20-byte alias for every FROST vector (HASH160(0x02 || xOnlyOutputKey)). - **\`TestEcdsaCompressedPubKeyHash160Vectors\`** — asserts HASH160 of the compressed pubkey matches expected for every ECDSA vector. The bridge performs this implicitly during \`registerNewWallet\` (compress then HASH160); this test pins the algorithm on the keep-core side. - **\`TestDriftCheckMetadata\`** — pins the \`drift_check.tbtc_path\` / \`drift_check.keep_core_path\` / \`drift_check.rule\` fields so a future cross-repo CI sync check has stable references. - **\`TestFixtureFileShouldExistAtMirrorPath\`** — documents the convention that the file lives at the path the fixture self-declares. ## Vectors **ECDSA legacy** (HASH160 of compressed pubkey): - secp256k1 generator point compressed (well-known Bitcoin vector — produces 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2) - Near-zero scalar pubkey - The tBTC ECDSA test fixture's pubkey (cross-validates against the bridge-side \`ecdsaWalletTestData.pubKeyHash160\` constant) **FROST P2TR** (HASH160(0x02 || xOnlyOutputKey)): - Representative key with non-zero high 12 bytes (matches the native-shape constraint on the FROST registration entry point from tlabs-xyz/tbtc#431) - All-ones x-only key (regression) - All-max x-only key (boundary) ## Why "fixture lives in both repos" instead of a shared submodule Considered but rejected: - **Git submodule** — adds tooling complexity for a 3 KB JSON file - **Single-source-of-truth repo** — requires bootstrapping a third repo or vendoring; both options add coordination friction - **Dual-checked file** — simplest; the \`drift_check\` metadata captures the rule, and a future small CI job can hash-compare the two files The dual-checked approach trades a slightly weaker guarantee (file drift can happen between repos if no one notices) for a much smaller operational footprint. The test-level drift (different derivation algorithms producing different outputs from the same input) is the load-bearing failure mode and is fully covered. ## Companion PR [tlabs-xyz/tbtc#432](tlabs-xyz/tbtc#432) lands the same JSON fixture and a TypeScript/Hardhat test against \`TestBitcoinTx.deriveWalletPubKeyHashFromXOnly\` and the off-chain HASH160 path. Both PRs are intended to land together — landing only one provides no drift protection. ## Lineage Surfaced in the cross-PR review re-evaluation, originally flagged as "Cross-repo walletID derivation test fixture — separate effort." Priority was raised when Phase B-2 (this repo's FROST DKG coordinator) became part of the active roadmap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary - add FROST WalletRegistry and FrostDkgValidator bindings plus config and chain attachment - implement v4 FROST DKG result digest assembly with full vs active member types and fixture-backed parity tests - add the native FROST DKG engine boundary, P2P round protocol, result signing, coordinator lifecycle, challenge monitoring, and wallet ID handling for x-only output keys ## Notes - Stacked on #3866 / `feat/frost-schnorr-migration-scaffold`. - Runtime DKG still requires the concrete native DKG engine registration from the frost-uniffi-sdk UDL/Rust export work. - The digest fixture now records the tBTC TypeScript generator source and regeneration command. A paired tBTC PR should still commit the mirror fixture at `docs/test-vectors/frost-dkg-result-digest-v1.json` and add the TS-side emitter/test; until then, the keep-core test verifies the pinned bytes and metadata but does not compare against a checked-in tBTC mirror file. ## Validation - `go test ./pkg/frost/registry ./pkg/chain/ethereum ./pkg/chain/ethereum/frost/gen/...` - `go test ./pkg/tbtc -run "TestFrostDKGSignatureThreshold|TestBoundedFrostDKGRecoveryStartBlock|TestFrostDKGRecoveryLookBackBlocks" -count=1` - `go test -tags "frost_native frost_tbtc_signer" ./pkg/tbtc -run "TestLowestLocalActiveMemberIndex|TestFrostMisbehavedMemberIndices|TestFrostDKGSignatureThreshold|TestBoundedFrostDKGRecoveryStartBlock|TestFrostDKGRecoveryLookBackBlocks" -count=1` - CI `client-build-test-publish` passes on the prior pushed commit; rerunning for the latest follow-up commit after push. ## Local Note - Full local `go test ./pkg/tbtc` currently fails in standalone `TestWatchCoordinationWindows`; this reproduces when run by itself and appears unrelated to the FROST DKG coordinator changes.

coderabbitai · 2026-05-27T18:05:53Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8f85e4f1-f18d-40a6-bb16-c0a862ab49dc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/frost-schnorr-migration-scaffold

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

maclane added 3 commits February 19, 2026 14:47

Add FROST migration scaffold package and RFC

a1b3e12

Set Schnorr/FROST RFC author to Threshold Labs

2d0a5c8

Wire tbtc runtime signing flow to frost signature types

fc58fed

mswilkison changed the title ~~Draft: Add Schnorr/FROST migration scaffold package and RFC~~ Draft: Add Schnorr/FROST scaffold and tBTC runtime signing adapter slice Feb 20, 2026

maclane added 26 commits February 20, 2026 09:30

Add FROST retry and coordinator attempt metadata path

2702c7a

Fix triplet retry eligibility to use third operator seat count

3fc7c9f

Add pluggable FROST signing backend execution seam

b57775a

Wire tbtc config to selectable FROST signing backend

952946f

Add native FROST signing backend scaffold

0df807c

Expose tbtc FROST signing backend CLI flag

7817a13

Add build-tagged native FROST adapter bootstrap

fc4c750

Test FROST backend selection in node startup

f85b3d5

Execute native-tag FROST adapter via legacy bridge

01ea9f4

Harden backend-state test guidance after interim review

9734656

Add native bridge scaffold with fallback routing

f57aa09

Split native and ffi backend fallback semantics

7efb2f9

Fail fast for strict ffi mode availability

e42bf43

Add dynamic native bridge registration for ffi

606e73d

Register build-tagged bridge for strict ffi path

d007635

frost/signing: add ffi executor registration path

8e23f5e

frost/signing: restore mode on failed backend selection

ed642b2

frost/signing: decouple request signer material from tecdsa share

74e894c

frost/tbtc: thread generic signer material through execute request

656f62f

frost/signing: add native signer material and ffi adapter contract

83bf3af

frost/signing: include transport context in ffi adapter request

3083e15

tbtc: persist native signer material envelope in signer state

31026eb

frost/native: fallback when ffi signer material is unavailable

a1525a0

tbtc: add signer-material resolver hook for dkg signer creation

eeaad8f

frost/native: add provider-based build registration hooks

4069ffe

frost/native: wire default transitional provider integrations

c3e8b02

mswilkison added 2 commits May 22, 2026 22:40

mswilkison mentioned this pull request May 23, 2026

feat(frost/roast): close M4 -- reject + conflict evidence categories #3988

Merged

3 tasks

mswilkison and others added 6 commits May 22, 2026 23:02

mswilkison mentioned this pull request May 23, 2026

docs(frost/signing): canonicalize the static-vs-runtime error taxonomy #3993

Merged

mswilkison and others added 6 commits May 23, 2026 12:08

feat(frost): implement DKG coordinator

71b524d

mswilkison mentioned this pull request May 25, 2026

Implement FROST DKG coordinator #4003

Merged

mswilkison added 7 commits May 25, 2026 07:13

fix(frost): include generated bindings in docker context

4bb1e5d

fix(frost): address DKG coordinator review feedback

71ce18a

fix(frost): harden DKG recovery and submitter election

916f91e

fix(frost): harden DKG event recovery and challenges

0e94716

docs(frost): note DKG digest wire-format coupling

dce62d7

Merge branch 'main' into feat/frost-schnorr-migration-scaffold

0a7c4be

mswilkison mentioned this pull request May 26, 2026

feat(tbtc/signer): mirror FROST/ROAST Rust signer from tBTC monorepo #4005

Open

fix(frost): share tbtc signer error helpers

2d9dc78

mswilkison added 3 commits May 28, 2026 11:41

fix(frost): harden native dkg validation

b778d25

fix(frost): add signer format telemetry

0555658

fix(frost): attach bound attempt context hashes

00ab1b7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FROST/ROAST readiness branch#3866

FROST/ROAST readiness branch#3866
mswilkison wants to merge 195 commits into
mainfrom
feat/frost-schnorr-migration-scaffold

mswilkison commented Feb 19, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 27, 2026 •

edited

Loading

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mswilkison commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current State (as of 2026-05-17)

Canonical Status Sources

Latest Refresh

Remaining Cross-Repo Closure Items

Notes

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mswilkison commented Feb 19, 2026 •

edited

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading