pos consensus: remove dormant Recovery state machine#3457
Open
peilun-conflux wants to merge 1 commit intoConflux-Chain:masterfrom
Open
pos consensus: remove dormant Recovery state machine#3457peilun-conflux wants to merge 1 commit intoConflux-Chain:masterfrom
peilun-conflux wants to merge 1 commit intoConflux-Chain:masterfrom
Conversation
`RoundProcessor::Recovery` and its machinery exist to reconstruct consensus state when `RecoveryData::new` fails to locate a root block matching the latest committed ledger info. That scenario has no remaining trigger: - Fresh-genesis startup already goes to Normal: `maybe_bootstrap` writes a genesis `LedgerInfo` with `next_epoch_state = Some(...)`, so `find_root` takes its `ends_epoch()` branch and synthesises a virtual genesis block + QC from the ledger info alone. - Normal running-node restarts also go to Normal: the consensus write path does `save_tree(block+qc) -> execute -> commit(ledger) -> prune`, with `prune_tree` preserving the root. On any clean crash/restart the root block is in ConsensusDB, so `find_root` succeeds directly. - The residual "state-sync ran ahead of consensus" case from upstream Diem was neutralised when the state-sync coordinator was deleted in 397cf34 — `state_computer.sync_to` became a no-op, so `fast_forward_sync` can no longer catch a ledger up anyway. What the Recovery path would actually handle today is on-disk corruption where `ConsensusDB` is missing the block that `PosLedgerDB`'s latest ledger info points at. That combination isn't reachable through normal operation, and the recovery attempt (now `sync_to`-neutered) couldn't repair the ledger even if we took it. A panic on that condition is strictly more diagnostic than silently stalling in Recovery. Collapses: - `RoundProcessor` enum -> `Option<RoundManager>`. The dispatch match in `process_event`, `filter_unverified_event`, `processor_mut`, and every `TestCommand` handler goes away. - `LivenessStorageData` enum and `LedgerRecoveryData` struct merge into `RecoveryData`; `PersistentLivenessStorage::start` now returns `RecoveryData` directly (panics on the corruption case above). `find_root`'s virtual-genesis synthesis moves from `LedgerRecoveryData` into a free function taking `&LedgerInfo`. - `RecoveryManager`, `start_recovery_manager`, `expect_recovery_data`, and `BlockStore::fast_forward_sync` are all deleted. `StateComputer::sync_to` and `StateSyncError` likewise — they had no remaining callers once `fast_forward_sync` was gone. - `MockStorage::start` / `EmptyStorage::start` now return `RecoveryData` directly and panic on inconsistent setups (previously returned a `LedgerRecoveryData` variant that tests then had to unwrap). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2fe7163 to
3e16f64
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
RoundProcessor::Recovery+RecoveryManager+LedgerRecoveryData+BlockStore::fast_forward_syncexist to reconstruct consensus state whenRecoveryData::newcan't locate a root block matching the latest committed ledger info. That scenario has no remaining trigger:maybe_bootstrapwrites a genesisLedgerInfowithnext_epoch_state = Some(...), sofind_roottakes itsends_epoch()branch and synthesises a virtual genesis block + QC from the ledger info alone.save_tree(block+qc) → execute → commit(ledger) → prune, andprune_treepreserves the root. On any clean crash/restart the root is in ConsensusDB andfind_rootresolves it directly.397cf34c).state_computer.sync_tobecame a no-op, sofast_forward_synccan only write blocks to ConsensusDB — strictly weaker than Normal's sync path, which executes and commits.What Recovery would technically still fire on is on-disk corruption where ConsensusDB is missing the block PosLedgerDB's latest ledger info points at. That combination isn't reachable through normal operation, and the
sync_to-neutered recovery attempt couldn't repair the ledger even if we took it. A panic on that condition is strictly more diagnostic than silently stalling in Recovery.Follow-on removals fall out once Recovery is gone:
StateComputer::sync_to+StateSyncError+LogEvent::StateSynchad no remaining callers.This change is