Skip to content

pos: shrink NodeConfig and secure-storage to what's actually used#3458

Merged
peilun-conflux merged 6 commits intoConflux-Chain:masterfrom
peilun-conflux:cleanup/pos-shrink-nodeconfig-and-storage
Apr 27, 2026
Merged

pos: shrink NodeConfig and secure-storage to what's actually used#3458
peilun-conflux merged 6 commits intoConflux-Chain:masterfrom
peilun-conflux:cleanup/pos-shrink-nodeconfig-and-storage

Conversation

@peilun-conflux
Copy link
Copy Markdown
Contributor

@peilun-conflux peilun-conflux commented Apr 24, 2026

Some NodeConfig fields were silently overwritten

ConsensusConfig.chain_id and four SafetyRulesConfig fields (test, export_consensus_key, vrf_private_key, vrf_proposal_threshold) were read from YAML and immediately replaced with values from the main Conflux TOML / network at startup. Operators could set them and get them silently ignored — a footgun. Those values now thread through as runtime arguments via two new structs (PosNodeKeys for per-node identity/secrets, PosChainParams for chain-wide consensus params), matching the Diem separation of ValidatorSigner vs ValidatorVerifier / EpochState.

Secure-storage collapses to one backend

SecureBackend was a multi-variant enum inherited from Diem. Vault was deleted in 603c1d1d3, NamespacedStorage is unused, and InMemoryStorage existed only so tests could skip file I/O — but PersistentSafetyStorage::cached_safety_data already owns the one hot read (SAFETY_DATA per vote), and the test speedup is immaterial. OnDiskStorage becomes the sole backend; Storage enum, enum_dispatch dep, the _ => Err("unsupported") branches in replace_with_suffix / save_to_suffix, and the NamespacedStorage module all go away.

⚠️ Breaking config change

NodeConfig uses deny_unknown_fields, so any pos_config.yaml setting one of the removed fields will now fail to load:

  • top-level: metrics, test
  • consensus.chain_id
  • safety_rules.{logger, test, export_consensus_key, vrf_private_key, vrf_proposal_threshold}

Audited run/pos_config/pos_config.yaml on both the mainnet and testnet branches — neither sets any of these fields.


This change is Reviewable

peilun-conflux and others added 5 commits April 24, 2026 22:11
…ogger, LeaderReputationConfig)

Four never-read pieces of the Diem-era NodeConfig tree:

- `MetricsConfig` + `NodeConfig.metrics`: unused since the diem_metrics
  observability layer was removed in Conflux-Chain#3451; the only remaining reference
  was a `set_data_dir` propagation that wrote to a field no code reads.

- Outer `TestConfig` + `NodeConfig.test`: the struct has a single
  `#[serde(skip)]` field and is never accessed by the conflux binary.
  (Not to be confused with `SafetyRulesTestConfig`, which is still used.)

- `SafetyRulesConfig.logger`: safety-rules never reads it; the top-level
  `NodeConfig.logger` is the only logger actually configured.

- `LeaderReputationConfig`: struct defined in consensus_config.rs but
  never referenced from any code path.

Note: `NodeConfig` uses `deny_unknown_fields`, so any operator YAML that
happens to contain `metrics:` or `test:` at the top level will now fail
to parse. The mainnet PoS template (hard_fork_tool.py) and both Python
test-framework YAML writers (tests/test_framework/util.py and
integration_tests/test_framework/util/__init__.py) do not emit these
keys, so no known config is affected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… SecureBackend

Two changes that reduce the "operator thinks they can configure this" surface
of the PoS YAML:

1. Lift always-overwritten NodeConfig fields to constructor args.

   Five fields were populated from the main TOML / network at startup and
   immediately written over whatever the YAML said, making them
   footguns — operators could set them and silently have them ignored:

     - SafetyRulesConfig.test
     - SafetyRulesConfig.export_consensus_key  (always `true` in production)
     - SafetyRulesConfig.vrf_private_key
     - SafetyRulesConfig.vrf_proposal_threshold
     - ConsensusConfig.chain_id

   Replaced with a new PosRuntimeKeys struct threaded through
   start_pos_consensus -> setup_pos_environment -> start_consensus ->
   EpochManager::new. `create_safety_rules` gains explicit author /
   consensus_private_key / vrf_private_key / export_consensus_key args.

   Drops the SafetyRulesTestConfig struct (its last user was the lifted
   `test` field; `execution_key` was never read).

2. Collapse SecureBackend + drop NamespacedStorage.

   SecureBackend was a two-variant enum but the InMemoryStorage variant
   was never constructed from config (tests use InMemoryStorage::new()
   directly). Replace SafetyRulesConfig.backend with OnDiskStorageConfig
   and delete the enum.

   NamespacedStorage (the Storage-wrapper implementing on-disk namespacing
   for multi-tenant secure_storage.json) was only reachable via
   OnDiskStorageConfig.namespace, which defaulted to None and was never set
   by any template. Delete the module + the Storage::NamespacedStorage
   variant + the .namespace field.

   Storage still has an InMemoryStorage variant because test code
   instantiates Storage::from(InMemoryStorage::new()); that path is kept.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PosRuntimeKeys bundled per-node identity/secrets (author + BLS +
VRF private keys) with chain-wide consensus parameters (chain_id +
vrf_proposal_threshold). These have distinct lifecycles and different
security properties, and the codebase already separates them
elsewhere — ValidatorSigner / ValidatorVerifier / EpochState are the
existing Diem idiom for this split.

Split into two types threaded through
`start_pos_consensus → setup_pos_environment → start_consensus →
EpochManager::new`:

  PosNodeKeys      — author, consensus_private_key, vrf_private_key
  PosChainParams   — chain_id, vrf_proposal_threshold

EpochManager's internal field layout is left flat; the typed boundary
at function signatures is what carries the semantic distinction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… backend

Before: `Storage` was an enum over `InMemoryStorage` and `OnDiskStorage`,
dispatched via `#[enum_dispatch]`. `InMemoryStorage` existed purely so
tests and benches could skip filesystem I/O. But the perf delta is
immaterial — unit tests do a handful of ops (ms), and the
`benches/safety_rules.rs` in-memory baseline isn't exercised in CI
(`cargo bench --no-run` only). Keeping a second backend to serve use
cases that don't care about the saving was scaffolding.

Delete the enum, the in-memory backend, the `NamespacedStorage` wrapper
(already unused after the SecureBackend collapse earlier in this PR),
the `#[enum_dispatch]` attributes on the `KVStorage`/`CryptoStorage`
traits, and the `enum_dispatch` crate dep. `OnDiskStorage` keeps its
pre-refactor shape: a JSON file on disk that is the source of truth,
read on every `get` and rewritten atomically on every `set`. The hot
per-vote read (`SAFETY_DATA`) is cached one layer up by
`PersistentSafetyStorage::cached_safety_data`, which is unchanged.

Tests and benches switch to `OnDiskStorage::new(TempPath / NamedTempFile)`.
The `benches/safety_rules.rs::InMemory` baseline is removed — with only
one backend, a separate in-memory baseline has nothing to compare
against. `PersistentSafetyStorage::{replace_with_suffix, save_to_suffix}`
lose their match-on-`Storage::OnDiskStorage` boilerplate and the
runtime `"unsupported secure storage type"` error branch that went
with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0c4d749 lifted the consensus / VRF private keys out of NodeConfig and
stored them bare on EpochManager. But safety_rules, ProposalGenerator,
VrfProposer, RoundManager, and the force_* handlers all take owned keys,
so the stored form still needs to be cloneable. BLSPrivateKey's Clone
impl is behind the diem-crypto `cloneable-private-keys` feature, only
activated transitively via pos-types's fuzzing dev-dependency. CI's
`cargo clippy --release --all` (no --all-targets) doesn't activate
dev-deps, so the feature is off and cfxcore fails with five
"no method named `clone` found for struct `BLSPrivateKey`" errors in
epoch_manager.rs.

Wrap the stored keys in ConfigKey<K> like they were before. ConfigKey's
Clone and private_key() round-trip via BCS serde on the inner key, so
they don't require K: Clone — the same trick NodeConfig used. Owned-key
consumers switch from `.clone()` to `.private_key()`; the RoundManager
pass-through becomes a bare `.clone()` on the ConfigKey.

Same per-call cost as before the refactor (identical BCS round-trip),
and the avoided alternative — turning on cloneable-private-keys in
release builds — would silently hand a freely-cloneable private key to
anyone in cfxcore, which is exactly what ConfigKey was designed to
prevent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@peilun-conflux peilun-conflux force-pushed the cleanup/pos-shrink-nodeconfig-and-storage branch from 122779a to 27c3ee6 Compare April 24, 2026 18:24
ef96de3 removed enum_dispatch from safety-rules and added diem-temppath;
the main workspace Cargo.lock was updated in that commit but the tool
sub-workspace Cargo.lock files weren't, causing CI's --locked builds in
tools/consensus_bench and tools/evm-spec-tester to fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@Pana Pana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pana reviewed 31 files and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on ChenxingLi).

@peilun-conflux peilun-conflux merged commit 92db0e2 into Conflux-Chain:master Apr 27, 2026
11 checks passed
@peilun-conflux peilun-conflux deleted the cleanup/pos-shrink-nodeconfig-and-storage branch April 27, 2026 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants