Skip to content

fix(devnet5): don't register block-attestation data at import (broke Type-2 merge)#395

Draft
MegaRedHand wants to merge 4 commits into
devnet5from
fix/store-placeholders-instead-of-empty-proofs
Draft

fix(devnet5): don't register block-attestation data at import (broke Type-2 merge)#395
MegaRedHand wants to merge 4 commits into
devnet5from
fix/store-placeholders-instead-of-empty-proofs

Conversation

@MegaRedHand
Copy link
Copy Markdown
Collaborator

@MegaRedHand MegaRedHand commented May 27, 2026

Summary

Fixes the devnet error Failed to merge Type-1s into Type-2 ... child proof deserialization failed at index N.

Root cause: block import seeded the known payload pool with TypeOneMultiSignature::empty(bits) placeholders (participant bits, empty proof bytes) so block-borne votes carried fork-choice weight. Block building read the same pool and fed those empty bytes into the Type-2 merge, where lean-multisig decompression failed.

Fix: stop registering block-attestation data in the pool at import altogether.

A first pass mirrored leanSpec's on_block — registering each attestation's data key with an empty proof set instead of a proof-less placeholder. That removed the crash, but the empty entries turned out to be inert in our weight model and introduced an unbounded-growth path (see below), so the final version drops the registration entirely.

Why removing it is safe

ethlambda derives fork-choice weight from proof participant bits, not from data-key presence:

  • extract_latest_attestations iterates entry.proofs; an empty proof set yields zero voters.
  • the block builder scores candidates by voters-from-proofs, so an empty entry is never selected and never reaches merge_type_1s_into_type_2.

So a registered-but-empty entry contributed zero weight and zero merge input. Its only observable effect was reserving an insertion-order slot for same-slot equivocation tie-breaking, which reaggregation re-establishes in block-attestation order anyway.

Block-borne vote weight is still recovered: reaggregation SNARK-splits the block's merged Type-2 into per-attestation Type-1s (split_type_2_by_message) into the new pool, and gossip delivers them; both migrate to the known pool at the next acceptance tick. This matches the spec's documented one-slot deferral.

Bonus: closes an unbounded-growth path

The empty-proof entries bypassed the payload buffer's only capacity limit. PayloadBuffer evicts when total_proofs > capacity, but the bare-key insertion never incremented total_proofs, so a non-finalizing chain could grow known_payloads past its bound. Removing the registration eliminates this by construction (flagged by all four PR reviewers).

Changes

  • crates/blockchain/src/store.rs: on_block_core no longer registers block-attestation data keys (keeps the per-validator attestation-valid metric).
  • crates/storage/src/store.rs: delete the now-dead PayloadBuffer::register_data and Store::register_known_aggregated_data_batch.
  • crates/blockchain/tests/forkchoice_spectests.rs: the harness deconstructs each imported block into per-attestation Type-1s (structurally, from aggregation_bits) into the known pool and recomputes the head — reproducing the proposer-view store the fixtures encode, the role leanSpec's harness fills via its proposer-build simulation. These entries go through push, so they are properly accounted and bounded.

Notes / follow-ups

  • The spec routes block-reaggregated votes to the new pool (deferred), which a leanSpec reviewer flagged as leaving receiving nodes with zero block-vote weight (Aggregated block proof - devnet5 leanEthereum/leanSpec#717, discussion_r3259006576). The new-vs-known question remains open upstream.
  • Backfilling nodes don't run reaggregation (it is synced-gated in lib.rs), so block-borne votes during sync rely on gossip re-delivery. Accepted as an OK tradeoff for now; tracked as a follow-up.

Test plan

  • cargo test --workspace --release — full suite passes (forkchoice 84, stf 51, signature 13, storage 38, ssz 118, …; 0 failures)
  • cargo fmt --all -- --check clean
  • cargo clippy --workspace --release -- -D warnings clean

…laceholders

Block import previously inserted T1::empty(bits) placeholders (participant
bits, empty proof bytes) into the known payload pool. Block building read the
same pool and fed those empty bytes into the Type-2 merge, where lean-multisig
decompression failed ("child proof deserialization failed at index N").

Match the spec's on_block: register each block attestation's data key with an
empty proof set instead of a proof-less placeholder. No empty-byte proof ever
enters the pool, so the merge can no longer be handed one. Block-borne votes
now contribute zero fork-choice weight at import; the weight returns once
reaggregation recovers real Type-1 proofs into the pool (deferred by up to one
slot), matching the spec.

Known follow-up: ethlambda's import-only forkchoice harness expects block votes
in "known" immediately after import (the spec's harness simulates the proposer
build to populate them). 13 forkchoice spec tests now fail on this expectation;
the harness/reaggregation wiring will be addressed separately.
Option A made on_block spec-faithful: it registers each block attestation's
data key with an empty proof set, so block-borne votes contribute no
fork-choice weight at import. A real node recovers that weight by
SNARK-splitting the block's merged Type-2 proof into per-attestation Type-1s
(reaggregation); leanSpec's forkchoice harness gets the same effect by
simulating the proposer build straight into the known pool.

ethlambda's harness only imports blocks (on_block_without_verification) with
blank proofs, so it did neither and 13 head/reorg/safe-target fixtures lost
their vote source. Reconstruct structurally after import: for each block
attestation, insert a Type-1 carrying its aggregation_bits into the known
pool (fork choice reads participants, not proof bytes; the harness never
feeds these to a Type-2 merge). Recompute the head afterward, since on_block
ran its head update before these votes existed.

Expose update_head for the harness recompute. All 84 forkchoice tests pass.
@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

Critical Consensus Considerations

  1. Deferred Fork-Choice Weight (Safety/Liveness Implications)
    In crates/blockchain/src/store.rs (lines 498–528), block attestations are now registered with empty proofs via register_known_aggregated_data_batch, contributing zero weight to the fork-choice head computed during on_block. The comment explicitly acknowledges this defers weight by "up to one slot" until SNARK-splitting recovers Type-1 proofs.

    • Verify: Confirm this aligns precisely with leanSpec's on_block logic. Standard LMD GHOST expects block attestations to count immediately; deferring weight could theoretically lead to a "head oscillation" scenario if a competing block imports within the same slot before reaggregation completes. If the spec mandates this behavior, ensure the test harness in forkchoice_spectests.rs (lines 102–130) perfectly mirrors the proposer-view store assumptions to avoid false positives in spec tests.
  2. Test Harness Fidelity
    crates/blockchain/tests/forkchoice_spectests.rs (lines 115–130) manually reconstructs TypeOneMultiSignature::empty entries to simulate post-import reaggregation.

    • Risk: If production SNARK-splitting produces proofs with additional metadata (or non-empty signature bytes) that affect fork-choice bookkeeping (e.g., participant bitfield interpretation), the test may pass while production fails. Ensure TypeOneMultiSignature::empty is structurally identical to the output of split_type_2_by_message.

Security Observations

  1. Mutex Poisoning
    crates/storage/src/store.rs lines 1228 and 207 use lock().unwrap() on known_payloads. In a consensus client, a panic in one thread (e.g., during SSZ deserialization) could poison this lock, rendering the store unusable. Consider using parking_lot (which doesn't poison) or explicit MutexGuard error handling to prevent denial-of-service via panic propagation.

  2. Buffer Capacity Check
    PayloadBuffer::register_data (lines 215–225) pushes to self.order without checking against MAX_ATTESTATIONS_DATA or any buffer limit. While on_block caps attestations at 8, repeated edge cases (e.g., duplicate data from different blocks) could grow order unbounded if pruning logic elsewhere fails. Add a defensive capacity check or assert that self.order.len() < limit before push.

Code Quality and Correctness

  1. Idiomatic Rust Patterns
    • Good: Use of Vec::with_capacity in on_block_core (line 512) and into_parts() (line 216) to avoid unnecessary clones.
    • Naming: The variable known_data in on_block_core (line 512) contains HashedAttestationData, not full payloads. Consider renaming to data_keys or attestation_data to

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

  1. High: proofless known_payloads entries bypass the buffer’s only capacity limit. In crates/storage/src/store.rs, register_data() inserts a new entry without incrementing total_proofs and without any eviction path; crates/blockchain/src/store.rs now calls that for every block attestation. Since pruning only happens once an attestation target is finalized, a peer can keep feeding fresh unfinalized attestation-data roots and grow known_payloads unbounded in RAM. For consensus code, that is a straightforward memory-DoS surface. Count data-only entries toward capacity or add a separate entry-count cap/eviction path.

  2. High: syncing nodes can now permanently miss block-borne fork-choice votes. crates/blockchain/src/store.rs switched block import from inserting participant-bearing pseudo-Type-1s to registering data only, then immediately recomputes head at line 526. The only recovery path is post-import reaggregation, but that is gated on synced in crates/blockchain/src/lib.rs, where synced means head_slot >= current_slot at line 496. So any block imported while backfilling never gets its attestation participants into new/known later, and its votes are absent from LMD-GHOST unless they reappear via gossip. I’d keep gossip publication gated on sync status, but still run the store update for all imported blocks.

I didn’t run Rust tests here because the environment blocks rustup from creating temp files (/home/runner/.rustup/tmp is read-only). Outside those two points, the test-only adjustments look coherent.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have enough context to write a thorough review.


PR Review: fix: drop proof-less Type-1 placeholders that broke block-proof merge

Overview

This PR fixes a crash where TypeOneMultiSignature::empty(bits) entries (participant bits, empty proof bytes) inserted at block import were fed into the Type-2 merge, causing deserialization failures. The fix correctly mirrors the spec: register only the attestation data key with an empty proof set, so the merge never sees zero-byte proof material. The test harness is updated separately to reconstruct structural Type-1s for fork-choice weight, matching the "proposer-view" store that fixtures encode.

The root-cause diagnosis is sound and the fix is logically correct.


Issues

1. register_data bypasses capacity enforcement (crates/storage/src/store.rs:215)

PayloadBuffer evicts entries only when total_proofs > capacity. register_data inserts into data and order without touching total_proofs, so the eviction loop in push() never fires for bare-key entries. With AGGREGATED_PAYLOAD_CAP = 512 and up to MAX_ATTESTATIONS_DATA = 8 entries per block, ~64 blocks without finalization would exceed the intended bound silently.

Bare keys do get pruned by prune() on finalization, but in a long non-finalizing chain the buffer grows unboundedly. The fix could be to count bare keys against capacity (e.g. a separate data_count field, or using total_proofs with a virtual increment of 1), or document explicitly that capacity applies only to proof slots, not data keys.

2. No unit tests for register_data / register_known_aggregated_data_batch

The new PayloadBuffer::register_data function has several non-obvious invariants (no-op when key present, no eviction trigger, order slot reserved, subsequent push for same key works correctly) and none are covered by tests. The existing test suite at line 1756+ is thorough for push and push_batch — comparable coverage for the new method would prevent regressions, especially around the capacity bypass noted above.

3. Double to_block() call in the test harness (crates/blockchain/tests/forkchoice_spectests.rs:89,117)

When block_data.block_root_label is set, to_block() is called at line 89 (for label registration) and again at line 117 (for the attestation deconstruction). If the deserialization is non-trivial, the Block from line 89 could be reused inside the if import_ok block. Minor, but easy to tighten.

4. update_head made pub instead of pub(crate) (crates/blockchain/src/store.rs:40)

Integration tests in tests/ compile as a separate crate, so pub(crate) cannot reach them — pub is technically required here. That said, making an internal bookkeeping function part of the public API of ethlambda-blockchain is worth noting. A #[doc(hidden)] attribute would at least signal it's not intended for external consumers:

#[doc(hidden)]
pub fn update_head(store: &mut Store, log_tree: bool) { ... }

5. Removed MAX_ATTESTATIONS_DATA cross-reference comment (crates/common/types/src/block.rs:126)

The deleted comment (See: leanSpec PR #717, which lowered the cap from 16 to 8...) provided useful historical context for future readers debugging cap mismatches with other clients. Worth keeping or moving into the git commit message, not silently dropping.


Minor observations

  • The test harness correctly uses insert_known_aggregated_payloads_batch with TypeOneMultiSignature::empty(bits) entries (not register_known_aggregated_data_batch). The empty-proof TypeOneMultiSignature carries the participant bitfield that extract_latest_attestations needs to assign fork-choice weight. This intentional divergence from production is well-commented.
  • register_data's no-op guard (if self.data.contains_key) correctly prevents a later bare key from overwriting a real-proof entry — the ordering is safe.
  • The drain() function silently drops empty-proof entries (iterates entry.proofs which is empty). This is benign since register_data only targets known_payloads, but it means an empty-key entry in new_payloads (if ever created by future callers) would vanish on promote_new_aggregated_payloads. Worth a comment in drain() if the invariant is intentional.

Summary

The core fix is correct and the spec alignment is well-justified. The most actionable items before merge are:

  • Item 1 (capacity bypass) is a latent correctness issue that could surface in stress scenarios or non-finalizing chains.
  • Item 2 (missing tests) is a coverage gap for new code with subtle semantics.
  • Items 3–5 are polish.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@MegaRedHand MegaRedHand changed the title fix: drop proof-less Type-1 placeholders that broke block-proof merge fix(devnet5): drop proof-less Type-1 placeholders that broke block-proof merge May 27, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

This PR fixes a devnet crash where block import seeded the known payload pool with proof-less TypeOneMultiSignature::empty placeholders, which then broke the Type-2 merge when lean-multisig tried to decompress the empty proof bytes. The fix mirrors the spec's on_block by registering each block attestation's data key with an empty proof set instead, so no empty-byte proof can ever reach the merge path.

  • crates/blockchain/src/store.rs / crates/storage/src/store.rs: on_block_core now calls register_known_aggregated_data_batch / PayloadBuffer::register_data to record bare data keys (no proofs), matching the spec; block-borne vote weight is intentionally deferred until reaggregation delivers real Type-1 proofs; update_head is made pub for the test harness.
  • crates/blockchain/tests/forkchoice_spectests.rs: After each successful block import, the harness reconstructs per-attestation TypeOneMultiSignature::empty entries from aggregation_bits and inserts them directly into the known pool, then recomputes the head — reproducing the proposer-view store the fixtures encode and restoring 13 previously broken head/reorg/safe-target fixtures.

Confidence Score: 4/5

The production fix is correct and spec-faithful; the test harness reconstruction is a reasonable workaround for blank-proof fixtures. One edge case in the new register_data path deserves a follow-up.

The fix correctly eliminates empty-byte proofs from the merge path and all 84 fork-choice spectests pass. The one issue worth a follow-up is that bare-key entries registered via register_data don't increment total_proofs, so they're invisible to the buffer's FIFO eviction trigger. In a degraded scenario with no reaggregation, the known_payloads data map can grow past the intended 512-entry bound. This doesn't affect correctness of fork choice, but it's a departure from the previous eviction behaviour that's worth tracking.

crates/storage/src/store.rs — specifically the interaction between register_data and the eviction loop in push.

Important Files Changed

Filename Overview
crates/storage/src/store.rs Adds register_data (no-proof bare-key insertion) and register_known_aggregated_data_batch; bare entries skip eviction accounting since total_proofs is not incremented, potentially allowing data.len() to exceed AGGREGATED_PAYLOAD_CAP between prune calls.
crates/blockchain/src/store.rs Replaces empty-proof TypeOneMultiSignature placeholder insertion with bare data-key registration in on_block_core, matching the spec's on_block; update_head promoted to pub for test-harness use.
crates/blockchain/tests/forkchoice_spectests.rs Reconstructs per-attestation Type-1s from imported blocks' aggregation_bits and recomputes head after each successful block import, reproducing the proposer-view store that the fixtures encode.
crates/common/types/src/block.rs Removes an outdated doc comment referencing leanSpec PR #717 on MAX_ATTESTATIONS_DATA; no logic change.

Sequence Diagram

sequenceDiagram
    participant Importer
    participant on_block_core
    participant known_payloads as known_payloads (PayloadBuffer)
    participant Reaggregator
    participant ForkChoice

    Note over Importer,ForkChoice: Before fix (broken)
    Importer->>on_block_core: on_block_without_verification(signed_block)
    on_block_core->>known_payloads: insert_known_aggregated_payloads_batch([empty-proof TypeOneMultiSig])
    on_block_core->>ForkChoice: update_head()
    Reaggregator->>ForkChoice: build_block() → merge Type-2
    ForkChoice-->>Reaggregator: ❌ child proof deserialization failed (empty bytes)

    Note over Importer,ForkChoice: After fix (spec-faithful)
    Importer->>on_block_core: on_block_without_verification(signed_block)
    on_block_core->>known_payloads: register_known_aggregated_data_batch([bare data keys, no proofs])
    on_block_core->>ForkChoice: update_head() — zero block-vote weight until reaggregation
    Reaggregator->>known_payloads: push(real Type-1 proofs via gossip / SNARK-split)
    known_payloads-->>ForkChoice: proofs available for fork-choice weight
    Reaggregator->>ForkChoice: build_block() → merge Type-2
    ForkChoice-->>Reaggregator: ✅ valid merged proof
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/storage/src/store.rs:215-228
**Empty entries bypass eviction, data map can grow beyond capacity**

`register_data` adds entries to `data` and `order` without incrementing `total_proofs`. The eviction loop in `push` is gated on `total_proofs > self.capacity` (512), so bare-key entries are invisible to the eviction trigger. If a node imports many blocks but reaggregation is slow or absent, the `data` map can grow well beyond the stated 512-entry bound — up to 8 new entries per slot (per `MAX_ATTESTATIONS_DATA`). The previous code (`TypeOneMultiSignature::empty(bits)`) incremented `total_proofs` by 1 per entry, so it did trigger eviction; this PR removes that side-effect. `prune` bounds growth for finalized entries, but active entries between finalization events accumulate without limit.

Reviews (1): Last reviewed commit: "test(blockchain): deconstruct imported b..." | Re-trigger Greptile

Comment thread crates/storage/src/store.rs Outdated
Comment on lines +215 to +228
fn register_data(&mut self, hashed: HashedAttestationData) {
let (data_root, att_data) = hashed.into_parts();
if self.data.contains_key(&data_root) {
return;
}
self.data.insert(
data_root,
PayloadEntry {
data: att_data,
proofs: Vec::new(),
},
);
self.order.push_back(data_root);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Empty entries bypass eviction, data map can grow beyond capacity

register_data adds entries to data and order without incrementing total_proofs. The eviction loop in push is gated on total_proofs > self.capacity (512), so bare-key entries are invisible to the eviction trigger. If a node imports many blocks but reaggregation is slow or absent, the data map can grow well beyond the stated 512-entry bound — up to 8 new entries per slot (per MAX_ATTESTATIONS_DATA). The previous code (TypeOneMultiSignature::empty(bits)) incremented total_proofs by 1 per entry, so it did trigger eviction; this PR removes that side-effect. prune bounds growth for finalized entries, but active entries between finalization events accumulate without limit.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 215-228

Comment:
**Empty entries bypass eviction, data map can grow beyond capacity**

`register_data` adds entries to `data` and `order` without incrementing `total_proofs`. The eviction loop in `push` is gated on `total_proofs > self.capacity` (512), so bare-key entries are invisible to the eviction trigger. If a node imports many blocks but reaggregation is slow or absent, the `data` map can grow well beyond the stated 512-entry bound — up to 8 new entries per slot (per `MAX_ATTESTATIONS_DATA`). The previous code (`TypeOneMultiSignature::empty(bits)`) incremented `total_proofs` by 1 per entry, so it did trigger eviction; this PR removes that side-effect. `prune` bounds growth for finalized entries, but active entries between finalization events accumulate without limit.

How can I resolve this? If you propose a fix, please make it concise.

@MegaRedHand MegaRedHand marked this pull request as draft May 27, 2026 19:45
on_block registered each block attestation's data in the known pool with
an empty proof set, mirroring leanSpec's on_block. That registration is
inert in our weight model: fork-choice weight is derived from proof
participant bits, so an empty proof set contributes nothing, and the block
builder skips entries with no proofs. Its only side effect was reserving an
insertion-order slot, which reaggregation re-establishes anyway.

The empty entries did, however, bypass the payload buffer's eviction cap:
register_data inserted into the data map without incrementing total_proofs,
so a non-finalizing chain could grow known_payloads unbounded.

Remove the registration entirely. Block-borne votes were already zero-weight
at import; weight still arrives once reaggregation recovers real Type-1
proofs and gossip delivers them. This eliminates the unbounded-growth path by
construction and deletes the now-dead register_data helpers.
@MegaRedHand MegaRedHand changed the title fix(devnet5): drop proof-less Type-1 placeholders that broke block-proof merge fix(devnet5): don't register block-attestation data at import (broke Type-2 merge) May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant