feat: implement asynchronous response handling and lock-free RPC serving#817
Open
akronim26 wants to merge 15 commits into
Open
feat: implement asynchronous response handling and lock-free RPC serving#817akronim26 wants to merge 15 commits into
akronim26 wants to merge 15 commits into
Conversation
* Log attestation subnet coverage in chain status * fix forkchoice aggregate proof ownership transfer * fix: pre-save new-payload coverage before merge, add late/diff/agg-start reports Per g11tech feedback on PR blockblaz#876: - Capture coverage of latest_new_aggregated_payloads BEFORE the acceptNewAttestationsUnlocked merge into SavedPreMergeCoverage. Report this as 'prev_new' in the printSlot coverage line so the previous slot's accepted payloads are always visible even after the map has been cleared. - Add 'late' section: current latest_new_aggregated_payloads at report time (payloads that arrived after the last merge — late arrivals for the slot). - Add 'diff(block-prev_new)' field: validator count present in block but not prev_new (+N) and vice-versa (-M). - Add logNewPayloadsCoverageForAggregation: called in aggregateImpl before chain.forkChoice.aggregate, logs subnet-wise coverage of current new payloads so the aggregator operator can see what payload pool is about to be aggregated. - Update test cleanup in RebaseTestContext.deinit and standalone fork_choice tests to free saved_pre_merge_new_coverage. * improve log * fix ci cargo invocation and add coverage metrics * add block proposal aggregate coverage attribution * refactor(forkchoice): deduplicate countSeen/countTrue, rename NoSubnet fn, remove dead code - Remove countTrue (identical to countSeen); update recordAggregateCoverageMetrics to use countSeen - Rename markCoverageIfParticipantNoSubnet -> markCoverageValidatorOnly with doc comment clarifying it validates subnet membership but does not track per-subnet bits - Remove dead collectCoverageFromPayloadsForData (no callers; superseded by collectSourceCoverageFromPayloadsForData which tracks source attribution) Addresses anshalshukla review feedback on PR blockblaz#876. --------- Co-authored-by: zclawz <zclawz@users.noreply.github.com> Co-authored-by: harkamal <gajinder@zeam.in> Co-authored-by: ch4r10t33r <psramanuj@gmail.com> Co-authored-by: Parthasarathy Ramanujam <1627026+ch4r10t33r@users.noreply.github.com>
…nge and blocks_by_root (blockblaz#882) * fix(reqresp): return INVALID_REQUEST for zero/over-limit blocks_by_range and over-limit blocks_by_root (closes blockblaz#881) * Continue blocks_by_range catch-up immediately * ci: disable rust toolchain bin cache * network, libp2p-glue: snappy-frame reqresp error responses and send INVALID_REQUEST RPC error responses were sent with a hardcoded result code (2) and a non-snappy-framed payload, so peers could not decode them — the framing the success path applies via buildResponseFrame was skipped entirely. Frame error responses on the Zig side in a new sendRpcErrorFramed helper that mirrors serverStreamSendResponse, and reduce send_rpc_error_response to relaying the finished frame (dropping the broken code+varint+raw assembly). serverStreamSendError now forwards the real result code instead of discarding it. Peer-input errors (unsupported protocol, malformed frame/snappy/length, failed deserialization) now return INVALID_REQUEST (1) rather than SERVER_ERROR (2): an over-limit blocks_by_root request fails SSZ deserialization before reaching the node-level length check, so the deserialize path must itself report INVALID_REQUEST. Verified against the hive lean reqresp suite: blocks_by_root/too_many_roots, blocks_by_range/zero_count and blocks_by_range/too_many_blocks all pass. * chore: remove ci.yml cache tweaks — unrelated to reqresp fix --------- Co-authored-by: zclawz <zclawz@users.noreply.github.com> Co-authored-by: Parthasarathy Ramanujam <1627026+ch4r10t33r@users.noreply.github.com> Co-authored-by: Chen Kai <281165273grape@gmail.com> Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
Author
|
Hey @anshalshukla, can you approve the CI run? |
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

This PR optimizes node responsiveness and synchronization efficiency by refactoring the network-to-chain synchronization pipeline into an asynchronous model.
Fixes #708
Summary
ReqRespResponseEventhandling via MPSC queue andxev.Asyncnotifier.status_snapshotinBeamChainfor atomic, contention-free status serving.BeamNode.mutexfromblocks_by_rootandstatusrequest handlers.Subject to Review
These issues were observed while testing the changes locally:
libxevtimers with a single synchronous tick loop inclock.zigto resolve SSE test flakiness.BeamSimProcessto run integration tests in isolated.zig-cachedirectories, preventing state corruption in CI.isSlotJustified) inBeamStateto fix finalization logic bugs.Verification
zig build testpassed.