Skip to content

feat: implement asynchronous response handling and lock-free RPC serving#817

Open
akronim26 wants to merge 15 commits into
blockblaz:mainfrom
akronim26:req-resp
Open

feat: implement asynchronous response handling and lock-free RPC serving#817
akronim26 wants to merge 15 commits into
blockblaz:mainfrom
akronim26:req-resp

Conversation

@akronim26

@akronim26 akronim26 commented May 1, 2026

Copy link
Copy Markdown

This PR optimizes node responsiveness and synchronization efficiency by refactoring the network-to-chain synchronization pipeline into an asynchronous model.

Fixes #708

Summary

  • Async Req-Resp: Buffered ReqRespResponseEvent handling via MPSC queue and xev.Async notifier.
  • Lock-Free RPC: Cached status_snapshot in BeamChain for atomic, contention-free status serving.
  • Reduced Lock Contention: Dropped global BeamNode.mutex from blocks_by_root and status request handlers.

Subject to Review

These issues were observed while testing the changes locally:

  • Clock Refactor: Replaced individual libxev timers with a single synchronous tick loop in clock.zig to resolve SSE test flakiness.
  • Test Isolation: Introduced BeamSimProcess to run integration tests in isolated .zig-cache directories, preventing state corruption in CI.
  • State Fix: Corrected slot justification checks (isSlotJustified) in BeamState to fix finalization logic bugs.

Verification

  • zig build test passed.
  • New async queue and status cache tests passing.

@akronim26 akronim26 marked this pull request as draft May 1, 2026 13:51
@akronim26 akronim26 marked this pull request as draft May 1, 2026 13:51
@akronim26 akronim26 marked this pull request as ready for review May 6, 2026 17:18
akronim26 and others added 4 commits May 17, 2026 00:55
* Log attestation subnet coverage in chain status

* fix forkchoice aggregate proof ownership transfer

* fix: pre-save new-payload coverage before merge, add late/diff/agg-start reports

Per g11tech feedback on PR blockblaz#876:
- Capture coverage of latest_new_aggregated_payloads BEFORE the
  acceptNewAttestationsUnlocked merge into SavedPreMergeCoverage.
  Report this as 'prev_new' in the printSlot coverage line so the
  previous slot's accepted payloads are always visible even after
  the map has been cleared.
- Add 'late' section: current latest_new_aggregated_payloads at
  report time (payloads that arrived after the last merge — late
  arrivals for the slot).
- Add 'diff(block-prev_new)' field: validator count present in
  block but not prev_new (+N) and vice-versa (-M).
- Add logNewPayloadsCoverageForAggregation: called in aggregateImpl
  before chain.forkChoice.aggregate, logs subnet-wise coverage of
  current new payloads so the aggregator operator can see what
  payload pool is about to be aggregated.
- Update test cleanup in RebaseTestContext.deinit and standalone
  fork_choice tests to free saved_pre_merge_new_coverage.

* improve log

* fix ci cargo invocation and add coverage metrics

* add block proposal aggregate coverage attribution

* refactor(forkchoice): deduplicate countSeen/countTrue, rename NoSubnet fn, remove dead code

- Remove countTrue (identical to countSeen); update recordAggregateCoverageMetrics to use countSeen
- Rename markCoverageIfParticipantNoSubnet -> markCoverageValidatorOnly with doc comment
  clarifying it validates subnet membership but does not track per-subnet bits
- Remove dead collectCoverageFromPayloadsForData (no callers; superseded by
  collectSourceCoverageFromPayloadsForData which tracks source attribution)

Addresses anshalshukla review feedback on PR blockblaz#876.

---------

Co-authored-by: zclawz <zclawz@users.noreply.github.com>
Co-authored-by: harkamal <gajinder@zeam.in>
Co-authored-by: ch4r10t33r <psramanuj@gmail.com>
Co-authored-by: Parthasarathy Ramanujam <1627026+ch4r10t33r@users.noreply.github.com>
…nge and blocks_by_root (blockblaz#882)

* fix(reqresp): return INVALID_REQUEST for zero/over-limit blocks_by_range and over-limit blocks_by_root (closes blockblaz#881)

* Continue blocks_by_range catch-up immediately

* ci: disable rust toolchain bin cache

* network, libp2p-glue: snappy-frame reqresp error responses and send INVALID_REQUEST

RPC error responses were sent with a hardcoded result code (2) and a
non-snappy-framed payload, so peers could not decode them — the framing
the success path applies via buildResponseFrame was skipped entirely.

Frame error responses on the Zig side in a new sendRpcErrorFramed helper
that mirrors serverStreamSendResponse, and reduce send_rpc_error_response
to relaying the finished frame (dropping the broken code+varint+raw
assembly). serverStreamSendError now forwards the real result code
instead of discarding it.

Peer-input errors (unsupported protocol, malformed frame/snappy/length,
failed deserialization) now return INVALID_REQUEST (1) rather than
SERVER_ERROR (2): an over-limit blocks_by_root request fails SSZ
deserialization before reaching the node-level length check, so the
deserialize path must itself report INVALID_REQUEST.

Verified against the hive lean reqresp suite: blocks_by_root/too_many_roots,
blocks_by_range/zero_count and blocks_by_range/too_many_blocks all pass.

* chore: remove ci.yml cache tweaks — unrelated to reqresp fix

---------

Co-authored-by: zclawz <zclawz@users.noreply.github.com>
Co-authored-by: Parthasarathy Ramanujam <1627026+ch4r10t33r@users.noreply.github.com>
Co-authored-by: Chen Kai <281165273grape@gmail.com>
Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
@akronim26 akronim26 marked this pull request as draft May 19, 2026 12:37
@akronim26 akronim26 marked this pull request as ready for review May 20, 2026 11:15
@akronim26

Copy link
Copy Markdown
Author

Hey @anshalshukla, can you approve the CI run?

@akronim26

Copy link
Copy Markdown
Author
image The simtests are passing locally. Is there anything I am missing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: req/resp handler holds forkchoice exclusive lock during block processing on libp2p thread

2 participants