Slave rewrite#143
Conversation
| 9. EL: tx execution + state/receipts validation + persist + payload-self-consistency check (recomputes Header.Hash() and rejects on mismatch — standard EL behavior, same as vanilla post-merge geth against its beacon CL) | ||
| ``` | ||
|
|
||
| **EVM-level note**: post-merge Ethereum redefined opcode `0x44` from `DIFFICULTY` to `PREVRANDAO` (EIP-4399). For QKC's EVM upgrade, what `block.difficulty` / `block.prevrandao` actually return is a patched-geth decision (preserve the pre-Paris numeric-difficulty semantics, or move to PREVRANDAO with mixhash as the source). Either choice has implications for existing QKC mainnet contracts that read this opcode; audited under the patched-geth workstream, not here. |
There was a problem hiding this comment.
PREVRANDAO / Mixhash Risk:
Suppose the Engine API payload is built with payloadAttributes.prevRandao. If a pending transaction reads PREVRANDAO, geth executes that transaction against this placeholder value while computing the payload state root.
If the CL later replaces prevRandao with the mined PoW mixhash before calling engine_newPayload, any transaction that read PREVRANDAO may observe a different value on re-execution. That can make the sealed payload invalid or expose a consensus mismatch, depending on how the EL validates the payload.
| |---|---| | ||
| | Slave Process (this Go binary) | `SlaveServer` ([slave.py](../quarkchain/cluster/slave.py)) | | ||
| | ShardCL | `Shard` ([shard.py:502](../quarkchain/cluster/shard.py#L502)) | | ||
| | ShardCL's geth subprocess | (new) — replaces `MinorBlockChain` + EVM + StateDB | |
There was a problem hiding this comment.
Should geth's builtin p2p capability be disabled?
There was a problem hiding this comment.
Yes, we don't use geth's p2p
| | `GET_ROOT_CHAIN_STAKES_REQUEST` | **Only the slave owning chain 0 shard 0 serves this** (master hardcodes `full_shard_id = 1` at [master.py:1777](../quarkchain/cluster/master.py#L1777)). `eth_call` against the `ROOT_CHAIN_POSW` system contract at the last-confirmed minor block's state (requires geth `--gcmode=archive` or a short-term snapshot) | | ||
| | `GET_ECO_INFO_LIST_REQUEST` | CL local stats (shard tip, difficulty, etc.) | | ||
|
|
||
| ### 4.4 Mining ops |
There was a problem hiding this comment.
Are we removing 'GET_NEXT_BLOCK_TO_MINE_REQUEST' on purpose? It is also in 'MASTER_OP_RPC_MAP'.
| Reference: [`quarkchain/protocol.py`](../quarkchain/protocol.py), [`quarkchain/cluster/protocol.py`](../quarkchain/cluster/protocol.py). | ||
|
|
||
| Field semantics: | ||
| - `branch`: shard identifier; `ROOT_BRANCH` = `0x1` for root chain / cluster control plane. |
|
|
||
| **Not stored at all** (compared to earlier drafts of this doc that proposed them): | ||
| - `td` (total difficulty) — **doesn't apply to shard chain**. Pyquarkchain's `MinorBlockHeader` ([core.py:681](../quarkchain/core.py#L681)) has per-block `difficulty` but **no `total_difficulty` field**, and `shard_state.py` never references TD on the shard side. Shard tip-update uses height + root-anchor tie-breaker, not TD (see §10 / B.3). Only `RootBlockHeader` carries `total_difficulty`, used by master's root-chain fork choice. | ||
| - Canonical / root tip pointers — reconstructed at startup the same way pyquarkchain does it (see [shard_state.py:279 `init_from_root_block`](../quarkchain/cluster/shard_state.py#L279)): master sends the current root tip via `CONNECT_TO_SLAVES_REQUEST`; CL looks up `b"r_last_m" + root_tip_hash` to find the last confirmed mheader for this shard, and that becomes the starting `header_tip`. Any unconfirmed mblocks mined after the last root checkpoint are discarded on restart — same behavior as pyquarkchain today. |
There was a problem hiding this comment.
It seems CONNECT_TO_SLAVES_REQUEST only carries slave info. root_tip is handled by PING.
|
|
||
| **Exit criteria**: a master + one new slave can boot from existing mainnet genesis state, ingest a `CONNECT_TO_SLAVES_REQUEST` + `ADD_ROOT_BLOCK_REQUEST`, mine minor blocks, and report headers back to master. Round-trip on existing wire formats. | ||
|
|
||
| ### M3 — Peer-shard P2P and sync |
There was a problem hiding this comment.
Suggest change M3 to M3 — Xshard, checkdb
Rationale: checkdb reads historical blocks from go/quarkchain db, re-inserts them into the chain, and verifies state — zero network dependency. Xshard hooks (pre/post-block cursor, deposit consumption) operate on block payload data, not real-time P2P. Implementing these first lets us validate the full execution pipeline via checkdb before any P2P work.
|
|
||
| **Exit criteria**: new slave joins a running testnet, downloads the chain from a peer, and stays at the tip with vanilla pyquarkchain slaves continuing to mine. | ||
|
|
||
| ### M4 — Xshard, RPC surface, indexer |
There was a problem hiding this comment.
I think we can put All network-related development at M4, P2P VirtualConnection, sync, real-time slave-to-slave xshard delivery, and master TCP integration belong here.
| This same field set also defines the CL↔EL Engine API extension schema (`PayloadAttributesV?QKC` / `ExecutionPayloadV?QKC`) — the contract the CL workstream and the patched-geth workstream must agree on, replacing M1's seal-only dev path. | ||
| - No p2p, no sync, no xshard yet. | ||
|
|
||
| **Exit criteria**: a master + one new slave can boot from existing mainnet genesis state, ingest a `CONNECT_TO_SLAVES_REQUEST` + `ADD_ROOT_BLOCK_REQUEST`, mine minor blocks, and report headers back to master. Round-trip on existing wire formats. |
There was a problem hiding this comment.
What is "existing mainnet genesis state" mean? Does it mean mainnet config?
| - Divergence triage tooling: when a block fails, capture pre-state, tx list, post-state diff so the patched-geth or CL bug is bisectable. | ||
| - Performance pass: long sync needs reasonable throughput; sequential `engine_newPayload` may bottleneck (open item — tracked separately in the perf workstream). | ||
|
|
||
| **Anticipated problem — EVM version divergence**: pyquarkchain's EVM is an old fork of pyethereum (Constantinople/Petersburg era), while the new EL is latest-geth. Between them, opcode gas costs were repriced (EIP-2929 access lists, EIP-3529 refund changes, EIP-1884, EIP-2200, …), opcodes were added (`PUSH0`, …), and behaviors changed. If a historical block is re-executed under the *new* EVM rules, gas accounting diverges → `gasUsed`, refunds, and coinbase amounts differ → state root won't match → replay fails. This is the single most likely cause of replay divergence. |
There was a problem hiding this comment.
I think we can reuse existing --check_db tool as a light-weight pre-test before full replay harness sync test
No description provided.