feat(cubestore/raft): hardening — propose timeout + parse-fail drop + disk runbook by agriev · Pull Request #2 · agriev/cube

agriev · 2026-05-07T21:12:23Z

PR-S2. propose() is bounded by CUBESTORE_RAFT_PROPOSE_TIMEOUT_SECS (default 30s); the transport recv loop drops the connection on the 17th consecutive protobuf decode failure (MAX_CONSECUTIVE_PARSE_FAILS=16); new docs/ha/RAFT-DISK-FULL-RUNBOOK.md. All 101 raft:: tests green.
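For reviewers who want a feel for the timeout semantics, here is a minimal std-only sketch, not the code in this PR. It assumes the env value is parsed as whole seconds, that 0 means "no bound" (as the commit message states), and that unparsable input falls back to the 30s default (an assumption). `propose_timeout`, `wait_for_commit`, and `ProposeError` are hypothetical names; the real code returns a CubeError and presumably awaits on an async runtime rather than a blocking channel.

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Default from the PR description: 30 seconds.
const DEFAULT_PROPOSE_TIMEOUT_SECS: u64 = 30;

/// Hypothetical helper: map the raw value of
/// CUBESTORE_RAFT_PROPOSE_TIMEOUT_SECS to an optional bound.
/// `None` means the timeout is disabled (value 0).
/// Falling back to the default on unparsable input is an assumption.
fn propose_timeout(raw: Option<&str>) -> Option<Duration> {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_PROPOSE_TIMEOUT_SECS);
    if secs == 0 {
        None
    } else {
        Some(Duration::from_secs(secs))
    }
}

/// Hypothetical stand-in for the CubeError returned by the real propose().
#[derive(Debug, PartialEq)]
enum ProposeError {
    /// The apply path did not acknowledge within the bound.
    Timeout,
    /// The apply side went away before acknowledging.
    Dropped,
}

/// Wait for the apply path to acknowledge a proposal (here: a log index
/// sent over a channel), bounded by `timeout` when one is configured.
fn wait_for_commit(
    ack: mpsc::Receiver<u64>,
    timeout: Option<Duration>,
) -> Result<u64, ProposeError> {
    match timeout {
        Some(t) => ack.recv_timeout(t).map_err(|e| match e {
            mpsc::RecvTimeoutError::Timeout => ProposeError::Timeout,
            mpsc::RecvTimeoutError::Disconnected => ProposeError::Dropped,
        }),
        // Timeout disabled: block until acknowledged or the sender is dropped.
        None => ack.recv().map_err(|_| ProposeError::Dropped),
    }
}
```

A caller would read the variable once, for example `propose_timeout(std::env::var("CUBESTORE_RAFT_PROPOSE_TIMEOUT_SECS").ok().as_deref())`, and map `ProposeError::Timeout` to a 503 or a retry against the current leader.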

…CI to ha-main

Pin debian:bookworm-slim and cubejs/rust-builder:bookworm-llvm-18 by sha256 digest so a tag rewrite upstream can't silently change what we build. Add scripts/pin-base-images.sh as a tooled refresh path, so an intentional roll-forward becomes a reviewable diff. Also fire the Rust master workflow on ha-main pushes so the HA fork catches Cargo.lock drift on every merge instead of only at the next upstream sync.

…ail drop, disk-full runbook

PR-S2 of the production hardening series.

State machine:
- propose() is now bounded by CUBESTORE_RAFT_PROPOSE_TIMEOUT_SECS (default 30s; 0 disables). A stalled apply path, or a partition where the local replica isn't actually leader anymore, used to block the caller (typically an HTTP handler) forever. On timeout the propose returns a CubeError so the caller can return 503, retry against the current leader, or fail the request.

Transport:
- The recv loop's per-frame protobuf decode used to `continue` forever on malformed input. A peer flooding garbage frames could pin the task indefinitely. New MAX_CONSECUTIVE_PARSE_FAILS=16 trips the loop into Err on the 17th consecutive failure, which closes the TCP connection and lets the next message reconnect from a clean state. The counter resets on a successful frame, so transient corruption (one bad packet, no flood) is still tolerated.

Docs:
- New docs/ha/RAFT-DISK-FULL-RUNBOOK.md covers the three drive_ready panic sites. They're correct by design (panic > silent commit loss), but operators need a clear recovery script. Includes PVC sizing math, single-pod recovery, all-routers-wedged recovery, and the slow-fsync (vs hard-fail) symptom guide.

Tests: full raft:: suite still 101/101 green.

Build: protobuf@21 toolchain on macOS, debian:bookworm-slim on CI.
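The parse-failure cap described under Transport can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: frames arrive as already-delimited byte slices, `toy_decode` stands in for the protobuf decode, and `recv_loop` and `RecvError` are hypothetical names. Only the constant and the "trip on the 17th consecutive failure, reset on success" behaviour come from the commit message.

```rust
/// Value from the commit message.
const MAX_CONSECUTIVE_PARSE_FAILS: u32 = 16;

/// Hypothetical error type; returning it ends the loop, which in the
/// real transport closes the TCP connection.
#[derive(Debug, PartialEq)]
enum RecvError {
    TooManyParseFailures,
}

/// Hypothetical recv loop over pre-framed input.
/// Returns the number of frames decoded before the stream ended.
fn recv_loop<'a, I, D>(frames: I, decode: D) -> Result<usize, RecvError>
where
    I: IntoIterator<Item = &'a [u8]>,
    D: Fn(&[u8]) -> Option<u32>,
{
    let mut consecutive_fails = 0u32;
    let mut delivered = 0usize;
    for frame in frames {
        match decode(frame) {
            Some(_msg) => {
                // A good frame resets the counter, so isolated
                // corruption is tolerated.
                consecutive_fails = 0;
                delivered += 1;
            }
            None => {
                consecutive_fails += 1;
                // 16 failures in a row are tolerated; the 17th trips.
                if consecutive_fails > MAX_CONSECUTIVE_PARSE_FAILS {
                    return Err(RecvError::TooManyParseFailures);
                }
            }
        }
    }
    Ok(delivered)
}

/// Toy decoder (assumption): a frame is valid only if it is exactly
/// four bytes, read as a big-endian u32.
fn toy_decode(frame: &[u8]) -> Option<u32> {
    if frame.len() == 4 {
        Some(u32::from_be_bytes([frame[0], frame[1], frame[2], frame[3]]))
    } else {
        None
    }
}
```

Comparing with `>` rather than `>=` is what makes the 17th failure, not the 16th, the one that drops the connection.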

agriev added 2 commits May 7, 2026 22:19

agriev merged commit 406fef6 into ha-main May 7, 2026
26 of 28 checks passed

github-actions Bot added cube store rust labels May 7, 2026

agriev merged 2 commits into ha-main from prod-hardening-s2
