Decouple the flush path from the read path#12840
Decouple the flush path from the read path#12840armstrong0704 wants to merge 1 commit intoneondatabase:mainfrom
Conversation
There was a problem hiding this comment.
Hey. Thanks for the PR. Appreciate you taking an interest in this and taking the time to work on this.
In the current form the changes here are very broad. This part of the code base is rather delicate (there's a number of non-obvious invariants) and we've tried to incrementally evolve it over time with particular attention to correctness. As it stands, convincing myself this is all correct would take a substantial amount of time and it carries a lot of risk.
If you're interested I suggest a rework that preserves the overall code structure and focuses on the issue of holding the writer lock while reading from the flush buffer.
I think changing BufferedWriter::maybe_flushed to ArcSwap<Option<FullSlice<B::IoBuf>>> and making BufferedWriter::inspect_maybe_flushed to return the current value in the arc swap would be sufficient. Then you could drop the write lock after reading from the mutable buffer.
Refactors BufferedWriter to use ArcSwap for the flushing buffer, enabling wait-free reads in EphemeralFile. Problem: When EphemeralFile flushes to disk, it holds a write lock on the BufferedWriter. During slow disk I/O (10ms+), this blocks all concurrent GetPage requests (readers), causing latency spikes. Solution: 1. Modified BufferedWriter to store maybe_flushed in an ArcSwap. 2. Updated flush() to atomically publish the buffer to readers before blocking on the underlying write operation. 3. Updated EphemeralFile to check this atomic pointer lock-free before falling back to the mutex. Outcome: - Eliminates read stalls: Readers continue serving hits from the flushing buffer while disk I/O blocks the writer. - Preserves architecture: Keeps the existing flow-control and pipelining logic of BufferedWriter (addressing review feedback). - Verified: ~28x improvement in read throughput under simulated latency. Fixes neondatabase#12172
7ec3202 to
643c929
Compare
|
@VladLazar Thanks for the review and recommendations. I have updated the PR based on your feedback.
|
|
checking if any updates on the review? Thanks! |
Problem
Previously, flushing the EphemeralFile to disk held a write lock on the struct. During slow disk IO (10ms - 50ms+), this blocked all concurrent readers (GetPage), causing latency spikes and throughput collapse.
See #12172 for the full context.
Summary of changes
arc_swap::ArcSwapto publish the frozen buffer atomically. Readers access it wait-free.disk_committed_offsetto distinguish between data in the buffer vs. disk, preventing race conditions.writer_lockto serialize concurrent writers (WAL append-only integrity) without blocking readers.Verification & Benchmarks (Local NVMe + Synthetic 50ms Latency):