spock_apply: do not send local LSN as publisher feedback on sync-standby#491
spock_apply: do not send local LSN as publisher feedback on sync-standby#491ibrarahmad wants to merge 3 commits into
Conversation
The synchronous-standby hold queue stored the subscriber's local XactLastCommitEnd in RemoteSyncPosition.recvpos and later sent it to the publisher as the slot's confirmed flush position. The two LSNs belong to different WAL streams, so the publisher wrote a meaningless value into confirmed_flush_lsn -- typically far ahead of its own pg_current_wal_lsn. After a CNPG failover the promoted standby resumed from that bogus LSN, the publisher waited until its own WAL caught up, then started streaming from there. Every commit between the real origin position and the bogus LSN was silently dropped. Split RemoteSyncPosition into local_commit_lsn (gating) and remote_recvpos / remote_writepos / remote_flushpos (feedback). append_feedback_position now takes both LSNs explicitly. get_feedback_position gates the release on local_commit_lsn <= WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH] and returns the remote positions, so the publisher sees values from its own WAL stream.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Drop the trailing paragraph and sentence that described the pre-fix behaviour in the comments before append_feedback_position() and get_feedback_position(); they belong in the commit message, not the source. Add tests/tap/t/020_sync_standby_feedback_lsn.pl, which attaches a synchronous physical standby to the subscriber and asserts the publisher's confirmed_flush_lsn never exceeds its own pg_current_wal_lsn.
Promote the subscriber's sync standby after the publisher commits and assert every publisher row is present on the new primary. The original bug surfaced as silently dropped writes after promotion; the LSN symptom check alone does not exercise that path.
The synchronous-standby hold queue stored the subscriber's local XactLastCommitEnd in RemoteSyncPosition.recvpos and later sent it to the publisher as the slot's confirmed flush position. The two LSNs belong to different WAL streams, so the publisher wrote a meaningless value into confirmed_flush_lsn -- typically far ahead of its own pg_current_wal_lsn. After a CNPG failover the promoted standby resumed from that bogus LSN, the publisher waited until its own WAL caught up, then started streaming from there. Every commit between the real origin position and the bogus LSN was silently dropped.
Split RemoteSyncPosition into local_commit_lsn (gating) and remote_recvpos / remote_writepos / remote_flushpos (feedback). append_feedback_position now takes both LSNs explicitly. get_feedback_position gates the release on local_commit_lsn <= WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH] and returns the remote positions, so the publisher sees values from its own WAL stream.