catchup receive: fallback to full backup when empty dir given #2221
Open
x4m wants to merge 7 commits into wal-g:master
If the receiver's data directory has no pg_control and is empty, catchup-receive now sends a zero-value PgControlData sentinel and an empty file list instead of fataling. catchup-send detects the zero SystemIdentifier, skips the identity/timeline/checkpoint checks, and sends all files as full copies, effectively performing a pg_basebackup-style initial population of the standby directory. If the directory has files but no pg_control, catchup-receive fatals with a clear message, consistent with the check in backup-fetch (NonEmptyDBDataDirectoryError). The directory state classification uses utility.IsDirectoryEmpty, matching the approach in checkDBDirectoryForUnwrap. Made-with: Cursor
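The sender-side decision described in this commit message could be sketched roughly as follows. This is not the PR's code: `controlData` is a simplified, hypothetical stand-in for PgControlData, `planTransfer` is an invented helper name, the timeline comparison is an assumption, and the checkpoint check is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// controlData is a simplified, hypothetical stand-in for wal-g's PgControlData.
type controlData struct {
	SystemIdentifier uint64
	CurrentTimeline  uint32
}

// planTransfer reports whether the sender should send every file as a full
// copy. A zero SystemIdentifier from the receiver is treated as the
// "empty directory" sentinel, so the validation checks are skipped.
func planTransfer(sender, receiver controlData) (fullCopy bool, err error) {
	if receiver.SystemIdentifier == 0 {
		return true, nil // sentinel: nothing to validate against
	}
	if sender.SystemIdentifier != receiver.SystemIdentifier {
		return false, errors.New("system identifier mismatch")
	}
	// Assumption: both sides must be on the same timeline.
	if sender.CurrentTimeline != receiver.CurrentTimeline {
		return false, errors.New("timeline mismatch")
	}
	return false, nil // normal incremental catchup
}

func main() {
	primary := controlData{SystemIdentifier: 42, CurrentTimeline: 1}

	full, err := planTransfer(primary, controlData{})
	fmt.Println("empty receiver:", full, err)

	full, err = planTransfer(primary, primary)
	fmt.Println("matching receiver:", full, err)
}
```

Keying the decision on a single field keeps the wire format unchanged: the receiver still sends a control structure and a file list, only with zero and empty values.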
Test classifyDataDirectory with an empty directory, a directory containing pg_control, and a directory with files but no pg_control. Also test that sendControlAndFileList encodes a zero-value PgControlData and an empty BackupFileList when the target directory is empty. Made-with: Cursor
Add a new scenario that creates a completely empty PGDATA directory, runs catchup-receive on it, then runs catchup-send from the primary. After the transfer, the standby is started and its pg_dump output is compared to the primary dump to verify a successful full copy. Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
wait_while_pg_not_ready.sh loops until transaction_read_only is off, which never happens on a hot standby. pg_ctl -w already waits for the server to be ready to accept connections. Made-with: Cursor
…ucket
Protect the receiver's data directory from accidental PostgreSQL startup
while a catchup is in progress:
- At the start of HandleCatchupReceive, pg_control is renamed to
pg_control.catchup (or an empty sentinel is created for a fresh empty
directory). Without pg_control PostgreSQL refuses to start.
- The sender's pg_control arrives as an IsBinContents command. Instead
of writing it directly, the receiver stores it as pg_control.new so
the directory remains unbootable even after all data files are received.
- Only when IsDone is received are the final two atomic steps performed:
  1. pg_control.new -> pg_control (database becomes startable)
  2. pg_control.catchup is removed (sentinel cleanup)
- classifyDataDirectory gains a new pgDataStateInterrupted state: if
pg_control.catchup exists but pg_control does not, the receiver fatals
with a clear message asking the operator to remove the directory and
retry from scratch. Partial retry is not safe because already-written
paged files may receive incorrect incremental applies.
- receiveFileList now skips pg_control.catchup by name so the sender
never treats the sentinel as a file to delete.
Also add the missing minio bucket mariadb_pitr_mariabackup_full to
docker-compose.yml; the bucket was omitted when the test was introduced
in commit 7751b10, causing the test to fail with NoSuchBucket.
Unit tests: TestClassifyDataDirectory_Interrupted,
TestClassifyDataDirectory_NormalBeatsInterrupted.
Made-with: Cursor
vadv (Contributor) reviewed on Mar 26, 2026 and left a comment:
Missing MkdirAll in IsFull branch
        targetFileName = utility.SanitizePath(PgControlNewPath)
    }
    tracelog.InfoLogger.Printf("Writing file %v", targetFileName)
    err := os.MkdirAll(path.Dir(path.Join(directory, targetFileName)), 0700)
The IsFull branch below (line 343) needs the same MkdirAll — on an empty receiver the parent directories won't exist yet.
Describe what this PR fixes
catchup-receive used to fatal immediately when the target directory had no pg_control, making it impossible to bootstrap a new standby from scratch without first running pg_basebackup.
This PR adds a three-way check on the receiver side (using utility.IsDirectoryEmpty, consistent with backup-fetch):
- pg_control exists: normal incremental catchup, behaviour unchanged
- directory is empty: receiver sends a zero sentinel to the sender; sender detects it, skips identity/timeline/checkpoint validation, and transfers all files as full copies, equivalent to an initial base backup over the catchup protocol
- directory has files but no pg_control: fatal with a clear message, avoids silently overwriting an unrelated data directory
Please provide steps to reproduce (if it's a bug)
mkdir /var/lib/postgresql/new_standby
wal-g catchup-receive /var/lib/postgresql/new_standby 1337 &
# before this PR: FATAL: open .../global/pg_control: no such file or directory
wal-g catchup-send /var/lib/postgresql/data localhost:1337
Please add config and wal-g stdout/stderr logs for debug purpose
Before (fatal on empty directory)
INFO: Receiving /var/lib/postgresql/new_standby on port 1337
FATAL: open /var/lib/postgresql/new_standby/global/pg_control: no such file or directory
After (full copy succeeds)
INFO: Receiving /var/lib/postgresql/new_standby on port 1337
INFO: Data directory is empty, requesting full copy from sender
... sender transfers all files ...
INFO: Receive done