fix(airflow-client): retry writes on transient 5xx and make copy operations idempotent #2171
Open
sakethsomaraju wants to merge 5 commits into
Coverage Report for CI Build 28068617702
Coverage increased (+0.02%) to 45.301%
No coverage regressions found.
Coveralls
wendtek reviewed Jun 15, 2026
schnie approved these changes Jun 23, 2026
Description
Fixes two related issues that could leave a preview deployment in a permanently broken, partially-configured state after a transient network failure during airflow-variable copy, connection copy, or pool copy.
Problem 1: No retry on transient write failures
The CLI used retryablehttp for GET requests but not for write operations. A single transient 503 mid-copy caused the entire command to fail with no recovery, leaving the destination with only a partial set of variables/connections/pools.
Problem 2: Copy command not idempotent on retry
On manual retry after a partial failure, the CLI encountered already-copied resources in the destination and attempted to PATCH them. For variables with a leading / in the key, this produced a malformed URL (/api/v1/variables//var-name), causing Airflow to return 404. The command would then fail permanently on every subsequent retry attempt.
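The doubled slash described above can be reproduced with plain string concatenation. The following is a minimal sketch, not code from this PR: `naiveVariablePath` and `escapedVariablePath` are hypothetical helpers, and the escaped variant is shown only for contrast. The PR itself addresses the retry failure through the upsert behavior, and whether Airflow accepts an escaped slash in a variable key is not confirmed here.

```go
package main

import (
	"net/url"
)

// naiveVariablePath is a hypothetical helper that appends the key verbatim.
// A key with a leading "/" yields a doubled slash in the path.
func naiveVariablePath(key string) string {
	return "/api/v1/variables/" + key
}

// escapedVariablePath is a hypothetical helper that percent-encodes the key
// as a single path segment, so "/" in the key becomes "%2F".
func escapedVariablePath(key string) string {
	return "/api/v1/variables/" + url.PathEscape(key)
}
```

With the key `/var-name`, the naive helper returns `/api/v1/variables//var-name`, which matches the malformed URL reported above, while the escaped helper returns `/api/v1/variables/%2Fvar-name`.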
Changes
Retry policy (checkRetryPolicy):
GET requests: unchanged (default policy)
Non-GET (writes): retry only on 502, 503, and 429, the cases where the server provably did not apply the write. Transport errors, 500, and 504 are not retried because they are ambiguous (the write may already have been applied)
Write retries use a separate budget: 3 attempts with 5s backoff
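The write-side rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual `checkRetryPolicy`: the names `retryableWriteStatus`, `checkWriteRetry`, `writeRetryMax`, and `writeRetryWait` are hypothetical, and the sketch uses only the standard library. The function has the same shape as the `CheckRetry` callback in go-retryablehttp, `func(ctx, resp, err) (bool, error)`, so a real implementation would presumably route GET requests to the library's default policy and writes to something like this.

```go
package main

import (
	"context"
	"net/http"
	"time"
)

// Hypothetical constants mirroring the write budget described above:
// 3 attempts with a 5s backoff.
const (
	writeRetryMax  = 3
	writeRetryWait = 5 * time.Second
)

// retryableWriteStatus reports whether a write that received this status
// is treated as safe to retry (502, 503, 429 only).
func retryableWriteStatus(status int) bool {
	switch status {
	case http.StatusBadGateway,
		http.StatusServiceUnavailable,
		http.StatusTooManyRequests:
		return true
	default:
		return false
	}
}

// checkWriteRetry is a sketch of a retry decision for non-GET requests.
func checkWriteRetry(ctx context.Context, resp *http.Response, err error) (bool, error) {
	// Never retry once the caller has cancelled or timed out.
	if ctxErr := ctx.Err(); ctxErr != nil {
		return false, ctxErr
	}
	// Transport errors are ambiguous: the write may have been applied.
	if err != nil || resp == nil {
		return false, err
	}
	return retryableWriteStatus(resp.StatusCode), nil
}
```

Keeping the status check in its own pure function makes the policy easy to unit test without constructing HTTP responses.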
Operation-layer idempotency (CreateVariable, CreateConnection, CreatePool):
Each create operation is now an upsert: on 409 Conflict, it falls back to the corresponding PATCH update
This makes the entire copy command safe to re-run after any partial failure: existing resources get updated and missing ones get created
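The create-then-update fallback can be illustrated with a small, self-contained sketch. Everything here is hypothetical and stands in for the real client: `variableAPI`, `errConflict` (representing a 409 response), `upsertVariable`, and the in-memory `memoryAPI` fake are assumptions for illustration, not the signatures of `CreateVariable` or its siblings.

```go
package main

import (
	"errors"
)

// errConflict is a hypothetical sentinel standing in for an HTTP 409 response.
var errConflict = errors.New("409 conflict")

// variableAPI is a hypothetical minimal view of the create/update calls.
type variableAPI interface {
	Create(key, value string) error // POST
	Update(key, value string) error // PATCH
}

// upsertVariable tries to create first and falls back to an update
// only when the destination reports that the resource already exists.
func upsertVariable(api variableAPI, key, value string) error {
	err := api.Create(key, value)
	if errors.Is(err, errConflict) {
		return api.Update(key, value)
	}
	return err
}

// memoryAPI is an in-memory fake used to exercise the sketch.
type memoryAPI struct {
	vars map[string]string
}

func (m *memoryAPI) Create(key, value string) error {
	if _, exists := m.vars[key]; exists {
		return errConflict
	}
	m.vars[key] = value
	return nil
}

func (m *memoryAPI) Update(key, value string) error {
	if _, exists := m.vars[key]; !exists {
		return errors.New("404 not found")
	}
	m.vars[key] = value
	return nil
}
```

Running `upsertVariable` twice with the same key succeeds both times in this sketch, which is the property that lets a partially completed copy be re-run.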
🎟 Issue(s)
Closes #2172
Reduces the likelihood of encountering #2165, but does not fully remediate it
🧪 Functional Testing
📸 Screenshots
To reproduce the 503 responses, I force-deleted the webserver and confirmed that the webserver container came back online within 15 seconds.
In version 1.42.1:
After the fix:
📋 Checklist
make test before taking out of draft
make lint before taking out of draft