fix: commit request queue dedup cache only after batch_add_requests succeeds #975
Draft
vdusek wants to merge 5 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #975 +/- ##
==========================================
+ Coverage 89.90% 90.07% +0.16%
==========================================
Files 49 49
Lines 3091 3143 +52
==========================================
+ Hits 2779 2831 +52
Misses 312 312
…-on-success # Conflicts: # tests/unit/storage_clients/test_apify_request_queue_client.py
…form writes

Commit-after-success regressed concurrent deduplication. Overlapping producers re-sent the same request and multiplied platform writes, which the parallel-dedup integration test caught. In-flight adds are now tracked as per-request futures that concurrent producers await instead of re-sending. A producer learns the request is present only once the platform accepts it. If the original add fails, awaiters are told it was not committed and report it unprocessed so it gets retried, never falsely succeeded.
Both Apify request queue clients (ApifyRequestQueueSingleClient, ApifyRequestQueueSharedClient) cached new requests locally (the single client also updated _head_requests) before calling batch_add_requests. That caused two bugs:

- A failed batch_add_requests call left the requests cached, so a retry was deduplicated locally and never reached the platform.
- A concurrent producer of the same request was told was_already_present before the first call finished. If that call then failed, the producer had already reported success for a request that never reached the platform.

Now the cache and head are committed only after batch_add_requests succeeds, and only for requests the platform accepted (unprocessed_requests are skipped). A failed call commits nothing.

To keep deduplicating concurrent producers (so overlapping batches do not multiply platform writes), each in-flight add is tracked by a per-request future. A concurrent producer of the same request awaits it instead of re-sending. It is told the request is present only if the original add committed it. If the original add failed, the producer reports the request unprocessed so Crawlee retries it, rather than receiving false success.
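The commit-after-success and in-flight future behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DedupSketch`, `SendBatch`, `add_batch`, and the string statuses are hypothetical names, and the platform call is assumed to return the set of keys it left unprocessed.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical stand-in for the platform call: takes unique keys and
# returns the subset the platform left unprocessed.
SendBatch = Callable[[list[str]], Awaitable[set[str]]]


class DedupSketch:
    """Illustrative only; not the real Apify client."""

    def __init__(self, send_batch: SendBatch) -> None:
        self._send_batch = send_batch
        self.cache: set[str] = set()  # holds committed keys only
        self._in_flight: dict[str, asyncio.Future[bool]] = {}

    async def add_batch(self, keys: list[str]) -> dict[str, str]:
        result: dict[str, str] = {}
        to_send: list[str] = []
        to_await: dict[str, asyncio.Future[bool]] = {}
        loop = asyncio.get_running_loop()

        for key in keys:
            if key in self.cache:
                result[key] = "already_present"
            elif key in self._in_flight:
                # Another producer is already sending this key; wait for it.
                to_await[key] = self._in_flight[key]
            else:
                self._in_flight[key] = loop.create_future()
                to_send.append(key)

        if to_send:
            accepted: set[str] = set()
            try:
                unprocessed = await self._send_batch(to_send)
                accepted = set(to_send) - unprocessed
                self.cache |= accepted  # commit only after success
            finally:
                # Resolve waiters even on failure; `accepted` is empty then,
                # so every waiter learns the key was not committed.
                for key in to_send:
                    self._in_flight.pop(key).set_result(key in accepted)
            for key in to_send:
                result[key] = "added" if key in accepted else "unprocessed"

        for key, future in to_await.items():
            result[key] = "already_present" if await future else "unprocessed"
        return result
```

In this sketch a producer always finishes its own send before awaiting futures owned by others, so two producers with overlapping batches cannot wait on each other indefinitely. If the send raises, the exception propagates to the original producer while its waiters receive "unprocessed".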
Within-batch deduplication is unaffected, since Crawlee already collapses a batch by unique_key (via _transform_requests) before it reaches these clients.

Tests (parametrized over both clients, plus integration):

- A failed batch_add_requests leaves no cached entry, and the retry reaches the platform.
- test_request_queue_parallel_deduplication confirms overlapping concurrent producers write each request to the platform exactly once.
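The failed-add case could be exercised along these lines. This is a simplified sketch and not the PR's actual test: `FlakyPlatform`, `MinimalClient`, and `check_failed_add_is_retried` are hypothetical names, and the fake platform is assumed to fail only on its first call.

```python
import asyncio


class FlakyPlatform:
    """Hypothetical fake: the first call fails, later calls succeed."""

    def __init__(self) -> None:
        self.calls = 0
        self.stored: list[str] = []

    async def batch_add(self, keys: list[str]) -> None:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated platform failure")
        self.stored.extend(keys)


class MinimalClient:
    """Hypothetical client that caches keys only after the platform call returns."""

    def __init__(self, platform: FlakyPlatform) -> None:
        self._platform = platform
        self.cache: set[str] = set()

    async def add(self, keys: list[str]) -> None:
        new_keys = [k for k in keys if k not in self.cache]
        if new_keys:
            await self._platform.batch_add(new_keys)  # may raise
            self.cache.update(new_keys)  # reached only on success


async def check_failed_add_is_retried() -> FlakyPlatform:
    platform = FlakyPlatform()
    client = MinimalClient(platform)

    try:
        await client.add(["req-1"])
    except ConnectionError:
        pass
    assert client.cache == set()  # the failed call committed nothing

    await client.add(["req-1"])  # retry is not deduplicated locally
    assert platform.stored == ["req-1"]  # the retry reached the platform
    assert client.cache == {"req-1"}
    return platform
```

Running `asyncio.run(check_failed_add_is_retried())` should pass without raising. If the cache update were moved before the platform call, the first assertion would fail.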