Apply circulation-only updates during the Bibliotheca circulation sweep (PP-4666)#3510
Conversation
|
Claude finished @dbernstein's task in 2m 27s —— View job Code Review
SummaryThis is a correct, surgical fix for the regression. I traced the full path and the gate composes cleanly with Regarding @jonathangreen's note about fixing the doc comment — the docstrings on |
Greptile SummaryThis PR fixes a regression introduced in #3403 where the Bibliotheca circulation sweep silently discarded availability-only updates. The
Confidence Score: 5/5Safe to merge — the change is a one-clause OR extension to an existing predicate, no pre-existing code paths are altered, and the fix is validated by a passing end-to-end regression test that reproduced the production symptom. The fix is minimal: one new compound condition in No files require special attention. Important Files Changed
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3510 +/- ##
==========================================
- Coverage 93.44% 93.44% -0.01%
==========================================
Files 511 511
Lines 46467 46468 +1
Branches 6340 6340
==========================================
Hits 43423 43423
- Misses 1968 1969 +1
Partials 1076 1076 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Before merging you should fix the doc comment as the claude code review suggests. |
5a0df43 to
46f5859
Compare
The sweep gated its apply on BibliographicData.needs_apply(), whose hash deliberately excludes circulation. Any title whose metadata was unchanged -- nearly every title on nearly every sweep -- had its fresh availability silently dropped, so the sweep ran cleanly while updating nothing. This regressed when the sweep moved to Celery (#3403); the previous IdentifierSweepMonitor applied unconditionally, so apply()'s circulation-only fallback always ran. Extend the gate to also consult CirculationData.needs_apply(), which hashes the LicensePool (availability included). When only circulation changed, BibliographicData.apply() takes its circulation-only fallback and updates availability alone. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address PR review: the _process_batch and process_identifiers docstrings still described applying changes only when BibliographicData.needs_apply() returns True -- the stale, incomplete description that masked this bug. Note that the apply also fires on CirculationData.needs_apply(), and why (the bibliographic hash excludes circulation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
46f5859 to
5770b49
Compare
Description
The Bibliotheca circulation sweep (
BibliothecaCirculationUpdater._process_batch) gated its apply onBibliographicData.needs_apply(). That method hashes the bibliographic data, andBibliographicData.fields_excluded_from_hash()deliberately excludescirculation. So for any title whose metadata (title/author/cover/etc.) was unchanged — i.e. nearly every title on nearly every sweep —needs_apply()returnedFalse, thebibliographic_applytask was never queued, and the fresh availability counts read from Bibliotheca were silently discarded. The sweep loggedhandled N identifier(s)with no failures while updating nothing.The fix extends the gate to also consult
CirculationData.needs_apply(session, collection), which hashes theLicensePool(availability included). When only circulation changed,BibliographicData.apply()takes its existing circulation-only fallback and applies the availability alone.Motivation and Context
JIRA (PP-4666)
Bibliotheca availability stopped updating from the periodic circulation sweep: identifiers were processed 25 at a time with no errors, but copy/hold counts never changed in the database.
This is a regression introduced when the sweep was migrated to Celery in #3403. The previous
BibliothecaCirculationSweep(anIdentifierSweepMonitor) calledbibliographic.apply()unconditionally for every title the API returned, soapply()'s circulation-only fallback always ran. The Celery migration added aneeds_apply()short-circuit as a dedup optimization, but keyed it on the bibliographic hash while it was guarding the circulation apply — the one field that hash excludes.The fix preserves the skip-genuinely-unchanged optimization (it does not blindly fan out an apply task per identifier) while no longer being blind to availability changes. Both the async sweep path and the synchronous on-demand path (
process_identifiers→BibliothecaAPI.update_availability) sat behind the same gate and are both fixed.How Has This Been Tested?
test_circulation_applied_when_metadata_unchanged— an end-to-end regression test: it applies metadata once, then feeds a second lookup with identical metadata but fewer available copies, assertsneeds_apply()isFalsefor it (pinning the exact bug scenario), and asserts the pool availability still updates.test_circulation_change_queues_bibliographic_apply— covers the async/sweep path dispatching when only circulation changed.available=5instead of0, matching the production symptom) and pass with the fix.test_bibliotheca_circulation_updater.pyandtest_bibliotheca.py(95) and the bibliotheca Celery task tests (47) pass.mypyclean; pre-commit hooks pass on the changed files.Checklist