Skip to content

Apply circulation-only updates during the Bibliotheca circulation sweep (PP-4666)#3510

Merged
dbernstein merged 2 commits into
mainfrom
bugfix/bibliotheca-circulation-sweep-availability
Jun 26, 2026
Merged

Apply circulation-only updates during the Bibliotheca circulation sweep (PP-4666)#3510
dbernstein merged 2 commits into
mainfrom
bugfix/bibliotheca-circulation-sweep-availability

Conversation

@dbernstein

@dbernstein dbernstein commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Description

The Bibliotheca circulation sweep (BibliothecaCirculationUpdater._process_batch) gated its apply on BibliographicData.needs_apply(). That method hashes the bibliographic data, and BibliographicData.fields_excluded_from_hash() deliberately excludes circulation. So for any title whose metadata (title/author/cover/etc.) was unchanged — i.e. nearly every title on nearly every sweep — needs_apply() returned False, the bibliographic_apply task was never queued, and the fresh availability counts read from Bibliotheca were silently discarded. The sweep logged handled N identifier(s) with no failures while updating nothing.

The fix extends the gate to also consult CirculationData.needs_apply(session, collection), which hashes the LicensePool (availability included). When only circulation changed, BibliographicData.apply() takes its existing circulation-only fallback and applies the availability alone.

needs_apply = bibliographic.needs_apply(self._session) or (
    bibliographic.circulation is not None
    and bibliographic.circulation.needs_apply(self._session, self._collection)
)

Motivation and Context

JIRA (PP-4666)
Bibliotheca availability stopped updating from the periodic circulation sweep: identifiers were processed 25 at a time with no errors, but copy/hold counts never changed in the database.

This is a regression introduced when the sweep was migrated to Celery in #3403. The previous BibliothecaCirculationSweep (an IdentifierSweepMonitor) called bibliographic.apply() unconditionally for every title the API returned, so apply()'s circulation-only fallback always ran. The Celery migration added a needs_apply() short-circuit as a dedup optimization, but keyed it on the bibliographic hash while it was guarding the circulation apply — the one field that hash excludes.

The fix preserves the skip-genuinely-unchanged optimization (it does not blindly fan out an apply task per identifier) while no longer being blind to availability changes. Both the async sweep path and the synchronous on-demand path (process_identifiersBibliothecaAPI.update_availability) sat behind the same gate and are both fixed.

How Has This Been Tested?

  • Added test_circulation_applied_when_metadata_unchanged — an end-to-end regression test: it applies metadata once, then feeds a second lookup with identical metadata but fewer available copies, asserts needs_apply() is False for it (pinning the exact bug scenario), and asserts the pool availability still updates.
  • Added test_circulation_change_queues_bibliographic_apply — covers the async/sweep path dispatching when only circulation changed.
  • Updated the two pre-existing tests that encoded the broken behavior so they hold circulation unchanged too.
  • Verified the regression tests fail on the pre-fix code (the end-to-end one left the pool stuck at available=5 instead of 0, matching the production symptom) and pass with the fix.
  • Full local runs (against throwaway Postgres/Redis): all tests in test_bibliotheca_circulation_updater.py and test_bibliotheca.py (95) and the bibliotheca Celery task tests (47) pass. mypy clean; pre-commit hooks pass on the changed files.
  • I will test on minotaur once merged.

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@claude

claude Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Claude finished @dbernstein's task in 2m 27s —— View job


Code Review

  • Read the PR diff
  • Review the core fix in bibliotheca_circulation_updater.py
  • Review the test changes
  • Examine needs_apply / apply semantics in surrounding code
  • Post review findings

Summary

This is a correct, surgical fix for the regression. I traced the full path and the gate composes cleanly with apply(): BibliographicData.fields_excluded_from_hash does exclude circulation, so gating only on BibliographicData.needs_apply() did drop availability-only changes — exactly as described. Widening the gate to also consult CirculationData.needs_apply() is safe because should_apply_to() (base/mutable.py:134) is a pure read with no side effects, so evaluating the gate and then re-evaluating it inside apply() is idempotent. ReplacementPolicy.from_license_source() leaves even_if_not_apparently_updated=False, so for an unchanged-metadata title apply() falls through to its circulation-only branch (bibliographic.py:819-826) and the new counts still land. Both the async sweep and the synchronous on-demand path go through the same gate and are both fixed, the genuinely-unchanged dedup is preserved, and the tests pin the exact bug scenario (needs_apply() asserted False while availability still updates). I found no bugs to flag.

Regarding @jonathangreen's note about fixing the doc comment — the docstrings on _process_batch and process_identifiers in the latest commit (5770b49) now accurately describe the dual bibliographic + circulation check, so that request appears addressed.
· bugfix/bibliotheca-circulation-sweep-availability

@dbernstein dbernstein added the bug Something isn't working label Jun 26, 2026
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a regression introduced in #3403 where the Bibliotheca circulation sweep silently discarded availability-only updates. The _process_batch gate previously relied solely on BibliographicData.needs_apply(), whose hash deliberately omits the circulation field — so a title with unchanged metadata but different copy/hold counts would always be skipped.

  • Core fix: the gate is now bibliographic.needs_apply(session) or (circulation is not None and circulation.needs_apply(session, collection)), correctly consulting the LicensePool hash for availability changes.
  • Tests added: an end-to-end regression test (test_circulation_applied_when_metadata_unchanged) pins the exact failure — unchanged metadata hash + changed availability — and verifies the pool updates; a sweep-path unit test (test_circulation_change_queues_bibliographic_apply) checks task dispatch; two pre-existing tests are tightened to also hold circulation unchanged.
  • Docstrings for _process_batch and process_identifiers are updated to reflect the dual-hash deduplication.

Confidence Score: 5/5

Safe to merge — the change is a one-clause OR extension to an existing predicate, no pre-existing code paths are altered, and the fix is validated by a passing end-to-end regression test that reproduced the production symptom.

The fix is minimal: one new compound condition in _process_batch. CirculationData.needs_apply() and BibliographicData.needs_apply() are both well-understood helpers, and should_apply_to(None) safely returns True for new pools. The PR includes an end-to-end regression test that fails on pre-fix code and passes with the fix, plus unit tests covering every branch of the new gate on both the async sweep and synchronous on-demand paths. No other callers are affected.

No files require special attention.

Important Files Changed

Filename Overview
src/palace/manager/integration/license/bibliotheca_circulation_updater.py Adds CirculationData.needs_apply() to the deduplication gate in _process_batch; docstrings updated to match. Logic is minimal, targeted, and correct — short-circuit OR preserves performance, circulation is not None guard is safe.
tests/manager/integration/license/test_bibliotheca_circulation_updater.py Adds three new tests: end-to-end regression proof, async-path dispatch test, and two corrected existing tests. Coverage now exercises all four gate combinations (bib×circ both true/false) on both synchronous and async paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_process_batch iterates bibliographic results] --> B{bibliographic.needs_apply\nsession}
    B -- True --> E[needs_apply = True]
    B -- False --> C{bibliographic.circulation\nis not None?}
    C -- No --> F[needs_apply = False]
    C -- Yes --> D{circulation.needs_apply\nsession, collection}
    D -- True --> E
    D -- False --> F
    E --> G{synchronous?}
    G -- True --> H[bibliographic.apply\nin-band]
    G -- False --> I[apply.bibliographic_apply\n.delay Celery task]
    F --> J[skip — no DB write]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[_process_batch iterates bibliographic results] --> B{bibliographic.needs_apply\nsession}
    B -- True --> E[needs_apply = True]
    B -- False --> C{bibliographic.circulation\nis not None?}
    C -- No --> F[needs_apply = False]
    C -- Yes --> D{circulation.needs_apply\nsession, collection}
    D -- True --> E
    D -- False --> F
    E --> G{synchronous?}
    G -- True --> H[bibliographic.apply\nin-band]
    G -- False --> I[apply.bibliographic_apply\n.delay Celery task]
    F --> J[skip — no DB write]
Loading

Reviews (3): Last reviewed commit: "Clarify circulation-gate behavior in upd..." | Re-trigger Greptile

@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.44%. Comparing base (ff3405d) to head (5770b49).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3510      +/-   ##
==========================================
- Coverage   93.44%   93.44%   -0.01%     
==========================================
  Files         511      511              
  Lines       46467    46468       +1     
  Branches     6340     6340              
==========================================
  Hits        43423    43423              
- Misses       1968     1969       +1     
  Partials     1076     1076              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dbernstein dbernstein requested a review from a team June 26, 2026 05:47
@dbernstein dbernstein changed the title Apply circulation-only updates during the Bibliotheca circulation sweep Apply circulation-only updates during the Bibliotheca circulation sweep (PP-4666) Jun 26, 2026
@jonathangreen

Copy link
Copy Markdown
Member

Before merging you should fix the doc comment as the claude code review suggests.

@dbernstein dbernstein force-pushed the bugfix/bibliotheca-circulation-sweep-availability branch from 5a0df43 to 46f5859 Compare June 26, 2026 13:47
@dbernstein dbernstein enabled auto-merge (squash) June 26, 2026 13:48
dbernstein and others added 2 commits June 26, 2026 06:48
The sweep gated its apply on BibliographicData.needs_apply(), whose hash
deliberately excludes circulation. Any title whose metadata was unchanged
-- nearly every title on nearly every sweep -- had its fresh availability
silently dropped, so the sweep ran cleanly while updating nothing.

This regressed when the sweep moved to Celery (#3403); the previous
IdentifierSweepMonitor applied unconditionally, so apply()'s circulation-only
fallback always ran.

Extend the gate to also consult CirculationData.needs_apply(), which hashes
the LicensePool (availability included). When only circulation changed,
BibliographicData.apply() takes its circulation-only fallback and updates
availability alone.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address PR review: the _process_batch and process_identifiers docstrings
still described applying changes only when BibliographicData.needs_apply()
returns True -- the stale, incomplete description that masked this bug. Note
that the apply also fires on CirculationData.needs_apply(), and why (the
bibliographic hash excludes circulation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dbernstein dbernstein force-pushed the bugfix/bibliotheca-circulation-sweep-availability branch from 46f5859 to 5770b49 Compare June 26, 2026 13:48
@dbernstein dbernstein disabled auto-merge June 26, 2026 13:51
@dbernstein dbernstein enabled auto-merge (squash) June 26, 2026 13:51
@dbernstein dbernstein merged commit e9110ae into main Jun 26, 2026
21 checks passed
@dbernstein dbernstein deleted the bugfix/bibliotheca-circulation-sweep-availability branch June 26, 2026 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants