Skip to content

Git provider: retry more#751

Closed
eliska-n wants to merge 11 commits intomasterfrom
fix/git-retry-more
Closed

Git provider: retry more#751
eliska-n wants to merge 11 commits intomasterfrom
fix/git-retry-more

Conversation

@eliska-n
Copy link
Copy Markdown
Collaborator

@eliska-n eliska-n commented Apr 30, 2026

I tested on localhost that :

clone is retried every 60s when git server is not ready
pull is retried every 60s unless successful
pull interval can be configured via #pull=10m URL fragment

I added a check that the initialition task does not run mutliple times at once.

I rearranged the code (parse_url_fragment, error flow in init) to improve readability.

I have separated periodic_pull method to periodic_check (with check interval 60s) and periodic_pull (with pull interval 12h if not configured). Only to improve readability.

@eliska-n eliska-n self-assigned this Apr 30, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Warning

Rate limit exceeded

@eliska-n has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 6 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: afb73764-70de-48a2-94a5-46aaf2d7a4bb

📥 Commits

Reviewing files that changed from the base of the PR and between 851261c and a44044c.

📒 Files selected for processing (1)
  • asab/library/providers/git.py
📝 Walkthrough

Walkthrough

Replaces the periodic 60s pull handler with _periodic_check that schedules repo initialization when missing, enforces pull locking and PullInterval throttling, refactors pull into a lock-scoped _periodic_pull, and centralizes synchronous repo setup in _init_task() with readiness gating and targeted error handling.

Changes

Git Library Provider Initialization & Pull Refactoring

Layer / File(s) Summary
PubSub wiring
asab/library/providers/git.py
60-second PubSub subscription now invokes _periodic_check instead of _periodic_pull.
Periodic control / Throttling
asab/library/providers/git.py
Adds _periodic_check(event_name) to schedule initialization when GitRepository is None, prevent duplicate init with InitInProgress, short-circuit if PullLock is locked, and enforce LastPull + PullInterval before pulling.
Pull execution
asab/library/providers/git.py
Refactors _periodic_pull() signature (removes event_name), acquires PullLock, runs _do_pull() inside ProactorService.execute, updates LastPull, logs success, and publishes Library.change! for each affected path; preserves pygit2.GitError handling.
Synchronous initialization task
asab/library/providers/git.py
Adds _init_task() executed on proactor thread: clone into RepoPath if discovery fails or open existing repo; validate remotes and that working tree contains non-.git files; if empty, remove repo dir, clear GitRepository, and raise RuntimeError for retry.
Initialization orchestration & errors
asab/library/providers/git.py
initialize_git_repository() now runs _init_task() once via ProactorService.execute, sets/clears InitInProgress, catches and logs KeyError/pygit2.GitError/RuntimeError/generic exceptions, calls _set_ready() only after success, and triggers immediate pull when repo was opened (not freshly cloned).
Manifest updates
pyproject.toml
Project manifest touched (lines changed present in diff).

Sequence Diagram(s)

sequenceDiagram
  participant PubSub as PubSub (60s tick)
  participant Provider as GitLibraryProvider
  participant Proactor as ProactorService
  participant Git as pygit2 Repo
  participant Pub as Library Publisher

  PubSub->>Provider: tick / event
  Provider->>Provider: _periodic_check()
  alt Repo missing
    Provider->>Provider: set InitInProgress
    Provider->>Proactor: execute _init_task()
    Proactor->>Git: clone or open repo & validate
    Git-->>Proactor: repo object or error
    Proactor->>Provider: init result (fresh_clone?)
    Provider->>Provider: _set_ready() on success
  end
  Provider->>Provider: check PullLock & PullInterval
  alt allowed to pull
    Provider->>Proactor: execute _periodic_pull()
    Proactor->>Provider: _do_pull() results (changed paths)
    Provider->>Pub: publish Library.change! for each path
    Provider->>Provider: update LastPull and log
  else skip (locked/throttled)
    Provider-->>PubSub: no-op
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • mejroslav
  • ateska

Poem

🐰 I hopped where timers softly tick,
I nudged the roots when clones were sick,
Locks held fast so pulls don't race,
Init waits patient, finds its place,
Then songs of changes light the stick.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title is vague and generic, using the non-descriptive phrase 'retry more' without clarifying what specific retry behavior is being enhanced or changed. Consider a more specific title that describes the actual change, such as 'Git provider: add periodic init retry and immediate pull on startup' or 'Git provider: refactor pull timing with init retry logic'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/git-retry-more

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 16 minutes and 6 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@asab/library/providers/git.py`:
- Around line 365-389: The docstring for _init_task is inaccurate and calling
self._do_pull() on existing repositories also leaves LastPull unset causing an
immediate duplicate pull; update the _init_task docstring to state that existing
repositories are pulled during init (or remove the self._do_pull() call if you
prefer to defer pulls entirely to _periodic_pull), and if you keep the
_do_pull() call then set LastPull to the current timestamp right after
self._do_pull() (use the same time representation used elsewhere) so the
subsequent _periodic_pull() doesn’t immediately run a second pull; references:
_init_task, _do_pull, _periodic_pull, LastPull, GitRepository, RepoPath.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f82c589a-4efa-4593-8641-a123f3cc89b8

📥 Commits

Reviewing files that changed from the base of the PR and between bd01f5a and 796066c.

📒 Files selected for processing (1)
  • asab/library/providers/git.py

Comment thread asab/library/providers/git.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
asab/library/providers/git.py (1)

468-469: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Initialize LastPull when setup succeeds.

After a successful clone or init-time pull, LastPull is still None, so the next Application.tick/60! bypasses PullInterval and fetches again immediately. Seed it here once initialization completes so the throttle applies from startup as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@asab/library/providers/git.py` around lines 468 - 469, After a successful
repo setup (the block that calls await self._set_ready()), initialize the
provider's LastPull timestamp so the PullInterval throttle applies immediately
after startup; set self.LastPull to the current time (e.g., utcnow/timestamp)
right before or after calling _set_ready() in the same initialization path where
clone/init pull succeeds. This change should be made in the git provider setup
code that currently only calls _set_ready() so subsequent Application.tick/60
checks respect PullInterval from startup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@asab/library/providers/git.py`:
- Around line 419-436: The init flow can leave self.GitRepository set when
_init_task() fails during _do_pull(), which prevents _periodic_check() from
retrying since it only checks for None; modify _init_task() so that any early
returns in the except handlers clear self.GitRepository (or better: only assign
self.GitRepository after the entire init/pull succeeds) — update the KeyError
and pygit2.GitError handlers (and any similar early-return error handlers around
the _do_pull/_init sequence) to unset self.GitRepository before returning or
defer the assignment until after _set_ready() completes.
- Around line 437-440: The log call in L.exception inside
asab/library/providers/git.py leaks credentials by including self.DeployToken in
struct_data; remove self.DeployToken from struct_data (or replace it with a
redacted marker like "deploy_token_redacted": True) so the deploy token is not
logged, keeping other fields (self.Layer, self.URLPath, self.User) unchanged and
preserving the existing error message in the L.exception call.
- Around line 331-333: The periodic checker is scheduling
initialize_git_repository() every tick without deduplication, causing
overlapping inits; add an in-flight boolean (e.g., self.InitInProgress
initialized in the class __init__) and use it in _periodic_check to skip
scheduling when True, and in initialize_git_repository set self.InitInProgress =
True at start and clear it in a finally block (on both success and failure) so
only one init runs or is enqueued for a given RepoPath; reference symbols:
GitRepository, _periodic_check, initialize_git_repository, TaskService.schedule,
and self.InitInProgress.

---

Duplicate comments:
In `@asab/library/providers/git.py`:
- Around line 468-469: After a successful repo setup (the block that calls await
self._set_ready()), initialize the provider's LastPull timestamp so the
PullInterval throttle applies immediately after startup; set self.LastPull to
the current time (e.g., utcnow/timestamp) right before or after calling
_set_ready() in the same initialization path where clone/init pull succeeds.
This change should be made in the git provider setup code that currently only
calls _set_ready() so subsequent Application.tick/60 checks respect PullInterval
from startup.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d0967d1-d103-458a-9295-cbd6b9ce9f99

📥 Commits

Reviewing files that changed from the base of the PR and between 796066c and 313cae0.

📒 Files selected for processing (1)
  • asab/library/providers/git.py

Comment thread asab/library/providers/git.py
Comment thread asab/library/providers/git.py Outdated
Comment thread asab/library/providers/git.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@asab/library/providers/git.py`:
- Around line 397-399: The current validation only checks for the existence of
the remotes attribute (hasattr(self.GitRepository, "remotes")) but does not
ensure the 'origin' remote is present, which can lead to a KeyError later in
_do_fetch(); replace that check with an explicit membership check for "origin"
in self.GitRepository.remotes and raise a RuntimeError with a message mentioning
the missing 'origin' remote if it is not found so the repository is validated
correctly before _do_fetch() uses remotes["origin"].
- Around line 435-470: The catch blocks logging KeyError/pygit2.GitError
indicate permanent config problems (missing branch, 404, auth failure) but leave
GitRepository None so _periodic_check keeps retrying; mark these as terminal by
setting a persistent instance flag (e.g. self._permanent_init_failure = True)
inside the specific exception branches where you call L.exception (use the same
branches that inspect self.Branch/self.URLPath/User), and ensure _periodic_check
first skips rescheduling if self._permanent_init_failure is True (and still
allows a successful init to clear that flag or provide a config-change hook to
reset it). This prevents re-enqueuing known-permanent failures until config
changes or a restart.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8072f79f-50b0-429d-b63b-ab6f4eb29045

📥 Commits

Reviewing files that changed from the base of the PR and between 313cae0 and 033759a.

📒 Files selected for processing (1)
  • asab/library/providers/git.py

Comment thread asab/library/providers/git.py
Comment thread asab/library/providers/git.py Outdated
@eliska-n eliska-n changed the title WIP: Git provider: retry more Git provider: retry more May 4, 2026
@eliska-n eliska-n requested review from ateska and mejroslav May 4, 2026 10:30
@eliska-n
Copy link
Copy Markdown
Collaborator Author

eliska-n commented May 4, 2026

If the microservice starts with existing git repo, it reads it as it is and sets library ready. It pulls immediately after. Even if pull is not possible, the library is still ready. Even though it might be old.

We don't use this behaviour currently in the deployments (git repo persists only inside the container, is deleted on restart)

@eliska-n
Copy link
Copy Markdown
Collaborator Author

eliska-n commented May 4, 2026

Generally, failed pull doesn't flag the library NOT ready. That is convenient when the library is not changing but connection to the server is unreliable.
However, it can lead to a situation when services have different versions of the library ->
-> e.g. when a branch is distributed (e.g. master) and some services pull successfully but others don't.

Should I set the Library not ready when pull fails? @mejroslav @ateska ?
(Personally, I would keep it as it was - failure in pull does not set the library not ready - the files are still there and readable, although they might be outdated)

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@asab/library/providers/git.py`:
- Around line 486-491: After a successful fresh clone you must initialize the
provider's LastPull so PullInterval is honored; modify the code in the branch
after await self._set_ready() to set LastPull (the attribute used by
_periodic_pull) to the current time when fresh_clone is True instead of leaving
it None, so subsequent Application.tick uses the configured PullInterval before
triggering a pull. Update the code path that currently only calls _periodic_pull
when not fresh_clone (around the await self._set_ready() / if not fresh_clone
block) to set LastPull appropriately for the fresh-clone case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6895c899-095a-4dae-82f4-2f2b71f54d8a

📥 Commits

Reviewing files that changed from the base of the PR and between 033759a and 851261c.

📒 Files selected for processing (1)
  • asab/library/providers/git.py

Comment thread asab/library/providers/git.py Outdated
@eliska-n
Copy link
Copy Markdown
Collaborator Author

eliska-n commented May 4, 2026

I tested on localhost that :

  • clone is retried every 60s when git server is not ready
  • pull is retried every 60s unless successful
  • pull interval can be configured via #pull=10m URL fragment

I added a check that the initialition task does not run mutliple times at once.

I rearranged the code (parse_url_fragment, error flow in init) to improve readability.

I have separated periodic_pull method to periodic_check (with check interval 60s) and periodic_pull (with pull interval 12h if not configured). Only to improve readability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant