Skip to content

feat: add agent catalog/auth API and safer orchestrator switching#276

Open
nikhilachale wants to merge 9 commits into
aoagents:mainfrom
nikhilachale:agent-switcher
Open

feat: add agent catalog/auth API and safer orchestrator switching#276
nikhilachale wants to merge 9 commits into
aoagents:mainfrom
nikhilachale:agent-switcher

Conversation

@nikhilachale

@nikhilachale nikhilachale commented Jun 17, 2026

Copy link
Copy Markdown

Summary

This PR adds a daemon-backed agent catalog, exposes installed/authorized agent state to the frontend, and uses that data in project settings so users can choose worker/orchestrator agents more safely.

It also adds orchestrator replacement handling: when the saved orchestrator agent changes, AO starts the replacement first and only retires the previous orchestrator after the new one is up, so a failed replacement does not cause downtime.

A key caveat is that agent-auth/login flows can interfere with replacement startup. If switching agents triggers the agent’s own bootstrap path, the replacement may come up outside AO’s normal orchestrator initialization path and miss the AO orchestrator system prompt.

What Changed

Backend

  • Added agent inventory service for:

    • supported agents
    • installed agents
    • authorized agents
    • counts for each
  • Added optional AgentAuthChecker capability on adapters.

  • Added shared CLI auth probing helper for adapters with cheap local auth checks.

  • Added GET /api/v1/agents.

  • Extended registry inventory entries to carry adapter manifest metadata for user-facing labels.

  • Added orchestrator replacement flow in the session service:

    • spawn replacement first
    • retire previous orchestrator only after successful replacement
    • preserve previous orchestrator when replacement startup fails
  • Added backend tests for agent catalog, controller responses, session replacement behavior, and related project/service wiring.

Frontend

  • Regenerated API types for the new agents endpoint/DTOs.

  • Updated ProjectSettingsForm to:

    • load agent catalog from the daemon
    • show authorized agent options
    • handle installed-but-not-authorized states
    • surface orchestrator replacement pending state
    • allow retry once replacement is safe to perform
  • Added/updated tests for the new settings behavior.

Why

Before this change, the UI did not have a daemon-backed view of which agents are actually installed and authenticated on the local machine, and changing orchestrator agent config did not have a clear replacement flow.

This PR makes agent selection more grounded in local runtime state and reduces the chance of downtime during orchestrator switches.

Risks / Caveats

  • A large part of the file count comes from:

    • generated API artifacts
    • frontend tests
    • small per-adapter auth probe shims
  • The CLI/runtime/session model is unchanged outside the new inventory/auth and orchestrator replacement paths.

  • Generated files are included intentionally:

    • backend/internal/httpd/apispec/openapi.yaml
    • frontend/src/api/schema.ts
  • Auth/login flows remain a review risk. If switching agents triggers the agent’s own login/bootstrap flow, that flow can spawn a fresh native session outside AO’s normal orchestrator startup path.

  • In that case, the replacement session may miss AO’s expected initialization, including the orchestrator system prompt.

  • The old orchestrator is intentionally preserved on replacement failure, but reviewers should pay close attention to whether
    replacement startup still guarantees AO system-prompt delivery.

Closes #275

- Implemented AgentsController to handle /agents endpoint, returning a list of supported and installed agents.
- Created agent inventory service to manage agent data and detect installed agents.
- Updated ProjectSettingsForm to fetch and display agent information, including installed and supported agents.
- Enhanced error handling for agent detection and orchestrator restarts.
- Added tests for agent catalog and service to ensure correct functionality and error handling.
…flect changes

- Added `AuthStatus` method to various agent plugins to check authorization status using CLI probes.
- Introduced `authprobe` package to handle common CLI command checks for agent authorization.
- Updated backend tests to include scenarios for authorized and unauthorized agents.
- Modified frontend API schema to include `authorized` counts and `authStatus` for agents.
- Enhanced `ProjectSettingsForm` to display authorized agents and their statuses, including prompts for login when necessary.
- Adjusted agent selection logic to prioritize authorized agents and provide feedback for unauthorized or uninstalled agents.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@nikhilachale

Copy link
Copy Markdown
Author

Comment thread backend/internal/session_manager/manager.go
return nil
}
sort.Sort(sort.Reverse(sort.StringSlice(matches)))
return matches

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`nvmNodeBinCandidates` is missing its closing brace — the function body never terminates before `func resolveNativeWindowsCodex(...)` starts on the next line, which nests a named function declaration inside another function. This is not valid Go and the package will not compile as currently pushed:

```go
func nvmNodeBinCandidates(home, binary string) []string {
matches, err := filepath.Glob(filepath.Join(home, ".nvm", "versions", "node", "*", "bin", binary))
if err != nil || len(matches) == 0 {
return nil
}
sort.Sort(sort.Reverse(sort.StringSlice(matches)))
return matches
func resolveNativeWindowsCodex(path string) string { // <-- missing } above


Needs a `}` after `return matches` to close `nvmNodeBinCandidates` before `resolveNativeWindowsCodex` starts. Given `TestResolveCodexBinaryFindsNVMInstallWhenPathIsSparse` was added in this same PR, this file couldn't have been built/tested in its current state — worth double-checking the push matches what was actually tested locally.

}

// Service reports supported and locally runnable agent adapters.
type Service struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming nit: every other internal/service/* package names its main file after either the package itself (review.go in package review) or its primary type (manager.goManager, action_service.goActionService). Here the file is catalog.go but the type is Service, not Catalog — so the file/type pairing doesn't match the convention elsewhere in internal/service/.

Since this is a brand-new single-file package, either renaming this file to service.go (matching project/session/review's "main file" convention) or renaming the type to Catalog (to match this file) would read more consistently with the rest of the tree. Not blocking, just a naming consistency nit.

if _, err := s.Kill(ctx, orch.ID); err != nil {
return domain.Session{}, err
}
sess, err := s.Spawn(ctx, ports.SpawnConfig{ProjectID: projectID, Kind: domain.KindOrchestrator})

@neversettle17-101 neversettle17-101 Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No locking around read-existing → spawn-new → retire-old in SpawnOrchestrator(clean=true). Two concurrent calls for the same project will both snapshot the same existing orchestrators, both spawn a new one, and both retire the same old one — leaving two new orchestrators alive at once, breaking the "only one live coordinator" invariant this flag exists to guarantee. Needs a per-project lock (or a DB-level uniqueness check) around this sequence.

if err != nil {
m.logger.Warn("session manager: old orchestrator probe failed after runtime destroy",
"session", id, "err", err)
} else if alive {

@neversettle17-101 neversettle17-101 Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If IsAlive still reports true after recordRetiredTermination already succeeded, the loop retries Destroy/recordRetiredTermination again, and on final failure returns ErrSessionStillAlive — but the DB already says terminated. Since List(Active: true) won't surface a terminated row, nothing will ever retry killing this runtime again: a zombie orchestrator can leak silently while the system believes it's fully retired. Don't re-record termination once it has already succeeded; surface a distinct "terminated but still alive" signal so recovery tooling can retry the kill.

@neversettle17-101

Copy link
Copy Markdown
Collaborator

Please resolve merge conflicts and test the flow end to end locally as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants