feat: add agent catalog/auth API and safer orchestrator switching#276
feat: add agent catalog/auth API and safer orchestrator switching#276nikhilachale wants to merge 9 commits into
Conversation
- Implemented AgentsController to handle /agents endpoint, returning a list of supported and installed agents. - Created agent inventory service to manage agent data and detect installed agents. - Updated ProjectSettingsForm to fetch and display agent information, including installed and supported agents. - Enhanced error handling for agent detection and orchestrator restarts. - Added tests for agent catalog and service to ensure correct functionality and error handling.
…flect changes - Added `AuthStatus` method to various agent plugins to check authorization status using CLI probes. - Introduced `authprobe` package to handle common CLI command checks for agent authorization. - Updated backend tests to include scenarios for authorized and unauthorized agents. - Modified frontend API schema to include `authorized` counts and `authStatus` for agents. - Enhanced `ProjectSettingsForm` to display authorized agents and their statuses, including prompts for login when necessary. - Adjusted agent selection logic to prioritize authorized agents and provide feedback for unauthorized or uninstalled agents.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
…references from API and frontend
| return nil | ||
| } | ||
| sort.Sort(sort.Reverse(sort.StringSlice(matches))) | ||
| return matches |
There was a problem hiding this comment.
`nvmNodeBinCandidates` is missing its closing brace — the function body never terminates before `func resolveNativeWindowsCodex(...)` starts on the next line, which nests a named function declaration inside another function. This is not valid Go and the package will not compile as currently pushed:
```go
func nvmNodeBinCandidates(home, binary string) []string {
matches, err := filepath.Glob(filepath.Join(home, ".nvm", "versions", "node", "*", "bin", binary))
if err != nil || len(matches) == 0 {
return nil
}
sort.Sort(sort.Reverse(sort.StringSlice(matches)))
return matches
func resolveNativeWindowsCodex(path string) string { // <-- missing } above
Needs a `}` after `return matches` to close `nvmNodeBinCandidates` before `resolveNativeWindowsCodex` starts. Given `TestResolveCodexBinaryFindsNVMInstallWhenPathIsSparse` was added in this same PR, this file couldn't have been built/tested in its current state — worth double-checking the push matches what was actually tested locally.
| } | ||
|
|
||
| // Service reports supported and locally runnable agent adapters. | ||
| type Service struct { |
There was a problem hiding this comment.
Naming nit: every other internal/service/* package names its main file after either the package itself (review.go in package review) or its primary type (manager.go ↔ Manager, action_service.go ↔ ActionService). Here the file is catalog.go but the type is Service, not Catalog — so the file/type pairing doesn't match the convention elsewhere in internal/service/.
Since this is a brand-new single-file package, either renaming this file to service.go (matching project/session/review's "main file" convention) or renaming the type to Catalog (to match this file) would read more consistently with the rest of the tree. Not blocking, just a naming consistency nit.
| if _, err := s.Kill(ctx, orch.ID); err != nil { | ||
| return domain.Session{}, err | ||
| } | ||
| sess, err := s.Spawn(ctx, ports.SpawnConfig{ProjectID: projectID, Kind: domain.KindOrchestrator}) |
There was a problem hiding this comment.
No locking around read-existing → spawn-new → retire-old in SpawnOrchestrator(clean=true). Two concurrent calls for the same project will both snapshot the same existing orchestrators, both spawn a new one, and both retire the same old one — leaving two new orchestrators alive at once, breaking the "only one live coordinator" invariant this flag exists to guarantee. Needs a per-project lock (or a DB-level uniqueness check) around this sequence.
| if err != nil { | ||
| m.logger.Warn("session manager: old orchestrator probe failed after runtime destroy", | ||
| "session", id, "err", err) | ||
| } else if alive { |
There was a problem hiding this comment.
If IsAlive still reports true after recordRetiredTermination already succeeded, the loop retries Destroy/recordRetiredTermination again, and on final failure returns ErrSessionStillAlive — but the DB already says terminated. Since List(Active: true) won't surface a terminated row, nothing will ever retry killing this runtime again: a zombie orchestrator can leak silently while the system believes it's fully retired. Don't re-record termination once it has already succeeded; surface a distinct "terminated but still alive" signal so recovery tooling can retry the kill.
|
Please resolve merge conflicts and test the flow end to end locally as well. |
Summary
This PR adds a daemon-backed agent catalog, exposes installed/authorized agent state to the frontend, and uses that data in project settings so users can choose worker/orchestrator agents more safely.
It also adds orchestrator replacement handling: when the saved orchestrator agent changes, AO starts the replacement first and only retires the previous orchestrator after the new one is up, so a failed replacement does not cause downtime.
A key caveat is that agent-auth/login flows can interfere with replacement startup. If switching agents triggers the agent’s own bootstrap path, the replacement may come up outside AO’s normal orchestrator initialization path and miss the AO orchestrator system prompt.
What Changed
Backend
Added agent inventory service for:
Added optional AgentAuthChecker capability on adapters.
Added shared CLI auth probing helper for adapters with cheap local auth checks.
Added GET /api/v1/agents.
Extended registry inventory entries to carry adapter manifest metadata for user-facing labels.
Added orchestrator replacement flow in the session service:
Added backend tests for agent catalog, controller responses, session replacement behavior, and related project/service wiring.
Frontend
Regenerated API types for the new agents endpoint/DTOs.
Updated ProjectSettingsForm to:
Added/updated tests for the new settings behavior.
Why
Before this change, the UI did not have a daemon-backed view of which agents are actually installed and authenticated on the local machine, and changing orchestrator agent config did not have a clear replacement flow.
This PR makes agent selection more grounded in local runtime state and reduces the chance of downtime during orchestrator switches.
Risks / Caveats
A large part of the file count comes from:
The CLI/runtime/session model is unchanged outside the new inventory/auth and orchestrator replacement paths.
Generated files are included intentionally:
Auth/login flows remain a review risk. If switching agents triggers the agent’s own login/bootstrap flow, that flow can spawn a fresh native session outside AO’s normal orchestrator startup path.
In that case, the replacement session may miss AO’s expected initialization, including the orchestrator system prompt.
The old orchestrator is intentionally preserved on replacement failure, but reviewers should pay close attention to whether
replacement startup still guarantees AO system-prompt delivery.
Closes #275