feat(hitl): generic ask_user_via_form capability for selected reasoners #77
Merged
Docker pip cache keys on the exact constraint string; >=0.2.0 keeps restoring whatever was first resolved (0.2.0). Bumping the floor forces layer invalidation so downstream Docker builds pick up patch fixes. FormBuilder + create_form_request already exist in 0.2.0; the bump is about cache invalidation, not new functionality.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dget)
New swe_af/hitl/ module that lets reasoners pause the workflow and ask
the user a structured question via the Hax SDK FormBuilder. Same
pause/resume mechanism as the existing Phase 1.5 plan-approval gate —
generalized so any reasoner can opt in.
- AskUserForm / AskUserFormField: typed Pydantic spec the LLM emits.
Covers all FormBuilder field types (input, textarea, number, slider,
select, radio, checkbox, checkbox_group, switch, date).
- build_form_builder(): translates an AskUserForm into hax.FormBuilder.
- request_user_input_and_pause(): wraps create_request(type=form-builder)
with the same 120s hard timeout the plan-approval gate uses, then
awaits app.pause() and parses the response back into AskUserResponse.
- run_with_ask_user(): generic reasoner wrapper that loops on the
LLM's ask_user_form output, threading prior_user_responses back into
each subsequent invocation. Budget-capped and max-iteration-capped.
- format_prior_user_responses(): renders accumulated answers as a
markdown block for inclusion in the LLM prompt — keeps the LLM from
re-asking questions already answered.
- build_hax_client_from_env() / approval_webhook_url(): env-driven
plumbing so each reasoner can self-configure without depending on
build()'s setup.
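
The loop described for run_with_ask_user() can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the module's actual code: `AskBudget`, `ReasonerOutput`, `run_with_ask_user_sketch`, and the `invoke` / `ask` callables are hypothetical stand-ins, and the real wrapper's signature may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class AskBudget:
    """Hypothetical stand-in for the budget object: counts remaining asks."""
    remaining: int = 2

    def try_spend(self) -> bool:
        # Spend one ask if any are left; report whether the spend succeeded.
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


@dataclass
class ReasonerOutput:
    """Hypothetical structured output; ask_user_form is None when no question is needed."""
    result: str
    ask_user_form: Optional[dict] = None


async def run_with_ask_user_sketch(
    invoke: Callable[[List[dict]], Awaitable[ReasonerOutput]],
    ask: Callable[[dict], Awaitable[dict]],
    budget: AskBudget,
    max_iterations: int = 3,
) -> ReasonerOutput:
    prior_user_responses: List[dict] = []
    output = await invoke(prior_user_responses)
    for _ in range(max_iterations):
        # Stop when the reasoner has no question or the budget is exhausted.
        if output.ask_user_form is None or not budget.try_spend():
            break
        # `ask` stands in for "send the form, pause, parse the response".
        answer = await ask(output.ask_user_form)
        prior_user_responses.append(answer)
        # Re-invoke with every answer collected so far.
        output = await invoke(prior_user_responses)
    # Any form still pending after the caps is dropped (an assumption).
    output.ask_user_form = None
    return output
```

In this sketch the reasoner is invoked at most `max_iterations + 1` times, and the budget cap can end the loop earlier than the iteration cap.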
17 unit tests cover form-builder round-trip for all field types,
ApprovalResult parsing for submitted/timeout/cancelled/error decisions,
and the wrapper's no-ask / one-ask / budget-exhausted /
max-iteration / hax-disabled paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… ambiguous
Wires the ask_user_via_form substrate into run_product_manager: PRD schema gains an optional ask_user_form field, the prompt grows a 'when to ask' section, and the reasoner now runs through run_with_ask_user so an emitted ask_user_form triggers a real app.pause() until the human responds.
Bounded to 2 ask iterations per PM invocation. Falls through to the existing behavior when HAX_API_KEY is unset (no behavioral change for deployments that don't set it).
Use case: the goal references multiple features/pages and priority is unclear, or two architecturally different interpretations are plausible and choosing one forecloses the other. Style preferences / details that can be documented as assumptions stay agent-decided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the ask_user_via_form substrate into the two execution-time
reasoners that face the highest-stakes ambiguities:
- run_issue_advisor: choosing between RETRY_MODIFIED and
ACCEPT_WITH_DEBT, which failing acceptance criteria are acceptable
as debt, and whether to ESCALATE_TO_REPLAN — all of these hinge on
user judgment that the agent can't infer from failure context alone.
- run_replanner: ABORT is a project-level decision the user almost
always wants to weigh in on; REDUCE_SCOPE vs MODIFY_DAG hinges on
the user's appetite for partial delivery.
Each reasoner's structured-output schema (IssueAdvisorDecision /
ReplanDecision) gains an optional ask_user_form field. Each prompt
grows a 'when to ask' section. The reasoners now invoke
router.harness() through run_with_ask_user with a per-invocation
budget of 2 asks.
Backwards-compatible: with HAX_API_KEY unset, build_hax_client_from_env
returns None and the wrapper short-circuits the field — behavior is
identical to before.
The existing replanner parse-retry path (2 attempts on unparseable
output) is preserved inside the closure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
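
The backwards-compatibility path (HAX_API_KEY unset, client is None, the ask_user_form field is ignored) might look roughly like this. It is a sketch only: `FakeHaxClient`, `build_client_from_env_sketch`, and `strip_ask_when_disabled` are hypothetical names, the real Hax SDK client is not modeled, and treating a whitespace-only key as unset is an assumption.

```python
import os
from typing import Mapping, Optional


class FakeHaxClient:
    """Hypothetical stand-in; the real Hax SDK client is not modeled here."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def build_client_from_env_sketch(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[FakeHaxClient]:
    # Read from the given mapping, or from the process environment by default.
    env = os.environ if env is None else env
    api_key = env.get("HAX_API_KEY", "").strip()
    # Assumption: an empty or whitespace-only key counts as "unset".
    return FakeHaxClient(api_key) if api_key else None


def strip_ask_when_disabled(decision: dict, client: Optional[FakeHaxClient]) -> dict:
    # With no client there is nowhere to send a form, so the field is cleared
    # and the decision is handled exactly as it was before the field existed.
    if client is None:
        return {**decision, "ask_user_form": None}
    return decision
```

Returning a copy rather than mutating the decision keeps the caller's original structured output intact, which is a design choice of the sketch rather than something the PR states.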
Manual validation (local, Tier 3, live LLM)
Exercised the real LLM path on a local control plane + Hax:
Budget cap (2 asks/reasoner) + max-iteration behavior remain covered by the 17 unit tests. CI green.
AbirAbbas added a commit that referenced this pull request May 28, 2026
…ness env
Adds the new reasoner that runs once between PM and Architect when HAX is
enabled. The scout reads the PRD + repo, identifies third-party services
whose absence would block the work, and asks the user for scoped /
temporary tokens via a single Hax mega-form. Submitted values are stashed
in the in-memory credentials store keyed by run_id; the scout's return
payload OMITS scoped_credentials so the secrets never reach the control-
plane workflow_execution row.
- swe_af/prompts/environment_scout.py — system prompt + task-prompt
builder. Strong guidance on when NOT to ask (purely local PRD, prior
answers already cover the question, no genuine PRD-blocking
requirement).
- swe_af/reasoners/pipeline.py — @router.reasoner async def
run_environment_scout. Same wrapper shape as the three reasoners
from PR #77; uses run_with_ask_user with budget=2.
- swe_af/app.py:
* plan() — Phase 1.5 calls run_environment_scout via app.call BETWEEN
PM and architect; guarded so it runs only when HAX_API_KEY is set.
* build() body wrapped in try/finally so clear_scoped_credentials
ALWAYS runs on exit (success or exception). Eliminates secret
leakage across builds within the same agent process.
* app.harness is monkey-patched once at module load to auto-inject
stored credentials as env vars on EVERY harness call across the
pipeline. Avoids touching the 25+ existing call sites.
Backwards-compatible: with HAX_API_KEY unset, plan() skips the scout and
the monkey-patched harness passes os.environ through unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
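
A rough sketch of the pattern this commit describes: a process-local store keyed by run_id, a try/finally that always clears it, and env injection for harness calls. The function bodies, `harness_env`, and `build_sketch` are illustrative assumptions; in particular, letting stored credentials override the base environment is a guess at the layering rule, not the module's documented behavior.

```python
import os
import threading
from typing import Dict, Mapping, Optional

_LOCK = threading.Lock()
_STORE: Dict[str, Dict[str, str]] = {}  # run_id -> credentials, never persisted


def set_scoped_credentials(run_id: str, creds: Mapping[str, Optional[str]]) -> None:
    # Drop blank/None values so they never reach the environment.
    cleaned = {k: v for k, v in creds.items() if v and v.strip()}
    with _LOCK:
        _STORE[run_id] = cleaned


def get_scoped_credentials(run_id: str) -> Dict[str, str]:
    with _LOCK:
        # Return a copy so callers cannot mutate the stored dict.
        return dict(_STORE.get(run_id, {}))


def clear_scoped_credentials(run_id: str) -> None:
    with _LOCK:
        _STORE.pop(run_id, None)


def harness_env(run_id: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    # Assumed layering: start from the base env, then overlay stored credentials.
    env = dict(os.environ if base_env is None else base_env)
    env.update(get_scoped_credentials(run_id))
    return env


def build_sketch(run_id: str, creds: Mapping[str, Optional[str]]) -> Dict[str, str]:
    try:
        set_scoped_credentials(run_id, creds)
        return harness_env(run_id, base_env={"PATH": "/usr/bin"})
    finally:
        # Runs on success and on exception, so secrets never outlive the build.
        clear_scoped_credentials(run_id)
```

With nothing stored for a run_id, `harness_env` returns the base environment unchanged, which mirrors the unchanged pass-through described for deployments without HAX_API_KEY.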
AbirAbbas added a commit that referenced this pull request May 28, 2026
…rchitecture (#78)

* feat(hitl): substrate for the environment scout (services + creds store + schema)
Three new modules under swe_af/hitl/:
- services.py: knowledge base of 9 common third-party services (Railway, Fly.io, Vercel, Supabase, Sentry, Datadog, GitHub, OpenAI, Anthropic) with their env var conventions, mint URLs, permissions hints, and signal files. Plus detect_services_from_repo() for a deterministic static pre-pass the LLM scout can build on.
- credentials_store.py: process-local, execution-scoped dict for the credentials the scout negotiates. Keyed by run_id, thread-safe, isolates concurrent builds, NEVER persists. The full discussion of why this is in-memory (not BuildConfig, not app.memory, not the filesystem) lives in the module docstring.
- scout_schema.py: ScoutResult Pydantic model used as the harness schema. Includes an explicit "scoped_credentials must NEVER round-trip through model_dump unless excluded" comment for callers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(hitl): run_environment_scout reasoner + wire into plan() and harness env
Adds the new reasoner that runs once between PM and Architect when HAX is enabled. The scout reads the PRD + repo, identifies third-party services whose absence would block the work, and asks the user for scoped / temporary tokens via a single Hax mega-form. Submitted values are stashed in the in-memory credentials store keyed by run_id; the scout's return payload OMITS scoped_credentials so the secrets never reach the control-plane workflow_execution row.
- swe_af/prompts/environment_scout.py: system prompt + task-prompt builder. Strong guidance on when NOT to ask (purely local PRD, prior answers already cover the question, no genuine PRD-blocking requirement).
- swe_af/reasoners/pipeline.py: @router.reasoner async def run_environment_scout. Same wrapper shape as the three reasoners from PR #77; uses run_with_ask_user with budget=2.
- swe_af/app.py:
  * plan(): Phase 1.5 calls run_environment_scout via app.call BETWEEN PM and architect; guarded so it runs only when HAX_API_KEY is set.
  * build() body wrapped in try/finally so clear_scoped_credentials ALWAYS runs on exit (success or exception). Eliminates secret leakage across builds within the same agent process.
  * app.harness is monkey-patched once at module load to auto-inject stored credentials as env vars on EVERY harness call across the pipeline. Avoids touching the 25+ existing call sites.
Backwards-compatible: with HAX_API_KEY unset, plan() skips the scout and the monkey-patched harness passes os.environ through unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(hitl): 17 unit tests for the environment-scout substrate
Three pillars covered:
- services.py: KNOWN_SERVICES inventory bounds, missing-path safety, file + directory signal detection, prompt-summary rendering.
- credentials_store.py: round-trip, blank/None filtering, isolation between execution_ids, get-returns-copy, concurrent thread safety, inject-into-env layering rules.
- scout closure round-trip: pass 1 emits ask_user_form via the wrapper, pass 2 sees prior_user_responses and returns scoped_credentials; no-services-detected short-circuits the pause; model_dump(exclude={"scoped_credentials"}) actually strips the field.
All tests mock HaxClient + app.pause; no real network, no real harness. Pin a baseline of 8+ services so future trimming is visible in diff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
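
The deterministic static pre-pass mentioned for detect_services_from_repo() could work along these lines. This is a sketch under assumptions: `SIGNALS` is a hypothetical three-entry table (the actual knowledge base and its signal files are not shown in this PR), and `detect_services_sketch` is an illustrative name.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical signal table: service name -> files or directories whose
# presence in the repo root hints that the service is in use.
SIGNALS: Dict[str, List[str]] = {
    "Vercel": ["vercel.json", ".vercel"],
    "Fly.io": ["fly.toml"],
    "Railway": ["railway.json", "railway.toml"],
}


def detect_services_sketch(repo_path: str) -> List[str]:
    root = Path(repo_path)
    # Missing-path safety: a nonexistent or non-directory path yields no hits.
    if not root.is_dir():
        return []
    found = []
    for service, signals in SIGNALS.items():
        # exists() matches both file and directory signals.
        if any((root / signal).exists() for signal in signals):
            found.append(service)
    # Sorted output keeps the pre-pass deterministic across runs.
    return sorted(found)
```

Because the pass only checks paths, it needs no network access and no LLM call, so an LLM-driven scout can treat its output as a starting list rather than a final answer.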
Summary
Generalizes the existing Phase 1.5 plan-approval gate into a reusable `ask_user_via_form` substrate. Three reasoners (`run_product_manager`, `run_issue_advisor`, `run_replanner`) can now emit an `ask_user_form` field in their structured output; when populated, the workflow pauses on the control plane (real `app.pause`, hours/days if needed) until the user submits, then re-invokes the reasoner with the answers in `prior_user_responses`.
Why
The existing HITL surface is exactly one moment — Phase 1.5 plan approval.
Everywhere else the agent silently picks a default when the right
answer hinges on user judgment (which failing acceptance criteria are
acceptable as debt? abort or reduce scope? which of multiple plausible
goal interpretations?). This adds an opt-in way for the LLM itself to
escalate those cases.
What's in this PR
New substrate, `swe_af/hitl/`:
- `AskUserForm` / `AskUserFormField`: Pydantic spec the LLM emits. Covers all FormBuilder field types (input, textarea, number, slider, select, radio, checkbox, checkbox_group, switch, date).
- `build_form_builder(spec)`: translates the spec into `hax.FormBuilder`.
- `request_user_input_and_pause(...)`: sends the form via `create_request(type="form-builder")` (wrapped with the same 120s hard timeout the plan-approval gate uses), then `await app.pause(...)`, then parses the response into `AskUserResponse`.
- `run_with_ask_user(...)`: generic reasoner wrapper that loops on `ask_user_form` output, threading `prior_user_responses` back into each subsequent invocation. Budget-capped (`AskUserBudget`) and max-iteration-capped.
- `format_prior_user_responses(prior)`: renders accumulated answers as a markdown block so the LLM doesn't re-ask questions already answered.
- `build_hax_client_from_env()` / `approval_webhook_url(app)`: env-driven plumbing so each reasoner self-configures.
Initial allowlist (each reasoner gets an `ask_user_form` schema field + prompt guidance):
- `run_product_manager`: for fundamentally ambiguous goals where two interpretations would yield very different PRDs.
- `run_issue_advisor`: for RETRY_MODIFIED vs ACCEPT_WITH_DEBT trade-offs and which failing acceptance criteria are acceptable as debt.
- `run_replanner`: for ABORT (project-level judgment) and REDUCE_SCOPE vs MODIFY_DAG (user's appetite for partial delivery).
Each reasoner caps itself at 2 ask iterations per invocation. Across a build, total asks are bounded by call-site count (each reasoner invocation has its own budget; cross-reasoner sharing wasn't feasible because `run_issue_advisor` / `run_replanner` are invoked across reasoner boundaries via `app.call()`).
Dependency bump: `hax-sdk>=0.2.0` → `>=0.2.4` in `requirements.txt`, `requirements-docker.txt`, and `pyproject.toml`. Docker pip cache keys on the constraint string; without the floor bump, cached layers keep installing whatever was first resolved. `FormBuilder` and `create_form_request` were already in 0.2.0; this is purely cache invalidation.
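
The 120s hard timeout around the form request can be illustrated with `asyncio.wait_for`. This is a sketch only: `call_with_hard_timeout` and `make_call` are hypothetical names, the request is assumed to be awaitable (a blocking SDK call would need to run in a thread instead), and returning `None` on timeout is an assumption about how the caller wants to see the failure.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_hard_timeout(
    make_call: Callable[[], Awaitable[T]],
    timeout_s: float = 120.0,
) -> Optional[T]:
    """Await make_call(), giving up after timeout_s seconds."""
    try:
        # wait_for cancels the inner awaitable once the timeout elapses.
        return await asyncio.wait_for(make_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Assumption: a timeout is reported as "no request created" so the
        # caller can fall back instead of hanging before the durable pause.
        return None
```

The timeout covers only the request that creates the form; the wait for the human's answer is the separate, durable pause and is not bounded by it.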
What this PR does NOT change
- The existing Phase 1.5 plan-approval gate (`type="plan-review-v2"`). Stays as-is; runs alongside the new substrate.
- Behavior when `HAX_API_KEY` is unset. `build_hax_client_from_env` returns `None`; the wrapper short-circuits. Pipeline behavior is identical to `main`.
- The structured-output approach (LLM emits `ask_user_form` in its structured response) rather than migrating to mid-turn tool calls, because the workflow pause is durable (hours/days) and a mid-turn tool would have to hold the LLM conversation open across that interval.
Test plan
- `python -m pytest tests/test_ask_user.py`: 17/17 pass locally
- `python -m pytest tests/test_hax_create_request_timeout.py`: 52/52 still pass
- `ruff check swe_af/hitl/ tests/test_ask_user.py <touched reasoner/prompt/schema files>`: clean
- `origin/main`: net zero new findings
- `HAX_API_KEY` set: LLM emits `ask_user_form`, form renders in Hub, submit, reasoner resumes with answers in `prior_user_responses` (deferred to follow-up; requires a live Hax + control plane)
Files touched
- `requirements.txt`, `requirements-docker.txt`, `pyproject.toml`: pin bump
- `swe_af/hitl/{__init__,ask_user,wrapper}.py`: new
- `tests/test_ask_user.py`: new
- `swe_af/reasoners/schemas.py`: `PRD.ask_user_form` field
- `swe_af/execution/schemas.py`: `IssueAdvisorDecision.ask_user_form`, `ReplanDecision.ask_user_form`
- `swe_af/reasoners/pipeline.py`: `run_product_manager` runs through `run_with_ask_user`
- `swe_af/reasoners/execution_agents.py`: `run_issue_advisor` and `run_replanner` likewise
- `swe_af/prompts/{product_manager,issue_advisor,replanner}.py`: when-to-ask guidance + `prior_user_responses` threading
🤖 Generated with Claude Code