Commit 2c582a7

fix(orchestrator): comprehensive workflow engine overhaul - 8 major fixes
Complete redesign of agent orchestrator workflow engine to fix critical bugs in todo tracking, message alternation, and continuation guidance. Evolved from rigid "force tools" approach to intelligent context-aware orchestration that follows the orchestrator.txt flow diagram correctly.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SUMMARY OF FIXES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**Fix #1: Todo Workflow Infinite Loops & Update-Before-Create**

Agent tried to update todos before creating list, causing crashes and loops. Graduated interventions gave contradictory instructions.

Solution:
- Added CRITICAL ERROR TO AVOID section in todo_operations tool
- Visual step-by-step workflow added to tool description
- Rewrote all 3 graduated intervention levels with consistent guidance
- Implemented proper Continue Flag pattern matching orchestrator.txt
- Added shouldContinueAfterChecks flag for correct workflow flow

Result: No more update-before-create crashes, workflow matches design exactly

**Fix #2: Response Loops (Agent Repeating Same Text)**

Agent stuck repeating same response infinitely when todos incomplete. pendingAutoContinueMessage was set but never injected.

Solution:
- Inject pendingAutoContinueMessage at iteration start
- Call injectAutoContinueIfTodosIncomplete() when no tools + active todos
- Graduated intervention (Level 1 → 2 → 3) now works as designed
- Remove last assistant message to prevent loops

Result: No more infinite response loops, graduated intervention working

**Fix #3: Tool Result Infinite Loop (read_tool_result Stuck)**

Agent stuck seeing SAME tool result chunk repeatedly for 15+ iterations. TOOL_RESULT_CHUNK messages preserved across iterations incorrectly.

Solution:
- Removed preservation logic for TOOL_RESULT_CHUNK messages
- Chunks now appear once when tool executes
- Agent must call read_tool_result to get more chunks
- Proper pagination flow restored

Result: No more chunk re-injection loops, clean pagination behavior

**Fix #4: Message Alternation Violations**

Multiple consecutive assistant messages broke Claude API compatibility.

Evolved through 3 iterations: Rigid → Binary → Flexible → Context-Aware

Final Solution - Todo-Aware Continuation Guidance:
- 4 guidance variants: (has todos YES/NO) × (tools used YES/NO)
- With todos + tools: "MANDATORY TODO WORKFLOW: mark → work → complete"
- With todos + no tools: "You have incomplete todos - MUST follow workflow"
- Without todos + tools: "Need more data? → tools. Have enough? → respond"
- Without todos + no tools: "Already answered? Use tools for follow-up"

Result: Fixes consecutive messages, allows flexibility, enforces discipline

**Fix #5: Planning Loop False Positives**

Planning loop detector flagged normal workflow (mark todo → work → complete) as infinite loop because it saw consecutive todo_operations calls.

Solution:
- Added isTodoCompletionCall() helper function
- Marking todos complete now counts as progress
- Only flag as loop if NO work tools AND NO todo completions

Result: Normal workflow allowed, actual loops still detected

**Fix #6: Stale Todo List (Workflow Stopped Early)**

Workflow stopped when todos incomplete because orchestrator saw stale "all complete" state. currentTodoList only updated after tool execution.

Solution:
- Read fresh todo list from MCP BEFORE every workflow check
- Only if currentTodoList.count > 0 (known active list exists)
- TodoReminderInjector: Clarified workflow guidance wording
- Makes explicit: mark in-progress → DO THE WORK → mark completed

Result: Workflows continue correctly, no premature stops, fresh state accurate

**Fix #7: Duplicate Tool Cards in Streaming Mode**

Tool cards appeared twice in UI - once from streaming, once from main loop.

Solution:
- Skip tool message creation in main loop when streaming active
- Check streamContinuation - if present, streaming created messages
- Non-streaming unchanged

Result: Clean UI, no duplicate cards

**Fix #8: Web Research Error Card Clutter**

Red error cards for expected situations (empty pages, no results). VectorRAGError helpful messages wrapped with confusing text.

Solution:
- Pass through VectorRAGError messages without wrapping
- Handle partial failures gracefully (some sources succeed = green card)
- Only show error if ALL sources fail

Result: Clean UI, helpful guidance preserved

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TESTING RESULTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Simple workflow (3 stories): All tracked, completed, brief summary
✅ Complex workflow (3 research tasks): All tracked, completed, brief summary
✅ Fresh todo reads detect incomplete todos correctly
✅ Agent doesn't repeat work in final summary
✅ No more response loops or chunk re-injection
✅ Planning loop detector allows normal workflow
✅ Context-aware continuation guidance works
✅ Streaming mode: no duplicate tool cards
✅ Web research: clean error handling
✅ Build: PASS (all commits)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DOCUMENTATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Added comprehensive documentation:
- project-docs/AGENT_ORCHESTRATOR.md (complete architecture)
  * Complete workflow flow diagram (Mermaid)
  * Detailed 8-step decision tree
  * Fresh todo state read documentation
  * Continuation priority table
  * All fixes documented with root causes
  * Known Issues updated with resolutions
- ai-assisted/2026-01-04/workflow-alternation-fix/
  * CONTINUATION_PROMPT.md (session handoff)
  * AGENT_PLAN.md (remaining work breakdown)
- .github/copilot-instructions.md
  * Added isBackground=false requirement (CRITICAL)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FILES MODIFIED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Primary:
- Sources/APIFramework/AgentOrchestrator.swift
  * Fresh todo reads before workflow checks
  * Context-aware continuation guidance system
  * Graduated intervention injection fixed
  * Planning loop detection improvements
  * Duplicate tool card fix for streaming
- Sources/MCPFramework/TodoReminderInjector.swift
  * Todo-aware workflow guidance (4 variants)
  * Clear final message guidance when all tasks complete
- Sources/MCPFramework/Tools/TodoOperationsTool.swift
  * CREATE FIRST requirement documentation
  * Visual step-by-step workflow added
- Sources/ConfigurationSystem/SimpleSystemPromptManager.swift
  * Todo workflow discipline documentation

Supporting:
- Sources/MCPFramework/Tools/WebResearchTool.swift
  * VectorRAGError pass-through without wrapping
- Sources/ConversationEngine/WebResearchService.swift
  * Partial failure handling (some sources succeed)

Documentation:
- project-docs/AGENT_ORCHESTRATOR.md (new + updated)
- ai-assisted/2026-01-04/workflow-alternation-fix/* (new)
- .github/copilot-instructions.md (updated)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ARCHITECTURAL IMPROVEMENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Context-Aware Guidance System:
- 4 guidance variants adapt to workflow state
- Based on: (has incomplete todos?) × (tools called?)
- Each variant provides specific, actionable instructions
- Prevents workflow violations through clear communication

Fresh State Reads:
- Todo list read fresh from MCP before every workflow check
- Prevents stale cache bugs that caused premature stops
- Only reads when active todo list exists (performance optimization)

Graduated Intervention Pressure:
- Level 1: Polite reminder about incomplete todos
- Level 2: Warning about loop behavior
- Level 3: Final warning before failure
- Escalates pressure if agent keeps ignoring todos

Proper Continue Flag Pattern:
- Matches orchestrator.txt flow diagram exactly
- Can be set by: tool execution, incomplete todos, workflow mode
- Priority ordering ensures correct continuation behavior

Unified Todo Workflow Discipline:
- All guidance sources enforce same workflow
- CREATE list first, THEN mark in-progress
- Mark in-progress → DO THE WORK → mark completed
- Never skip status updates

Code Quality:
- Eliminated ~100 lines of redundant code
- Cleaner separation: streaming vs non-streaming paths
- Better error handling and logging

Result: Intelligent orchestrator that adapts guidance based on workflow context while enforcing todo discipline and preventing loops.
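
The four-way split described in Fix #4 amounts to a lookup on two booleans. The following is a minimal Swift sketch of that idea, not the code from AgentOrchestrator.swift: the names `WorkflowState` and `continuationGuidance(for:)` are hypothetical, and only the four quoted guidance strings are taken from the commit message.

```swift
// Hypothetical type: the commit message only says the guidance depends on
// (has incomplete todos?) × (tools called?).
struct WorkflowState {
    let hasIncompleteTodos: Bool
    let toolsUsedThisIteration: Bool
}

/// Picks one of the four guidance variants quoted in the commit message.
/// The function name and signature are assumptions for illustration.
func continuationGuidance(for state: WorkflowState) -> String {
    switch (state.hasIncompleteTodos, state.toolsUsedThisIteration) {
    case (true, true):
        return "MANDATORY TODO WORKFLOW: mark → work → complete"
    case (true, false):
        return "You have incomplete todos - MUST follow workflow"
    case (false, true):
        return "Need more data? → tools. Have enough? → respond"
    case (false, false):
        return "Already answered? Use tools for follow-up"
    }
}

// Usage: an iteration that called no tools while todos remain open.
let stalled = WorkflowState(hasIncompleteTodos: true, toolsUsedThisIteration: false)
print(continuationGuidance(for: stalled))
```

Switching over a tuple keeps the mapping exhaustive, so the compiler would reject a missing variant.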
1 parent 8a54afe commit 2c582a7

10 files changed

Lines changed: 1571 additions & 680 deletions

.github/copilot-instructions.md

Lines changed: 27 additions & 0 deletions
@@ -155,6 +155,33 @@ Ready to end session? Press Enter:"
 
 ---
 
+## CRITICAL: run_in_terminal MUST NEVER USE isBackground=true
+
+**❌ NEVER DO THIS:**
+```bash
+run_in_terminal(command: "make build", isBackground: true) # WRONG! Causes silent failures
+run_in_terminal(command: "git commit", isBackground: true) # WRONG! Command gets cancelled
+```
+
+**✅ ALWAYS DO THIS:**
+```bash
+run_in_terminal(command: "make build", isBackground: false) # CORRECT
+run_in_terminal(command: "git commit", isBackground: false) # CORRECT
+```
+
+**WHY:**
+- `isBackground=true` causes commands to be cancelled/interrupted
+- You won't see output or know if command succeeded
+- Git commits, builds, and all other commands REQUIRE `isBackground=false`
+- This is a HARD REQUIREMENT - violations cause session failure
+
+**THE RULE:**
+- ALWAYS set `isBackground: false` for ALL commands
+- NEVER use `isBackground: true` for ANY command
+- If unsure, default to `false`
+
+---
+
 ## SAM-SPECIFIC DEVELOPMENT
 
 ### Build System

Sources/APIFramework/AgentOrchestrator.swift

Lines changed: 557 additions & 585 deletions
Large diffs are not rendered by default.
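
Because this diff is not rendered, the following is only a sketch of the planning-loop rule described in Fix #5 of the commit message, not the code in AgentOrchestrator.swift. `isTodoCompletionCall` is named in the commit message, but its signature here, the `ToolCall` type, and the `planningTools` set are assumptions made for illustration.

```swift
enum TodoStatus: String {
    case notStarted = "not-started"
    case inProgress = "in-progress"
    case completed
}

// Hypothetical, simplified view of one tool call made by the agent.
struct ToolCall {
    let name: String
    let statusUpdates: [TodoStatus]  // statuses set by a todo_operations update, if any
}

// Assumption: tools that plan or track work rather than produce results.
let planningTools: Set<String> = ["todo_operations", "think"]

/// A todo_operations call counts as progress when it marks a todo completed.
func isTodoCompletionCall(_ call: ToolCall) -> Bool {
    return call.name == "todo_operations" && call.statusUpdates.contains(.completed)
}

/// Flags a loop only if the recent calls contain NO work tools AND NO todo completions.
func isPlanningLoop(_ recentCalls: [ToolCall]) -> Bool {
    guard !recentCalls.isEmpty else { return false }
    let usedWorkTool = recentCalls.contains { !planningTools.contains($0.name) }
    let completedTodo = recentCalls.contains(where: isTodoCompletionCall)
    return !usedWorkTool && !completedTodo
}

// Usage: mark in-progress, do work, mark completed is normal workflow, not a loop.
let normalWorkflow = [
    ToolCall(name: "todo_operations", statusUpdates: [.inProgress]),
    ToolCall(name: "web_operations", statusUpdates: []),
    ToolCall(name: "todo_operations", statusUpdates: [.completed]),
]
print(isPlanningLoop(normalWorkflow))
```

How many recent calls to inspect is left to the caller here; the commit message does not state the window the real detector uses.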

Sources/ConfigurationSystem/SimpleSystemPromptManager.swift

Lines changed: 23 additions & 5 deletions
@@ -229,11 +229,29 @@ public class SimpleSystemPromptManager: ObservableObject {
 
 ## WORKFLOW TRACKING
 - **For multi-step tasks** (series of stories, multiple edits, sequential analysis, debugging, development, etc.):
-* Create todos: `{"name":"todo_operations","arguments":{"operation":"write","todoList":[{"id":1,"title":"Task 1","description":"...","status":"not-started"}]}}`
-* Mark complete: `{"name":"todo_operations","arguments":{"operation":"update","todoUpdates":[{"id":1,"status":"completed"}]}}`
-* This allows the system to track your progress and remind you of remaining items
-* Mark each todo as completed IMMEDIATELY after finishing it - do not continue same task
-* Example: User requests 7 stories → create 7 todos, write each story, mark each complete, then move to next
+
+**STEP 1 - Create the todo list (first time only):**
+* `{"name":"todo_operations","arguments":{"operation":"write","todoList":[{"id":1,"title":"Task 1","description":"...","status":"not-started"}]}}`
+* Set ALL todos as "not-started" when creating the list
+
+**STEP 2 - Mark one todo as in-progress:**
+* `{"name":"todo_operations","arguments":{"operation":"update","todoUpdates":[{"id":1,"status":"in-progress"}]}}`
+* Only do this AFTER the list exists (after STEP 1)
+
+**STEP 3 - Do the work:**
+* Execute the actual task using appropriate tools
+
+**STEP 4 - Mark todo complete:**
+* `{"name":"todo_operations","arguments":{"operation":"update","todoUpdates":[{"id":1,"status":"completed"}]}}`
+* Do this IMMEDIATELY after finishing each task
+
+**STEP 5 - Repeat:**
+* Go back to STEP 2 for next todo
+
+**CRITICAL - Common mistake:**
+* ❌ WRONG: Try to mark a todo in-progress before creating the list
+* ✅ CORRECT: Create list with 'write' operation FIRST, then update with 'update'
+
 - **Todo list format**: Array of objects with id, title, description, status ("not-started", "in-progress", "completed")
 - **Why use todos**: Enables progress tracking across long workflows, prevents stopping early
 """

Sources/ConfigurationSystem/SystemPromptConfiguration.swift

Lines changed: 26 additions & 53 deletions
@@ -520,59 +520,32 @@ private static func buildSAMSpecificPatterns() -> String {
 
 **Sequential Lists:** One item per message, emit continue after each (except last → complete).
 
-MULTI-STEP REQUESTS - TODO LIST WORKFLOW (MANDATORY):
-
-For multi-step tasks, you MUST use the todo_operations tool to plan and track progress:
-
-**STEP 1 - CREATE TODO LIST:**
-- Use todo_operations(write) to create a structured plan
-- Break work into actionable, trackable steps
-- Set the FIRST task as "in-progress"
-
-**STEP 2 - WORK ON EACH TODO:**
-- Before starting ANY todo: Ensure it is marked "in-progress"
-- Execute work tools (web_operations, file_operations, terminal_operations)
-- Produce tangible results (lists, files, charts, data)
-- Mark the todo "completed" IMMEDIATELY after finishing
-- Move to next todo and repeat
-
-**CRITICAL TODO WORKFLOW RULES:**
-- ALWAYS mark exactly ONE todo "in-progress" before starting work on it
-- ALWAYS mark a todo "completed" immediately after finishing (not in batches)
-- NEVER work on a task without first marking it "in-progress"
-- NEVER leave multiple todos in "in-progress" state
-- Update todos frequently - the user sees your progress through the todo list
-
-**CORRECT TODO SEQUENCE:**
-1. todo_operations(write) → create plan with first item in-progress
-2. Execute work tool → produce tangible result
-3. todo_operations(update: completed) → mark current done
-4. todo_operations(update: in-progress) → mark next task started
-5. Repeat until all complete
-
-**FAILURE PATTERNS TO AVOID:**
-- Creating todos but never calling todo_operations(update) to mark them complete = FAILURE
-- Writing "Task 1 complete" in your response instead of calling the tool = FAILURE
-- Doing work without calling todo_operations(update) afterward = FAILURE
-- Restating the todo list in plain text instead of calling the tool = FAILURE
-- Describing progress verbally but not updating the actual todo list = FAILURE
-
-**CRITICAL ANTI-PATTERN:**
-Saying "I've completed brainstorming" or "Task 1 is done" in your text response
-is NOT the same as calling todo_operations(update) to mark it completed.
-You MUST call the tool - the system cannot infer status from your text.
-
-**PLANNING LOOP DETECTION:**
-- If you've outlined the same plan 2+ times, you are stuck
-- STOP planning and immediately execute a work tool
-
-**TANGIBLE OUTPUT REQUIRED:**
-- Each step must produce tool-generated results (lists, files, charts, data)
-- Text summaries alone are NOT progress - use tools to produce deliverables
-
-**Collaboration Override:** If user asks to "check with me first" or "collaborate", wait for their response before proceeding.
-
-**Tool Results in History:** Previous tool outputs are YOUR results - use them, don't re-call tools.
+MULTI-STEP REQUESTS - TODO LIST WORKFLOW:
+
+**When to use todos:** Multi-step tasks that benefit from visible progress tracking
+
+**Starting fresh (no todos yet):**
+1. FIRST: Create todo list with todo_operations(operation: "write", todoList: [...])
+- Set first todo: "in-progress"
+- Set remaining todos: "not-started"
+2. Then proceed with workflow below
+
+**Working with existing todos:**
+1. Do the work for current in-progress todo
+2. Mark it completed: todo_operations(operation: "update", todoUpdates: [{"id": X, "status": "completed"}])
+3. Mark next todo in-progress: todo_operations(operation: "update", todoUpdates: [{"id": Y, "status": "in-progress"}])
+4. Repeat until all complete
+
+**CRITICAL RULES:**
+- ALWAYS create todos FIRST before trying to update them (NEVER call update when no todos exist)
+- You MUST call todo_operations(update) to change todo status - the system cannot infer status from your text
+- When completing a todo: Mark it done with the tool, then move to next todo - do NOT repeat the work in your response
+- Each todo gets ONE completion response - mark done and move forward
+
+**Anti-duplication:**
+- After completing Todo 1: Mark complete, start Todo 2, work on Todo 2
+- Do NOT: Complete Todo 1, mark done, then re-summarize Todo 1's results again
+- Tool call = progress indicator, not invitation to repeat output
 
 **Before Complete:** Verify ALL requested items delivered. If user asked for N things, confirm N things done.
 
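
The "create todos FIRST" rule in the prompt text above can also be enforced on the tool side by rejecting updates against an empty list. The sketch below is a hypothetical in-memory tracker, not the TodoOperationsTool from this commit; `TodoTracker`, `TodoError`, and their members are assumed names used only to illustrate the ordering constraint.

```swift
enum TodoStatus: String {
    case notStarted = "not-started"
    case inProgress = "in-progress"
    case completed
}

struct Todo {
    let id: Int
    let title: String
    var status: TodoStatus
}

// Hypothetical error cases for the two ways an update can be invalid.
enum TodoError: Error, Equatable {
    case listNotCreated      // update called before any write
    case unknownID(Int)      // update targets a todo that is not in the list
}

struct TodoTracker {
    private(set) var todos: [Todo] = []

    init() {}

    /// The "write" operation: creates or replaces the whole list.
    mutating func write(_ list: [Todo]) {
        todos = list
    }

    /// The "update" operation: only valid once a list exists.
    mutating func update(id: Int, status: TodoStatus) throws {
        guard !todos.isEmpty else { throw TodoError.listNotCreated }
        guard let index = todos.firstIndex(where: { $0.id == id }) else {
            throw TodoError.unknownID(id)
        }
        todos[index].status = status
    }

    /// True while any todo is not yet completed.
    var hasIncompleteTodos: Bool {
        return todos.contains { $0.status != .completed }
    }
}

// Usage: updating before creating fails; the documented order succeeds.
var tracker = TodoTracker()
do {
    try tracker.update(id: 1, status: .inProgress)
} catch {
    print("rejected: \(error)")
}
tracker.write([Todo(id: 1, title: "Task 1", status: .notStarted)])
try? tracker.update(id: 1, status: .inProgress)
try? tracker.update(id: 1, status: .completed)
print(tracker.hasIncompleteTodos)
```

Returning a distinct error for the empty-list case lets the caller answer with a "create the list first" hint instead of a generic failure.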
Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
+@@ -360,20 +360,6 @@
+- **Not give up unless request genuinely cannot be fulfilled**
+
+**Continue working until the user's request is completely resolved. Only stop when certain the task is complete. Do not stop when encountering uncertainty — research or deduce the most reasonable approach and continue.**
+-
+- **Handling Partial Success:**
+- - When some tool calls succeed and others fail, USE THE SUCCESSFUL DATA
+- - Partial data is often sufficient to make progress - do not wait for perfect information
+- - Do not focus on failures; extract maximum value from successes
+- - Multiple failed attempts do not mean the tool is broken - adapt your approach
+-
+- **Adaptation Strategy:**
+- - If a tool fails, immediately try an alternative approach:
+- * research_operations fails → try web_operations with operation='fetch'
+- * web_operations fails → try different keywords or file_operations
+- * terminal command fails → try alternative commands or approaches
+- - NEVER ask the user "what should I do?" when a tool fails
+- - Adapt autonomously and continue working toward the goal
+"""
+}
+
+@@ -463,7 +449,7 @@
+**Validation:** Read files back, count items processed, check for errors.
+
+**Conversational Partner Protocol:**
+- - After completing work, always provide a brief recap of what was accomplished.
++ - After signaling ``, always provide a brief recap of what was accomplished.
+- Explicitly invite further questions, suggestions, or next steps ("Is there anything else you'd like to do?").
+- If no immediate input from user, remain in a conversational 'ready' state, prepared to respond promptly to new requests.
+- Never terminate the conversation abruptly—always end with a clear invitation for continued engagement.
+@@ -476,15 +462,7 @@
+## Communication Protocol
+**During work:** Provide brief progress updates. Where appropriate, invite user input or confirmation—especially before proceeding to the next step in multi-phase tasks, or when user review may be beneficial.
+
+- **When complete:** Summarize accomplishments, present results, and ask if the user wants to review, continue, or discuss further.
+-
+- ## System Reminders
+-
+- During execution, you will receive <system-reminder> tags with context-specific guidance:
+- - These are time-sensitive instructions for the current iteration
+- - They provide continuation guidance, todo updates, or process instructions
+- - Pay close attention to system reminders - they guide workflow progression
+- - Follow system reminder instructions carefully
++ **When complete:** Summarize accomplishments, present results, and ask if the user wants to review, continue, or discuss further before emitting `` and stopping, unless the user prefers uninterrupted execution.
+
+**When blocked:** Explain what you tried, what's blocking you, and request specific information or guidance from the user.
+
+@@ -540,32 +518,57 @@
+- Never call the think tool twice in a row; if you do, immediately switch to tool-based execution.
+- Planning alone is not progress—after thinking, you MUST produce a tool-generated, user-facing deliverable.
+
+- MULTI-STEP REQUESTS - TODO LIST WORKFLOW:
+-
+- **When to use todos:** Multi-step tasks that benefit from visible progress tracking
+-
+- **Starting fresh (no todos yet):**
+- 1. FIRST: Create todo list with todo_operations(operation: "write", todoList: [...])
+- - Set first todo: "in-progress"
+- - Set remaining todos: "not-started"
+- 2. Then proceed with workflow below
+-
+- **Working with existing todos:**
+- 1. Do the work for current in-progress todo
+- 2. Mark it completed: todo_operations(operation: "update", todoUpdates: [{"id": X, "status": "completed"}])
+- 3. Mark next todo in-progress: todo_operations(operation: "update", todoUpdates: [{"id": Y, "status": "in-progress"}])
+- 4. Repeat until all complete
+-
+- **CRITICAL RULES:**
+- - ALWAYS create todos FIRST before trying to update them (NEVER call update when no todos exist)
+- - You MUST call todo_operations(update) to change todo status - the system cannot infer status from your text
+- - When completing a todo: Mark it done with the tool, then move to next todo - do NOT repeat the work in your response
+- - Each todo gets ONE completion response - mark done and move forward
+-
+- **Anti-duplication:**
+- - After completing Todo 1: Mark complete, start Todo 2, work on Todo 2
+- - Do NOT: Complete Todo 1, mark done, then re-summarize Todo 1's results again
+- - Tool call = progress indicator, not invitation to repeat output
++ **Sequential Lists:** One item per message, emit continue after each (except last → complete).
++
++ MULTI-STEP REQUESTS - TODO LIST WORKFLOW (MANDATORY):
++
++ For multi-step tasks, you MUST use the todo_operations tool to plan and track progress:
++
++ **STEP 1 - CREATE TODO LIST:**
++ - Use todo_operations(write) to create a structured plan
++ - Break work into actionable, trackable steps
++ - Set the FIRST task as "in-progress"
++
++ **STEP 2 - WORK ON EACH TODO:**
++ - Before starting ANY todo: Ensure it is marked "in-progress"
++ - Execute work tools (web_operations, file_operations, terminal_operations)
++ - Produce tangible results (lists, files, charts, data)
++ - Mark the todo "completed" IMMEDIATELY after finishing
++ - Move to next todo and repeat
++
++ **CRITICAL TODO WORKFLOW RULES:**
++ - ALWAYS mark exactly ONE todo "in-progress" before starting work on it
++ - ALWAYS mark a todo "completed" immediately after finishing (not in batches)
++ - NEVER work on a task without first marking it "in-progress"
++ - NEVER leave multiple todos in "in-progress" state
++ - Update todos frequently - the user sees your progress through the todo list
++
++ **CORRECT TODO SEQUENCE:**
++ 1. todo_operations(write) → create plan with first item in-progress
++ 2. Execute work tool → produce tangible result
++ 3. todo_operations(update: completed) → mark current done
++ 4. todo_operations(update: in-progress) → mark next task started
++ 5. Repeat until all complete
++
++ **FAILURE PATTERNS TO AVOID:**
++ - Creating todos but never calling todo_operations(update) to mark them complete = FAILURE
++ - Writing "Task 1 complete" in your response instead of calling the tool = FAILURE
++ - Doing work without calling todo_operations(update) afterward = FAILURE
++ - Restating the todo list in plain text instead of calling the tool = FAILURE
++ - Describing progress verbally but not updating the actual todo list = FAILURE
++
++ **CRITICAL ANTI-PATTERN:**
++ Saying "I've completed brainstorming" or "Task 1 is done" in your text response
++ is NOT the same as calling todo_operations(update) to mark it completed.
++ You MUST call the tool - the system cannot infer status from your text.
++
++ **PLANNING LOOP DETECTION:**
++ - If you've outlined the same plan 2+ times, you are stuck
++ - STOP planning and immediately execute a work tool
++
++ **TANGIBLE OUTPUT REQUIRED:**
++ - Each step must produce tool-generated results (lists, files, charts, data)
++ - Text summaries alone are NOT progress - use tools to produce deliverables
+
+**Collaboration Override:** If user asks to "check with me first" or "collaborate", wait for their response before proceeding.
+
+@@ -599,6 +602,63 @@
+"""
+}
+
++ /// Builds Think Tool Guidance section - Consolidated from multiple scattered sections.
++ private static func buildThinkToolGuidance() -> String {
++ return """
++ ### Think Tool (Supplemental)
++
++ Shows "Thinking..." to user for complex reasoning. Use sparingly - execution matters more.
++ Avoid think tool loops: plan once, execute, don't re-plan.
++ """
++ }
++
++ /// Builds Workflow Continuation Protocol section.
++ private static func buildWorkflowContinuationProtocol() -> String {
++ return """
++ ### Workflow Continuation (CRITICAL)
++
++ **The StatusSignalReminderInjector provides the status signal format - follow those instructions.**
++
++ **WITH TODO LIST:**
++ When user asks for multiple distinct outputs (e.g., "import X, analyze Y, create Z table"):
++ 1. FIRST: Create a todo list with ALL requested deliverables
++ 2. THEN: Execute each deliverable in sequence
++ 3. AFTER EACH: Mark todo complete AND emit the appropriate status signal
++ 4. FINALLY: Emit complete status only when ALL deliverables are provided
++
++ **WITHOUT TODO LIST (simple multi-step):**
++ For quick multi-step tasks that don't warrant a full todo list:
++ 1. Execute step → emit continue status
++ 2. When you receive the "continue" response from the system → Execute next step
++ 3. Repeat until last step → emit complete status
++
++ **TODO MANAGEMENT - USING todo_operations TOOL:**
++
++ Create todos (write):
++ `todo_operations(operation="write", todoList=[{"id":1,"title":"Task 1","description":"...","status":"not-started"},...])`
++
++ Mark in-progress (update):
++ `todo_operations(operation="update", todoUpdates=[{"id":1,"status":"in-progress"}])`
++
++ **CRITICAL - Mark completed (update):**
++ `todo_operations(operation="update", todoUpdates=[{"id":1,"status":"completed"},{"id":2,"status":"in-progress"}])`
++
++ **You MUST call the update operation to mark tasks completed. The system cannot infer completion.**
++
++ **AFTER COMPLETING ANY TODO - MANDATORY SEQUENCE:**
++ 1. You've done the work (e.g., brainstormed names)
++ 2. IMMEDIATELY call: {"name":"todo_operations","arguments":{"operation":"update","todoUpdates":[{"id":CURRENT_ID,"status":"completed"},{"id":NEXT_ID,"status":"in-progress"}]}}
++ 3. Then start the next task
++ 4. Do NOT output the same work twice - if you've brainstormed, mark complete and move to research
++
++ **LOOP PREVENTION:**
++ When you receive the "continue" response from the system, DO THE NEXT THING - don't describe the last thing.
++ Red flags: describing same work multiple times, asking "should I continue?", same output appearing twice.
++
++ **Remember:** If user asks for N things, deliver N things. Partial = Failure.
++ """
++ }
++
+/// Builds Workflow Mode execution behavior for complex multi-step workflows.
+private static func buildWorkflowMode() -> String {
+return """
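
The `todo_operations` payloads quoted in the prompt text above share one JSON shape, so they can be built from typed values rather than hand-written strings. This is a minimal sketch using Foundation's `JSONEncoder`; the Swift types (`TodoItem`, `TodoUpdate`, `TodoArguments`, `ToolInvocation`) and the helper `completeAndAdvance(current:next:)` are assumptions for illustration and are not taken from the repository. Only the JSON field names and status strings come from the quoted examples.

```swift
import Foundation

// Hypothetical Codable mirrors of the JSON shown in the prompt text.
struct TodoItem: Codable {
    let id: Int
    let title: String
    let description: String
    let status: String
}

struct TodoUpdate: Codable {
    let id: Int
    let status: String
}

struct TodoArguments: Codable {
    let operation: String
    var todoList: [TodoItem]? = nil      // present for "write"
    var todoUpdates: [TodoUpdate]? = nil // present for "update"
}

struct ToolInvocation: Codable {
    let name: String
    let arguments: TodoArguments
}

/// Builds the "mark current completed, mark next in-progress" update call.
func completeAndAdvance(current: Int, next: Int?) -> ToolInvocation {
    var updates = [TodoUpdate(id: current, status: "completed")]
    if let next = next {
        updates.append(TodoUpdate(id: next, status: "in-progress"))
    }
    return ToolInvocation(
        name: "todo_operations",
        arguments: TodoArguments(operation: "update", todoUpdates: updates)
    )
}

/// Encodes a call as compact JSON with keys in a stable (sorted) order.
func encode(_ call: ToolInvocation) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(call)
    return String(decoding: data, as: UTF8.self)
}

// Usage: finish todo 1 and start todo 2 in a single update call.
do {
    print(try encode(completeAndAdvance(current: 1, next: 2)))
} catch {
    print("encoding failed: \(error)")
}
```

Optional fields that are `nil` are omitted by the synthesized encoder, so an update call carries no `todoList` key and a write call carries no `todoUpdates` key.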
