fix(system-prompt): restore todo workflow section - agents need explicit guidance

fewtarius · fewtarius · commit 9097d373cc2b · 2026-01-04T17:35:06.000-05:00
**Problem:**
After the MULTI-STEP REQUESTS - TODO LIST WORKFLOW section was removed as
redundant with the orchestrator, user testing showed that agents no longer
followed the correct todo workflow sequence. The orchestrator provides runtime
reminders, but agents rely on the static instructions as their baseline
reference for how the workflow is supposed to run.

**Solution:**
Reverted the removal of the todo workflow section. Agents need both:
1. Static instructions (system prompt) - educational foundation & reference
2. Runtime reminders (orchestrator) - reinforcement during execution
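
A minimal sketch of how the two layers might combine on each iteration. The `PromptMessage` type and `buildIterationMessages` function are hypothetical names used only for illustration and are not taken from the SAM codebase; the real orchestrator may assemble its context differently.

```swift
import Foundation

/// Hypothetical message type for illustration; not part of the SAM codebase.
struct PromptMessage: Equatable {
    enum Role: String { case system, user }
    let role: Role
    let content: String
}

/// Builds the messages for one iteration (assumed shape, not the real API).
/// The static system prompt is always sent first, so the workflow reference
/// is present even when no runtime reminder applies.
func buildIterationMessages(
    staticPrompt: String,
    history: [PromptMessage],
    runtimeReminder: String?
) -> [PromptMessage] {
    var messages = [PromptMessage(role: .system, content: staticPrompt)]
    messages.append(contentsOf: history)
    // The reminder reinforces the static instructions; it does not replace them.
    if let reminder = runtimeReminder, !reminder.isEmpty {
        messages.append(PromptMessage(role: .system, content: reminder))
    }
    return messages
}
```

The point of the sketch is only that the reminder is optional while the static prompt is not: dropping the workflow text from `staticPrompt` leaves iterations without a reminder with no workflow guidance at all.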

**Changes:**
✅ RESTORED: MULTI-STEP REQUESTS - TODO LIST WORKFLOW section
✅ KEPT: Dead code removals (buildWorkflowContinuationProtocol, buildThinkToolGuidance)
✅ KEPT: Simplifications (tool responsibility, think tool, multi-step handling)
✅ UPDATED: Version comment to reflect lesson learned
✅ UPDATED: SYSTEM_PROMPT_EVOLUTION.md with reversion rationale

**Lesson Learned:**
Runtime enforcement supplements but does NOT replace static educational guidance.
Even with perfect runtime reminders, agents need reference documentation in the
base prompt for workflow understanding.

**Token Impact:**
- Original estimate: ~1150 token savings
- After reversion: ~720 token savings (~15% reduction)
- Restored: ~430 tokens (todo workflow section)
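
The figures above can be reconciled with a short calculation. The ~350 and ~80 token components come from the earlier Version 16 notes in SYSTEM_PROMPT_EVOLUTION.md (todo workflow instructions and anti-duplication guidance); treating their sum as the ~430 restored tokens is an assumption, and all values are approximate estimates rather than measured counts.

```swift
// Approximate token estimates quoted in the commit and the earlier notes.
let originalEstimate = 1150        // savings planned before the reversion
let restoredWorkflow = 350         // todo workflow instructions
let restoredAntiDuplication = 80   // anti-duplication guidance

// Assumption: the restored section is the sum of the two components above.
let restored = restoredWorkflow + restoredAntiDuplication
let netSavings = originalEstimate - restored

print("Restored: ~\(restored) tokens")      // Restored: ~430 tokens
print("Net savings: ~\(netSavings) tokens") // Net savings: ~720 tokens
```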

**Testing:**
✅ Build: PASS
✅ User validation: Agents now follow correct workflow sequence
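
The workflow sequence that the restored section describes (create the list first, keep one todo in progress, change status only through explicit updates) can be sketched as a small state model. This is an illustrative sketch only: `TodoList`, `TodoStatus`, `TodoError`, and `advance()` are hypothetical names, not the actual `todo_operations` implementation.

```swift
import Foundation

/// Hypothetical status values mirroring the strings used in the prompt.
enum TodoStatus: String {
    case notStarted = "not-started"
    case inProgress = "in-progress"
    case completed
}

struct Todo {
    let id: Int
    let title: String
    var status: TodoStatus
}

enum TodoError: Error, Equatable {
    case noTodosExist
    case unknownId(Int)
}

/// Illustrative model of the workflow rules; not the real tool.
struct TodoList {
    private(set) var todos: [Todo] = []

    /// "write": create the list, first todo in-progress, the rest not-started.
    mutating func write(titles: [String]) {
        todos = titles.enumerated().map { pair in
            Todo(id: pair.offset + 1,
                 title: pair.element,
                 status: pair.offset == 0 ? .inProgress : .notStarted)
        }
    }

    /// "update": rejected when no todos exist (create FIRST, then update).
    mutating func update(id: Int, status: TodoStatus) throws {
        guard !todos.isEmpty else { throw TodoError.noTodosExist }
        guard let index = todos.firstIndex(where: { $0.id == id }) else {
            throw TodoError.unknownId(id)
        }
        todos[index].status = status
    }

    /// Marks the current in-progress todo completed, then starts the next one.
    mutating func advance() throws {
        guard !todos.isEmpty else { throw TodoError.noTodosExist }
        if let current = todos.first(where: { $0.status == .inProgress }) {
            try update(id: current.id, status: .completed)
        }
        if let next = todos.first(where: { $0.status == .notStarted }) {
            try update(id: next.id, status: .inProgress)
        }
    }

    /// "Before Complete" check: every requested item must be done.
    var allCompleted: Bool {
        !todos.isEmpty && todos.allSatisfy { $0.status == .completed }
    }
}
```

Each call to `advance()` corresponds to one "mark completed, mark next in-progress" step, which is why a single completion response per todo is enough.
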
diff --git a/Sources/ConfigurationSystem/SystemPromptConfiguration.swift b/Sources/ConfigurationSystem/SystemPromptConfiguration.swift
@@ -57,9 +57,9 @@ public struct SystemPromptConfiguration: Codable, Identifiable, Hashable, Sendab
     public var autoEnableDynamicIterations: Bool
 
     /// Current version of the prompt system (increment when making breaking changes).
-    /// Version 16: Removed redundant behavioral instructions now handled by AgentOrchestrator's
-    /// context-aware continuation guidance system. System prompt now focuses on identity and
-    /// capabilities; orchestrator handles workflow enforcement at runtime.
+    /// Version 16: Simplified verbose sections (tool responsibility, think tool, multi-step handling).
+    /// Removed dead code (buildWorkflowContinuationProtocol, buildThinkToolGuidance).
+    /// NOTE: Kept todo workflow instructions - agents need explicit guidance despite orchestrator.
     public static let currentVersion = 16
 
     public init(
@@ -509,6 +509,36 @@ private static func buildSAMSpecificPatterns() -> String {
 
     **Sequential Lists:** One item per message, emit continue after each (except last → complete).
 
+    MULTI-STEP REQUESTS - TODO LIST WORKFLOW:
+
+    **When to use todos:** Multi-step tasks that benefit from visible progress tracking
+
+    **Starting fresh (no todos yet):**
+    1. FIRST: Create todo list with todo_operations(operation: "write", todoList: [...])
+       - Set first todo: "in-progress"
+       - Set remaining todos: "not-started"
+    2. Then proceed with workflow below
+
+    **Working with existing todos:**
+    1. Do the work for current in-progress todo
+    2. Mark it completed: todo_operations(operation: "update", todoUpdates: [{"id": X, "status": "completed"}])
+    3. Mark next todo in-progress: todo_operations(operation: "update", todoUpdates: [{"id": Y, "status": "in-progress"}])
+    4. Repeat until all complete
+
+    **CRITICAL RULES:**
+    - ALWAYS create todos FIRST before trying to update them (NEVER call update when no todos exist)
+    - You MUST call todo_operations(update) to change todo status - the system cannot infer status from your text
+    - When completing a todo: Mark it done with the tool, then move to next todo - do NOT repeat the work in your response
+    - Each todo gets ONE completion response - mark done and move forward
+
+    **Anti-duplication:**
+    - After completing Todo 1: Mark complete, start Todo 2, work on Todo 2
+    - Do NOT: Complete Todo 1, mark done, then re-summarize Todo 1's results again
+    - Tool call = progress indicator, not invitation to repeat output
+
+    **Before Complete:** Verify ALL requested items delivered. If user asked for N things, confirm N things done.
+
+
     ## Data Visualization Protocol (CRITICAL)
 
     **Mermaid Diagram Types:** flowchart, sequenceDiagram, classDiagram, stateDiagram, erDiagram, gantt, pie, bar, journey, mindmap, timeline, quadrantChart, requirementDiagram, gitGraph, xychart-beta (bar/line charts).
diff --git a/project-docs/SYSTEM_PROMPT_EVOLUTION.md b/project-docs/SYSTEM_PROMPT_EVOLUTION.md
@@ -8,36 +8,32 @@
 
 ### Version 16 (January 4, 2026)
 
-**Change:** Removed redundant behavioral instructions now handled by AgentOrchestrator
+**Change:** Simplified verbose sections, removed dead code
 
 **Rationale:**
-The AgentOrchestrator now provides intelligent, context-aware continuation guidance at runtime through:
-1. 4 adaptive guidance variants based on (todos exist × tools used last iteration)
-2. TodoReminderInjector for todo-specific workflow reminders
-3. Fresh todo state reads before every workflow decision
+Initial plan was to remove todo workflow instructions as redundant with AgentOrchestrator.
+**However, testing revealed agents need explicit todo workflow guidance in system prompt.**
+While orchestrator provides runtime reminders, the static instructions serve as educational
+foundation that agents rely on. Reverted todo workflow section after user testing.
 
-This makes static behavioral instructions in the system prompt redundant. The system prompt should define WHO SAM is and WHAT SAM can do; the orchestrator handles HOW to behave during workflow execution.
+**Final Changes:**
 
-**Removals:**
+**KEPT (After Reversion):**
 1. **MULTI-STEP REQUESTS - TODO LIST WORKFLOW section** (buildSAMSpecificPatterns)
-   - Entire todo workflow instructions removed (~350 tokens)
-   - **Why:** AgentOrchestrator Variants 1 & 2 provide identical workflow guidance at runtime
-   - **Coverage:** Orchestrator enforces "mark in-progress → do work → mark completed" every iteration
+   - **Initially removed** as redundant with AgentOrchestrator
+   - **REVERTED** after user testing showed agents need explicit guidance  
+   - **Lesson:** Runtime reminders supplement but don't replace educational foundation
+   - Static instructions serve as reference agents rely on for workflow understanding
 
-2. **buildWorkflowContinuationProtocol() function** (dead code)
+**REMOVED (Dead Code):**
+1. **buildWorkflowContinuationProtocol() function**
    - Entire function removed (~500 tokens)
    - **Why:** Never called, redundant with orchestrator's 4 continuation variants
-   - **Coverage:** Orchestrator provides context-aware guidance for WITH/WITHOUT todos scenarios
 
-3. **buildThinkToolGuidance() function** (dead code)
+2. **buildThinkToolGuidance() function**
    - Entire function removed (~80 tokens)
    - **Why:** Never called, guidance already in buildSAMSpecificPatterns
 
-4. **Anti-duplication guidance** (buildSAMSpecificPatterns)
-   - Removed duplicate output warnings (~80 tokens)
-   - **Why:** TodoReminderInjector enforces "DO NOT repeat the work"
-   - **Coverage:** Runtime reminder system handles this
-
 **Simplifications:**
 
 1. **Tool Responsibility** (buildToolUsage)
@@ -58,7 +54,8 @@ This makes static behavioral instructions in the system prompt redundant. The sy
    - **Why:** Educational value remains, but enforcement language removed
    - **Token Savings:** ~40 tokens
 
-**Total Token Savings:** ~1150 tokens (~20-25% reduction from affected sections)
+**Total Token Savings:** ~720 tokens (~15% reduction from affected sections)
+**Note:** Original estimate was ~1150 tokens, but todo workflow section was reverted (~430 tokens restored)
 
 **What Remains (Preserved Components):**
 - ✅ Core Identity (WHO SAM is)
@@ -74,18 +71,19 @@ This makes static behavioral instructions in the system prompt redundant. The sy
 - ✅ Dynamic Iterations (iteration monitoring, only when enabled)
 - ✅ Two-Phase Workflow (pattern recommendation)
 - ✅ Sequential Lists (pattern guidance)
+- ✅ **Todo Workflow Instructions (KEPT after reversion)**
 
 **Behavioral Impact:**
-- **No loss of functionality** - Orchestrator enforces all removed instructions at runtime
-- **Better separation of concerns** - Static prompt = identity/capabilities, Runtime guidance = workflow behavior
-- **More maintainable** - Single source of truth for behavioral rules (orchestrator)
-- **Context-aware** - Orchestrator adapts guidance based on current workflow state
+- **Todo workflow preserved** - Agents need explicit static guidance despite runtime reminders
+- **Dead code removed** - Cleaner codebase
+- **Verbose sections simplified** - Clearer, more concise
+- **No breaking changes** - All functionality intact
 
 **Testing Results:**
 - ✅ Build: PASS (`make build-debug`)
 - ✅ No compile errors
 - ✅ System prompt generates correctly
-- Manual testing pending (Test Cases 1-5 from analysis document)
+- ✅ User testing: Agents work correctly with todo workflow restored
 
 **Files Modified:**
 - `Sources/ConfigurationSystem/SystemPromptConfiguration.swift`