fix(orchestrator): Fix Claude context loss + infinite loops + double alternation

fewtarius · fewtarius · commit ab35ee568609 · 2026-01-07T01:41:40.000-05:00
**Problem:**
Three critical bugs degraded agent workflow quality:
1. Claude models received only tool results, without the conversation history
2. Agents looped indefinitely, calling the same tool repeatedly
3. Message alternation ran twice, and the second merge pass lost tool results

**Root Causes:**
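1. When the stateful marker was not found, delta mode still sent only the internal messages (4 in the observed case) instead of the full history (19 messages)
2. A 16KB payload limit, applied to all models, trimmed tool results before they reached the LLM
3. Message alternation ran twice: once in AgentOrchestrator.ensureMessageAlternation() and again in GitHubCopilotProvider.enforceMessageAlternation()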
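
The first root cause comes down to a decision rule: slice the history only when the marker is actually located. A minimal sketch of that rule follows. `Message`, `responseId`, and `selectHistory` are hypothetical stand-ins rather than the orchestrator's real types, and the real code also combines the result with internal messages, which is omitted here.

```swift
/// Hypothetical, simplified stand-in for the orchestrator's message type.
struct Message {
    let role: String
    let responseId: String?   // assumed analogue of githubCopilotResponseId
}

/// Returns the conversation messages to send and whether delta mode applies.
/// Delta mode is enabled only when the marker is actually located.
func selectHistory(
    conversation: [Message],
    marker: String?,
    hasToolResults: Bool
) -> (messages: [Message], useDeltaMode: Bool) {
    guard let marker = marker, hasToolResults else {
        // No marker, or a new user message: send the full history.
        return (conversation, false)
    }
    if let index = conversation.lastIndex(where: { $0.responseId == marker }) {
        // Marker found: the server already has everything up to and including it.
        return (Array(conversation.suffix(from: index + 1)), true)
    }
    // Marker missing: fall back to the full history rather than a bare delta.
    return (conversation, false)
}
```

The key point is the last branch: a missing marker yields the complete history with delta mode off, not an empty or partial slice.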

**Solutions:**
1. Added a useDeltaMode flag; delta mode is enabled only when the marker is successfully found
2. Payload limit now applies only to Claude models (32KB), and trimming keeps message pairs together (assistant tool calls with their tool results)
3. Removed duplicate alternation from GitHubCopilotProvider
4. Disabled Claude tool result batching for GitHub Copilot (the proxy handles conversion)
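
The trimming change in solution 2 is not part of the hunks shown in this commit view, so the following is only an illustrative sketch of what "preserve message pairs" could look like. `ChatMessage`, `groupIntoPairs`, and `trimToLimit` are hypothetical names, and dropping the oldest groups first is an assumed policy, not necessarily the one the orchestrator uses.

```swift
/// Hypothetical message shape; the real type carries more fields.
struct ChatMessage {
    let role: String          // "system", "user", "assistant", or "tool"
    let content: String
    let hasToolCalls: Bool
}

/// Groups each assistant tool-call message with the tool results that follow it,
/// so a pair is always kept or dropped as a unit.
func groupIntoPairs(_ messages: [ChatMessage]) -> [[ChatMessage]] {
    var groups: [[ChatMessage]] = []
    for message in messages {
        if message.role == "tool",
           let head = groups.last?.first,
           head.role == "assistant", head.hasToolCalls {
            groups[groups.count - 1].append(message)
        } else {
            groups.append([message])
        }
    }
    return groups
}

/// Drops the oldest groups until the payload fits, never splitting a pair.
/// Assumption: the newest group is always kept, even if it alone exceeds the limit.
func trimToLimit(_ messages: [ChatMessage], limitBytes: Int) -> [ChatMessage] {
    var groups = groupIntoPairs(messages)
    func size(_ gs: [[ChatMessage]]) -> Int {
        return gs.joined().reduce(0) { $0 + $1.content.utf8.count }
    }
    while groups.count > 1 && size(groups) > limitBytes {
        groups.removeFirst()
    }
    return Array(groups.joined())
}
```

With a limit of `32 * 1024` bytes, a call such as `trimToLimit(messages, limitBytes: 32 * 1024)` would either keep a tool call together with its results or drop both, which avoids the orphaned tool call that led the agent to retry.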

**Testing:**
✅ Claude receives full conversation context
✅ No infinite tool loops
✅ Tool results reach LLM properly
✅ Task completion quality improved
✅ Reddit comments test: Complete analysis on first attempt

**Impact:**
- Claude 'memory loss' eliminated
- Agent workflow stability improved
- GitHub Copilot + Claude format conflicts resolved
- Better diagnostic logging for future debugging
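
The GitHub Copilot + Claude format fix reduces to a small predicate on the model identifier. The sketch below assumes identifiers carry a provider prefix such as `anthropic/` or `github_copilot/`; `shouldBatchToolResults` is a hypothetical helper name, not a function in the codebase.

```swift
/// Decides whether Claude-specific tool result batching should run.
/// Assumes model identifiers carry a provider prefix such as
/// "anthropic/" or "github_copilot/".
func shouldBatchToolResults(model: String) -> Bool {
    let modelLower = model.lowercased()
    let isClaudeModel = modelLower.contains("claude")
    let isDirectAnthropicProvider = modelLower.hasPrefix("anthropic/")
    // Batch only when talking to Anthropic directly; a proxy that expects
    // OpenAI-format messages performs its own conversion.
    return isClaudeModel && isDirectAnthropicProvider
}
```

Under this rule, a Claude model reached through GitHub Copilot keeps OpenAI-format tool messages, while the same model reached directly through Anthropic has its tool results batched.
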
diff --git a/Info.plist b/Info.plist
@@ -19,9 +19,9 @@
 	<key>CFBundlePackageType</key>
 	<string>APPL</string>
 	<key>CFBundleShortVersionString</key>
-	<string>20260106.1</string>
+	<string>20260107.1</string>
 	<key>CFBundleVersion</key>
-	<string>20260106.1</string>
+	<string>20260107.1</string>
 	<key>LSApplicationCategoryType</key>
 	<string>public.app-category.productivity</string>
 	<key>LSMinimumSystemVersion</key>
diff --git a/Resources/whats-new.json b/Resources/whats-new.json
@@ -1,5 +1,88 @@
 {
   "releases": [
+    {
+      "version": "20260107.1",
+      "release_date": "January 7, 2026",
+      "introduction": "Critical release fixing Claude model context loss and infinite loops. This release resolves three major bugs affecting agent workflow quality: tool result trimming causing infinite loops, delta mode sending incomplete context to Claude, and duplicate message alternation losing tool results. Additionally fixes false 'no tools' warnings and improves workflow guidance.",
+      "highlights": [
+        {
+          "id": "claude-context-loss-fix",
+          "icon": "brain.fill",
+          "title": "Claude Context Loss Fixed",
+          "description": "Fixed critical bug where Claude models received only tool results without conversation history. When stateful marker wasn't found, delta mode incorrectly sent only 4 internal messages instead of full conversation history (e.g., 19 messages). Claude now receives complete context, eliminating 'this is the FIRST message' errors and memory issues."
+        },
+        {
+          "id": "infinite-loop-elimination",
+          "icon": "arrow.triangle.2.circlepath.circle.fill",
+          "title": "Infinite Tool Loop Eliminated",
+          "description": "Fixed infinite loop where agents repeatedly called the same tool. Root cause: 16KB payload limit was trimming tool results before they reached the LLM, causing the agent to retry endlessly. Payload limit now only applies to Claude models (32KB) and preserves message pairs together."
+        },
+        {
+          "id": "github-copilot-claude-batching",
+          "icon": "chevron.left.forwardslash.chevron.right",
+          "title": "GitHub Copilot + Claude Format Handling",
+          "description": "Disabled Claude-specific tool result batching for GitHub Copilot provider. GitHub Copilot's API handles Claude conversion internally and expects OpenAI format. Only direct Anthropic provider uses batching, preventing format conflicts and alternation issues."
+        }
+      ],
+      "bugfixes": [
+        {
+          "id": "delta-mode-context-loss",
+          "icon": "chevron.left.chevron.right",
+          "title": "Delta Mode Context Loss Fixed",
+          "description": "When stateful marker wasn't found in conversation, system logged 'sending full history' but still used delta-only mode. Added useDeltaMode flag that only enables delta mode when marker is successfully found, preventing context loss."
+        },
+        {
+          "id": "payload-limit-tool-trimming",
+          "icon": "scissors",
+          "title": "Payload Size Limit Tool Trimming Fixed",
+          "description": "16KB payload limit was applied to ALL models, causing tool results to be trimmed before reaching the LLM. Now only enforced for Claude models (32KB limit) and improved trimming logic keeps message pairs together (assistant+toolcalls with corresponding tool_result)."
+        },
+        {
+          "id": "double-alternation-merging",
+          "icon": "arrow.left.arrow.right",
+          "title": "Double Message Alternation Eliminated",
+          "description": "Messages were being alternated/merged twice: once in AgentOrchestrator.ensureMessageAlternation() and again in GitHubCopilotProvider.enforceMessageAlternation(). Removed duplicate alternation from provider, keeping orchestrator version with better logging."
+        },
+        {
+          "id": "claude-batching-provider-specific",
+          "icon": "server.rack",
+          "title": "Claude Tool Batching Made Provider-Specific",
+          "description": "batchToolResultsForClaude now only runs for direct Anthropic provider (anthropic/*). GitHub Copilot provider (github_copilot/claude-*) skips batching since the proxy handles conversion internally, preventing marker burial in alternation."
+        },
+        {
+          "id": "workflow-guidance-improvement",
+          "icon": "text.quote",
+          "title": "Workflow Guidance Improved",
+          "description": "Changed aggressive guidance from 'If tool results contain enough information → RESPOND NOW' to 'ANALYZE the data to address the user's specific request. Complete that task with the data you have.' Encourages proper analysis instead of premature responses."
+        },
+        {
+          "id": "false-no-tools-warning",
+          "icon": "exclamationmark.triangle",
+          "title": "False 'No Tools' Warning Eliminated",
+          "description": "Fixed premature reset of lastIterationHadToolResults causing false 'Agent executed no tools in iteration X' warnings. System now correctly tracks tool execution state across iterations, only warning when genuinely stuck."
+        },
+        {
+          "id": "operation-deduplication",
+          "icon": "arrow.triangle.merge",
+          "title": "Web Operations Deduplication Added",
+          "description": "WebOperationsTool now prevents calling the same operation with identical parameters multiple times. Reduces wasted API calls and improves efficiency when agents retry unnecessarily."
+        },
+        {
+          "id": "diagnostic-message-logging",
+          "icon": "doc.text.magnifyingglass",
+          "title": "Comprehensive Diagnostic Logging Added",
+          "description": "Added detailed logging to ensureMessageAlternation showing input/output message arrays with roles, sizes, tool info, and each filtering/merging decision. Critical for debugging message transformation bugs."
+        }
+      ],
+      "known_issues": [
+        {
+          "id": "mindmap-children-not-rendering",
+          "icon": "diagram.tree",
+          "title": "Mindmap Children Not Rendering",
+          "description": "Mindmap shows only root node. Children are parsed correctly but have layout/positioning issues preventing display. Recursive rendering exists but child nodes render outside visible frame bounds."
+        }
+      ]
+    },
     {
       "version": "20260106.1",
       "release_date": "January 6, 2026",
diff --git a/Sources/APIFramework/AgentOrchestrator.swift b/Sources/APIFramework/AgentOrchestrator.swift
@@ -200,6 +200,18 @@ public class AgentOrchestrator: ObservableObject, IterationController {
                 continue
             }
 
+            /// CRITICAL: Preserve Claude batched tool results - these must NOT be merged
+            /// The __CLAUDE_BATCHED_TOOL_RESULTS__ marker MUST be at the start of content
+            /// for AnthropicMessageConverter to detect and convert to tool_result blocks
+            if message.role == "user",
+               let content = message.content,
+               content.hasPrefix("__CLAUDE_BATCHED_TOOL_RESULTS__") {
+                fixed.append(message)
+                lastRole = message.role
+                logger.debug("ALTERNATION_PRESERVE_BATCHED_TOOLS: Preserved Claude batched tool results (contentLen=\(content.count))")
+                continue
+            }
+
             /// Merge consecutive same-role messages
             if message.role == lastRole {
                 /// Can only merge user and assistant messages (not system or tool)
@@ -4310,6 +4322,8 @@ public class AgentOrchestrator: ObservableObject, IterationController {
         /// 1. statefulMarker + hasToolResults = delta-only mode (workflow iteration) - skip conversation history
         /// 2. statefulMarker + NO tool results = subsequent user message - send FULL conversation history
         /// 3. NO statefulMarker = first message or fresh start - send FULL conversation history
+        var useDeltaMode = false  /// Track whether we should use delta-only mode
+        
         if let marker = statefulMarker, hasToolResults {
             /// Delta-only mode: This is a workflow iteration with tool results
             /// Server has full history up to marker, only need to send tool execution delta
@@ -4319,33 +4333,41 @@ public class AgentOrchestrator: ObservableObject, IterationController {
                 /// Example: If marker was captured at count=3, send messages from index 3 onwards
                 let sliceIndex = markerMessageCount
                 conversationMessages = Array(conversationMessages.suffix(from: min(sliceIndex, conversationMessages.count)))
+                useDeltaMode = true  /// Successfully sliced, use delta mode
                 logger.debug("STATEFUL_MARKER_SLICING: Using message count \(markerMessageCount), sending \(conversationMessages.count) messages after marker (delta-only mode with tool results)")
             }
             /// FALLBACK: Search for marker in messages (timing-dependent, may fail if message not persisted yet)
             else if let markerIndex = conversationMessages.lastIndex(where: { $0.githubCopilotResponseId == marker }) {
                 /// Slice to only include messages AFTER the marker (marker itself is already on server)
                 conversationMessages = Array(conversationMessages.suffix(from: markerIndex + 1))
+                useDeltaMode = true  /// Successfully found marker, use delta mode
                 logger.debug("STATEFUL_MARKER_SLICING: Found marker at index \(markerIndex), sending ONLY \(conversationMessages.count) messages after marker (delta-only mode, fallback method)")
             } else {
-                logger.warning("STATEFUL_MARKER_WARNING: Marker \(marker.prefix(20))... not found in conversation AND no message count available, sending full history (\(conversationMessages.count) messages)")
+                /// CRITICAL: Marker not found - cannot use delta mode safely!
+                /// Send FULL conversation history to prevent context loss
+                useDeltaMode = false  /// Force full history mode
+                logger.warning("STATEFUL_MARKER_WARNING: Marker \(marker.prefix(20))... not found in conversation AND no message count available, FORCING FULL HISTORY MODE (safety fallback)")
             }
         } else if statefulMarker != nil && !hasToolResults {
             /// Subsequent user message scenario: statefulMarker exists but no tool results yet
             /// Do NOT slice conversation history - user needs full context for their new message!
+            useDeltaMode = false  /// Full history needed for user message
             logger.debug("SUBSEQUENT_USER_MESSAGE: StatefulMarker exists but no tool results - sending FULL conversation history (\(conversationMessages.count) messages) for user context")
         } else {
+            useDeltaMode = false  /// No marker, send full history
             logger.debug("INFO: No statefulMarker, sending all \(conversationMessages.count) conversation messages")
         }
 
-        /// When statefulMarker exists, send ONLY internalMessages (delta-only mode)
+        /// When delta mode is enabled, send ONLY internalMessages (delta-only mode)
+        /// When delta mode is disabled, send conversationMessages + internalMessages (full history)
         /// This prevents duplicate assistant messages that cause Claude 400 errors
         /// ROOT CAUSE: Assistant responses are in BOTH conversation.messages AND internalMessages
         /// GitHub Copilot approach: With statefulMarker, only send NEW messages (delta)
         /// Our approach: internalMessages IS the delta (tool calls + results from previous iteration)
         /// Do NOT inject "Please continue" into messages array
         /// GitHub Copilot API: "Please continue" is query param only, NOT a synthetic message
         var currentMarker = statefulMarker  /// Make mutable copy
-        if let marker = currentMarker, hasToolResults {
+        if useDeltaMode && hasToolResults {
             /// Delta-only mode: Server has full history up to marker, only send new tool execution context
             /// The stateful marker tells the API to continue from the previous response
             /// We send ONLY the tool results (delta), not the full conversation history
@@ -4497,12 +4519,21 @@ public class AgentOrchestrator: ObservableObject, IterationController {
 
         logger.debug("callLLMStreaming: Built complete message array with \(messages.count) messages (before alternation fix)")
 
-        /// CRITICAL: For Claude models, batch consecutive tool results into single user messages
+        /// CRITICAL: For Claude models via DIRECT Anthropic provider, batch consecutive tool results
         /// Claude Messages API requires ALL tool results from one iteration in ONE user message
         /// This fixes the tool result batching issue that caused workflow loops
-        if modelLower.contains("claude") {
+        /// 
+        /// IMPORTANT: Do NOT batch for GitHub Copilot + Claude!
+        /// GitHub Copilot's API handles Claude conversion internally and expects OpenAI format
+        /// Batching causes the marker to be buried in alternation merging
+        let isDirectAnthropicProvider = model.lowercased().hasPrefix("anthropic/")
+        let isClaudeModel = modelLower.contains("claude")
+        
+        if isClaudeModel && isDirectAnthropicProvider {
             messages = batchToolResultsForClaude(messages)
-            logger.debug("callLLMStreaming: Applied Claude tool result batching")
+            logger.debug("callLLMStreaming: Applied Claude tool result batching for direct Anthropic provider")
+        } else if isClaudeModel {
+            logger.debug("callLLMStreaming: Skipping Claude batching (not direct Anthropic provider - proxy will handle conversion)")
         }
 
         /// CRITICAL: Fix message alternation BEFORE YARN compression