# Feature/configurable memory compression #454

seanturner83 wants to merge 3 commits into usestrix:main
Conversation
## Greptile Summary

This PR makes the memory compressor thresholds configurable via three new env vars and adds a second prompt-cache breakpoint on the agent identity message. The config and cache changes are clean; the main concern is in `strix/llm/memory_compressor.py` — specifically the `truncate_tool_outputs` method and `TOOL_TRUNCATION_NOTICE` constant.

**Confidence Score: 4/5.** Safe to merge once the role-filtering bug in `truncate_tool_outputs` is fixed. The feature is opt-in (default off), so existing deployments are unaffected, but enabling it as documented could truncate system prompts. One P1 defect: `truncate_tool_outputs` has no role guard, so enabling `STRIX_MAX_TOOL_OUTPUT_CHARS` truncates all message types, including system messages. The bug only activates when the new env var is set, keeping the default path safe, but the PR is intended to be used with non-zero values in production.
### `strix/llm/memory_compressor.py`, lines 198-221

```python
def truncate_tool_outputs(self, messages: list[dict[str, Any]]) -> None:
    """Truncate large tool output messages in-place.

    This prevents oversized tool results (nmap scans, file contents, etc.)
    from accumulating in the conversation history and being resent on every
    subsequent LLM call. Applied at ingestion time before the history grows.
    """
    if self.max_tool_output_chars <= 0:
        return

    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str) and len(content) > self.max_tool_output_chars:
            msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
        elif isinstance(content, list):
            for item in content:
                if (
                    isinstance(item, dict)
                    and item.get("type") == "text"
                    and len(item.get("text", "")) > self.max_tool_output_chars
                ):
                    item["text"] = _truncate_tool_output(
                        item["text"], self.max_tool_output_chars
                    )
```
**`truncate_tool_outputs` mutates non-tool messages**

The loop iterates over every message regardless of role, so when `STRIX_MAX_TOOL_OUTPUT_CHARS` is set, system messages, user messages, and assistant messages are all subject to truncation — not just tool outputs. A system prompt larger than `max_tool_output_chars` (e.g. 8,000 chars) would be silently truncated, stripping part of the agent's instructions on every call. In the list-content branch the `type == "text"` guard already avoids native Anthropic `tool_result` items, but the string-content branch has no role guard at all.
Suggested fix:

```python
for msg in messages:
    if msg.get("role") not in ("tool",):
        continue
    content = msg.get("content", "")
    if isinstance(content, str) and len(content) > self.max_tool_output_chars:
        msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
    elif isinstance(content, list):
        for item in content:
            if (
                isinstance(item, dict)
                and item.get("type") == "text"
                and len(item.get("text", "")) > self.max_tool_output_chars
            ):
                item["text"] = _truncate_tool_output(
                    item["text"], self.max_tool_output_chars
                )
```
If tool results can also arrive embedded inside `role: "user"` messages (e.g. Anthropic `type: "tool_result"` items), that branch would also need a `type == "tool_result"` check on the inner items to cover those correctly.
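A hedged sketch of how the combined guard could look, written as a standalone function with a stubbed `_truncate_tool_output` helper for self-containment; the `role` checks and the `tool_result` handling are illustrative, not the PR's actual code:

```python
# Illustrative stub: the real helper preserves head + tail of the output.
def _truncate_tool_output(text: str, max_chars: int) -> str:
    half = max_chars // 2
    return text[:half] + "\n[...truncated...]\n" + text[-half:]


def truncate_tool_outputs(messages: list[dict], max_chars: int) -> None:
    """Truncate only genuine tool output, in-place."""
    if max_chars <= 0:
        return
    for msg in messages:
        role = msg.get("role")
        content = msg.get("content", "")
        if role == "tool" and isinstance(content, str) and len(content) > max_chars:
            msg["content"] = _truncate_tool_output(content, max_chars)
        elif isinstance(content, list):
            for item in content:
                if not isinstance(item, dict):
                    continue
                # Plain text items are only touched on tool messages...
                if (
                    role == "tool"
                    and item.get("type") == "text"
                    and len(item.get("text", "")) > max_chars
                ):
                    item["text"] = _truncate_tool_output(item["text"], max_chars)
                # ...while Anthropic-style tool_result items embedded in user
                # messages get their own type check (string content only here).
                elif (
                    role == "user"
                    and item.get("type") == "tool_result"
                    and isinstance(item.get("content"), str)
                    and len(item["content"]) > max_chars
                ):
                    item["content"] = _truncate_tool_output(item["content"], max_chars)
```

With this shape, a long system prompt passes through untouched while both delivery formats of tool output are truncated.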
### `strix/llm/memory_compressor.py`, lines 16-19

```python
TOOL_TRUNCATION_NOTICE = (
    "\n\n[Output truncated from {original_len} to 31,196 characters. "
    "Full output was captured but condensed to reduce context size.]"
)
```
**Misleading truncation notice may deceive the security agent**

"Full output was captured but condensed to reduce context size" implies the information exists somewhere in compressed form. In reality the middle portion is permanently deleted — there is no condensed version. A security agent reading this in its context window may incorrectly infer that missing findings (credentials, nmap results, etc.) are still accessible elsewhere and not re-run the relevant tool to recover them.
Suggested fix:

```python
TOOL_TRUNCATION_NOTICE = (
    "\n\n[Output truncated: showing first {head_len} and last {tail_len} characters "
    "of {original_len}-character output (limit: 31,196). "
    "The middle portion has been permanently removed.]"
)
```
Note: with this template you would also need to pass `head_len` and `tail_len` into the `.format()` call in `_truncate_tool_output`.
### `strix/llm/llm.py`, lines 387-395

```python
if len(result) > 1 and "<agent_identity>" in str(result[1].get("content", "")):
    content = result[1]["content"]
    if isinstance(content, str):
        result[1] = {
            **result[1],
            "content": [
                {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
            ],
        }
```
**Cache breakpoint 2 silently skips list-typed content**

The block only wraps the agent identity message when `isinstance(content, str)`. If the content is already a list (e.g. the first cache breakpoint already converted it, or the caller built a multi-part message), the `<agent_identity>` check still matches via `str(...)` but no `cache_control` breakpoint is applied, silently losing the caching benefit. Adding an `else` branch to append `cache_control` to the last item of an existing list would make the behaviour consistent with how the Anthropic SDK expects breakpoints on list content.
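A hedged sketch of such an `else` branch, using a stand-in `result` list so the example is self-contained; only the shape of the messages matters here, not the real contents `llm.py` builds:

```python
# Stand-in message list: the second message already has list-typed content,
# as it would after the first cache breakpoint converted it.
result = [
    {"role": "system", "content": "system prompt"},
    {
        "role": "user",
        "content": [{"type": "text", "text": "<agent_identity>scout</agent_identity>"}],
    },
]

if len(result) > 1 and "<agent_identity>" in str(result[1].get("content", "")):
    content = result[1]["content"]
    if isinstance(content, str):
        # Existing branch: wrap string content in a single cached text block.
        result[1] = {
            **result[1],
            "content": [
                {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
            ],
        }
    elif isinstance(content, list) and content:
        # Proposed else branch: mark the last item of an existing list, which
        # is where Anthropic expects a breakpoint on multi-part content.
        last = content[-1]
        if isinstance(last, dict):
            last.setdefault("cache_control", {"type": "ephemeral"})
```

`setdefault` keeps the branch idempotent if an earlier pass already placed a breakpoint on that item.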
## Summary
Adds three environment variables to tune the memory compressor for large-scale scanning campaigns, plus an additional prompt cache breakpoint for improved cache hit rates on Anthropic models.
### New environment variables

- `STRIX_MAX_CONTEXT_TOKENS`
- `STRIX_MIN_RECENT_MESSAGES`
- `STRIX_MAX_TOOL_OUTPUT_CHARS`

### Prompt caching improvement
Adds a second `cache_control` breakpoint on the agent identity message (the `<agent_identity>` tag), which is stable for the lifetime of each agent. This complements the existing system prompt breakpoint. Related to #279.

### Motivation
Strix's agentic architecture (6 agents, ~600 LLM calls per scan, full history resent every call) can produce 1.5M–26M input tokens per scan on large repos. The memory compressor currently only triggers at 90% of 100K tokens — by which point the cost damage is already done. Oversized tool outputs (nmap scans, large file reads) accumulate in history and get resent hundreds of times.
These changes make compression tunable and add tool output truncation at ingestion to prevent context bloat at the source.
### A/B test results

Tested on a production corpus of 800 repositories at a crypto infrastructure company. Settings: `MAX_CONTEXT_TOKENS=40000`, `MIN_RECENT_MESSAGES=10`, `MAX_TOOL_OUTPUT_CHARS=8000`.

Per-scan comparison (large TypeScript repo with confirmed critical findings):
At scale (15-scan sample): Average cost dropped from $8.66 to $4.52 per scan.
No findings were lost. The optimised configuration actually found one additional vulnerability that the stock configuration missed (likely because the stock run hit context limits and lost relevant earlier context).
### Changes

- `strix/config/config.py` — 3 new config variables
- `strix/llm/memory_compressor.py` — Configurable thresholds, `truncate_tool_outputs()` method called before compression in `compress_history()`, `_truncate_tool_output()` helper preserving head + tail
- `strix/llm/llm.py` — Second cache breakpoint on agent identity message

All changes are backwards compatible — default behaviour is unchanged when env vars are not set.
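For illustration, the three variables could be read along these lines in `strix/config/config.py`. The `0` default disabling truncation matches the PR's opt-in behaviour; the other two defaults shown are assumptions for the sketch:

```python
import os

# Compression trigger ceiling; 100K matches the stock behaviour described
# in the PR (assumed default).
MAX_CONTEXT_TOKENS = int(os.getenv("STRIX_MAX_CONTEXT_TOKENS", "100000"))

# Messages always kept verbatim at the tail of history (assumed default).
MIN_RECENT_MESSAGES = int(os.getenv("STRIX_MIN_RECENT_MESSAGES", "10"))

# 0 disables tool output truncation, so unset env vars leave the default
# path unchanged.
MAX_TOOL_OUTPUT_CHARS = int(os.getenv("STRIX_MAX_TOOL_OUTPUT_CHARS", "0"))
```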
### Test plan

- `_truncate_tool_output()` edge cases (happy to add if wanted)