Feature/configurable memory compression by seanturner83 · Pull Request #454 · usestrix/strix

seanturner83 · 2026-04-16T12:45:33Z

Summary

Adds three environment variables to tune the memory compressor for large-scale scanning campaigns, plus an additional prompt cache breakpoint for improved cache hit rates on Anthropic models.

New environment variables

Variable	Default	Purpose
`STRIX_MAX_CONTEXT_TOKENS`	100000	Token threshold before compression triggers
`STRIX_MIN_RECENT_MESSAGES`	15	Messages preserved from compression
`STRIX_MAX_TOOL_OUTPUT_CHARS`	0 (off)	Truncate oversized tool outputs at ingestion, keeping first 60% + last 40% with notice

Prompt caching improvement

Adds a second cache_control breakpoint on the agent identity message (<agent_identity> tag), which is stable for the lifetime of each agent. This complements the existing system prompt breakpoint. Related to #279.

Motivation

Strix's agentic architecture (6 agents, ~600 LLM calls per scan, full history resent every call) can produce 1.5M–26M input tokens per scan on large repos. The memory compressor currently only triggers at 90% of 100K tokens — by which point the cost damage is already done. Oversized tool outputs (nmap scans, large file reads) accumulate in history and get resent hundreds of times.

These changes make compression tunable and add tool output truncation at ingestion to prevent context bloat at the source.

A/B test results

Tested on a production corpus of 800 repositories at a crypto infrastructure company. Settings: MAX_CONTEXT_TOKENS=40000, MIN_RECENT_MESSAGES=10, MAX_TOOL_OUTPUT_CHARS=8000.

Per-scan comparison (large TypeScript repo with confirmed critical findings):

Metric	Stock	Optimized	Delta
Findings	15 (10C/5H)	16 (9C/7H)	+1 finding
Cost	$20.62	$9.55	−54%
Input tokens	8.7M	5.5M	−37%
Cache hit rate	34%	56%	+22pp

At scale (15-scan sample): Average cost dropped from $8.66 to $4.52 per scan.

No findings were lost. The optimised configuration actually found one additional vulnerability that the stock configuration missed (likely because the stock run hit context limits and lost relevant earlier context).

Changes

strix/config/config.py — 3 new config variables
strix/llm/memory_compressor.py — Configurable thresholds, truncate_tool_outputs() method called before compression in compress_history(), _truncate_tool_output() helper preserving head + tail
strix/llm/llm.py — Second cache breakpoint on agent identity message

All changes are backwards compatible — default behaviour is unchanged when env vars are not set.

Test plan

A/B tested on production corpus (800 repos, multiple repo sizes/languages)
Verified no finding quality regression
Confirmed defaults match current behaviour (no env vars = no change)
Unit tests for _truncate_tool_output() edge cases (happy to add if wanted)

…sion

greptile-apps · 2026-04-16T12:50:07Z

Greptile Summary

This PR makes the memory compressor thresholds configurable via three new env vars and adds a second prompt-cache breakpoint on the agent identity message. The config and cache changes are clean; the main concern is in truncate_tool_outputs.

P1 — role filtering missing in truncate_tool_outputs: the method iterates every message regardless of role, so with STRIX_MAX_TOOL_OUTPUT_CHARS set, system messages, user messages, and assistant messages with string content that exceed the limit are silently truncated alongside tool outputs. This can corrupt the agent's instruction prompt on every call.
P2 — misleading truncation notice: the inserted text says "Full output was captured but condensed," implying the missing content is accessible in compressed form, when the middle portion is permanently deleted. For a security agent this could cause false confidence about information availability.

Confidence Score: 4/5

Safe to merge once the role-filtering bug in truncate_tool_outputs is fixed; the feature is opt-in (default off) so existing deployments are unaffected, but enabling it as documented could truncate system prompts.

One P1 defect: truncate_tool_outputs has no role guard, so enabling STRIX_MAX_TOOL_OUTPUT_CHARS truncates all message types including system messages. The bug only activates when the new env var is set, keeping the default path safe, but the PR is intended to be used with non-zero values in production.

strix/llm/memory_compressor.py — specifically the truncate_tool_outputs method and TOOL_TRUNCATION_NOTICE constant.

Important Files Changed

Filename	Overview
strix/config/config.py	Adds three new None-defaulted config class attributes for the new compressor env vars; tracked correctly by the existing _tracked_names machinery.
strix/llm/memory_compressor.py	Adds configurable thresholds and truncate_tool_outputs; P1 bug — the method iterates all message roles so system/user/assistant messages with long string content are also truncated when the feature is enabled. Misleading truncation notice text is a secondary P2 concern.
strix/llm/llm.py	Adds a second prompt-cache breakpoint for the agent identity message; logic is correct for string content but silently skips adding cache_control when content is already a list.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: strix/llm/memory_compressor.py
Line: 198-221

Comment:
**`truncate_tool_outputs` mutates non-tool messages**

The loop iterates over every message regardless of role, so when `STRIX_MAX_TOOL_OUTPUT_CHARS` is set, system messages, user messages, and assistant messages are all subject to truncation — not just tool outputs. A system prompt larger than `max_tool_output_chars` (e.g. 8 000 chars) would be silently truncated, stripping part of the agent's instructions on every call. In the list-content branch the `type == "text"` guard already avoids native Anthropic `tool_result` items, but the string-content branch has no role guard at all.

```suggestion
        for msg in messages:
            if msg.get("role") not in ("tool",):
                continue
            content = msg.get("content", "")
            if isinstance(content, str) and len(content) > self.max_tool_output_chars:
                msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
            elif isinstance(content, list):
                for item in content:
                    if (
                        isinstance(item, dict)
                        and item.get("type") == "text"
                        and len(item.get("text", "")) > self.max_tool_output_chars
                    ):
                        item["text"] = _truncate_tool_output(
                            item["text"], self.max_tool_output_chars
                        )
```

If tool results can also arrive embedded inside `role: "user"` messages (e.g. Anthropic `type: "tool_result"` items), that branch would also need a `type == "tool_result"` check on the inner items to cover those correctly.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/llm/memory_compressor.py
Line: 16-19

Comment:
**Misleading truncation notice may deceive the security agent**

"Full output was captured but condensed to reduce context size" implies the information exists somewhere in compressed form. In reality the middle portion is permanently deleted — there is no condensed version. A security agent reading this in its context window may incorrectly infer that missing findings (credentials, nmap results, etc.) are still accessible elsewhere and not re-run the relevant tool to recover them.

```suggestion
TOOL_TRUNCATION_NOTICE = (
    "\n\n[Output truncated: showing first {head_len} and last {tail_len} characters "
    "of {original_len}-character output (limit: {max_len}). "
    "The middle portion has been permanently removed.]"
)
```

Note: with this template you would also need to pass `head_len` and `tail_len` into the `.format()` call in `_truncate_tool_output`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/llm/llm.py
Line: 387-395

Comment:
**Cache breakpoint 2 silently skips list-typed content**

The block only wraps the agent identity message when `isinstance(content, str)`. If the content is already a list (e.g. the first cache breakpoint already converted it, or the caller built a multi-part message), the `<agent_identity>` check still matches via `str(...)` but no `cache_control` breakpoint is applied, silently losing the caching benefit. Adding an `else` branch to append `cache_control` to the last item of an existing list would make the behaviour consistent with how the Anthropic SDK expects breakpoints on list content.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "Merge branch 'usestrix:main' into featur..." | Re-trigger Greptile}

greptile-apps · 2026-04-16T12:50:10Z

+    def truncate_tool_outputs(self, messages: list[dict[str, Any]]) -> None:
+        """Truncate large tool output messages in-place.
+
+        This prevents oversized tool results (nmap scans, file contents, etc.)
+        from accumulating in the conversation history and being resent on every
+        subsequent LLM call. Applied at ingestion time before the history grows.
+        """
+        if self.max_tool_output_chars <= 0:
+            return
+
+        for msg in messages:
+            content = msg.get("content", "")
+            if isinstance(content, str) and len(content) > self.max_tool_output_chars:
+                msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
+            elif isinstance(content, list):
+                for item in content:
+                    if (
+                        isinstance(item, dict)
+                        and item.get("type") == "text"
+                        and len(item.get("text", "")) > self.max_tool_output_chars
+                    ):
+                        item["text"] = _truncate_tool_output(
+                            item["text"], self.max_tool_output_chars
+                        )


truncate_tool_outputs mutates non-tool messages

The loop iterates over every message regardless of role, so when STRIX_MAX_TOOL_OUTPUT_CHARS is set, system messages, user messages, and assistant messages are all subject to truncation — not just tool outputs. A system prompt larger than max_tool_output_chars (e.g. 8 000 chars) would be silently truncated, stripping part of the agent's instructions on every call. In the list-content branch the type == "text" guard already avoids native Anthropic tool_result items, but the string-content branch has no role guard at all.

Suggested change

def truncate_tool_outputs(self, messages: list[dict[str, Any]]) -> None:

"""Truncate large tool output messages in-place.

This prevents oversized tool results (nmap scans, file contents, etc.)

from accumulating in the conversation history and being resent on every

subsequent LLM call. Applied at ingestion time before the history grows.

"""

if self.max_tool_output_chars <= 0:

return

for msg in messages:

content = msg.get("content", "")

if isinstance(content, str) and len(content) > self.max_tool_output_chars:

msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)

elif isinstance(content, list):

for item in content:

if (

isinstance(item, dict)

and item.get("type") == "text"

and len(item.get("text", "")) > self.max_tool_output_chars

):

item["text"] = _truncate_tool_output(

item["text"], self.max_tool_output_chars

)

for msg in messages:

if msg.get("role") not in ("tool",):

continue

content = msg.get("content", "")

if isinstance(content, str) and len(content) > self.max_tool_output_chars:

msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)

elif isinstance(content, list):

for item in content:

if (

isinstance(item, dict)

and item.get("type") == "text"

and len(item.get("text", "")) > self.max_tool_output_chars

):

item["text"] = _truncate_tool_output(

item["text"], self.max_tool_output_chars

)

If tool results can also arrive embedded inside role: "user" messages (e.g. Anthropic type: "tool_result" items), that branch would also need a type == "tool_result" check on the inner items to cover those correctly.

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/llm/memory_compressor.py Line: 198-221 Comment: **`truncate_tool_outputs` mutates non-tool messages** The loop iterates over every message regardless of role, so when `STRIX_MAX_TOOL_OUTPUT_CHARS` is set, system messages, user messages, and assistant messages are all subject to truncation — not just tool outputs. A system prompt larger than `max_tool_output_chars` (e.g. 8 000 chars) would be silently truncated, stripping part of the agent's instructions on every call. In the list-content branch the `type == "text"` guard already avoids native Anthropic `tool_result` items, but the string-content branch has no role guard at all. ```suggestion for msg in messages: if msg.get("role") not in ("tool",): continue content = msg.get("content", "") if isinstance(content, str) and len(content) > self.max_tool_output_chars: msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars) elif isinstance(content, list): for item in content: if ( isinstance(item, dict) and item.get("type") == "text" and len(item.get("text", "")) > self.max_tool_output_chars ): item["text"] = _truncate_tool_output( item["text"], self.max_tool_output_chars ) ``` If tool results can also arrive embedded inside `role: "user"` messages (e.g. Anthropic `type: "tool_result"` items), that branch would also need a `type == "tool_result"` check on the inner items to cover those correctly. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-16T12:50:11Z

+TOOL_TRUNCATION_NOTICE = (
+    "\n\n[Output truncated from {original_len} to {max_len} characters. "
+    "Full output was captured but condensed to reduce context size.]"
+)


Misleading truncation notice may deceive the security agent

"Full output was captured but condensed to reduce context size" implies the information exists somewhere in compressed form. In reality the middle portion is permanently deleted — there is no condensed version. A security agent reading this in its context window may incorrectly infer that missing findings (credentials, nmap results, etc.) are still accessible elsewhere and not re-run the relevant tool to recover them.

Suggested change

TOOL_TRUNCATION_NOTICE = (

"\n\n[Output truncated from {original_len} to {max_len} characters. "

"Full output was captured but condensed to reduce context size.]"

)

TOOL_TRUNCATION_NOTICE = (

"\n\n[Output truncated: showing first {head_len} and last {tail_len} characters "

"of {original_len}-character output (limit: {max_len}). "

"The middle portion has been permanently removed.]"

)

Note: with this template you would also need to pass head_len and tail_len into the .format() call in _truncate_tool_output.

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/llm/memory_compressor.py Line: 16-19 Comment: **Misleading truncation notice may deceive the security agent** "Full output was captured but condensed to reduce context size" implies the information exists somewhere in compressed form. In reality the middle portion is permanently deleted — there is no condensed version. A security agent reading this in its context window may incorrectly infer that missing findings (credentials, nmap results, etc.) are still accessible elsewhere and not re-run the relevant tool to recover them. ```suggestion TOOL_TRUNCATION_NOTICE = ( "\n\n[Output truncated: showing first {head_len} and last {tail_len} characters " "of {original_len}-character output (limit: {max_len}). " "The middle portion has been permanently removed.]" ) ``` Note: with this template you would also need to pass `head_len` and `tail_len` into the `.format()` call in `_truncate_tool_output`. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-16T12:50:12Z

+        if len(result) > 1 and "<agent_identity>" in str(result[1].get("content", "")):
+            content = result[1]["content"]
+            if isinstance(content, str):
+                result[1] = {
+                    **result[1],
+                    "content": [
+                        {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
+                    ],
+                }


Cache breakpoint 2 silently skips list-typed content

The block only wraps the agent identity message when isinstance(content, str). If the content is already a list (e.g. the first cache breakpoint already converted it, or the caller built a multi-part message), the <agent_identity> check still matches via str(...) but no cache_control breakpoint is applied, silently losing the caching benefit. Adding an else branch to append cache_control to the last item of an existing list would make the behaviour consistent with how the Anthropic SDK expects breakpoints on list content.

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/llm/llm.py Line: 387-395 Comment: **Cache breakpoint 2 silently skips list-typed content** The block only wraps the agent identity message when `isinstance(content, str)`. If the content is already a list (e.g. the first cache breakpoint already converted it, or the caller built a multi-part message), the `<agent_identity>` check still matches via `str(...)` but no `cache_control` breakpoint is applied, silently losing the caching benefit. Adding an `else` branch to append `cache_control` to the last item of an existing list would make the behaviour consistent with how the Anthropic SDK expects breakpoints on list content. How can I resolve this? If you propose a fix, please make it concise.

…ntent cache

seanturner83 added 2 commits April 16, 2026 11:29

feat: configurable memory compression and tool output truncation

2c5bbed

Merge branch 'usestrix:main' into feature/configurable-memory-compres…

319034c

…sion

greptile-apps bot reviewed Apr 16, 2026

View reviewed changes

fix: address review feedback — role guard, truncation notice, list co…

ad4e28b

…ntent cache

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/configurable memory compression#454

Feature/configurable memory compression#454
seanturner83 wants to merge 3 commits intousestrix:mainfrom
seanturner83:feature/configurable-memory-compression

seanturner83 commented Apr 16, 2026

Uh oh!

greptile-apps bot commented Apr 16, 2026

Uh oh!

greptile-apps bot Apr 16, 2026

Uh oh!

greptile-apps bot Apr 16, 2026

Uh oh!

greptile-apps bot Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

seanturner83 commented Apr 16, 2026

Summary

New environment variables

Prompt caching improvement

Motivation

A/B test results

Changes

Test plan

Uh oh!

greptile-apps bot commented Apr 16, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

greptile-apps bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant