Feature/configurable memory compression #454

Open
seanturner83 wants to merge 3 commits into usestrix:main from seanturner83:feature/configurable-memory-compression

@seanturner83

Summary

Adds three environment variables to tune the memory compressor for large-scale scanning campaigns, plus an additional prompt cache breakpoint for improved cache hit rates on Anthropic models.

New environment variables

| Variable | Default | Purpose |
| --- | --- | --- |
| `STRIX_MAX_CONTEXT_TOKENS` | `100000` | Token threshold before compression triggers |
| `STRIX_MIN_RECENT_MESSAGES` | `15` | Messages preserved from compression |
| `STRIX_MAX_TOOL_OUTPUT_CHARS` | `0` (off) | Truncate oversized tool outputs at ingestion, keeping first 60% + last 40% with notice |
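
A minimal sketch of how these variables might be read, assuming plain environment lookups with the defaults from the table. `CompressorSettings`, `_int_env`, and `load_settings` are hypothetical names for illustration; they are not the PR's actual `strix/config/config.py` code.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompressorSettings:
    """Hypothetical container for the three tunables."""

    max_context_tokens: int = 100_000
    min_recent_messages: int = 15
    max_tool_output_chars: int = 0  # 0 means truncation is off


def _int_env(env: Mapping[str, str], name: str, default: int) -> int:
    """Return env[name] as an int, or default when unset, blank, or invalid."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def load_settings(env: Optional[Mapping[str, str]] = None) -> CompressorSettings:
    """Build settings from the given mapping (os.environ when omitted)."""
    env = os.environ if env is None else env
    return CompressorSettings(
        max_context_tokens=_int_env(env, "STRIX_MAX_CONTEXT_TOKENS", 100_000),
        min_recent_messages=_int_env(env, "STRIX_MIN_RECENT_MESSAGES", 15),
        max_tool_output_chars=_int_env(env, "STRIX_MAX_TOOL_OUTPUT_CHARS", 0),
    )
```

Falling back to the default on an unparsable value is a design choice made for this sketch; failing loudly would be equally defensible.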

Prompt caching improvement

Adds a second cache_control breakpoint on the agent identity message (<agent_identity> tag), which is stable for the lifetime of each agent. This complements the existing system prompt breakpoint. Related to #279.

Motivation

Strix's agentic architecture (6 agents, ~600 LLM calls per scan, full history resent every call) can produce 1.5M–26M input tokens per scan on large repos. The memory compressor currently only triggers at 90% of 100K tokens — by which point the cost damage is already done. Oversized tool outputs (nmap scans, large file reads) accumulate in history and get resent hundreds of times.
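
To make the resend cost concrete, here is a back-of-the-envelope sketch. It assumes every call resends the full history and nothing is compressed in between; the function names and the example numbers are illustrative only, not measurements from Strix.

```python
def resend_tokens(output_tokens: int, calls_remaining: int) -> int:
    """Input tokens spent resending one tool output on every later call."""
    return output_tokens * calls_remaining


def cumulative_input_tokens(tokens_per_turn: int, num_calls: int) -> int:
    """Total input tokens when call k resends k turns of history.

    This is the sum over k = 1..num_calls of k * tokens_per_turn.
    """
    return tokens_per_turn * num_calls * (num_calls + 1) // 2


# Illustrative: a 20,000-token scan output ingested with 90 calls still to go.
print(resend_tokens(20_000, 90))          # 1800000
# Illustrative: 100 calls, each adding 500 tokens of new history.
print(cumulative_input_tokens(500, 100))  # 2525000
```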

These changes make compression tunable and add tool output truncation at ingestion to prevent context bloat at the source.

A/B test results

Tested on a production corpus of 800 repositories at a crypto infrastructure company. Settings: MAX_CONTEXT_TOKENS=40000, MIN_RECENT_MESSAGES=10, MAX_TOOL_OUTPUT_CHARS=8000.

Per-scan comparison (large TypeScript repo with confirmed critical findings):

| Metric | Stock | Optimized | Delta |
| --- | --- | --- | --- |
| Findings | 15 (10C/5H) | 16 (9C/7H) | +1 finding |
| Cost | $20.62 | $9.55 | −54% |
| Input tokens | 8.7M | 5.5M | −37% |
| Cache hit rate | 34% | 56% | +22pp |

At scale (15-scan sample): Average cost dropped from $8.66 to $4.52 per scan.
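
The deltas can be recomputed from the reported figures. A small check that uses only the numbers quoted in this description (the helper name `pct_change` is ours):

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


print(round(pct_change(20.62, 9.55)))  # -54  (per-scan cost)
print(round(pct_change(8.7, 5.5)))     # -37  (input tokens, in millions)
print(56 - 34)                         # 22   (cache hit rate, percentage points)
print(round(pct_change(8.66, 4.52)))   # -48  (average cost over the 15-scan sample)
```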

No findings were lost. The optimised configuration actually found one additional vulnerability that the stock configuration missed (likely because the stock run hit context limits and lost relevant earlier context).

Changes

  • strix/config/config.py — 3 new config variables
  • strix/llm/memory_compressor.py — Configurable thresholds, truncate_tool_outputs() method called before compression in compress_history(), _truncate_tool_output() helper preserving head + tail
  • strix/llm/llm.py — Second cache breakpoint on agent identity message

All changes are backwards compatible — default behaviour is unchanged when env vars are not set.

Test plan

  • A/B tested on production corpus (800 repos, multiple repo sizes/languages)
  • Verified no finding quality regression
  • Confirmed defaults match current behaviour (no env vars = no change)
  • Unit tests for _truncate_tool_output() edge cases (happy to add if wanted)

@greptile-apps
Contributor

greptile-apps bot commented Apr 16, 2026

Greptile Summary

This PR makes the memory compressor thresholds configurable via three new env vars and adds a second prompt-cache breakpoint on the agent identity message. The config and cache changes are clean; the main concern is in truncate_tool_outputs.

  • P1 — role filtering missing in truncate_tool_outputs: the method iterates every message regardless of role, so with STRIX_MAX_TOOL_OUTPUT_CHARS set, system messages, user messages, and assistant messages with string content that exceed the limit are silently truncated alongside tool outputs. This can corrupt the agent's instruction prompt on every call.
  • P2 — misleading truncation notice: the inserted text says "Full output was captured but condensed," implying the missing content is accessible in compressed form, when the middle portion is permanently deleted. For a security agent this could cause false confidence about information availability.

Confidence Score: 4/5

Safe to merge once the role-filtering bug in truncate_tool_outputs is fixed; the feature is opt-in (default off) so existing deployments are unaffected, but enabling it as documented could truncate system prompts.

One P1 defect: truncate_tool_outputs has no role guard, so enabling STRIX_MAX_TOOL_OUTPUT_CHARS truncates all message types including system messages. The bug only activates when the new env var is set, keeping the default path safe, but the PR is intended to be used with non-zero values in production.

Files needing attention: strix/llm/memory_compressor.py, specifically the truncate_tool_outputs method and the TOOL_TRUNCATION_NOTICE constant.

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/config/config.py | Adds three new None-defaulted config class attributes for the new compressor env vars; tracked correctly by the existing _tracked_names machinery. |
| strix/llm/memory_compressor.py | Adds configurable thresholds and truncate_tool_outputs. P1 bug: the method iterates all message roles, so system/user/assistant messages with long string content are also truncated when the feature is enabled. The misleading truncation notice text is a secondary P2 concern. |
| strix/llm/llm.py | Adds a second prompt-cache breakpoint for the agent identity message; logic is correct for string content but silently skips adding cache_control when content is already a list. |

Prompt To Fix All With AI
This is a comment left during a code review.
Path: strix/llm/memory_compressor.py
Line: 198-221

Comment:
**`truncate_tool_outputs` mutates non-tool messages**

The loop iterates over every message regardless of role, so when `STRIX_MAX_TOOL_OUTPUT_CHARS` is set, system messages, user messages, and assistant messages are all subject to truncation — not just tool outputs. A system prompt larger than `max_tool_output_chars` (e.g. 8 000 chars) would be silently truncated, stripping part of the agent's instructions on every call. In the list-content branch the `type == "text"` guard already avoids native Anthropic `tool_result` items, but the string-content branch has no role guard at all.

```suggestion
        for msg in messages:
            if msg.get("role") not in ("tool",):
                continue
            content = msg.get("content", "")
            if isinstance(content, str) and len(content) > self.max_tool_output_chars:
                msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
            elif isinstance(content, list):
                for item in content:
                    if (
                        isinstance(item, dict)
                        and item.get("type") == "text"
                        and len(item.get("text", "")) > self.max_tool_output_chars
                    ):
                        item["text"] = _truncate_tool_output(
                            item["text"], self.max_tool_output_chars
                        )
```

If tool results can also arrive embedded inside `role: "user"` messages (e.g. Anthropic `type: "tool_result"` items), that branch would also need a `type == "tool_result"` check on the inner items to cover those correctly.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/llm/memory_compressor.py
Line: 16-19

Comment:
**Misleading truncation notice may deceive the security agent**

"Full output was captured but condensed to reduce context size" implies the information exists somewhere in compressed form. In reality the middle portion is permanently deleted — there is no condensed version. A security agent reading this in its context window may incorrectly infer that missing findings (credentials, nmap results, etc.) are still accessible elsewhere and not re-run the relevant tool to recover them.

```suggestion
TOOL_TRUNCATION_NOTICE = (
    "\n\n[Output truncated: showing first {head_len} and last {tail_len} characters "
    "of {original_len}-character output (limit: {max_len}). "
    "The middle portion has been permanently removed.]"
)
```

Note: with this template you would also need to pass `head_len` and `tail_len` into the `.format()` call in `_truncate_tool_output`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/llm/llm.py
Line: 387-395

Comment:
**Cache breakpoint 2 silently skips list-typed content**

The block only wraps the agent identity message when `isinstance(content, str)`. If the content is already a list (e.g. the first cache breakpoint already converted it, or the caller built a multi-part message), the `<agent_identity>` check still matches via `str(...)` but no `cache_control` breakpoint is applied, silently losing the caching benefit. Adding an `else` branch to append `cache_control` to the last item of an existing list would make the behaviour consistent with how the Anthropic SDK expects breakpoints on list content.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge branch 'usestrix:main' into featur..."

Comment on lines +198 to +221

    def truncate_tool_outputs(self, messages: list[dict[str, Any]]) -> None:
        """Truncate large tool output messages in-place.

        This prevents oversized tool results (nmap scans, file contents, etc.)
        from accumulating in the conversation history and being resent on every
        subsequent LLM call. Applied at ingestion time before the history grows.
        """
        if self.max_tool_output_chars <= 0:
            return

        for msg in messages:
            content = msg.get("content", "")
            if isinstance(content, str) and len(content) > self.max_tool_output_chars:
                msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
            elif isinstance(content, list):
                for item in content:
                    if (
                        isinstance(item, dict)
                        and item.get("type") == "text"
                        and len(item.get("text", "")) > self.max_tool_output_chars
                    ):
                        item["text"] = _truncate_tool_output(
                            item["text"], self.max_tool_output_chars
                        )


P1 truncate_tool_outputs mutates non-tool messages

The loop iterates over every message regardless of role, so when STRIX_MAX_TOOL_OUTPUT_CHARS is set, system messages, user messages, and assistant messages are all subject to truncation — not just tool outputs. A system prompt larger than max_tool_output_chars (e.g. 8 000 chars) would be silently truncated, stripping part of the agent's instructions on every call. In the list-content branch the type == "text" guard already avoids native Anthropic tool_result items, but the string-content branch has no role guard at all.

Suggested change (the message loop, with a role guard added):

        for msg in messages:
            if msg.get("role") not in ("tool",):
                continue
            content = msg.get("content", "")
            if isinstance(content, str) and len(content) > self.max_tool_output_chars:
                msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
            elif isinstance(content, list):
                for item in content:
                    if (
                        isinstance(item, dict)
                        and item.get("type") == "text"
                        and len(item.get("text", "")) > self.max_tool_output_chars
                    ):
                        item["text"] = _truncate_tool_output(
                            item["text"], self.max_tool_output_chars
                        )

If tool results can also arrive embedded inside role: "user" messages (e.g. Anthropic type: "tool_result" items), that branch would also need a type == "tool_result" check on the inner items to cover those correctly.
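
One way to cover both shapes is sketched below. It assumes OpenAI-style `role: "tool"` messages plus Anthropic-style `tool_result` items nested in `role: "user"` messages, where `content` is either a string or a list of text blocks. `truncate` stands in for the PR's `_truncate_tool_output` helper, and `truncate_tool_messages` is a hypothetical free function rather than the actual method.

```python
from typing import Any, Callable

Truncate = Callable[[str, int], str]


def _truncate_text_items(items: list[Any], limit: int, truncate: Truncate) -> None:
    """Truncate oversized {"type": "text"} items in place."""
    for item in items:
        if (
            isinstance(item, dict)
            and item.get("type") == "text"
            and len(item.get("text", "")) > limit
        ):
            item["text"] = truncate(item["text"], limit)


def _truncate_holder(holder: dict[str, Any], limit: int, truncate: Truncate) -> None:
    """Truncate holder["content"], whether it is a string or a list of text items."""
    content = holder.get("content", "")
    if isinstance(content, str):
        if len(content) > limit:
            holder["content"] = truncate(content, limit)
    elif isinstance(content, list):
        _truncate_text_items(content, limit, truncate)


def truncate_tool_messages(
    messages: list[dict[str, Any]], limit: int, truncate: Truncate
) -> None:
    """Truncate tool outputs only; system, assistant, and plain user text are left alone."""
    if limit <= 0:
        return
    for msg in messages:
        role = msg.get("role")
        if role == "tool":
            _truncate_holder(msg, limit, truncate)
        elif role == "user" and isinstance(msg.get("content"), list):
            # Only tool_result items inside a user message count as tool output.
            for item in msg["content"]:
                if isinstance(item, dict) and item.get("type") == "tool_result":
                    _truncate_holder(item, limit, truncate)
```

Dispatching on role first, then on item type, keeps the system prompt and ordinary user text out of reach even when they exceed the limit.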

Comment on lines +16 to +19

    TOOL_TRUNCATION_NOTICE = (
        "\n\n[Output truncated from {original_len} to {max_len} characters. "
        "Full output was captured but condensed to reduce context size.]"
    )


P2 Misleading truncation notice may deceive the security agent

"Full output was captured but condensed to reduce context size" implies the information exists somewhere in compressed form. In reality the middle portion is permanently deleted — there is no condensed version. A security agent reading this in its context window may incorrectly infer that missing findings (credentials, nmap results, etc.) are still accessible elsewhere and not re-run the relevant tool to recover them.

Suggested change (replacement for the constant above):

    TOOL_TRUNCATION_NOTICE = (
        "\n\n[Output truncated: showing first {head_len} and last {tail_len} characters "
        "of {original_len}-character output (limit: {max_len}). "
        "The middle portion has been permanently removed.]"
    )

Note: with this template you would also need to pass head_len and tail_len into the .format() call in _truncate_tool_output.

Comment thread strix/llm/llm.py
Comment on lines +387 to +395

    if len(result) > 1 and "<agent_identity>" in str(result[1].get("content", "")):
        content = result[1]["content"]
        if isinstance(content, str):
            result[1] = {
                **result[1],
                "content": [
                    {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
                ],
            }


P2 Cache breakpoint 2 silently skips list-typed content

The block only wraps the agent identity message when isinstance(content, str). If the content is already a list (e.g. the first cache breakpoint already converted it, or the caller built a multi-part message), the <agent_identity> check still matches via str(...) but no cache_control breakpoint is applied, silently losing the caching benefit. Adding an else branch to append cache_control to the last item of an existing list would make the behaviour consistent with how the Anthropic SDK expects breakpoints on list content.
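
A sketch of the `else` branch described above, written as a standalone function. `add_identity_breakpoint` is a hypothetical name, and it assumes `result` is the list of message dicts from the snippet under review; the real code in `strix/llm/llm.py` may be structured differently.

```python
from typing import Any


def add_identity_breakpoint(result: list[dict[str, Any]]) -> None:
    """Mark the agent identity message (index 1) as a prompt-cache breakpoint."""
    if len(result) <= 1:
        return
    content = result[1].get("content", "")
    if "<agent_identity>" not in str(content):
        return
    if isinstance(content, str):
        # String content: wrap it in a single text block carrying cache_control.
        result[1] = {
            **result[1],
            "content": [
                {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
            ],
        }
    elif isinstance(content, list) and content and isinstance(content[-1], dict):
        # List content: put cache_control on the last block. Copy instead of
        # mutating so the caller's original blocks stay untouched.
        last = {**content[-1], "cache_control": {"type": "ephemeral"}}
        result[1] = {**result[1], "content": [*content[:-1], last]}
```

Placing the marker on the last block follows the reviewer's suggestion, so that the whole identity message falls inside the cached prefix.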
