How CLIO remembers, learns, and maintains continuity across sessions.
CLIO has a three-tier memory system designed to give AI agents the ability to learn and improve over time, maintain context during long sessions, and recover gracefully when context windows overflow.
Unlike most AI assistants that start fresh every conversation, CLIO accumulates project-specific knowledge that persists indefinitely. An agent working on your codebase today benefits from everything learned in previous sessions - discovered patterns, solved problems, and established conventions.
CLIO Memory Architecture

Within a Session:
- Short-Term Memory (STM): sliding window of recent messages, the AI's working context, auto-pruned when full
- YaRN Threads: full conversation archive with compression for recovery; never loses messages

Across Sessions:
- Long-Term Memory (LTM): discoveries about the codebase, problem-solution pairs, code patterns and conventions, persisted in .clio/ltm.json
- Session-Level Store: key-value pairs in .clio/memory/, investigation notes and checkpoints, available via recall_sessions
Module: lib/CLIO/Memory/ShortTerm.pm
Short-Term Memory is the sliding window of recent messages that forms the AI's working context for the current turn. It holds the most recent conversation history used when building the API request.
- Every message (user, assistant, tool call, tool result) is added to STM
- When STM exceeds its configured maximum size, oldest messages are pruned
- The pruned messages aren't lost - they're preserved in YaRN threads and session history
- Fixed-size FIFO - Oldest messages are dropped first when the window is full
- Defensive normalization - Handles legacy formats, strips conversation markup, validates message structure
- Embedded in session files - STM state is saved as part of the session JSON, allowing seamless session resume
STM is not something users interact with directly. It operates transparently as part of the context management pipeline.
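The pruning behavior described above can be sketched in a few lines (illustrative Python, not the actual Perl module; the window size and message shape here are assumptions):

```python
from collections import deque

class ShortTermMemory:
    """Sketch of a fixed-size FIFO message window (illustrative only)."""

    def __init__(self, max_messages=50):  # assumed default; CLIO's limit is configurable
        self.window = deque(maxlen=max_messages)

    def add(self, role, content):
        # a deque with maxlen silently drops the oldest entry when full;
        # in CLIO the dropped message still survives in YaRN and session history
        self.window.append({"role": role, "content": content})

    def messages(self):
        return list(self.window)
```

With max_messages=3, adding five messages leaves only the last three in the window - exactly the FIFO drop that YaRN compensates for.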
Module: lib/CLIO/Memory/YaRN.pm
YaRN is CLIO's conversation archival and compression system. While STM keeps a sliding window, YaRN keeps everything - the complete conversation history for each session, organized into threads.
When context trimming drops messages from the active window (because the AI's context limit is approaching), those messages aren't lost. YaRN preserves them. More importantly, YaRN can compress dropped messages into concise summaries that capture the essential information:
- What the user asked
- What files were read and modified
- What git commits were made
- What decisions were reached through collaboration
- What tools were used and how often
When the context window needs trimming, compress_messages() takes the messages about to be dropped and extracts:
| Category | What's Extracted |
|---|---|
| User requests | The last N user messages (truncated to ~300 chars each) |
| Current task | Most recent user message - the active work being done |
| Git commits | Commit hashes and messages from tool output |
| Files touched | File paths from tool call arguments (path, new_path, old_path) |
| Key decisions | Collaboration exchanges (question + user response) |
| Tool usage | Counts of each tool type used |
The result is a single system message wrapped in <thread_summary> tags that gets injected into the trimmed context. Critically, the <thread_summary> is preserved across multiple trim cycles - each new compression merges with the previous summary, building an accumulating record of the entire session.
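A rough sketch of this extract-and-merge step (illustrative Python; the field names and message shapes are invented for the example, not CLIO's actual schema):

```python
def compress_messages(dropped, prev_summary=None, max_user_msgs=3):
    """Sketch of YaRN-style compression: distill dropped messages into a
    summary dict, merging with the previous summary across trim cycles."""
    summary = prev_summary or {
        "user_requests": [],
        "files_touched": set(),
        "tool_usage": {},
    }
    for msg in dropped:
        if msg["role"] == "user":
            # keep user requests, truncated to ~300 chars each
            summary["user_requests"].append(msg["content"][:300])
        elif msg["role"] == "tool_call":
            name = msg["name"]
            summary["tool_usage"][name] = summary["tool_usage"].get(name, 0) + 1
            # collect file paths from tool call arguments
            for key in ("path", "new_path", "old_path"):
                if key in msg.get("args", {}):
                    summary["files_touched"].add(msg["args"][key])
    summary["user_requests"] = summary["user_requests"][-max_user_msgs:]
    # the most recent user message becomes the current task anchor
    summary["current_task"] = (
        summary["user_requests"][-1] if summary["user_requests"] else None
    )
    return summary
```

Passing the previous summary back in on the next trim cycle is what produces the accumulating record: files and tool counts from earlier cycles are preserved while the task anchor moves forward.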
After context trimming, CLIO agents continue working without announcing that context was lost. The thread_summary provides enough continuity that no recovery stumbling is needed:
- No "I've recovered context" announcements
- No re-reading handoff documents
- No asking the user what to do next
- Just continuing work as if nothing changed
The recovery injection includes neutral language ("Older conversation history has been summarized") rather than disruption signals, and explicitly instructs the agent to keep working.
After aggressive context trimming, the AI might otherwise "forget" what it was working on. YaRN compression plus the recovery injection system means the AI gets:
- A merged summary of everything dropped (accumulated across trim cycles)
- The current task anchor (most recent user message - what was being worked on NOW, not at session start)
- The current todo/task state
- Recent git activity (commits, working tree status)
This is why CLIO agents can work for hours on complex tasks across multiple topic transitions without losing track of their current objectives. In long sessions where early work is long done and the agent has moved through several task transitions, the original session-start message is intentionally NOT re-injected - it's stale and misleading. The thread_summary already captures it. The most recent user message represents the actual current work.
Module: lib/CLIO/Memory/LongTerm.pm
Storage: .clio/ltm.json (per project)
Long-Term Memory is CLIO's project-level knowledge base. It persists across all sessions and accumulates knowledge about your specific codebase and workflows.
| Type | Purpose | Example |
|---|---|---|
| Discoveries | Facts about the codebase | "CLIO uses CLIO::Util::JSON for all JSON encoding" |
| Solutions | Problem-fix pairs | "If streaming 400 errors occur, increase retry budget to 20" |
| Patterns | Coding conventions | "Always use atomic writes (temp + rename) for session files" |
Each entry includes:
- Confidence score (0.0-1.0) - Higher scores indicate well-verified knowledge
- Timestamps - When first discovered and last confirmed
- Examples - File paths demonstrating the pattern
- Application count - How many times a solution has been used
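As a sketch, such an entry might look like the following (illustrative Python; the fields mirror the list above, but the exact schema and the confidence-bump rule are assumptions):

```python
import time
from dataclasses import dataclass, field

@dataclass
class LTMEntry:
    """Sketch of a Long-Term Memory entry (illustrative only)."""
    text: str
    confidence: float = 0.5          # 0.0-1.0, higher = better verified
    first_seen: float = field(default_factory=time.time)
    last_confirmed: float = field(default_factory=time.time)
    examples: list = field(default_factory=list)   # file paths showing the pattern
    applications: int = 0            # times this solution has been applied

    def confirm(self):
        # re-confirming raises confidence (capped at 1.0) and refreshes
        # the timestamp; the +0.1 increment is an assumed policy
        self.confidence = min(1.0, self.confidence + 0.1)
        self.last_confirmed = time.time()
        self.applications += 1
```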
At the start of every session, LTM entries are formatted and injected into the system prompt by PromptManager. The AI sees all accumulated project knowledge before you even ask your first question.
The injection includes:
- Key Discoveries - Up to 15 high-confidence facts, newest first
- Problem Solutions - Up to 15 error/solution pairs with application counts
- Code Patterns - Up to 10 verified patterns with example file paths
This means an agent starting a new session already knows: what coding conventions your project uses, what bugs have been fixed before and how, and what patterns to follow. No re-discovery needed.
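A minimal sketch of this injection step, assuming a simple dict-based LTM layout (illustrative Python; only the per-category caps come from the text above, the rendering format and field names are invented):

```python
def format_ltm_for_prompt(ltm, max_discoveries=15, max_solutions=15, max_patterns=10):
    """Sketch of LTM injection: select top entries per category and
    render a system-prompt section (illustrative only)."""
    lines = []
    # discoveries: high-confidence facts, newest first
    discoveries = sorted(ltm.get("discoveries", []),
                         key=lambda e: e["first_seen"], reverse=True)
    high_conf = [e for e in discoveries if e["confidence"] >= 0.7]  # assumed cutoff
    if high_conf:
        lines.append("Key Discoveries:")
        lines += [f"- {e['text']}" for e in high_conf[:max_discoveries]]
    solutions = ltm.get("problem_solutions", [])[:max_solutions]
    if solutions:
        lines.append("Problem Solutions:")
        lines += [f"- {e['error']}: {e['solution']} (used {e['applications']}x)"
                  for e in solutions]
    patterns = ltm.get("code_patterns", [])[:max_patterns]
    if patterns:
        lines.append("Code Patterns:")
        lines += [f"- {e['text']} (e.g. {', '.join(e['examples'])})"
                  for e in patterns]
    return "\n".join(lines)
```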
LTM injection can be disabled with --no-ltm or --incognito flags for sessions where you want a clean slate.
During a session, the AI adds new entries via the memory_operations tool:
# Discover a fact about the codebase
memory_operations(operation: "add_discovery", fact: "Config uses YAML not JSON", confidence: 0.9)
# Record a problem and its solution
memory_operations(operation: "add_solution",
error: "Session save fails with permission denied",
solution: "Check .clio/ directory ownership, must match current user",
examples: ["lib/CLIO/Session/State.pm"])
# Document a coding pattern
memory_operations(operation: "add_pattern",
pattern: "All file writes use atomic temp+rename pattern",
confidence: 0.95,
examples: ["lib/CLIO/Memory/LongTerm.pm", "lib/CLIO/Session/State.pm"])
Agents are instructed to add LTM entries when they discover something significant - a new pattern, a bug fix that could recur, or a fact about the codebase structure. This happens organically during normal work sessions.
Old or low-confidence entries are cleaned up to keep LTM focused:
memory_operations(operation: "prune_ltm", max_age_days: 90, min_confidence: 0.3)
memory_operations(operation: "ltm_stats") # Check current LTM size
LTM saves are atomic: data is written to a temporary file (with PID suffix to handle concurrent agents) and then renamed to the target path. This prevents corruption if a process is killed mid-write.
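The atomic-save pattern can be sketched like this (illustrative Python equivalent of the temp+rename approach; the exact temp-file naming is an assumption):

```python
import json
import os

def atomic_save(path, data):
    """Write JSON atomically: dump to a PID-suffixed temp file, then rename.

    os.replace() is atomic on POSIX, so readers never observe a partial
    file, and the PID suffix keeps concurrent agents from clobbering each
    other's in-progress temp files."""
    tmp = f"{path}.tmp.{os.getpid()}"
    with open(tmp, "w") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp, path)  # atomic rename over the target
```

If the process dies before the rename, the target file is untouched and only a stale temp file is left behind.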
Module: lib/CLIO/Tools/MemoryOperations.pm
Storage: .clio/memory/<key>.json
The session-level store is a simple key-value system for temporary notes, investigation findings, and working data. Unlike LTM (which accumulates project knowledge), the session store is for per-task scratch data that an agent needs to reference during a session.
Agents store working notes during complex investigations:
# Store investigation findings
memory_operations(operation: "store",
key: "auth_bug_analysis",
content: "Root cause: token refresh uses return inside eval, loses result")
# Retrieve later in the session
memory_operations(operation: "retrieve", key: "auth_bug_analysis")
# Search across all stored memories
memory_operations(operation: "search", query: "token refresh")
# List everything stored
memory_operations(operation: "list")
| Operation | Description |
|---|---|
| store | Write a key-value pair to .clio/memory/ |
| retrieve | Read a stored value by key |
| search | Find memories matching a keyword |
| list | List all stored memory keys |
| delete | Remove a stored memory |
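Under the hood this is just one JSON file per key; a minimal sketch (illustrative Python; everything beyond the one-file-per-key layout is an assumption):

```python
import json
from pathlib import Path

class SessionStore:
    """Sketch of the .clio/memory/ key-value store (illustrative only)."""

    def __init__(self, root=".clio/memory"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, key, content):
        # one JSON file per key: .clio/memory/<key>.json
        (self.root / f"{key}.json").write_text(json.dumps({"content": content}))

    def retrieve(self, key):
        path = self.root / f"{key}.json"
        return json.loads(path.read_text())["content"] if path.exists() else None

    def search(self, query):
        # naive substring search across all stored values
        return [p.stem for p in self.root.glob("*.json")
                if query.lower() in p.read_text().lower()]
```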
The session-level store is also used for automatic checkpoints. Before context trimming events, CLIO writes a session_progress.md checkpoint that includes the current task state, recent tool calls, and iteration count. After recovery, agents can retrieve this checkpoint to understand where they were.
Operation: memory_operations(operation: "recall_sessions")
Cross-session recall lets agents search through all previous session transcripts for relevant context. This is one of CLIO's most powerful memory features - it means knowledge isn't limited to what's in LTM. Anything discussed in any previous session is searchable.
- CLIO reads all session files from .clio/sessions/, sorted newest-first
- For each session (up to max_sessions), it loads the message history
- Messages are scored against the search query using:
  - Exact match boost (+3) - Query appears verbatim in the message
  - Keyword scoring (+1 per keyword) - Individual words from the query found
  - Density bonus (+1.5) - High ratio of matching keywords to total content
  - Title relevance (+0.5) - Session name matches the query
- Top results are returned with preview text
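The scoring rules can be sketched as a single function (illustrative Python; the weights come from the list above, but the tokenization and the density definition are assumptions):

```python
def score_message(content, query, title=""):
    """Sketch of recall_sessions-style message scoring (illustrative only)."""
    text = content.lower()
    q = query.lower()
    keywords = q.split()
    score = 0.0
    if q in text:
        score += 3.0                     # exact match boost
    hits = sum(1 for kw in keywords if kw in text)
    score += hits                        # +1 per matching keyword
    words = text.split()
    if words and hits / len(words) > 0.5:
        score += 1.5                     # density bonus (assumed threshold)
    if any(kw in title.lower() for kw in keywords):
        score += 0.5                     # title relevance
    return score
```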
Agents use recall_sessions in several situations:
# After context trimming - recover lost information
memory_operations(operation: "recall_sessions",
query: "authentication refactor approach",
max_sessions: 10,
max_results: 5)
# Before starting work - check if similar work was done
memory_operations(operation: "recall_sessions",
query: "worktree implementation")
# Understanding past decisions
memory_operations(operation: "recall_sessions",
query: "why we chose atomic writes")
When aggressive context trimming occurs, the recovery injection system tells agents to use recall_sessions to fill in gaps rather than re-reading handoff documentation. This is more efficient because recall_sessions returns targeted, relevant excerpts rather than entire documents.
These memory components work together in a coordinated pipeline to keep the AI effective during long sessions.
AI models have a fixed context window (commonly 128K-200K tokens for current frontier models). A long session with many tool calls can easily exceed this. CLIO's context management prevents overflow without losing critical information.
Stage 1: Proactive Trim (before API call, every iteration)
WorkflowOrchestrator checks messages against 75% of context window
If over: MessageValidator drops oldest message units (budget-walk newest to oldest)
Dropped messages -> YaRN compression -> thread_summary injected
thread_summary is preserved and merged across successive trim cycles
Stage 2: Validation Trim (just before sending to API)
Final check against effective token limit
Smart unit-based truncation (keeps tool call/result pairs together)
Post-trim target: 50% of max prompt tokens
Stage 3: Reactive Trim (after API rejection)
If API returns token_limit_exceeded despite proactive trim:
Progressive reduction across up to 3 retry attempts (50% -> 25% -> minimal)
Each retry injects recovery context (YaRN summary + todo state + git activity)
Most recent user message preserved as the current task anchor
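The thresholds across these stages can be sketched as follows (illustrative Python; the 75%, 50%, and 25% figures come from the stages above, while the 10% value for "minimal" and the function shape are assumptions):

```python
def plan_trim(estimated_tokens, context_window, retry_attempt=0):
    """Sketch of the trim pipeline's thresholds (illustrative only).

    Returns the post-trim token target, or None when no trim is needed."""
    proactive_threshold = 0.75 * context_window
    if retry_attempt == 0:
        # Stage 1: proactive - only trim once over 75% of the window
        if estimated_tokens <= proactive_threshold:
            return None
        return 0.50 * context_window      # post-trim target: 50%
    # Stage 3: reactive - shrink progressively across retries
    targets = [0.50, 0.25, 0.10]          # 50% -> 25% -> minimal (10% assumed)
    frac = targets[min(retry_attempt - 1, len(targets) - 1)]
    return frac * context_window
```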
Module: lib/CLIO/Memory/TokenEstimator.pm
Token estimation uses a character-to-token ratio that starts at a conservative default and learns from actual API responses. Each streaming response with real usage data updates the ratio, making estimates more accurate over time.
The learned ratio is critical - an inaccurate ratio means proactive trimming either fires too aggressively (wasting context) or too late (causing API rejections).
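A sketch of the ratio learning (illustrative Python; the starting ratio and smoothing factor are assumptions):

```python
class TokenEstimator:
    """Sketch of learned chars-per-token estimation (illustrative only)."""

    def __init__(self, chars_per_token=3.5, smoothing=0.2):
        self.ratio = chars_per_token     # conservative assumed default
        self.smoothing = smoothing

    def estimate(self, text):
        return int(len(text) / self.ratio) + 1

    def learn(self, text, actual_tokens):
        # blend toward the observed ratio via an exponential moving
        # average, so each real API usage report refines the estimate
        observed = len(text) / max(actual_tokens, 1)
        self.ratio += self.smoothing * (observed - self.ratio)
```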
When messages must be dropped, CLIO prioritizes keeping:
- System prompt - Always preserved
- Most recent user message - The current task anchor (newest user message, not the session-start message)
- Recent messages - Most recent conversation context (budget-walked newest to oldest)
- Tool call/result pairs - Kept together to avoid orphaned results
- Thread summary - Compressed history of dropped messages, injected before the conversation
All memory data is stored in the .clio/ directory within the project root:
.clio/
ltm.json # Long-Term Memory (project knowledge)
memory/
session_progress.md # Checkpoint written before trim events
<key>.json # Session-level key-value pairs
sessions/
<session-id>.json # Full session state (history, STM, YaRN, billing)
Each session JSON file contains:
| Field | Content |
|---|---|
| history | Complete message array (all roles) |
| stm | Short-term memory state |
| yarn | YaRN thread archive |
| billing | Token usage records per request |
| working_directory | Where the session was started |
| session_name | Human-readable session name |
| created_at | Session creation timestamp |
The .clio/ltm.json file contains:
{
"patterns": {
"discoveries": [...],
"problem_solutions": [...],
"code_patterns": [...],
"workflows": [...],
"failures": [...],
"context_rules": [...]
},
"metadata": {
"created": "timestamp",
"last_updated": "timestamp",
"version": "1.0"
}
}

| Command | What It Does |
|---|---|
| /memory list | Show stored session memories |
| /memory search <query> | Search memory by keyword |
| /memory stats | LTM statistics (entry counts, ages) |
| /memory prune | Clean up old/low-confidence LTM entries |
| Command | What It Does |
|---|---|
| /session show | Current session info and usage |
| /session list | All saved sessions |
| /session switch <id> | Resume a previous session |
| /session trim | Manually trim context |
| /session export <path> | Export session to self-contained HTML |
Messages trimmed from the active context are preserved in YaRN threads and session history. The full conversation is always available on disk, even when the AI can only "see" a window of it.
When an agent discovers something about your codebase - a coding convention, a bug fix pattern, a module relationship - it stores it in LTM. Every future session benefits from that knowledge without re-discovery.
When context limits are hit, CLIO doesn't crash or lose track. It compresses what was lost into a summary, preserves the most important context, and injects recovery information. The AI continues working with reduced but coherent context.
All persistent storage (LTM, sessions, memory) uses atomic write patterns (temp file + rename) to prevent corruption from process kills or concurrent access. LTM writes use PID-suffixed temp files to handle multiple agents working in the same project.
The memory system isn't just infrastructure - it's actively used by agents throughout their work. Here's how the pieces come together in a typical session:
- LTM injection - All project knowledge is loaded into the system prompt
- The agent sees discoveries, solutions, and patterns before you type anything
- If resuming a session, YaRN threads and STM are restored from the session file
- Tool calls - Every file read, command executed, and search performed is recorded in STM, YaRN, and session history
- Investigation notes - Agents store findings in the session-level store for reference later
- Learning - When agents discover new patterns or solve novel problems, they add entries to LTM
- Todo tracking - Task state is maintained through the todo_operations tool, providing structure that survives context trims
- Proactive trim fires when approaching 75% of the model's context window
- Oldest messages are compressed via YaRN into a summary
- The summary is injected as a system message so the AI knows what was dropped
- A progress checkpoint is written to .clio/memory/session_progress.md
The recovery injection tells the agent:
- Check LTM patterns already in the system prompt
- Use recall_sessions to search past sessions for specific information
- Retrieve the session_progress checkpoint for task state
- Use git log and todo state to understand current progress
- Do NOT read handoff documentation (which would waste the newly freed context space)
- LTM persists with all accumulated knowledge
- Session files contain the complete conversation archive
- Session-level memories in .clio/memory/ remain available
- Next session gets all LTM entries injected automatically
Running CLIO with --incognito disables all memory persistence:
- No LTM injection into prompts
- No session saving
- No memory writes
- No user profile injection
Running with --no-ltm skips just the LTM injection while keeping session persistence. Useful when you want a fresh perspective without accumulated assumptions.
All memory data lives in the project's .clio/ directory (gitignored by default). Nothing is sent to external services - memory is purely local. The only data that leaves your machine is the conversation context sent to the AI provider for each API call.