CLIO Memory Architecture

How CLIO remembers, learns, and maintains continuity across sessions.

Overview

CLIO has a three-tier memory system designed to give AI agents the ability to learn and improve over time, maintain context during long sessions, and recover gracefully when context windows overflow.

Unlike most AI assistants that start fresh every conversation, CLIO accumulates project-specific knowledge that persists indefinitely. An agent working on your codebase today benefits from everything learned in previous sessions - discovered patterns, solved problems, and established conventions.

                       CLIO Memory Architecture

 Within a Session                    Across Sessions
 ==================                  ==================

 Short-Term Memory (STM)             Long-Term Memory (LTM)
 - Sliding window of recent          - Discoveries about the codebase
   messages                          - Problem-solution pairs
 - Working context for the AI        - Code patterns and conventions
 - Auto-pruned when full             - Persisted in .clio/ltm.json

 YaRN Threads                        Session-Level Store
 - Full conversation archive          - Key-value pairs in .clio/memory/
 - Compression for recovery          - Investigation notes, checkpoints
 - Never loses messages              - Available via recall_sessions

Short-Term Memory

Module: lib/CLIO/Memory/ShortTerm.pm

Short-Term Memory is the sliding window of recent messages that forms the AI's working context for the current turn. It holds the most recent conversation history used when building the API request.

How It Works

Every message (user, assistant, tool call, tool result) is added to STM
When STM exceeds its configured maximum size, oldest messages are pruned
The pruned messages aren't lost - they're preserved in YaRN threads and session history

Key Characteristics

Fixed-size FIFO - Oldest messages are dropped first when the window is full
Defensive normalization - Handles legacy formats, strips conversation markup, validates message structure
Embedded in session files - STM state is saved as part of the session JSON, allowing seamless session resume

STM is not something users interact with directly. It operates transparently as part of the context management pipeline.

YaRN (Yet another Recurrence Navigation)

Module: lib/CLIO/Memory/YaRN.pm

YaRN is CLIO's conversation archival and compression system. While STM keeps a sliding window, YaRN keeps everything - the complete conversation history for each session, organized into threads.

Why YaRN Matters

When context trimming drops messages from the active window (because the AI's context limit is approaching), those messages aren't lost. YaRN preserves them. More importantly, YaRN can compress dropped messages into concise summaries that capture the essential information:

What the user asked
What files were read and modified
What git commits were made
What decisions were reached through collaboration
What tools were used and how often

Compression

When the context window needs trimming, compress_messages() takes the messages about to be dropped and extracts:

Category	What's Extracted
User requests	The last N user messages (truncated to ~300 chars each)
Current task	Most recent user message - the active work being done
Git commits	Commit hashes and messages from tool output
Files touched	File paths from tool call arguments (path, new_path, old_path)
Key decisions	Collaboration exchanges (question + user response)
Tool usage	Counts of each tool type used

The result is a single system message wrapped in <thread_summary> tags that gets injected into the trimmed context. Critically, the <thread_summary> is preserved across multiple trim cycles - each new compression merges with the previous summary, building an accumulating record of the entire session.

Seamless Recovery

After context trimming, CLIO agents continue working without announcing that context was lost. The thread_summary provides enough continuity that no recovery stumbling is needed:

No "I've recovered context" announcements
No re-reading handoff documents
No asking the user what to do next
Just continuing work as if nothing changed

The recovery injection includes neutral language ("Older conversation history has been summarized") rather than disruption signals, and explicitly instructs the agent to keep working.

Session Recovery

After aggressive context trimming, the AI might otherwise "forget" what it was working on. YaRN compression plus the recovery injection system means the AI gets:

A merged summary of everything dropped (accumulated across trim cycles)
The current task anchor (most recent user message - what was being worked on NOW, not at session start)
The current todo/task state
Recent git activity (commits, working tree status)

This is why CLIO agents can work for hours on complex tasks across multiple topic transitions without losing track of their current objectives. In long sessions where early work is long done and the agent has moved through several task transitions, the original session-start message is intentionally NOT re-injected - it's stale and misleading. The thread_summary already captures it. The most recent user message represents the actual current work.

Long-Term Memory (LTM)

Module: lib/CLIO/Memory/LongTerm.pm
Storage: .clio/ltm.json (per project)

Long-Term Memory is CLIO's project-level knowledge base. It persists across all sessions and accumulates knowledge about your specific codebase and workflows.

What Gets Stored

Type	Purpose	Example
Discoveries	Facts about the codebase	"CLIO uses CLIO::Util::JSON for all JSON encoding"
Solutions	Problem-fix pairs	"If streaming 400 errors occur, increase retry budget to 20"
Patterns	Coding conventions	"Always use atomic writes (temp + rename) for session files"

Each entry includes:

Confidence score (0.0-1.0) - Higher scores indicate well-verified knowledge
Timestamps - When first discovered and last confirmed
Examples - File paths demonstrating the pattern
Application count - How many times a solution has been used

Automatic Prompt Injection

At the start of every session, LTM entries are formatted and injected into the system prompt by PromptManager. The AI sees all accumulated project knowledge before you even ask your first question.

The injection includes:

Key Discoveries - Up to 15 high-confidence facts, newest first
Problem Solutions - Up to 15 error/solution pairs with application counts
Code Patterns - Up to 10 verified patterns with example file paths

This means an agent starting a new session already knows: what coding conventions your project uses, what bugs have been fixed before and how, and what patterns to follow. No re-discovery needed.

LTM injection can be disabled with --no-ltm or --incognito flags for sessions where you want a clean slate.

How Agents Learn

During a session, the AI adds new entries via the memory_operations tool:

# Discover a fact about the codebase
memory_operations(operation: "add_discovery", fact: "Config uses YAML not JSON", confidence: 0.9)

# Record a problem and its solution
memory_operations(operation: "add_solution",
    error: "Session save fails with permission denied",
    solution: "Check .clio/ directory ownership, must match current user",
    examples: ["lib/CLIO/Session/State.pm"])

# Document a coding pattern
memory_operations(operation: "add_pattern",
    pattern: "All file writes use atomic temp+rename pattern",
    confidence: 0.95,
    examples: ["lib/CLIO/Memory/LongTerm.pm", "lib/CLIO/Session/State.pm"])

Agents are instructed to add LTM entries when they discover something significant - a new pattern, a bug fix that could recur, or a fact about the codebase structure. This happens organically during normal work sessions.

Pruning

Old or low-confidence entries are cleaned up to keep LTM focused:

memory_operations(operation: "prune_ltm", max_age_days: 90, min_confidence: 0.3)
memory_operations(operation: "ltm_stats")  # Check current LTM size

Atomic Persistence

LTM saves are atomic: data is written to a temporary file (with PID suffix to handle concurrent agents) and then renamed to the target path. This prevents corruption if a process is killed mid-write.

Session-Level Store

Module: lib/CLIO/Tools/MemoryOperations.pm
Storage: .clio/memory/<key>.json

The session-level store is a simple key-value system for temporary notes, investigation findings, and working data. Unlike LTM (which accumulates project knowledge), the session store is for per-task scratch data that an agent needs to reference during a session.

How Agents Use It

Agents store working notes during complex investigations:

# Store investigation findings
memory_operations(operation: "store",
    key: "auth_bug_analysis",
    content: "Root cause: token refresh uses return inside eval, loses result")

# Retrieve later in the session
memory_operations(operation: "retrieve", key: "auth_bug_analysis")

# Search across all stored memories
memory_operations(operation: "search", query: "token refresh")

# List everything stored
memory_operations(operation: "list")

Operations

Operation	Description
`store`	Write a key-value pair to `.clio/memory/`
`retrieve`	Read a stored value by key
`search`	Find memories matching a keyword
`list`	List all stored memory keys
`delete`	Remove a stored memory

The session-level store is also used for automatic checkpoints. Before context trimming events, CLIO writes a session_progress.md checkpoint that includes the current task state, recent tool calls, and iteration count. After recovery, agents can retrieve this checkpoint to understand where they were.

Cross-Session Recall

Operation: memory_operations(operation: "recall_sessions")

Cross-session recall lets agents search through all previous session transcripts for relevant context. This is one of CLIO's most powerful memory features - it means knowledge isn't limited to what's in LTM. Anything discussed in any previous session is searchable.

How It Works

CLIO reads all session files from .clio/sessions/, sorted newest-first
For each session (up to max_sessions), it loads the message history
Messages are scored against the search query using:
- Exact match boost (+3) - Query appears verbatim in the message
- Keyword scoring (+1 per keyword) - Individual words from the query found
- Density bonus (+1.5) - High ratio of matching keywords to total content
- Title relevance (+0.5) - Session name matches the query
Top results are returned with preview text

Agent Usage

Agents use recall_sessions in several situations:

# After context trimming - recover lost information
memory_operations(operation: "recall_sessions",
    query: "authentication refactor approach",
    max_sessions: 10,
    max_results: 5)

# Before starting work - check if similar work was done
memory_operations(operation: "recall_sessions",
    query: "worktree implementation")

# Understanding past decisions
memory_operations(operation: "recall_sessions",
    query: "why we chose atomic writes")

After Context Recovery

When aggressive context trimming occurs, the recovery injection system tells agents to use recall_sessions to fill in gaps rather than re-reading handoff documentation. This is more efficient because recall_sessions returns targeted, relevant excerpts rather than entire documents.

Context Management Pipeline

These memory components work together in a coordinated pipeline to keep the AI effective during long sessions.

The Token Budget Challenge

AI models have a fixed context window (e.g., 128K tokens for Claude Sonnet, 200K for Claude Opus). A long session with many tool calls can easily exceed this. CLIO's context management prevents overflow without losing critical information.

Three-Stage Trimming

Stage 1: Proactive Trim (before API call, every iteration)
  WorkflowOrchestrator checks messages against 75% of context window
  If over: MessageValidator drops oldest message units (budget-walk newest to oldest)
  Dropped messages -> YaRN compression -> thread_summary injected
  thread_summary is preserved and merged across successive trim cycles

Stage 2: Validation Trim (just before sending to API)
  Final check against effective token limit
  Smart unit-based truncation (keeps tool call/result pairs together)
  Post-trim target: 50% of max prompt tokens

Stage 3: Reactive Trim (after API rejection)
  If API returns token_limit_exceeded despite proactive trim:
  Progressive reduction across up to 3 retry attempts (50% -> 25% -> minimal)
  Each retry injects recovery context (YaRN summary + todo state + git activity)
  Most recent user message preserved as the current task anchor

Token Estimation

Module: lib/CLIO/Memory/TokenEstimator.pm

Token estimation uses a character-to-token ratio that starts at a conservative default and learns from actual API responses. Each streaming response with real usage data updates the ratio, making estimates more accurate over time.

The learned ratio is critical - an inaccurate ratio means proactive trimming either fires too aggressively (wasting context) or too late (causing API rejections).

What Gets Preserved During Trimming

When messages must be dropped, CLIO prioritizes keeping:

System prompt - Always preserved
Most recent user message - The current task anchor (newest user message, not the session-start message)
Recent messages - Most recent conversation context (budget-walked newest to oldest)
Tool call/result pairs - Kept together to avoid orphaned results
Thread summary - Compressed history of dropped messages, injected before the conversation

Data Layout

All memory data is stored in the .clio/ directory within the project root:

.clio/
  ltm.json                          # Long-Term Memory (project knowledge)
  memory/
    session_progress.md             # Checkpoint written before trim events
    <key>.json                      # Session-level key-value pairs
  sessions/
    <session-id>.json               # Full session state (history, STM, YaRN, billing)

Session File Format

Each session JSON file contains:

Field	Content
`history`	Complete message array (all roles)
`stm`	Short-term memory state
`yarn`	YaRN thread archive
`billing`	Token usage records per request
`working_directory`	Where the session was started
`session_name`	Human-readable session name
`created_at`	Session creation timestamp

LTM File Format

The .clio/ltm.json file contains:

{
  "patterns": {
    "discoveries": [...],
    "problem_solutions": [...],
    "code_patterns": [...],
    "workflows": [...],
    "failures": [...],
    "context_rules": [...]
  },
  "metadata": {
    "created": "timestamp",
    "last_updated": "timestamp",
    "version": "1.0"
  }
}

User Commands

Memory Commands

Command	What It Does
`/memory list`	Show stored session memories
`/memory search <query>`	Search memory by keyword
`/memory stats`	LTM statistics (entry counts, ages)
`/memory prune`	Clean up old/low-confidence LTM entries

Session Commands

Command	What It Does
`/session show`	Current session info and usage
`/session list`	All saved sessions
`/session switch <id>`	Resume a previous session
`/session trim`	Manually trim context
`/session export <path>`	Export session to self-contained HTML

Design Principles

Nothing Is Lost

Messages trimmed from the active context are preserved in YaRN threads and session history. The full conversation is always available on disk, even when the AI can only "see" a window of it.

Learn Once, Remember Always

When an agent discovers something about your codebase - a coding convention, a bug fix pattern, a module relationship - it stores it in LTM. Every future session benefits from that knowledge without re-discovery.

Graceful Degradation

When context limits are hit, CLIO doesn't crash or lose track. It compresses what was lost into a summary, preserves the most important context, and injects recovery information. The AI continues working with reduced but coherent context.

Atomic Writes

All persistent storage (LTM, sessions, memory) uses atomic write patterns (temp file + rename) to prevent corruption from process kills or concurrent access. LTM writes use PID-suffixed temp files to handle multiple agents working in the same project.

How Agents Use Memory in Practice

The memory system isn't just infrastructure - it's actively used by agents throughout their work. Here's how the pieces come together in a typical session:

Session Start

LTM injection - All project knowledge is loaded into the system prompt
The agent sees discoveries, solutions, and patterns before you type anything
If resuming a session, YaRN threads and STM are restored from the session file

During Work

Tool calls - Every file read, command executed, and search performed is recorded in STM, YaRN, and session history
Investigation notes - Agents store findings in the session-level store for reference later
Learning - When agents discover new patterns or solve novel problems, they add entries to LTM
Todo tracking - Task state is maintained through the todo_operations tool, providing structure that survives context trims

When Context Gets Full

Proactive trim fires when approaching 75% of the model's context window
Oldest messages are compressed via YaRN into a summary
The summary is injected as a system message so the AI knows what was dropped
A progress checkpoint is written to .clio/memory/session_progress.md

After Context Recovery

The recovery injection tells the agent:

Check LTM patterns already in the system prompt
Use recall_sessions to search past sessions for specific information
Retrieve the session_progress checkpoint for task state
Use git log and todo state to understand current progress
Do NOT read handoff documentation (which would waste the newly freed context space)

Between Sessions

LTM persists with all accumulated knowledge
Session files contain the complete conversation archive
Session-level memories in .clio/memory/ remain available
Next session gets all LTM entries injected automatically

Privacy and Control

Incognito Mode

Running CLIO with --incognito disables all memory persistence:

No LTM injection into prompts
No session saving
No memory writes
No user profile injection

No-LTM Mode

Running with --no-ltm skips just the LTM injection while keeping session persistence. Useful when you want a fresh perspective without accumulated assumptions.

Data Location

All memory data lives in the project's .clio/ directory (gitignored by default). Nothing is sent to external services - memory is purely local. The only data that leaves your machine is the conversation context sent to the AI provider for each API call.

Uh oh!

FilesExpand file tree

MEMORY.md

Latest commit

History

MEMORY.md

File metadata and controls

CLIO Memory Architecture

Overview

Short-Term Memory

How It Works

Key Characteristics

YaRN (Yet another Recurrence Navigation)

Why YaRN Matters

Compression

Seamless Recovery

Session Recovery

Long-Term Memory (LTM)

What Gets Stored

Automatic Prompt Injection

How Agents Learn

Pruning

Atomic Persistence

Session-Level Store

How Agents Use It

Operations

Cross-Session Recall

How It Works

Agent Usage

After Context Recovery

Context Management Pipeline

The Token Budget Challenge

Three-Stage Trimming

Token Estimation

What Gets Preserved During Trimming

Data Layout

Session File Format

LTM File Format

User Commands

Memory Commands

Session Commands

Design Principles

Nothing Is Lost

Learn Once, Remember Always

Graceful Degradation

Atomic Writes

How Agents Use Memory in Practice

Session Start

During Work

When Context Gets Full

After Context Recovery

Between Sessions

Privacy and Control

Incognito Mode

No-LTM Mode

Data Location