Port stability remediation from installed dist into source #2

Open · yashwant86 wants to merge 1 commit into main from pr-67350

yashwant86 commented Apr 15, 2026

Mirror of openclaw#67350


Summary by MergeMonkey

  • Just Shipped:
    • Health checks now report operational truth including delivery, transport, route integrity, and session store state.
    • Session routing uses unified resolver with provenance tracking across all surfaces (Telegram, TUI, API, webhooks).
    • Heartbeat sessions are now isolated with explicit base session key tracking to prevent route contradictions.
    • Polling watchdog suppresses false stall restarts when gateway work is still active.
  • Bug Fixes:
    • Fixed legacy heartbeat-main route contradictions by self-healing during session store load.
    • Fixed session store backup and transcript archive paths to use dedicated artifact directories instead of inline backups.
    • Fixed session write lock format to v2 with owner metadata and lease tracking for better multi-process safety.
    • Fixed overflow compaction to synthesize token counts when providers omit overflow totals.
    • Fixed cron session key resolution to use unified route resolver instead of legacy path functions.
  • Under the Hood:
    • Migrated sent-message cache from legacy store path to state directory with automatic migration.
    • Refactored session key resolution across cron, TUI, gateway, and heartbeat runners to use unified resolver.
    • Updated health command to report degraded state and collect operational health issues.
    • Added route metadata and integrity state fields to session entries for audit and diagnostics.
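The v2 session write lock mentioned above carries owner metadata (pid, hostname, cwd, starttime) and lease fields (leaseId, expiresAt, maxHoldMs), and the PR states that a legacy pid can still be read from either the root or the owner object. A minimal sketch of that backward-compatible read follows. The field names come from the PR summary; the nesting under `owner` and `lease`, the `version` field, the millisecond timestamps, and both function names are assumptions made for illustration.

```typescript
// Hypothetical v2 lock shape. Field names are from the PR summary; layout is assumed.
interface LockOwner {
  pid: number;
  hostname: string;
  cwd: string;
  starttime: number;
}

interface SessionLockV2 {
  version: 2;
  owner: LockOwner;
  lease: { leaseId: string; expiresAt: number; maxHoldMs: number };
}

function isValidPid(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

// Returns the owning pid from a v2 lock (owner.pid) or a legacy lock (root pid).
function readLockPid(raw: unknown): number | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const record = raw as Record<string, unknown>;
  const owner = record.owner;
  if (typeof owner === "object" && owner !== null) {
    const ownerPid = (owner as Record<string, unknown>).pid;
    if (isValidPid(ownerPid)) return ownerPid;
  }
  return isValidPid(record.pid) ? record.pid : undefined;
}

// Assumes expiresAt is an epoch timestamp in milliseconds.
function isLeaseExpired(lock: SessionLockV2, nowMs: number): boolean {
  return nowMs >= lock.lease.expiresAt;
}
```

Preferring `owner.pid` and falling back to the root `pid` means a reader built this way would accept lock files written in either format.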


mergemonkeyhq Bot commented Apr 15, 2026

Risk Assessment: CRITICAL · ~45 min review

Focus areas: Telegram operational truth inspection and path traversal in fs.readFileSync · Unified session route resolver correctness and scope/surface normalization · Session store self-healing and route metadata derivation · Health snapshot degradation logic and issue collection

Assessment: Adds health degradation logic, session routing provenance tracking, and operational truth inspection affecting all channels.

Walkthrough

The user runs a health check. The system scans agent session directories to inspect operational truth (route contradictions, artifact noise, store integrity). The health snapshot builder collects delivery truth (stopped/warming/observed/unknown), transport truth (reachable/unreachable), and route integrity (ok/contradictory). An issue is reported if delivery is unknown while the transport is reachable, if routes are contradictory, or if the store contains mixed artifacts. Session routing now uses a unified resolver that tracks provenance (provider, accountId, chatType, actor fingerprint) and scope (agent-main, peer-scoped-direct, heartbeat-isolated, explicit, global) for all surfaces.
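The scope and provenance fields listed in the walkthrough can be modeled as a small set of types. The sketch below is illustrative only: the scope values and provenance field names are taken from the walkthrough, while `ResolvedSessionRouteSketch`, `parseRouteScope`, `formatRouteForAudit`, and the `sessionKey` and `surface` fields as laid out here are hypothetical and do not claim to match the actual `resolveSessionRoute()` return type.

```typescript
// Scope values as listed in the walkthrough.
type RouteScope =
  | "agent-main"
  | "peer-scoped-direct"
  | "heartbeat-isolated"
  | "explicit"
  | "global";

const ROUTE_SCOPES: readonly RouteScope[] = [
  "agent-main",
  "peer-scoped-direct",
  "heartbeat-isolated",
  "explicit",
  "global",
];

// Provenance fields as listed in the walkthrough; optionality is assumed.
interface RouteProvenance {
  provider: string;
  accountId?: string;
  chatType?: string;
  actorFingerprint?: string;
}

// Hypothetical stand-in for the real ResolvedSessionRoute type.
interface ResolvedSessionRouteSketch {
  sessionKey: string;
  scope: RouteScope;
  surface: string;
  provenance: RouteProvenance;
}

// Narrows an arbitrary string to a known scope, or returns undefined.
function parseRouteScope(value: string): RouteScope | undefined {
  return ROUTE_SCOPES.includes(value as RouteScope) ? (value as RouteScope) : undefined;
}

// Renders a route as a single audit line, skipping provenance fields that are absent.
function formatRouteForAudit(route: ResolvedSessionRouteSketch): string {
  const { provider, accountId, chatType, actorFingerprint } = route.provenance;
  const parts = [
    `key=${route.sessionKey}`,
    `scope=${route.scope}`,
    `surface=${route.surface}`,
    `provider=${provider}`,
  ];
  if (accountId !== undefined) parts.push(`accountId=${accountId}`);
  if (chatType !== undefined) parts.push(`chatType=${chatType}`);
  if (actorFingerprint !== undefined) parts.push(`actor=${actorFingerprint}`);
  return parts.join(" ");
}
```

Keeping scope as a closed union lets the compiler flag any surface that produces a scope outside the five listed values.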

Changes

Files Summary
Telegram Operational Health & Stability
extensions/telegram/src/channel.ts, status-issues.ts, polling-session.ts, polling-session.test.ts
Adds operational truth inspection for Telegram delivery, transport, route integrity, and session store state. Suppresses polling watchdog false restarts when gateway work is active. Reports health degradation when transport is reachable but delivery truth is unknown.
Unified Session Route Resolver
src/routing/resolve-route.ts
src/routing/resolve-route.test.ts
src/config/sessions/session-key.ts
src/cron/isolated-agent/session-key.ts
src/gateway/http-utils.ts
src/tui/tui.ts
src/tui/tui-session-actions.ts
Introduces resolveSessionRoute() with provenance tracking (provider, accountId, chatType, actor fingerprint) across all surfaces. Replaces legacy toAgentStoreSessionKey and canonicalizeMainSessionAlias calls. Normalizes scope, surface, and explicit routing flags for audit and diagnostics.
Session Store Route Metadata & Self-Healing
src/config/sessions/store-load.ts, types.ts, store.session-key-normalization.test.ts
Adds route metadata and integrity state to session entries during load. Self-heals legacy heartbeat-main contradictions by removing Telegram origin from main sessions and normalizing surface to heartbeat. Derives route scope, surface, and actor fingerprint from session key and origin.
Health Snapshot & Operational Issues
src/commands/health.ts
src/commands/health.types.ts
src/commands/health.command.coverage.test.ts
src/commands/health.test.ts
src/plugin-sdk/status-helpers.ts
src/channels/plugins/types.core.ts
Extends health snapshot with delivery truth, transport truth, route integrity, and session store integrity. Collects operational health issues (delivery unknown, route contradictory, mixed artifacts, degraded state). Reports health state as degraded when issues detected.
Session Store Backup & Archive Paths
src/config/sessions/store-maintenance.ts
src/config/sessions/store.ts
src/gateway/session-transcript-files.fs.ts
Moves session store backups and transcript archives to dedicated artifact directories (agents/*/artifacts/session-store and session-transcripts) instead of inline .bak files. Adds resolveSessionMaintenanceRoot, resolveSessionStoreBackupDir, resolveSessionTranscriptArchiveDir helpers.
Session Write Lock v2 Format
src/agents/session-write-lock.ts, session-write-lock.test.ts
Upgrades lock file format to v2 with owner metadata (pid, hostname, cwd, starttime), lease tracking (leaseId, expiresAt, maxHoldMs), and canonical paths. Maintains backward compatibility by reading legacy pid field from root or owner object.
Sent Message Cache Migration
extensions/telegram/src/sent-message-cache.ts, send.test-harness.ts
Migrates sent-message cache from legacy store path to state directory (resolveStateDir/telegram/sent-messages.json). Automatically migrates existing cache via rename or copy+delete on cross-device filesystems.
Overflow Compaction Token Synthesis
src/agents/pi-embedded-runner/run.ts, run.overflow-compaction.test.ts
Synthesizes token counts for overflow compaction when providers omit observed overflow totals. Tracks token count source (observed vs synthetic) and forces recovery compaction when in-attempt compaction persists without observed count.
Heartbeat Runner Session Resolution
src/infra/heartbeat-runner.ts
Refactors heartbeat session resolution to use unified resolveBaseSessionKey helper. Simplifies logic by removing legacy toAgentStoreSessionKey and canonicalizeMainSessionAlias calls.
Gateway Health Runtime Snapshot
src/gateway/server.impl.ts
src/gateway/server/health-state.ts
src/gateway/server-methods/health.ts
Passes runtime snapshot (channel accounts and their state) to health snapshot builder. Allows health checks to access live channel runtime state for operational truth inspection.
Status Display & Formatting
src/commands/channels/status.ts
src/commands/status.command-sections.ts
src/terminal/health-style.ts
src/tui/tui-event-handlers.test.ts
src/tui/tui-session-actions.test.ts
Displays transport truth, delivery truth, route integrity, and session store integrity in status output. Adds degraded state styling. Updates test fixtures to use canonical main session key format (main instead of agent:main:main).
Plugin SDK Exports
src/plugin-sdk/gateway-runtime.ts, routing.ts
Exports resolveSessionRoute, ResolvedSessionRoute, and gateway runtime helpers (getActiveEmbeddedRunCount, getActiveTaskCount, getTotalQueueSize) for plugin use.
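The polling watchdog change and the gateway runtime helpers exported here fit together: a stall should only trigger a restart when the gateway is idle. The following is a minimal sketch of that decision, assuming the three counters are sampled from `getActiveEmbeddedRunCount`, `getActiveTaskCount`, and `getTotalQueueSize`. The `GatewayActivity` shape, the threshold parameter, and both function names are hypothetical, and how the real watchdog combines these signals is not shown on this page.

```typescript
// Hypothetical snapshot of the three runtime counters named in the Plugin SDK exports.
interface GatewayActivity {
  activeEmbeddedRuns: number;
  activeTasks: number;
  queueSize: number;
}

// True when any counter indicates the gateway is still doing work.
function hasActiveGatewayWork(activity: GatewayActivity): boolean {
  return activity.activeEmbeddedRuns > 0 || activity.activeTasks > 0 || activity.queueSize > 0;
}

// Restart only when polling looks stalled AND no gateway work is in flight.
function shouldRestartOnStall(
  msSinceLastPoll: number,
  stallThresholdMs: number,
  activity: GatewayActivity,
): boolean {
  if (msSinceLastPoll < stallThresholdMs) return false;
  return !hasActiveGatewayWork(activity);
}
```

Treating in-flight work as evidence of liveness trades slower detection of a true stall for fewer spurious restarts during long-running tasks.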

Sequence Diagram

sequenceDiagram
  participant User
  participant HealthCmd as Health Command
  participant Inspector as Telegram Inspector
  participant SessionStore as Session Store
  participant HealthBuilder as Health Builder
  participant Router as Route Resolver
  User->>HealthCmd: Request health snapshot
  HealthCmd->>Inspector: inspectTelegramOperationalStoreTruth()
  Inspector->>SessionStore: Scan agents/*/sessions directories
  Inspector->>SessionStore: Read sessions.json, check route integrity
  Inspector-->>HealthCmd: Return routeIntegrity, sessionStoreIntegrity, contradictions
  HealthCmd->>HealthBuilder: buildChannelAccountSnapshot()
  HealthBuilder->>Router: resolveSessionRoute() for each session
  Router-->>HealthBuilder: Return scope, surface, provenance, actorFingerprint
  HealthBuilder-->>HealthCmd: Return snapshot with route metadata
  HealthCmd->>HealthCmd: resolveOperationalHealthIssues()
  alt Delivery unknown
    HealthCmd->>HealthCmd: Add delivery_truth_unknown issue
  end
  alt Route contradictory
    HealthCmd->>HealthCmd: Add route_integrity_contradictory issue
  end
  alt Mixed artifacts
    HealthCmd->>HealthCmd: Add session_store_mixed_artifacts issue
  end
  HealthCmd-->>User: Return health summary with state=degraded if issues
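The three `alt` branches in the diagram amount to a small rule set. A minimal sketch of that issue collection is shown below. The issue identifiers, the truth values, and the `degraded` state come from this PR page; the `OperationalTruth` shape, the `"ok"` values for store integrity and overall state, and the function names are assumptions, and the real `resolveOperationalHealthIssues()` may apply further rules.

```typescript
type DeliveryTruth = "stopped" | "warming" | "observed" | "unknown";
type TransportTruth = "reachable" | "unreachable";
type RouteIntegrity = "ok" | "contradictory";
// "mixed-artifacts" appears in the review comment; "ok" is an assumed healthy value.
type SessionStoreIntegrity = "ok" | "mixed-artifacts";

// Hypothetical aggregate of the truths collected by the health snapshot builder.
interface OperationalTruth {
  delivery: DeliveryTruth;
  transport: TransportTruth;
  routeIntegrity: RouteIntegrity;
  sessionStoreIntegrity: SessionStoreIntegrity;
}

type OperationalIssue =
  | "delivery_truth_unknown"
  | "route_integrity_contradictory"
  | "session_store_mixed_artifacts";

function collectOperationalIssues(truth: OperationalTruth): OperationalIssue[] {
  const issues: OperationalIssue[] = [];
  // Unknown delivery only counts as an issue when the transport itself is reachable.
  if (truth.transport === "reachable" && truth.delivery === "unknown") {
    issues.push("delivery_truth_unknown");
  }
  if (truth.routeIntegrity === "contradictory") {
    issues.push("route_integrity_contradictory");
  }
  if (truth.sessionStoreIntegrity === "mixed-artifacts") {
    issues.push("session_store_mixed_artifacts");
  }
  return issues;
}

// Any issue flips the summary state to "degraded".
function summarizeHealth(truth: OperationalTruth): {
  state: "ok" | "degraded";
  issues: OperationalIssue[];
} {
  const issues = collectOperationalIssues(truth);
  return { state: issues.length > 0 ? "degraded" : "ok", issues };
}
```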

Dig Deeper With Commands

  • /review <file-path> <function-optional>
  • /chat <file-path> "<question>"
  • /roast <file-path>

Runs only when explicitly triggered.

    continue;
  }
  for (const dirEntry of dirEntries) {
    if (dirEntry.isDirectory() || isTelegramSessionArtifactNoise(dirEntry.name)) {

OR condition counts every subdirectory as artifact noise

The condition dirEntry.isDirectory() || isTelegramSessionArtifactNoise(dirEntry.name) increments artifactNoise for all directories under sessions/, not just noise-named ones. Legitimate directories like artifacts/ (created by store-maintenance.ts) are counted as noise, inflating the counter and causing sessionStoreIntegrity to flip to "mixed-artifacts", which in turn triggers a session_store_mixed_artifacts health issue and makes the health snapshot report ok: false. The intent was likely to skip directories and only count noise-named files — change to if (!dirEntry.isFile()) continue; before the noise check, or use && !dirEntry.isDirectory() so only noise-named files are counted.

Split the condition: skip directories with continue, then check noise only for files:

for (const dirEntry of dirEntries) {
  if (dirEntry.isDirectory()) continue;
  if (isTelegramSessionArtifactNoise(dirEntry.name)) {
    result.artifactNoise += 1;
  }
}
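To make the difference concrete, the sketch below runs both conditions against the same in-memory directory listing. It is a standalone illustration, not the code from `channel.ts`: `DirEntryLike` is a stand-in for `fs.Dirent`, and `looksLikeArtifactNoise` is a hypothetical predicate, since the body of `isTelegramSessionArtifactNoise` is not shown on this page.

```typescript
// Minimal stand-in for fs.Dirent so the sketch runs without touching the disk.
interface DirEntryLike {
  name: string;
  isDirectory(): boolean;
}

// Hypothetical noise predicate; the real isTelegramSessionArtifactNoise may differ.
function looksLikeArtifactNoise(fileName: string): boolean {
  return /\.(bak|tmp)$/.test(fileName);
}

// Original condition: every directory is counted, whatever its name.
function countNoiseWithOrCondition(entries: readonly DirEntryLike[]): number {
  let artifactNoise = 0;
  for (const entry of entries) {
    if (entry.isDirectory() || looksLikeArtifactNoise(entry.name)) {
      artifactNoise += 1;
    }
  }
  return artifactNoise;
}

// Suggested fix: skip directories first, then count only noise-named files.
function countNoiseSkippingDirectories(entries: readonly DirEntryLike[]): number {
  let artifactNoise = 0;
  for (const entry of entries) {
    if (entry.isDirectory()) continue;
    if (looksLikeArtifactNoise(entry.name)) {
      artifactNoise += 1;
    }
  }
  return artifactNoise;
}

const sampleEntries: DirEntryLike[] = [
  { name: "artifacts", isDirectory: () => true },
  { name: "sessions.json", isDirectory: () => false },
  { name: "sessions.json.bak", isDirectory: () => false },
];

console.log(countNoiseWithOrCondition(sampleEntries)); // 2: the artifacts/ directory is miscounted
console.log(countNoiseSkippingDirectories(sampleEntries)); // 1: only sessions.json.bak
```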


mergemonkeyhq Bot commented Apr 15, 2026

Actionable Comments Posted: 1

🧾 Coverage Summary
✔️ Covered (41 files)
- extensions/telegram/src/channel.ts
- extensions/telegram/src/polling-session.test.ts
- extensions/telegram/src/polling-session.ts
- extensions/telegram/src/send.test-harness.ts
- extensions/telegram/src/sent-message-cache.ts
- extensions/telegram/src/status-issues.ts
- src/agents/pi-embedded-runner/run.overflow-compaction.test.ts
- src/agents/pi-embedded-runner/run.ts
- src/agents/session-write-lock.test.ts
- src/agents/session-write-lock.ts
- src/channels/plugins/types.core.ts
- src/commands/channels/status.ts
- src/commands/health.command.coverage.test.ts
- src/commands/health.test.ts
- src/commands/health.ts
- src/commands/health.types.ts
- src/commands/status.command-sections.ts
- src/config/sessions/session-key.ts
- src/config/sessions/store-load.ts
- src/config/sessions/store-maintenance.ts
- src/config/sessions/store.session-key-normalization.test.ts
- src/config/sessions/store.ts
- src/config/sessions/types.ts
- src/cron/isolated-agent/session-key.ts
- src/gateway/http-utils.ts
- src/gateway/server-methods/health.ts
- src/gateway/server.impl.ts
- src/gateway/server/health-state.ts
- src/gateway/session-transcript-files.fs.ts
- src/infra/heartbeat-runner.ts
- src/plugin-sdk/gateway-runtime.ts
- src/plugin-sdk/routing.ts
- src/plugin-sdk/status-helpers.ts
- src/routing/resolve-route.test.ts
- src/routing/resolve-route.ts
- src/terminal/health-style.ts
- src/tui/tui-event-handlers.test.ts
- src/tui/tui-session-actions.test.ts
- src/tui/tui-session-actions.ts
- src/tui/tui.test.ts
- src/tui/tui.ts

yashwant86 (Author) commented

/review --force
