Maple Local Mode Destructive Recovery After Unclean Shutdown #113

Description

@robbiemu

TL;DR: After an interrupted/unclean host restart, Maple Local mode refused to reopen its persistent store, classifying it as an "unversioned legacy store" and requiring a destructive reset. For an unattended local service, this means losing all previous telemetry data. I'm asking for a non-destructive recovery path (quarantine + fresh start) and clearer error messaging.


Issue

I'm using Maple Local mode as a small self-hosted OpenTelemetry cache on an Apple Silicon Mac mini. After an interrupted/unclean host restart, Maple refused to open its existing data store, claiming it was "incompatible" and a "legacy store." The only way to get Maple running again was to wipe the data directory and start fresh, losing all the traces, logs, and metrics I had previously collected.

This isn't about Maple failing to open a potentially unsafe store; failing closed is likely correct. The issue is that the only recovery path is destructive, and the error message doesn't distinguish between a truly legacy store and one that simply has incomplete metadata after a crash. For a homelab service meant to run unattended, this is a tough pill to swallow.

Context and environment

I'm using Hermes-agent to assist with devops on my homelab, and I'm currently setting up Maple in local mode inside an Apple Container, with an OpenTelemetry Collector forwarding data to it on loopback. The setup worked perfectly through normal graceful restarts. The failure only occurred after the host experienced an unclean shutdown: the macOS host restarted unexpectedly, and the container runtime API became unavailable during shutdown.

📋 Technical Environment Details
  • Maple Release: v0.0.12
  • Maple Binary SHA-256: f02cb8bafee0e2ddadb212e2b5281b0b30596f87674195410a7628430ae480a5
  • libchdb.so SHA-256: 1501eba8cf38f395592fe26b0efdeb56a9afeb7443a1f71d645b1dac27303e14
  • Maple Release Bundle SHA-256: 2483e7e55a6317bf6876a77df7ca6a363f5d8f5cf51fa93e5b5bf7de86d1dd88
  • macOS Version: 26.5.1 (Build 25F80)
  • Kernel: Darwin 25.5.0, RELEASE_ARM64_T8103
  • Hardware: Apple Silicon Mac mini (Macmini9,1, M1)
  • Container Runtime: Apple Container CLI 1.0.0 (commit ee848e3)
  • Container Architecture: linux/arm64
  • Storage: Native Apple Container volume mounted at /var/lib/maple
  • Maple Data Directory: /var/lib/maple/data
  • Store version marker: /var/lib/maple/maple-store-version.json
  • Dirty-shutdown marker: /var/lib/maple/maple-store-open
  • PID file: /var/lib/maple/maple.pid
  • Maple Mode: Local mode, private loopback listener on 127.0.0.1:4418
  • Deployment: Maple supervised inside a container; OpenTelemetry Collector forwarded OTLP to Maple on loopback. Only the Collector’s OTLP/HTTP port was published.

Maple Start Command:

/opt/maple/maple start \
  --port 4418 \
  --data-dir /var/lib/maple/data \
  --offline \
  --log-level info

Maple’s process working directory was /var/lib/maple, on the persistent volume. Marker files (maple-store-version.json, maple-store-open, maple.pid) were also on the persistent volume.

Overview

flowchart LR
    A[Normal Graceful Stop] --> B[Store Reopens Fine<br>Data Preserved]
    C[Unclean Host Shutdown] --> D[Store Refused<br>“Unversioned Legacy” Error]
    D --> E[Only Option: Destructive Reset<br>All Data Lost]

Graceful-stop control:

  • Started Maple with its persistent volume.
  • Ingested distinct witness traces, logs, and metrics.
  • Gracefully stopped the container.
  • Started it again.
  • Confirmed the witness telemetry remained queryable.

Planned-reboot control:

  • Started with a populated persistent store.
  • Ran a manual pre-reboot helper to cleanly stop Maple while the container runtime was still available and verified that Maple had closed its store. The pre-reboot helper did not reset the container runtime or touch other services. It unloaded the Maple LaunchDaemon, stopped only the maple container with a 45s grace period, verified the persistent maple-store-open marker was absent, verified the container was stopped, and only then printed SAFE TO REBOOT.
  • Selected Restart from the macOS Apple menu.
  • After reboot, Maple reopened the store and the witness telemetry remained queryable.

Failure path:

  1. Maple was running with a populated persistent store.
  2. I selected Restart from the macOS Apple menu. This was not a forced power-off or deliberate kill of the container.
  3. During the resulting macOS shutdown, the supervisor received its shutdown signal and attempted container stop --time 30, but the Apple Container API was already unavailable:
2026-06-23T17:17:23-04:00 received shutdown signal; stopping container gracefully (--time 30)
2026-06-23T17:17:23-04:00   Error: internalError: "failed to stop container" (cause: "interrupted: "XPC connection error: Connection invalid"")
2026-06-23T17:17:23-04:00   Ensure container system service has been started with `container system start`.
2026-06-23T17:17:24-04:00 graceful stop issued; reconciler exiting 0
  4. After the next boot, macOS reported the restart as unexpected, and Maple refused to reopen the existing store:
the local store at /var/lib/maple/data is incompatible with this build's chDB
(store: an unversioned legacy store; build: v26.1.0)

— loading it would crash chDB.
Wipe it with maple reset, or start fresh via maple start --reset.
  5. The service exited on that error and the supervisor repeatedly restarted it until I added a guard.
  6. The only practical recovery was a destructive reset; the previous telemetry was lost.
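
The restart loop in step 5 can be broken with a small wrapper around the start command. The following is a minimal sketch, not the exact guard used here: `run_guarded`, `is_store_refusal`, the log path, and the exit status 78 are illustrative choices. Only the matched message text comes from the error shown above.

```shell
#!/bin/sh
# Hypothetical supervisor guard (names and exit status are illustrative).

# Succeeds when the captured output contains Maple's store refusal message.
is_store_refusal() {
  grep -q -F "is incompatible with this build's chDB" "$1"
}

# Run a command once with its output captured in a log file.
# Returns 78 when the command failed because of the store refusal,
# otherwise returns the command's own exit status.
run_guarded() {
  log="$1"; shift
  code=0
  "$@" >"$log" 2>&1 || code=$?
  if [ "$code" -ne 0 ] && is_store_refusal "$log"; then
    echo "store refused; leaving it untouched and not restarting" >&2
    return 78
  fi
  return "$code"
}

# Example (not executed here):
#   run_guarded /tmp/maple-start.log /opt/maple/maple start \
#     --port 4418 --data-dir /var/lib/maple/data --offline --log-level info
```

A supervisor can treat status 78 as "do not restart, page the operator" while still restarting on other failures, so the store is never reset without a human decision.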

Observations, questions

  • The store was not empty. Maple’s startup error found and rejected it.
  • This was not caused by a normal graceful stop; graceful stops preserved data.
  • Apple Container didn't cause the issue, IMO. The same risk likely exists anywhere local-mode Maple/chDB is force-killed or the host loses power.
  • In v0.0.12, this exact error means Maple found a populated store/ or metadata/ directory but could not obtain a valid store-version marker. From the post-reset evidence, I cannot distinguish whether that marker was missing, incomplete, unreadable, or malformed.
  • I’m not asking Maple to open a store if it would crash chDB. Failing closed may be correct. The issue is that the recovery path is opaque and destructive.
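
Capturing the marker state before any reset would have preserved the evidence that is missing here. A rough read-only sketch follows, assuming the marker file names from the environment list above. `marker_state` and `report_markers` are hypothetical helper names. Because the marker format is not known, "present" only means the file is non-empty and readable, and says nothing about whether its content is well-formed.

```shell
#!/bin/sh
# Hypothetical read-only check; it never modifies the store or the markers.

# Classify one marker file as missing, unreadable, empty, or present.
marker_state() {
  f="$1"
  if [ ! -e "$f" ]; then
    echo missing
  elif [ ! -r "$f" ]; then
    echo unreadable
  elif [ ! -s "$f" ]; then
    echo empty
  else
    echo present
  fi
}

# Print one line per marker under the Maple working directory.
report_markers() {
  root="$1"   # e.g. /var/lib/maple
  for m in maple-store-version.json maple-store-open maple.pid; do
    printf '%s: %s\n' "$m" "$(marker_state "$root/$m")"
  done
}

# Example (not executed here): report_markers /var/lib/maple
```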

My core questions:

  1. Is it intentional that an existing store with missing/unreadable version metadata is classified as an “unversioned legacy store” before Maple evaluates whether the shutdown was unclean?
  2. Is there a supported non-destructive inspection or recovery path for this state?
  3. Can local mode quarantine the old store and start a fresh one automatically, rather than requiring an operator wipe/reset?
  4. Are the store-version and dirty/open markers intended to be crash-durable across abrupt host failure?
  5. Is there any existing flag or command that distinguishes “legacy schema” from “current schema but incomplete/crash-tainted metadata”?

Expected behavior

For unattended local-mode use, any of these would be much safer:

  1. Quarantine on Detection: On incompatible/unclean store detection, rename the old data directory (e.g., data/data.quarantined-<timestamp>/) and start a new empty store.

  2. Recovery Policy Flag: Add an explicit flag like --on-store-recovery=quarantine-and-fresh (with either fail or quarantine-and-fresh as a safe default).

  3. Diagnostic Command: Add a read-only command that explains detected store version/schema state, dirty-shutdown-marker state, why Maple is refusing to open it, and whether a reset is mandatory.

  4. Clearer Error Messaging: Distinguish between:

    • Actual old/legacy stores
    • Stores created by this Maple version but missing expected metadata after an interrupted shutdown
    • Known unsafe chDB recovery states
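
Until something like option 1 exists, the quarantine step can be approximated by the operator while Maple is stopped. This is a sketch under assumptions, not Maple behavior: `quarantine_store` and the `quarantine-<timestamp>` directory name are hypothetical, and it is unverified that Maple treats an empty data directory with no marker files as a fresh store.

```shell
#!/bin/sh
# Hypothetical operator-side quarantine; run only while Maple is stopped.
# Assumption (unverified): an empty data directory with no marker files
# is accepted by Maple as a fresh store.

quarantine_store() {
  root="$1"   # Maple working directory, e.g. /var/lib/maple
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  dest="$root/quarantine-$stamp"

  [ -d "$root/data" ] || { echo "no data directory under $root" >&2; return 1; }
  mkdir -p "$dest" || return 1

  # Move the store and its markers together so they can be inspected later.
  mv "$root/data" "$dest/data" || return 1
  for marker in maple-store-version.json maple-store-open; do
    if [ -e "$root/$marker" ]; then
      mv "$root/$marker" "$dest/$marker" || return 1
    fi
  done

  # Leave an empty data directory for the next start.
  mkdir "$root/data" || return 1
  echo "$dest"
}

# Example (not executed here): quarantine_store /var/lib/maple
```

Keeping the quarantine directory on the same volume makes each move a rename rather than a copy, so the old store is preserved byte for byte.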

I can provide any of the following if useful:

  • Full Maple startup/reconciler logs around the failed shutdown and failed reopen.
  • Exact maple start command and relevant non-secret environment variables.
  • Exact macOS and Apple Container versions.
  • Container lifecycle/shutdown logs showing the failed graceful-stop attempt.
  • Checksums for the Maple binary and libchdb.so.
  • A minimal reproduction using a disposable persistent volume.

I no longer have the failed store in its original state because I reset it to restore service, but I can reproduce on a disposable volume if that helps.
