
Add configurable monitoring for stale Dali store with critical operator alerts #168

Draft
Copilot wants to merge 3 commits into master from copilot/add-detection-for-stale-store

Copilot AI commented Feb 17, 2026

Dali's main store file can go unsaved indefinitely when sasha-coalescer fails, which lets the delta file grow without bound. This PR adds monitoring that detects this condition and issues critical operator errors.

Configuration

Two new Dali attributes in dali.xsd:

  • storeNotSavedWarningPeriod (default: 72 hours) - time threshold
  • minDeltaSizeWarningThreshold (default: 50000 KB) - delta size threshold
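
For a sense of scale, the two defaults can be converted to seconds and bytes. This is only a sketch: it assumes 1 KB means 1024 bytes (as the `/ 1024` in the monitoring code suggests), and the constant and function names are illustrative, not taken from the PR.

```cpp
#include <cstdint>

// Illustrative constants mirroring the documented defaults (hypothetical names).
constexpr std::uint64_t kDefaultStoreNotSavedWarningPeriodHours = 72;
constexpr std::uint64_t kDefaultMinDeltaSizeWarningThresholdKB = 50000;

// Convert whole hours to seconds.
constexpr std::uint64_t hoursToSeconds(std::uint64_t hours) { return hours * 3600; }

// Convert KB to bytes, assuming 1 KB = 1024 bytes.
constexpr std::uint64_t kbToBytes(std::uint64_t kb) { return kb * 1024; }

// 72 hours is 3 days, i.e. 259,200 seconds.
static_assert(hoursToSeconds(kDefaultStoreNotSavedWarningPeriodHours) == 259200,
              "72 hours expressed in seconds");

// 50000 KB is 51,200,000 bytes under the 1024-byte assumption.
static_assert(kbToBytes(kDefaultMinDeltaSizeWarningThresholdKB) == 51200000,
              "50000 KB expressed in bytes");
```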

Monitoring Logic

CLightCoalesceThread now also checks the delta file size once the time threshold is reached:

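// The existing time check now also checks the delta file size.
// Elided parts ("...") are left as in the PR description.
if (t/3600 >= storeNotSavedWarningPeriodHours && ...) {
    StringBuffer deltaFilename(dataPath);
    iStoreHelper->getCurrentDeltaFilename(deltaFilename);
    // deltaIFile refers to the file named by deltaFilename (its creation is not shown in this excerpt)
    offset_t deltaSizeKB = deltaIFile->size() / 1024;

    if (deltaSizeKB >= minDeltaSizeWarningThresholdKB) {
        // Both thresholds met: critical operator error
        LOG(MCoperatorError, "Store has not been saved for %u hours and delta file has grown to %llu KB...", ...);
    } else {
        // Only the time threshold met: existing-style operator error, now with the delta size
        OERRLOG("Store has not been saved for %u hours (delta file size: %llu KB)", ...);
    }
}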

Behavior

  • Both conditions met: Issues MCoperatorError with detailed diagnostic message
  • Only time threshold met: Issues OERRLOG with delta size info (backward compatible)
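
The two cases above can be sketched as a small, standalone decision function. This is a minimal illustration rather than the PR's code: the enum, struct, and function names are hypothetical, the other conditions elided in the real time check are ignored, and the integer division mirrors the `t/3600` and `size() / 1024` expressions from the snippet.

```cpp
#include <cstdint>

// Hypothetical names for illustration; none of these come from the PR.
enum class StaleStoreAlert { None, TimeOnly, TimeAndSize };

struct StaleStoreThresholds
{
    std::uint64_t periodHours = 72;        // storeNotSavedWarningPeriod default
    std::uint64_t minDeltaSizeKB = 50000;  // minDeltaSizeWarningThreshold default
};

// Classify the store state from the time since the last save and the delta size.
inline StaleStoreAlert classifyStaleStore(std::uint64_t secondsSinceSave,
                                          std::uint64_t deltaSizeBytes,
                                          const StaleStoreThresholds &th = StaleStoreThresholds())
{
    // Time threshold not reached: no alert at all.
    if (secondsSinceSave / 3600 < th.periodHours)
        return StaleStoreAlert::None;
    // Time threshold reached and the delta is large: the critical case.
    if (deltaSizeBytes / 1024 >= th.minDeltaSizeKB)
        return StaleStoreAlert::TimeAndSize;
    // Time threshold reached but the delta is still small.
    return StaleStoreAlert::TimeOnly;
}
```

Because the function takes plain integers, the threshold boundaries can be exercised without a running Dali instance.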

This limits alert fatigue while ensuring operators are notified of genuine sasha-coalescer failures before delta files consume excessive disk space.



Copilot AI and others added 2 commits February 17, 2026 17:18
- Added storeNotSavedWarningPeriod config (default 72 hours)
- Added minDeltaSizeWarningThreshold config (default 50MB)
- Enhanced CLightCoalesceThread to check delta file size
- Issue critical operator errors when both conditions are met
- Provides early warning when sasha-coalescer may not be functioning

Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>

- Mark dataPath member as const since it's never modified
- Remove redundant 'CRITICAL:' prefix from error message

Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add detection for stale store file and critical error handling" to "Add configurable monitoring for stale Dali store with critical operator alerts" Feb 17, 2026
Copilot AI requested a review from jakesmith February 17, 2026 17:22