feat(controller): observable vault scanner skip failures (spec 043)#14
Merged
Conversation
…n on spec-043 funcs - Remove task/controller/controller (43MB) and scanner.test (40MB) committed by prompt-195 container - Extend .gitignore with *.test and explicit task/controller/controller / task/executor/executor entries - Add //nolint:funlen on main.go Run() (+6 lines from metrics.New() wiring) and vault_scanner.go processFile() (+5 statements from spec-043 counter calls); extraction would split tightly-coupled wiring or duplicate per-site logic - Pick up post-precommit formatter cleanups (gofmt const alignment in metrics.go, golines wrap in vault_scanner_internal_test.go) and the spec AC#6 awk regex fix from pre-approval review
There was a problem hiding this comment.
PR Review Summary
Branch: feature/vault-scanner-skip-alerts → master
PR: #14
Code Review Findings
Correctness:
- 4
glog.Warningf("skipping...")sites correctly promoted toglog.Errorf(lines 239, 245, 254, 299) read_failed(line 229) andinject_task_identifier_failed(line 326) correctly remain atWarningfper spec- 6 counter increment calls matching 6 skip sites (5 unique reasons, with
invalid_frontmatterused at 2 sites) - Counter pre-initialization in
init()for all 5 label values (lines 201-209) - Constructor signature correctly accepts
metrics.Metricsinterface
Tests:
vault_scanner_test.gohas comprehensiveSkippedFilesTotal counterContext for all 5 reasons (lines 780-1059)metrics_test.goverifies all 5 reason labels are pre-initialized (lines 98-110)vault_scanner_internal_test.gotestsinject_task_identifier_failedcounter- AC#6 invariant test at line 1035 validates skip-site/counter-call parity
Documentation:
docs/controller-design.mdupdated with counter documentation (line 28)CHANGELOG.mdhas unreleased entry with incident reference- Spec correctly references vault doctrine
No issues found.
{
"verdict": "approve",
"summary": "Implementation fully satisfies spec-043. The vault scanner now emits a labelled Prometheus counter (agent_controller_vault_scanner_skipped_files_total{reason}) at all 6 skip sites, with 4 operator-actionable log lines promoted from Warningf to Errorf. All 5 reason labels are pre-initialised to zero. Comprehensive Ginkgo test coverage validates per-reason increment behaviour and the skip-site/counter-call invariant.",
"comments": [],
"concerns_addressed": [
"correctness: 4 glog.Warningf skip sites promoted to glog.Errorf — errorf at lines 239,245,254,299; read_failed and inject_task_identifier_failed correctly remain at Warningf per spec",
"correctness: Counter increment logic is idempotent per scan cycle — hash check prevents double-counting for empty_status; re-process correctly re-increments for broken files",
"correctness: 5 label values are a closed enum (constants), no high cardinality risk",
"tests: All 5 skip reasons covered by Ginkgo Context blocks in vault_scanner_test.go and vault_scanner_internal_test.go",
"performance: Counter pre-initialisation in init() uses Add(0) for 5 label values — negligible memory overhead"
]
}
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Restores Prometheus observability for vault task files silently skipped by the scanner. Closes the dedup safety net's silent failure mode that hid 24 broken files (4 of them
human_reviewPR-review tasks) across the 2026-05-31 / 2026-06-01 incident.agent_controller_vault_scanner_skipped_files_total{reason}with five pre-initialised label values (invalid_frontmatter,duplicate_frontmatter_invalid,empty_status,inject_task_identifier_failed,read_failed); increments exactly once per skipped file per scan cycle.glog.Warningf("skipping …")sites toglog.Errorfso existing log-scrape alert routing surfaces the event.read_failedandinject_task_identifier_failedremainWarningf(counter alone is sufficient — both are rare, both are operator-actionable via the metric).specs/in-progress/043-surface-vault-scanner-skip-failures.md. Doctrine advanced: [[Make Parked Agent Tasks Visible to Operator]]. Systematic-prevention half of [[Fix unresolved merge conflicts in OpenClaw vault]].Test plan
make precommitexits 0 (lint funlen suppressed with justification onRunandprocessFile)BRANCH=dev make buca— image SHAef52275a…running onagent-task-controller-0, startup log reportsversion=v0.63.35-19-g3508338 commit=3508338:devfromagent-devworktree to realign with master; then deploy to prod<<<<<<<markers in dev OpenClaw vault, observeinvalid_frontmattercounter increment +glog.Errorflog line, then remove