|
| 1 | +# Code Review Report |
| 2 | + |
| 3 | +## Metadata |
| 4 | +- **Project**: git-notes-memory-manager |
| 5 | +- **Review Date**: 2025-12-19 |
| 6 | +- **Reviewer**: Claude Code Review Agent (6 specialist subagents) |
| 7 | +- **Scope**: Hook Enhancement v2 implementation (`src/git_notes_memory/hooks/`, `hooks/`, tests) |
| 8 | +- **Commit**: feature/hook-enhancement-v2 |
| 9 | + |
| 10 | +## Executive Summary |
| 11 | + |
| 12 | +### Overall Health Score: 7.5/10 |
| 13 | + |
| 14 | +| Dimension | Score | Critical | High | Medium | Low | |
| 15 | +|-----------|-------|----------|------|--------|-----| |
| 16 | +| Security | 8/10 | 0 | 0 | 1 | 5 | |
| 17 | +| Performance | 6/10 | 1 | 2 | 5 | 4 | |
| 18 | +| Architecture | 7/10 | 0 | 0 | 4 | 6 | |
| 19 | +| Code Quality | 7/10 | 0 | 1 | 4 | 5 | |
| 20 | +| Test Coverage | 8/10 | 0 | 1 | 8 | 6 | |
| 21 | +| Documentation | 6/10 | 0 | 3 | 4 | 3 | |
| 22 | + |
| 23 | +### Key Findings |
| 24 | + |
| 25 | +1. **CRITICAL (Performance)**: `DomainExtractor` is instantiated on every call in hot path - creates garbage collection pressure on every file operation |
| 26 | +2. **HIGH (Performance)**: `batch_check_novelty()` doesn't actually batch - runs N sequential queries instead of batched |
| 27 | +3. **HIGH (Code Quality)**: 200+ lines of duplicated utility functions across 5 handler files |
| 28 | +4. **HIGH (Documentation)**: PostToolUse and PreCompact hooks not documented in README or USER_GUIDE |
| 29 | +5. **MEDIUM (Security)**: Path traversal risk in `session_analyzer.py` - transcript path not validated |
| 30 | + |
| 31 | +### Recommended Action Plan |
| 32 | + |
| 33 | +1. **Immediate** (before merge): |
| 34 | + - Fix DomainExtractor singleton pattern (CRITICAL performance) |
| 35 | + - Add input size limits to JSON parsing (MEDIUM security) |
| 36 | + |
| 37 | +2. **This Sprint**: |
| 38 | + - Extract common handler utilities to shared module (HIGH code quality) |
| 39 | + - Update README and USER_GUIDE with new hooks (HIGH documentation) |
| 40 | + - Add path validation to session_analyzer.py (MEDIUM security) |
| 41 | + |
| 42 | +3. **Next Sprint**: |
| 43 | + - Implement true batching in NoveltyChecker |
| 44 | + - Consolidate configuration sources |
| 45 | + - Add missing test coverage for PostToolUse config |
| 46 | + |
| 47 | +4. **Backlog**: |
| 48 | + - Refactor config_loader.py to use declarative schema |
| 49 | + - Add async-signal-safe timeout handlers |
| 50 | + - Document hook models in DEVELOPER_GUIDE |
| 51 | + |
| 52 | +--- |
| 53 | + |
| 54 | +## Critical Findings (🔴) |
| 55 | + |
| 56 | +### PERF-1: DomainExtractor Instantiated Per Call |
| 57 | + |
| 58 | +**Location**: `src/git_notes_memory/hooks/domain_extractor.py:259-260` |
| 59 | + |
| 60 | +**Description**: The convenience function `extract_domain_terms()` creates a new `DomainExtractor` instance on every invocation. Since `PostToolUse` hook fires on every file Read/Write/Edit operation, this is extremely hot. |
| 61 | + |
| 62 | +**Impact**: CPU overhead, garbage collection pressure, adds ~0.5-1ms per call |
| 63 | + |
| 64 | +**Evidence**: |
| 65 | +```python |
| 66 | +def extract_domain_terms(file_path: str) -> list[str]: |
| 67 | + extractor = DomainExtractor() # Created every call |
| 68 | + return extractor.extract(file_path) |
| 69 | +``` |
| 70 | + |
| 71 | +**Remediation**: |
| 72 | +```python |
| 73 | +# Module-level singleton |
| 74 | +_default_extractor: DomainExtractor | None = None |
| 75 | + |
| 76 | +def extract_domain_terms(file_path: str) -> list[str]: |
| 77 | + global _default_extractor |
| 78 | + if _default_extractor is None: |
| 79 | + _default_extractor = DomainExtractor() |
| 80 | + return _default_extractor.extract(file_path) |
| 81 | +``` |
| 82 | + |
| 83 | +--- |
| 84 | + |
| 85 | +## High Priority Findings (🟠) |
| 86 | + |
| 87 | +### PERF-2: batch_check_novelty Does Not Batch Embeddings |
| 88 | + |
| 89 | +**Location**: `src/git_notes_memory/hooks/novelty_checker.py:253-277` |
| 90 | + |
| 91 | +**Description**: Each signal triggers a separate embedding + search operation. For N signals, this is O(N) database queries when batching could reduce to O(1). |
| 92 | + |
| 93 | +**Impact**: Latency multiplied by number of signals |
| 94 | + |
| 95 | +**Evidence**: |
| 96 | +```python |
| 97 | +def batch_check_novelty(self, signals: list[CaptureSignal]) -> list[NoveltyResult]: |
| 98 | + return [self.check_signal_novelty(signal) for signal in signals] |
| 99 | +``` |
| 100 | + |
| 101 | +**Remediation**: Implement batch embedding and search API. |
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +### QUAL-1: Duplicated Utility Functions Across Handlers |
| 106 | + |
| 107 | +**Location**: |
| 108 | +- `src/git_notes_memory/hooks/pre_compact_handler.py:54-112` |
| 109 | +- `src/git_notes_memory/hooks/post_tool_use_handler.py:57-115` |
| 110 | +- `src/git_notes_memory/hooks/stop_handler.py:52-108` |
| 111 | +- `src/git_notes_memory/hooks/session_start_handler.py:48-104` |
| 112 | +- `src/git_notes_memory/hooks/user_prompt_handler.py:56-114` |
| 113 | + |
| 114 | +**Description**: Four utility functions are copied verbatim across 5 handler files: `_setup_logging()`, `_setup_timeout()`, `_cancel_timeout()`, `_read_input()` |
| 115 | + |
| 116 | +**Impact**: 200+ lines of duplicated code. Bug fixes must be applied 5 times. |
| 117 | + |
| 118 | +**Remediation**: Extract to shared `hook_utils.py` module. |
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +### DOC-1: PostToolUse and PreCompact Missing from Documentation |
| 123 | + |
| 124 | +**Location**: |
| 125 | +- `README.md` (lists only 3 hooks) |
| 126 | +- `docs/USER_GUIDE.md` (missing hook sections and config variables) |
| 127 | + |
| 128 | +**Description**: The README and USER_GUIDE only document SessionStart, UserPromptSubmit, and Stop hooks. PostToolUse and PreCompact are implemented but undocumented. |
| 129 | + |
| 130 | +**Impact**: Users cannot discover or configure the new hooks. |
| 131 | + |
| 132 | +**Remediation**: Add comprehensive documentation for both hooks including all configuration options. |
| 133 | + |
| 134 | +--- |
| 135 | + |
| 136 | +## Medium Priority Findings (🟡) |
| 137 | + |
| 138 | +### SEC-1: Path Traversal in session_analyzer.py |
| 139 | + |
| 140 | +**Location**: `src/git_notes_memory/hooks/session_analyzer.py:145-152` |
| 141 | + |
| 142 | +**Description**: The `parse_transcript` method accepts a file path without validation. An attacker controlling hook input could read arbitrary files. |
| 143 | + |
| 144 | +**Remediation**: Validate transcript path is within expected directory. |
| 145 | + |
| 146 | +--- |
| 147 | + |
| 148 | +### PERF-3: Service Instantiation Per NoveltyChecker |
| 149 | + |
| 150 | +**Location**: `src/git_notes_memory/hooks/novelty_checker.py:99-113` |
| 151 | + |
| 152 | +**Description**: Each `NoveltyChecker` instance may create new service instances. |
| 153 | + |
| 154 | +**Remediation**: Use the existing singleton services directly. |
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +### ARCH-1: Missing Abstraction for Hook Response Contract |
| 159 | + |
| 160 | +**Location**: Multiple handler files |
| 161 | + |
| 162 | +**Description**: Each handler manually constructs JSON responses with keys like `"hookSpecificOutput"`. No shared abstraction. |
| 163 | + |
| 164 | +**Remediation**: Create `HookResponse` dataclass. |
| 165 | + |
| 166 | +--- |
| 167 | + |
| 168 | +### QUAL-2: Duplicated VALID_NAMESPACES Constant |
| 169 | + |
| 170 | +**Location**: |
| 171 | +- `src/git_notes_memory/hooks/guidance_builder.py:119` |
| 172 | +- `src/git_notes_memory/hooks/namespace_parser.py:30` |
| 173 | + |
| 174 | +**Description**: Same 10-namespace list defined in two places. |
| 175 | + |
| 176 | +**Remediation**: Create shared `constants.py`. |
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +### QUAL-3: Complex load_hook_config() Function |
| 181 | + |
| 182 | +**Location**: `src/git_notes_memory/hooks/config_loader.py:272-418` |
| 183 | + |
| 184 | +**Description**: 146 lines with 30+ if statements. High cyclomatic complexity. |
| 185 | + |
| 186 | +**Remediation**: Use declarative config schema. |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +### TEST-1: Missing Tests for PostToolUse Configuration |
| 191 | + |
| 192 | +**Location**: `tests/test_hooks.py` |
| 193 | + |
| 194 | +**Description**: PostToolUse settings (`HOOK_POST_TOOL_USE_ENABLED`, `HOOK_POST_TOOL_USE_MIN_SIMILARITY`, etc.) not tested. |
| 195 | + |
| 196 | +**Remediation**: Add tests for all PostToolUse environment variables. |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 200 | +### DOC-2: SessionStart Guidance Config Not Documented |
| 201 | + |
| 202 | +**Location**: `docs/USER_GUIDE.md` |
| 203 | + |
| 204 | +**Description**: `HOOK_SESSION_START_INCLUDE_GUIDANCE` and `HOOK_SESSION_START_GUIDANCE_DETAIL` not documented. |
| 205 | + |
| 206 | +**Remediation**: Add guidance configuration section. |
| 207 | + |
| 208 | +--- |
| 209 | + |
| 210 | +## Low Priority Findings (🟢) |
| 211 | + |
| 212 | +### SEC-2: ReDoS Vulnerability in Signal Patterns |
| 213 | +**Location**: `src/git_notes_memory/hooks/signal_detector.py:67` |
| 214 | +**Description**: Pattern `r"(?i)\bcan('t| not)\s+.{1,30}\s+because\b"` could cause backtracking. |
| 215 | +**Mitigation**: Hook timeout limits impact. |
| 216 | + |
| 217 | +### SEC-3: Signal Handler Safety |
| 218 | +**Location**: `src/git_notes_memory/hooks/pre_compact_handler.py:75-80` |
| 219 | +**Description**: Timeout handlers call `print()` and `json.dumps()` which are not async-signal-safe. |
| 220 | + |
| 221 | +### SEC-4: JSON Input Size Not Limited |
| 222 | +**Location**: `src/git_notes_memory/hooks/pre_compact_handler.py:94-112` |
| 223 | +**Description**: `sys.stdin.read()` has no size limit. |
| 224 | + |
| 225 | +### PERF-4: Reinforcers List Created Per Match |
| 226 | +**Location**: `src/git_notes_memory/hooks/signal_detector.py:341-343` |
| 227 | +**Description**: List recreated on every call. |
| 228 | + |
| 229 | +### PERF-5: Full Transcript Loaded Into Memory |
| 230 | +**Location**: `src/git_notes_memory/hooks/session_analyzer.py:151-152` |
| 231 | +**Description**: No size limit check before reading. |
| 232 | + |
| 233 | +### QUAL-4: Dead Code in _extract_summary() |
| 234 | +**Location**: `src/git_notes_memory/hooks/pre_compact_handler.py:132-135` |
| 235 | +**Description**: Prefix stripping loop breaks without modifying text. |
| 236 | + |
| 237 | +### QUAL-5: Magic Numbers |
| 238 | +**Location**: Various handler files |
| 239 | +**Description**: Values like 20, 100, 97, 50 used without named constants. |
| 240 | + |
| 241 | +### ARCH-2: Inconsistent Dependency Injection |
| 242 | +**Location**: Multiple service classes |
| 243 | +**Description**: Mix of constructor injection and lazy loading. |
| 244 | + |
| 245 | +### ARCH-3: Entry Points Swallow Exceptions Silently |
| 246 | +**Location**: `hooks/session_start.py:36-40` |
| 247 | +**Description**: Exceptions caught and exit(0) called silently. |
| 248 | + |
| 249 | +### TEST-2: Missing Timeout Function Tests |
| 250 | +**Location**: `tests/test_pre_compact_handler.py`, `tests/test_post_tool_use_handler.py` |
| 251 | +**Description**: `_setup_timeout()` and `_cancel_timeout()` not tested. |
| 252 | + |
| 253 | +--- |
| 254 | + |
| 255 | +## Positive Findings |
| 256 | + |
| 257 | +1. **Security**: Git command injection properly prevented - no `shell=True`, refs validated |
| 258 | +2. **Security**: XML injection prevented via ElementTree escaping |
| 259 | +3. **Architecture**: Frozen dataclasses used consistently for immutability |
| 260 | +4. **Architecture**: Clean separation of concerns across service classes |
| 261 | +5. **Architecture**: Lazy loading in `__init__.py` prevents slow imports |
| 262 | +6. **Code Quality**: Consistent type hints throughout |
| 263 | +7. **Test Coverage**: Overall test suite is comprehensive (1276 tests, 87% coverage) |
| 264 | + |
| 265 | +--- |
| 266 | + |
| 267 | +## Appendix |
| 268 | + |
| 269 | +### Files Reviewed |
| 270 | +- `src/git_notes_memory/hooks/*.py` (17 files) |
| 271 | +- `hooks/*.py` (7 files) |
| 272 | +- `tests/test_*.py` (28 files) |
| 273 | +- `docs/*.md` (4 files) |
| 274 | +- `README.md` |
| 275 | + |
| 276 | +### Specialist Agents Used |
| 277 | +1. Security Analyst (security-engineer) |
| 278 | +2. Performance Engineer (performance-engineer) |
| 279 | +3. Architecture Reviewer (architect-reviewer) |
| 280 | +4. Code Quality Analyst (code-reviewer) |
| 281 | +5. Test Coverage Analyst (test-automator) |
| 282 | +6. Documentation Reviewer (documentation-engineer) |
0 commit comments