feat(sanitize): expand injection patterns 10→56, strip zero-width chars before match by tzone85 · Pull Request #74 · tzone85/vortex-dispatch

tzone85 · 2026-06-12T14:18:12Z

Summary

Closes the last open follow-up from the bulletproof certification pass: `internal/sanitize.DetectPromptInjection` was previously a 10-entry substring blocklist easily bypassed with zero-width characters spliced into the payload.

| | Before | After |
|---|---|---|
| Substring patterns | 10 | 56 |
| Attack families | 4 | 9 |
| Zero-width normalisation | none | strips U+00AD, U+200B-U+200F, U+202A-U+202E, U+2060-U+206F, U+FEFF before matching |
| Pattern-match attribution | none | new `MatchInjectionPattern` returns the canonical pattern that fired |

What changed

`internal/sanitize/sanitize.go`:

56 patterns grouped by attack family with rationale comments: override/disregard, role/identity coercion, authority spoofing, output coercion, memory poisoning, action coercion, exfiltration, jailbreak labels (DAN / developer / jailbreak mode), and chat-template tags (`<|system|>`, `<|im_start/end|>`, `<|user|>`, `<|assistant|>`, `[INST]`/`[/INST]`, `<<SYS>>`/`<</SYS>>`).
`zeroWidthRe` strips zero-width and bidi-override characters before substring matching. Defeats the `ig<ZWSP>nore previous instructions` bypass that the audit flagged as a known weakness. Built with regex `\x{...}` escapes so the source file stays pure ASCII (embedded invisibles silently break diffs, and Go rejects U+FEFF in source).
`normaliseForInjectionMatch`: lowers, strips invisibles, collapses whitespace runs. Result is fed only to the matcher, never used for content storage.
`MatchInjectionPattern` returns the canonical pattern that fired (or `""` on no match) so callers can log which family triggered for post-mortems.
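
A minimal sketch of how these pieces could fit together, for orientation only. The function names and the code-point ranges come from this PR; the three-entry pattern list, the use of `strings.Fields` for whitespace collapsing, and the function bodies are assumptions and not the merged code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// zeroWidthRe covers the code points listed in the summary table. The \x{...}
// escapes keep this source file pure ASCII.
var zeroWidthRe = regexp.MustCompile(
	`[\x{00AD}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}-\x{206F}\x{FEFF}]`)

// injectionPatterns is a tiny illustrative subset (assumption), not the real
// 56-entry list. Entries are stored lowercase because matching happens on
// lowercased input.
var injectionPatterns = []string{
	"ignore previous instructions",
	"new instructions",
	"<|im_start|>",
}

// normaliseForInjectionMatch lowers, strips invisibles, and collapses
// whitespace runs. The result is only for matching, never for storage.
func normaliseForInjectionMatch(s string) string {
	s = strings.ToLower(s)
	s = zeroWidthRe.ReplaceAllString(s, "")
	return strings.Join(strings.Fields(s), " ")
}

// MatchInjectionPattern returns the canonical pattern that fired, or "".
func MatchInjectionPattern(s string) string {
	n := normaliseForInjectionMatch(s)
	for _, p := range injectionPatterns {
		if strings.Contains(n, p) {
			return p
		}
	}
	return ""
}

// DetectPromptInjection reports whether any pattern matched.
func DetectPromptInjection(s string) bool {
	return MatchInjectionPattern(s) != ""
}

func main() {
	bypass := "Ig\u200bnore\tprevious\n instructions" // ZWSP, tab, newline
	fmt.Printf("%q\n", MatchInjectionPattern(bypass))
	fmt.Printf("%q\n", MatchInjectionPattern("add a new endpoint"))
}
```

In this sketch the first call returns the canonical ASCII pattern even though the input carried a ZWSP and mixed whitespace, and the benign "new endpoint" text returns the empty string.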

`internal/sanitize/sanitize_test.go`:

26 positive cases driving every attack family (one `t.Run` per case).
6 negative cases (benign developer text + the "new endpoint" trap).
6 zero-width-bypass cases pinning the Unicode normalisation guard.
3 whitespace-collapse cases (tabs, newlines, doubled spaces).
3 `MatchInjectionPattern` tests including `HonoursUnicodeNormalisation` verifying the returned pattern is the canonical ASCII form, not the munged input.
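
The case categories above lend themselves to a table-driven layout. The sketch below is a self-contained stand-in, not the contents of `sanitize_test.go`: the matcher, the case names, and the inputs are all assumptions chosen to mirror the categories. In a real `_test.go` file each row would be wrapped in its own `t.Run(tc.name, ...)` so a failure points at the specific phrase.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Stand-in matcher so the sketch compiles on its own (assumption); the real
// matcher lives in internal/sanitize and carries 56 patterns.
var invisibles = regexp.MustCompile(
	`[\x{00AD}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}-\x{206F}\x{FEFF}]`)

var patterns = []string{"ignore previous instructions", "new instructions"}

func matchInjectionPattern(s string) string {
	s = invisibles.ReplaceAllString(strings.ToLower(s), "")
	s = strings.Join(strings.Fields(s), " ")
	for _, p := range patterns {
		if strings.Contains(s, p) {
			return p
		}
	}
	return ""
}

// injectionCase is one table row; want is the canonical pattern expected,
// or "" for benign text.
type injectionCase struct {
	name  string
	input string
	want  string
}

var cases = []injectionCase{
	{"positive/override", "Please ignore previous instructions.", "ignore previous instructions"},
	{"negative/new_endpoint", "Add a new endpoint for health checks", ""},
	{"zero_width/zwsp", "ig\u200bnore previous instructions", "ignore previous instructions"},
	{"zero_width/bom_and_word_joiner", "\ufeffnew instruc\u2060tions follow", "new instructions"},
	{"whitespace/tab_newline", "ignore\tprevious\n\ninstructions", "ignore previous instructions"},
}

// failures evaluates every row and describes each mismatch.
func failures() []string {
	var out []string
	for _, tc := range cases {
		if got := matchInjectionPattern(tc.input); got != tc.want {
			out = append(out, fmt.Sprintf("%s: got %q, want %q", tc.name, got, tc.want))
		}
	}
	return out
}

func main() {
	fmt.Printf("%d cases, %d failures\n", len(cases), len(failures()))
}
```

Asserting on the returned pattern rather than a boolean covers the `HonoursUnicodeNormalisation` idea in the same table: the expected value is always the canonical ASCII form, whatever the input looked like.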

Why this is still defence-in-depth, not the primary defence

The audit verdict is unchanged: a sufficiently motivated attacker can bypass any substring blocklist via base64 directives, lookalike Unicode (Cyrillic 'е' for Latin 'e'), or non-English variants. The durable defence is the structural `<untrusted_content>` framing applied by `analyzer.Triage` and `implementer.Implement`. A positive hit here remains a strong signal worth aborting on; an absent hit must NOT be read as "safe".
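
To make the limits concrete, here is a small sketch of a caller that aborts on a hit but still frames the content when nothing matches. Everything in it is hypothetical: `frameUntrusted`, `buildPrompt`, and the one-pattern matcher are stand-ins, and only the `<untrusted_content>` tag name is taken from this PR. It is not how `analyzer.Triage` or `implementer.Implement` are written.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

var errInjection = errors.New("prompt injection pattern matched")

// matchInjectionPattern is a one-pattern stand-in for the real matcher.
func matchInjectionPattern(s string) string {
	const p = "ignore previous instructions"
	if strings.Contains(strings.ToLower(s), p) {
		return p
	}
	return ""
}

// frameUntrusted is a hypothetical helper; the exact layout is assumed.
// A real implementation would also have to handle content that itself
// contains the closing tag.
func frameUntrusted(s string) string {
	return "<untrusted_content>\n" + s + "\n</untrusted_content>"
}

// buildPrompt aborts on a hit. On a miss it still frames the content,
// because an absent hit is not evidence that the content is safe.
func buildPrompt(content string) (string, error) {
	if p := matchInjectionPattern(content); p != "" {
		return "", fmt.Errorf("%w: %q", errInjection, p)
	}
	return frameUntrusted(content), nil
}

func main() {
	plain := "ignore previous instructions"
	homoglyph := "ignor\u0435 previous instructions" // Cyrillic U+0435 for Latin 'e'
	encoded := base64.StdEncoding.EncodeToString([]byte(plain))

	for _, in := range []string{plain, homoglyph, encoded} {
		_, err := buildPrompt(in)
		fmt.Printf("blocked=%v\n", err != nil)
	}
}
```

Only the plain payload is blocked here; the homoglyph and base64 variants pass the blocklist untouched, which is why the framing has to be applied unconditionally.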

Test plan

`go build ./...` clean
`go vet ./...` clean
`go test ./... -count=1` — all 30 packages pass
`golangci-lint run --timeout=5m ./...` — 0 issues
CLAUDE.md item 55 added with the full pattern-family list
CLAUDE.md "Still open" section trimmed

…rs before match

Background

The bulletproof certification audit treated the heuristic substring blocklist in DetectPromptInjection as a 1% defence; the durable mitigation is the structural <untrusted_content> framing applied by analyzer.Triage and implementer.Implement. The list was nevertheless tracked as "pattern expansion deprioritized but worth doing". This PR closes that follow-up.

What changed

internal/sanitize/sanitize.go:

- 10 → 56 patterns grouped by attack family with rationale comments: override/disregard, role/identity coercion, authority spoofing, output coercion, memory poisoning, action coercion, exfiltration, jailbreak labels (DAN/developer/jailbreak mode), and chat-template tags (<|system|>, <|im_start/end|>, <|user|>, <|assistant|>, [INST]/[/INST], <<SYS>>/<</SYS>>).
- New zeroWidthRe strips ZWSP, ZWNJ, ZWJ, LRM/RLM, LRE/RLE/PDF/LRO/RLO, the word joiner range, soft hyphen, and BOM before substring matching. Defeats the "ig<ZWSP>nore previous instructions" bypass that the audit flagged as a known weakness. Built with regex \x{...} escapes so the source file stays pure ASCII: embedded invisibles silently break diffs, and Go rejects U+FEFF in source.
- New normaliseForInjectionMatch helper: lowers, strips invisibles, collapses whitespace runs. Result is fed only to the matcher, never used for content storage.
- New MatchInjectionPattern returns the canonical pattern that fired (or "" on no match) so callers can log which family triggered for post-mortems.

internal/sanitize/sanitize_test.go:

- 26 positive cases driving every attack family (one t.Run per case so failures point at the specific phrase).
- 6 negative cases covering benign developer text (refactor, bug fix, README, dep bump) and the "new endpoint" trap (the word "new" is fine; "new instructions" is the bad phrase).
- 6 zero-width-bypass cases pinning the Unicode normalisation guard.
- 3 whitespace-collapse cases (tabs, newlines, doubled spaces).
- 3 MatchInjectionPattern tests including the HonoursUnicodeNormalisation case verifying the returned pattern is the canonical ASCII form, not whatever munged form the input carried.

Verified

go build ./..., go vet ./..., go test ./... -count=1: all 30 packages pass. golangci-lint run --timeout=5m ./...: 0 issues.

tzone85 merged commit a2faa26 into main Jun 12, 2026
5 checks passed

tzone85 merged 1 commit into main from feat/sanitize-patterns
