Commit 8cea571
feat(cli): filter chat-template control tokens from print_token
The --chat flag wraps the prompt in a chat template (Gemma 4, Phi-3,
Llama 3, ChatML). The model sometimes echoes those template tokens
back into the output stream, and the CLI printed them literally as garbage.
print_token now filters the same token set as the server:
<|think|> <think> </think> <|channel> <|turn> <turn|>
<|end|> <|assistant|> <|user|> <|system|>
<|im_end|> <|im_start|>
<start_of_turn> <end_of_turn>
<|begin_of_text|> <|end_of_text|>
<|start_header_id|> <|end_header_id|> <|eot_id|>
Before: Gemma 4 E2B --chat "Calculate 2+2..." → garbage "ussererererer..."
After: Gemma 4 E2B --chat "Calculate 2+2..." → "The number is **4**."
All 35 unit tests + 7 regression tests pass.
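
The filtering described above can be illustrated with a minimal sketch. This is not the commit's actual code, which is not shown here; it is written in Python purely for illustration. The marker list is copied from the commit message, while `strip_control_tokens` and the `out` parameter are hypothetical names. It also assumes each marker arrives whole within a single decoded piece; a marker split across pieces would need buffering, which this sketch does not attempt.

```python
import sys

# Marker strings copied verbatim from the commit message.
CONTROL_TOKENS = (
    "<|think|>", "<think>", "</think>", "<|channel>", "<|turn>", "<turn|>",
    "<|end|>", "<|assistant|>", "<|user|>", "<|system|>",
    "<|im_end|>", "<|im_start|>",
    "<start_of_turn>", "<end_of_turn>",
    "<|begin_of_text|>", "<|end_of_text|>",
    "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>",
)


def strip_control_tokens(piece: str) -> str:
    """Return `piece` with every known template marker removed (hypothetical helper)."""
    for marker in CONTROL_TOKENS:
        piece = piece.replace(marker, "")
    return piece


def print_token(piece: str, out=sys.stdout) -> None:
    """Write one decoded piece to `out`, skipping pieces that are only markers."""
    visible = strip_control_tokens(piece)
    if visible:
        out.write(visible)
        out.flush()
```

Called once per decoded piece, `print_token("<start_of_turn>")` writes nothing, while ordinary text passes through unchanged.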
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f140654 commit 8cea571
1 file changed
Lines changed: 19 additions & 1 deletion
[Diff body not preserved. Per the line-number columns: original line 65 was replaced by new lines 65-66, and new lines 69-85 were added.]