Commit 8cea571

unamedkr and claude committed
feat(cli): filter chat-template control tokens from print_token
The --chat flag wraps the prompt with chat templates (Gemma 4, Phi-3, Llama 3, ChatML). The model then sometimes emits those template tokens back into the output stream, which the CLI printed literally as garbage.

Now filters the same set as the server:

  <|think|> <think> </think> <|channel> <|turn> <turn|>
  <|end|> <|assistant|> <|user|> <|system|>
  <|im_end|> <|im_start|> <start_of_turn> <end_of_turn>
  <|begin_of_text|> <|end_of_text|>
  <|start_header_id|> <|end_header_id|> <|eot_id|>

Before: Gemma 4 E2B --chat "Calculate 2+2..." → garbage "ussererererer..."
After:  Gemma 4 E2B --chat "Calculate 2+2..." → "The number is **4**."

All 35 unit tests + 7 regression tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f140654 commit 8cea571

1 file changed

Lines changed: 19 additions & 1 deletion

tools/quant.c

@@ -62,9 +62,27 @@ static int clock_gettime(int id, struct timespec* ts) {
 /* Forward-pass profiling flag (defined in tq_transformer.c) */
 extern int g_tq_profile_enabled;
 
-/* Streaming token callback */
+/* Streaming token callback — filters chat-template control tokens that
+ * would otherwise leak into CLI output when --chat is active. */
 static void print_token(const char* text, void* user_data) {
     (void)user_data;
+    if (!text || !text[0]) return;
+
+    /* Skip thinking / template tokens (same list as server). Gemma 4 raw
+     * output contains <|think|>, Qwen3 uses <think>/</think>, chat-templated
+     * models may emit <|end|>/<|im_end|>/<|eot_id|> etc. */
+    if (strstr(text, "<|think|>") || strstr(text, "<think>") ||
+        strstr(text, "</think>") || strstr(text, "<|channel>") ||
+        strstr(text, "<|turn>") || strstr(text, "<turn|>") ||
+        strstr(text, "<|end|>") || strstr(text, "<|assistant|>") ||
+        strstr(text, "<|user|>") || strstr(text, "<|system|>") ||
+        strstr(text, "<|im_end|>") || strstr(text, "<|im_start|>") ||
+        strstr(text, "<start_of_turn>") || strstr(text, "<end_of_turn>") ||
+        strstr(text, "<|begin_of_text|>") || strstr(text, "<|end_of_text|>") ||
+        strstr(text, "<|start_header_id|>") || strstr(text, "<|end_header_id|>") ||
+        strstr(text, "<|eot_id|>"))
+        return;
+
     fputs(text, stdout);
     fflush(stdout);
 }
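
For review purposes, the same check can be written with a lookup table, which may make it easier to keep the CLI list in sync with the server's. This is only a sketch, not part of the commit: `k_control_tokens`, `contains_control_token`, and `print_token_sketch` are hypothetical names, and the token list is copied from the commit message above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Control tokens as listed in the commit message (hypothetical table name). */
static const char* const k_control_tokens[] = {
    "<|think|>", "<think>", "</think>", "<|channel>",
    "<|turn>", "<turn|>", "<|end|>", "<|assistant|>",
    "<|user|>", "<|system|>", "<|im_end|>", "<|im_start|>",
    "<start_of_turn>", "<end_of_turn>",
    "<|begin_of_text|>", "<|end_of_text|>",
    "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>",
};

/* Hypothetical helper: returns 1 if text contains any listed control token
 * as a substring, 0 otherwise. text must be non-NULL. */
static int contains_control_token(const char* text) {
    size_t n = sizeof(k_control_tokens) / sizeof(k_control_tokens[0]);
    for (size_t i = 0; i < n; i++) {
        if (strstr(text, k_control_tokens[i]) != NULL) return 1;
    }
    return 0;
}

/* Mirrors the patched callback: skip NULL/empty chunks and any chunk that
 * contains a control token; print everything else unbuffered. */
static void print_token_sketch(const char* text, void* user_data) {
    (void)user_data;
    if (!text || !text[0]) return;
    if (contains_control_token(text)) return;
    fputs(text, stdout);
    fflush(stdout);
}
```

As in the patch, the match is a substring test on the whole chunk, so a chunk that carries ordinary text alongside a control token is dropped entirely rather than trimmed.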
