Llama 3.2 support + thought token filtering + EOS handling
- Fix hybrid attention detection: restrict it to Gemma only (the heuristic was breaking Llama by misdetecting head_dim as 64 instead of 128 because of GQA kv_heads)
- Llama 3.2 3B Instruct: verified, 11.6 tok/s, correct code generation
- Filter Gemma 4 thinking tokens: thought, <channel|>, <tool|>, <mask>, <unused*>
- Add Llama 3 EOS tokens: 128001 (<|end_of_text|>), 128009 (<|eot_id|>)
- Clean output: "The capital of France is **Paris**." (no noise tokens)
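The GQA pitfall behind the detection fix can be sketched as follows. Config keys follow the Hugging Face `config.json` convention; the function names are illustrative, not the actual code in this repo:

```python
def uses_hybrid_attention(config: dict) -> bool:
    # Gate sliding-window/global hybrid attention on the model family.
    # A head_dim-based heuristic misfires on Llama, whose GQA layout
    # shrinks the k/v projection without changing the true head_dim.
    return config.get("model_type", "").startswith("gemma")

def head_dim(config: dict) -> int:
    # head_dim = hidden_size / number of *query* heads. Deriving it from
    # the k/v projection width effectively divides by num_key_value_heads
    # instead, under-reporting the value on GQA models.
    return config["hidden_size"] // config["num_attention_heads"]

# Llama 3.2 3B: hidden_size=3072, 24 query heads, 8 kv heads -> head_dim 128
```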
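A minimal sketch of the thinking-token filter. The token strings mirror the list above; a real build would read them from the tokenizer's special-token table rather than hard-coding them:

```python
def is_noise_token(token: str) -> bool:
    # Illustrative denylist; exact strings come from the Gemma tokenizer.
    fixed = {"<thought>", "<channel|>", "<tool|>", "<mask>"}
    return token in fixed or token.startswith("<unused")

def clean(tokens: list[str]) -> str:
    # Drop noise tokens before joining decoded pieces into output text.
    return "".join(t for t in tokens if not is_noise_token(t))
```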
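Stop-condition sketch for the added Llama 3 EOS ids (the ids are the ones listed above; `should_stop` is a hypothetical decode-loop check, not the repo's actual function):

```python
LLAMA3_EOS_IDS = {128001, 128009}  # <|end_of_text|>, <|eot_id|>

def should_stop(token_id: int) -> bool:
    # Instruct-tuned Llama 3 models emit <|eot_id|> at turn boundaries,
    # so both ids must terminate generation, not just <|end_of_text|>.
    return token_id in LLAMA3_EOS_IDS
```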
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>