Commit 8b83b0c


Llama architecture verified: SmolLM2 1.7B, 1-bit KV = PPL identical

4th architecture verified (Llama/SmolLM2, after Gemma 3, Qwen3.5, Qwen2-MoE).

SmolLM2 1.7B (Llama arch, GGUF Q8_0):
  baseline PPL: 5.8441
  1-bit K + FP16 V PPL: 5.8441 (+0.00%) ← exactly identical
  1-bit K + Q4 V PPL: 5.8233
  30-token output: byte-identical ✓
  Speed: 24 tok/s (Q4, 6T, M3)

Also fixed: 4B GGUF Q4 conversion (threshold 8→16 GB, DeltaNet detect)
  Qwen3.5-4B: 0.1 → 5.4 tok/s (54x improvement)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
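The relative figures quoted in the commit message can be rechecked from the raw numbers it reports. The snippet below is only a sanity-check sketch: the helper names `pct_change` and `speedup` are hypothetical and are not part of the repository or its tools.

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of `value` versus `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


def speedup(before: float, after: float) -> float:
    """Throughput ratio after/before (e.g. tok/s)."""
    return after / before


# Numbers copied from the commit message (SmolLM2 1.7B, GGUF Q8_0).
baseline_ppl = 5.8441
k1_fp16v_ppl = 5.8441  # 1-bit K + FP16 V
k1_q4v_ppl = 5.8233    # 1-bit K + Q4 V

print(f"1-bit K + FP16 V: {pct_change(baseline_ppl, k1_fp16v_ppl):+.2f}%")
print(f"1-bit K + Q4 V:   {pct_change(baseline_ppl, k1_q4v_ppl):+.2f}%")

# Qwen3.5-4B throughput before and after the conversion fix, in tok/s.
print(f"Qwen3.5-4B speedup: {speedup(0.1, 5.4):.0f}x")
```

With the reported values this gives +0.00% for 1-bit K + FP16 V, about -0.36% for 1-bit K + Q4 V, and a 54x speedup, consistent with the figures in the message.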

1 parent 6b2ce68 commit 8b83b0c

2 files changed

`tq_convert`

-192 KB

Binary file not shown.

`tq_run`

-295 KB

Binary file not shown.
