Commit 8b83b0c
Llama architecture verified: SmolLM2 1.7B, 1-bit K (FP16 V) = identical PPL
4th architecture verified (Llama/SmolLM2, after Gemma 3, Qwen3.5, Qwen2-MoE).
SmolLM2 1.7B (Llama arch, GGUF Q8_0):
baseline PPL: 5.8441
1-bit K + FP16 V PPL: 5.8441 (+0.00%) ← exactly identical
1-bit K + Q4 V PPL: 5.8233 (-0.36%)
30-token output: byte-identical ✓
Speed: 24 tok/s (Q4, 6T, M3)
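
The percentage next to each PPL figure is a relative change against the baseline. A minimal sketch of that arithmetic follows, assuming perplexity is the usual exp of the mean negative log-likelihood per token; the helper names `perplexity` and `relative_change_pct` are illustrative and not taken from this repository.

```python
import math


def perplexity(token_logprobs):
    """Perplexity as exp(mean negative log-likelihood), natural-log inputs."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_change_pct(baseline_ppl, variant_ppl):
    """Signed percentage change of a variant's PPL relative to the baseline."""
    return (variant_ppl - baseline_ppl) / baseline_ppl * 100.0


# PPL figures copied from the results above (SmolLM2 1.7B, GGUF Q8_0).
baseline = 5.8441
k1_fp16v = 5.8441  # 1-bit K + FP16 V
k1_q4v = 5.8233    # 1-bit K + Q4 V

print(f"1-bit K + FP16 V: {relative_change_pct(baseline, k1_fp16v):+.2f}%")
print(f"1-bit K + Q4 V:   {relative_change_pct(baseline, k1_q4v):+.2f}%")
```

A negative value means the variant scored a lower PPL than the baseline on this run.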
Also fixed: 4B GGUF Q4 conversion (threshold 8→16 GB, DeltaNet detection)
Qwen3.5-4B: 0.1 → 5.4 tok/s (54x improvement)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parent: 6b2ce68
2 files changed