
Commit 18eeed1

unamedkr and claude committed
bench: llama.cpp full KV type PPL comparison (q4_0 to q8_0)
SmolLM2 1.7B, 2K tokens:
f16: 2.83, q8_0: 2.82, q5_1: 2.86, q5_0: 2.85, q4_1: 2.92, q4_0: 3.13

Q5 types show <1% PPL loss. Q4_0 shows +10.6%.
This provides the baseline for comparison with TurboQuant approaches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3bc4ded commit 18eeed1

1 file changed

Lines changed: 11 additions & 0 deletions

bench/results/ppl_comparison.md

@@ -32,3 +32,14 @@ The KEY metric is the DELTA from each engine's own baseline.

 TurboQuant achieves 4x more compression on keys with zero PPL increase,
 while llama.cpp's Q4 KV shows measurable quality degradation.
+
+## llama.cpp Full KV Type Comparison (SmolLM2 1.7B, 2K tokens)
+
+| KV Type | PPL | Delta vs F16 | Bits/element |
+|---------|-----|-------------|--------------|
+| f16 (baseline) | 2.83 | - | 16 |
+| q8_0 | 2.82 | -0.4% | 8 |
+| q5_1 | 2.86 | +0.9% | 5 |
+| q5_0 | 2.85 | +0.6% | 5 |
+| q4_1 | 2.92 | +3.2% | 4 |
+| q4_0 | 3.13 | +10.6% | 4 |
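
As a cross-check on the table, the sketch below recomputes the "Delta vs F16" column from the rounded PPL values and estimates the storage cost per element for each KV type. Two things here are assumptions rather than part of the commit: the helper names (`delta_pct`, `effective_bits`) are hypothetical, and the block sizes assume ggml's usual 32-element block layouts (an fp16 scale, an optional fp16 minimum, and 4 extra bytes of high bits for the q5 types). Recomputing from the rounded PPLs gives roughly +1.1% for q5_1 and +0.7% for q5_0 rather than the +0.9% and +0.6% in the table, which suggests the committed deltas were derived from unrounded perplexities. The "Bits/element" column appears to list nominal quantization widths; under the assumed block layouts the stored size is about half a bit to one bit higher per element.

```python
# PPL values copied from the table (rounded to two decimals in the source).
PPL = {
    "f16": 2.83,
    "q8_0": 2.82,
    "q5_1": 2.86,
    "q5_0": 2.85,
    "q4_1": 2.92,
    "q4_0": 3.13,
}

# Assumed bytes per 32-element block for ggml's block formats:
# 2 bytes fp16 scale, +2 bytes fp16 min for *_1 types,
# +4 bytes of high bits for q5 types, plus the packed quants.
BLOCK_ELEMS = 32
BLOCK_BYTES = {
    "q8_0": 2 + 32,
    "q5_1": 2 + 2 + 4 + 16,
    "q5_0": 2 + 4 + 16,
    "q4_1": 2 + 2 + 16,
    "q4_0": 2 + 16,
}


def delta_pct(value: float, baseline: float) -> float:
    """Relative change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


def effective_bits(kv_type: str) -> float:
    """Stored bits per element, including per-block scale/min overhead."""
    if kv_type == "f16":
        return 16.0
    return BLOCK_BYTES[kv_type] * 8 / BLOCK_ELEMS


if __name__ == "__main__":
    baseline = PPL["f16"]
    for kv_type, ppl in PPL.items():
        print(
            f"{kv_type:5s} ppl={ppl:.2f} "
            f"delta={delta_pct(ppl, baseline):+.1f}% "
            f"bits/elem={effective_bits(kv_type):.1f}"
        )
```

Reporting the delta against effective bits rather than nominal bits makes the later comparison with TurboQuant fairer, since per-block metadata counts toward the real KV cache footprint.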
