Commit 662f7eb
WBS 2.2: 1000-token PPL comparison — all KV types identical
SmolLM2 1.7B (Llama), 998 tokens:
uniform_4b: PPL = 12.07 (baseline)
turbo_kv_1b: PPL = 12.07 (+0.00%)
turbo_kv_3b: PPL = 12.07 (+0.00%)
1-bit K+Q4V: PPL = 12.38 (+2.6%)
K-only quantization is PPL-identical at 1000 tokens.
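The percentages in the table are relative PPL deltas against the `uniform_4b` baseline. A minimal sketch of the arithmetic (helper names are illustrative, not from the repo):

```python
import math

def perplexity(logprobs):
    # PPL = exp(-mean log p(token)); lower is better.
    return math.exp(-sum(logprobs) / len(logprobs))

def ppl_delta_pct(baseline_ppl, candidate_ppl):
    # Relative change vs. baseline, as reported above (+0.00%, +2.6%).
    return (candidate_ppl - baseline_ppl) / baseline_ppl * 100.0
```

Sanity check against the table: `ppl_delta_pct(12.07, 12.38)` gives about 2.57, which rounds to the reported +2.6%.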
llama.cpp fork: builds + runs, but 1-bit dequant quality insufficient
for standard attention (head_dim=64). Need head_dim≥128 models.
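For context, one common 1-bit scheme keeps a sign bit per element plus a single per-vector scale; this is an assumed layout for illustration only, not necessarily the fork's actual block format. Reconstruction keeps only direction signs, which is where the dequant quality loss comes from:

```python
import numpy as np

def quantize_1bit(v):
    # Assumed scheme: 1 sign bit per element + one fp scale per vector.
    # The fork's real K-cache block format may differ.
    scale = float(np.mean(np.abs(v)))
    signs = np.sign(v)
    signs[signs == 0] = 1.0  # avoid encoding zeros as a third state
    return signs.astype(np.int8), scale

def dequantize_1bit(signs, scale):
    return signs.astype(np.float32) * scale

# Hypothetical head_dim=64 key vector, as in the commit message.
rng = np.random.default_rng(0)
k = rng.standard_normal(64).astype(np.float32)
signs, scale = quantize_1bit(k)
k_hat = dequantize_1bit(signs, scale)
```

For Gaussian-like keys, the dequantized vector preserves rough direction (cosine similarity well below 1), so attention dot products are noticeably perturbed; more dimensions give more averaging, consistent with the note that larger head_dim models are needed.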
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Parent: 9b31d94
1 file changed: 5 additions, 2 deletions