Commit 04acadb

unamedkr and claude committed
Verified results: 4-bit K + Q4 V = 3.8x real compression, PPL < 1%
Tested across 3 models (SmolLM2 1.7B, Qwen 0.8B, Qwen 4B), all using the REAL dequant path (no FP32 fallback). Sweet spot: uniform_4b K + Q4 V = 3.8x compression, PPL ±1%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1739176 commit 04acadb

1 file changed

Lines changed: 34 additions & 0 deletions

@@ -0,0 +1,34 @@
# REAL KV Compression Results (FP32 key cache eliminated)

All measurements use the REAL dequant path, with no FP32 fallback.
Keys are stored ONLY in the quantized cache. Attention dequantizes them per-query.
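
To make "stored only in quantized form, dequantized per-query" concrete, here is a minimal sketch in plain Python. It assumes a simple per-vector min/max 4-bit scheme; the actual `uniform_4b` block layout is not shown in this file, and the helper names (`quantize_4b`, `dequantize_4b`, `attention_scores`) are hypothetical.

```python
import math
import random

LEVELS = 15  # 4 bits give integer codes 0..15


def quantize_4b(vec):
    """Quantize one key vector to 4-bit codes plus a per-vector offset and scale.

    Assumption: min/max scaling per vector; the real uniform_4b format may differ.
    """
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize_4b(codes, lo, scale):
    """Reconstruct approximate float values from 4-bit codes."""
    return [lo + c * scale for c in codes]


def attention_scores(query, quantized_keys):
    """softmax(q . k / sqrt(d)) with keys dequantized on the fly.

    No float copy of the keys is kept: each key is rebuilt from its codes
    only for the duration of one dot product.
    """
    d = len(query)
    logits = []
    for codes, lo, scale in quantized_keys:
        key = dequantize_4b(codes, lo, scale)
        logits.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(d))
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
head_dim = 64
keys = [[rng.gauss(0.0, 1.0) for _ in range(head_dim)] for _ in range(8)]
query = [rng.gauss(0.0, 1.0) for _ in range(head_dim)]

cache = [quantize_4b(k) for k in keys]  # the only stored form of the keys
scores = attention_scores(query, cache)
```

In this sketch the reconstruction error per element is bounded by half a quantization step (`scale / 2`), which is the noise that the perplexity numbers below measure end to end.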
## uniform_4b K + Q4 V = 3.8x compression, PPL increase < 1%

| Model | Params | Baseline PPL | K4+VQ4 PPL | Delta | Tokens |
|-------|--------|-------------|-----------|-------|--------|
| SmolLM2 1.7B | 1.7B | 9.51 | 9.36 | **-1.6%** | 814 |
| Qwen3.5 0.8B | 752M | 153.6 | 155.1 | **+0.9%** | 810 |
| Qwen3.5 4B | 4B | 19.63 | 19.75 | **+0.6%** | 810 |
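
The Delta column is the relative change in perplexity against the baseline. A small sketch of that arithmetic, using the rounded PPL values from the table (the function name `ppl_delta_pct` is hypothetical):

```python
def ppl_delta_pct(baseline_ppl, quantized_ppl):
    """Relative perplexity change in percent; negative means PPL went down."""
    return 100.0 * (quantized_ppl - baseline_ppl) / baseline_ppl


# (baseline PPL, K4+VQ4 PPL) as printed in the table above
results = {
    "SmolLM2 1.7B": (9.51, 9.36),
    "Qwen3.5 0.8B": (153.6, 155.1),
    "Qwen3.5 4B": (19.63, 19.75),
}

deltas = {model: ppl_delta_pct(base, quant) for model, (base, quant) in results.items()}
for model, delta in deltas.items():
    print(f"{model}: {delta:+.2f}%")
```

Recomputing from the rounded values gives about -1.58%, +0.98% and +0.61%. The Qwen3.5 0.8B figure differs slightly from the +0.9% in the table, presumably because the table was computed from unrounded perplexities.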
## All KV configs tested (SmolLM2 1.7B)

| Config | PPL | Delta | K+V Memory (32K tokens) | Compression |
|--------|-----|-------|-------------------------|-------------|
| FP16 K+V | 9.51 | baseline | 6.44 GB | 1.0x |
| uniform_4b K + FP16 V | 9.51 | +0.0% | 4.03 GB | 1.6x |
| **uniform_4b K + Q4 V** | **9.36** | **-1.6%** | **1.71 GB** | **3.8x** |
| uniform_4b K + Q2 V | 12.95 | +36% | 1.41 GB | 4.6x |
| turbo_kv_4b K + FP16 V | 10.07 | +5.9% | ~4 GB | ~1.6x |
| turbo_kv_3b K + FP16 V | 22.45 | +136% | ~3.8 GB | ~1.7x |
| turbo_kv_1b K + FP16 V | 1294.8 | catastrophic | ~3.5 GB | ~1.8x |
| uniform_2b K + FP16 V | 1618.6 | catastrophic | ~3.3 GB | ~2.0x |
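
The memory and compression columns can be sanity-checked with a back-of-the-envelope calculation. The sketch below rests on assumptions that are not stated in this file: 24 layers and 32 KV heads for SmolLM2 1.7B (only head_dim=64 appears in the findings), a 32,768-token context, decimal gigabytes, 4.0 effective bits per element for `uniform_4b` K, and 4.5 effective bits for Q4 V (4-bit codes plus one FP16 scale per 32 values). The function name `kv_cache_gb` is hypothetical.

```python
def kv_cache_gb(tokens, layers, kv_heads, head_dim, k_bits, v_bits):
    """Size of the K and V caches in decimal GB for given bits per element."""
    elems_per_tensor = tokens * layers * kv_heads * head_dim  # K and V each
    total_bits = elems_per_tensor * (k_bits + v_bits)
    return total_bits / 8 / 1e9


# Assumed SmolLM2 1.7B shape; only head_dim=64 is confirmed by this document.
SHAPE = dict(tokens=32 * 1024, layers=24, kv_heads=32, head_dim=64)

fp16 = kv_cache_gb(**SHAPE, k_bits=16, v_bits=16)
k4_fp16 = kv_cache_gb(**SHAPE, k_bits=4, v_bits=16)
k4_q4 = kv_cache_gb(**SHAPE, k_bits=4, v_bits=4.5)  # 4.5 = assumed Q4 overhead

for name, size in [("FP16 K+V", fp16), ("K4 + FP16 V", k4_fp16), ("K4 + Q4 V", k4_q4)]:
    print(f"{name}: {size:.2f} GB, {fp16 / size:.1f}x")
```

Under these assumptions the sketch reproduces the first three rows of the table (6.44 GB, 4.03 GB and 1.71 GB, with ratios of 1.0x, 1.6x and 3.8x). That agreement supports the assumed shape but does not confirm the exact on-disk formats.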
## Key findings

1. **4-bit K is effectively lossless.** uniform_4b K with FP16 V leaves PPL unchanged on SmolLM2 1.7B (9.51 vs. 9.51, +0.0%).
2. **Q4 V adds minimal noise.** Combined K4+VQ4 is within ±2% of baseline on all three models (-1.6% to +0.9%).
3. **Below 4-bit K: quality cliff.** turbo_kv_3b raises PPL by 136%; turbo_kv_1b and uniform_2b are catastrophic (PPL above 1000).
4. **Below Q4 V: noticeable degradation.** Q2 V raises PPL by 36% (9.51 to 12.95).
5. **RHT-based types (turbo_kv_*) underperform uniform at head_dim=64.**
   turbo_kv_4b PPL is worse than uniform_4b (10.07 vs. 9.51) despite the same bit count.
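
For readers unfamiliar with the rotation step behind finding 5, here is a minimal sketch, assuming RHT stands for a randomized Hadamard transform (random sign flips followed by an orthonormal Walsh-Hadamard transform). It is not the `turbo_kv_*` implementation, which this file does not show; `fwht` and `rht` are hypothetical names.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]


def rht(vec, signs):
    """Randomized Hadamard transform: flip signs, then apply the orthonormal FWHT."""
    return fwht([s * x for s, x in zip(signs, vec)])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
head_dim = 64
signs = [rng.choice((-1.0, 1.0)) for _ in range(head_dim)]

key = [0.0] * head_dim
key[3] = 8.0  # a single outlier channel
query = [rng.gauss(0.0, 1.0) for _ in range(head_dim)]

rot_key = rht(key, signs)
rot_query = rht(query, signs)
```

Because the transform is orthogonal, `dot(rot_query, rot_key)` equals `dot(query, key)` up to rounding, so attention logits are preserved when both sides are rotated. The single outlier of magnitude 8 becomes 64 entries of magnitude 1, which is the kind of flattening that is meant to help low-bit quantizers. The results above indicate that, at head_dim=64, this did not translate into better perplexity than plain uniform quantization.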
