Commit 7559f45
BREAKTHROUGH: Delta+2-bit KV = 4-bit quality at 3 bpe (cosine 0.996)
Delta compression on key vectors: key[t]-key[t-1] has ~30% of absolute range.
2-bit quantization of delta matches 4-bit quality on absolute values.
Prototype results (head_dim=128, seq_len=1024):
Direct 2-bit: cosine 0.949 (3.0 bpe)
Direct 4-bit: cosine 0.995 (4.25 bpe)
Delta+2-bit: cosine 0.996 (3.04 bpe) ← EXCEEDS 4-bit!
No drift accumulation even without I-frames (pure delta, 0.996 cosine).
Implementation: compute delta before quantize, accumulate on dequant.
Next: integrate into engine and verify PPL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent b7fe468 commit 7559f45
18 files changed
Lines changed: 693 additions & 51 deletions
File tree
- bench
- data
- include/turboquant
- integrations/llamacpp
- src
- backend
- cuda
- metal
- rocm
- core
- engine
- tests
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
0 commit comments