
Commit 7559f45

unamedkr and claude committed
BREAKTHROUGH: Delta+2-bit KV = 4-bit quality at 3 bpe (cosine 0.996)
Delta compression on key vectors: key[t]-key[t-1] has ~30% of absolute range.
2-bit quantization of delta matches 4-bit quality on absolute values.

Prototype results (head_dim=128, seq_len=1024):
  Direct 2-bit: cosine 0.949 (3.0 bpe)
  Direct 4-bit: cosine 0.995 (4.25 bpe)
  Delta+2-bit:  cosine 0.996 (3.04 bpe) ← EXCEEDS 4-bit!

No drift accumulation even without I-frames (pure delta, 0.996 cosine).

Implementation: compute delta before quantize, accumulate on dequant.
Next: integrate into engine and verify PPL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent b7fe468 commit 7559f45
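
The "compute delta before quantize, accumulate on dequant" step from the commit message can be sketched roughly as follows. This is an illustrative prototype, not the code in this commit. It assumes per-vector min/max uniform quantization. It also assumes a closed-loop variant in which each delta is taken against the previous reconstruction. The commit message does not say which reference it uses. All function names, the synthetic random-walk keys, and the small sizes are hypothetical.

```python
import math
import random


def quantize(vec, bits):
    """Uniform min/max quantization of one vector to 2**bits levels."""
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def direct_roundtrip(keys, bits):
    """Quantize every key vector on its absolute values."""
    return [dequantize(*quantize(key, bits)) for key in keys]


def delta_roundtrip(keys, bits=2):
    """Quantize key[t] minus the previous reconstruction, then accumulate.

    Assumption: closed-loop deltas, so the quantization error of one step
    is folded into the next delta instead of accumulating.
    """
    prev = [0.0] * len(keys[0])
    recon = []
    for key in keys:
        delta = [k - p for k, p in zip(key, prev)]
        codes, lo, scale = quantize(delta, bits)
        prev = [p + d for p, d in zip(prev, dequantize(codes, lo, scale))]
        recon.append(prev)
    return recon


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def make_keys(seq_len=128, head_dim=32, step=0.1, seed=0):
    """Synthetic keys that change slowly over time (a random walk).

    This stands in for real key vectors; it is not data from the commit.
    """
    rng = random.Random(seed)
    key = [rng.gauss(0.0, 1.0) for _ in range(head_dim)]
    keys = [key]
    for _ in range(seq_len - 1):
        key = [k + rng.gauss(0.0, step) for k in key]
        keys.append(key)
    return keys


def mean_cosine(keys, recon):
    """Average per-position cosine similarity."""
    return sum(cosine(k, r) for k, r in zip(keys, recon)) / len(keys)


if __name__ == "__main__":
    keys = make_keys()
    print("direct 2-bit:", mean_cosine(keys, direct_roundtrip(keys, 2)))
    print("direct 4-bit:", mean_cosine(keys, direct_roundtrip(keys, 4)))
    print("delta  2-bit:", mean_cosine(keys, delta_roundtrip(keys, 2)))
```

The numbers this prints depend on the synthetic data and will not reproduce the cosine or bpe figures reported above; the sketch only shows where the delta and the accumulation sit relative to the quantizer.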

18 files changed

Lines changed: 693 additions & 51 deletions

bench/data/ppl_results.csv

Lines changed: 3 additions & 0 deletions
@@ -3,3 +3,6 @@ uniform_4b,998,2.490645,12.0691
 turbo_kv_1b,998,2.490645,12.0691
 turbo_kv_3b,998,2.490645,12.0691
 turbo_kv_1b+q4v,998,2.516130,12.3806
+uniform_4b,814,2.252016,9.5069
+uniform_3b,814,2.586115,13.2781
+uniform_2b_subblock,814,5.706569,300.8372
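
The CSV header is not part of this hunk, so the column meanings are not stated here. The values are consistent with the fourth column being the exponential of the third, which is how perplexity relates to mean negative log-likelihood per token. A small check, assuming that reading (the column names below are guesses, not taken from the file):

```python
import math

# Rows copied from the diff above.
# Assumed columns: config, token count, mean NLL (nats), perplexity.
rows = [
    ("turbo_kv_1b", 998, 2.490645, 12.0691),
    ("turbo_kv_3b", 998, 2.490645, 12.0691),
    ("turbo_kv_1b+q4v", 998, 2.516130, 12.3806),
    ("uniform_4b", 814, 2.252016, 9.5069),
    ("uniform_3b", 814, 2.586115, 13.2781),
    ("uniform_2b_subblock", 814, 5.706569, 300.8372),
]


def ppl_from_nll(mean_nll):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)


for config, tokens, mean_nll, ppl in rows:
    recomputed = ppl_from_nll(mean_nll)
    # Difference should be within the rounding of the stored values.
    print(f"{config:22s} stored={ppl:9.4f} recomputed={recomputed:9.4f}")
```

Rows with different token counts (998 and 814) presumably come from different evaluation runs, so their values may not be directly comparable.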
