Commit 4da6915
TurboQuant ablation: QJL stage contributes nothing, MSE stage is the bug
Added env-var ablation switch to turbo_kv attention paths and ran:
turbo_kv_4b full (MSE+QJL): PPL 16.03
turbo_kv_4b MSE-only: PPL 16.03 ← byte-identical
turbo_kv_3b full (MSE+QJL): PPL 25.84
turbo_kv_3b MSE-only: PPL 25.84 ← byte-identical
Two findings:
1. The QJL correction term evaluates to ~0 regardless of input. The
constant √(π/2)/m may be wrong for our Rademacher (±1) projection
rows; the original QJL paper uses Gaussian rows.
2. Even ignoring QJL, the MSE-only Lloyd-Max-Gaussian codebook is
strictly worse than uniform per-block min-max at the same bit
budget. Real key vectors after a single Hadamard rotation still have
heavier tails than N(0,1), so the codebook clips outliers that
uniform_4b's per-block range captures naturally.
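
A rough way to see the effect in finding 2 is a synthetic comparison. This is a sketch, not the turbo_kv code: Student-t (df=3) samples stand in for heavy-tailed rotated keys, the Lloyd-Max codebook is approximated by running Lloyd's algorithm on Gaussian samples, the block size of 128 is an assumption, and all function names are hypothetical. Per-block metadata (RMS scale for the codebook, min and step for min-max) is ignored in the bit budget.

```python
import numpy as np

def lloyd_max_gaussian(levels, n_samples=200_000, iters=100, seed=0):
    """Approximate the Lloyd-Max codebook for N(0,1) from Gaussian samples."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    # Start from evenly spaced quantiles so that no cell starts empty.
    c = np.quantile(x, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        edges = (c[:-1] + c[1:]) / 2        # nearest-centroid boundaries
        idx = np.searchsorted(edges, x)     # cell index of every sample
        sums = np.bincount(idx, weights=x, minlength=levels)
        counts = np.bincount(idx, minlength=levels)
        c = sums / np.maximum(counts, 1)    # centroid = mean of its cell
    return c

def quantize_codebook(blocks, codebook):
    """Scale each block to unit RMS, snap to the nearest codebook level."""
    rms = np.sqrt(np.mean(blocks**2, axis=1, keepdims=True))
    edges = (codebook[:-1] + codebook[1:]) / 2
    idx = np.searchsorted(edges, blocks / rms)
    return codebook[idx] * rms

def quantize_minmax(blocks, bits):
    """Uniform quantizer spanning each block's own [min, max] range."""
    lo = blocks.min(axis=1, keepdims=True)
    hi = blocks.max(axis=1, keepdims=True)
    step = (hi - lo) / (2**bits - 1)
    return np.round((blocks - lo) / step) * step + lo

rng = np.random.default_rng(1)
blocks = rng.standard_t(df=3, size=(2000, 128))  # heavy-tailed stand-in data
codebook = lloyd_max_gaussian(16)                # 16 levels = 4 bits

mse_codebook = np.mean((blocks - quantize_codebook(blocks, codebook)) ** 2)
mse_minmax = np.mean((blocks - quantize_minmax(blocks, 4)) ** 2)
print(f"Gaussian Lloyd-Max codebook MSE: {mse_codebook:.4f}")
print(f"per-block min-max MSE:           {mse_minmax:.4f}")
```

The codebook's outermost levels are fixed by the Gaussian assumption, so any value beyond them is clipped, while the min-max range stretches to cover each block's extremes at the cost of a coarser step.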
Two structural fixes are needed to match the paper:
- Outlier handling at Stage 1 (the paper does this with 32 outlier channels)
- QJL constant verification for Rademacher rows
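
One possible way to run the constant check is a Monte Carlo comparison of the two row types. For Gaussian rows s, E[sign(s·k)(s·q)] = √(2/π)·⟨q,k⟩/‖k‖, which is where the √(π/2)/m factor comes from. The sketch below applies that same constant to Gaussian and Rademacher rows and reports estimate/truth. The function names, dimensions, and test vectors are illustrative assumptions, not the turbo_kv implementation.

```python
import numpy as np

def qjl_estimate(q, k, rows):
    """Sign-sketch estimate of <q, k> using the Gaussian-row constant."""
    m = rows.shape[0]
    signs = np.sign(rows @ k)  # 1-bit sketch of the key
    scale = np.sqrt(np.pi / 2) / m * np.linalg.norm(k)
    return scale * np.sum(signs * (rows @ q))

def query_with_dot(k, dot, rng):
    """Unit-norm query whose inner product with unit-norm k equals `dot`."""
    r = rng.standard_normal(k.shape[0])
    u = r - (r @ k) * k        # component of r orthogonal to k
    u /= np.linalg.norm(u)
    return dot * k + np.sqrt(1 - dot**2) * u

rng = np.random.default_rng(0)
d, m, dot = 128, 40_000, 0.5
row_sets = {
    "gaussian": rng.standard_normal((m, d)),
    "rademacher": rng.choice([-1.0, 1.0], size=(m, d)),
}

k_dense = rng.standard_normal(d)
k_dense /= np.linalg.norm(k_dense)  # energy spread over all coordinates
k_onehot = np.zeros(d)
k_onehot[0] = 1.0                   # all energy in one coordinate
keys = {"dense": k_dense, "onehot": k_onehot}

ratios = {}
for kname, k in keys.items():
    q = query_with_dot(k, dot, rng)
    for rname, rows in row_sets.items():
        ratios[(kname, rname)] = qjl_estimate(q, k, rows) / dot
        print(f"{kname:7s} {rname:10s} estimate/truth = {ratios[(kname, rname)]:.3f}")
```

For a one-hot key with ±1 rows, sign(s·k)(s·q) has mean exactly q₁, so the Gaussian constant overshoots by a factor of √(π/2) ≈ 1.25. For a dense key, s·k is close to Gaussian and the ratio should stay near 1. A mis-scaled constant alone would rescale the correction rather than drive it to ~0, so the residual and sign paths may also be worth checking.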
Reverted the env-var ablation switch (kept the findings in the
reproduction doc).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a3262ee commit 4da6915
1 file changed
Lines changed: 22 additions & 0 deletions
(Diff body not captured: 22 lines added at lines 59-80 of the changed file.)