turbo_kv_3bo: Variant G with 3-bit base + 8 outliers (research)
Smaller base codebook with the same per-block outlier mechanism as 4bo.
Block layout: 8 hdr + 48 mse_3bit + 8 out_idx + 16 out_val_fp16 = 80 bytes.
Lives between 4b (72B) and 5b (88B) on the size axis.
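The 80-byte figure checks out arithmetically. A minimal sketch, assuming each block covers 48 * 8 / 3 = 128 elements (one head_dim=128 vector per block); the constant names are illustrative, not the real struct fields:

```python
# turbo_kv_3bo block layout as described in the commit message.
HDR_BYTES = 8        # per-block header
CODE_BYTES = 48      # 3-bit base codes: 48 * 8 / 3 = 128 elements
OUT_IDX_BYTES = 8    # 8 outlier positions, 1 byte each
OUT_VAL_BYTES = 16   # 8 outlier values as fp16, 2 bytes each

BLOCK_BYTES = HDR_BYTES + CODE_BYTES + OUT_IDX_BYTES + OUT_VAL_BYTES
ELEMS_PER_BLOCK = CODE_BYTES * 8 // 3  # assumption: 128-element blocks

def bits_per_elem(block_bytes: int, elems: int = ELEMS_PER_BLOCK) -> float:
    """Effective storage cost per cached element."""
    return block_bytes * 8 / elems

# Under the 128-element assumption the size axis reads:
# 4b = 72B -> 4.5 bits/elem, 3bo = 80B -> 5.0, 5b = 88B -> 5.5, 4bo = 96B -> 6.0
```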
Llama 3.2 3B PPL on bench/data/ppl_1k.txt (FP32 = 13.56):
  turbo_kv_4b    72B   14.28  (+5.3%)
  turbo_kv_3bo   80B   14.03  (+3.5%)   ← Pareto improvement over 4b
  turbo_kv_5b    88B   13.60  (+0.34%)
  turbo_kv_4bo   96B   13.86  (+2.2%)   (dominated by 5b)
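The "dominated by 5b" note can be verified mechanically: on the (bytes-per-block, PPL) plane, variant a dominates variant b when a is no larger and no worse, and they differ on at least one axis. A quick check over the Llama numbers above:

```python
# (bytes per block, PPL) for each variant on Llama 3.2 3B.
pts = {
    "turbo_kv_4b":  (72, 14.28),
    "turbo_kv_3bo": (80, 14.03),
    "turbo_kv_5b":  (88, 13.60),
    "turbo_kv_4bo": (96, 13.86),
}

def dominates(a: tuple, b: tuple) -> bool:
    # a is no larger, no worse, and not identical to b
    return a[0] <= b[0] and a[1] <= b[1] and a != b

dominated = {n for n in pts for m in pts if dominates(pts[m], pts[n])}
print(dominated)  # only turbo_kv_4bo is dominated (by turbo_kv_5b)
```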
SmolLM2 135M PPL (FP32 = 18.62):
  turbo_kv_4b    72B   19.70  (+5.8%)
  turbo_kv_3bo   80B   20.45  (+9.8%)   ← regression on this model
  turbo_kv_5b    88B   18.94  (+1.7%)
  turbo_kv_4bo   96B   19.29  (+3.6%)
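For clarity, the "+x.x%" column is the relative PPL increase over the FP32 baseline; a quick check against the SmolLM2 135M numbers:

```python
# Relative perplexity degradation versus the FP32 baseline.
def ppl_delta_pct(ppl: float, fp32: float) -> float:
    return 100.0 * (ppl - fp32) / fp32

FP32 = 18.62  # SmolLM2 135M baseline PPL
for name, ppl in [("turbo_kv_4b", 19.70), ("turbo_kv_3bo", 20.45),
                  ("turbo_kv_5b", 18.94), ("turbo_kv_4bo", 19.29)]:
    print(f"{name}: +{ppl_delta_pct(ppl, FP32):.1f}%")
# prints +5.8%, +9.8%, +1.7%, +3.6% respectively
```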
Key finding: per-channel outlier handling is **model-dependent**. On
Llama 3.2 3B with head_dim=128 and a heavier-tailed distribution,
3bo Pareto-improves over 4b. On SmolLM2 135M with smaller dimensions,
the 3-bit base is too coarse even with outliers, and quality regresses
past 4b. 5b remains the quality champion across both models.
Decision: ship 3bo and 4bo as research/experimental types (selectable
via -k turbo_kv_3bo / turbo_kv_4bo). The README headline keeps
turbo_kv_4b as default and turbo_kv_5b as the quality option.
35/35 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>