Commit b8286b0
validation: document RoPE mismatch in context shift + S1 eval limitation
Rigorous multi-angle validation of S1-S4 findings uncovered two issues:
1. S2 (Infinite Scrollback) — RoPE POSITION MISMATCH:
After context shift, keys in KV cache retain their original RoPE
rotation angles, but new queries use RoPE(new_pos). The relative
positional distances become incorrect. This is the same limitation
as llama.cpp's basic context shift. Quality is estimated to degrade
~2-5% per shift (not yet measured). Added a NOTE in code; a proper
fix requires either key re-rotation or position offsets in attention.
Tracked for v0.11.
2. S1 (Attention-Aware) — EVAL LENGTH LIMITATION:
All PPL measurements were at 957 tokens (the tokenizer emits only
958 tokens regardless of input file size — suspected tokenizer cap —
and the first token is not scored).
The "2-bit + k512 Pareto-dominates flat 4-bit" claim was measured
with 53.5% of tokens at FP32 (512/957). At real long context (32K),
this fraction drops to 1.6%. The claim is theoretically sound
(attention concentrates on recent tokens) but NOT empirically
validated at long context. Honest correction #9.
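The RoPE mismatch in item 1 can be sketched numerically. This is an illustrative NumPy model (the `rope` helper, positions, and head dimension are made up for the example, not the project's actual code): after a context shift, a cached key still carries its original rotation angle, so the query-key score no longer reflects the true relative distance; re-rotating the cached key by the shift amount restores the correct score.

```python
import numpy as np

def rope(x, pos, theta=10000.0):
    """Complex-valued RoPE: pair up dims and rotate by pos-dependent angles."""
    d = x.shape[-1] // 2
    freqs = theta ** (-np.arange(d) / d)
    xc = x[..., 0::2] + 1j * x[..., 1::2]
    return xc * np.exp(1j * pos * freqs)

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal(64)

shift = 50
k_cached = rope(k, pos=100)   # key rotated at its ORIGINAL position when stored
q_new    = rope(q, pos=60)    # query issued AFTER the shift, at its new position

# Stale score: effective relative distance is 60 - 100 = -40,
# but the key's logical position after the shift is 100 - 50 = 50,
# so the true distance should be 60 - 50 = 10.
stale = np.real(np.sum(q_new * np.conj(k_cached)))

# Fix: re-rotate the cached key back by `shift` positions.
d = k.shape[-1] // 2
freqs = 10000.0 ** (-np.arange(d) / d)
k_fixed = k_cached * np.exp(-1j * shift * freqs)
fixed = np.real(np.sum(q_new * np.conj(k_fixed)))

# Reference: rotate a fresh key at the true logical position.
ref = np.real(np.sum(rope(q, 60) * np.conj(rope(k, 50))))
assert np.isclose(fixed, ref)        # re-rotation recovers the correct score
assert not np.isclose(stale, ref)    # stale rotation does not
```

This is why the fix options are "key re-rotation or position offsets": both amount to making the effective key angle match the key's post-shift logical position.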
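The FP32-fraction numbers in item 2 follow from simple division (the `fp32_fraction` helper is hypothetical, written only to show the arithmetic for keeping the most recent k=512 keys at full precision):

```python
def fp32_fraction(context_len, keep=512):
    """Fraction of cached tokens held at FP32 when the newest `keep` stay full precision."""
    return min(keep, context_len) / context_len

print(round(fp32_fraction(957) * 100, 1))    # 53.5 — the 957-token eval setting
print(round(fp32_fraction(32768) * 100, 1))  # 1.6  — a real 32K context
```

At 957 tokens, more than half the cache is effectively unquantized, which is why the Pareto-dominance claim cannot be taken as validated for long contexts.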
S3 (Layer-Adaptive) negative result is unaffected — distribution
statistics don't depend on eval length.
S4 (Persistence) works but has limited validation scope (SmolLM2 only).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 file changed
Lines changed: 16 additions & 1 deletion