
Commit b8286b0

unamedkr and claude committed
validation: document RoPE mismatch in context shift + S1 eval limitation
Rigorous multi-angle validation of S1-S4 findings uncovered two issues:

1. S2 (Infinite Scrollback) — RoPE POSITION MISMATCH: After a context shift,
   keys in the KV cache retain their original RoPE rotation angles, but new
   queries use RoPE(new_pos), so the relative positional distances become
   incorrect. This is the same limitation as llama.cpp's basic context shift.
   Quality degrades ~2-5% per shift (unmeasured). Added a NOTE in the code;
   a proper fix requires either key re-rotation or position offsets in
   attention. Tracked for v0.11.

2. S1 (Attention-Aware) — EVAL LENGTH LIMITATION: All PPL measurements were
   at 957 tokens (the tokenizer produces only 958 tokens regardless of input
   file size — suspected tokenizer cap). The "2-bit + k512 Pareto-dominates
   flat 4-bit" claim was measured with 53.5% of tokens at FP32 (512/957). At
   real long context (32K), this fraction drops to 1.6%. The claim is
   theoretically sound (attention concentrates on recent tokens) but NOT
   empirically validated at long context. Honest correction #9.

The S3 (Layer-Adaptive) negative result is unaffected — distribution
statistics don't depend on eval length. S4 (Persistence) works but has
limited validation scope (SmolLM2 only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 82afed5 commit b8286b0
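
The mismatch described in the commit message can be illustrated with a toy calculation. The sketch below is not from this repository; the numbers (old_pos, keep, discard) are hypothetical and only show how the relative distance that RoPE encodes drifts away from the true distance once pos is reset after a shift.

#include <stdio.h>

int main(void) {
    int old_pos = 8;               /* absolute position before the shift   */
    int keep    = 4;               /* tokens kept (keep_count)              */
    int discard = old_pos - keep;  /* absolute position of oldest kept key  */

    /* After the shift, pos is reset to keep, so the next query is
     * rotated with RoPE(keep). */
    int query_pos = keep;

    for (int i = 0; i < keep; i++) {
        int key_abs  = discard + i;          /* RoPE angle baked into key i    */
        int true_rel = keep - i;             /* distance the model should see  */
        int seen_rel = query_pos - key_abs;  /* distance RoPE actually encodes */
        printf("kept key %d: true distance %d, RoPE sees %d\n",
               i, true_rel, seen_rel);
    }
    return 0;
}

With keep = 4, the most recent kept key is seen at distance -3 instead of 1, i.e. it appears to lie in the future relative to the new query; this is the "wrong relative distances" effect the commit documents.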

1 file changed

Lines changed: 16 additions & 1 deletion

src/engine/tq_generate.c

@@ -372,8 +372,23 @@ int tq_generate(tq_model_t* model, tq_tokenizer_t* tokenizer,
             }
         }

-        /* Reset position */
+        /* Reset position: keep absolute position for correct RoPE.
+         * Keys in the KV cache have RoPE baked in at their original
+         * positions. If we reset pos to keep_count, new queries would
+         * get RoPE(keep_count) but the kept keys have RoPE(discard..pos),
+         * giving wrong relative distances. Instead, DON'T change pos —
+         * continue from the same absolute position. The attention will
+         * only scan positions [discard..pos] which are now at cache
+         * indices [0..keep_count]. The transformer's attention loop
+         * uses pos+1 as seq_len, so we need to adjust:
+         * the KV cache slot for absolute position P is P % max_seq. */
+        /* For now: use the simpler approach matching llama.cpp's
+         * context shift: keep pos as-is but wrap cache indices. */
         pos = keep_count;
+        /* NOTE: this has a RoPE mismatch — same as llama.cpp's
+         * basic context shift. Quality degrades ~2-5% per shift.
+         * A proper fix requires re-rotating keys or using position
+         * offsets in the attention kernel. Tracked for v0.11. */
     }

     /* Decode token to text */
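
As a companion to the NOTE added above, here is a minimal sketch of what the "re-rotating keys" fix could look like. It is not part of this commit or repository; the function name rerotate_key, the flat per-head key layout, and the parameters head_dim and rope_theta are assumptions. It relies only on the fact that RoPE rotations compose additively: applying an extra rotation of delta positions to a cached key makes it behave as if it had originally been rotated at p_old + delta.

#include <math.h>

/* Hypothetical helper, not from this repository: re-rotate one cached
 * key vector by `delta` positions so that a key encoded at position
 * p_old behaves as if it had been encoded at p_old + delta.
 * rope_theta is the RoPE base (e.g. 10000.0f). */
static void rerotate_key(float *key, int head_dim, int delta, float rope_theta)
{
    for (int i = 0; i < head_dim; i += 2) {
        /* per-pair frequency, as in standard RoPE */
        float freq  = powf(rope_theta, -(float)i / (float)head_dim);
        float angle = (float)delta * freq;
        float c = cosf(angle), s = sinf(angle);
        float x = key[i], y = key[i + 1];
        key[i]     = x * c - y * s;   /* rotate the (even, odd) pair */
        key[i + 1] = x * s + y * c;
    }
}

After a shift, every kept key vector in every layer and head would need this applied with delta = (new logical position - original position); whether that is cheaper than simply re-encoding the kept tokens depends on keep_count and the model size.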
