Commit f4934e9
debug: root-cause FP16 V garbage — it's actually quant K cache mismatch
Further investigation showed V_fp16 cache bytes are BIT-IDENTICAL between
batched and baseline at all layers 0-3 (XOR hash match). So V was never
the issue.
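
A minimal sketch of how such a byte-level comparison can be done. The
helper name xor_hash_bytes and the per-chunk rotation are assumptions for
illustration; the removed diagnostic dumps may have computed the hash
differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR-fold a byte buffer into one 64-bit word.
 * Each 8-byte chunk is rotated by its index (mod 64) before folding,
 * so two swapped chunks do not simply cancel out. */
static uint64_t xor_hash_bytes(const void *buf, size_t nbytes)
{
    assert(buf != NULL || nbytes == 0);
    const unsigned char *p = (const unsigned char *)buf;
    uint64_t h = 0;
    size_t i = 0;
    for (size_t chunk = 0; i < nbytes; chunk++) {
        uint64_t w = 0;
        size_t n = (nbytes - i < 8) ? (nbytes - i) : 8;
        memcpy(&w, p + i, n);                 /* tail chunk is zero-padded */
        unsigned r = (unsigned)(chunk & 63);
        h ^= (w << r) | (w >> ((64 - r) & 63)); /* rotate left by r */
        i += n;
    }
    return h;
}
```

Hashing the same layer's V buffer in the batched run and in the baseline
run and comparing the two values is a cheap equality check. A match is
strong evidence of identical bytes, not a proof, since any hash can
collide.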
The real root cause: with KV quantization enabled (the default,
turbo_kv_4b), baseline stores K in quant_key_cache, not in the FP32
s->key_cache. Its attention reads K from quant_key_cache and
dequantizes on the fly.
My tq_forward_batch unconditionally writes FP32 K to s->key_cache.
When the final tq_forward(pos=last) runs after the batched pass, it
reads quant_key_cache for history positions, which is all ZERO because
the batched pass never populated it. Attention sees zero K for pos 0,
which breaks the output.
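
The mismatch can be reproduced in a toy model. Everything below
(toy_state, the int8 stand-in for the quantized format, the function
names) is hypothetical and only mirrors the shape of the bug; it is not
the real tq_forward or tq_forward_batch code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TOY_POS = 4, TOY_DIM = 4 };

/* Toy stand-in for the real state: two separate K stores. */
typedef struct {
    float  key_cache[TOY_POS][TOY_DIM];        /* FP32 K */
    int8_t quant_key_cache[TOY_POS][TOY_DIM];  /* quantized K (toy int8) */
    float  quant_scale[TOY_POS];               /* per-position scale */
} toy_state;

/* Models the batched path: writes FP32 K only, never the quant cache. */
static void toy_batched_write_k(toy_state *s, int pos, const float *k)
{
    assert(pos >= 0 && pos < TOY_POS);
    memcpy(s->key_cache[pos], k, sizeof s->key_cache[pos]);
}

/* Models the quantized baseline reader: dequantizes on the fly and
 * returns the attention score q.K for one history position. */
static float toy_quant_score(const toy_state *s, int pos, const float *q)
{
    float dot = 0.0f;
    for (int d = 0; d < TOY_DIM; d++)
        dot += q[d] * (float)s->quant_key_cache[pos][d] * s->quant_scale[pos];
    return dot;
}

/* The same score read from the FP32 cache, for comparison. */
static float toy_fp32_score(const toy_state *s, int pos, const float *q)
{
    float dot = 0.0f;
    for (int d = 0; d < TOY_DIM; d++)
        dot += q[d] * s->key_cache[pos][d];
    return dot;
}
```

With a zero-initialized toy_state, calling toy_batched_write_k for pos 0
and then toy_quant_score for pos 0 returns 0 for any q, while
toy_fp32_score sees the real K. That is the same split between writer
and reader described above.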
The existing kv_is_fp32 gate already guards against this: batched
only runs when the KV cache is FP32, so the quant_key_cache mismatch
cannot occur. Removed the diagnostic dumps now that the analysis
is documented.
To enable batched for default (quant K) mode later: batched needs to
write to quant_key_cache via traits->quantize per head per block.
That's ~50 LOC but touches multiple code paths. Deferred.
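
A rough sketch of what the deferred write could look like. The
toy_traits layout, the quantize signature, and the int8 toy quantizer
are all assumptions made for illustration; the real traits struct and
the turbo_kv_4b block format are not shown in this commit and may
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trait table; the real traits struct may differ. */
typedef struct {
    int    block_size;   /* floats consumed per quantized block */
    size_t block_bytes;  /* bytes produced per quantized block */
    void (*quantize)(const float *src, void *dst, int n);
} toy_traits;

/* Toy absmax quantizer (stand-in for turbo_kv_4b): writes one float
 * scale followed by n int8 values. */
static void toy_q8_quantize(const float *src, void *dst, int n)
{
    float amax = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = src[i] < 0.0f ? -src[i] : src[i];
        if (a > amax) amax = a;
    }
    float scale = amax / 127.0f;
    unsigned char *out = (unsigned char *)dst;
    memcpy(out, &scale, sizeof scale);
    for (int i = 0; i < n; i++) {
        float x = scale > 0.0f ? src[i] / scale : 0.0f;
        int8_t v = (int8_t)(x >= 0.0f ? x + 0.5f : x - 0.5f); /* round */
        memcpy(out + sizeof scale + (size_t)i, &v, 1);
    }
}

/* Quantize one position's K vector, head by head and block by block,
 * into that position's row of the quantized cache. k is laid out as
 * [n_kv_heads][head_dim]. */
static void toy_write_quant_k(const toy_traits *t, unsigned char *quant_row,
                              const float *k, int n_kv_heads, int head_dim)
{
    assert(t->block_size > 0 && head_dim % t->block_size == 0);
    int blocks_per_head = head_dim / t->block_size;
    for (int h = 0; h < n_kv_heads; h++) {
        for (int b = 0; b < blocks_per_head; b++) {
            size_t blk = (size_t)h * (size_t)blocks_per_head + (size_t)b;
            t->quantize(k + blk * (size_t)t->block_size,
                        quant_row + blk * t->block_bytes,
                        t->block_size);
        }
    }
}
```

In this sketch the batched path would call toy_write_quant_k once per
position, instead of (or in addition to) the FP32 write, so that a
later quantized read finds populated history.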
11/11 STRICT tests pass. Default behavior unchanged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 103e50f commit f4934e9
1 file changed
Lines changed: 0 additions & 3 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 3298 | 3298 | |
| 3299 | 3299 | |
| 3300 | 3300 | |
| 3301 | | - |
| 3302 | | - |
| 3303 | | - |
| 3304 | 3301 | |
| 3305 | 3302 | |
| 3306 | 3303 | |