
Commit f4934e9

unamedkr and claude committed
debug: root-cause FP16 V garbage — it's actually a quant K cache mismatch
Further investigation showed the V_fp16 cache bytes are BIT-IDENTICAL between batched and baseline at all layers 0-3 (XOR hash match). So V was never the issue.

The real root cause: the baseline with KV quantization (the default turbo_kv_4b) stores K in quant_key_cache, not in s->key_cache as FP32, and its attention reads K from quant_key_cache and dequantizes on the fly. My tq_forward_batch unconditionally writes FP32 K to s->key_cache. When the final tq_forward(pos=last) runs after the batched pass, it reads quant_key_cache for history positions, and those entries are all zeros because batched never populated them. Attention therefore sees zero K for pos 0 and the output breaks.

This case is correctly guarded by the existing kv_is_fp32 gate: batched only runs when the KV cache is FP32, so the quant_key_cache mismatch cannot occur in practice. The diagnostic dumps are removed now that the analysis is documented.

To enable batched for the default (quant K) mode later, batched needs to write to quant_key_cache via traits->quantize, per head per block. That is ~50 LOC but touches multiple code paths, so it is deferred.

11/11 STRICT tests pass. Default behavior unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
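A minimal sketch of the gate described above, assuming a hypothetical kv_is_fp32 predicate and caller shape (tokens, n_tokens, and start_pos are illustrative; only tq_forward, tq_forward_batch, and s->quant_key_cache come from the commit):

    /* Batched prefill is only safe when K lives in s->key_cache as FP32,
     * since tq_forward_batch writes FP32 K there and never populates
     * quant_key_cache. */
    int kv_is_fp32 = (s->quant_key_cache == NULL);  /* hypothetical predicate */
    if (kv_is_fp32) {
        tq_forward_batch(model, s, tokens, n_tokens, start_pos);
    } else {
        /* Quant K mode: fall back to the single-token path, whose
         * attention reads and dequantizes quant_key_cache itself. */
        for (int i = 0; i < n_tokens; i++)
            tq_forward(model, s, tokens[i], start_pos + i);
    }

And a sketch of the deferred fix, assuming traits->quantize quantizes one block of floats and a hypothetical quant_k_slot addressing helper (the commit only says "write to quant_key_cache via traits->quantize per head per block"):

    /* Instead of memcpy'ing FP32 K into s->key_cache, quantize each KV
     * head's slice block by block into quant_key_cache so a later
     * tq_forward sees real (nonzero) history K. */
    for (int h = 0; h < n_kv_heads; h++) {
        const float* k_head = KB + (size_t)n * kv_dim + (size_t)h * head_dim;
        for (int b = 0; b < head_dim / block_size; b++) {
            void* dst = quant_k_slot(s, l, pos, h, b);  /* hypothetical */
            traits->quantize(k_head + (size_t)b * block_size, dst, block_size);
        }
    }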
1 parent 103e50f commit f4934e9

1 file changed

src/engine/tq_transformer.c

Lines changed: 0 additions & 3 deletions
@@ -3298,9 +3298,6 @@ int tq_forward_batch(tq_model_t* model, tq_state_t* s,
         memcpy(s->value_cache + (size_t)l * kv_layer_stride + (size_t)pos * kv_dim,
                VB + (size_t)n * kv_dim, (size_t)kv_dim * sizeof(float));
     } else if (s->value_cache_fp16) {
-        /* Match tq_forward exactly: hardware FP16 conversion via NEON
-         * vcvt_f16_f32. Inline manual conversion gave subtly different
-         * rounding which propagated through attention and broke output. */
         uint16_t* dst = s->value_cache_fp16
             + (size_t)l * kv_layer_stride + (size_t)pos * kv_dim;
         f32_to_fp16_vec(VB + (size_t)n * kv_dim, dst, kv_dim);
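The removed comment referenced f32_to_fp16_vec using NEON vcvt_f16_f32; the implementation is not part of this diff, but a plausible shape under that assumption is:

    #include <arm_neon.h>
    #include <stdint.h>
    #include <string.h>

    /* Convert n floats to IEEE FP16 with the hardware converter so the
     * rounding is exactly vcvt_f16_f32's round-to-nearest-even, matching
     * what tq_forward's single-token path produces. */
    static void f32_to_fp16_vec(const float* src, uint16_t* dst, int n) {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            float16x4_t h = vcvt_f16_f32(vld1q_f32(src + i));
            vst1_u16(dst + i, vreinterpret_u16_f16(h));
        }
        for (; i < n; i++) {            /* scalar tail */
            __fp16 f = (__fp16)src[i];  /* hardware scalar convert on ARM */
            memcpy(dst + i, &f, sizeof f);
        }
    }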
