Commit f0091fc

unamedkr and claude committed
fix(qwen35): detect DeltaNet layers before Phi-3 fused-QKV path
Regression from 08e8661 (Apr 12 split-source port): the Phi-3 fused-QKV
detection matched every layer with attn_qkv.weight, including Qwen3.5
DeltaNet layers (which also expose attn_qkv.weight for their conv1d input
projection). All 32 layers were counted as self_attn instead of 8, so the
CLI path treated the 24 DeltaNet layers as ordinary self-attention and the
forward pass produced garbage for Qwen3.5-4B via CLI. (The server was
spared only because its binary was stale, built before the regression.)

Fix: probe blk.N.ssm_a before the attn_qkv check. When that marker tensor
is present, the layer is DeltaNet and the existing DeltaNet loading path
takes over. The quant.h single-header build already had this guard; only
the split-source build was affected.

Verified:
- CLI --chat "Hi" now produces "Hello! How can I help you?" (was: " -\n-")
- Hybrid detection logs "8 attn layers out of 32 total" (matches server)
- All 7 regression tests pass (Phi-3.5 Q8/Q4, Gemma E2B, Llama 3.1 8B,
  Llama 3.2 1B/3B, Qwen2.5-0.5B)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8cea571 commit f0091fc

1 file changed: src/engine/tq_model.c

Lines changed: 8 additions & 2 deletions
@@ -3246,10 +3246,16 @@ tq_model_t* tq_load_gguf(const char* path) {
  * the existing FP32 weight pointer fields. For GGUF models, we use a special
  * dispatch: if gguf_ctx is non-NULL, the forward pass uses tq_matmul_gguf. */
 
-    /* Fused QKV detection (Phi-3 etc.): attn_qkv.weight contains Q, K, V concatenated */
+    /* Fused QKV detection (Phi-3 etc.): attn_qkv.weight contains Q, K, V concatenated.
+     * NOTE: Qwen3.5 DeltaNet layers ALSO have attn_qkv.weight as their fused Q/K/V
+     * projection, but those are NOT self-attention. Distinguish by checking for
+     * DeltaNet marker tensor (ssm_a) at the same layer — if present, this is a
+     * DeltaNet layer and the attn_qkv will be loaded by the DeltaNet path below. */
+    snprintf(tname, sizeof(tname), "blk.%d.ssm_a", l);
+    const tq_gguf_tensor_t* ssm_probe = find_gguf_tensor(gguf, tname);
     snprintf(tname, sizeof(tname), "blk.%d.attn_qkv.weight", l);
     const tq_gguf_tensor_t* wqkv_t = find_gguf_tensor(gguf, tname);
-    if (wqkv_t) {
+    if (wqkv_t && !ssm_probe) {
         layer->gguf_w_qkv = wqkv_t->data;
         layer->gguf_w_qkv_type = wqkv_t->type;
         c->has_fused_qkv = 1;
