Commit f0091fc

unamedkr and claude committed
fix(qwen35): detect DeltaNet layers before Phi-3 fused-QKV path
Regression from 08e8661 (Apr 12 split-source port): the Phi-3 fused-QKV
detection matched every layer with attn_qkv.weight, including Qwen3.5
DeltaNet layers (which also expose attn_qkv.weight for their conv1d input
projection). All 32 layers were counted as self_attn instead of 8, so the
CLI path treated the 24 DeltaNet layers as ordinary self-attention and the
forward pass produced garbage for Qwen3.5-4B via CLI. (The server was
spared only because its binary was stale, built before the regression.)

Fix: probe blk.N.ssm_a before the attn_qkv check. When that marker tensor
is present, the layer is DeltaNet and the existing DeltaNet loading path
takes over. The quant.h single-header build already had this guard; only
the split-source build was affected.

Verified:
- CLI --chat "Hi" now produces "Hello! How can I help you?" (was: " -\n-")
- Hybrid detection logs "8 attn layers out of 32 total" (matches server)
- All 7 regression tests pass (Phi-3.5 Q8/Q4, Gemma E2B, Llama 3.1 8B,
  Llama 3.2 1B/3B, Qwen2.5-0.5B)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8cea571 commit f0091fc

1 file changed: src/engine/tq_model.c

Lines changed: 8 additions & 2 deletions
@@ -3246,10 +3246,16 @@ tq_model_t* tq_load_gguf(const char* path) {
  * the existing FP32 weight pointer fields. For GGUF models, we use a special
  * dispatch: if gguf_ctx is non-NULL, the forward pass uses tq_matmul_gguf. */
 
-    /* Fused QKV detection (Phi-3 etc.): attn_qkv.weight contains Q, K, V concatenated */
+    /* Fused QKV detection (Phi-3 etc.): attn_qkv.weight contains Q, K, V concatenated.
+     * NOTE: Qwen3.5 DeltaNet layers ALSO have attn_qkv.weight as their fused Q/K/V
+     * projection, but those are NOT self-attention. Distinguish by checking for
+     * DeltaNet marker tensor (ssm_a) at the same layer — if present, this is a
+     * DeltaNet layer and the attn_qkv will be loaded by the DeltaNet path below. */
+    snprintf(tname, sizeof(tname), "blk.%d.ssm_a", l);
+    const tq_gguf_tensor_t* ssm_probe = find_gguf_tensor(gguf, tname);
     snprintf(tname, sizeof(tname), "blk.%d.attn_qkv.weight", l);
     const tq_gguf_tensor_t* wqkv_t = find_gguf_tensor(gguf, tname);
-    if (wqkv_t) {
+    if (wqkv_t && !ssm_probe) {
         layer->gguf_w_qkv = wqkv_t->data;
         layer->gguf_w_qkv_type = wqkv_t->type;
         c->has_fused_qkv = 1;
