
Commit f6f513b

unamedkr and claude committed
fix(gemma4): p-RoPE timing fix — apply AFTER hybrid detection
Critical bug: the proportional RoPE adjustment for full attention layers
(512 -> 128 dims) was placed BEFORE hybrid attention detection, so
c->full_head_dim was still 0 at that point → the adjustment never ran.
Moved the p-RoPE adjustment to after c->full_head_dim is set (~line 12240).
It now correctly logs: "Gemma4 p-RoPE — full layer RoPE dims 512 -> 128".

Also confirmed: the previous "server crashes" were actually curl timeouts
(262K vocab + FP32 weights = very slow lm_head matmul on CPU).

Status: Gemma 4 E2B still produces garbage with all fixes applied:
- RoPE dims: swa=256, full=128 (p-RoPE) ✅
- Attention softcap: disabled for Gemma 4 ✅
- layer_output_scale: simple multiply ✅
- Chat template: Gemma format ✅
- KV sharing: framework ready (off by default) ✅

Remaining hypotheses: residual connection order, sliding window masking, or
a weight loading issue in the attention projection matrices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4ce525b commit f6f513b
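
To make the ordering issue concrete, here is a minimal standalone sketch. The field names come from the diff below, but the struct, the values, and the apply_prope helper are illustrative only, not quant.h's real code. It shows why the guard silently no-ops when it runs before hybrid attention detection has set full_head_dim:

#include <stdio.h>

/* Minimal repro of the ordering bug (field names from the diff; values
 * assumed from the commit message). The guard itself is fine; the bug
 * was purely about when it was evaluated. */
typedef struct { int is_gemma4, full_head_dim, rope_n_dims_full; } cfg_t;

static void apply_prope(cfg_t* c) {
    if (c->is_gemma4 && c->rope_n_dims_full > 0 && c->full_head_dim > 0)
        c->rope_n_dims_full = c->full_head_dim / 4;   /* 512/4 = 128 */
}

int main(void) {
    cfg_t c = { .is_gemma4 = 1, .full_head_dim = 0, .rope_n_dims_full = 512 };

    apply_prope(&c);                        /* BEFORE hybrid detection */
    printf("%d\n", c.rope_n_dims_full);     /* still 512: guard failed, adjustment skipped */

    c.full_head_dim = 512;                  /* hybrid attention detection runs */
    apply_prope(&c);                        /* AFTER hybrid detection */
    printf("%d\n", c.rope_n_dims_full);     /* 128: correct p-RoPE dims */
    return 0;
}

The fix in the diff below changes only the call order; the adjustment logic is unchanged.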

1 file changed: quant.h (14 additions, 18 deletions)
@@ -11597,24 +11597,9 @@ tq_model_t* tq_load_gguf(const char* path) {
     /* Gemma 4 (STEP35) detection: architecture string is "gemma4" */
     if (strstr(gguf->arch, "gemma4") != NULL) {
       c->is_gemma4 = 1;
-      /* Gemma 4 proportional RoPE for full attention layers:
-       * HuggingFace config has partial_rotary_factor=0.25 for full layers.
-       * GGUF rope.dimension_count=512 is the full head_dim, NOT the RoPE dim.
-       * Actual RoPE dims for full layers = full_head_dim * 0.25 = 128.
-       *
-       * Sliding layers: rope.dimension_count_swa=256 = full head_dim(256) → all rotated.
-       *
-       * We adjust rope_n_dims_full to reflect the partial rotation. */
-      if (c->rope_n_dims_full > 0 && c->full_head_dim > 0) {
-        /* partial_rotary_factor = 0.25 for Gemma 4 E2B/E4B */
-        int partial_rope = c->full_head_dim / 4; /* 512/4 = 128 */
-        fprintf(stderr, "tq_load_gguf: Gemma4 p-RoPE — full layer RoPE dims %d -> %d "
-                "(partial_rotary_factor=0.25)\n", c->rope_n_dims_full, partial_rope);
-        c->rope_n_dims_full = partial_rope;
-      }
-      fprintf(stderr, "tq_load_gguf: Gemma4 — RoPE dims swa=%d full=%d, "
-              "GeGLU FFN, rope_freqs for full layers only\n",
-              c->rope_n_dims, c->rope_n_dims_full);
+      /* Gemma 4 proportional RoPE: deferred to after hybrid attention
+       * detection sets full_head_dim (see below, ~line 12238). */
+      fprintf(stderr, "tq_load_gguf: Gemma4 detected (p-RoPE will be applied after hybrid detection)\n");
     }
     fprintf(stderr, "tq_load_gguf: Gemma family detected (sliding_window=%d)\n", c->sliding_window);
   } else if (c->is_moe) {

@@ -12253,6 +12238,17 @@ tq_model_t* tq_load_gguf(const char* path) {
     }
   }
 
+  /* Gemma 4 proportional RoPE: NOW apply, after hybrid detection set full_head_dim.
+   * HuggingFace config: partial_rotary_factor=0.25 for full attention layers.
+   * GGUF rope.dimension_count=512 is the full head_dim, NOT the rotated dim.
+   * Actual RoPE dims for full layers = full_head_dim / 4 = 128. */
+  if (c->is_gemma4 && c->rope_n_dims_full > 0 && c->full_head_dim > 0) {
+    int partial_rope = c->full_head_dim / 4; /* 512/4 = 128 */
+    fprintf(stderr, "tq_load_gguf: Gemma4 p-RoPE — full layer RoPE dims %d -> %d "
+            "(partial_rotary_factor=0.25)\n", c->rope_n_dims_full, partial_rope);
+    c->rope_n_dims_full = partial_rope;
+  }
+
   /* Load embedding + output weights */
   const tq_gguf_tensor_t* emb_t = find_gguf_tensor(gguf, "token_embd.weight");
   if (emb_t) {
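
For context, a sketch of what the corrected rope_n_dims_full means at inference time: for a Gemma 4 full attention layer the head is 512-dim, but only the first 128 dims get rotated and the remaining dims pass through unchanged, matching partial_rotary_factor=0.25 from the HF config. This is NOT quant.h's actual RoPE kernel; the function name, the half-split pairing convention (dim i paired with i + rope_dims/2), and theta_base are assumptions.

#include <math.h>
#include <stdio.h>

/* Illustrative partial-RoPE sketch: rotate only the first rope_dims of a
 * head_dim-sized query/key vector; dims [rope_dims, head_dim) are untouched. */
static void rope_apply_partial(float* v, int head_dim, int rope_dims,
                               int pos, float theta_base) {
    int half = rope_dims / 2;
    for (int i = 0; i < half; i++) {
        float freq  = powf(theta_base, -2.0f * (float)i / (float)rope_dims);
        float angle = (float)pos * freq;
        float c = cosf(angle), s = sinf(angle);
        float x0 = v[i], x1 = v[i + half];
        v[i]        = x0 * c - x1 * s;
        v[i + half] = x0 * s + x1 * c;
    }
    (void)head_dim; /* higher dims are intentionally left position-agnostic */
}

int main(void) {
    float head[512];
    for (int i = 0; i < 512; i++) head[i] = 1.0f;
    rope_apply_partial(head, 512, 128, /*pos=*/7, /*theta_base=*/10000.0f);
    printf("head[0]=%f (rotated), head[300]=%f (untouched)\n", head[0], head[300]);
    return 0;
}

The design point is that with a 0.25 partial rotary factor, three quarters of each full-attention head carries no positional rotation at all, which is exactly what the 512 -> 128 adjustment in this commit encodes.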
