Commit f6f513b
fix(gemma4): p-RoPE timing fix — apply AFTER hybrid detection
Critical bug: the proportional RoPE adjustment for full attention layers
(512 -> 128 dims) was placed BEFORE hybrid attention detection, so
c->full_head_dim was still 0 at that point → adjustment never ran.
Moved the p-RoPE adjustment to run after c->full_head_dim is set (~line 12240).
Now correctly logs: "Gemma4 p-RoPE — full layer RoPE dims 512 -> 128"
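A minimal sketch of the corrected ordering, assuming hypothetical field and
function names (Config, detect_hybrid_attention, apply_prope) that stand in for
whatever the actual source file uses:

    #include <stdio.h>

    typedef struct {
        int full_head_dim;   /* set during hybrid-attention detection    */
        int rope_dim_swa;    /* RoPE dims on sliding-window layers (256) */
        int rope_dim_full;   /* RoPE dims on full-attention layers       */
        int is_gemma4;
    } Config;

    static void detect_hybrid_attention(Config *c) {
        /* ...scan the layer layout, then record the full-attention head dim... */
        c->full_head_dim = 512;   /* illustrative value */
    }

    static void apply_prope(Config *c) {
        /* p-RoPE: full-attention layers rotate only a fraction of the head
         * dims (512 -> 128 in this commit). This must run AFTER
         * detect_hybrid_attention; before the fix, full_head_dim was still 0
         * here, so the adjustment silently never ran. */
        if (c->is_gemma4 && c->full_head_dim > 0) {
            c->rope_dim_full = c->full_head_dim / 4;
            fprintf(stderr, "Gemma4 p-RoPE — full layer RoPE dims %d -> %d\n",
                    c->full_head_dim, c->rope_dim_full);
        }
    }

    int main(void) {
        Config c = { .rope_dim_swa = 256, .is_gemma4 = 1 };
        detect_hybrid_attention(&c);   /* 1) detect the hybrid layout first   */
        apply_prope(&c);               /* 2) then shrink full-layer RoPE dims */
        return 0;
    }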
Also confirmed: previous "server crashes" were actually curl timeouts
(262K vocab + FP32 weights = very slow lm_head matmul on CPU).
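Rough arithmetic on why that matmul dominates (the hidden size below is an
assumed value for illustration, not taken from this commit):

    #include <stdio.h>

    int main(void) {
        /* Cost of the final logits matmul alone, per generated token. */
        long vocab  = 262144;          /* ~262K vocabulary                    */
        long hidden = 2048;            /* ASSUMED hidden size, for scale only */
        long macs   = vocab * hidden;  /* multiply-accumulates per token      */
        printf("%ld MACs/token for lm_head (~%.1f GFLOP/s to reach 10 tok/s)\n",
               macs, 2.0 * (double)macs * 10.0 / 1e9);
        return 0;
    }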
Status: Gemma 4 E2B still produces garbage with all fixes applied:
- RoPE dims: swa=256, full=128 (p-RoPE) ✅
- Attention softcap: disabled for Gemma 4 ✅
- layer_output_scale: simple multiply (sketched after this list) ✅
- Chat template: Gemma format ✅
- KV sharing: framework ready (off by default) ✅
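A minimal sketch of the softcap and layer_output_scale behaviour ticked off
above, using hypothetical helper names rather than the repo's actual API:

    #include <math.h>

    /* Attention-logit softcap: cap * tanh(score / cap). Disabled for
     * Gemma 4, i.e. the raw score passes through unchanged. */
    float attn_softcap(float score, float cap, int softcap_enabled) {
        return softcap_enabled ? cap * tanhf(score / cap) : score;
    }

    /* layer_output_scale: a plain element-wise multiply on the layer
     * output, with no capping or extra nonlinearity. */
    void scale_layer_output(float *x, int n, float scale) {
        for (int i = 0; i < n; i++) x[i] *= scale;
    }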
Remaining hypotheses: residual connection order, sliding window
masking, or a weight-loading issue in the attention projection matrices.
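Of those, sliding-window masking is the easiest to check in isolation; a
generic sketch of the expected mask (the window size is an assumption, not
taken from this commit):

    #include <stdbool.h>

    /* Sliding-window causal mask: query position q may attend to key
     * position k iff k is causal (k <= q) and within the last `window`
     * tokens. window = 512 is illustrative only. */
    bool swa_can_attend(int q, int k, int window) {
        return k <= q && q - k < window;
    }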
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 file changed: 14 additions & 18 deletions
[Diff hunks not recovered from the page. Hunk 1: old lines 11600-11617 removed, new lines 11600-11602 added. Hunk 2: new lines 12241-12251 added after line 12240.]