R8: Proportional RoPE for full attention layers
- GGUF rope.dimension_count=512 is the full head_dim, NOT the RoPE dim
- Gemma 4 uses partial_rotary_factor=0.25 for full-attention layers
- Actual RoPE dims = full head_dim * 0.25 = 512 * 0.25 = 128 (not 512)
- Adjusted rope_n_dims_full accordingly (see the sketch after this list)
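For reference, a minimal sketch of the R8 fix, assuming head_dim = 512 and
partial_rotary_factor = 0.25 as above. The pairing scheme (interleaved here)
and the theta base are assumptions, and apply_partial_rope is an illustrative
name, not the real function:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Rotate only the first n_rot dims of one head; leave the rest untouched.
void apply_partial_rope(std::vector<float> & head, int pos,
                        float partial_rotary_factor, float theta_base = 10000.0f) {
    const int head_dim = (int) head.size();
    const int n_rot    = (int) (head_dim * partial_rotary_factor); // 512 * 0.25 = 128

    for (int i = 0; i < n_rot / 2; ++i) {
        const float freq  = std::pow(theta_base, -2.0f * (float) i / (float) n_rot);
        const float angle = (float) pos * freq;
        const float c = std::cos(angle), s = std::sin(angle);
        const float x0 = head[2*i], x1 = head[2*i + 1];
        head[2*i]     = x0 * c - x1 * s;
        head[2*i + 1] = x0 * s + x1 * c;
    }
    // Dims [n_rot, head_dim) intentionally pass through unrotated.
}

int main() {
    std::vector<float> head(512, 1.0f);
    apply_partial_rope(head, /*pos=*/3, /*partial_rotary_factor=*/0.25f);
    std::printf("rotated 128 of %d dims\n", (int) head.size());
}
```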
R10: layer_output_scale is a simple multiply (llama.cpp reference)
- Previous: x = residual + los * (x - residual), scaling only the post-residual delta
- Correct per llama.cpp (gemma4-iswa.cpp): x *= los, a plain elementwise multiply (sketch below)
- Added TQ_MAX_LAYERS debug env var for per-layer diagnosis
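The R10 change in sketch form, where los is the per-layer layer_output_scale
scalar; both function names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Previous (wrong here): keep the residual fixed and scale only the delta.
void scale_delta_only(std::vector<float> & x, const std::vector<float> & residual, float los) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = residual[i] + los * (x[i] - residual[i]);
    }
}

// Correct per the llama.cpp reference: scale the whole activation,
// residual included, elementwise.
void scale_all(std::vector<float> & x, float los) {
    for (float & v : x) {
        v *= los;
    }
}
```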
Still produces garbage. Remaining candidates:
- Residual connection order (pre-norm vs post-norm flow)
- PLE gating uses gelu, not silu (llama.cpp confirms LLM_FFN_GELU); see the gating sketch after this list
- output_gguf Q5_0 matmul accuracy for 262K vocab
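On the PLE gating candidate, a sketch of the activation difference, assuming
the standard gated-FFN form and a tanh-approximate GELU (ple_gate_elem is an
illustrative name, not the real function):

```cpp
#include <cmath>

// GELU, tanh approximation (exact variant used by the model is an assumption).
float gelu(float x) {
    return 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
}

// SiLU, for contrast.
float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

// One hidden element of a gated FFN: out = act(gate) * up, projected down later.
// Using silu where the reference uses gelu degrades output without crashing,
// which fits the "still garbage" symptom.
float ple_gate_elem(float gate, float up) {
    return gelu(gate) * up;   // LLM_FFN_GELU per llama.cpp, not silu(gate) * up
}
```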
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>