Commit 65bbf5f
Gemma 4 full architecture support: E2B (2B dense) + 26B-A4B (MoE)
Major features implemented:
- Hybrid sliding/full attention with per-layer head_dim (256/512)
- Per-Layer Embedding (PLE) injection (critical for E2B)
- Variable FFN dim per layer (6144/12288 for E2B)
- MoE fused gate_up_exps loading (128 experts, Gemma 4)
- K=V attention for full layers (26B-A4B)
- Layer output scaling (layer_scalar)
- Final logit soft-capping (cap = 30.0)
- Router input scaling (ffn_gate_inp.scale)
- Per-expert output scaling (ffn_down_exps.scale)
- Gemma 4 norm auto-detection from the weights (weights applied directly, no +1 offset)
- Gemma 4 BOS/EOS handling (no BOS token prepended, EOS token ID 106)
- Attention scale=1.0 for dense Gemma 4 with QK-norm
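The hybrid attention, per-layer head_dim, variable FFN dim and layer_scalar items all amount to per-layer configuration. The sketch below gathers them into one spec per layer and shows a causal mask that additionally limits sliding layers to a window. It assumes, as a guess, that sliding layers use head_dim 256 and full layers 512; the `LayerSpec`, `build_layer_specs`, `can_attend` and `apply_layer_scalar` names and the window semantics are hypothetical, not the engine's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerSpec:
    is_sliding: bool           # True: sliding-window (local) attention; False: full attention
    head_dim: int              # attention head dimension for this layer
    ffn_dim: int               # FFN hidden size for this layer (varies per layer on E2B)
    layer_scalar: float = 1.0  # multiplier applied to this layer's output


def build_layer_specs(sliding: List[bool], head_dim_sliding: int, head_dim_full: int,
                      ffn_dims: List[int], layer_scalars: List[float]) -> List[LayerSpec]:
    """Combine per-layer metadata arrays into one spec per layer (hypothetical helper)."""
    if not (len(sliding) == len(ffn_dims) == len(layer_scalars)):
        raise ValueError("per-layer arrays must all have the same length")
    return [
        LayerSpec(is_sliding=s,
                  head_dim=head_dim_sliding if s else head_dim_full,  # assumed mapping
                  ffn_dim=f,
                  layer_scalar=c)
        for s, f, c in zip(sliding, ffn_dims, layer_scalars)
    ]


def can_attend(q_pos: int, k_pos: int, spec: LayerSpec, window: int) -> bool:
    """Causal mask; sliding layers also limit keys to the last `window` positions."""
    if k_pos > q_pos:
        return False
    if spec.is_sliding:
        return q_pos - k_pos < window
    return True


def apply_layer_scalar(hidden: List[float], spec: LayerSpec) -> List[float]:
    """Assumption: layer_scalar multiplies the layer's output hidden state elementwise."""
    return [h * spec.layer_scalar for h in hidden]
```

For example, `build_layer_specs([True, True, False], 256, 512, [6144, 6144, 12288], [1.0, 1.0, 0.5])` yields two sliding layers and one full layer; the real layer pattern and dimensions would come from the model's metadata.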
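The QK-norm, attention-scale and K=V items can be illustrated with single-head attention for one query. In this minimal sketch Q and K are RMS-normalized, the score scale is a parameter (1.0 in place of the usual 1/sqrt(head_dim)), and when no value projection is passed the keys double as values. Taking the values from the raw key projection (before K's norm) is an assumption, as are all function names here.

```python
import math
from typing import List, Optional

Vec = List[float]


def rms_norm(x: Vec, weight: Vec, eps: float = 1e-6) -> Vec:
    """RMSNorm: divide by the root-mean-square, then apply the per-channel weight."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def softmax_weighted_sum(q: Vec, keys: List[Vec], values: List[Vec], scale: float) -> Vec:
    """Scaled dot-product attention for a single query and a single head."""
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[d] for e, v in zip(exps, values)) / total
            for d in range(len(values[0]))]


def attention_with_qk_norm(q_raw: Vec, k_raw: List[Vec], v_raw: Optional[List[Vec]],
                           q_norm_w: Vec, k_norm_w: Vec, scale: float = 1.0) -> Vec:
    """QK-normed attention; if v_raw is None, keys double as values (K=V)."""
    q = rms_norm(q_raw, q_norm_w)
    keys = [rms_norm(k, k_norm_w) for k in k_raw]
    # Assumption: with K=V, values come from the raw key projection (before K's norm).
    values = k_raw if v_raw is None else v_raw
    return softmax_weighted_sum(q, keys, values, scale)
```

Because QK-norm already bounds the magnitude of Q and K, a fixed scale of 1.0 keeps the scores in a reasonable range without the usual 1/sqrt(head_dim) factor.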
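The MoE items (fused gate_up_exps, router input scaling, per-expert output scaling) combine into one forward pass. The sketch below is a guess at how they fit together under stated assumptions: the fused tensor stores the gate rows first and the up rows second, `ffn_gate_inp.scale` multiplies the router input elementwise, routing is a softmax over the top-k logits, `ffn_down_exps.scale` is one scalar per expert, and the activation is tanh-approximated GELU. None of these details is confirmed by the commit; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]  # row-major: Mat[out_index][in_index]


def matvec(m: Mat, x: Sequence[float]) -> Vec:
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def gelu_tanh(x: float) -> float:
    """GELU, tanh approximation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def split_gate_up(fused: Mat) -> Tuple[Mat, Mat]:
    """Assumed layout: first half of the rows is the gate projection, second half is up."""
    half = len(fused) // 2
    return fused[:half], fused[half:]


def moe_forward(x: Vec, router_w: Mat, router_in_scale: Vec, gate_up_exps: List[Mat],
                down_exps: List[Mat], down_scale: Vec, top_k: int) -> Vec:
    # Router input scaling, assumed elementwise and applied to the router input only.
    logits = matvec(router_w, [v * s for v, s in zip(x, router_in_scale)])
    chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
    # Softmax over the selected experts' logits (assumed renormalization scheme).
    m = max(logits[e] for e in chosen)
    gates = {e: math.exp(logits[e] - m) for e in chosen}
    total = sum(gates.values())
    out = [0.0] * len(x)
    for e in chosen:
        gate_w, up_w = split_gate_up(gate_up_exps[e])  # one fused tensor per expert
        hidden = [gelu_tanh(g) * u for g, u in zip(matvec(gate_w, x), matvec(up_w, x))]
        y = matvec(down_exps[e], hidden)
        # Per-expert output scale folded into the routing weight.
        coeff = gates[e] / total * down_scale[e]
        out = [o + coeff * v for o, v in zip(out, y)]
    return out
```

Loading gate and up as one fused tensor means a single weight read per expert; the split is just a view over the two halves.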
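"Weight-based" norm detection could work along the lines of the heuristic below: norm weights meant to be applied as `1 + w` cluster near 0, while weights applied directly cluster near 1. The mean-based test, the 0.5 threshold and the function names are assumptions for illustration; the commit does not say how the engine decides.

```python
from typing import List


def norm_weights_are_offsets(weight: List[float], threshold: float = 0.5) -> bool:
    """Hypothetical heuristic: offset-style weights (used as 1 + w) have a mean near 0."""
    mean = sum(weight) / len(weight)
    return abs(mean) < threshold


def effective_norm_weight(weight: List[float]) -> List[float]:
    """Return the multiplier actually applied after RMS normalization."""
    if norm_weights_are_offsets(weight):
        return [1.0 + w for w in weight]
    return list(weight)  # direct multipliers (the Gemma 4 case per the commit): no +1
```

Detecting the convention from the tensor itself avoids keeping a per-model flag for whether the +1 was already folded into the stored weights.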
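Final logit soft-capping squashes each logit with `cap * tanh(x / cap)`. Small logits pass through almost unchanged, and large ones approach but never exceed the cap. A minimal sketch using the 30.0 value from the commit message; the function name is illustrative.

```python
import math
from typing import List

FINAL_LOGIT_SOFTCAP = 30.0  # value from the commit message


def soft_cap(logits: List[float], cap: float = FINAL_LOGIT_SOFTCAP) -> List[float]:
    """Smoothly squash logits into [-cap, cap] via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]
```

With cap = 30.0, a raw logit of 30.0 maps to 30 * tanh(1), roughly 22.85.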
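The BOS/EOS item affects both the prompt and the decode loop: no BOS token is prepended, and generation stops when token ID 106 is produced. In the sketch below, `next_token` stands in for the model, and `prepare_prompt`, `generate` and the BOS ID used in the example are hypothetical.

```python
from typing import Callable, List, Optional

GEMMA4_EOS_ID = 106  # EOS token ID stated in the commit message


def prepare_prompt(token_ids: List[int], bos_id: Optional[int], add_bos: bool) -> List[int]:
    """Prepend BOS only when the model config asks for it (add_bos=False for Gemma 4)."""
    if add_bos and bos_id is not None:
        return [bos_id] + token_ids
    return list(token_ids)


def generate(prompt_ids: List[int], next_token: Callable[[List[int]], int],
             max_new_tokens: int, eos_id: int = GEMMA4_EOS_ID) -> List[int]:
    """Decode loop that stops at EOS (EOS itself is not returned) or after max_new_tokens."""
    context = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(context)
        if tok == eos_id:
            break
        generated.append(tok)
        context.append(tok)
    return generated
```

A scripted stand-in model that emits 5, 7 and then 106 produces `[5, 7]`.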
Verified: Qwen 0.8B at 26 tok/s (no regression), E2B at 7.2 tok/s, 26B-A4B at 0.9 tok/s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 199f066
6 files changed, 433 additions, 168 deletions
Directories with changes:
- include/turboquant
- src/engine
- tools
Diff hunks shown on the page (file names and line contents not preserved):
- First file: @@ -99,8 +99,34 @@ (27 additions, 1 deletion)
- Second file: @@ -58,6 +58,7 @@, @@ -84,6 +85,13 @@, @@ -206,6 +214,13 @@, @@ -323,12 +338,15 @@, @@ -342,6 +360,7 @@ (20 additions, 1 deletion)
- Third file: @@ -166,6 +166,7 @@, @@ -201,9 +202,12 @@ (7 additions, 3 deletions)