Commit b8a27d2
feat(gemma4): E4B support + comprehensive numeric analysis
E4B test (42 layers, dim=2560, heads=8/2):
tok100 = -4.99 (vs E2B: -16.90) — E4B less sensitive to quant noise
layer_output_scale: 0.061 (vs E2B: 0.018) — 3x larger = more robust
llama.cpp E4B: "Four" (correct) ✅
Numeric comparison (E2B Q8_0 vs MLX BF16):
Embedding: diff < 0.012 ✅
Attn norm: diff < 0.1 ✅
Q projection: diff < 0.1 ✅
K projection: diff < 0.25 ✅
Layer 0 output: diff ~0.1 per element (compounds over 35 layers)
Final logits: tok100 = -16.90 (ours) vs 22.88 (MLX) ← 40 logit gap
Root cause: implementation-level numeric precision difference in
matmul accumulation. Q8_0 dequant is bit-identical, but FP32 matmul
accumulation order differs between our code (scalar loop) and
llama.cpp (SIMD fused). With layer_output_scale ~0.02, small
matmul rounding differences compound exponentially over 35 layers.
NOT a logic bug. The forward pass architecture is correct.
Fix requires either:
1. Higher precision weights (F16/BF16)
2. SIMD-fused matmul matching llama.cpp's accumulation order
3. Compensation for the layer_output_scale sensitivity
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent bbb9159 commit b8a27d2
1 file changed
Lines changed: 32 additions & 14 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11633 | 11633 | | |
11634 | 11634 | | |
11635 | 11635 | | |
| 11636 | + | |
| 11637 | + | |
| 11638 | + | |
| 11639 | + | |
| 11640 | + | |
| 11641 | + | |
| 11642 | + | |
| 11643 | + | |
| 11644 | + | |
11636 | 11645 | | |
11637 | 11646 | | |
11638 | 11647 | | |
| |||
11852 | 11861 | | |
11853 | 11862 | | |
11854 | 11863 | | |
11855 | | - | |
| 11864 | + | |
| 11865 | + | |
| 11866 | + | |
| 11867 | + | |
11856 | 11868 | | |
11857 | 11869 | | |
11858 | 11870 | | |
11859 | 11871 | | |
11860 | | - | |
11861 | | - | |
11862 | 11872 | | |
11863 | 11873 | | |
11864 | 11874 | | |
| |||
14348 | 14358 | | |
14349 | 14359 | | |
14350 | 14360 | | |
14351 | | - | |
| 14361 | + | |
| 14362 | + | |
| 14363 | + | |
| 14364 | + | |
| 14365 | + | |
14352 | 14366 | | |
| 14367 | + | |
| 14368 | + | |
14353 | 14369 | | |
14354 | 14370 | | |
14355 | | - | |
| 14371 | + | |
14356 | 14372 | | |
14357 | 14373 | | |
14358 | 14374 | | |
14359 | 14375 | | |
14360 | | - | |
14361 | | - | |
14362 | | - | |
14363 | | - | |
| 14376 | + | |
| 14377 | + | |
| 14378 | + | |
| 14379 | + | |
| 14380 | + | |
14364 | 14381 | | |
14365 | 14382 | | |
14366 | 14383 | | |
14367 | 14384 | | |
14368 | | - | |
| 14385 | + | |
14369 | 14386 | | |
14370 | 14387 | | |
14371 | 14388 | | |
14372 | 14389 | | |
14373 | | - | |
14374 | | - | |
14375 | | - | |
14376 | | - | |
| 14390 | + | |
| 14391 | + | |
| 14392 | + | |
| 14393 | + | |
| 14394 | + | |
14377 | 14395 | | |
14378 | 14396 | | |
14379 | 14397 | | |
| |||
0 commit comments