Commit 7dbd4ad
Metal GPU matmul dispatch connected to forward pass
WBS v1.3 Phase 1 progress:
- tq_matmul_gguf() now dispatches to Metal GPU for supported types
  (IQ2_XXS, IQ2_S, Q8_0, Q4_K) when Metal is available
- Batch mode: GPU dispatch when transformer wraps ops in batch
- Immediate mode: GPU for out_dim >= 512, CPU for smaller
- CPU fallback: transparent when Metal reports an unsupported type
- Both CPU and Metal builds compile clean, 34/34 tests pass
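
The dispatch rules in the list above can be sketched as a small decision function. This is only an illustration: the diff body is not visible here, so `weight_type`, `backend`, `gpu_supports`, `choose_backend`, and `GPU_MIN_OUT_DIM` are hypothetical names, and the sketch assumes batch mode sends every supported type to the GPU regardless of `out_dim`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; the real identifiers are not shown on this page. */
typedef enum {
    WT_IQ2_XXS, WT_IQ2_S, WT_Q8_0, WT_Q4_K, /* types listed as GPU-supported */
    WT_Q4_CONVERTED,                        /* Q4 converted at load time (internal format) */
    WT_OTHER
} weight_type;

typedef enum { BACKEND_CPU, BACKEND_GPU } backend;

/* Immediate-mode threshold taken from the commit message (out_dim >= 512). */
enum { GPU_MIN_OUT_DIM = 512 };

/* True only for the weight types the commit message lists as supported. */
static bool gpu_supports(weight_type t) {
    switch (t) {
    case WT_IQ2_XXS:
    case WT_IQ2_S:
    case WT_Q8_0:
    case WT_Q4_K:
        return true;
    default:
        return false;
    }
}

/* Pick a backend for one matmul. Falls back to the CPU when Metal is
 * unavailable or the type is unsupported. Assumption: in batch mode the
 * out_dim threshold does not apply. */
static backend choose_backend(bool metal_available, bool in_batch,
                              weight_type t, size_t out_dim) {
    if (!metal_available || !gpu_supports(t)) {
        return BACKEND_CPU;
    }
    if (in_batch) {
        return BACKEND_GPU;
    }
    return out_dim >= GPU_MIN_OUT_DIM ? BACKEND_GPU : BACKEND_CPU;
}
```

Under these assumptions, `choose_backend(true, false, WT_Q8_0, 256)` selects the CPU, while the same call with `in_batch` set to `true` selects the GPU.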
Current limitation: Q4 weights converted at load time use an internal
format that doesn't match the GGUF Metal shaders, so they fall back to
the CPU. Native GGUF weights (IQ2, Q8_0, Q4_K_M without conversion)
will trigger the GPU path.
Next: extend batch mode to cover the entire forward pass per layer,
reducing GPU dispatches from ~30 per token to 1.
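
The planned batching can be pictured as recording operations and submitting them together, so the fixed per-submission cost is paid once. The following is a minimal sketch, not the project's implementation: `op_batch`, `batch_begin`, `batch_add_matmul`, `batch_commit`, and `BATCH_CAPACITY` are hypothetical, and the counters stand in for real Metal command encoding.

```c
#include <stddef.h>

/* Hypothetical batch recorder; counters stand in for real GPU work. */
enum { BATCH_CAPACITY = 64 };

typedef struct {
    size_t queued;      /* ops recorded since the batch began */
    size_t submissions; /* GPU submissions made so far */
} op_batch;

/* Start a new batch; ops added afterwards are only recorded. */
static void batch_begin(op_batch *b) {
    b->queued = 0;
}

/* Record one matmul. Returns 0 on success, -1 if the batch is full. */
static int batch_add_matmul(op_batch *b) {
    if (b->queued >= BATCH_CAPACITY) {
        return -1;
    }
    b->queued++;
    return 0;
}

/* Submit everything recorded so far as a single dispatch. */
static void batch_commit(op_batch *b) {
    if (b->queued > 0) {
        b->submissions++;
    }
    b->queued = 0;
}
```

In this sketch, recording 30 matmuls between `batch_begin` and `batch_commit` produces one submission instead of 30.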
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0f6f78c commit 7dbd4ad
1 file changed
Lines changed: 32 additions & 3 deletions
| Original file line numbers | Diff line numbers | Change |
|---|---|---|
| 1691-1693 | 1691-1693 | unchanged context |
| 1694-1696 | | 3 lines removed |
| | 1694-1725 | 32 lines added |
| 1697-1699 | 1726-1728 | unchanged context |