Commit f969ee5
perf: switch to Phi-3.5-Q8_0 — 2x faster than Q4_K_M on NEON
Q8_0 (3.8GB): 3.0 tok/s — simple int8 dequant, NEON-friendly
Q4_K_M (2.2GB): 1.5 tok/s — complex super-block dequant overhead
Both produce output of identical quality. Q8_0 is the better choice
on Apple Silicon NEON, where dequant cost, not memory bandwidth, is the bottleneck.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent d06d0bc commit f969ee5
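To illustrate why the message calls Q8_0 dequantization "simple", here is a minimal sketch. It assumes the ggml Q8_0 layout as commonly described: blocks of 32 int8 values sharing one scale, stored as fp16 on disk and kept as a plain Python float here. The function names are hypothetical and this is not the code changed in this commit. Q4_K_M, by contrast, packs 4-bit values into 256-weight super-blocks with separately quantized sub-block scales and minimums, so it needs more unpacking per weight. The last helper multiplies model size by tokens per second to estimate how fast weights are being read, assuming every weight is read once per generated token.

```python
QK8_0 = 32  # weights per block (assumed ggml Q8_0 layout)


def quantize_q8_0(block):
    """Quantize 32 floats into (scale, list of 32 int8 values)."""
    assert len(block) == QK8_0
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, [0] * QK8_0
    d = amax / 127.0  # one shared scale per block
    qs = [max(-127, min(127, round(x / d))) for x in block]
    return d, qs


def dequantize_q8_0(d, qs):
    """Dequantize: one multiply per weight, no bit unpacking."""
    return [d * q for q in qs]


def implied_read_rate_gb_s(model_size_gb, tokens_per_s):
    """Weight bytes read per second, assuming one full pass per token."""
    return model_size_gb * tokens_per_s


if __name__ == "__main__":
    block = [float(i - 16) for i in range(QK8_0)]
    d, qs = quantize_q8_0(block)
    restored = dequantize_q8_0(d, qs)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))
    # Figures taken from the commit message above.
    print("Q8_0 GB/s:", implied_read_rate_gb_s(3.8, 3.0))
    print("Q4_K_M GB/s:", implied_read_rate_gb_s(2.2, 1.5))
```

Under that one-pass-per-token assumption, the reported figures imply roughly 11.4 GB/s of weight reads for Q8_0 versus roughly 3.3 GB/s for Q4_K_M. The smaller model moves fewer bytes per second, which is consistent with the message's claim that dequantization work rather than memory bandwidth limits throughput here.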
2 files changed
Lines changed: 6 additions & 3 deletions
| File | Hunk | Change |
|---|---|---|
| 1 of 2 | @@ -28,6 +28,7 @@ | 1 line added (new line 31) |
| 2 of 2 | @@ -70,9 +70,11 @@ | 3 lines removed (old lines 73-75), 5 lines added (new lines 73-77) |