Commit d0df1f5
perf: default thread count = P-core count on Apple Silicon
M1 Pro is 8P+2E. Running P and E cores at the same priority turns the
two slower E-core threads into stragglers: the matmul barrier waits on
them while the 8 P-core threads sit idle. In practice, 8 P-only threads
beat 10 mixed. Detect the P-core count via
sysctlbyname("hw.perflevel0.physicalcpu").
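The straggler argument above can be made concrete with a rough model. This is a sketch under stated assumptions, not a measurement from this commit: each matmul's work W is split evenly across N threads, all threads meet at a barrier, a P core processes work at rate r, and an E core at rate s·r with 0 < s < 1. The symbols W, r, and s are introduced here for illustration only, and memory-bandwidth effects are ignored.

```latex
% Time per matmul with 8 P-core threads (all equally fast):
T_{8P} = \frac{W/8}{r} = \frac{W}{8r}

% Time per matmul with 10 mixed threads; the barrier waits for the slowest (an E core):
T_{10} = \max\!\left(\frac{W/10}{r},\ \frac{W/10}{s\,r}\right) = \frac{W}{10\,s\,r}

% The mixed configuration is slower whenever:
T_{10} > T_{8P} \iff \frac{1}{10\,s} > \frac{1}{8} \iff s < 0.8
```

Under this model, adding the two E cores only helps if each E core sustains at least 80% of a P core's throughput on the kernel; otherwise the 8 P threads finish early and idle at the barrier.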
Phi-3.5 Q4_K_M: 6.2 → 6.4 tok/s (+3%, total session 3.2 → 6.4 = 2.0×).
Llama 3.2 3B Q8_0: 18.6 → 19.3 tok/s (+4%).
11/11 STRICT+COHERENT+Metal-ON pass.
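The detection named in the message could look roughly like the following. This is a minimal sketch, not the code from this commit: the helper names perf_core_count and default_thread_count are hypothetical, and the fallback to sysconf(_SC_NPROCESSORS_ONLN) on other platforms (or when the sysctl key is absent, as on Intel Macs) is an assumption about what a reasonable default would be.

```c
#include <stddef.h>
#include <unistd.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: number of performance cores, or 0 if unknown. */
int perf_core_count(void) {
#ifdef __APPLE__
    int n = 0;
    size_t len = sizeof(n);
    /* hw.perflevel0.* describes the highest-performance core cluster.
       The key is missing on machines without perf levels, in which
       case sysctlbyname returns -1 and we fall through. */
    if (sysctlbyname("hw.perflevel0.physicalcpu", &n, &len, NULL, 0) == 0
        && n > 0) {
        return n;
    }
#endif
    return 0;
}

/* Hypothetical helper: default worker thread count.
   Prefers the P-core count; otherwise uses all online CPUs (assumed
   fallback), and never returns less than 1. */
int default_thread_count(void) {
    int n = perf_core_count();
    if (n > 0) {
        return n;
    }
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? (int)online : 1;
}
```

A caller would use default_thread_count() only when the user has not set a thread count explicitly, so an explicit setting can still opt in to using the E cores.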
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent aadd059 commit d0df1f5
1 file changed
Lines changed: 17 additions & 3 deletions
Hunk 1: original lines 37-42, new lines 37-45 (3 additions; line content not preserved)
Hunk 2: original lines 195-203, new lines 198-217 (14 additions, 3 deletions; line content not preserved)