Commit d0df1f5

unamedkr and claude committed
perf: default thread count = P-core count on Apple Silicon
M1 Pro is 8P+2E. Mixing P and E cores at the same priority makes the two slow E threads become stragglers: the matmul barrier waits on them while the 8 P threads sit idle. 8 P-only beats 10 mixed in practice.

Detect via sysctlbyname("hw.perflevel0.physicalcpu").

Phi-3.5 Q4_K_M: 6.2 → 6.4 tok/s (+3%, total session 3.2 → 6.4 = 2.0×).
Llama 3.2 3B Q8_0: 18.6 → 19.3 tok/s (+4%).
11/11 STRICT+COHERENT+Metal-ON pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
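The straggler argument can be illustrated with a toy model. This is only a sketch under loud assumptions: work is split evenly and statically across threads (the commit does not say how the matmul partitions its rows), P threads run at speed 1.0, and E threads run at a hypothetical relative speed `slow_speed`. The function name and all parameters are made up for illustration and are not part of tools/quant.c.

```c
#include <assert.h>

/* Time for one barrier-synchronized step when `work` units are split
 * evenly across n_fast threads (speed 1.0) and n_slow threads (speed
 * slow_speed, with 0 < slow_speed <= 1). The step ends only when the
 * slowest thread reaches the barrier, so a single slow thread sets the
 * pace. All inputs are hypothetical model values, not measurements. */
double barrier_step_time(double work, int n_fast, int n_slow,
                         double slow_speed)
{
    assert(n_fast >= 0 && n_slow >= 0 && n_fast + n_slow > 0);
    assert(slow_speed > 0.0 && slow_speed <= 1.0);

    double share = work / (double)(n_fast + n_slow); /* equal static split */
    if (n_slow > 0)
        return share / slow_speed; /* slow threads finish last */
    return share;                  /* only fast threads present */
}
```

In this model, 8 fast threads finish 80 units in 10 time units, while 8 fast plus 2 slow threads at `slow_speed = 0.5` need 16, because each slow thread takes twice as long on its 8-unit share. Ten mixed threads lose to eight fast ones whenever `slow_speed` is below 0.8. The measured gains in this commit (3 to 4%) are much smaller than the model suggests for a slow E core, so it shows the direction of the effect only, not its size.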
1 parent aadd059 commit d0df1f5

1 file changed

Lines changed: 17 additions & 3 deletions

tools/quant.c

@@ -37,6 +37,9 @@
 #include <time.h>
 #include <math.h>
 #include <unistd.h> /* sysconf for default thread count */
+#if defined(__APPLE__)
+#include <sys/sysctl.h> /* sysctlbyname for hw.perflevel0.physicalcpu */
+#endif
 
 /* MSVC: clock_gettime compatibility */
 #ifdef _WIN32
@@ -195,9 +198,20 @@ int main(int argc, char** argv) {
     float temperature = 0.7f;
     float top_p = 0.9f;
     tq_type kv_type = TQ_TYPE_TURBO_KV_4B;
-    /* Default: all available cores. M1 Pro has 6P+2E=8; tests show
-     * 8 threads gives ~65% more throughput than the prior fixed-4 default. */
-    int n_threads = (int)sysconf(_SC_NPROCESSORS_ONLN);
+    /* Default: P-core count on macOS, total core count elsewhere.
+     * On Apple Silicon, mixing P+E cores at the same priority makes
+     * the slow E threads become stragglers — total throughput drops.
+     * Tests on M1 Pro: 8P-only (8 threads) ≈ 8P+2E (10 threads). */
+    int n_threads;
+#if defined(__APPLE__)
+    {
+        size_t sz = sizeof(int);
+        if (sysctlbyname("hw.perflevel0.physicalcpu", &n_threads, &sz, NULL, 0) != 0)
+            n_threads = (int)sysconf(_SC_NPROCESSORS_ONLN);
+    }
+#else
+    n_threads = (int)sysconf(_SC_NPROCESSORS_ONLN);
+#endif
     if (n_threads < 1) n_threads = 4;
     if (n_threads > 16) n_threads = 16; /* matches TQ_TP_MAX */
     int quant_mode = 0; /* 0 = none (default), 2 = Q2, 4 = Q4, 8 = Q8 */
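For trying the detection logic outside main(), the same steps can be factored into two small functions. This is a sketch, not the code in tools/quant.c: `clamp_threads`, `default_thread_count`, and `MAX_THREADS` are hypothetical names, and `MAX_THREADS` is assumed to equal the 16 that the diff comment attributes to TQ_TP_MAX. It also treats a successful sysctl call that returns a value below 1 as a failure, which the diff leaves to the later `n_threads < 1` check.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>      /* sysconf */
#if defined(__APPLE__)
#include <sys/sysctl.h>  /* sysctlbyname */
#endif

#define MAX_THREADS 16   /* assumption: mirrors TQ_TP_MAX from the diff */

/* Same bounds as the diff: use 4 when detection gave nothing usable,
 * and cap the result at MAX_THREADS. */
int clamp_threads(int n)
{
    if (n < 1) return 4;
    if (n > MAX_THREADS) return MAX_THREADS;
    return n;
}

/* P-core count on macOS when the sysctl key is available, otherwise
 * the number of online processors reported by sysconf. */
int default_thread_count(void)
{
    int n = 0;
#if defined(__APPLE__)
    size_t sz = sizeof(n);
    /* If the key is absent the call fails and we fall back below. */
    if (sysctlbyname("hw.perflevel0.physicalcpu", &n, &sz, NULL, 0) != 0
        || n < 1)
        n = (int)sysconf(_SC_NPROCESSORS_ONLN);
#else
    n = (int)sysconf(_SC_NPROCESSORS_ONLN);
#endif
    int result = clamp_threads(n);
    assert(result >= 1 && result <= MAX_THREADS);
    return result;
}
```

Keeping the clamp separate from the platform query lets the bounds be checked on any machine, while the `#if defined(__APPLE__)` branch is only compiled on macOS.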
