Commit 72e815b
fix: Phi-3 Q8_0 default + unified server in CLI + CMake (#80)
## Phi-3.5 registry → Q8_0 (2x faster)
Q8_0 is 2x faster than Q4_K_M on Apple Silicon NEON (3.0 vs 1.5 tok/s
measured on M3). Q4_K_M's complex super-block dequant dominates compute
at batch-1, while Q8_0's simple int8 dequant is NEON-friendly. Both
produce identical quality output.
- Registry: `Phi-3.5-mini-instruct-Q4_K_M.gguf` (2.2 GB)
→ `Phi-3.5-mini-instruct-Q8_0.gguf` (3.8 GB)
- Module docstring size updated (2.4 GB → 3.8 GB)
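
To illustrate why the Q8_0 dequant is described as simple, here is a minimal sketch of a single block round trip. It assumes the standard GGML Q8_0 layout (32 int8 weights sharing one fp16 scale, 34 bytes per block). The function names are hypothetical, and this is plain Python for illustration, not the project's NEON kernel.

```python
import struct

QK8_0 = 32  # weights per Q8_0 block (GGML layout, assumed here)


def quantize_q8_0(values):
    """Pack 32 floats into one block: fp16 scale followed by 32 int8 weights."""
    assert len(values) == QK8_0
    amax = max(abs(v) for v in values)
    d = amax / 127.0  # per-block scale
    qs = [round(v / d) if d else 0 for v in values]
    return struct.pack("<e32b", d, *qs)


def dequantize_q8_0(block):
    """Recover floats: one multiply per weight, no nested sub-block scales."""
    d, *qs = struct.unpack("<e32b", block)
    return [d * q for q in qs]


if __name__ == "__main__":
    weights = [i / 10 for i in range(-16, 16)]
    block = quantize_q8_0(weights)
    restored = dequantize_q8_0(block)
    print(len(block), max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight is recovered with a single scale multiply, which is the property the commit message credits for the NEON-friendly behaviour. Q4_K_M, by contrast, nests sub-block scales and minimums inside a larger super-block.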
## CLI `serve` → prefers `quant-server-unified`
`quantcpp serve` now searches for `quant-server-unified` first, then
falls back to the legacy `quant-server`. The unified server builds
directly on quant.h (single-header amalgamation), which fixes #77
(SmolLM2-1.7B regression from libturboquant divergence).
Search order: PATH → ./build/ → ./build_metal/ → ./build_cpu/
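
The lookup described above could be sketched roughly as follows. This is not the code from the commit: `find_server_binary` and its `root` parameter are hypothetical, and the sketch assumes the unified name is tried across every location before the legacy name is considered.

```python
import os
import shutil

# Preferred binary first, legacy fallback second (names from the commit message).
CANDIDATE_NAMES = ("quant-server-unified", "quant-server")
# Build directories checked after PATH, in the order the commit message lists them.
BUILD_DIRS = ("build", "build_metal", "build_cpu")


def find_server_binary(root="."):
    """Return the path of the first usable server binary, or None if none exists."""
    for name in CANDIDATE_NAMES:
        on_path = shutil.which(name)  # PATH is consulted first
        if on_path:
            return on_path
        for build_dir in BUILD_DIRS:
            candidate = os.path.join(root, build_dir, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


if __name__ == "__main__":
    print(find_server_binary() or "no server binary found")
```

With this ordering, a unified binary in `./build_cpu/` wins over a legacy binary in `./build/`. An alternative reading of the commit message would check both names per directory, and the sketch does not settle which one the CLI uses.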
## CMake `quant-server-unified` target
Added `quant-server-unified` build target under `TQ_BUILD_SERVER=ON`.
Compiles `tools/quant_server_unified.c` directly against quant.h.
## Verified
- ctest → 35/35 passed
- `quant-server-unified` builds (360 KB binary)
- Python registry confirms Q8_0 filename
- CLI `quantcpp serve` prefers unified binary
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f969ee5 commit 72e815b
3 files changed
Lines changed: 48 additions & 20 deletions