Commit 5bf3e8e
docs: honest speed comparison with llama.cpp
Add a benchmark table to the 'How It Compares' section that reports
measured speeds vs llama.cpp on 4 representative models, and
honestly names where we match (Q8_0 on 3B+) and where we lag
(Q4_K_M, tiny models).
Rationale: the README previously compared only KV-compression
quality, implying we were competitive on inference speed too.
In reality:
- We match llama.cpp on Llama 3.2 3B Q8_0 (105%)
- We're at 52% on Phi-3.5 Q8_0
- We're at 18% on Phi-3.5 Q4_K_M (Q3_K still scalar)
- We're at 13% on Llama 3.2 1B (llama.cpp is extremely optimized
for tiny models)
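The percentages above are relative decode throughput: our tokens/sec divided by llama.cpp's on the same model and quant. A minimal sketch of that calculation, using hypothetical tok/s figures chosen only to reproduce the ratios stated above (the raw numbers are placeholders, not real measurements):

```python
# Relative decode throughput: ours / llama.cpp, as a percentage.
# tok/s values below are hypothetical placeholders; only the
# resulting percentages correspond to the measured ratios.
measurements = {
    # model + quant: (ours_tok_s, llamacpp_tok_s)
    "Llama 3.2 3B Q8_0": (42.0, 40.0),   # -> 105%
    "Phi-3.5 Q8_0":      (13.0, 25.0),   # -> 52%
}

for name, (ours, theirs) in measurements.items():
    pct = 100.0 * ours / theirs
    print(f"{name}: {pct:.0f}% of llama.cpp")
```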
Also corrects '16K LOC' to the accurate '17.6K LOC'.
Being honest about limitations strengthens the "read it end-to-end"
value proposition rather than claiming parity on every axis.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Parent: db0af26
1 file changed: 19 additions, 2 deletions
(Diff body not captured in the page extract: single-line edits at file lines 11 and 19, plus 17 added lines at 404–420 for the new benchmark section.)