Commit 103e50f
bench: prefill script + docs updated with batched speedup numbers
- scripts/test_prefill.sh now runs both the baseline and the -k fp32
  batched path, so the regression guard catches any future degradation
  of the batched path.
- bench/results doc now includes the measured prefill speedups with the
  batched path (Llama 1B 6.1×, 3B 2.4×) and attributes the remaining
  5× gap vs llama.cpp mainly to dequant-to-FP32 in the batched code path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent bc8614d commit 103e50f
2 files changed
Lines changed: 39 additions & 7 deletions
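The regression guard described in the commit message compares the batched prefill path against the baseline. The check itself lives in scripts/test_prefill.sh, whose contents are not shown here, so the following is only a minimal sketch of that kind of comparison. The function names, the tokens-per-second inputs, and the `min_speedup` threshold are all assumptions made for illustration, not the script's actual logic.

```python
def prefill_speedup(baseline_tok_per_s: float, batched_tok_per_s: float) -> float:
    """Return batched prefill throughput divided by baseline throughput."""
    if baseline_tok_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return batched_tok_per_s / baseline_tok_per_s


def check_batched_regression(
    baseline_tok_per_s: float,
    batched_tok_per_s: float,
    min_speedup: float = 1.0,  # hypothetical threshold, not taken from the script
) -> tuple[bool, float]:
    """Return (ok, speedup); ok is False when the batched path is slower
    than min_speedup times the baseline."""
    speedup = prefill_speedup(baseline_tok_per_s, batched_tok_per_s)
    return speedup >= min_speedup, speedup


if __name__ == "__main__":
    # Placeholder throughputs for illustration only; these are not measurements.
    ok, speedup = check_batched_regression(100.0, 250.0, min_speedup=2.0)
    print(f"speedup={speedup:.2f}x ok={ok}")
```

A guard of this shape fails as soon as the batched path drops below the chosen multiple of the baseline. That is the behaviour the first bullet of the commit message describes, whatever threshold the real script uses.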