Commit d016c78
perf(wasm): SIMD128 + O3 + LTO + batched yield for 2-4x speedup (#25)
- -msimd128: 128-bit WASM SIMD auto-vectorization (5116 SIMD ops).
All modern browsers support it (Chrome 91+, Firefox 89+, Safari 16.4+).
- -O3 + -flto: aggressive optimization + link-time inlining.
- Yield every 4 tokens instead of every token: 75% less ASYNCIFY
stack unwind/rewind overhead while keeping UI responsive.
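The batched yield can be sketched as below. This is a minimal illustration, not the code from this commit: `generate_token`, `generate_batched`, and `YIELD_TO_BROWSER` are hypothetical names, and it assumes the yield is done with `emscripten_sleep(0)` under `-sASYNCIFY`. Outside Emscripten the yield compiles to a no-op, so the yield count can be checked natively.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>  /* emscripten_sleep() needs the program linked with -sASYNCIFY */
#define YIELD_TO_BROWSER() emscripten_sleep(0)
#else
#define YIELD_TO_BROWSER() ((void)0)  /* native build: nothing to yield to */
#endif

/* Hypothetical stand-in for one decoding step of the model. */
static int generate_token(size_t i)
{
    return (int)(i * 31u % 97u);
}

/*
 * Generate n_tokens into out, yielding to the event loop once every
 * `interval` tokens. Returns the number of yields performed, which is
 * also the number of ASYNCIFY unwind/rewind cycles paid for.
 */
size_t generate_batched(int *out, size_t n_tokens, size_t interval)
{
    size_t yields = 0;

    assert(out != NULL);
    assert(interval > 0);

    for (size_t i = 0; i < n_tokens; ++i) {
        out[i] = generate_token(i);
        if ((i + 1) % interval == 0) {
            YIELD_TO_BROWSER();
            ++yields;
        }
    }
    return yields;
}
```

With an interval of 4, 100 tokens cost 25 yields instead of 100, which is where the 75% figure comes from. The UI still gets a chance to repaint after every fourth token.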
Binary: 244K → 320K (+31%, SIMD instruction encoding).
Expected: 2-4x faster matmul/attention inference in browser.
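For context, a sketch of the loop shape that `-O3 -msimd128` can auto-vectorize. The kernels in this repository are not shown on this page, so `matmul_f32` is a hypothetical example, and the build line in the comment uses made-up file names. The inner loop runs over output columns with no cross-iteration dependency; a plain float dot-product reduction usually will not vectorize unless reassociation is permitted (for example with `-ffast-math`).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed build line (file names are illustrative):
 *   emcc -O3 -flto -msimd128 kernel.c -o kernel.js
 *
 * C = A * B, all row-major: A is m x k, B is k x n, C is m x n.
 * The i-p-j loop order keeps the innermost loop element-wise over a row
 * of C, so each c_row[j] accumulates in the same order as scalar code.
 */
void matmul_f32(const float *restrict a, const float *restrict b,
                float *restrict c, size_t m, size_t k, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < m; ++i) {
        float *c_row = c + i * n;

        for (size_t j = 0; j < n; ++j)
            c_row[j] = 0.0f;

        for (size_t p = 0; p < k; ++p) {
            const float a_ip = a[i * k + p];
            const float *b_row = b + p * n;

            /* Candidate for vectorization: 4 floats per 128-bit lane group. */
            for (size_t j = 0; j < n; ++j)
                c_row[j] += a_ip * b_row[j];
        }
    }
}
```

Whether the compiler actually vectorizes a given loop depends on the toolchain version, so any speedup should be measured rather than assumed.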
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0bc49fc commit d016c78
4 files changed
Lines changed: 12 additions & 8 deletions
Binary file not shown.