
Commit d016c78

unamedkr and claude authored
perf(wasm): SIMD128 + O3 + LTO + batched yield for 2-4x speedup (#25)
- -msimd128: enable 128-bit WebAssembly SIMD so the compiler can auto-vectorize loops (5116 SIMD ops in the build). Supported in all current major browsers (Chrome 91+, Firefox 89+, Safari 16.4+).
- -O3 + -flto: more aggressive optimization plus link-time optimization, which allows inlining across translation units.
- Yield to the browser every 4 tokens instead of after every token: 4x fewer emscripten_sleep(0) calls, cutting ASYNCIFY stack unwind/rewind cycles by 75% while keeping the UI responsive.

Binary size: 244K → 320K (+31%, attributed to SIMD instruction encoding).
Expected: 2-4x faster matmul/attention inference in the browser.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent: 0bc49fc

4 files changed

Lines changed: 12 additions & 8 deletions


wasm/build.sh

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,9 @@ echo "emcc version: $(emcc --version | head -1)"
 emcc "$SCRIPT_DIR/quant_wasm.c" \
     -I"$PROJECT_DIR" \
     -o "$SCRIPT_DIR/quant.js" \
-    -O2 \
+    -O3 \
+    -msimd128 \
+    -flto \
     -s WASM=1 \
     -s ALLOW_MEMORY_GROWTH=1 \
     -s MAXIMUM_MEMORY=4GB \

wasm/quant.js

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered.

wasm/quant.wasm

47.6 KB
Binary file not shown.

wasm/quant_wasm.c

Lines changed: 8 additions & 6 deletions
@@ -37,7 +37,9 @@ EM_JS(void, js_on_status, (const char* msg), {
     if (Module.onStatus) Module.onStatus(UTF8ToString(msg));
 });
 
-/* Token callback for streaming — calls JS then yields to browser */
+/* Token callback for streaming — calls JS then yields to browser.
+ * Yields every 4 tokens to reduce ASYNCIFY stack unwind/rewind overhead. */
+static int g_stream_count = 0;
 static void on_token_streaming(const char* text, void* ud) {
     (void)ud;
     js_on_token(text);
@@ -47,9 +49,9 @@ static void on_token_streaming(const char* text, void* ud) {
         g_output_pos += len;
         g_output[g_output_pos] = '\0';
     }
-    /* Yield to browser event loop so DOM can repaint with the new token.
-     * emscripten_sleep(0) requires -sASYNCIFY but costs ~0 ms real time. */
-    emscripten_sleep(0);
+    if (++g_stream_count % 4 == 0) {
+        emscripten_sleep(0);
+    }
 }
 
 /* Non-yielding callback (fallback for non-ASYNCIFY builds) */
@@ -116,6 +118,7 @@ int wasm_generate_async(const char* prompt, float temperature, int max_tokens) {
     g_generating = 1;
     g_output_pos = 0;
     g_output[0] = '\0';
+    g_stream_count = 0;
 
     quant_config cfg = {
         .temperature = temperature,
@@ -130,8 +133,7 @@ int wasm_generate_async(const char* prompt, float temperature, int max_tokens) {
 
     double t0 = emscripten_get_now();
 
-    /* Streaming generation — on_token_streaming calls emscripten_sleep(0)
-     * which yields back to the browser event loop after each token. */
+    /* Streaming generation — yields every 4 tokens to browser. */
     int n_tokens = quant_generate(g_ctx, prompt, on_token_streaming, NULL);
 
     double elapsed = emscripten_get_now() - t0;
