ux(wasm): clear prefill expectation message + verify ccall works (#35)

unamedkr · claude · web-flow · commit 09adb11f0f97 · 2026-04-10T20:36:16.000+09:00
The "hang" users see is actually the prefill phase: all prompt
tokens are processed through 28 layers in WASM. This takes 5-10s
for a 0.8B model and cannot be interrupted, because it runs
synchronously before the first ASYNCIFY yield point in the
generation callback.

Changes:
- Message now says "Processing prompt (may take a few seconds)..."
  to set expectations correctly
- Stats bar shows "processing prompt..."
- Confirmed ccall({async:true}) is the correct ASYNCIFY pattern
  and generation streaming works AFTER prefill completes
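
For reference, the pattern confirmed above can be sketched roughly as
follows. This is a minimal illustration, not the code in index.html:
the export name 'wasm_generate' and the wrapper generateAsync are
hypothetical placeholders. Only Module.ccall, its { async: true }
option, and Module.onToken come from Emscripten or this change.

```javascript
// Hypothetical wrapper; 'wasm_generate' is a placeholder export name,
// not necessarily what the real module exposes.
function generateAsync(Module, prompt, onToken) {
  // The C side is assumed to invoke Module.onToken once per generated token.
  Module.onToken = onToken;
  // In an ASYNCIFY build, ccall with { async: true } returns a Promise
  // that resolves when the (possibly suspended) C call finally returns.
  return Module.ccall('wasm_generate', null, ['string'], [prompt], { async: true });
}
```

Awaiting the returned Promise keeps the caller from treating the call
as finished while generation is still streaming tokens. It does not
make prefill interruptible, since no yield happens before the first
token.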

Without a step-by-step API, prefill cannot yield in WASM: the whole
prompt is processed in one synchronous call. Future: expose a
single-token-forward API to enable prefill yielding.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/wasm/index.html b/wasm/index.html
@@ -405,11 +405,11 @@ <h2>Run an <span>LLM</span> in your browser</h2>
 
     addMessage('user', text);
     const aDiv = addMessage('assistant', '');
-    aDiv.innerHTML = '<span class="thinking"><span class="spinner"></span> Thinking...</span>';
+    aDiv.innerHTML = '<span class="thinking"><span class="spinner"></span> Processing prompt (may take a few seconds)...</span>';
     let output = '', count = 0;
     const t0 = performance.now();
     document.getElementById('statTokens').textContent = '';
-    document.getElementById('statSpeed').textContent = 'prefill...';
+    document.getElementById('statSpeed').textContent = 'processing prompt...';
 
     Module.onToken = (tok) => {
         output += tok; count++;