Commit 09adb11

unamedkr and claude authored
ux(wasm): clear prefill expectation message + verify ccall works (#35)

The "hang" users see is actually the prefill phase (processing all prompt tokens through 28 layers in WASM). This takes 5-10s for a 0.8B model and cannot be interrupted: it runs synchronously before the first ASYNCIFY yield point in the generation callback.

Changes:
- Message now says "Processing prompt (may take a few seconds)..." to set expectations correctly
- Stats bar shows "processing prompt..."
- Confirmed ccall({async:true}) is the correct ASYNCIFY pattern and generation streaming works AFTER prefill completes

The prefill blocking is a fundamental WASM limitation without a step-by-step API. Future: expose a single-token-forward API to enable prefill yielding.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 60e44c6 commit 09adb11

1 file changed

Lines changed: 2 additions & 2 deletions


wasm/index.html

@@ -405,11 +405,11 @@ <h2>Run an <span>LLM</span> in your browser</h2>
 
 addMessage('user', text);
 const aDiv = addMessage('assistant', '');
-aDiv.innerHTML = '<span class="thinking"><span class="spinner"></span> Thinking...</span>';
+aDiv.innerHTML = '<span class="thinking"><span class="spinner"></span> Processing prompt (may take a few seconds)...</span>';
 let output = '', count = 0;
 const t0 = performance.now();
 document.getElementById('statTokens').textContent = '';
-document.getElementById('statSpeed').textContent = 'prefill...';
+document.getElementById('statSpeed').textContent = 'processing prompt...';
 
 Module.onToken = (tok) => {
 output += tok; count++;
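
The commit message names a single-token-forward API as the future way to let prefill yield. The following is only a rough sketch of that idea, not code from this repository: `forwardToken`, `prefillSteps`, `prefillWithYield`, the `state` object, and the chunk size of 8 are all hypothetical names and values chosen for illustration. In a real build, `forwardToken` would presumably wrap a call into the WASM module; here it is a plain JavaScript stub so the control flow can be read and run on its own.

```javascript
// Hypothetical stand-in for a single-token forward pass exported from WASM.
// No such export exists in the commit above; this stub only tracks position.
function forwardToken(state, token) {
  state.position += 1;      // pretend the KV cache grew by one entry
  state.lastToken = token;
}

// Process the prompt in chunks and report progress after each chunk, so the
// caller can decide when to hand control back to the browser.
function* prefillSteps(state, promptTokens, chunkSize = 8) {
  for (let i = 0; i < promptTokens.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, promptTokens.length);
    for (let j = i; j < end; j++) forwardToken(state, promptTokens[j]);
    yield { processed: end, total: promptTokens.length };
  }
}

// Drive the generator, returning to the event loop between chunks so the UI
// can repaint and a cancel request can be honored.
async function prefillWithYield(state, promptTokens, options = {}) {
  const { chunkSize = 8, onProgress, shouldCancel } = options;
  for (const progress of prefillSteps(state, promptTokens, chunkSize)) {
    if (onProgress) onProgress(progress);
    if (shouldCancel && shouldCancel()) return false; // stopped early
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return true; // whole prompt processed
}
```

Under these assumptions, the page could pass an `onProgress` callback that writes something like `processed/total` into the `statSpeed` element instead of a static "processing prompt..." label. A smaller chunk size would keep the UI more responsive at the cost of more round trips through the event loop.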
