Commit 41a8441
docs: update pivot plan with P0 profiling results
P0 bottleneck identified and fixed:
- Root cause: on every HTTP request, the tokenizer was re-parsed from GGUF
(32K tokens) and the KV state was double-allocated
- Fix: context reuse across requests (commit 6e39e64)
- Result: 2.0 → 4.5 tok/s (2.3x on warm requests)
- Remaining: tq_generate internal state recreation (separate PR)
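
The context-reuse fix listed above could look roughly like the sketch below. It is an illustration only: `server_ctx`, `ctx_create`, `ctx_acquire`, and `ctx_build_count` are hypothetical names, not taken from commit 6e39e64, and the tokenizer and KV state are reduced to plain buffers. It also assumes a single-threaded request loop and that clearing the KV buffer is enough to start a new request.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for per-model state that is costly to build. */
typedef struct {
    char  *vocab;     /* placeholder for the tokenizer table parsed from GGUF */
    float *kv_cache;  /* placeholder for the KV state */
    size_t kv_len;    /* number of floats in kv_cache */
} server_ctx;

static server_ctx *g_ctx = NULL;  /* kept alive across HTTP requests */
static int g_ctx_builds = 0;      /* counts how often the slow path ran */

/* Slow path: "parse" the tokenizer and allocate the KV state. */
static server_ctx *ctx_create(size_t vocab_size, size_t kv_len) {
    server_ctx *c = malloc(sizeof *c);
    if (!c) return NULL;
    c->vocab = calloc(vocab_size, 1);
    c->kv_cache = calloc(kv_len, sizeof *c->kv_cache);
    c->kv_len = kv_len;
    if (!c->vocab || !c->kv_cache) {
        free(c->vocab);
        free(c->kv_cache);
        free(c);
        return NULL;
    }
    g_ctx_builds++;
    return c;
}

/* Called once per request. The sizes are only used on the first call;
 * later calls reuse the existing context and just clear the KV buffer
 * (an assumption about what a fresh request needs). */
server_ctx *ctx_acquire(size_t vocab_size, size_t kv_len) {
    assert(vocab_size > 0 && kv_len > 0);
    if (g_ctx == NULL) {
        g_ctx = ctx_create(vocab_size, kv_len);
    } else {
        memset(g_ctx->kv_cache, 0, g_ctx->kv_len * sizeof *g_ctx->kv_cache);
    }
    return g_ctx;
}

int ctx_build_count(void) { return g_ctx_builds; }
```

A request handler would call `ctx_acquire` at the start of each request. Only the first (cold) request pays the build cost, which matches the commit's note that the speedup applies to warm requests. A multi-threaded server would need a lock or per-worker contexts around `g_ctx`.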
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6e39e64 commit 41a8441
1 file changed
Lines changed: 6 additions & 5 deletions
Diff hunk: original lines 27-31 replaced by new lines 27-32 (changed text not shown).