Commit 90c3552
debug: softmax cliff identified as batched drift amplifier
High-precision sum/sumabs dumps across layers reveal the drift mechanism:
tok0 (pos=0) final Xres — drift at NOISE FLOOR through all layers:
L0: 40.478353289 vs base 40.478353226 (diff 6.3e-8, ~1 ULP)
L1: diff 8e-9
L2: diff 3e-7 (~4 ULP over 2048 elements)
L3: diff 6.6e-7
tok1 (pos=1) final Xres — drift JUMPS at L3:
L0: diff 2e-7 (~2 ULP) — noise
L1: diff 6.4e-7 — noise
L2: diff 9.5e-7 — noise
L3: sum diff 0.004, sumabs diff 0.072 — 5 orders bigger
What L3 pos=1 does different from L2 pos=1: same code paths, same
number of attention positions (2), same weights. So the mathematical
ops are identical. But ONE of those ops has near-softmax-cliff scores
at L3 — where att[0] and att[1] happen to be within 1 ULP of each
other. A tiny numerical drift in score computation flips which V gets
more weight, producing disproportionate OB drift.
This is a FUNDAMENTAL property of attention: at softmax cliffs, 1 ULP
input drift causes order-of-magnitude output drift. It's not a bug
per-se — it's why bit-identical reproducibility between different
execution paths is so hard for transformers.
Pragmatic paths to resolution:
(a) FP32 V cache (remove the FP16 round-trip, our largest drift
source). Costs 2x KV memory but eliminates ULP-scale noise.
(b) Accept drift; batched prefill will work for most prompts but
occasionally flip tokens. Measure rate on a test suite.
(c) Continue stamping out every ULP difference until bit-identical.
Achievable but weeks of careful work.
Today's session ends with clear understanding: batched matmul primitive
is solid; integration surfaces FP reproducibility challenges that are
fundamental rather than engineering. Strategy (a) — FP32 V cache — is
the most promising next step. It eliminates the largest drift source
in one commit.
All changes committed on TQ_BATCH_PREFILL opt-in gate. 11/11 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent f843e66 commit 90c3552
1 file changed
Lines changed: 8 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2830 | 2830 | | |
2831 | 2831 | | |
2832 | 2832 | | |
2833 | | - | |
2834 | | - | |
2835 | | - | |
| 2833 | + | |
| 2834 | + | |
| 2835 | + | |
2836 | 2836 | | |
2837 | 2837 | | |
2838 | 2838 | | |
| |||
3486 | 3486 | | |
3487 | 3487 | | |
3488 | 3488 | | |
3489 | | - | |
3490 | | - | |
3491 | | - | |
| 3489 | + | |
| 3490 | + | |
| 3491 | + | |
| 3492 | + | |
| 3493 | + | |
3492 | 3494 | | |
3493 | 3495 | | |
3494 | 3496 | | |
| |||
0 commit comments