Commit b503c2e
Gemma quality progress: BOS fix + attention softcap + diagnostics
Multiple fixes for Gemma 3/4 output quality:
1. BOS token: all Gemma models now get the BOS token (id 2) prepended
(previously only Gemma 3). With BOS, Gemma 4 produces semantically
relevant tokens ("Maison" for a France prompt) instead of random
output, so the model is partially working.
2. Attention logit softcap: added cap*tanh(score/cap) before softmax.
Gemma 2/3/4 use attn_logit_softcap=50.0. Without this, attention
scores are unbounded, since QK dot products can grow arbitrarily large.
3. Attention scaling: Gemma 4 with QK-norm now uses 1/sqrt(head_dim)
instead of 1.0.
4. TQ_NO_PLE debug flag: env var to disable PLE for diagnostics.
REMAINING ISSUE: Gemma 4 logits are still too large (100+ vs. the normal 20-30).
With final_logit_softcap=30, all high logits compress to ~30, collapsing
the margins between them and destroying the ranking. With the softcap
disabled, output shows relevant tokens but falls into repetition.
Root cause: hidden state grows to norm ~13 at layer 34.
Investigation continues on learned RoPE frequencies and FFN scaling.
SmolLM2 and Qwen3.5 are unaffected; 34/34 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent d3e7a44 commit b503c2e
3 files changed
Lines changed: 42 additions & 4 deletions
File tree
- src/engine
- tq_run.dSYM/Contents/Resources/Relocations/aarch64
@@ -209,11 +209,11 @@ (1 line replaced at 212; 2 lines replaced at 215-216)
@@ -227,6 +227,14 @@ (8 lines added at 230-237)
@@ -1032,6 +1032,36 @@ (30 lines added at 1035-1064)
Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@ (1 line replaced at 3)