Commit a86a837

unamedkr and claude committed
fix(gemma4): revert layer_output_scale to residual-separation formula
R13: "x *= los" destroys the residual (with los=0.0178, the embedding is scaled to ~0 after 35 layers).

Reverted to the original formula:
    x = x_input + los * (x_current - x_input)
which preserves the residual and scales only the layer's contribution.

Added the TQ_NO_LOS=1 env var for debugging without layer_output_scale.

Output is still garbage. An A/B test shows garbage both with and without layer_output_scale (only the patterns differ), which confirms the issue is in the forward pass itself. Waiting for llama.cpp reference output to confirm whether the GGUF file itself is valid.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
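
The difference between the two formulas can be illustrated with a toy scalar model. This is only a sketch under stated assumptions, not code from quant.h: the residual stream is reduced to a single float, each layer adds a constant hypothetical contribution (`toy_layer_delta`), and all `toy_*` names are invented for illustration. Only the value los=0.0178 and the layer count of 35 come from the commit message.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* layer_output_scale value quoted in the commit message. */
#define TOY_LOS 0.0178f

/* Hypothetical stand-in for one layer's attn+ffn+PLE contribution. */
static float toy_layer_delta(void) { return 0.5f; }

/* Rejected variant: scale the whole output, residual included. */
static float toy_run_multiply(float x, int n_layers) {
    assert(n_layers > 0);
    for (int l = 0; l < n_layers; l++) {
        x = x + toy_layer_delta();   /* residual add */
        x *= TOY_LOS;                /* shrinks the accumulated residual too */
    }
    return x;
}

/* Restored variant: scale only the layer's contribution. */
static float toy_run_residual(float x, int n_layers) {
    assert(n_layers > 0);
    for (int l = 0; l < n_layers; l++) {
        float x_input = x;
        float x_current = x_input + toy_layer_delta();
        x = x_input + TOY_LOS * (x_current - x_input);
    }
    return x;
}

/* Prints how far apart two different inputs end up under each variant. */
static void toy_compare(void) {
    float gap_mul = fabsf(toy_run_multiply(100.0f, 35) - toy_run_multiply(1.0f, 35));
    float gap_res = fabsf(toy_run_residual(100.0f, 35) - toy_run_residual(1.0f, 35));
    printf("multiply: input gap after 35 layers = %g\n", gap_mul);
    printf("residual: input gap after 35 layers = %g\n", gap_res);
}
```

Calling `toy_compare()` from a `main` shows the point of the revert in this toy setting: under whole-output scaling, two very different inputs become indistinguishable after 35 layers, while the residual-separation formula keeps them apart.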
1 parent 8a7825b commit a86a837

1 file changed

Lines changed: 11 additions & 6 deletions

quant.h

@@ -15668,11 +15668,16 @@ float* tq_forward(tq_model_t* model, tq_state_t* s, int token, int pos) {
             tq_add(s->x, s->x, ple_proj_out, dim);
         }
 
-        /* Gemma 4: layer_output_scale — simple multiplication of entire output.
-         * llama.cpp reference (gemma4-iswa.cpp): cur = ggml_mul(cur, out_scale)
-         * Previous implementation incorrectly separated residual contribution.
-         * The correct approach is a straight elementwise multiply. */
-        if (layer->layer_output_scale != 0.0f) {
+        /* Gemma 4: layer_output_scale scales layer CONTRIBUTION only.
+         * x_next = x_input + los * (x_current - x_input)
+         * This preserves the residual signal. With los=0.0178, only
+         * the layer's attn+ffn+PLE contribution is scaled down.
+         *
+         * CRITICAL: "x *= los" was WRONG — it destroys the residual
+         * (los=0.0178 multiplied onto the accumulated residual = catastrophic).
+         * The residual-separation formula is the correct implementation.
+         * TQ_NO_LOS=1 disables for debugging. */
+        if (layer->layer_output_scale != 0.0f && !getenv("TQ_NO_LOS")) {
             float los = layer->layer_output_scale;
             if (pos == 0 && getenv("TQ_DEBUG") && l < 3) {
                 float maxv = 0, minv = 0;
@@ -15683,7 +15688,7 @@ float* tq_forward(tq_model_t* model, tq_state_t* s, int token, int pos) {
                 fprintf(stderr, "[DEBUG] layer%d pre_scale min=%.3f max=%.3f (los=%.4f)\n", l, minv, maxv, los);
             }
             for (int i = 0; i < dim; i++) {
-                s->x[i] *= los;
+                s->x[i] = layer_residual_buf[i] + los * (s->x[i] - layer_residual_buf[i]);
             }
         }
 
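
One detail of the new guard is worth noting when using it for A/B runs: `!getenv("TQ_NO_LOS")` only tests whether the variable is present, so `TQ_NO_LOS=0` (or an empty value) also disables scaling. If value-sensitive behavior were wanted, a helper along the following lines could be used. This is a hypothetical sketch; `env_flag_enabled` is an invented name and is not part of quant.h.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 only when the variable is set to
 * something other than the empty string or "0"; returns 0 otherwise. */
static int env_flag_enabled(const char *name) {
    const char *v = getenv(name);
    return v != NULL && v[0] != '\0' && strcmp(v, "0") != 0;
}
```

With such a helper the guard would read `!env_flag_enabled("TQ_NO_LOS")`. The result could also be computed once and cached, since the check in the diff runs for every layer on every token.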
