Delta KV: NaN on Qwen (DeltaNet hybrid) — Llama verified, Qwen needs fix
Delta compression is verified on SmolLM2 (Llama architecture, pure attention),
and Llama-family models work correctly.
Qwen3.5 (DeltaNet hybrid, only 6 of 24 layers are attention) produces NaN:
delta accumulation interacts badly with DeltaNet's non-attention layers.
Fix needed: apply delta mode only to attention layers and skip DeltaNet layers.
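
A minimal sketch of the intended fix, not the code in this commit. It assumes
delta mode stores each cached value as a difference from its predecessor, which
this message does not spell out. The layer-type labels, function names, and the
every-4th-layer placement of attention layers are all hypothetical; only the
6-of-24 count comes from the text above.

```python
import math

# Hypothetical layer-type labels; the real model config may use other names.
ATTENTION = "attention"
DELTANET = "deltanet"


def delta_layer_mask(layer_types):
    """Return True only for layers where delta KV mode may be enabled."""
    return [t == ATTENTION for t in layer_types]


def delta_encode(values):
    """Keep the first value as-is; store each later value as a difference
    from the value before it (assumed delta scheme)."""
    deltas = []
    prev = 0.0
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas


def delta_decode(deltas):
    """Rebuild values by accumulating deltas, failing fast on NaN/inf so a
    bad layer is caught where it happens instead of propagating."""
    values = []
    acc = 0.0
    for d in deltas:
        acc += d
        if not math.isfinite(acc):
            raise ValueError("non-finite value during delta accumulation")
        values.append(acc)
    return values


# 24 layers, 6 of them attention. The placement is an assumption.
layer_types = [ATTENTION if i % 4 == 3 else DELTANET for i in range(24)]
mask = delta_layer_mask(layer_types)
```

With a mask like this, the cache would take the delta path only where
mask[i] is True and leave DeltaNet layers on their normal state path.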
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>