
Commit 4268590

unamedkr and claude committed
HONEST: 4th correction — Tim Dettmers comment was general, not direct feedback to us
User asked to verify whether Tim Dettmers' HIGGS attribution comment in llama.cpp #20969 was actually directed at quant.cpp specifically. After re-checking the discussion timeline:

- 2026-04-07 21:49 @unamedkr (us) first comment in thread
- 2026-04-07 22:08 @Xcc313r4n7 "compares to Rotorquant?"
- 2026-04-07 23:56 @TimDettmers HIGGS comment — top-level, no replyTo, no @-mention
- 2026-04-08 00:11 @caiovicentino (replying to TheTom)
- 2026-04-08 03:14 @unamedkr (us) "@TimDettmers — thank you"

Tim's comment was a top-level message to the thread (which has 6+ forks all loosely calling their work 'TurboQuant'), NOT a direct reply to our comment, NOT an @-mention of us. The substance applied to us along with everyone else in the thread, and we voluntarily chose to update our docs and reply with thanks. But Tim did not single us out, and framing his comment as 'Tim Dettmers gave us direct feedback' overstates the relationship.

Updated:
- README.md / README.ko.md: the HIGGS reference now says 'we added this attribution after seeing Tim Dettmers' general comment in #20969 asking participants in that thread to credit HIGGS instead. His comment was not directed at us specifically, but the substance applied to our naming as well, and we chose to update accordingly.'
- docs/papers/quant_cpp_arxiv_draft.md: Acknowledgements section rewritten with the honest framing.
- bench/results/turboquant_reproduction.md: attribution update note rewritten.
- CHANGELOG.md: the v0.6.5 entry now lists this as the 4th honest correction in the v0.6.x series and explicitly notes that the v0.6.4 commit message (commit 9481870) overstated the framing.

Saved to memory as feedback_dont_personalize_general_comments.md so future sessions distinguish (a) a direct reply / @-mention from (b) a general top-level comment whose substance happens to apply.

The substance of the correction (HIGGS attribution) is unchanged. Only the framing of how the feedback reached us has been corrected.

Honest corrections so far:
- v0.6.0 'lossless 7×' → '+6.3% PPL'
- v0.6.4 'beats fp32' → '−7% vs fp32 (NEON)'
- v0.6.5 'with Metal' → 'without Metal (user default)'
- v0.6.5 post 'Tim gave us feedback' → 'general comment we observed'

Each was caught by validation. Validation > marketing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent efbc023 commit 4268590

5 files changed

Lines changed: 6 additions & 5 deletions


CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -42,11 +42,12 @@ No source code changes — the CMake default was already `OFF`. The bug was in o

### Honest corrections so far in the v0.6.x series

- This is now the **third** honest correction we've caught and fixed before it spread:
+ This is now the **fourth** honest correction we've caught and fixed before it spread:

1. **v0.6.0**: "lossless 7× compression" → measured "+6.3% PPL on Llama 3.2 3B"
2. **v0.6.4**: "turbo_kv beats fp32 KV speed" → measured "−7% vs fp32 (NEON)"
3. **v0.6.5**: "benchmarks with Metal" → re-measured "benchmarks without Metal (which is the user default)"
+ 4. **v0.6.5 (post-release)**: "Tim Dettmers gave us direct feedback" → "Tim's general comment to a thread we participate in happened to apply to us; we incorporated it voluntarily, not as a direct response". Earlier docs and the v0.6.4 commit message overstated the relationship; the substance of HIGGS attribution is unchanged but the framing has been corrected in README, README.ko, the arXiv draft, and `bench/results/turboquant_reproduction.md`.

Each correction was caught by the validation discipline documented in our `feedback_validation_first` memory. **Validation > marketing.**

README.ko.md

Lines changed: 1 addition & 1 deletion
@@ -483,7 +483,7 @@ Runs on Linux, macOS, Windows (MSVC/MinGW), iOS, Android, and WASM.

quant.cpp is an independent implementation of published research. The Variant F architecture (RHT preprocessing + scalar Lloyd-Max codebook, no QJL stage) sits in the lineage of two prior works:

- - **HIGGS** — Malinovskii, Panferov, Ilin, Guo, Richtárik, Alistarh. *Pushing the Limits of Large Language Model Quantization via the Linearity Theorem*. Nov 2024. [arXiv:2411.17525](https://arxiv.org/abs/2411.17525). HIGGS introduced the **Random Hadamard Transform + MSE-optimal grid quantization** pattern for weight quantization. Our `tq_rht.c` (Walsh-Hadamard + Rademacher) follows this pattern. *Thanks to Tim Dettmers for pointing this out in the [llama.cpp #20969 discussion](https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16481725).*
+ - **HIGGS** — Malinovskii, Panferov, Ilin, Guo, Richtárik, Alistarh. *Pushing the Limits of Large Language Model Quantization via the Linearity Theorem*. Nov 2024. [arXiv:2411.17525](https://arxiv.org/abs/2411.17525). HIGGS introduced the **Random Hadamard Transform + MSE-optimal grid quantization** pattern for weight quantization. Our `tq_rht.c` (Walsh-Hadamard + Rademacher) follows this pattern. *We added this attribution after seeing Tim Dettmers' general comment in [llama.cpp discussion #20969](https://github.com/ggml-org/llama.cpp/discussions/20969) asking participants in that thread (where 6+ forks are loosely using the name "TurboQuant") to credit HIGGS. His comment was not directed at us specifically, but the substance applied to our naming as well, and we corrected it voluntarily.*
- **TurboQuant** — Zandieh, Daliri, Hadian, Mirrokni. *TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate*. ICLR 2026. [arXiv:2504.19874](https://arxiv.org/abs/2504.19874). TurboQuant applies the rotation pattern to the **KV cache**, adding a 1-bit QJL residual and per-channel outlier handling. Our work started as a direct port of TurboQuant and was simplified over 9 rounds of the Karpathy loop (QJL removed, outlier channels removed) into the current Variant F. We do not claim the shipped variant is the TurboQuant algorithm — it is an empirically derived simplification.
- **PolarQuant** — *Quantizing KV Caches with Polar Transformation*. AISTATS 2026. [arXiv:2502.02617](https://arxiv.org/abs/2502.02617). The polar-coordinate KV quantization that our `tq_polar.c` baseline implements.
- **QJL** — *Quantized Johnson-Lindenstrauss Transform for KV Cache Compression*. AAAI 2025. [arXiv:2406.03482](https://arxiv.org/abs/2406.03482). The 1-bit sketch building block. Used in our `tq_qjl.c` baseline; we found it contributed ~zero to attention scores in the Variant F regime and dropped it.

README.md

Lines changed: 1 addition & 1 deletion
@@ -508,7 +508,7 @@ Tested extensively (2-bit delta, NF2, online SVD, multi-hash). None reached acce

quant.cpp is an independent implementation of published research. The Variant F architecture (RHT preprocessing + scalar Lloyd-Max codebook on rotated values, no QJL stage) sits in a lineage that combines two prior works:

- - **HIGGS** — Malinovskii, Panferov, Ilin, Guo, Richtárik, Alistarh. *Pushing the Limits of Large Language Model Quantization via the Linearity Theorem*. Nov 2024. [arXiv:2411.17525](https://arxiv.org/abs/2411.17525). HIGGS introduced the **Random Hadamard Transform + MSE-optimal grid quantization** pattern (for weight quantization). Our `tq_rht.c` Walsh-Hadamard + Rademacher implementation follows this pattern. *Credit to Tim Dettmers ([discussion thread](https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16481725)) for pointing this out.*
+ - **HIGGS** — Malinovskii, Panferov, Ilin, Guo, Richtárik, Alistarh. *Pushing the Limits of Large Language Model Quantization via the Linearity Theorem*. Nov 2024. [arXiv:2411.17525](https://arxiv.org/abs/2411.17525). HIGGS introduced the **Random Hadamard Transform + MSE-optimal grid quantization** pattern (for weight quantization). Our `tq_rht.c` Walsh-Hadamard + Rademacher implementation follows this pattern. *We added this attribution after seeing [Tim Dettmers' general comment in llama.cpp discussion #20969](https://github.com/ggml-org/llama.cpp/discussions/20969) asking participants in that thread (which uses "TurboQuant" loosely across many forks) to credit HIGGS instead. His comment was not directed at us specifically, but the substance applied to our naming as well, and we chose to update accordingly.*
- **TurboQuant** — Zandieh, Daliri, Hadian, Mirrokni. *TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate*. ICLR 2026. [arXiv:2504.19874](https://arxiv.org/abs/2504.19874). TurboQuant applies the rotation pattern to the **KV cache** with a 1-bit QJL residual stage and per-channel outlier handling. Our work started as a literal port of TurboQuant; through 9 rounds of Karpathy-loop iteration we simplified it (dropped QJL, dropped outlier channels) into the current Variant F. We do not claim our shipped variant is the TurboQuant algorithm — it is an empirically-derived simplification.
- **PolarQuant** — *Quantizing KV Caches with Polar Transformation*. AISTATS 2026. [arXiv:2502.02617](https://arxiv.org/abs/2502.02617). The polar-coordinate KV quantization that our `tq_polar.c` baseline implements.
- **QJL** — *Quantized Johnson-Lindenstrauss Transform for KV Cache Compression*. AAAI 2025. [arXiv:2406.03482](https://arxiv.org/abs/2406.03482). The 1-bit sketch building block. Used in our `tq_qjl.c` baseline; we found it contributed ~zero to attention scores in the Variant F regime and dropped it.
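
To make the HIGGS entry above concrete: the pattern it describes is a Rademacher sign flip followed by a fast Walsh-Hadamard rotation, with the rotated values then snapped to an MSE-optimized scalar grid. The sketch below only illustrates that pattern and is not quant.cpp's actual `tq_rht.c` code; the function names (`fwht`, `rht_forward`, `quantize_block`), the per-block `scale`, and the `grid` interface are illustrative assumptions.

```c
/* Minimal sketch of the HIGGS-style RHT + scalar-grid pattern described
 * above. Illustrative only: names and interfaces are assumptions, not
 * the real tq_rht.c API. */
#include <math.h>
#include <stdint.h>

/* In-place fast Walsh-Hadamard transform with orthonormal scaling.
 * n must be a power of two. */
static void fwht(float *x, int n) {
    for (int len = 1; len < n; len <<= 1) {
        for (int i = 0; i < n; i += len << 1) {
            for (int j = i; j < i + len; j++) {
                float a = x[j], b = x[j + len];
                x[j]       = a + b;
                x[j + len] = a - b;
            }
        }
    }
    float s = 1.0f / sqrtf((float)n);
    for (int i = 0; i < n; i++) x[i] *= s;
}

/* Randomized Hadamard transform: multiply by a Rademacher (+/-1) sign
 * vector, then rotate with the Walsh-Hadamard transform. */
static void rht_forward(float *x, const int8_t *sign, int n) {
    for (int i = 0; i < n; i++) x[i] *= (float)sign[i];
    fwht(x, n);
}

/* Quantize each rotated value to the nearest entry of a small grid
 * (e.g. a Lloyd-Max / MSE-optimized codebook), after a per-block scale. */
static void quantize_block(const float *x, uint8_t *codes, int n,
                           const float *grid, int grid_size, float scale) {
    for (int i = 0; i < n; i++) {
        float v = x[i] / scale;
        int best = 0;
        float best_err = fabsf(v - grid[0]);
        for (int k = 1; k < grid_size; k++) {
            float err = fabsf(v - grid[k]);
            if (err < best_err) { best_err = err; best = k; }
        }
        codes[i] = (uint8_t)best;
    }
}
```

Dequantization reverses the steps in this sketch: look up `grid[code] * scale`, apply `fwht` again, then reapply the same sign vector (the orthonormal Hadamard matrix is its own inverse, as is the diagonal of ±1 signs).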

bench/results/turboquant_reproduction.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
# Variant F derivation — from TurboQuant literal port to HIGGS-style simplification

- > **Important attribution update (2026-04-08)**: Following [Tim Dettmers' comment in llama.cpp #20969](https://github.com/ggml-org/llama.cpp/discussions/20969), we now credit **HIGGS** (Malinovskii et al., Nov 2024, [arXiv:2411.17525](https://arxiv.org/abs/2411.17525)) for the Random Hadamard Transform + scalar grid quantization pattern. The shipped Variant F is structurally closest to HIGGS (RHT + MSE-optimal grids on rotated values), applied to KV cache like TurboQuant, with both the QJL residual stage and the per-channel outlier split removed through ablation. We do **not** claim our shipped variant is the published TurboQuant algorithm — it is an empirically-derived simplification arrived at through 9 Karpathy-loop rounds.
+ > **Important attribution update (2026-04-08)**: After observing [Tim Dettmers' general comment in llama.cpp discussion #20969](https://github.com/ggml-org/llama.cpp/discussions/20969), which was directed at the thread's participants in general (6+ forks were all loosely calling their work "TurboQuant") rather than at us specifically, we recognized that the substance applied to our naming as well and updated our docs to credit **HIGGS** (Malinovskii et al., Nov 2024, [arXiv:2411.17525](https://arxiv.org/abs/2411.17525)) for the Random Hadamard Transform + scalar grid quantization pattern. The shipped Variant F is structurally closest to HIGGS (RHT + MSE-optimal grids on rotated values), applied to the KV cache like TurboQuant, with both the QJL residual stage and the per-channel outlier split removed through ablation. We do **not** claim our shipped variant is the published TurboQuant algorithm — it is an empirically-derived simplification arrived at through 9 Karpathy-loop rounds.

docs/papers/quant_cpp_arxiv_draft.md

Lines changed: 1 addition & 1 deletion
@@ -265,7 +265,7 @@ The full Karpathy-loop history is in `bench/results/turboquant_reproduction.md`

## Acknowledgements

- Tim Dettmers ([discussion thread](https://github.com/ggml-org/llama.cpp/discussions/20969)) for pointing out the HIGGS attribution. Mohamed Chorfa for the bug fix PRs (#12, #13). The ggml-org / llama.cpp community for the Discussion #20969 venue for KV quantization research.
+ We thank Tim Dettmers, whose [general comment in llama.cpp discussion #20969](https://github.com/ggml-org/llama.cpp/discussions/20969) (a thread in which 6+ independent forks were all loosely calling their work "TurboQuant") asked the discussion's participants to credit HIGGS instead. His comment was not directed at us specifically, but the substance applied to our naming as well, and we updated our docs and this paper accordingly. We also thank Mohamed Chorfa for the bug-fix PRs (#12, #13), and the ggml-org / llama.cpp community for hosting Discussion #20969 as a venue for KV quantization research.

## References
