[AMD] Update MiniMax-M3 FP4 MI355X ATOM image and serving args (0630) by seungrokj · Pull Request #1967 · SemiAnalysisAI/InferenceX

seungrokj · 2026-07-01T02:08:59Z

Summary

Bump image rocm/atom-dev:MiniMax-M3-20260623 → MiniMax-M3-20260630 for both minimaxm3-fp4-mi355x-atom and minimaxm3-fp4-mi355x-atom-mtp
Add OPT_ARGS to both scripts: --online_quant_config (ptpc_fp8, excluding non-MoE layers) and --hf-overrides (use_index_cache: true, index_topk_freq: 4)
Replace AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0 + ATOM_M3_SPARSE_USE_ASM_PA=1 with ATOM_FORCE_ATTN_TRITON=1

As a PR reviewer and CODEOWNER, I have reviewed this and have:

Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this.
Verified that this PR passes evals. Please link to GitHub Action workflow that shows this.
Verified that speculative decoding PRs uses chat templates to align the AL distribution to real world
If an company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
Verified that the single-node recipes are similar to the official vLLM recipes and/or theSGLang cookbook:
- If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

🤖 Generated with Claude Code

- Bump image rocm/atom-dev:MiniMax-M3-20260623 → MiniMax-M3-20260630 - Add OPT_ARGS: --online_quant_config (ptpc_fp8, exclude non-MoE layers) and --hf-overrides (use_index_cache, index_topk_freq=4) - Replace AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0 + ATOM_M3_SPARSE_USE_ASM_PA=1 with ATOM_FORCE_ATTN_TRITON=1 - Apply to both STP (atom.sh) and MTP/EAGLE3 (atom_mtp.sh) scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-07-01T02:09:07Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

感谢你的贡献！对于 vLLM 与 SGLang，请确保你的 recipe 与官方 vLLM recipes 和/或 SGLang cookbook 保持一致

如果不一致，请先创建一个 PR，之后我们才能将你的单节点 PR 合并到 master 分支。让我们确保文档保持一流水准，使整个 ML 社区都能从你的辛勤工作中受益！谢谢

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动（flake），重新运行失败的任务即可解决。如果选择重新运行失败的任务，PR 作者有责任确保其最终通过。参见 GitHub 关于重新运行失败任务的文档：https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

一般而言，PR 作者应先向相应公司的 CODEOWNERS 请求审阅并获得 PR 批准，然后再请求核心维护者审阅。

如需更多帮助，PR 作者可通过 Slack 联系核心维护者。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-07-01T02:17:36Z

+    - "Bump image to rocm/atom-dev:MiniMax-M3-20260630 for both fp4 atom entries"
+    - "Add OPT_ARGS: pass --online_quant_config '{\"global_quant_config\": \"ptpc_fp8\", \"exclude_layer\": [\"lm_head\", \"model.embed_tokens\", \"vision_tower\", \"multi_modal_projector\", \"patch_merge_mlp\", \"*block_sparse_moe\"]}' and --hf-overrides '{\"use_index_cache\": true, \"index_topk_freq\": 4}' to both scripts"
+    - "Replace AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0 and ATOM_M3_SPARSE_USE_ASM_PA=1 with ATOM_FORCE_ATTN_TRITON=1"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1967


🔴 The new perf-changelog entry appended for this PR sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER — a literal, un-substituted token. The changelog validator (utils/validate_perf_changelog.py) only accepts the canonical /pull/<digits> URL or the sentinel XXX / .../pull/XXX; PLACEHOLDER is neither, so utils/prepare_perf_changelog_merge.py will raise appended entry N has unexpected pr-link and block merge. Replace it with 1967 (or XXX to hit the auto-canonicalization path).

Extended reasoning...

What the bug is. The last entry appended to perf-changelog.yaml at line 4362 carries pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER. Every other entry in the file uses either a canonical /pull/<digits> URL (e.g. the immediately preceding /pull/1966) or the accepted sentinel XXX. The literal string PLACEHOLDER was never substituted with the real PR number (1967).

Why the existing tooling won't accept it. utils/validate_perf_changelog.py defines the accepted placeholder set at lines 24-27:

PR_LINK_PLACEHOLDERS = { "XXX", "https://github.com/SemiAnalysisAI/InferenceX/pull/XXX", }

validate_added_pr_link() (lines 144-160) rejects any newly-appended pr-link that is neither in PR_LINK_PLACEHOLDERS nor matches the canonical CANONICAL_PR_LINK regex (which requires \d+, i.e. digits only). PLACEHOLDER matches neither: it is not the literal string XXX and PLACEHOLDER is not digits.

What triggers the failure. utils/merge_with_reuse.sh invokes utils/prepare_perf_changelog_merge.py at lines 181-185 unconditionally in the standard merge flow. That script's canonicalize_appended_links() (line 112) walks every appended entry and requires each pr-link to be either the canonical URL or in PR_LINK_PLACEHOLDERS — otherwise it raises ChangelogValidationError('appended entry N has unexpected pr-link ...'). The PR-branch validation gate exercises the same code path.

Step-by-step proof.

CI runs utils/validate_perf_changelog.py on the diff → validate_added_pr_link() sees https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER.

It first tests membership in PR_LINK_PLACEHOLDERS — no match (the sentinel is XXX, not PLACEHOLDER).

It then tests CANONICAL_PR_LINK.fullmatch() — no match (the regex demands one or more digits after /pull/, and PLACEHOLDER has none).

Validation raises ChangelogValidationError, the check fails, merge is blocked. The same failure recurs when prepare_perf_changelog_merge.py runs during the merge step.

Impact. This is a hard merge-blocker: neither the PR-branch validation gate nor the merge-time canonicalize step will accept the entry. Beyond the CI failure, the perf-changelog is an append-only audit log linking each benchmark trigger back to its owning PR; if this were somehow to sneak through, the entry would be orphaned from its PR context forever (per AGENTS.md, editing in place is discouraged).

Fix. Replace pull/PLACEHOLDER with pull/1967 (the canonical URL for this PR). Alternatively, use pull/XXX — the merge-prep script will auto-canonicalize XXX to the real PR number at merge time. The .claude/commands/nuke.md recipe uses a bare sentinel token PRLINK_PLACEHOLDER that its script substitutes later; here the substitution never happened, and the sentinel used doesn't match what the validator accepts anyway.

github-actions · 2026-07-01T10:39:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28505258231
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28505258231

seungrokj · 2026-07-01T10:58:43Z

/reuse-sweep-run

chunfangamd

[ x] Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
[x ] Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
[x ] Verified that this PR has passed PR validation: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28505258231
[x ] Verified that this PR passes evals: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28505258231
[ x] Verified that speculative decoding PRs uses chat templates to align the AL distribution to real world
[x ] If an company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.

Signed: @chunfangamd

Oseltamivir

lgtm

Oseltamivir

lgtm

seungrokj requested a review from a team July 1, 2026 02:09

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners July 1, 2026 02:09

github-project-automation Bot added this to InferenceMAX Board Jul 1, 2026

perf-changelog: update PR link for #1967

064cba8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude Bot reviewed Jul 1, 2026

View reviewed changes

seungrokj added AMD full-sweep-enabled labels Jul 1, 2026

chunfangamd approved these changes Jul 1, 2026

View reviewed changes

Oseltamivir added 2 commits July 1, 2026 07:00

Merge branch 'main' into amd/m3_atom_pd_fp4_0630

06c19b1

cleanup perfchangelog

009b63d

Oseltamivir approved these changes Jul 1, 2026

View reviewed changes

Oseltamivir merged commit 54f9e13 into main Jul 1, 2026
29 checks passed

Oseltamivir deleted the amd/m3_atom_pd_fp4_0630 branch July 1, 2026 14:17

github-project-automation Bot moved this to Done in InferenceMAX Board Jul 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMD] Update MiniMax-M3 FP4 MI355X ATOM image and serving args (0630)#1967

[AMD] Update MiniMax-M3 FP4 MI355X ATOM image and serving args (0630)#1967
Oseltamivir merged 4 commits into
mainfrom
amd/m3_atom_pd_fp4_0630

seungrokj commented Jul 1, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

claude Bot Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

seungrokj commented Jul 1, 2026

Uh oh!

chunfangamd left a comment

Uh oh!

Oseltamivir left a comment

Uh oh!

Oseltamivir left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

seungrokj commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

As a PR reviewer and CODEOWNER, I have reviewed this and have:

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

claude Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

seungrokj commented Jul 1, 2026

Uh oh!

chunfangamd left a comment

Choose a reason for hiding this comment

Uh oh!

Oseltamivir left a comment

Choose a reason for hiding this comment

Uh oh!

Oseltamivir left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

seungrokj commented Jul 1, 2026 •

edited

Loading