Skip to content

[AMD] Update MiniMax-M3 FP4 MI355X ATOM image and serving args (0630)#1967

Merged
Oseltamivir merged 4 commits into
mainfrom
amd/m3_atom_pd_fp4_0630
Jul 1, 2026
Merged

[AMD] Update MiniMax-M3 FP4 MI355X ATOM image and serving args (0630)#1967
Oseltamivir merged 4 commits into
mainfrom
amd/m3_atom_pd_fp4_0630

Conversation

@seungrokj

@seungrokj seungrokj commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Bump image rocm/atom-dev:MiniMax-M3-20260623MiniMax-M3-20260630 for both minimaxm3-fp4-mi355x-atom and minimaxm3-fp4-mi355x-atom-mtp
  • Add OPT_ARGS to both scripts: --online_quant_config (ptpc_fp8, excluding non-MoE layers) and --hf-overrides (use_index_cache: true, index_topk_freq: 4)
  • Replace AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0 + ATOM_M3_SPARSE_USE_ASM_PA=1 with ATOM_FORCE_ATTN_TRITON=1

As a PR reviewer and CODEOWNER, I have reviewed this and have:

  • Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
  • Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
  • Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this.
  • Verified that this PR passes evals. Please link to GitHub Action workflow that shows this.
  • Verified that speculative decoding PRs uses chat templates to align the AL distribution to real world
  • If an company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
  • Verified that the single-node recipes are similar to the official vLLM recipes and/or theSGLang cookbook:
    • If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
  • If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

🤖 Generated with Claude Code

- Bump image rocm/atom-dev:MiniMax-M3-20260623 → MiniMax-M3-20260630
- Add OPT_ARGS: --online_quant_config (ptpc_fp8, exclude non-MoE layers) and --hf-overrides (use_index_cache, index_topk_freq=4)
- Replace AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0 + ATOM_M3_SPARSE_USE_ASM_PA=1 with ATOM_FORCE_ATTN_TRITON=1
- Apply to both STP (atom.sh) and MTP/EAGLE3 (atom_mtp.sh) scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


感谢你的贡献!对于 vLLM 与 SGLang,请确保你的 recipe 与官方 vLLM recipes 和/或 SGLang cookbook 保持一致

如果不一致,请先创建一个 PR,之后我们才能将你的单节点 PR 合并到 master 分支。让我们确保文档保持一流水准,使整个 ML 社区都能从你的辛勤工作中受益!谢谢

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。如果选择重新运行失败的任务,PR 作者有责任确保其最终通过。参见 GitHub 关于重新运行失败任务的文档:https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

一般而言,PR 作者应先向相应公司的 CODEOWNERS 请求审阅并获得 PR 批准,然后再请求核心维护者审阅。

如需更多帮助,PR 作者可通过 Slack 联系核心维护者。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread perf-changelog.yaml
- "Bump image to rocm/atom-dev:MiniMax-M3-20260630 for both fp4 atom entries"
- "Add OPT_ARGS: pass --online_quant_config '{\"global_quant_config\": \"ptpc_fp8\", \"exclude_layer\": [\"lm_head\", \"model.embed_tokens\", \"vision_tower\", \"multi_modal_projector\", \"patch_merge_mlp\", \"*block_sparse_moe\"]}' and --hf-overrides '{\"use_index_cache\": true, \"index_topk_freq\": 4}' to both scripts"
- "Replace AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0 and ATOM_M3_SPARSE_USE_ASM_PA=1 with ATOM_FORCE_ATTN_TRITON=1"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1967

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new perf-changelog entry appended for this PR sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER — a literal, un-substituted token. The changelog validator (utils/validate_perf_changelog.py) only accepts the canonical /pull/<digits> URL or the sentinel XXX / .../pull/XXX; PLACEHOLDER is neither, so utils/prepare_perf_changelog_merge.py will raise appended entry N has unexpected pr-link and block merge. Replace it with 1967 (or XXX to hit the auto-canonicalization path).

Extended reasoning...

What the bug is. The last entry appended to perf-changelog.yaml at line 4362 carries pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER. Every other entry in the file uses either a canonical /pull/<digits> URL (e.g. the immediately preceding /pull/1966) or the accepted sentinel XXX. The literal string PLACEHOLDER was never substituted with the real PR number (1967).

Why the existing tooling won't accept it. utils/validate_perf_changelog.py defines the accepted placeholder set at lines 24-27:

PR_LINK_PLACEHOLDERS = {
    "XXX",
    "https://github.com/SemiAnalysisAI/InferenceX/pull/XXX",
}

validate_added_pr_link() (lines 144-160) rejects any newly-appended pr-link that is neither in PR_LINK_PLACEHOLDERS nor matches the canonical CANONICAL_PR_LINK regex (which requires \d+, i.e. digits only). PLACEHOLDER matches neither: it is not the literal string XXX and PLACEHOLDER is not digits.

What triggers the failure. utils/merge_with_reuse.sh invokes utils/prepare_perf_changelog_merge.py at lines 181-185 unconditionally in the standard merge flow. That script's canonicalize_appended_links() (line 112) walks every appended entry and requires each pr-link to be either the canonical URL or in PR_LINK_PLACEHOLDERS — otherwise it raises ChangelogValidationError('appended entry N has unexpected pr-link ...'). The PR-branch validation gate exercises the same code path.

Step-by-step proof.

  1. CI runs utils/validate_perf_changelog.py on the diff → validate_added_pr_link() sees https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER.
  2. It first tests membership in PR_LINK_PLACEHOLDERS — no match (the sentinel is XXX, not PLACEHOLDER).
  3. It then tests CANONICAL_PR_LINK.fullmatch() — no match (the regex demands one or more digits after /pull/, and PLACEHOLDER has none).
  4. Validation raises ChangelogValidationError, the check fails, merge is blocked. The same failure recurs when prepare_perf_changelog_merge.py runs during the merge step.

Impact. This is a hard merge-blocker: neither the PR-branch validation gate nor the merge-time canonicalize step will accept the entry. Beyond the CI failure, the perf-changelog is an append-only audit log linking each benchmark trigger back to its owning PR; if this were somehow to sneak through, the entry would be orphaned from its PR context forever (per AGENTS.md, editing in place is discouraged).

Fix. Replace pull/PLACEHOLDER with pull/1967 (the canonical URL for this PR). Alternatively, use pull/XXX — the merge-prep script will auto-canonicalize XXX to the real PR number at merge time. The .claude/commands/nuke.md recipe uses a bare sentinel token PRLINK_PLACEHOLDER that its script substitutes later; here the substitution never happened, and the sentinel used doesn't match what the validator accepts anyway.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@seungrokj

Copy link
Copy Markdown
Collaborator Author

/reuse-sweep-run

@chunfangamd chunfangamd left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • [ x] Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
  • [x ] Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
  • [x ] Verified that this PR has passed PR validation: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28505258231
  • [x ] Verified that this PR passes evals: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28505258231
  • [ x] Verified that speculative decoding PRs uses chat templates to align the AL distribution to real world
  • [x ] If an company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.

Signed: @chunfangamd

@Oseltamivir Oseltamivir left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Oseltamivir Oseltamivir left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Oseltamivir Oseltamivir merged commit 54f9e13 into main Jul 1, 2026
29 checks passed
@Oseltamivir Oseltamivir deleted the amd/m3_atom_pd_fp4_0630 branch July 1, 2026 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

3 participants