Merged
4 changes: 2 additions & 2 deletions .github/configs/amd-master.yaml
@@ -2695,7 +2695,7 @@ minimaxm3-fp4-mi355x-vllm-mtp:
# https://github.com/ROCm/ATOM/blob/5d42d49f9e4292e5b61475917e92e7ec1b1dacb7/recipes/MiniMax-M3.md
# block size 128 is mandatory for MSA. TP4 on a single gfx950 node, per the recipe.
minimaxm3-fp4-mi355x-atom:
-image: rocm/atom-dev:MiniMax-M3-20260623
+image: rocm/atom-dev:MiniMax-M3-20260630
model: amd/MiniMax-M3-MXFP4
model-prefix: minimaxm3
runner: mi355x
@@ -2714,7 +2714,7 @@ minimaxm3-fp4-mi355x-atom:
- { tp: 4, conc-start: 1, conc-end: 256 }

minimaxm3-fp4-mi355x-atom-mtp:
-image: rocm/atom-dev:MiniMax-M3-20260623
+image: rocm/atom-dev:MiniMax-M3-20260630
model: amd/MiniMax-M3-MXFP4
model-prefix: minimaxm3
runner: mi355x
@@ -31,15 +31,15 @@ if [ "$DP_ATTENTION" = "true" ]; then
fi

SPEC_ARGS=()
+OPT_ARGS=(--online_quant_config '{"global_quant_config": "ptpc_fp8", "exclude_layer": ["lm_head", "model.embed_tokens", "vision_tower", "multi_modal_projector", "patch_merge_mlp", "*block_sparse_moe"]}' --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}')

# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor
MEM_FRAC_STATIC=0.8

set -x
export AITER_QUICK_REDUCE_QUANTIZATION=INT4
-export AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0
-export ATOM_M3_SPARSE_USE_ASM_PA=1
+export ATOM_FORCE_ATTN_TRITON=1
export MAX_MODEL_LEN=32768
export MAX_NUM_BATCHED_TOKENS=32768
export MAX_NUM_SEQS=256
@@ -48,6 +48,7 @@ python3 -m atom.entrypoints.openai_server \
--server-port $PORT \
"${PARALLEL_ARGS[@]}" \
"${SPEC_ARGS[@]}" \
+"${OPT_ARGS[@]}" \
--block-size 128 \
--gpu-memory-utilization $MEM_FRAC_STATIC \
--max-model-len $MAX_MODEL_LEN \
@@ -31,15 +31,15 @@ if [ "$DP_ATTENTION" = "true" ]; then
fi

SPEC_ARGS=(--method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3 --num-speculative-tokens 3 )
+OPT_ARGS=(--online_quant_config '{"global_quant_config": "ptpc_fp8", "exclude_layer": ["lm_head", "model.embed_tokens", "vision_tower", "multi_modal_projector", "patch_merge_mlp", "*block_sparse_moe"]}' --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}')

# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor
MEM_FRAC_STATIC=0.8

set -x
export AITER_QUICK_REDUCE_QUANTIZATION=INT4
-export AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0
-export ATOM_M3_SPARSE_USE_ASM_PA=1
+export ATOM_FORCE_ATTN_TRITON=1
export MAX_MODEL_LEN=32768
export MAX_NUM_BATCHED_TOKENS=32768
export MAX_NUM_SEQS=256
@@ -48,6 +48,7 @@ python3 -m atom.entrypoints.openai_server \
--server-port $PORT \
"${PARALLEL_ARGS[@]}" \
"${SPEC_ARGS[@]}" \
+"${OPT_ARGS[@]}" \
--block-size 128 \
--gpu-memory-utilization $MEM_FRAC_STATIC \
--max-model-len $MAX_MODEL_LEN \
9 changes: 9 additions & 0 deletions perf-changelog.yaml
@@ -4352,6 +4352,15 @@
- "Reuse the existing MXFP8 B300 topology and concurrency matrix across 15 srt-slurm recipes, while dropping the FP8-only Marlin override from TP4 decode"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1966

- config-keys:
- minimaxm3-fp4-mi355x-atom
- minimaxm3-fp4-mi355x-atom-mtp
description:
- "Bump image to rocm/atom-dev:MiniMax-M3-20260630 for both fp4 atom entries"
- "Add OPT_ARGS: pass --online_quant_config '{\"global_quant_config\": \"ptpc_fp8\", \"exclude_layer\": [\"lm_head\", \"model.embed_tokens\", \"vision_tower\", \"multi_modal_projector\", \"patch_merge_mlp\", \"*block_sparse_moe\"]}' and --hf-overrides '{\"use_index_cache\": true, \"index_topk_freq\": 4}' to both scripts"
- "Replace AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0 and ATOM_M3_SPARSE_USE_ASM_PA=1 with ATOM_FORCE_ATTN_TRITON=1"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1967


🔴 The new perf-changelog entry appended for this PR sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER, a literal, unsubstituted token. The changelog validator (utils/validate_perf_changelog.py) accepts only the canonical /pull/<digits> URL or the sentinel XXX / .../pull/XXX; PLACEHOLDER is neither, so utils/prepare_perf_changelog_merge.py will raise "appended entry N has unexpected pr-link" and block the merge. Replace it with 1967 (or XXX to hit the auto-canonicalization path).

Extended reasoning...

What the bug is. The last entry appended to perf-changelog.yaml at line 4362 carries pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER. Every other entry in the file uses either a canonical /pull/<digits> URL (e.g. the immediately preceding /pull/1966) or the accepted sentinel XXX. The literal string PLACEHOLDER was never substituted with the real PR number (1967).

Why the existing tooling won't accept it. utils/validate_perf_changelog.py defines the accepted placeholder set at lines 24-27:

PR_LINK_PLACEHOLDERS = {
    "XXX",
    "https://github.com/SemiAnalysisAI/InferenceX/pull/XXX",
}

validate_added_pr_link() (lines 144-160) rejects any newly appended pr-link that is neither in PR_LINK_PLACEHOLDERS nor a match for the CANONICAL_PR_LINK regex (which requires \d+ after /pull/, i.e. digits only). PLACEHOLDER satisfies neither condition: it is not the sentinel XXX, and it contains no digits.

What triggers the failure. utils/merge_with_reuse.sh invokes utils/prepare_perf_changelog_merge.py at lines 181-185 unconditionally in the standard merge flow. That script's canonicalize_appended_links() (line 112) walks every appended entry and requires each pr-link to be either the canonical URL or in PR_LINK_PLACEHOLDERS; otherwise it raises ChangelogValidationError('appended entry N has unexpected pr-link ...'). The PR-branch validation gate exercises the same code path.

Step-by-step proof.

  1. CI runs utils/validate_perf_changelog.py on the diff → validate_added_pr_link() sees https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER.
  2. It first tests membership in PR_LINK_PLACEHOLDERS: no match (the sentinel is XXX, not PLACEHOLDER).
  3. It then tests CANONICAL_PR_LINK.fullmatch(): no match (the regex demands one or more digits after /pull/, and PLACEHOLDER has none).
  4. Validation raises ChangelogValidationError, the check fails, merge is blocked. The same failure recurs when prepare_perf_changelog_merge.py runs during the merge step.
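
The two tests in the proof can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not the code in utils/validate_perf_changelog.py: PR_LINK_PLACEHOLDERS is copied from the excerpt quoted earlier, but the exact CANONICAL_PR_LINK pattern and the is_acceptable_pr_link helper are hypothetical, written only to match the behaviour described here.

```python
import re

# Copied from the excerpt of utils/validate_perf_changelog.py quoted in this review.
PR_LINK_PLACEHOLDERS = {
    "XXX",
    "https://github.com/SemiAnalysisAI/InferenceX/pull/XXX",
}

# ASSUMPTION: the real regex is not shown in this review; all we know is that
# it requires one or more digits after /pull/.
CANONICAL_PR_LINK = re.compile(
    r"https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+"
)


def is_acceptable_pr_link(pr_link: str) -> bool:
    """Hypothetical helper: would a newly appended pr-link pass both tests?"""
    # Test 1: membership in the accepted sentinel set.
    if pr_link in PR_LINK_PLACEHOLDERS:
        return True
    # Test 2: the whole string must match the canonical URL pattern.
    return CANONICAL_PR_LINK.fullmatch(pr_link) is not None


base = "https://github.com/SemiAnalysisAI/InferenceX/pull/"
for suffix in ("1967", "XXX", "PLACEHOLDER"):
    print(suffix, is_acceptable_pr_link(base + suffix))
```

Under these assumptions, the 1967 and XXX links are accepted and the PLACEHOLDER link is rejected, which is the outcome the steps describe.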

Impact. This is a hard merge-blocker: neither the PR-branch validation gate nor the merge-time canonicalize step will accept the entry. Beyond the CI failure, the perf-changelog is an append-only audit log linking each benchmark trigger back to its owning PR; if this were somehow to sneak through, the entry would be orphaned from its PR context forever (per AGENTS.md, editing in place is discouraged).

Fix. Replace pull/PLACEHOLDER with pull/1967 (the canonical URL for this PR). Alternatively, use pull/XXX: the merge-prep script will auto-canonicalize XXX to the real PR number at merge time. The .claude/commands/nuke.md recipe uses a bare sentinel token PRLINK_PLACEHOLDER that its script substitutes later; here the substitution never happened, and the sentinel used doesn't match what the validator accepts anyway.
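
The auto-canonicalization path mentioned in the fix might look roughly like the sketch below. It is a guess at the shape of the logic rather than the contents of utils/prepare_perf_changelog_merge.py: canonicalize_pr_link and CANONICAL_PREFIX are hypothetical names, and a plain ValueError stands in for the ChangelogValidationError the real script is said to raise.

```python
# ASSUMPTION: hypothetical names; only the sentinel values come from the review.
CANONICAL_PREFIX = "https://github.com/SemiAnalysisAI/InferenceX/pull/"
PR_LINK_PLACEHOLDERS = {"XXX", CANONICAL_PREFIX + "XXX"}


def canonicalize_pr_link(pr_link: str, pr_number: int) -> str:
    """Resolve an accepted sentinel to the canonical URL for this PR.

    A link that is already canonical for this PR passes through unchanged;
    anything else is rejected, mirroring the 'unexpected pr-link' failure.
    """
    canonical = f"{CANONICAL_PREFIX}{pr_number}"
    if pr_link in PR_LINK_PLACEHOLDERS or pr_link == canonical:
        return canonical
    # Stand-in for ChangelogValidationError in the real script.
    raise ValueError(f"appended entry has unexpected pr-link: {pr_link!r}")


print(canonicalize_pr_link("XXX", 1967))
```

With this shape, either accepted sentinel resolves to the /pull/1967 URL, while the PLACEHOLDER link raises instead of being silently rewritten.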


- config-keys:
- minimaxm3-fp8-mi355x-atom
- minimaxm3-fp8-mi355x-atom-mtp