[WIP] [do not merge] Add MiniMax-M3 FP4 B200 Dynamo-vLLM disagg config #1982
jasonlizhengjian wants to merge 3 commits into
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will fix them. If you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Claude finished @jasonlizhengjian's task in 3m 55s
Review: MiniMax-M3 FP4 B200 Dynamo-vLLM disagg config
LGTM — no blocking issues found. This is a clean, well-structured port. Details verified:
Minor (non-blocking, FYI only): the
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28599128708
Claude finished @jasonlizhengjian's task in 1m 47s
Re-review: MiniMax-M3 FP4 B200 Dynamo-vLLM disagg config
LGTM — no blocking issues found. Since the previous review, the only new change is commit
Minor (non-blocking, still open from prior review): the
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28599246606
model:
  path: "nvidia/MiniMax-M3-NVFP4"
  container: "vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41"
  precision: "fp4"
🔴 The launcher's model-path alias key and the recipe's model.path don't match for this new b200-dgxc + minimaxm3-fp4 pairing: runners/launch_b200-dgxc.sh:74 exports SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4" but the new recipe at line 4 uses path: "nvidia/MiniMax-M3-NVFP4", so srtctl's model_paths lookup misses. Every other analogous case in the tree (b300-nv minimaxm3-fp4, b200 minimaxm2.5-fp4/fp8, b200 dsv4-fp4) has the launcher prefix exactly equal to the recipe path. Fix by changing one side to match the other: either set SRT_SLURM_MODEL_PREFIX="nvidia/MiniMax-M3-NVFP4" for the minimaxm3-fp4 branch on b200-dgxc (mirroring runners/launch_b300-nv.sh:52), or change the new recipe's model.path to "minimax-m3-nvfp4".
Extended reasoning...
The mismatch
runners/launch_b200-dgxc.sh:71-74 (pre-existing from PR #1932) sets up the minimaxm3-fp4 model resolution:

    elif [[ $MODEL_PREFIX == "minimaxm3" && $PRECISION == "fp4" ]]; then
        # NVFP4 checkpoint, pre-staged on the b200-dgxc scratch tree.
        export MODEL_PATH="/scratch/fsw/models/MiniMax-M3-NVFP4"
        export SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4"

The launcher then writes srtslurm.yaml (around line 157):

    model_paths:
      "minimax-m3-nvfp4": "/scratch/fsw/models/MiniMax-M3-NVFP4"

But the new recipe at benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b200-fp4/8k1k/2p1d-dep2-dep8-8k1k.yaml:4 uses:

    model:
      path: "nvidia/MiniMax-M3-NVFP4"

srtctl looks up "nvidia/MiniMax-M3-NVFP4" in model_paths. The only registered alias is "minimax-m3-nvfp4", so the lookup misses.
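The miss can be illustrated with a minimal sketch. It assumes srtctl resolves model.path by an exact, case-sensitive key match against model_paths, which is what this review infers but does not confirm from srtctl's source. `resolve_model_path` is a hypothetical helper written for illustration, not part of srtctl.

```python
from typing import Optional

# model_paths as the launcher writes it into srtslurm.yaml (values quoted in this review).
MODEL_PATHS = {"minimax-m3-nvfp4": "/scratch/fsw/models/MiniMax-M3-NVFP4"}


def resolve_model_path(alias: str, model_paths: dict) -> Optional[str]:
    """Hypothetical stand-in for the alias lookup: exact key match, None on a miss."""
    return model_paths.get(alias)


recipe_path = "nvidia/MiniMax-M3-NVFP4"   # model.path in the new recipe
launcher_prefix = "minimax-m3-nvfp4"      # SRT_SLURM_MODEL_PREFIX on b200-dgxc

# The recipe's value is not a registered key, so the lookup returns None.
print(resolve_model_path(recipe_path, MODEL_PATHS))
# The launcher's own prefix resolves to the pre-staged scratch path.
print(resolve_model_path(launcher_prefix, MODEL_PATHS))
```

What happens after a miss (hard error or fallback) depends on srtctl itself and is not modeled here.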
Why this PR is the trigger
The pre-existing lines 71-74 were previously only exercised through the single-node code path (the else branch at the bottom of the launcher), which never touches SRT_SLURM_MODEL_PREFIX — it mounts $MODEL_PATH directly via --container-mounts and sets MODEL=$MODEL_PATH. So the mismatch was benign.
This PR adds the new elif at launch_b200-dgxc.sh:116-121 that first routes minimaxm3-fp4 through the srtctl / srt-slurm code path, which is the path that actually consumes SRT_SLURM_MODEL_PREFIX as an alias key in srtslurm.yaml. So the pre-existing but previously-latent mismatch becomes load-bearing exactly at this PR.
Cross-check with every other b200/b300 case
| Launcher case | SRT_SLURM_MODEL_PREFIX | Recipe model.path | Match? |
|---|---|---|---|
| b300-nv minimaxm3-fp4 (launch_b300-nv.sh:52) | nvidia/MiniMax-M3-NVFP4 | nvidia/MiniMax-M3-NVFP4 | ✅ |
| b300-nv minimaxm3-fp8 (launch_b300-nv.sh:55) | MiniMaxAI/MiniMax-M3-MXFP8 | MiniMaxAI/MiniMax-M3-MXFP8 | ✅ |
| b200-dgxc minimaxm2.5-fp4 | minimax-m2.5-nvfp4 | minimax-m2.5-nvfp4 | ✅ |
| b200-dgxc minimaxm2.5-fp8 | minimax-m2.5-fp8 | minimax-m2.5-fp8 | ✅ |
| b200-dgxc dsv4-fp4 | deepseek-v4-pro | deepseek-v4-pro | ✅ |
| b200-dgxc minimaxm3-fp4 (this PR) | minimax-m3-nvfp4 | nvidia/MiniMax-M3-NVFP4 | ❌ |
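The same cross-check can be expressed as a small script. This is only a sketch: the pairings are copied by hand from the table above rather than parsed from the launchers and recipes, and `find_mismatches` is a hypothetical helper name.

```python
# (launcher case, SRT_SLURM_MODEL_PREFIX, recipe model.path), copied from the table above.
PAIRINGS = [
    ("b300-nv minimaxm3-fp4", "nvidia/MiniMax-M3-NVFP4", "nvidia/MiniMax-M3-NVFP4"),
    ("b300-nv minimaxm3-fp8", "MiniMaxAI/MiniMax-M3-MXFP8", "MiniMaxAI/MiniMax-M3-MXFP8"),
    ("b200-dgxc minimaxm2.5-fp4", "minimax-m2.5-nvfp4", "minimax-m2.5-nvfp4"),
    ("b200-dgxc minimaxm2.5-fp8", "minimax-m2.5-fp8", "minimax-m2.5-fp8"),
    ("b200-dgxc dsv4-fp4", "deepseek-v4-pro", "deepseek-v4-pro"),
    ("b200-dgxc minimaxm3-fp4", "minimax-m3-nvfp4", "nvidia/MiniMax-M3-NVFP4"),
]


def find_mismatches(pairings):
    """Return the launcher cases whose prefix differs from the recipe's model.path."""
    return [case for case, prefix, path in pairings if prefix != path]


print(find_mismatches(PAIRINGS))
```

A check of this shape could live next to the existing matrix tests if the pairings were extracted from the files automatically; that extraction is left out here.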
Every other pairing in the tree matches exactly; the new b200-dgxc minimaxm3-fp4 case is the sole outlier. The b300 minimaxm3-fp4 case in particular is instructive because the new b200 recipe is a direct port of the b300 4p2d-dep2-dep8-8k1k recipe (per the PR description), so it inherits nvidia/MiniMax-M3-NVFP4 — which matches on b300 but not on b200.
Step-by-step proof of the failure
- CI dispatches minimaxm3-fp4-b200-dynamo-vllm on the b200-multinode runner.
- launch_b200-dgxc.sh runs with IS_MULTINODE=true, MODEL_PREFIX=minimaxm3, PRECISION=fp4, FRAMEWORK=dynamo-vllm.
- Line 74 exports SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4" and MODEL_PATH="/scratch/fsw/models/MiniMax-M3-NVFP4".
- The new elif at lines 116-121 fires, clones srt-slurm, and copies the recipe into recipes/vllm/minimax-m3/b200-fp4/.
- The cat > srtslurm.yaml <<EOF block writes model_paths: { "minimax-m3-nvfp4": "/scratch/fsw/models/MiniMax-M3-NVFP4" }.
- srtctl apply -f $CONFIG_FILE is invoked; srtctl parses the recipe and reads model.path: "nvidia/MiniMax-M3-NVFP4".
- srtctl checks model_paths for the key "nvidia/MiniMax-M3-NVFP4", which is not present.
- Outcome A: srtctl errors on unknown alias and the job fails immediately at srtctl apply. Outcome B: srtctl treats the unmatched value as a HuggingFace hub identifier and attempts to download nvidia/MiniMax-M3-NVFP4 from the hub on every job invocation, negating the pre-staging that the comment on launch_b200-dgxc.sh:72 explicitly relies on.
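The steps above suggest a pre-flight check that could run before srtctl apply. The sketch below is an assumption-laden illustration, not existing tooling: `recipe_model_path`, `registered_aliases`, and `preflight` are hypothetical names, the two text constants are trimmed stand-ins for the real recipe and the generated srtslurm.yaml, and the regexes handle only the simple quoted forms quoted in this review (they are not a YAML parser).

```python
import re

# Trimmed stand-in for the new recipe (only the fields quoted in this review).
RECIPE_TEXT = '''\
model:
  path: "nvidia/MiniMax-M3-NVFP4"
  precision: "fp4"
'''

# Trimmed stand-in for the srtslurm.yaml the launcher generates.
SRTSLURM_TEXT = '''\
model_paths:
  "minimax-m3-nvfp4": "/scratch/fsw/models/MiniMax-M3-NVFP4"
'''


def recipe_model_path(recipe_text: str) -> str:
    """Extract the quoted value of the first `path:` key (simplified)."""
    match = re.search(r'^\s*path:\s*"([^"]+)"', recipe_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no quoted path: entry found in recipe")
    return match.group(1)


def registered_aliases(srtslurm_text: str) -> set:
    """Collect quoted keys of lines shaped like "alias": "value" (simplified)."""
    return set(re.findall(r'^\s*"([^"]+)":\s*"', srtslurm_text, flags=re.MULTILINE))


def preflight(recipe_text: str, srtslurm_text: str) -> bool:
    """True when the recipe's model.path is one of the registered aliases."""
    return recipe_model_path(recipe_text) in registered_aliases(srtslurm_text)


print(preflight(RECIPE_TEXT, SRTSLURM_TEXT))
```

With the alias in SRTSLURM_TEXT changed to "nvidia/MiniMax-M3-NVFP4", the same call reports a match, so the check fails fast on the current state and passes after either side is aligned.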
Either outcome makes the full-sweep check fail — either the job errors out at model resolution, or the HF pull blows the container FS / times out the runner. The PR is labeled full-sweep-fail-fast-no-canary, so this will surface as a fail-fast failure.
Fix
One-liner in either direction. The b300 side of the tree is the reference implementation, so the least surprising change is to line 74 of runners/launch_b200-dgxc.sh:
     elif [[ $MODEL_PREFIX == "minimaxm3" && $PRECISION == "fp4" ]]; then
         # NVFP4 checkpoint, pre-staged on the b200-dgxc scratch tree.
         export MODEL_PATH="/scratch/fsw/models/MiniMax-M3-NVFP4"
    -    export SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4"
    +    export SRT_SLURM_MODEL_PREFIX="nvidia/MiniMax-M3-NVFP4"

This also matches runners/launch_b300-nv.sh:52 verbatim, keeping the two clusters consistent for the same model+precision.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28599474801
Summary
- b200-multinode runner
- max-cudagraph-capture-size and max-num-batched-tokens from the prefill configuration
Configuration
Validation
- generate_sweep_configs.py slice for MiniMax-M3 FP4 Dynamo-vLLM on b200-multinode
- python -m pytest utils/matrix_logic/ -v (163 passed)
- bash -n runners/launch_b200-dgxc-slurm.sh
- git diff --check