-
Notifications
You must be signed in to change notification settings - Fork 217
[AMD] DeepSeek-V4 FP4 MI355X vLLM MTP: bump image to latest nightly #1981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Fangzhou-Ai
wants to merge
5
commits into
main
Choose a base branch
from
amd/dsv4-fp4-mi355x-vllm-mtp-image-bump
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+13
−3
Open
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
9204152
[AMD] DeepSeek-V4 FP4 MI355X vLLM MTP: bump image to latest nightly
Fangzhou-Ai f293588
chore(changelog): set pr-link for dsv4-fp4-mi355x-vllm-mtp image bump…
Fangzhou-Ai f5ae0ad
docs(changelog): two-stage attention improves across all concurrency …
Fangzhou-Ai a6ff5d8
[AMD] dsv4 fp4 mi355x vllm MTP: use AITER MoE backend
Fangzhou-Ai bb54da7
Merge branch 'main' into amd/dsv4-fp4-mi355x-vllm-mtp-image-bump
Fangzhou-Ai File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 The block comment immediately above this entry (lines 1978-1980) still reads "Image reuses the base entry's v0.22.0 ROCm build, which already contains the MTP commit." With this bump, the MTP variant is now on a nightly while the base entry
dsv4-fp4-mi355x-vllmstays onv0.22.0, so that rationale is stale. Consider replacing those two sentences with a note about the intentional divergence and the new rationale (two-stage attention kernels + AITER MLA) already documented in the PR description and perf-changelog.Extended reasoning...
What's stale. The trailing sentences of the block comment at
.github/configs/amd-master.yaml:1978-1980claim:\n\n> Image reuses the base entry's v0.22.0 ROCm build, which already contains the MTP commit.\n\nThat rationale explained why the two entries could share an image tag. It no longer holds.\n\nStep-by-step proof of the divergence.\n\n1. Base entrydsv4-fp4-mi355x-vllmat line 1955 still pinsimage: vllm/vllm-openai-rocm:v0.22.0(unchanged by this PR).\n2. This PR changes the MTP variant at line 1982 fromvllm/vllm-openai-rocm:v0.22.0tovllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa.\n3. Therefore the two image strings now differ, and "reuses the base entry's v0.22.0 ROCm build" is factually wrong.\n\nWhy the existing wording will mislead. A future reader landing on this recipe will read the block comment, see "reuses the base entry's v0.22.0 ROCm build," and assume the two entries track the same image — for example when doing a future bump they might touch only one entry and expect the other to follow. The PR description already spells out the real reason for the bump (nightly enables two-stage attention kernels / split-KV decode and the AITER MLA backend for the DSv4 MLA path), and the perf-changelog entry restates it. That rationale belongs in the inline comment now that the images have diverged.\n\nImpact. Documentation-only — no functional change, sweep behavior is unaffected. Filing as nit since it's worth fixing while the change is fresh (the author has the context right now) but does not need to block merge.\n\nSuggested fix. Replace the trailing two sentences of the comment (roughly lines 1978-1980) with something like:\n\n> Previously reused the base entry's v0.22.0 image; bumped to a nightly to pick up two-stage attention kernels (split-KV decode) and the AITER MLA backend for the DSv4 MLA path. Base entry stays pinned to v0.22.0 intentionally.