Add wheel support for Newton-Schulz method via cuSolverMp #3004
Greptile Summary
This PR updates the wheel build infrastructure to include cuSolverMP as an optional feature by installing the system packages in both Dockerfiles, creating a canonical
Confidence Score: 4/5
Safe to merge for cuSolverMP support, but the three other features promised in the PR description are not implemented. The cuSolverMP wiring (package install, symlink tree, ldconfig, ENV exports) is self-consistent and correct in both Dockerfiles. The gap is between what the PR claims to deliver (four optional build flags) and what is actually wired up (only one). Users who rely on the PR description to conclude that the wheel now includes cuBLASMP, NVSHMEM, or MPI support will be misled.
build_tools/wheel_utils/build_wheels.sh: the three missing flag exports, together with the matching dependency installs the Dockerfiles would need, are the unfinished portion of this change.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Docker build\nDockerfile.x86 / .aarch] --> B[dnf install libcusolvermp0-cuda-CUDA_MAJOR\n+ devel package]
B --> C[Create /opt/nvidia/cusolvermp\nwith include + lib symlinks]
C --> D[echo lib path to ld.so.conf.d\n+ ldconfig]
D --> E[ENV CUSOLVERMP_HOME=/opt/nvidia/cusolvermp\nENV LD_LIBRARY_PATH += cusolvermp/lib]
E --> F[Container starts\nbuild_wheels.sh]
F --> G[export NVTE_WITH_CUSOLVERMP=1]
G --> H[python setup.py bdist_wheel\ncommon lib]
H --> I[Wheel includes cuSolverMP]
F -.->|NOT exported| J[NVTE_WITH_CUBLASMP\nNVTE_ENABLE_NVSHMEM\nNVTE_UB_WITH_MPI]
J -.->|NOT installed| K[cuBLASMP / NVSHMEM / OpenMPI\ndependencies]
style J fill:#ffcccc,stroke:#cc0000
style K fill:#ffcccc,stroke:#cc0000
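Steps C and D of the flowchart can be sketched as a shell function. This is an illustration under assumptions, not the Dockerfile's actual code. The source include/lib directories installed by the libcusolvermp0 packages, the function name setup_cusolvermp_home, and the conf file name cusolvermp.conf are all hypothetical. Only the /opt/nvidia/cusolvermp prefix and the ld.so.conf.d + ldconfig steps come from the diagram.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of flowchart steps C-D. The real source paths of the
# dnf-installed headers and libraries are NOT shown in the PR; pass them in.
setup_cusolvermp_home() {
  local src_include="$1"  # directory containing the cuSolverMp headers (assumed)
  local src_lib="$2"      # directory containing libcusolverMp.so.* (assumed)
  local prefix="$3"       # canonical home, e.g. /opt/nvidia/cusolvermp
  local conf_dir="$4"     # loader config dir, e.g. /etc/ld.so.conf.d

  mkdir -p "$prefix" "$conf_dir"
  # Step C: canonical include/ and lib/ symlinks under one prefix.
  ln -sfn "$src_include" "$prefix/include"
  ln -sfn "$src_lib" "$prefix/lib"

  # Step D: register the lib dir with the dynamic loader.
  echo "$prefix/lib" > "$conf_dir/cusolvermp.conf"
  # Only refresh the loader cache when we can (root inside the image).
  if [ "$(id -u)" -eq 0 ] && command -v ldconfig >/dev/null 2>&1; then
    ldconfig
  fi
}
```

In a Dockerfile RUN step this would be called with prefix /opt/nvidia/cusolvermp and conf dir /etc/ld.so.conf.d. It would be followed by ENV CUSOLVERMP_HOME=/opt/nvidia/cusolvermp and the LD_LIBRARY_PATH extension from step E.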
Reviews (2): Last reviewed commit: "Add NS via cusolvermp to wheel build"
SITE_PACKAGES=$(/opt/python/cp310-cp310/bin/python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
export CUBLASMP_HOME="${SITE_PACKAGES}/nvidia/cublasmp/cu${CUDA_MAJOR}"
export CUSOLVERMP_HOME="${SITE_PACKAGES}/nvidia/cu${CUDA_MAJOR}"
Likely incorrect CUSOLVERMP_HOME path
The path ${SITE_PACKAGES}/nvidia/cu${CUDA_MAJOR} is missing the package-name segment. Every other NVIDIA Python package follows the layout site-packages/nvidia/<package-name>/cu<ver>/ — for example, nvidia-cublasmp-cu12 installs under nvidia/cublasmp/cu12/, so nvidia-cusolvermp-cu12 should install under nvidia/cusolvermp/cu12/. With the current path the .so symlink loop silently skips cuSolverMP's lib/ directory ([ -d "$lib_dir" ] || continue), no unversioned .so stubs are created, and the linker will not find cuSolverMP at build time even though NVTE_WITH_CUSOLVERMP=1 is exported.
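The failure mode described above can be made concrete with a minimal sketch of a stub-creating loop. Only the [ -d "$lib_dir" ] || continue guard is quoted in the review. The loop body, the lib/ layout under each package home, and the name make_unversioned_stubs are hypothetical. Unlike the quoted guard, this version reports a missing lib/ directory and returns non-zero instead of skipping it silently.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction: create unversioned libfoo.so -> libfoo.so.N
# symlinks so that "-lfoo" resolves at link time. Assumes each package home
# keeps its shared libraries directly under "<home>/lib".
make_unversioned_stubs() {
  local home lib_dir versioned base missing=0
  for home in "$@"; do
    lib_dir="${home}/lib"
    if [ ! -d "$lib_dir" ]; then
      # The quoted guard would "continue" silently here; report it instead.
      echo "warning: ${lib_dir} does not exist" >&2
      missing=1
      continue
    fi
    for versioned in "$lib_dir"/*.so.*; do
      [ -e "$versioned" ] || continue     # glob matched nothing
      base="${versioned%.so.*}.so"        # libcusolverMp.so.0 -> libcusolverMp.so
      # Relative target keeps the link valid if the tree is relocated.
      [ -e "$base" ] || [ -L "$base" ] || ln -s "$(basename "$versioned")" "$base"
    done
  done
  return "$missing"
}
```

With the layout the review suggests, a call such as make_unversioned_stubs "${SITE_PACKAGES}/nvidia/cublasmp/cu${CUDA_MAJOR}" "${SITE_PACKAGES}/nvidia/cusolvermp/cu${CUDA_MAJOR}" || exit 1 would stop the build on a wrong CUSOLVERMP_HOME. Without that check, the build would go on to produce a wheel that silently lacks cuSolverMP.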
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Force-pushed from 522c631 to df140b3.
# Enable optional build features. cuSolverMp is provided by the build image
# (see Dockerfile.x86 / Dockerfile.aarch), which also sets CUSOLVERMP_HOME.
export NVTE_WITH_CUSOLVERMP=1
Three of the four advertised flags never get exported
The PR description and title claim to enable NVTE_WITH_CUSOLVERMP, NVTE_WITH_CUBLASMP, NVTE_ENABLE_NVSHMEM, and NVTE_UB_WITH_MPI in the wheel build, but only NVTE_WITH_CUSOLVERMP is exported here. None of NVTE_WITH_CUBLASMP, NVTE_ENABLE_NVSHMEM, or NVTE_UB_WITH_MPI is exported in build_wheels.sh, and neither Dockerfile installs the corresponding packages (cuBLASMP, NVSHMEM, OpenMPI). Wheels built from this script will therefore silently omit those three features.
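One way to keep this gap from being silent is a pre-build report of the four flags named above. A minimal sketch, assuming a flag counts as enabled only when it equals 1 (the value build_wheels.sh exports). How TE's build actually parses other values is not shown here, and report_nvte_flags is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check: print on/off for each optional NVTE_* flag.
report_nvte_flags() {
  local flag
  for flag in NVTE_WITH_CUSOLVERMP NVTE_WITH_CUBLASMP NVTE_ENABLE_NVSHMEM NVTE_UB_WITH_MPI; do
    # ${!flag:-0}: indirect expansion of the variable named by $flag, default 0.
    if [ "${!flag:-0}" = "1" ]; then
      printf '%s=on\n' "$flag"
    else
      printf '%s=off\n' "$flag"
    fi
  done
}
```

Running report_nvte_flags just before python setup.py bdist_wheel would put the enabled feature set into the build log. This makes a mismatch with the PR description visible.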
On second thought, we need the nvidia-cusolvermp-cu12/cu13 dependency at runtime, not just when building TE/Common.
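The comment above makes nvidia-cusolvermp-cu12/cu13 a runtime dependency, so an install-time smoke test could check that the matching package is present. This is a sketch with hypothetical helper names. It relies only on pip show exiting non-zero for a package that is not installed. It does not verify that TE can actually load the library.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CUDA major version to the runtime pip package
# named in the comment above.
cusolvermp_runtime_pkg() {
  case "$1" in
    12|13) printf 'nvidia-cusolvermp-cu%s\n' "$1" ;;
    *) echo "unsupported CUDA major: $1" >&2; return 1 ;;
  esac
}

# Succeeds only if the matching package is installed in the given Python env.
check_cusolvermp_runtime() {
  local major="$1" python="${2:-python3}" pkg
  pkg=$(cusolvermp_runtime_pkg "$major") || return 1
  "$python" -m pip show "$pkg" >/dev/null 2>&1
}
```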
Description
#2706 added a distributed Newton-Schulz matrix orthogonalization API via cuSolverMp; this PR brings the same support to the published wheels.
Type of change
Changes
Enable NVTE_WITH_CUSOLVERMP in the TE build via PyPI wheel.
Checklist: