Refactor torchada patching into focused compatibility modules#67

Open
yeahdongcn wants to merge 1 commit into main from xd/refactoring

@yeahdongcn
Collaborator

Summary

  • Split the monolithic patching internals into focused device, CUDA facade, accelerator, ctypes, runtime, platform, and C++ ops
    compatibility helpers.
  • Expand CUDA-shaped compatibility for public CUDA aliases, random/NCCL/MCCL modules, runtime symbol translation, accelerator fallbacks,
    memory APIs, and CUDA device wrappers.
  • Improve C++ extension/source-porting support and keep C++/CUDA compatibility comments consistent.
  • Replace the dense architecture image docs with concise README architecture text and add compatibility-gap documentation.
  • Add regression coverage for device translation, public aliases, runtime/CDLL mappings, accelerator fallbacks, C++ ops loading, and
    mapping rules.

Testing

  • docker exec -w /ws yeahdongcn1 python -m pytest tests/ --tb=short -q
    • 333 passed, 30 skipped
  • git diff --cached --check

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@yeahdongcn
Collaborator Author

@popsiclexu @froststeam Please help with testing on torch2.9 and review. Thanks!

@augmentcode

augmentcode Bot commented May 17, 2026

🤖 Augment PR Summary

Summary: Refactors torchada’s CUDA→MUSA patching into focused compatibility modules and expands CUDA-shaped API coverage.

Key changes:

  • Split monolithic `_patch.py` internals into `_device_compat`, `_cuda_compat`, `_accelerator_compat`, `_ctypes_compat`, and `_runtime` helpers.
  • Broadened `torch.cuda` surface on MUSA: public aliases, `nccl`→`mccl`, `cudart()` translation, and shims for build/debug introspection.
  • Added CUDA-shaped module aliases (`torch.cuda.streams`, `torch.cuda.sparse`, `torch.cuda.random`, `torch.cuda.nvtx`, etc.).
  • Refined C++ ops loading with source discovery and MTGPU arch flag selection.
  • Updated README/README_CN with concise architecture notes; added compatibility-gap documentation.
  • Added regression tests covering device translation, public aliases, runtime/CDLL mappings, accelerator fallbacks, and C++ ops loading.
Compatibility note: `torch.cuda.is_available()` and `torch.version.cuda` remain intentionally unpatched so downstream can detect native CUDA vs MUSA.

@augmentcode augmentcode Bot left a comment

Review completed. 2 suggestions posted.

return _CDLLWrapper(cdll_instance, name_str)
return cdll_instance

ctypes.CDLL = PatchedCDLL

@augmentcode augmentcode Bot May 17, 2026

`patch_ctypes_cdll()` replaces `ctypes.CDLL`, but callers using `ctypes.cdll.LoadLibrary` (whose `_dlltype` is captured at import time) may bypass this wrapper and lose symbol translation. Is that an intended compatibility gap?

Severity: medium
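
For context, the gap described above can be reproduced without loading any library. The sketch below uses a hypothetical `TracingCDLL` class as a stand-in for the PR's `PatchedCDLL`, and it inspects `LibraryLoader._dlltype`, which is a private CPython attribute, so treat it as an illustration of the behavior rather than the project's fix.

```python
import ctypes


class TracingCDLL(ctypes.CDLL):
    """Hypothetical stand-in for the PR's PatchedCDLL (not torchada code)."""


original_cdll = ctypes.CDLL

# Simulate the patch: rebind the module-level name only.
ctypes.CDLL = TracingCDLL
try:
    # ctypes.cdll was built as LibraryLoader(CDLL) when ctypes was imported,
    # so it still holds the class that was current at that time.
    loader_bypasses_patch = ctypes.cdll._dlltype is not TracingCDLL

    # One possible remedy (an assumption, not taken from the PR): build a
    # loader around the patched class and expose that to callers instead.
    patched_loader = ctypes.LibraryLoader(TracingCDLL)
    loader_uses_patch = patched_loader._dlltype is TracingCDLL
finally:
    # Restore global state so the sketch has no lasting side effects.
    ctypes.CDLL = original_cdll

print(loader_bypasses_patch, loader_uses_patch)
```

Whether to also rebind `ctypes.cdll` is a design choice: it closes the gap for `LoadLibrary` callers, but it widens the set of globals the patch mutates.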

elif name in self._REMAP_ATTRS:
value = getattr(self._musa_module, self._REMAP_ATTRS[name])
else:
value = getattr(self._musa_module, name)

@augmentcode augmentcode Bot May 17, 2026

If `getattr(self._musa_module, name)` raises, the resulting `AttributeError` message will mention `torch.musa` rather than `torch.cuda`, which can be confusing for downstream debugging/handlers. You may want to ensure missing attributes raise a `torch.cuda`-shaped error message.

Severity: low
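
A minimal sketch of one way to do this, using a stand-in module instead of the real `torch.musa`. The `CudaFacade` class, its `public_name` parameter, and the single `nccl` to `mccl` remap entry are illustrative assumptions; only the `_REMAP_ATTRS` and `_musa_module` names come from the snippet above.

```python
import types


class CudaFacade:
    """Hypothetical facade that forwards attribute lookups to a backend module."""

    # Illustrative remap table; the real table in the PR may differ.
    _REMAP_ATTRS = {"nccl": "mccl"}

    def __init__(self, musa_module, public_name="torch.cuda"):
        self._musa_module = musa_module
        self._public_name = public_name

    def __getattr__(self, name):
        # Only called when normal lookup fails, so instance fields are safe here.
        target = self._REMAP_ATTRS.get(name, name)
        try:
            return getattr(self._musa_module, target)
        except AttributeError:
            # Re-raise with the public module name; "from None" hides the
            # backend-specific error from the traceback.
            raise AttributeError(
                f"module {self._public_name!r} has no attribute {name!r}"
            ) from None


# Stand-in backend module for demonstration only.
fake_musa = types.ModuleType("torch.musa")
fake_musa.mccl = "mccl-module"
fake_musa.device_count = lambda: 2

facade = CudaFacade(fake_musa)
try:
    facade.missing_api
except AttributeError as exc:
    error_message = str(exc)
```

Suppressing the chained exception keeps tracebacks CUDA-shaped; dropping `from None` would preserve the backend error for debugging at the cost of exposing `torch.musa` again.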

@popsiclexu
Contributor

> @popsiclexu @froststeam Please help with testing on torch2.9 and review. Thanks!

=================================================================================================================== short test summary info ====================================================================================================================
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_original_accelerator_takes_precedence_over_musa - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_musa_overrides_take_precedence_when_both_exist - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_fallback_to_musa_when_accelerator_missing - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_override_takes_precedence_over_everything - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_missing_everywhere_raises_attribute_error - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_resolved_attribute_is_cached - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_dir_includes_attributes_from_both_modules - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_remap_used_when_accel_and_musa_lack_index_suffix_name - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_remap_not_used_when_accelerator_has_official_impl - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_remap_keys_listed_in_dir - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_cuda_patching.py::TestAcceleratorModuleWrapper::test_special_attrs_for_nested_lookups - ImportError: cannot import name '_AcceleratorModuleWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
FAILED tests/test_mappings.py::TestCDLLWrapper::test_cdll_wrapper_class_exists - ImportError: cannot import name '_CDLLWrapper' from 'torchada._patch' (/home/dist/zhenxue/github/torchada/src/torchada/_patch.py)
==================================================================================================== 12 failed, 336 passed, 15 skipped, 4 warnings in 3.21s ===================================================================================================
