MATE (MUSA AI Tensor Engine) is a centralized library for Generative AI workloads on MUSA. It provides high-performance Attention and GEMM operators, and compatibility wrappers for CUDA-oriented Python APIs.
- High-performance attention and GEMM operators for MUSA
- Compatibility wrappers for `flash_attn_3`, `sageattention`, `flash_mla`, and `deep_gemm`
- CLI tools for environment checks, configuration inspection, and replay
| Component | Requirement |
|---|---|
| MUSA Toolkit | 4.3.6 or later |
| TorchMUSA | 2.7 or later |
| Architecture | Pinghu (MP31) |
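The minimum versions above can be checked before building. The following is a minimal sketch, not part of MATE: the helper names are hypothetical, and the distribution name `torch_musa` is an assumption about how TorchMUSA is registered with pip. The MUSA Toolkit is not a Python package, so its version string would have to come from elsewhere and be passed to `meets_minimum` by hand.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.3.6' or '2.7.1+local' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '2.7' compares equal to '2.7.0'.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 'torch_musa' is an assumed distribution name; adjust to your environment.
    found = installed_version("torch_musa")
    if found is None:
        print("torch_musa is not installed")
    else:
        print("TorchMUSA", found, "OK" if meets_minimum(found, "2.7") else "too old")
```

This comparison only looks at the leading numeric components, so pre-release suffixes are ignored.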
Use these commands after the MUSA-enabled torch / torch_musa stack is
installed. Keep dependency resolution disabled for local builds so pip does not
replace that stack with upstream PyPI packages.
- Use `--no-build-isolation` for source installs.
- Use `--no-isolation` for wheel builds.
- Use `--no-deps` when installing local builds.
Source install:

    git clone https://github.com/MooreThreads/mate.git --recursive
    cd mate
    pip install --no-build-isolation --no-deps -e . -v

Wheel build:

    git clone https://github.com/MooreThreads/mate.git --recursive
    cd mate
    python -m build --wheel --no-isolation
    python -m pip install --no-deps dist/mate-*.whl

AOT wheel build:

    MATE_MUSA_ARCH_LIST=3.1 python -m mate.aot
    python -m build --wheel --no-isolation

Customize AOT coverage when needed:

    python -m mate.aot --attention-aot-level 0 --add-gemm true --add-moe false

- If the checkout was cloned without `--recursive`, run `git submodule update --init --recursive`.
- Do not let pip resolve and replace the MUSA PyTorch dependencies unless that is intentional.
- See docs/mate_cli.md for CLI extras and local wheel installation details.
- See docs/environment_variables.md for build and runtime environment variables.
MATE provides a command-line interface for configuration, debugging, diagnostics, and replay.
| Command | Purpose |
|---|---|
| `mate check` | Validate the runtime environment |
| `mate show-config` | Display installation and runtime configuration |
| `mate env` | Show relevant environment variables |
| `mate replay --dir PATH` | Replay API calls from Level 10 dumps |
| `mate list-dumps PATH` | List recorded dump directories |
Example:

    mate check
    mate show-config
    mate env
    mate replay --dir mate_dumps/
    mate list-dumps mate_dumps/

See docs/mate_cli.md for full CLI documentation. See docs/environment_variables.md for the complete environment variable reference.
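For scripted diagnostics (for example in CI), the CLI commands above can be driven from Python. This is a minimal sketch using only the standard library; `run_mate` is a hypothetical helper, not a MATE API, and it makes no assumption about what a given exit code means beyond reporting it.

```python
import shutil
import subprocess


def run_mate(*args, executable="mate"):
    """Run a `mate` subcommand and return (exit_code, combined_output).

    Returns (None, message) when the executable cannot be found on PATH.
    """
    path = shutil.which(executable)
    if path is None:
        return None, f"{executable!r} not found on PATH"
    result = subprocess.run([path, *args], capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    # Subcommands taken from the command table above.
    for sub in (["check"], ["show-config"], ["env"]):
        code, output = run_mate(*sub)
        print("$ mate", " ".join(sub), "-> exit code", code)
        print(output)
```

Capturing stdout and stderr together keeps the full diagnostic output available for attaching to a bug report.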
MATE uses the packages under wrappers/ as a compatibility layer for CUDA-oriented software stacks on MUSA. These wrappers preserve familiar package names and high-level APIs while routing execution to MATE operators and kernels on MUSA, which helps existing integrations migrate with smaller code changes.
| Wrapper | Package | Import Path | Purpose | Documentation |
|---|---|---|---|---|
| `wrappers/flash-attention` | `flash_attn_3` | `flash_attn_interface` | FlashAttention-3-compatible APIs on top of MATE attention operators on MUSA | wrapper README, compatibility summary |
| `wrappers/SageAttention` | `sageattention` | `sageattention` | SageAttention-compatible dense quantized attention wrapper on top of MATE on MUSA | wrapper README |
| `wrappers/FlashMLA` | `flash_mla` | `flash_mla` | FlashMLA-compatible MLA dense/sparse decode and sparse prefill APIs on top of MATE MLA operators on MUSA | wrapper README |
| `wrappers/DeepGEMM` | `deep-gemm` | `deep_gemm` | DeepGEMM-compatible APIs on top of MATE GEMM operators on MUSA | wrapper README |
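Because the wrappers keep the upstream import paths, existing code can probe for them the same way it would on a CUDA stack. The sketch below is illustrative only: `available_wrappers` is a hypothetical helper, and it reports whether each import path from the table can be located without importing the module or touching the device.

```python
from importlib import util

# Package name -> import path, copied from the wrapper table above.
WRAPPER_IMPORT_PATHS = {
    "flash_attn_3": "flash_attn_interface",
    "sageattention": "sageattention",
    "flash_mla": "flash_mla",
    "deep-gemm": "deep_gemm",
}


def available_wrappers():
    """Map each wrapper package to True/False depending on whether its module is found."""
    return {
        package: util.find_spec(module) is not None
        for package, module in WRAPPER_IMPORT_PATHS.items()
    }


if __name__ == "__main__":
    for package, found in available_wrappers().items():
        print(f"{package:15s} {'installed' if found else 'missing'}")
```

Note that a module found this way may come from either a MATE wrapper or the upstream CUDA package; this check does not distinguish between them.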
| Path | Purpose |
|---|---|
| `mate/` | Core Python package and public APIs |
| `wrappers/` | Compatibility wrapper packages for existing Python ecosystems |
| `docs/` | Markdown docs and Sphinx sources |
| `tests/` | Correctness and integration tests |
| `benchmarks/` | Performance and benchmarking scripts |
After installing `mate`, build the Sphinx docs with:

    pip install sphinx furo
    cd docs
    make html

- CLI documentation: docs/mate_cli.md
- Environment variables: docs/environment_variables.md
- FlashAttention-3 compatibility summary: docs/flash_attention.md
- FlashAttention-3 wrapper: wrappers/flash-attention/README.md
- SageAttention wrapper: wrappers/SageAttention/README.md
- FlashMLA wrapper: wrappers/FlashMLA/README.md
- DeepGEMM wrapper: wrappers/DeepGEMM/README.md
MATE is inspired by FlashInfer, FlashAttention, CUTLASS, FlashMLA, and DeepGEMM.