We appreciate all contributions. If you are planning to contribute a bug fix for an open issue, please comment on the issue thread; we're happy to provide guidance. You are welcome to pick issues with the good first issue and help wanted labels to get started. A series of videos is also available to give you an overview of how PyTorch/XLA works under the hood.
If you plan to contribute new features or extensions to this repository, first open an issue and discuss the feature with us. Sending a PR without discussion might result in a rejected PR, because we might be taking the repository in a different direction.
Please follow the steps below in order:
To work on PyTorch/XLA, you'll need a powerful Linux machine with plenty of
CPUs and RAM. Make sure you have git and docker installed on this machine.
If you don't, follow https://github.com/git-guides/install-git and
https://docs.docker.com/engine/install/ to install them.
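Before going further, you can confirm these prerequisites with a short check like the one below. It is a sketch rather than a script from this repository. check_prereqs is a hypothetical helper. Because this guide names no minimum CPU or RAM size, the snippet only reports what the machine has. It relies on nproc and /proc/meminfo, which are Linux-specific.

```shell
# Hypothetical helper (not part of the repo): print each command that is not
# on PATH, one per line, and return non-zero if any are missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Report CPU count and total RAM; this guide asks for "plenty" but sets no minimum.
echo "CPUs: $(nproc)"
if [ -r /proc/meminfo ]; then
  awk '/^MemTotal:/ { printf "RAM: %.1f GiB\n", $2 / 1048576 }' /proc/meminfo
fi

# Check for git and docker, the two tools this guide requires.
if absent=$(check_prereqs git docker); then
  echo "git and docker are installed."
else
  echo "Missing, please install: $absent"
fi
```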
In order to create PRs later, we need to first fork the Git repos we'll be working with:
- Go to https://github.com/pytorch/pytorch and fork it as pytorch.
- Go to https://github.com/pytorch/vision and fork it as vision.
- Go to https://github.com/pytorch/xla and fork it as pytorch-xla. Note the change of project name: we want to avoid confusion with the OpenXLA project that PyTorch/XLA depends on, which is also named xla.
Next, we need to clone the forked repos locally so that we can make changes.
On your Linux machine, choose a directory as your workspace. Make sure that this directory and all of its ancestors are publicly readable. Then run the following commands on this machine:
# Make sure that all new files and directories are publicly readable.
# Otherwise you may have permission issues when building the code in bazel's
# sandbox mode.
umask 022
# Create the workspace directory if you haven't.
export WORKSPACE_DIR=<absolute-path-to-your-workspace>
mkdir -p $WORKSPACE_DIR
# Clone the repos.
cd $WORKSPACE_DIR
git clone --recursive git@github.com:<your-github-user-name>/pytorch.git
git clone --recursive git@github.com:<your-github-user-name>/vision.git
git clone --recursive git@github.com:<your-github-user-name>/pytorch-xla.git pytorch/xla
Since PR #9654, PyTorch/XLA pins a PyTorch version. The pinned
commit can be found in the .torch_commit file at the root of the PyTorch/XLA
repo. The pin guarantees that all PyTorch/XLA tests pass when the underlying
PyTorch is compiled at that specific commit. Therefore, especially for
development, we recommend compiling PyTorch at that commit. Otherwise you might
run into all kinds of errors, from build errors to segmentation faults. So,
make sure to check out that commit:
# Go to PyTorch directory.
cd $WORKSPACE_DIR/pytorch
# Retrieve the PyTorch commit pin inside PyTorch/XLA directory.
# Note: it's located in the last line of `.torch_commit`.
COMMIT=$(tail -1 "xla/.torch_commit")
# Create a branch (optional) and jump to that commit.
git checkout -b pin "$COMMIT"
From time to time, we'll need to bring our forked repos up to date with the official (also known as upstream) repos. Therefore, we'll need to tell Git where to find these upstream repos. We only need to do this once:
# Set up remote tracking for pytorch.
cd $WORKSPACE_DIR/pytorch
git remote add upstream https://github.com/pytorch/pytorch.git
# Set up remote tracking for vision.
cd $WORKSPACE_DIR/vision
git remote add upstream https://github.com/pytorch/vision.git
# Set up remote tracking for pytorch/xla.
cd $WORKSPACE_DIR/pytorch/xla
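git remote add upstream https://github.com/pytorch/xla.git
Next, create links in your workspace directory to the dev container, VS Code, and formatter configuration files:
cd $WORKSPACE_DIR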
ln -s pytorch/xla/.devcontainer/ .devcontainer
ln -s pytorch/xla/contrib/vscode/ .vscode
ln -s pytorch/xla/.style.yapf .style.yapf
ln -s pytorch/xla/.clang-format .clang-format
We recommend you use our prebuilt Docker image to start your development work using either VS Code or a local container:
WARNING: DO NOT run git commands that may change the repo's state inside
the container. Doing so will mess up the permissions of Git's internal files,
because you run as root inside the container. Instead, run all mutating git
commands on your Linux machine directly, outside of the container.
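If you want a safety net for this rule, a guard function along these lines can refuse to proceed when the shell looks like the dev container. It is a hypothetical helper and is not shipped with the repository. It treats running as root, or the presence of /.dockerenv (a file Docker commonly creates inside containers), as a sign that you are inside the container. That is a heuristic rather than a guarantee.

```shell
# Hypothetical guard (not part of the repo): return non-zero, with a warning,
# when the current shell runs as root or /.dockerenv exists, i.e. when it is
# probably the dev container.
host_git_ok() {
  if [ "$(id -u)" -eq 0 ] || [ -f /.dockerenv ]; then
    echo "Run mutating git commands on the host, not inside the container." >&2
    return 1
  fi
  return 0
}

# Example: only create the branch when the guard passes.
#   host_git_ok && git checkout -b my-branch
```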
- Start VS Code and ensure you have the Remote Development Extension Pack installed. It includes the Remote - SSH and Dev Containers extensions.
- From VS Code, connect to your remote Linux machine and open your workspace directory:
  - New Window > Connect to... > Connect to Host ... > type the remote address.
  - Open... > select the workspace directory on your remote machine.
  - When asked "Do you trust the authors of the files in this folder?", click on "Yes, I trust the authors".
  - When asked if you want to reopen in dev container, click on "Yes". If you are not prompted to reopen in a container, type Dev Containers: Reopen in Container in the VS Code command palette to open your workspace in one of our pre-built Docker containers.
  - Select the correct container based on the accelerators on your machine. Use tpu-contributor if you are unsure which one to use. If you're a Googler, use tpu-internal, which is set up for bazel remote build caching for faster builds.
- Make sure VS Code discovers the pytorch/xla repo so that diff highlighting works (by default VS Code cannot discover it, as it's nested inside the pytorch repo):
  - Go to File > Add Folder to Workspace..., and add the pytorch/xla folder.
  - In the repository list, you should now see 3 repos: xla (for pytorch/xla), pytorch, and vision.
- Open a new terminal window in VS Code. Since you are running as root in this container, mark the repository directories as safe. The commands below assume your workspace directory is named torch; if it isn't, update the paths to use your workspace directory.
    git config --global --add safe.directory /workspaces/torch/pytorch
    git config --global --add safe.directory /workspaces/torch/pytorch/xla
    git config --global --add safe.directory /workspaces/torch/vision
- In the terminal window, run the following commands to build PyTorch, TorchVision, and PyTorch/XLA:
    # Uninstall any existing torch, torch-xla, and torchvision installations.
    # Run multiple times if needed.
    pip uninstall torch torch-xla torchvision libtpu-nightly
    # pytorch/xla requires the pytorch wheel to be present under pytorch/dist.
    cd pytorch
    python setup.py bdist_wheel
    python setup.py install
    cd ../vision
    python setup.py develop
    cd ../pytorch/xla
    python setup.py develop
    # Optional: if you're using TPU, install libtpu.
    pip install torch_xla[tpu] \
      -f https://storage.googleapis.com/libtpu-wheels/index.html \
      -f https://storage.googleapis.com/libtpu-releases/index.html
    # Optional: if you're using custom kernels, install pallas dependencies.
    pip install --pre torch_xla[pallas] --index-url https://us-python.pkg.dev/ml-oss-artifacts-published/jax/simple/ --find-links https://storage.googleapis.com/jax-releases/libtpu_releases.html
- If you are running on a TPU VM, ensure torch and torch_xla were built and installed correctly:
    python -c 'import torch_xla; print(torch_xla.device())'
    # Output: xla:0
- Set up clangd so that C++ code completion/navigation works:
  - Install clangd: open any C++ source file in VS Code to trigger a prompt to install clangd in the dev container. Accept the request. Restart VS Code for the change to take effect.
  - Generate the compilation database so that clangd knows how to compile the C++ files:
      # Run this from a terminal in VS Code, in the pytorch/xla directory
      # of the workspace.
      scripts/update_compile_commands.py
    This should create the build/compile_commands.json file, which describes how each C++ source file is compiled. The script may take several minutes the first time. You may need to rerun the script if build rules or file structures have changed; subsequent runs are usually much faster.
- Subsequent builds: after building the packages from source code for the first time, you may need to build everything again, for example after a git pull. You can:
  - run scripts/build_developer.sh -b pytorch to rebuild PyTorch, TorchVision, and PyTorch/XLA, or
  - run scripts/build_developer.sh to rebuild PyTorch/XLA only.
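The three safe.directory commands for the dev container can also be written as a loop that skips directories already listed. Running it again then does not add duplicate entries to your global Git config. This is a sketch. WS is a variable introduced here, and it defaults to /workspaces/torch, the path used in those commands.

```shell
# Workspace path inside the dev container; /workspaces/torch matches the
# safe.directory commands above. Set WS first if your workspace has another name.
WS="${WS:-/workspaces/torch}"

for repo in pytorch pytorch/xla vision; do
  dir="$WS/$repo"
  # Add the entry only if it is not already listed, so reruns stay idempotent.
  if ! git config --global --get-all safe.directory | grep -Fxq "$dir"; then
    git config --global --add safe.directory "$dir"
  fi
done

# Show the resulting safe.directory entries.
git config --global --get-all safe.directory
```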
- Set up the development Docker image:
    docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:tpu
    docker run --privileged --network=host --name ptxla -it -d -e "TERM=xterm-256color" us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:tpu
    docker exec --privileged -it ptxla /bin/bash
  All of the commands below are assumed to run inside the Docker container.
- Clone the PyTorch repo:
    git clone --recursive https://github.com/pytorch/pytorch
    cd pytorch/
- Clone the PyTorch/XLA repo:
    git clone --recursive https://github.com/pytorch/xla.git
- Build PyTorch:
    # pytorch/xla requires the pytorch wheel to be present under pytorch/dist.
    python setup.py bdist_wheel
    python setup.py develop
- Build PyTorch/XLA:
    cd xla/
    python setup.py develop
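Both build paths rely on a PyTorch wheel being present under pytorch/dist before PyTorch/XLA is built. A pre-flight check like the one below can catch a missing wheel before the longer PyTorch/XLA build starts. It is a hypothetical helper. It assumes the wheel keeps the usual torch-*.whl file name that python setup.py bdist_wheel writes into dist/.

```shell
# Hypothetical pre-flight check (not part of the repo): succeed if at least one
# torch-*.whl file exists in <pytorch-dir>/dist.
has_torch_wheel() {
  local pytorch_dir="$1" whl
  for whl in "$pytorch_dir"/dist/torch-*.whl; do
    # With no match the glob stays unexpanded, so check that the file exists.
    if [ -e "$whl" ]; then
      return 0
    fi
  done
  echo "No torch wheel in $pytorch_dir/dist; run 'python setup.py bdist_wheel' there first." >&2
  return 1
}

# Usage, before building PyTorch/XLA:
#   has_torch_wheel path/to/pytorch && echo "OK to build PyTorch/XLA"
```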
In the pytorch/xla repo, we enforce coding style for both C++ and Python files.
Specifically, we use clang-format-11 with a customized style config to format
C++, and yapf (specifically version 0.40.2) with a customized style config
to format Python. Please ensure that your change is formatted properly before
sending out a PR.
The easiest way to do this is to set up a git push hook to automatically
format changed or added C++ and Python files before pushing:
First, install the necessary tools if needed:
cd $WORKSPACE_DIR/pytorch/xla
# If clang-format-11 is not yet installed...
sudo apt install clang-format-11
# If yapf 0.40.2 is not yet installed...
pip install yapf==0.40.2
Then, set up the git push hook:
scripts/git_fix.py --set_git_push_hook
Now, whenever you run git push, the C++ and Python files will be
automatically formatted according to our style guide.
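Conceptually, such a hook only has to look at the C++ and Python files your branch adds or modifies. The sketch below shows one way to list them with plain git. It illustrates the idea and does not describe how scripts/git_fix.py works internally. The function name, the list of file extensions, and the upstream/master default base are all assumptions.

```shell
# Hypothetical helper: list C++ and Python files added (A) or modified (M)
# on the current branch since it diverged from BASE (default: upstream/master).
changed_sources() {
  local base="${1:-upstream/master}"
  # "BASE...HEAD" diffs against the merge base, so upstream-only commits are ignored.
  git diff --name-only --diff-filter=AM "$base...HEAD" -- \
    '*.cpp' '*.cc' '*.h' '*.py'
}

# Example: preview how yapf would reformat the changed Python files.
#   changed_sources | grep '\.py$' | xargs -r yapf --diff
```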
You can also format the files manually by running:
scripts/git_fix.py
To run the tests, follow one of the options below:
- Run on local CPU:
    export PJRT_DEVICE=CPU
- Run on Cloud TPU:
    export PJRT_DEVICE=TPU
For more details on configuring the runtime, please refer to the PJRT runtime documentation.
If you plan to build from source, and are therefore using the latest PyTorch/TPU code base, we suggest selecting the Nightly builds when you create a Cloud TPU instance.
Then run test/run_tests.sh and test/cpp/run_tests.sh to verify the setup is working.
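If you switch between CPU and TPU often, a small wrapper can make the device choice explicit and run both suites in one go. This is a hypothetical sketch with a made-up function name. It sets PJRT_DEVICE to CPU only when the variable is unset, and it assumes you pass the path to your pytorch/xla checkout.

```shell
# Hypothetical wrapper (not part of the repo): run the Python and C++ test
# scripts from a pytorch/xla checkout. PJRT_DEVICE defaults to CPU if unset;
# the export also stays in effect in your current shell.
run_xla_tests() {
  local xla_dir="$1"
  export PJRT_DEVICE="${PJRT_DEVICE:-CPU}"
  echo "Running tests with PJRT_DEVICE=$PJRT_DEVICE"
  # Run from the repo root in a subshell; stop at the first failing suite.
  (cd "$xla_dir" && bash test/run_tests.sh && bash test/cpp/run_tests.sh)
}

# Usage:
#   run_xla_tests "$WORKSPACE_DIR/pytorch/xla"   # CPU unless PJRT_DEVICE is set
#   export PJRT_DEVICE=TPU                        # on a Cloud TPU VM
#   run_xla_tests "$WORKSPACE_DIR/pytorch/xla"
```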
- If local changes aren't visible, uninstall the existing pytorch/xla with pip uninstall torch_xla and pip uninstall torch, then rebuild PyTorch and PyTorch/XLA with python setup.py develop or python setup.py install.
- PJRT errors when running on TPU, such as "The PJRT plugin has PJRT API version 0.34. The framework PJRT API version is 0.40.", mean that you need to update your libtpu.so and ensure that its directory is in your LD_LIBRARY_PATH environment variable. You can download a new libtpu.so at Google Cloud, where the builds are sorted by date. Download the newest one and install it with pip install libtpu...whl.
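For the LD_LIBRARY_PATH part, a helper like the one below adds the directory containing libtpu.so only if that directory is not already listed. Repeated runs, for example from your shell profile, then leave the variable clean. It is a hypothetical sketch; pass whichever directory your libtpu.so actually lives in.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH unless it is
# already one of its colon-separated entries.
prepend_ld_library_path() {
  local dir="$1"
  case ":${LD_LIBRARY_PATH}:" in
    *":$dir:"*) ;;  # already present, nothing to do
    *) export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
}

# Usage:
#   prepend_ld_library_path "$(dirname /path/to/libtpu.so)"
```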
On your Linux machine (not inside the dev container), create a local branch, commit your local changes to it, and push the change to GitHub:
# Assuming that WORKSPACE_DIR is your workspace directory.
cd $WORKSPACE_DIR/pytorch/xla
git checkout -b my-branch
# ... make changes
git add foo/bar.cpp
git commit -m "Implement feature X."
# Push the committed local changes to GitHub.
# You only need to run the next line once.
git config --global push.autoSetupRemote true
git push
The last command will print a link for creating a PR. Open the link to create the PR.
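push.autoSetupRemote needs Git 2.37 or newer. On an older Git, git push --set-upstream origin HEAD does the same job for the current branch. The self-contained demo below shows that form. A throwaway local bare repository stands in for your GitHub fork, an assumption made only so the example runs offline. The demo then prints the upstream of the new branch.

```shell
# Throwaway demo: a local bare repo plays the role of your GitHub fork.
demo=$(mktemp -d)
git init -q --bare "$demo/fork.git"
git clone -q "$demo/fork.git" "$demo/work" 2>/dev/null
cd "$demo/work"

# Create a branch and commit a change, as in the steps above.
git checkout -q -b my-branch
echo "feature X" > feature.txt
git add feature.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "Implement feature X."

# Works on any Git version: push the current branch to a branch of the same
# name on origin and record it as the upstream.
git push -q --set-upstream origin HEAD

# Prints: origin/my-branch
git rev-parse --abbrev-ref 'my-branch@{upstream}'
```

Remove the demo afterwards with rm -rf "$demo".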
From time to time, you'll need to bring your forked repos up to date with
the upstream repos. You can do this either by using a convenience script,
or by manually running git commands.
This is the easiest way to update the forked repos. Please run the following commands on your Linux machine (not inside the dev container).
First, create a git sync-main alias to run the scripts/git_sync_main.py script
(you only need to do it once):
cd $WORKSPACE_DIR/pytorch/xla
git config alias.sync-main '!scripts/git_sync_main.py'
After that, you can (in the $WORKSPACE_DIR/pytorch/xla directory) run:
# Update the pytorch/xla repo.
git sync-main
# Update the vision and pytorch/xla repos.
git sync-main -b vision
# Update the pytorch, vision, and pytorch/xla repos.
git sync-main -b pytorch
# See the usage of the command.
git sync-main -h
You can also update the forked repos step by step, by running the following commands on your Linux machine (not inside the dev container).
First, for the pytorch repo:
cd $WORKSPACE_DIR/pytorch
# Fetch the latest changes from upstream.
git fetch upstream
git checkout main
# Merge the changes from upstream/main into your local branch.
git merge upstream/main
# Update submodules to match the latest changes.
git submodule update --recursive
# Push the updated branch to your fork on GitHub.
git push origin main
Next, for the vision repo:
cd $WORKSPACE_DIR/vision
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
Finally, for the pytorch/xla repo (note that the primary branch is called
master instead of main in this repo):
cd $WORKSPACE_DIR/pytorch/xla
git fetch upstream
git checkout master
git merge upstream/master
git push origin master
While you work on a PR, other PRs may be merged into the upstream repo's default branch, and you may want to make sure your PR works with them. In this case, you'll want to rebase your commits on top of the upstream commits. You can do this by running:
cd $WORKSPACE_DIR/pytorch/xla
git checkout your-branch-name
# Update the remote-tracking branches for upstream.
git fetch upstream
# Rebase commits in your PR on top of the upstream master branch.
git rebase upstream/master
# If the above command fails due to merge conflicts, follow the error messages
# to resolve the conflicts.
# When you are done, push the updated branch to your fork on GitHub. This will
# update the PR.
git push --force-with-lease origin your-branch-name
Normally we run git commands outside of the dev container. If we run
a mutating git command inside the dev container, it may change the owner
of some files inside the .git directory to root, which will prevent us from
running git commands outside of the dev container. To fix this, run the
following commands outside of the dev container to fix the file owners:
sudo chown -R $USER .git
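To see whether any files are affected before or after running chown, you can list the files under .git that your user does not own. This sketch uses find's standard -user test, and not_owned_by_me is a hypothetical helper name. It calls id -un rather than reading $USER, because $USER is not set in every environment.

```shell
# Hypothetical helper: list files under <repo>/.git that are not owned by the
# current user (for example, files created as root in the dev container).
# Empty output means every file is owned by you.
not_owned_by_me() {
  local repo_dir="${1:-.}"
  find "$repo_dir/.git" ! -user "$(id -un)" -print
}

# Usage, on the host, from the repository root:
#   not_owned_by_me .
# If it lists anything, fix the owners as shown above:
#   sudo chown -R $USER .git
```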