fix(kill): use absolute path to ezpz binary in SSH command#132
fix(kill): use absolute path to ezpz binary in SSH command#132
Conversation
`ezpz kill --all-nodes` failed on every remote node with:
[<node>] FAILED (exit 127): zsh:1: command not found: ezpz
Root cause: SSH command-mode invocations (`ssh node "<cmd>"`) don't
source the user's .zshrc/.bashrc, so PATH only includes the system
defaults. The venv's bin/ where `ezpz` lives isn't on the remote
PATH, and the bare command name fails to resolve.
Fix: use sys.argv[0] (the absolute path to the locally-running
ezpz binary) for the remote command instead of the bare name
"ezpz". Every worker mounts the same venv at the same path
(Lustre/NFS), so the absolute path resolves identically everywhere.
Falls back to "ezpz" when sys.argv[0] is empty (defensive — should
never happen in practice).
Added a regression test that mocks sys.argv to /lus/.../venv/bin/ezpz
and asserts the remote command uses that absolute path. The two
existing _ssh_kill tests now also patch sys.argv so they don't
depend on however pytest happens to launch.
Reviewer's GuideUpdates SSH-based kill command to invoke ezpz via its absolute binary path derived from sys.argv[0], ensuring it resolves correctly on remote nodes whose SSH sessions lack the virtualenv path, and adds tests that patch sys.argv to assert this behavior and decouple tests from pytest’s invocation details. Sequence diagram for SSH kill using absolute ezpz binary pathsequenceDiagram
actor User
participant LocalEzpz as Local_ezpz_CLI
participant SSH as SSH_Client
participant RemoteEzpz as Remote_ezpz_CLI
User->>LocalEzpz: run ezpz kill --all-nodes
LocalEzpz->>LocalEzpz: build remote_cmd using sys.argv[0]
LocalEzpz->>SSH: ssh node remote_cmd
SSH->>RemoteEzpz: exec /path/to/venv/bin/ezpz kill --signal SIG
RemoteEzpz->>RemoteEzpz: perform kill logic
RemoteEzpz-->>SSH: returncode, stderr
SSH-->>LocalEzpz: returncode, stderr
LocalEzpz-->>User: aggregated results per node
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Relying directly on
sys.argv[0]assumes it is an absolute, executable path from a console script; consider normalizing it withos.path.abspathand/or validating it exists before falling back to the bareezpzname. - The implementation hard-codes the assumption that all workers mount the same venv path; if that assumption might change, consider making the remote ezpz path configurable (e.g., via an environment variable or option) instead of baking it into
sys.argv[0].
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Relying directly on `sys.argv[0]` assumes it is an absolute, executable path from a console script; consider normalizing it with `os.path.abspath` and/or validating it exists before falling back to the bare `ezpz` name.
- The implementation hard-codes the assumption that all workers mount the same venv path; if that assumption might change, consider making the remote ezpz path configurable (e.g., via an environment variable or option) instead of baking it into `sys.argv[0]`.
## Individual Comments
### Comment 1
<location path="tests/test_kill.py" line_range="215-224" />
<code_context>
+ @patch("sys.argv", ["/lus/.../venv/bin/ezpz"])
</code_context>
<issue_to_address>
**suggestion (testing):** Add a regression test for the fallback behaviour when sys.argv[0] is empty or sys.argv is falsy.
Currently the defensive path (falling back to "ezpz" when sys.argv is empty or sys.argv[0] is falsy) isn’t covered by tests. Please add a regression test that patches sys.argv to [] or [""], and asserts that the constructed remote command starts with bare "ezpz" while still including the expected flags. This will safeguard and document the behaviour for malformed argv environments.
Suggested implementation:
```python
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
@patch("sys.argv", [""])
@patch("subprocess.run")
def test_remote_falls_back_to_bare_ezpz_when_argv0_empty(self, mock_run):
"""Remote command falls back to bare 'ezpz' when sys.argv[0] is empty."""
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
# Exercise the same entry point as the other kill tests
# (this should construct and run the remote ssh command)
kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])
# The last positional argument to ssh should be the remote shell command
ssh_cmd = mock_run.call_args[0][0]
remote_cmd = ssh_cmd[-1]
assert remote_cmd.startswith("ezpz ")
assert "--signal TERM" in remote_cmd
assert "--dry-run" in remote_cmd
@patch("sys.argv", [])
@patch("subprocess.run")
def test_remote_falls_back_to_bare_ezpz_when_argv_empty(self, mock_run):
"""Remote command falls back to bare 'ezpz' when sys.argv is empty."""
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])
ssh_cmd = mock_run.call_args[0][0]
remote_cmd = ssh_cmd[-1]
assert remote_cmd.startswith("ezpz ")
assert "--signal TERM" in remote_cmd
assert "--dry-run" in remote_cmd
```
1. Ensure that `kill_main` is imported at the top of `tests/test_kill.py` and that it is the same entry point used by the existing tests (for example, `from ezpz.kill import main as kill_main`). If the existing tests invoke a different helper to construct the remote ssh command, update the two new tests to call that same helper instead.
2. If the remote ssh command is obtained from `mock_run` differently in your codebase (for example, if the remote command is not the last element of the list passed to `subprocess.run`, or if `shell=True` is used and the entire command is a single string), adjust the `ssh_cmd`/`remote_cmd` extraction accordingly to mirror `test_remote_uses_absolute_path_to_local_ezpz`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @patch("sys.argv", ["/lus/.../venv/bin/ezpz"]) | ||
| @patch("subprocess.run") | ||
| def test_remote_uses_absolute_path_to_local_ezpz(self, mock_run): | ||
| """Remote command uses sys.argv[0] (absolute path), not bare 'ezpz'. | ||
|
|
||
| SSH command-mode invocations don't source the user's shell rc, | ||
| so the venv's bin/ isn't on the remote PATH. The bare-name | ||
| form fails with 'command not found: ezpz' even when the venv | ||
| is mounted at the same path on the remote node. | ||
| """ |
There was a problem hiding this comment.
suggestion (testing): Add a regression test for the fallback behaviour when sys.argv[0] is empty or sys.argv is falsy.
Currently the defensive path (falling back to "ezpz" when sys.argv is empty or sys.argv[0] is falsy) isn’t covered by tests. Please add a regression test that patches sys.argv to [] or [""], and asserts that the constructed remote command starts with bare "ezpz" while still including the expected flags. This will safeguard and document the behaviour for malformed argv environments.
Suggested implementation:
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
@patch("sys.argv", [""])
@patch("subprocess.run")
def test_remote_falls_back_to_bare_ezpz_when_argv0_empty(self, mock_run):
"""Remote command falls back to bare 'ezpz' when sys.argv[0] is empty."""
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
# Exercise the same entry point as the other kill tests
# (this should construct and run the remote ssh command)
kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])
# The last positional argument to ssh should be the remote shell command
ssh_cmd = mock_run.call_args[0][0]
remote_cmd = ssh_cmd[-1]
assert remote_cmd.startswith("ezpz ")
assert "--signal TERM" in remote_cmd
assert "--dry-run" in remote_cmd
@patch("sys.argv", [])
@patch("subprocess.run")
def test_remote_falls_back_to_bare_ezpz_when_argv_empty(self, mock_run):
"""Remote command falls back to bare 'ezpz' when sys.argv is empty."""
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])
ssh_cmd = mock_run.call_args[0][0]
remote_cmd = ssh_cmd[-1]
assert remote_cmd.startswith("ezpz ")
assert "--signal TERM" in remote_cmd
assert "--dry-run" in remote_cmd- Ensure that
kill_mainis imported at the top oftests/test_kill.pyand that it is the same entry point used by the existing tests (for example,from ezpz.kill import main as kill_main). If the existing tests invoke a different helper to construct the remote ssh command, update the two new tests to call that same helper instead. - If the remote ssh command is obtained from
mock_rundifferently in your codebase (for example, if the remote command is not the last element of the list passed tosubprocess.run, or ifshell=Trueis used and the entire command is a single string), adjust thessh_cmd/remote_cmdextraction accordingly to mirrortest_remote_uses_absolute_path_to_local_ezpz.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 502cd7c49e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ezpz_bin = sys.argv[0] if sys.argv and sys.argv[0] else "ezpz" | ||
| remote_cmd = [ezpz_bin, "kill", "--signal", sig_name] |
There was a problem hiding this comment.
Preserve PATH fallback for remote ezpz invocation
Using sys.argv[0] unconditionally here can break --all-nodes in environments where the launcher path on the head node is not valid on workers (for example, symlinked wrappers, node-local venv paths, or different mount points), because the remote command no longer tries ezpz from the worker's PATH and will fail with exit 127 even when ezpz is installed there. This is a regression from the previous behavior for those setups; add a remote fallback path (or validate/normalize sys.argv[0] before forcing it).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR aims to fix ezpz kill --all-nodes failing on remote nodes due to ezpz not being found on non-interactive SSH command execution PATH, by constructing the remote command using the locally-running ezpz executable path and adding/adjusting tests around SSH command construction.
Changes:
- Update
_ssh_killto build the remote command starting withsys.argv[0](fallback to"ezpz"). - Update existing
_ssh_killunit tests to patchsys.argv. - Add a regression test asserting the remote command uses an absolute ezpz path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/ezpz/utils/kill.py |
Changes remote SSH command construction to use sys.argv[0] as the ezpz executable. |
tests/test_kill.py |
Adds/updates unit tests to assert the remote command uses an absolute ezpz path and to patch sys.argv for determinism. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Use the absolute path to the locally-running ezpz binary instead | ||
| # of the bare name `ezpz`. SSH command execution doesn't run an | ||
| # interactive shell, so the user's .zshrc/.bashrc PATH additions | ||
| # (e.g. the venv's bin/) aren't loaded — bare `ezpz` would resolve | ||
| # only if it's on the system PATH, which it usually isn't. The | ||
| # absolute path works because every worker mounts the same venv | ||
| # filesystem (Lustre/NFS) at the same path. | ||
| ezpz_bin = sys.argv[0] if sys.argv and sys.argv[0] else "ezpz" | ||
| remote_cmd = [ezpz_bin, "kill", "--signal", sig_name] |
| @patch("sys.argv", ["/lus/.../venv/bin/ezpz"]) | ||
| @patch("subprocess.run") | ||
| def test_remote_uses_absolute_path_to_local_ezpz(self, mock_run): | ||
| """Remote command uses sys.argv[0] (absolute path), not bare 'ezpz'. | ||
|
|
||
| SSH command-mode invocations don't source the user's shell rc, | ||
| so the venv's bin/ isn't on the remote PATH. The bare-name | ||
| form fails with 'command not found: ezpz' even when the venv | ||
| is mounted at the same path on the remote node. | ||
| """ | ||
| mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="") | ||
| kill_mod._ssh_kill("node01", None, "TERM", dry_run=False) | ||
| remote_cmd = mock_run.call_args[0][0][-1] | ||
| assert "/lus/.../venv/bin/ezpz" in remote_cmd | ||
| # And does NOT lead with the bare name. | ||
| assert not remote_cmd.startswith("ezpz ") |
Summary
`ezpz kill --all-nodes` failed on every remote node with:
```
[] FAILED (exit 127): zsh:1: command not found: ezpz
```
Root cause: SSH command-mode invocations (`ssh node ""`) don't
source the user's `.zshrc` / `.bashrc`, so PATH only includes the
system defaults. The venv's `bin/` where `ezpz` lives isn't on the
remote PATH, and the bare command name fails to resolve.
Fix
Use `sys.argv[0]` (the absolute path to the locally-running ezpz
binary) for the remote command instead of the bare name `ezpz`.
Every worker mounts the same venv at the same path (Lustre/NFS), so
the absolute path resolves identically everywhere.
Falls back to `ezpz` when `sys.argv[0]` is empty (defensive — should
never happen in practice).
Test plan
test asserts the remote command uses the absolute path from
`sys.argv[0]` instead of the bare name. Existing `_ssh_kill` tests
now also patch `sys.argv` so they don't depend on however pytest
happens to launch.
succeeds on every remote node (no `command not found: ezpz`).
Summary by Sourcery
Ensure remote ezpz kill invocations over SSH use a resolvable binary path and are covered by tests.
Bug Fixes:
Tests: