Skip to content

fix(kill): use absolute path to ezpz binary in SSH command#132

Open
saforem2 wants to merge 1 commit intomainfrom
fix/kill-ssh-path
Open

fix(kill): use absolute path to ezpz binary in SSH command#132
saforem2 wants to merge 1 commit intomainfrom
fix/kill-ssh-path

Conversation

@saforem2
Copy link
Copy Markdown
Owner

@saforem2 saforem2 commented May 5, 2026

Summary

`ezpz kill --all-nodes` failed on every remote node with:

```
[] FAILED (exit 127): zsh:1: command not found: ezpz
```

Root cause: SSH command-mode invocations (`ssh node ""`) don't
source the user's `.zshrc` / `.bashrc`, so PATH only includes the
system defaults. The venv's `bin/` where `ezpz` lives isn't on the
remote PATH, and the bare command name fails to resolve.

Fix

Use `sys.argv[0]` (the absolute path to the locally-running ezpz
binary) for the remote command instead of the bare name `ezpz`.
Every worker mounts the same venv at the same path (Lustre/NFS), so
the absolute path resolves identically everywhere.

Falls back to `ezpz` when `sys.argv[0]` is empty (defensive — should
never happen in practice).

Test plan

  • `pytest tests/test_kill.py` — 29 passing (was 28). New regression
    test asserts the remote command uses the absolute path from
    `sys.argv[0]` instead of the bare name. Existing `_ssh_kill` tests
    now also patch `sys.argv` so they don't depend on however pytest
    happens to launch.
  • On a real PBS allocation: `ezpz kill --all-nodes --dry-run`
    succeeds on every remote node (no `command not found: ezpz`).

Summary by Sourcery

Ensure remote ezpz kill invocations over SSH use a resolvable binary path and are covered by tests.

Bug Fixes:

  • Use the absolute path of the locally running ezpz binary in SSH commands so remote kills succeed even when the venv bin/ is not on PATH.

Tests:

  • Extend SSH kill tests to assert the absolute-path binary is used and patch sys.argv to avoid relying on the test runner invocation.

`ezpz kill --all-nodes` failed on every remote node with:

    [<node>] FAILED (exit 127): zsh:1: command not found: ezpz

Root cause: SSH command-mode invocations (`ssh node "<cmd>"`) don't
source the user's .zshrc/.bashrc, so PATH only includes the system
defaults. The venv's bin/ where `ezpz` lives isn't on the remote
PATH, and the bare command name fails to resolve.

Fix: use sys.argv[0] (the absolute path to the locally-running
ezpz binary) for the remote command instead of the bare name
"ezpz". Every worker mounts the same venv at the same path
(Lustre/NFS), so the absolute path resolves identically everywhere.

Falls back to "ezpz" when sys.argv[0] is empty (defensive — should
never happen in practice).

Added a regression test that mocks sys.argv to /lus/.../venv/bin/ezpz
and asserts the remote command uses that absolute path. The two
existing _ssh_kill tests now also patch sys.argv so they don't
depend on however pytest happens to launch.
Copilot AI review requested due to automatic review settings May 5, 2026 18:09
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented May 5, 2026

Reviewer's Guide

Updates SSH-based kill command to invoke ezpz via its absolute binary path derived from sys.argv[0], ensuring it resolves correctly on remote nodes whose SSH sessions lack the virtualenv path, and adds tests that patch sys.argv to assert this behavior and decouple tests from pytest’s invocation details.

Sequence diagram for SSH kill using absolute ezpz binary path

sequenceDiagram
    actor User
    participant LocalEzpz as Local_ezpz_CLI
    participant SSH as SSH_Client
    participant RemoteEzpz as Remote_ezpz_CLI

    User->>LocalEzpz: run ezpz kill --all-nodes
    LocalEzpz->>LocalEzpz: build remote_cmd using sys.argv[0]
    LocalEzpz->>SSH: ssh node remote_cmd
    SSH->>RemoteEzpz: exec /path/to/venv/bin/ezpz kill --signal SIG
    RemoteEzpz->>RemoteEzpz: perform kill logic
    RemoteEzpz-->>SSH: returncode, stderr
    SSH-->>LocalEzpz: returncode, stderr
    LocalEzpz-->>User: aggregated results per node
Loading

File-Level Changes

Change Details Files
Use absolute ezpz binary path in SSH kill command to avoid PATH issues on remote nodes.
  • Derive ezpz binary path from sys.argv[0], falling back to the bare 'ezpz' name if sys.argv is missing or empty.
  • Replace the previous remote command construction that used 'ezpz' with one that uses the resolved ezpz_bin path for the SSH command.
  • Document the rationale for using an absolute path in comments, noting SSH non-interactive shells do not load user PATH customizations.
src/ezpz/utils/kill.py
Harden and extend SSH kill tests to assert use of absolute ezpz path and avoid dependence on test runner invocation.
  • Patch sys.argv with a fake venv ezpz path in existing _ssh_kill tests so they assert against a known binary path instead of whatever pytest uses.
  • Add a new regression test that verifies the remote SSH command includes the absolute ezpz path from sys.argv[0] and does not start with the bare 'ezpz' command name.
  • Ensure subprocess.run is mocked to prevent real SSH execution and to capture the constructed remote command for assertions.
tests/test_kill.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • Relying directly on sys.argv[0] assumes it is an absolute, executable path from a console script; consider normalizing it with os.path.abspath and/or validating it exists before falling back to the bare ezpz name.
  • The implementation hard-codes the assumption that all workers mount the same venv path; if that assumption might change, consider making the remote ezpz path configurable (e.g., via an environment variable or option) instead of baking it into sys.argv[0].
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Relying directly on `sys.argv[0]` assumes it is an absolute, executable path from a console script; consider normalizing it with `os.path.abspath` and/or validating it exists before falling back to the bare `ezpz` name.
- The implementation hard-codes the assumption that all workers mount the same venv path; if that assumption might change, consider making the remote ezpz path configurable (e.g., via an environment variable or option) instead of baking it into `sys.argv[0]`.

## Individual Comments

### Comment 1
<location path="tests/test_kill.py" line_range="215-224" />
<code_context>
+    @patch("sys.argv", ["/lus/.../venv/bin/ezpz"])
</code_context>
<issue_to_address>
**suggestion (testing):** Add a regression test for the fallback behaviour when sys.argv[0] is empty or sys.argv is falsy.

Currently the defensive path (falling back to "ezpz" when sys.argv is empty or sys.argv[0] is falsy) isn’t covered by tests. Please add a regression test that patches sys.argv to [] or [""], and asserts that the constructed remote command starts with bare "ezpz" while still including the expected flags. This will safeguard and document the behaviour for malformed argv environments.

Suggested implementation:

```python
        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")

    @patch("sys.argv", [""])
    @patch("subprocess.run")
    def test_remote_falls_back_to_bare_ezpz_when_argv0_empty(self, mock_run):
        """Remote command falls back to bare 'ezpz' when sys.argv[0] is empty."""
        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")

        # Exercise the same entry point as the other kill tests
        # (this should construct and run the remote ssh command)
        kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])

        # The last positional argument to ssh should be the remote shell command
        ssh_cmd = mock_run.call_args[0][0]
        remote_cmd = ssh_cmd[-1]

        assert remote_cmd.startswith("ezpz ")
        assert "--signal TERM" in remote_cmd
        assert "--dry-run" in remote_cmd

    @patch("sys.argv", [])
    @patch("subprocess.run")
    def test_remote_falls_back_to_bare_ezpz_when_argv_empty(self, mock_run):
        """Remote command falls back to bare 'ezpz' when sys.argv is empty."""
        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")

        kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])

        ssh_cmd = mock_run.call_args[0][0]
        remote_cmd = ssh_cmd[-1]

        assert remote_cmd.startswith("ezpz ")
        assert "--signal TERM" in remote_cmd
        assert "--dry-run" in remote_cmd

```

1. Ensure that `kill_main` is imported at the top of `tests/test_kill.py` and that it is the same entry point used by the existing tests (for example, `from ezpz.kill import main as kill_main`). If the existing tests invoke a different helper to construct the remote ssh command, update the two new tests to call that same helper instead.
2. If the remote ssh command is obtained from `mock_run` differently in your codebase (for example, if the remote command is not the last element of the list passed to `subprocess.run`, or if `shell=True` is used and the entire command is a single string), adjust the `ssh_cmd`/`remote_cmd` extraction accordingly to mirror `test_remote_uses_absolute_path_to_local_ezpz`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread tests/test_kill.py
Comment on lines +215 to +224
@patch("sys.argv", ["/lus/.../venv/bin/ezpz"])
@patch("subprocess.run")
def test_remote_uses_absolute_path_to_local_ezpz(self, mock_run):
"""Remote command uses sys.argv[0] (absolute path), not bare 'ezpz'.

SSH command-mode invocations don't source the user's shell rc,
so the venv's bin/ isn't on the remote PATH. The bare-name
form fails with 'command not found: ezpz' even when the venv
is mounted at the same path on the remote node.
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add a regression test for the fallback behaviour when sys.argv[0] is empty or sys.argv is falsy.

Currently the defensive path (falling back to "ezpz" when sys.argv is empty or sys.argv[0] is falsy) isn’t covered by tests. Please add a regression test that patches sys.argv to [] or [""], and asserts that the constructed remote command starts with bare "ezpz" while still including the expected flags. This will safeguard and document the behaviour for malformed argv environments.

Suggested implementation:

        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")

    @patch("sys.argv", [""])
    @patch("subprocess.run")
    def test_remote_falls_back_to_bare_ezpz_when_argv0_empty(self, mock_run):
        """Remote command falls back to bare 'ezpz' when sys.argv[0] is empty."""
        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")

        # Exercise the same entry point as the other kill tests
        # (this should construct and run the remote ssh command)
        kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])

        # The last positional argument to ssh should be the remote shell command
        ssh_cmd = mock_run.call_args[0][0]
        remote_cmd = ssh_cmd[-1]

        assert remote_cmd.startswith("ezpz ")
        assert "--signal TERM" in remote_cmd
        assert "--dry-run" in remote_cmd

    @patch("sys.argv", [])
    @patch("subprocess.run")
    def test_remote_falls_back_to_bare_ezpz_when_argv_empty(self, mock_run):
        """Remote command falls back to bare 'ezpz' when sys.argv is empty."""
        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")

        kill_main(["--signal", "TERM", "--dry-run", "some-pattern"])

        ssh_cmd = mock_run.call_args[0][0]
        remote_cmd = ssh_cmd[-1]

        assert remote_cmd.startswith("ezpz ")
        assert "--signal TERM" in remote_cmd
        assert "--dry-run" in remote_cmd
  1. Ensure that kill_main is imported at the top of tests/test_kill.py and that it is the same entry point used by the existing tests (for example, from ezpz.kill import main as kill_main). If the existing tests invoke a different helper to construct the remote ssh command, update the two new tests to call that same helper instead.
  2. If the remote ssh command is obtained from mock_run differently in your codebase (for example, if the remote command is not the last element of the list passed to subprocess.run, or if shell=True is used and the entire command is a single string), adjust the ssh_cmd/remote_cmd extraction accordingly to mirror test_remote_uses_absolute_path_to_local_ezpz.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 502cd7c49e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/ezpz/utils/kill.py
Comment on lines +245 to +246
ezpz_bin = sys.argv[0] if sys.argv and sys.argv[0] else "ezpz"
remote_cmd = [ezpz_bin, "kill", "--signal", sig_name]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve PATH fallback for remote ezpz invocation

Using sys.argv[0] unconditionally here can break --all-nodes in environments where the launcher path on the head node is not valid on workers (for example, symlinked wrappers, node-local venv paths, or different mount points), because the remote command no longer tries ezpz from the worker's PATH and will fail with exit 127 even when ezpz is installed there. This is a regression from the previous behavior for those setups; add a remote fallback path (or validate/normalize sys.argv[0] before forcing it).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix ezpz kill --all-nodes failing on remote nodes due to ezpz not being found on non-interactive SSH command execution PATH, by constructing the remote command using the locally-running ezpz executable path and adding/adjusting tests around SSH command construction.

Changes:

  • Update _ssh_kill to build the remote command starting with sys.argv[0] (fallback to "ezpz").
  • Update existing _ssh_kill unit tests to patch sys.argv.
  • Add a regression test asserting the remote command uses an absolute ezpz path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/ezpz/utils/kill.py Changes remote SSH command construction to use sys.argv[0] as the ezpz executable.
tests/test_kill.py Adds/updates unit tests to assert the remote command uses an absolute ezpz path and to patch sys.argv for determinism.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ezpz/utils/kill.py
Comment on lines +238 to +246
# Use the absolute path to the locally-running ezpz binary instead
# of the bare name `ezpz`. SSH command execution doesn't run an
# interactive shell, so the user's .zshrc/.bashrc PATH additions
# (e.g. the venv's bin/) aren't loaded — bare `ezpz` would resolve
# only if it's on the system PATH, which it usually isn't. The
# absolute path works because every worker mounts the same venv
# filesystem (Lustre/NFS) at the same path.
ezpz_bin = sys.argv[0] if sys.argv and sys.argv[0] else "ezpz"
remote_cmd = [ezpz_bin, "kill", "--signal", sig_name]
Comment thread tests/test_kill.py
Comment on lines +215 to +230
@patch("sys.argv", ["/lus/.../venv/bin/ezpz"])
@patch("subprocess.run")
def test_remote_uses_absolute_path_to_local_ezpz(self, mock_run):
"""Remote command uses sys.argv[0] (absolute path), not bare 'ezpz'.

SSH command-mode invocations don't source the user's shell rc,
so the venv's bin/ isn't on the remote PATH. The bare-name
form fails with 'command not found: ezpz' even when the venv
is mounted at the same path on the remote node.
"""
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
kill_mod._ssh_kill("node01", None, "TERM", dry_run=False)
remote_cmd = mock_run.call_args[0][0][-1]
assert "/lus/.../venv/bin/ezpz" in remote_cmd
# And does NOT lead with the bare name.
assert not remote_cmd.startswith("ezpz ")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants