Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions superbench/benchmarks/micro_benchmarks/nvbandwidth.py
Original file line number Diff line number Diff line change
Expand Up @@ -190,10 +190,17 @@ def _process_raw_line(self, line, parse_status):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re_summary_pattern.match(line) is still evaluated twice — once in this if, then again on the line below to assign match. Copilot raised this in #discussion_r2878517701 and @guoshzhao explicitly asked for it to be addressed in #discussion_r3070415674; please collapse the two calls:

match = self.re_summary_pattern.match(line)
if match:
    value = match.group(2)
    ...

# Parse summary results
if self.re_summary_pattern.match(line):
value = self.re_summary_pattern.match(line).group(2)
test_name = parse_status['test_name']
match = self.re_summary_pattern.match(line)
value = match.group(2)
Comment on lines 192 to +194
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re_summary_pattern.match(line) is evaluated twice (once in the if and again to assign match). Consider assigning once (e.g., match = ...; if match:) to avoid duplicated work and keep the control flow simpler.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NJX-njx Could you please resolve the comments from Copilot? Thanks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the three new behaviors here (or match.group(1).lower() fallback, 'latency' in test_name inference, if test_name and benchmark_type guard) is exercised by the existing test suite — tests/data/nvbandwidth_results.log and test_nvbandwidth.py::test_nvbandwidth_result_parsing_real_output only cover the happy memcpy ... / memory latency ... header path. Without a regression test, a future refactor can silently undo the fix from issue #748.

Copilot already flagged this in #discussion_r2878517758; please add a unit test using a synthetic raw output that triggers each new path. Minimal example exercising the device_to_device_latency_sm regression:

def test_nvbandwidth_sum_fallback_for_unknown_header(self):
    benchmark_name = 'nvbandwidth'
    (cls, _) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, Platform.CUDA)
    benchmark = cls(benchmark_name, parameters='')
    assert benchmark._preprocess()

    raw_output = (
        'Running device_to_device_latency_sm.\n'
        'Device to Device Latency SM GPU(row) <-> GPU(column) (ns)\n'
        '           0         1\n'
        ' 0    772.58    810.10\n'
        ' 1    810.10    772.58\n'
        '\n'
        'SUM device_to_device_latency_sm 3165.36\n'
    )
    assert benchmark._process_raw_result(0, raw_output)
    assert benchmark.result['device_to_device_latency_sm_sum_lat'][0] == 3165.36
    # \u2191 with the suggested regex broadening (see [comment on L198](https://github.com/microsoft/superbenchmark/pull/782#discussion_r3326258884)), per-cell metrics should also be present.

A second case covering the test_name == '' and benchmark_type is None drop path (asserting the SUM is skipped and a warning is logged) would also be valuable, especially if the logger.warning suggestion on L200 is adopted.

# Use test_name from parse_status, fallback to group(1) from SUM line
test_name = parse_status['test_name'] or match.group(1).lower()
benchmark_type = parse_status['benchmark_type']
parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
# Infer benchmark_type from test_name if not set (e.g., after waived tests)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fallback fixes the SUM symptom but masks the true root cause of device_to_device_latency_sm_sum_None from issue #748.

Upstream nvbandwidth prints the matrix header for this test as:

Device to Device Latency SM GPU(row) <-> GPU(column) (ns)

(see third_party/nvbandwidth/testcases_sm.cpp:141). That line does NOT start with memcpy or memory latency, so re_matrix_header_line at line 19 never fires for this test. As a consequence:

  1. parse_status['benchmark_type'] stays None.
  2. metrix_row / metrix_col are never extracted.
  3. The matrix-row branch at line 167 is guarded by parse_status['benchmark_type'] and therefore silently skips all per-cell results for device_to_device_latency_sm.

With this PR, the SUM is rescued via the 'latency' in test_name heuristic, but the per-cell device_to_device_latency_sm_gpu0_gpu1_lat (etc.) results are still lost — users will only see the SUM, not the matrix.

A more complete fix is to broaden the header detector itself, e.g.:

re_matrix_header_line = re.compile(r'^(memcpy|memory latency|Device to Device Latency)', re.IGNORECASE)

The existing branch 'bw' if 'bandwidth' in line else 'lat' already classifies correctly for both new and old header lines, so the fallback + inference here can stay as a defense-in-depth net while the regex change recovers the lost matrix data.

if benchmark_type is None:
benchmark_type = 'lat' if 'latency' in test_name else 'bw'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this guard drops a SUM line (either test_name is empty or benchmark_type is still falsy after inference), nothing is logged. That trades a malformed metric name for silent data loss, which makes the next regression (a new upstream output format change) hard to detect from a passing run.

Please emit a warning here so the drop is observable, e.g.:

if test_name and benchmark_type:
    parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
else:
    logger.warning(
        f'nvbandwidth: skipping SUM line, incomplete parse state '
        f'(test_name={test_name!r}, benchmark_type={benchmark_type!r}, line={line!r})'
    )

The warning also gives users a clear signal in the runner logs when a future nvbandwidth release adds a test whose header line doesn't match the existing detection patterns.

# Only add result when we have valid metric name (avoid _sum_None or sum_None)
if test_name and benchmark_type:
parse_status['results'][f'{test_name}_sum_{benchmark_type}'] = float(value)
Comment on lines +195 to +203
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fallback/inference paths for SUM parsing (using match.group(1) when parse_status['test_name'] is empty, and inferring benchmark_type when it's None) aren’t covered by existing nvbandwidth tests (current tests only exercise the full header+matrix path). Adding a unit test with a minimal raw output containing a SUM ... line but no prior matrix header (and/or no preceding Running ...) would help prevent regressions of the _sum_None metric-name bug.

Copilot uses AI. Check for mistakes.

# Reset parsing state for next test
parse_status['test_name'] = ''
Expand Down
Loading