
Fix pagerank convergence threshold for large sparse graphs (#1575) #1576

Open
pathway wants to merge 1 commit into Qiskit:main from pathway:fix/pagerank-convergence-threshold

pathway commented Apr 14, 2026

Summary

Fixes #1575: a silent bug in which `pagerank()` returns the initial uniform 1/N distribution on large sparse graphs instead of the actual PageRank.

Root Cause

The convergence check `norm < (n as f64) * tol` scales the L1 tolerance by graph size. The L1 distance between two probability vectors is at most 2. The check is therefore vacuous (always true) once N > 2/tol, which means N > 2,000,000 with the default tol=1e-6. It is far too loose well before that point. On a large sparse graph, the first power-iteration step barely moves most nodes, so its L1 diff from the uniform starting vector can fall below `n * tol`. In the 2000-node repro below, that diff is about 0.0017 while `n * tol` = 0.002. The algorithm then reports convergence and returns the initial uniform vector, with no error or warning.
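The first-step arithmetic can be checked by hand. The sketch below is plain Python, not rustworkx code. It assumes uniform teleportation and uniform redistribution of dangling mass, which is taken to match the defaults when no personalization or dangling weights are passed. It recomputes the first iterate's L1 change on the 2000-node repro graph and compares it against both thresholds.

```python
# Hypothetical check of the first power-iteration step on the repro graph:
# N nodes, edges 0->1->2, uniform start, uniform teleport and dangling mass (assumed).
N, alpha, tol = 2000, 0.85, 1e-6
u = 1.0 / N                     # uniform starting score of every node
dangling_sum = (N - 2) * u      # all nodes except 0 and 1 lack out-edges

# Node 0 and nodes 3..N-1 receive only teleport and dangling mass.
no_in_edges = alpha * dangling_sum / N + (1 - alpha) / N
# Nodes 1 and 2 additionally receive alpha * u from their predecessor.
on_path = no_in_edges + alpha * u

first_step_l1 = (N - 2) * abs(no_in_edges - u) + 2 * abs(on_path - u)

print(round(first_step_l1, 7))   # 0.0016983
print(first_step_l1 < N * tol)   # True: old rule stops here
print(first_step_l1 < tol)       # False: new rule keeps iterating
```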

Minimal repro (included as regression test):

```python
import rustworkx as rx

g = rx.PyDiGraph()
for _ in range(2000):
    g.add_node(None)
g.add_edge(0, 1, None)
g.add_edge(1, 2, None)

pr = rx.pagerank(g, alpha=0.85)
# Before this patch: pr[0] == pr[1] == pr[2] == 0.0005 (uniform 1/N)  ← BUG
# After this patch:  pr[2] ≈ 0.00128, pr[1] ≈ 0.00092, pr[0] ≈ 0.00050 ✓
```
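The expected values in the comment above follow from a short closed-form calculation. Uniform teleportation and uniform redistribution of dangling mass are assumed. Let c be the score of any node without in-edges (node 0 and nodes 3 through N-1), and note that nodes 0 and 1 each have out-degree 1:

```math
\begin{aligned}
x_i &= \alpha\Big(\sum_{j \to i} \frac{x_j}{d_j} + \frac{D}{N}\Big) + \frac{1-\alpha}{N}, \qquad D = 1 - x_0 - x_1,\\
x_0 &= x_k = c \quad (k \ge 3), \qquad x_1 = c + \alpha x_0 = (1+\alpha)\,c, \qquad x_2 = c + \alpha x_1 = (1+\alpha+\alpha^2)\,c,\\
1 &= (N-2)\,c + (1+\alpha)\,c + (1+\alpha+\alpha^2)\,c = (N + 2\alpha + \alpha^2)\,c.
\end{aligned}
```

With N = 2000 and α = 0.85, this gives c = 1/2002.4225 ≈ 4.994e-4, x_1 = 1.85c ≈ 9.239e-4 and x_2 = 2.5725c ≈ 1.2847e-3, about 2.57x uniform.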

Change

One-line fix: `norm < (n as f64) * tol` → `norm < tol`. The `tol` parameter is now an absolute L1 tolerance, matching the docstring ("error tolerance used when checking for convergence in the power method").
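The before/after behaviour can be reproduced without building rustworkx using a minimal NumPy re-implementation of unweighted power-iteration PageRank. This is a hypothetical sketch, not the rustworkx source. `power_iteration_pagerank` and its `scale_tol_by_n` flag are illustrative names. On convergence it returns the freshly computed iterate, as NetworkX does. The PR reports that rustworkx instead handed back the uniform start vector, so the buggy values differ in detail, but in both cases the scaled rule accepts after a single step.

```python
import numpy as np


def power_iteration_pagerank(n, edges, alpha=0.85, tol=1e-6, max_iter=100,
                             scale_tol_by_n=False):
    """Hypothetical unweighted power-iteration PageRank (not rustworkx code).

    Teleportation and dangling-node mass are both spread uniformly.
    scale_tol_by_n=True uses the old `norm < n * tol` stopping rule;
    otherwise the patched `norm < tol` rule is used.
    Returns (scores, number_of_iterations_run).
    """
    out_deg = np.zeros(n)
    for u, _ in edges:
        out_deg[u] += 1
    dangling = out_deg == 0
    threshold = n * tol if scale_tol_by_n else tol

    x = np.full(n, 1.0 / n)  # uniform starting vector
    for it in range(1, max_iter + 1):
        link_mass = np.zeros(n)
        for u, v in edges:
            link_mass[v] += x[u] / out_deg[u]
        dangling_sum = x[dangling].sum()
        x_new = alpha * (link_mass + dangling_sum / n) + (1 - alpha) / n
        norm = np.abs(x_new - x).sum()  # L1 distance between successive iterates
        x = x_new
        if norm < threshold:
            return x, it
    raise RuntimeError(f"no convergence within {max_iter} iterations")


edges = [(0, 1), (1, 2)]
old_pr, old_iters = power_iteration_pagerank(2000, edges, scale_tol_by_n=True)
new_pr, new_iters = power_iteration_pagerank(2000, edges)
print(old_iters)                    # 1: accepted after a single step
print(new_iters > old_iters)        # True
print(round(float(new_pr[2]), 7))  # 0.0012847
```

The two calls differ only in the threshold. The scaled rule stops at iteration 1, because the L1 change of about 0.0017 is below 0.002. The absolute rule runs a few more steps and lands on the converged value.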

Testing

Added test_sparse_large_graph_does_not_return_uniform in tests/digraph/test_pagerank.py. It builds the 2000-node, 2-edge graph from the repro and asserts:

  • pr[2] is at least 2x uniform (mass accumulates at the sink of the path)
  • pr[1] is at least 1.5x uniform (mass flows through it)
  • pr[0] is close to uniform (a source node with no in-edges, so it receives only teleport and dangling mass)
  • An isolated node (e.g. node 500) is close to uniform
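These four checks can be sanity-checked without rustworkx. The sketch below is a hypothetical mirror of them, not the actual test code. The 1% "close to uniform" tolerance is an assumption. The expected vector comes from the closed-form stationary solution of the repro graph, again assuming uniform teleport and dangling mass. The sketch checks that the correct answer passes every assertion while the reported uniform output fails.

```python
import numpy as np


def closed_form_pagerank(n, alpha=0.85):
    """Stationary PageRank of an n-node graph whose only edges are 0->1->2.

    Assumes uniform teleportation and uniform redistribution of dangling mass.
    """
    c = 1.0 / (n + 2 * alpha + alpha**2)  # score of every node with no in-edges
    pr = np.full(n, c)
    pr[1] = (1 + alpha) * c
    pr[2] = (1 + alpha + alpha**2) * c
    return pr


def regression_checks_pass(pr, rel_tol=0.01):
    """Hypothetical mirror of the four test assertions; rel_tol is an assumption."""
    uniform = 1.0 / len(pr)

    def close_to_uniform(value):
        return abs(value - uniform) <= rel_tol * uniform

    return bool(
        pr[2] >= 2 * uniform           # mass accumulates at the end of the path
        and pr[1] >= 1.5 * uniform     # mass flows through node 1
        and close_to_uniform(pr[0])    # source node
        and close_to_uniform(pr[500])  # isolated node
    )


n = 2000
print(regression_checks_pass(closed_form_pagerank(n)))  # True
print(regression_checks_pass(np.full(n, 1.0 / n)))      # False: the reported buggy output
```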

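All existing pagerank tests should continue to pass. For every N ≥ 1, the new threshold tol is no larger than the old N * tol. Wherever the old check stopped, the new check therefore stops at the same iteration or later. Each power-iteration step moves the iterate closer to the fixed point, so stopping later never gives a less accurate result. The trade-off is extra iterations. A graph that previously stopped early may now need more steps, so a call with a high alpha or a very small tol could reach max_iter where the old check returned early.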
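A standard contraction argument bounds how many extra iterations the tighter check can cost. Each power-iteration step multiplies the L1 distance between successive iterates by at most alpha, and that distance starts at no more than 2. As a result, `norm < tol` must hold once 2·alpha^(t-1) < tol. The helper below (a hypothetical name) finds that t. It suggests that with alpha=0.85 and tol=1e-6 the new check is always met within 91 iterations, which fits under an assumed default max_iter of 100. A much larger alpha can need far more.

```python
def worst_case_iterations(alpha, tol):
    """Smallest iteration count t with 2 * alpha**(t - 1) < tol.

    Contraction bound for column-stochastic PageRank (exact arithmetic):
    ||x_t - x_{t-1}||_1 <= alpha**(t - 1) * ||x_1 - x_0||_1 <= 2 * alpha**(t - 1),
    so an absolute L1 check `norm < tol` is guaranteed to pass by iteration t.
    """
    t = 1
    while 2 * alpha ** (t - 1) >= tol:
        t += 1
    return t


print(worst_case_iterations(0.85, 1e-6))        # 91
print(worst_case_iterations(0.99, 1e-6) > 100)  # True: high alpha can exceed max_iter
```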

Release Note

Added under releasenotes/notes/fix-pagerank-convergence-threshold-1575.yaml.

Notes

  • NetworkX's pure-Python pagerank uses the same err < N * tol formula. We haven't verified whether NetworkX exhibits the same failure mode (our environment was missing _bz2 so we couldn't run side-by-side), but if it does, that's worth reporting upstream there too. This rustworkx fix stands on its own either way.
  • Discovered while computing personalized PageRank on an 807K-node talent migration graph where rx.pagerank() silently returned uniform 1/N for all nodes.

The previous convergence check `norm < n * tol` scaled the L1 tolerance
by graph size, which made it a useless threshold once N > 2/tol (since
L1 distance between probability vectors is bounded by 2). On large
sparse graphs, the first power-iteration step's L1 diff from the uniform
starting vector could trivially fall below `n * tol`, causing `pagerank`
to return the initial uniform 1/N distribution without any indication of
failure.

Minimal reproduction: a 2000-node graph with 2 edges (path 0->1->2)
returns `pr[2] = 0.0005` (uniform) instead of the correct `pr[2] =
0.00128` (2.6x above uniform).

This patch changes the check to `norm < tol` (absolute L1 tolerance,
matching the docstring semantics) and adds a regression test.

Fixes Qiskit#1575
@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Development

Successfully merging this pull request may close these issues.

pagerank() silently returns uniform distribution on large sparse graphs due to N*tol convergence threshold