Commit d59f6ab

Fix pagerank convergence threshold for large sparse graphs
The previous convergence check `norm < n * tol` scaled the L1 tolerance by graph size. Because the L1 distance between two probability vectors is at most 2, the check is vacuous once N > 2/tol, and it is already far looser than the documented tolerance for smaller N. On large sparse graphs, the first power-iteration step's L1 diff from the uniform starting vector could fall below `n * tol`, so `pagerank` returned the initial uniform 1/N distribution without any indication of failure.

Minimal reproduction: a 2000-node graph with 2 edges (path 0->1->2) returns `pr[2] = 0.0005` (uniform) instead of the correct `pr[2] = 0.00128` (about 2.6x the uniform value).

This patch changes the check to `norm < tol` (absolute L1 tolerance, matching the docstring semantics) and adds a regression test.

Fixes #1575
1 parent 3a64257 commit d59f6ab
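
The reproduction in the commit message can be sketched in plain Python. This is a minimal illustration, not the rustworkx implementation: `pagerank_sketch` and its `scale_tol_by_n` flag are hypothetical names, and the sketch assumes a uniform starting vector, uniform teleportation, uniform redistribution of dangling mass, and that the previous iterate is returned when the check trips (consistent with the uniform result reported above).

```python
def pagerank_sketch(n, edges, alpha=0.85, tol=1e-6, max_iter=100, scale_tol_by_n=False):
    """Power iteration on an unweighted digraph; returns (ranks, iterations_run).

    Hypothetical helper for illustration only. `scale_tol_by_n=True` mimics the
    old check `norm < n * tol`; False mimics the patched check `norm < tol`.
    """
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)

    x = [1.0 / n] * n  # uniform starting vector
    threshold = n * tol if scale_tol_by_n else tol

    for iteration in range(1, max_iter + 1):
        # Mass sitting on nodes without outgoing edges is spread uniformly.
        dangling = sum(x[u] for u in range(n) if not out[u])
        base = (1.0 - alpha) / n + alpha * dangling / n
        new = [base] * n
        for u in range(n):
            if out[u]:
                share = alpha * x[u] / len(out[u])
                for v in out[u]:
                    new[v] += share

        norm = sum(abs(a - b) for a, b in zip(new, x))  # L1 distance
        if norm < threshold:
            # Assumption: stop without adopting `new`, so the previous
            # iterate is what the caller sees.
            return x, iteration
        x = new
    return x, max_iter


EDGES = [(0, 1), (1, 2)]  # path 0 -> 1 -> 2; the other 1997 nodes are isolated
old_ranks, old_iters = pagerank_sketch(2000, EDGES, scale_tol_by_n=True)
new_ranks, new_iters = pagerank_sketch(2000, EDGES, scale_tol_by_n=False)

print("old check:", old_iters, "iteration(s), pr[2] =", old_ranks[2])
print("new check:", new_iters, "iteration(s), pr[2] =", new_ranks[2])
```

Under these assumptions the first-step L1 diff is roughly 1.7e-3, which is below `n * tol = 2e-3` for N = 2000, so the scaled check stops immediately while the absolute check keeps iterating.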

3 files changed

Lines changed: 52 additions & 1 deletion

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+fixes:
+  - |
+    Fixed a silent convergence bug in :func:`~rustworkx.pagerank` that caused
+    the function to return the initial uniform ``1/N`` distribution on large
+    sparse graphs with no error. The convergence check ``norm < n * tol``
+    scaled the L1 tolerance by graph size, which (since L1 distance between
+    probability vectors is bounded by 2) rendered the threshold useless once
+    ``N > 2/tol``. On such graphs the first power-iteration step's L1 diff
+    from the uniform starting vector could trivially fall below ``n * tol``,
+    incorrectly reporting convergence. The threshold is now an absolute
+    ``norm < tol``, matching the docstring semantics. See
+    `#1575 <https://github.com/Qiskit/rustworkx/issues/1575>`__ for details.
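
The bound quoted in the release note can be checked with a few lines of arithmetic. The helper names below (`l1_dist`, `old_threshold`, `vacuous_size`) are hypothetical and exist only for this illustration; the default `tol = 1e-6` is taken from the source comment in this commit.

```python
def l1_dist(p, q):
    """L1 distance between two equally sized vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def old_threshold(n, tol=1e-6):
    """Threshold used by the old check `norm < n * tol`."""
    return n * tol


def vacuous_size(tol=1e-6):
    """Graph size above which `n * tol` exceeds the largest possible L1 distance."""
    return 2.0 / tol


# Two probability vectors with disjoint support are as far apart as possible.
worst_case = l1_dist([1.0, 0.0], [0.0, 1.0])

print("maximum L1 distance:", worst_case)
print("old threshold at N = 2000:", old_threshold(2000))
print("old check always passes above N =", vacuous_size())
```

With the default tolerance the old check only becomes unconditionally true for graphs of more than about two million nodes, but at N = 2000 it is already 2000 times looser than `tol`, which is enough for the sparse reproduction to stop after one step.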

src/link_analysis.rs

Lines changed: 8 additions & 1 deletion
@@ -199,7 +199,14 @@ pub fn pagerank(
         let new_popularity =
             alpha * ((&a * &popularity) + (dangling_sum * &dangling_weights)) + &damping;
         let norm: f64 = new_popularity.l1_dist(&popularity).unwrap();
-        if norm < (n as f64) * tol {
+        // The L1 distance between two probability vectors is bounded by 2, so
+        // `(n as f64) * tol` is always satisfied once N > 2/tol (e.g.
+        // N > 2_000_000 with the default tol = 1e-6) and is far too loose well
+        // below that. On large sparse graphs the first power-iteration step's
+        // L1 diff from the uniform starting vector can fall below `n * tol`,
+        // causing the old check to return the initial uniform vector and
+        // report convergence silently. See https://github.com/Qiskit/rustworkx/issues/1575
+        if norm < tol {
             has_converged = true;
             break;
         } else {

tests/digraph/test_pagerank.py

Lines changed: 31 additions & 0 deletions
@@ -321,3 +321,34 @@ def test_multi_digraph_versus_weighted(self):
 
         for v in multi_graph.node_indices():
             self.assertAlmostEqual(ranks_multi[v], ranks_weight[v], delta=1.0e-4)
+
+    def test_sparse_large_graph_does_not_return_uniform(self):
+        """Regression test for #1575.
+
+        On a large graph with very few active edges, the first power-iteration
+        step's L1 diff from the uniform starting vector can be very small.
+        The old convergence check `norm < n * tol` would trip on iteration 0
+        (because `n * tol` grows with graph size) and return the uniform
+        initial vector, silently corrupting results.
+
+        This test builds a 2000-node graph with only 2 edges (path 0->1->2)
+        and verifies that pagerank returns non-uniform scores, specifically
+        that node 2 has a higher score than node 0 (since mass flows 0->1->2).
+        """
+        graph = rustworkx.PyDiGraph()
+        for _ in range(2000):
+            graph.add_node(None)
+        graph.add_edge(0, 1, None)
+        graph.add_edge(1, 2, None)
+
+        ranks = rustworkx.pagerank(graph, alpha=0.85)
+
+        uniform_value = 1.0 / 2000
+        # Node 2 should have meaningfully higher PR than uniform
+        self.assertGreater(ranks[2], uniform_value * 2.0)
+        # Node 1 should also have higher PR than uniform (mass flows through it)
+        self.assertGreater(ranks[1], uniform_value * 1.5)
+        # Node 0 (source of the path, no incoming edges) should be near uniform
+        self.assertAlmostEqual(ranks[0], uniform_value, delta=uniform_value * 0.5)
+        # An arbitrary isolated node should be near uniform
+        self.assertAlmostEqual(ranks[500], uniform_value, delta=uniform_value * 0.1)
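
The thresholds in the regression test can be sanity-checked against a closed-form fixed point for this particular graph. The sketch below is an illustration under stated assumptions (uniform teleportation and uniform redistribution of dangling mass, no personalization); the variable names are hypothetical. At the fixed point every node without incoming edges has the same score `b`, node 1 has `b * (1 + alpha)`, and node 2 has `b * (1 + alpha + alpha**2)`; `b` follows from the scores summing to 1.

```python
ALPHA = 0.85
N = 2000

# Score of each path node relative to the common baseline b.
relative = {
    0: 1.0,                       # no incoming edges
    1: 1.0 + ALPHA,               # baseline plus alpha * score(0)
    2: 1.0 + ALPHA + ALPHA ** 2,  # baseline plus alpha * score(1)
}

# The remaining N - 3 nodes are isolated and sit at the baseline.
b = 1.0 / ((N - 3) + sum(relative.values()))

expected = {node: weight * b for node, weight in relative.items()}
uniform = 1.0 / N
ratios = {node: value / uniform for node, value in expected.items()}

for node in sorted(expected):
    print(f"node {node}: score={expected[node]:.6f}, ratio to uniform={ratios[node]:.3f}")
```

Under these assumptions node 2 lands at about 2.57 times the uniform value and node 1 at about 1.85 times, so the `2.0` and `1.5` multipliers in the test leave a margin without being trivially satisfied by a uniform result.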
