Commit e0ffd3d

fix(test): widen hallucination detection tolerance (#809) (#810)
Logprob-derived scores drift by about 0.036 across runs because inference is non-deterministic. Widen the tolerance from abs=3e-2 to abs=5e-2 to absorb that jitter while still catching real regressions.
1 parent 8860f64 commit e0ffd3d

1 file changed

Lines changed: 2 additions & 2 deletions

test/stdlib/components/intrinsic/test_rag.py

@@ -158,12 +158,12 @@ def test_hallucination_detection(backend):
     result = rag.flag_hallucinated_content(assistant_response, docs, context, backend)
     # pytest.approx() chokes on lists of records, so we do this complicated dance.
     for r, e in zip(result, expected, strict=True):  # type: ignore
-        assert pytest.approx(r, abs=3e-2) == e
+        assert pytest.approx(r, abs=5e-2) == e
 
     # Second call hits a different code path from the first one
     result = rag.flag_hallucinated_content(assistant_response, docs, context, backend)
     for r, e in zip(result, expected, strict=True):  # type: ignore
-        assert pytest.approx(r, abs=3e-2) == e
+        assert pytest.approx(r, abs=5e-2) == e
 
 
 @pytest.mark.qualitative
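
The effect of the wider tolerance can be illustrated with a small standalone sketch. It uses `math.isclose` with `rel_tol=0.0` as a stand-in for an absolute-only `pytest.approx(..., abs=...)` check. The score values and the `within` helper are hypothetical; only the ~0.036 drift and the two tolerances come from the commit message.

```python
import math

OLD_ABS = 3e-2  # tolerance before this commit
NEW_ABS = 5e-2  # tolerance after this commit


def within(actual: float, expected: float, abs_tol: float) -> bool:
    """Absolute-only comparison: |actual - expected| <= abs_tol."""
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol)


# Hypothetical expected score; not taken from the real test data.
expected = 0.80
jittered = expected + 0.036   # run-to-run drift quoted in the commit message
regressed = expected - 0.20   # hypothetical genuine regression

old_flaky = not within(jittered, expected, OLD_ABS)      # old bound rejects jitter
new_passes = within(jittered, expected, NEW_ABS)         # new bound absorbs jitter
still_caught = not within(regressed, expected, NEW_ABS)  # large shifts still fail

print(old_flaky, new_passes, still_caught)
```

Under these assumptions, a drift of 0.036 exceeds the old bound of 0.03 but fits inside the new bound of 0.05, while a shift of 0.2 still fails the check.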
