adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py by francesco-bertolotti · Pull Request #3014 · NVIDIA/TransformerEngine

francesco-bertolotti · 2026-05-20T05:00:21Z

PR split from #3013

I have added NVIDIA_TF32_OVERRIDE=0 to the test_numerics.py invocation; otherwise, tests fail because of small numerical mismatches with layer norms. The same has already been done for test_mhc.py.

greptile-apps · 2026-05-20T05:02:05Z

Greptile Summary

This PR adds NVIDIA_TF32_OVERRIDE=0 to the test_numerics.py invocation in test.sh, mirroring the same environment variable already set for test_mhc.py. The intent is to force full FP32 precision on Ampere and later GPUs, preventing TF32-induced numerical mismatches that were causing intermittent failures in layer norm tests.

NVIDIA_TF32_OVERRIDE=0 is appended to the existing set of determinism flags (PYTORCH_JIT=0, NVTE_TORCH_COMPILE=0, NVTE_ALLOW_NONDETERMINISTIC_ALGO=0, NVTE_FUSED_ATTN=0) already used for test_numerics.py.
The sibling test test_mhc.py (line 63) already uses the same flag and includes an inline comment explaining the rationale; no such comment was added alongside the new change.
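
As a minimal sketch of how the prefix form used in test.sh behaves: an assignment written in front of a command applies only to that command's environment and does not leak into later lines of the script. The `sh -c` child below is a stand-in for the pytest invocation and only echoes what it sees; nothing here touches a GPU.

```shell
# Make sure the variable is not inherited from the calling environment.
unset NVIDIA_TF32_OVERRIDE

# Prefix form, as in test.sh: the assignment is scoped to this one child process.
with_flag=$(NVIDIA_TF32_OVERRIDE=0 sh -c 'echo "${NVIDIA_TF32_OVERRIDE:-unset}"')

# A later command without the prefix does not see the variable.
without_flag=$(sh -c 'echo "${NVIDIA_TF32_OVERRIDE:-unset}"')

echo "with prefix:    $with_flag"
echo "without prefix: $without_flag"
```

This is why the flag has to be repeated on each invocation that needs it, rather than being set once for a neighbouring test.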

Confidence Score: 4/5

Safe to merge — the change is a one-line addition of a well-understood environment variable that forces FP32 precision, consistent with how other numerics-sensitive tests in the same script are already configured.

The change is minimal and follows an established pattern in the file. The only gap is a missing inline comment explaining the rationale, which the test_mhc.py line directly below already has. No functional risk is introduced.

No files require special attention. test_cuda_graphs.py shares the same determinism flags but does not get NVIDIA_TF32_OVERRIDE=0; this may be intentional but is worth a quick sanity check.
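
The sanity check mentioned above could be sketched roughly as follows. `list_missing_tf32_override` is a hypothetical helper, not part of the repository, and it assumes that NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 is a reasonable marker for the "determinism flags" group.

```shell
# Hypothetical helper: print lines of a script that set the determinism flag
# NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 but do not set NVIDIA_TF32_OVERRIDE.
# Prints nothing when every such line already carries the override.
list_missing_tf32_override() {
    grep 'NVTE_ALLOW_NONDETERMINISTIC_ALGO=0' "$1" | grep -v 'NVIDIA_TF32_OVERRIDE='
}

# Example usage (path as listed in this PR; adjust to your checkout):
# list_missing_tf32_override qa/L0_pytorch_unittest/test.sh
```

Any line it prints is only a candidate for review; whether a given test actually needs full FP32 precision is a separate question.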

Important Files Changed

Filename	Overview
qa/L0_pytorch_unittest/test.sh	Adds NVIDIA_TF32_OVERRIDE=0 to the test_numerics.py invocation to prevent TF32-induced numerical mismatches in layer norm tests; mirrors the same pattern already used for test_mhc.py but lacks the inline comment that explains the rationale there.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[test.sh] --> B[test_numerics.py NVIDIA_TF32_OVERRIDE=0 NEW]
    A --> C[test_cuda_graphs.py no NVIDIA_TF32_OVERRIDE]
    A --> D[test_mhc.py NVIDIA_TF32_OVERRIDE=0 + inline comment]
    B -->|disables TF32 on Ampere+| E[full FP32 precision]
    D -->|same effect| E


greptile-apps · 2026-05-20T05:02:09Z

 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_custom_recipe.xml $TE_PATH/tests/pytorch/test_custom_recipe.py || test_fail "test_custom_recipe.py"
 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_deferred_init.xml $TE_PATH/tests/pytorch/test_deferred_init.py || test_fail "test_deferred_init.py"
-PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 NVTE_FUSED_ATTN=0 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_numerics.xml $TE_PATH/tests/pytorch/test_numerics.py || test_fail "test_numerics.py"
+PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 NVTE_FUSED_ATTN=0 NVIDIA_TF32_OVERRIDE=0 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_numerics.xml $TE_PATH/tests/pytorch/test_numerics.py || test_fail "test_numerics.py"


The test_mhc.py line directly below carries an inline comment explaining why NVIDIA_TF32_OVERRIDE=0 is needed. Adding a similar comment here would keep the rationale self-documented and help future readers understand why this flag is required for numerical tests.

Suggested change

-PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 NVTE_FUSED_ATTN=0 NVIDIA_TF32_OVERRIDE=0 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_numerics.xml $TE_PATH/tests/pytorch/test_numerics.py || test_fail "test_numerics.py"
+# Disable TF32 path to fully align with the pytorch reference implementation's precision (avoids layer norm numerical mismatches on Ampere+)
+PYTORCH_JIT=0 NVTE_TORCH_COMPILE=0 NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 NVTE_FUSED_ATTN=0 NVIDIA_TF32_OVERRIDE=0 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_numerics.xml $TE_PATH/tests/pytorch/test_numerics.py || test_fail "test_numerics.py"

francesco-bertolotti · 2026-05-20T05:11:59Z

I do not know if it helps, but these are the tests that fail without the TF32 env flag:

FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-True-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-True-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-False-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[True-False-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-True-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-True-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-False-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [50] with -3.5490684509277344 vs -3.555588245391...
FAILED tests/pytorch/test_numerics.py::test_linear_accuracy[False-False-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [61] with 2.6603221893310547 vs 2.65149784088134...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=113. Maximum difference at location [0, 165] with 0.04553138092160225 vs 0.0458293...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-True-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[True-False-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-True-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-True-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-LayerNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-LayerNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=11. Maximum difference at location [0, 21] with 0.029382700100541115 vs 0.02909549...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-RMSNorm-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=56. Maximum difference at location [0, 209] with 0.0030347853899002075 vs 0.003319...
FAILED tests/pytorch/test_numerics.py::test_layernorm_linear_accuracy[False-False-False-RMSNorm-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=28. Maximum difference at location [1, 421] with -0.017969021573662758 vs -0.01827...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-True-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[True-False-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-True-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=14. Maximum difference at location [0, 99] with -0.07819076627492905 vs -0.1000889...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=7. Maximum difference at location [0, 99] with -0.07819075882434845 vs -0.10008895...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=105. Maximum difference at location [75] with -0.04723335802555084 vs -0.014243453...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=22. Maximum difference at location [0] with -1.3210333585739136 vs -1.299375057220...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=50. Maximum difference at location [100] with 0.18327626585960388 vs 0.38112670183...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=121. Maximum difference at location [1, 15] with -0.0071725500747561455 vs 0.01547...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=351. Maximum difference at location [77] with -0.04859239235520363 vs -0.076162673...
FAILED tests/pytorch/test_numerics.py::test_layernorm_mlp_accuracy[False-False-RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=263. Maximum difference at location [56] with 0.03622261434793472 vs 0.12370993196...
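
A long list like the one above can be easier to read when grouped by test function. The following is a rough sketch, assuming the pytest short summary has been saved to a file or is piped in; `summarize_failures` is a hypothetical helper, not something provided by pytest or this repository.

```shell
# Hypothetical helper: read pytest "FAILED ..." summary lines on stdin and
# print one "test_name: count" line per test function.
summarize_failures() {
    grep '^FAILED ' \
        | sed -e 's/^FAILED [^:]*:://' -e 's/\[.*//' -e 's/ - .*//' \
        | sort | uniq -c \
        | awk '{print $2 ": " $1}'
}

# Example usage (the log file name is an assumption):
# summarize_failures < pytest_output.txt
```

The first `sed` expression strips the file path, the second drops the parametrization suffix, and the third removes the error message for tests that have no parameters.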

adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py

117ede4

francesco-bertolotti mentioned this pull request May 20, 2026

mnnvl guard #3013

Open

francesco-bertolotti wants to merge 1 commit into NVIDIA:main from francesco-bertolotti:f14-tf32override
