Commit dbd432c

[BUG] race condition in OpenMLSplitTest when running tests in parallel (#1643)
## Problem

When running tests in parallel with pytest-xdist (e.g., `pytest -n 3 tests/test_tasks/test_split.py`), one test under OpenMLSplitTest fails intermittently with an EOFError during pickle.load(). This was identified in CI job/63346513831 and reproduces in roughly 1 out of 10 runs locally.

## Analysis

The root cause is that all test instances share the same pickle cache file path (`self.pd_filename`). When multiple workers run concurrently:

1. Worker A creates the pickle cache file during test execution
2. Worker B reads the pickle cache file
3. Worker A's tearDown() deletes the file
4. Worker B's pickle.load() encounters a partially deleted file → EOFError

This is a classic race condition on shared filesystem state.

## Solution

Use `tempfile.TemporaryDirectory()` to create a unique temporary directory for each test instance, then copy the ARFF source file there. This ensures:

- Each test worker has its own isolated pickle cache file
- No shared state between parallel workers
- Cleanup of the whole directory via `self._temp_dir.cleanup()` in tearDown()

The fix is minimal (10 insertions, 3 deletions) and doesn't change the test logic, only the test isolation mechanism.

## Benchmarks / Testing

Ran 5 consecutive parallel test executions:

```
pytest -n 4 tests/test_tasks/test_split.py  # 5 times
```

All 15 test runs (3 tests × 5 runs) passed. Before the fix, failures occurred in ~10% of parallel runs.

Fixes #1641
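For readers unfamiliar with the failure mode, the sketch below shows one way a reader can hit EOFError on a shared cache path: `pickle.load()` raises it when the file exists but holds no complete pickle yet. This does not reproduce the actual worker interleaving from the CI job; `load_cache`, `read_truncated_cache`, and `read_complete_cache` are hypothetical helpers, and the empty file merely stands in for a cache that another worker has created but not finished writing.

```python
import pickle
import tempfile
from pathlib import Path


def load_cache(path: Path):
    """Load a pickle cache the way a reading worker would (hypothetical helper)."""
    with path.open("rb") as fh:
        return pickle.load(fh)


def read_truncated_cache() -> str:
    """Return the name of the exception raised when the cache file is still empty."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "datasplits.pkl.py3"
        # Assumption: simulate a writer that created the file but wrote nothing yet.
        cache.touch()
        try:
            load_cache(cache)
        except EOFError as exc:
            return type(exc).__name__
        return "no error"


def read_complete_cache() -> dict:
    """Round-trip a fully written cache file for comparison."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "datasplits.pkl.py3"
        with cache.open("wb") as fh:
            pickle.dump({"repeats": 1}, fh)
        return load_cache(cache)
```

With a private cache path per test, no other worker can observe the file in this intermediate state.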
1 parent 7feb2a3 commit dbd432c

1 file changed

Lines changed: 10 additions & 3 deletions

tests/test_tasks/test_split.py

@@ -3,6 +3,8 @@
 
 import inspect
 import os
+import shutil
+import tempfile
 from pathlib import Path
 
 import numpy as np
@@ -19,7 +21,7 @@ def setUp(self):
         __file__ = inspect.getfile(OpenMLSplitTest)
         self.directory = os.path.dirname(__file__)
         # This is for dataset
-        self.arff_filepath = (
+        source_arff = (
             Path(self.directory).parent
             / "files"
             / "org"
@@ -29,13 +31,18 @@ def setUp(self):
             / "1882"
             / "datasplits.arff"
         )
+        # Use a unique temp directory for each test to avoid race conditions
+        # when running tests in parallel (see issue #1641)
+        self._temp_dir = tempfile.TemporaryDirectory()
+        self.arff_filepath = Path(self._temp_dir.name) / "datasplits.arff"
+        shutil.copy(source_arff, self.arff_filepath)
         self.pd_filename = self.arff_filepath.with_suffix(".pkl.py3")
 
     def tearDown(self):
+        # Clean up the entire temp directory
         try:
-            os.remove(self.pd_filename)
+            self._temp_dir.cleanup()
         except (OSError, FileNotFoundError):
-            # Replaced bare except. Not sure why these exceptions are acceptable.
             pass
 
     def test_eq(self):
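
A possible variation on the isolation pattern in this diff, not part of the commit: registering the directory cleanup with `unittest`'s `addCleanup()` instead of a try/except in tearDown(). The sketch below is a standalone illustration under stated assumptions. `IsolatedCopyTest` is a hypothetical class, not OpenMLSplitTest, and the placeholder fixture written in setUpClass() stands in for the repository's real `datasplits.arff`.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class IsolatedCopyTest(unittest.TestCase):
    """Hypothetical test case illustrating per-test isolation (not OpenMLSplitTest)."""

    @classmethod
    def setUpClass(cls):
        # Assumption: a stand-in for the read-only fixture checked into the repo.
        cls._fixture_dir = tempfile.TemporaryDirectory()
        cls.addClassCleanup(cls._fixture_dir.cleanup)  # requires Python 3.8+
        cls.source_arff = Path(cls._fixture_dir.name) / "datasplits.arff"
        cls.source_arff.write_text("% placeholder ARFF content\n")

    def setUp(self):
        temp_dir = tempfile.TemporaryDirectory()
        # Registered cleanups run after tearDown(), and also when setUp()
        # fails after this point, so tearDown() needs no try/except.
        self.addCleanup(temp_dir.cleanup)
        self.arff_filepath = Path(temp_dir.name) / "datasplits.arff"
        shutil.copy(self.source_arff, self.arff_filepath)
        # The derived pickle cache path lives in the same private directory.
        self.pd_filename = self.arff_filepath.with_suffix(".pkl.py3")

    def test_cache_path_is_private(self):
        self.assertNotEqual(self.arff_filepath.parent, self.source_arff.parent)
        self.assertEqual(self.pd_filename.name, "datasplits.pkl.py3")
        self.assertEqual(self.pd_filename.parent, self.arff_filepath.parent)
```

Either form gives each test its own cache file; the `addCleanup()` form only changes where the cleanup is registered.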
