Skip to content

Commit 7df7bd5

Browse files
authored
fix: prevent stdlib pyc files from invalidating runtime repos (bazel-contrib#3661)
The runtime repositories are being constantly invalidated due to pyc creation under Bazel 9 because, starting in Bazel 9, `glob()` functions implicitly register `repository_ctx.watch()` calls on the files and directories they match. Thus, the directories where `__pycache__` directories are created end up being considered changed (either directly because their mtimes change, or indirectly, because their directory listing changes), which then invalidates the repo, causing it to re-run. This glob-induced-watching seems to occur even if an `exclude` would have excluded the file. Note that this only seems to occur if `reproducible=False`, which generally wouldn't occur, but could occur if a user is registering their own runtime and doesn't care about the sha. Regardless, this still seems worthwhile because it allows pyc to be more safely be generated without causing repo invalidations, while allowing them to be persisted between repo-phase invocations. To fix, create `__pycache__` directories ahead of time and symlink them to a location that Bazel isn't watching, i.e. outside the repository's directory. I tried creating a separate top-level folder that wasn't matched by any globs and symlinking to it, but Bazel would read through the symlinks and watch the underlying locations. This also has a side-bonus that allows pyc files to be re-used in between repository-phase invocations. Fixes bazel-contrib#3643
1 parent 9369ac3 commit 7df7bd5

5 files changed

Lines changed: 196 additions & 19 deletions

File tree

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -74,6 +74,12 @@ Other changes:
7474
* (zipapp) Resolve issue passing through compression settings in
7575
`py_zippapp_binary` targets
7676
([#3646](https://github.com/bazel-contrib/rules_python/issues/3646)).
77+
* (toolchains) The pyc created at runtime in the stdlib should no longer
78+
cause the Python runtime repository to be invalidated. The stdlib pyc files
79+
_may_ be reused in between invocations, depending upon the sandboxing
80+
configuration. See the {any}`RULES_PYTHON_PYCACHE_DIR` environment variable
81+
for more information.
82+
([#3643](https://github.com/bazel-contrib/rules_python/issues/3643)).
7783

7884
{#v0-0-0-added}
7985
### Added

docs/environment-variables.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -116,6 +116,34 @@ Valid values:
116116
* Other non-empty values mean to use isolated mode.
117117
:::
118118

119+
:::{envvar} RULES_PYTHON_PYCACHE_DIR
120+
121+
Determines the directory that runtime-generated pyc cache files will
122+
be stored in.
123+
124+
This directory may be reused between invocations, depending on the sandboxing
125+
configuration. Setting it to `/dev/null` will, in effect, disable runtime
126+
pyc caching. By setting e.g.
127+
`--sandbox_add_mount_pair=/tmp/rules_python_pycache`, it's possible for pyc
128+
caching to persist across invocations.
129+
130+
**Behavior specific to downloaded runtimes:**
131+
First `RULES_PYTHON_PYCACHE_DIR` is checked. If set, it is used as-is for
132+
the root pycache directory.
133+
134+
Otherwise, the following environment variables are checked in the following
135+
order. Their values will have `rules_python_pycache` appended to them to form
136+
the root pycache directory:
137+
1. `XDG_CACHE_HOME`.
138+
2. `TMP` (non-Windows) or `TEMP` (Windows).
139+
3. The common platform-specific temporary directory (`/tmp` (non-Windows) or
140+
`C:\Temp` (Windows)).
141+
142+
If such a diretory cannot be found, or created, then `/dev/null` will be used,
143+
which will effectively disable pyc caching.
144+
145+
:::
146+
119147
:::{envvar} RULES_PYTHON_REPO_DEBUG
120148

121149
When `1`, repository rules will print debug information about what they're

python/private/hermetic_runtime_repo_setup.bzl

Lines changed: 24 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -58,30 +58,35 @@ def define_hermetic_runtime_toolchain_impl(
5858
"major": version_info.release[0],
5959
"minor": version_info.release[1],
6060
}
61+
files_include = [
62+
"bin/**",
63+
"extensions/**",
64+
"include/**",
65+
"libs/**",
66+
"share/**",
67+
]
68+
files_include += extra_files_glob_include
69+
files_exclude = [
70+
# Unused shared libraries. `python` executable and the `:libpython` target
71+
# depend on `libpython{python_version}.so.1.0`.
72+
"lib/libpython{major}.{minor}*.so".format(**version_dict),
73+
# static libraries
74+
"lib/**/*.a",
75+
# tests for the standard libraries.
76+
"lib/python{major}.{minor}*/**/test/**".format(**version_dict),
77+
"lib/python{major}.{minor}*/**/tests/**".format(**version_dict),
78+
# During pyc creation, temp files named *.pyc.NNN are created
79+
"**/__pycache__/*.pyc.*",
80+
]
81+
files_exclude += extra_files_glob_exclude
82+
6183
native.filegroup(
6284
name = "files",
6385
srcs = native.glob(
64-
include = [
65-
"bin/**",
66-
"extensions/**",
67-
"include/**",
68-
"libs/**",
69-
"share/**",
70-
] + extra_files_glob_include,
86+
include = files_include,
7187
# Platform-agnostic filegroup can't match on all patterns.
7288
allow_empty = True,
73-
exclude = [
74-
# Unused shared libraries. `python` executable and the `:libpython` target
75-
# depend on `libpython{python_version}.so.1.0`.
76-
"lib/libpython{major}.{minor}*.so".format(**version_dict),
77-
# static libraries
78-
"lib/**/*.a",
79-
# tests for the standard libraries.
80-
"lib/python{major}.{minor}*/**/test/**".format(**version_dict),
81-
"lib/python{major}.{minor}*/**/tests/**".format(**version_dict),
82-
# During pyc creation, temp files named *.pyc.NNN are created
83-
"**/__pycache__/*.pyc.*",
84-
] + extra_files_glob_exclude,
89+
exclude = files_exclude,
8590
),
8691
)
8792
cc_import(

python/private/python_repository.bzl

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,98 @@ def is_standalone_interpreter(rctx, python_interpreter_path, *, logger = None):
5252
logger = logger,
5353
).return_code == 0
5454

55+
def _get_pycache_root(rctx):
56+
"""Calculates and creates the pycache root directory.
57+
58+
Returns:
59+
{type}`path | None` The path to the pycache root, or None if it couldn't
60+
be created.
61+
"""
62+
os_name = repo_utils.get_platforms_os_name(rctx)
63+
is_windows = os_name == "windows"
64+
65+
# 1. RULES_PYTHON_PYCACHE_DIR
66+
res = rctx.getenv("RULES_PYTHON_PYCACHE_DIR")
67+
if res:
68+
res = res + "/" + rctx.name
69+
return repo_utils.mkdir(rctx, res)
70+
71+
# Suffix for cases 2-4
72+
# The first level directory is static and documented so that it is easy to
73+
# use with e.g. --sandbox_add_mount_pair=/tmp/rules_python_pycache
74+
suffix = "rules_python_pycache/{}/{}".format(hash(str(rctx.workspace_root)), rctx.name)
75+
76+
# 2. XDG_CACHE_HOME
77+
res = rctx.getenv("XDG_CACHE_HOME")
78+
if res:
79+
path = repo_utils.mkdir(rctx, rctx.path(res).get_child(suffix))
80+
if path:
81+
return path
82+
83+
# 3. TMP or TEMP
84+
res = rctx.getenv("TMP") or rctx.getenv("TEMP")
85+
if res:
86+
path = repo_utils.mkdir(rctx, rctx.path(res).get_child(suffix))
87+
if path:
88+
return path
89+
90+
# 4. /tmp or Windows equivalent
91+
if is_windows:
92+
path = rctx.path("C:/Temp").get_child(suffix)
93+
else:
94+
path = rctx.path("/tmp").get_child(suffix)
95+
96+
return repo_utils.mkdir(rctx, path)
97+
98+
def _create_pycache_symlinks(rctx, logger):
99+
"""Finds all directories with a .py file and creates __pycache__ symlinks.
100+
101+
Args:
102+
rctx: {type}`repository_ctx` The repository rule's context object.
103+
logger: Optional logger to use for operations.
104+
"""
105+
pycache_root = _get_pycache_root(rctx)
106+
logger.info(lambda: "pycache root: {}".format(pycache_root))
107+
pycache_root_str = str(pycache_root) if pycache_root else None
108+
109+
os_name = repo_utils.get_platforms_os_name(rctx)
110+
null_device = "NUL" if os_name == "windows" else "/dev/null"
111+
112+
queue = [rctx.path(".")]
113+
114+
# Starlark doesn't support recursion, use a loop with a queue.
115+
# Using a large range as a safeguard.
116+
for _ in range(1000000):
117+
if not queue:
118+
break
119+
p = queue.pop()
120+
121+
has_py = False
122+
for child in p.readdir():
123+
# Skip hidden files and directories
124+
if child.basename.startswith("."):
125+
continue
126+
127+
if child.is_dir:
128+
if child.basename == "__pycache__" or str(child) == pycache_root_str:
129+
continue
130+
queue.append(child)
131+
elif child.basename.endswith(".py"):
132+
has_py = True
133+
134+
if has_py:
135+
pycache_dir = p.get_child("__pycache__")
136+
if pycache_root:
137+
pycache_relative = repo_utils.repo_root_relative_path(rctx, pycache_dir)
138+
target_dir = pycache_root.get_child(pycache_relative)
139+
140+
repo_utils.mkdir(rctx, target_dir)
141+
rctx.delete(pycache_dir)
142+
rctx.symlink(target_dir, pycache_dir)
143+
else:
144+
rctx.delete(pycache_dir)
145+
rctx.symlink(null_device, pycache_dir)
146+
55147
def _python_repository_impl(rctx):
56148
if rctx.attr.distutils and rctx.attr.distutils_content:
57149
fail("Only one of (distutils, distutils_content) should be set.")
@@ -123,6 +215,7 @@ def _python_repository_impl(rctx):
123215
logger = logger,
124216
)
125217

218+
_create_pycache_symlinks(rctx, logger)
126219
python_bin = "python.exe" if ("windows" in platform) else "bin/python3"
127220

128221
if "linux" in platform:

python/private/repo_utils.bzl

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -319,6 +319,49 @@ def _which_describe_failure(binary_name, path):
319319
path = path,
320320
)
321321

322+
def _mkdir(mrctx, path):
323+
path = mrctx.path(path)
324+
if path.exists:
325+
return path
326+
327+
repo_root = str(mrctx.path("."))
328+
path_str = str(path)
329+
330+
if not path_str.startswith(repo_root):
331+
mkdir_bin = mrctx.which("mkdir")
332+
if not mkdir_bin:
333+
return None
334+
res = mrctx.execute([mkdir_bin, "-p", path_str])
335+
if res.return_code != 0:
336+
return None
337+
return path
338+
else:
339+
placeholder = path.get_child(".placeholder")
340+
mrctx.file(placeholder)
341+
mrctx.delete(placeholder)
342+
return path
343+
344+
def _repo_root_relative_path(mrctx, path):
345+
"""Takes a path object and returns a repo-relative path string.
346+
347+
Args:
348+
mrctx: module_ctx or repository_ctx
349+
path: {type}`path` a path within `mrctx`
350+
351+
Returns:
352+
{type}`str` a repo-root-relative path string.
353+
"""
354+
repo_root = str(mrctx.path("."))
355+
path_str = str(path)
356+
relative_path = path_str[len(repo_root):]
357+
if relative_path[0] != "/":
358+
fail("{path} not under {repo_root}".format(
359+
path = path,
360+
repo_root = repo_root,
361+
))
362+
relative_path = relative_path[1:]
363+
return relative_path
364+
322365
def _args_to_str(arguments):
323366
return " ".join([_arg_repr(a) for a in arguments])
324367

@@ -465,6 +508,8 @@ repo_utils = struct(
465508
get_platforms_os_name = _get_platforms_os_name,
466509
is_repo_debug_enabled = _is_repo_debug_enabled,
467510
logger = _logger,
511+
mkdir = _mkdir,
512+
repo_root_relative_path = _repo_root_relative_path,
468513
which_checked = _which_checked,
469514
which_unchecked = _which_unchecked,
470515
)

0 commit comments

Comments
 (0)