feat(config): DJ_STORES env var + DJ_IGNORE_CONFIG_FILE flag + arbitrary .secrets/ attrs — 2.3#1452
Open
dimitri-yatsenko wants to merge 2 commits into
Open
feat(config): DJ_STORES env var + DJ_IGNORE_CONFIG_FILE flag + arbitrary .secrets/ attrs — 2.3#1452dimitri-yatsenko wants to merge 2 commits into
dimitri-yatsenko wants to merge 2 commits into
Conversation
Two changes: 1. settings.py — Config._load_secrets() Was hardcoded to only auto-load stores.<name>.access_key and stores.<name>.secret_key from the secrets directory. That whitelist made sense when only the built-in s3 / gcs / azure stores existed, but it's wrong for plugin-registered adapters that define their own secret fields. Concrete example: a plugin that wraps Databricks Unity Catalog Volumes authenticates via a Bearer token. Its store spec wants a `token` field. Today the file .secrets/stores.uc.token is silently ignored, forcing users to either rename to a misleading "access_key" or to set the value programmatically at notebook startup (defeating the auto-load story). This change drops the whitelist. Any stores.<name>.<attr> file under the secrets directory is loaded into dj.config["stores"][<name>][<attr>]. The existing "don't override pre-configured values" precedence is preserved — config file and env vars still win. New test: TestStoreSecrets.test_load_store_arbitrary_attr confirms that `token` and `api_key` fields load via the same path as access_key/secret_key. 2. storage_adapter.py — _discover_adapters() Removed the try/except wrapping importlib.metadata.entry_points(). The fallback path was for Python < 3.10 (where entry_points() returned a SelectableGroups dict instead of accepting the group= kwarg), but DataJoint's minimum supported Python is 3.10 (per requires-python in pyproject.toml). The dead branch was also producing a mypy type error (SelectableGroups.get signature).
dimitri-yatsenko
added a commit
that referenced
this pull request
May 19, 2026
Same fix #1383 applied to the Job table's antijoin in refresh(), now applied to AutoPopulate._populate_direct's antijoin and the progress() fallback path. The two-arg subtract `key_source - self` triggers QueryExpression.__sub__ which calls .restrict(Not(...)) with semantic_check=True by default. The semantic-check requirement is wrong here: this antijoin is a plain set-difference, not a join — we ask "which key_source rows aren't yet in self." Whether the same-named PK attribute carries the same source-table lineage tag on both sides is irrelevant. Where it bites: dj.Imported / dj.Computed tables whose primary key is fully inherited from a single FK, with no own-table PK attributes. On those, self.proj() returns the PK attribute with lineage=None (or pointing to self rather than the FK parent), while key_source's matching attribute carries the parent's lineage tag. The semantic-check fails with: Cannot join on attribute 'X': different lineages (schema.parent.X vs None). Use .proj() to rename one of the attributes. This pattern is legitimate ("one row downstream per parent row, no intermediate ID") but rare in typical Elements / SciOps pipelines, which extend the inherited PK with own-table attributes (trial_id, experiment_id, etc.) that anchor proj()'s lineage. That's why the existing #1405 test suite didn't surface it. Changes: - src/datajoint/autopopulate.py - Import Not from .condition at module top. - _populate_direct: replace `(LHS - self.proj())` with `LHS.restrict(Not(self.proj()), semantic_check=False)`. - progress(): same swap on the no-common-attrs fallback branch. - tests/integration/test_autopopulate.py - New test_populate_antijoin_fk_inherited_pk regression test: Spec(Manual) -> Item(Imported with only -> Spec) — the minimal shape that triggers the bug. Without the fix Item.populate() raises DataJointError; with the fix it populates correctly, progress() reports correct counts, and partial-then-full populate works. Stacked on top of #1452 (the secrets-loading + dead-code fix); rebase to master after that lands.
4 tasks
ttngu207
previously approved these changes
May 20, 2026
4 tasks
… 2.3 Adds env-var configuration for object stores so the DataJoint platform — and any env-var-only deployment — can configure plugin-registered storage adapters (Databricks Unity Catalog Volumes, custom HTTP stores, lab archive systems) without files on disk. - DJ_STORES (JSON-encoded) carries the entire `stores` dict in the same shape used in `datajoint.json`. Replaces the file's `stores` block when set. - DJ_IGNORE_CONFIG_FILE (default false) skips `datajoint.json`, the project `.secrets/`, and `/run/secrets/datajoint/` entirely. Hard guarantee that no file on disk leaks into config. Implementation notes: - New `Config.ignore_config_file` field (validation_alias DJ_IGNORE_CONFIG_FILE) auto-bound by pydantic-settings. - The `stores` field receives a `validation_alias` placeholder so pydantic-settings does NOT auto-bind DJ_STORES at Config() construction. Otherwise its built-in JSON parser intercepts before precedence logic runs and reports SettingsError instead of a clean ValueError. - New `Config._apply_stores_env()` parses DJ_STORES JSON, replaces self.stores wholesale, raises ValueError on bad JSON or non-object payloads. - `_create_config()` restructured to: skip file + secrets when ignore_config_file is set; apply DJ_STORES between file load and secrets fill so env wins over file and secrets only fill gaps. Functional precedence (high to low): programmatic > DJ_STORES > config file > `.secrets/stores.<name>.<attr>` (fills missing attrs only). Tests: new TestStoreEnv class with 8 tests; existing TestStoreSecrets and test_storage_adapter tests still pass. Companion docs: datajoint/datajoint-docs#172
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the env-var gap for object-store configuration. Until now the DataJoint platform (and any deployment that ships container images without
datajoint.json) could injectDJ_HOST/DJ_USER/DJ_PASSbut had no env-var path forstores.<name>.*. Plugin-registered adapters (Databricks Unity Catalog Volumes, custom HTTP-based protocols, lab archive systems) with credential fields liketoken,api_key,workspace_urlcouldn't be used on the platform at all.This PR adds two env vars and broadens the file-based secrets loader, all targeting DataJoint 2.3:
DJ_STORES— JSON-encoded copy of thestoresdict, same shape as indatajoint.json. Replaces the file'sstoresblock when set. A single env var avoids per-field delimiter / underscore-stripping ambiguity (access_keyvsaccesskey,raw_datavsrawdata) and accepts arbitrary adapter-defined attr names without negotiating an env-var schema.DJ_IGNORE_CONFIG_FILE(defaultfalse) — whentrue, skipsdatajoint.json, the project.secrets/, and/run/secrets/datajoint/entirely. Hard guarantee that no file on disk leaks into config..secrets/stores.<name>.<attr>now accepts any attribute, not onlyaccess_key/secret_key. (This was the original scope of the PR; kept and expanded.)Motivation
A plugin that wraps Databricks Unity Catalog Volumes authenticates via a Bearer token — its store spec wants a
tokenfield, notaccess_key/secret_key. With the previous whitelist,.secrets/stores.uc.tokenwas silently ignored, and there was no env-var path fortokenat all. Users had to either rename to a misleadingaccess_keyor set the value programmatically at notebook startup — neither of which works on the DataJoint platform, which has nodatajoint.jsonand no.secrets/on disk.After this PR, the same adapter is configured identically across:
datajoint.json+.secrets/stores.uc.tokendatajoint.json+ env vars for credentialsDJ_IGNORE_CONFIG_FILE=true+DJ_STORES='{"uc": {...}}'What changed
src/datajoint/settings.pyConfig.ignore_config_file: boolfield (env:DJ_IGNORE_CONFIG_FILE, defaultFalse) — pydantic-settings auto-binds it viavalidation_alias.validation_aliasplaceholder on thestoresfield so pydantic-settings does not auto-bindDJ_STORESatConfig()time. Otherwise its built-in JSON parser intercepts before precedence logic runs and reportsSettingsErrorinstead of a cleanValueError.Config._apply_stores_env()— parsesDJ_STORESJSON, replacesself.storeswholesale, raisesValueErroron bad JSON or non-object payloads._create_config(): skip file + secrets whenignore_config_fileis set; applyDJ_STORESbetween file load and secrets fill so env wins over file and secrets only fill gaps.access_key/secret_keywhitelist from_load_secrets()'s per-store branch — anystores.<name>.<attr>file is loaded.save_template()docstring to mentionDJ_STORES/DJ_IGNORE_CONFIG_FILE.src/datajoint/storage_adapter.pyIncidental cleanup (unchanged from the prior PR scope): removed a
try/except TypeErrorfallback aroundentry_points(group=...)for Python <3.10.requires-python = ">=3.10", so the branch was dead code (and was producing a stale mypy error).tests/unit/test_settings.pyNew
TestStoreEnvclass with 8 tests:test_dj_stores_sets_stores_dict— happy-path JSON parse with no config file.test_dj_stores_overrides_config_file— env wins over file.test_dj_stores_invalid_json_raises— clearValueErroron bad JSON.test_dj_stores_non_object_raises— clearValueErroron non-object JSON.test_dj_stores_plus_secrets_dir— secrets fill attrs DJ_STORES omits.test_ignore_config_file_skips_json—DJ_IGNORE_CONFIG_FILE=trueblocksdatajoint.json.test_ignore_config_file_skips_secrets— same blocks.secrets/.test_ignore_config_file_default_loads_both— default behavior preserved.Existing
TestStoreSecretstests (incl. the newtest_load_store_arbitrary_attrfor the whitelist removal) and all 23test_storage_adaptertests still pass.Functional precedence
dj.config["stores"][...](programmatic)DJ_STORESenv varstoresblock ofdatajoint.json.secrets/stores.<name>.<attr>filesWhen
DJ_IGNORE_CONFIG_FILE=true, sources 3 and 4 are skipped entirely.Companion PR
Test plan
tests/unit/test_settings.py(including the 8 newTestStoreEnvtests)tests/unit/test_storage_adapter.pyDJ_IGNORE_CONFIG_FILE=true DJ_STORES='{...}' python -c "import datajoint as dj; print(dj.config['stores'])"from an empty directory shows the env-supplied stores and emits no "No datajoint.json found" warning.DJ_IGNORE_CONFIG_FILE=trueand adatajoint.jsonadjacent to cwd containing{"database": {"host": "should-not-load"}},dj.config["database.host"]is the default (localhost), notshould-not-load.Release notes (2.3)
DJ_STORESenv var — JSON-encodedstoresdictDJ_IGNORE_CONFIG_FILEenv var — skip file-based config sources (defaultfalse).secrets/stores.<name>.<attr>accepts any attribute, not onlyaccess_key/secret_keydatajoint.storageentry-point group) formalized as part of the public API; see the spec in the companion docs PR.