feat(validator): salted-hash keeper for density tapering#290
Merged
Conversation
Pick the kept validator_request per (asset, bucket) by md5(id || THINNING_SALT) instead of the smallest id, so the kept request is spread across the bucket rather than always its earliest member. The hash is stable per row, so tapering stays idempotent: the global-min-hash row, once eligible, is always rn=1 and never thinned, and every bucket converges to exactly one keeper before scoring range. THINNING_SALT is optional (documented in .env.example); when unset the handler warns once and falls back to md5(id). Tests assert the keeper-agnostic invariant (one survivor per bucket), determinism across runs, the md5(id || salt) selection, and the unset fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The keeper is now a salted-hash pick, so the two latest_per_asset tests can no longer assume the smallest-id row is the keeper: - drops_protection_once_latest_ages_past_time_length (the CI failure): both rows are past time_length, so latest_per_asset is empty and the bucket collapses to one randomized keeper. Assert exactly one survivor instead of naming it — two survivors would mean the stale latest was wrongly protected. - preserves_latest_request_per_asset_during_gap: if the latest wins the hash draw the older (unprotected) row is thinned, so "both alive" no longer holds. Assert the freshest row is always alive (as keeper or via latest_per_asset), which is what the test is named to prove. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Updates validator density tapering to pick the per-(asset, bucket) keeper request via a salted MD5 of the validator_request id, instead of always keeping the smallest id, so the kept request is spread across the bucket while remaining deterministic across re-runs.
Changes:
- Change tapering keeper selection to
ORDER BY md5(vr_id::text || :salt)with optionalTHINNING_SALT(fallback tomd5(id)when unset). - Rewrite/extend thinning tests to be keeper-agnostic where appropriate, plus add determinism and salted-hash keeper assertions.
- Add
THINNING_SALTdocumentation to.env.exampleand set a stable salt in the test suite.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
synth/validator/miner_data_handler.py |
Implements salted-hash keeper selection and logs when salt is missing. |
tests/test_miner_data_handler.py |
Updates thinning assertions to be keeper-agnostic and adds new keeper/determinism tests. |
tests/conftest.py |
Sets a stable THINNING_SALT for reproducible tests. |
.env.example |
Documents optional THINNING_SALT configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+89
to
+99
| # Optional salt for density-tapering keeper selection. The kept | ||
| # validator_request per bucket is picked by md5(id || salt), so the | ||
| # keeper is spread across the bucket rather than always its earliest | ||
| # member. Unset falls back to md5(id); warn once so that's not | ||
| # silent. | ||
| self.thinning_salt = os.getenv("THINNING_SALT", "") | ||
| if not self.thinning_salt: | ||
| bt.logging.warning( | ||
| "THINNING_SALT is not set; density-tapering keeper " | ||
| "selection falls back to md5(id)." | ||
| ) |
Comment on lines
+807
to
+812
| The keeper is the row with the smallest `md5(id || thinning_salt)`, | ||
| not the smallest id, so the kept request is spread across the bucket | ||
| rather than always being its earliest member. The hash is stable per | ||
| row (idempotent across runs): the global-min-hash row, once eligible, | ||
| is always rn=1 and never thinned, so every bucket keeps exactly one | ||
| row and converges to that keeper well before it enters scoring range. |
Comment on lines
+36
to
+41
| @pytest.fixture(scope="session", autouse=True) | ||
| def thinning_salt_env(): | ||
| # Provide a stable THINNING_SALT so the salted-hash keeper selection is | ||
| # reproducible across the suite. | ||
| os.environ["THINNING_SALT"] = "test-thinning-salt" | ||
| yield |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Density tapering keeps one
validator_requestper(asset, bucket)and soft-deletes the rest. Until now the keeper was the smallest id — i.e. always the earliest request in the bucket. This changes the keeper to the row with the smallestmd5(id || THINNING_SALT), so the kept request is spread across the bucket instead of always being its earliest member.Why it's safe
The hash is stable per row, so tapering stays idempotent: the global-min-hash row, once eligible, is always
rn=1and is never thinned; a transient keeper covers the bucket until then. So every bucket always keeps ≥1 row and converges to exactly one keeper well before it enters scoring range (~24h later for LOW). Thelatest_per_assetprotection for the low-latency feed is unchanged. Applies to both LOW and HIGH cycles.Config
THINNING_SALTis optional (added to.env.example). Set → salted selection; unset → the handler logs a one-time warning and falls back tomd5(id). No crash either way.Tests
(asset, bucket), rest tombstoned) via a_split_alive_thinnedhelper.argmin(md5(id || salt)), and the unset-saltmd5(id)fallback.THINNING_SALTfor reproducibility.🤖 Generated with Claude Code