fix: filter out datasets with inconsistent database and LakeFS records #5171

Open: xuang7 wants to merge 3 commits into apache:main from xuang7:fix/filter-mismatched-datasets

xuang7 (Contributor) commented May 24, 2026

What changes were proposed in this PR?

This PR fixes an issue where dataset listings fail when dataset records in the database and LakeFS repositories are inconsistent. This breaks the workflow dataset picker and can also affect Hub dataset listings. The fix updates the dataset listing endpoints to first fetch existing LakeFS repository names and filter out dataset records whose repositories are missing, so valid datasets can still be returned normally.
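
The filtering step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `DatasetRecord`, its `repositoryName` field, and `filterByExistingRepos` are hypothetical names, and the set of existing repository names is passed in as a plain `Set[String]` rather than fetched from LakeFS.

```scala
object OrphanFilterSketch {
  // Hypothetical stand-in for a dataset row loaded from the database.
  final case class DatasetRecord(did: Int, name: String, repositoryName: String)

  // Keep only the rows whose repository name appears in the set of
  // repositories that currently exist in LakeFS. How that set is fetched
  // (and its exact type) is an assumption; here the caller supplies it.
  def filterByExistingRepos(
      records: List[DatasetRecord],
      existingRepos: Set[String]
  ): List[DatasetRecord] =
    records.filter(record => existingRepos.contains(record.repositoryName))
}
```

With this shape, a row whose repository is absent from the set is dropped before any per-dataset lookup runs, so the remaining valid datasets can still be listed.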

Demo:

[Screenshots] Before: dataset listing error. After: dataset picker loads valid datasets.

Any related issues, documentation, discussions?

Closes #5106

How was this PR tested?

Added two tests.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

github-actions bot added the engine, fix, common, and platform (Non-amber Scala service paths) labels May 24, 2026
codecov-commenter commented May 24, 2026

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.81%. Comparing base (c435aa7) to head (c4a945d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...exera/web/resource/dashboard/hub/HubResource.scala | 0.00% | 2 Missing ⚠️ |
| .../amber/core/storage/util/LakeFSStorageClient.scala | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5171      +/-   ##
============================================
- Coverage     47.13%   45.81%   -1.33%     
- Complexity     2344     2345       +1     
============================================
  Files          1042     1046       +4     
  Lines         39989    40033      +44     
  Branches       4260     4258       -2     
============================================
- Hits          18849    18341     -508     
- Misses        20015    20582     +567     
+ Partials       1125     1110      -15     

| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 39.53% <ø> (ø) | |
| agent-service | 33.74% <ø> (-0.03%) ⬇️ | Carriedforward from 662c8cb |
| amber | 50.31% <0.00%> (-0.02%) ⬇️ | |
| computing-unit-managing-service | 0.00% <ø> (ø) | |
| config-service | 0.00% <ø> (ø) | |
| file-service | 32.89% <100.00%> (+0.70%) ⬆️ | |
| frontend | 34.62% <ø> (-3.20%) ⬇️ | Carriedforward from 662c8cb |
| python | 90.50% <ø> (ø) | Carriedforward from 662c8cb |
| workflow-compiling-service | 56.81% <ø> (ø) | |

*This pull request uses carry forward flags.

@xuang7 xuang7 requested a review from aicam May 24, 2026 00:44
@chenlica chenlica requested a review from mengw15 May 25, 2026 07:08

mengw15 (Contributor) left a comment

Left one comment

): List[DashboardDataset] = {
  val uid = user.getUid
  // Drop DB rows whose LakeFS repo is missing.
  val existingRepos = LakeFSStorageClient.listAllRepoNames()

TOCTOU note: listAllRepoNames() is a snapshot taken before the .map that calls retrieveRepositorySize per row. If a concurrent admin or orchestrator deletes a LakeFS repo between the snapshot and the per-row size lookup, the request will still return a 500, because the "exists" check is stale by then. The window is small but non-zero.

Worth knowing: the existing dataset-search path (DatasetSearchQueryBuilder.toEntryImpl at lines 127-137 on main) already handles this with a try { retrieveRepositorySize(...) } catch (ApiException) { return null } pattern, which logs the failure and drops the orphan from the results. After this PR, the two read paths would have two different defenses against the same underlying inconsistency.

Could we use try-catch here too? That would close the race window, drop the need for the new listAllRepoNames() helper entirely, and unify the orphan defense with the existing search path. What do you think?
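
For illustration, the per-row try-catch suggested here might look roughly like the sketch below. Everything in it is a hypothetical stand-in rather than the project's code: `DatasetRecord`, `DashboardEntry`, `RepoNotFoundException` (playing the role of the client's ApiException), and the `retrieveSize` parameter (playing the role of retrieveRepositorySize).

```scala
object PerRowOrphanDefenseSketch {
  // Hypothetical stand-ins; not the project's real types.
  final case class DatasetRecord(did: Int, repositoryName: String)
  final case class DashboardEntry(did: Int, sizeBytes: Long)

  // Hypothetical exception playing the role of the LakeFS client's ApiException.
  final class RepoNotFoundException(repo: String)
      extends RuntimeException(s"repository not found: $repo")

  // Build one entry per row. A row whose size lookup fails because the
  // repository is gone is logged and skipped instead of failing the request.
  def toEntries(
      records: List[DatasetRecord],
      retrieveSize: String => Long
  ): List[DashboardEntry] =
    records.flatMap { record =>
      try Some(DashboardEntry(record.did, retrieveSize(record.repositoryName)))
      catch {
        case e: RepoNotFoundException =>
          Console.err.println(s"Skipping dataset ${record.did}: ${e.getMessage}")
          None
      }
    }
}
```

Because the lookup itself doubles as the existence check, there is no gap between checking and using in which a repository deleted mid-request could cause a failure.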

Development

Successfully merging this pull request may close these issues.

Dataset file selection fails when LakeFS repository and database records are inconsistent

3 participants