Add support for handling scenarios where end time is invalid during RetentionManager run #18148

Open
9aman wants to merge 1 commit into apache:master from 9aman:retention_manager_improvement_in_case_of_missing_start_end_time

9aman (Contributor) commented Apr 9, 2026

Summary

  • When segment end time is invalid, the RetentionManager currently skips the segment entirely — it is never deleted regardless of the retention policy. This adds an optional fallback to use segmentZKMetadata.getCreationTime() instead, so segments with missing/invalid end times can still be cleaned up.
  • Gated behind cluster config controller.retentionManager.enableCreationTimeFallback (default false) — no behavior change unless explicitly opted in.
  • Supports dynamic config updates via the existing cluster config change listener — no controller restart needed.
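
The fallback described in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class name `CreationTimeFallbackSketch`, the method signatures, and the validity rule (any positive epoch-millisecond value counts as valid) are assumptions made for the example, and the actual validity check in Pinot may be stricter.

```java
/**
 * Hypothetical, simplified stand-in for the retention decision.
 * It does not use Pinot's real classes or its real time-validity check.
 */
final class CreationTimeFallbackSketch {
  private final long retentionMs;
  private final boolean creationTimeFallbackEnabled;

  CreationTimeFallbackSketch(long retentionMs, boolean creationTimeFallbackEnabled) {
    this.retentionMs = retentionMs;
    this.creationTimeFallbackEnabled = creationTimeFallbackEnabled;
  }

  // Assumption for illustration only: a timestamp is valid when it is positive.
  static boolean isValidTimeMs(long timeMs) {
    return timeMs > 0;
  }

  /** Returns true when the segment is older than the retention period. */
  boolean isPurgeable(long endTimeMs, long creationTimeMs, long nowMs) {
    long referenceTimeMs = endTimeMs;
    if (!isValidTimeMs(referenceTimeMs)) {
      // Without the opt-in flag, or without a usable creation time,
      // the segment is skipped, which matches the pre-existing behavior.
      if (!creationTimeFallbackEnabled || !isValidTimeMs(creationTimeMs)) {
        return false;
      }
      referenceTimeMs = creationTimeMs;
    }
    return nowMs - referenceTimeMs > retentionMs;
  }
}
```

Note that a valid end time always wins: the creation time is consulted only when the end time fails the validity check.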

Test plan

  • TimeRetentionStrategyTest#testCreationTimeFallback — unit tests covering: fallback disabled (existing behavior preserved), fallback enabled with valid/recent/invalid/zero creation time, valid end time takes priority over fallback
  • RetentionManagerTest#testCreationTimeFallbackOnChange — verifies dynamic config toggle via onChange()
  • RetentionManagerTest#testRetentionWithInvalidEndTimeAndCreationTimeFallback — end-to-end: segment with invalid end time is deleted when fallback is enabled and creation time exceeds retention
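
The dynamic toggle that `testCreationTimeFallbackOnChange` exercises might look something like the sketch below. Only the config key is taken from the PR description; the class name `FallbackConfigListenerSketch` and the `onChange(Set, Map)` callback shape are assumptions for illustration and may not match the listener interface the controller actually uses.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical listener that tracks the fallback flag without a restart. */
final class FallbackConfigListenerSketch {
  // Config key as named in the PR summary.
  static final String ENABLE_CREATION_TIME_FALLBACK =
      "controller.retentionManager.enableCreationTimeFallback";

  // volatile so the periodic retention task sees updates made by the listener thread.
  private volatile boolean creationTimeFallbackEnabled = false;

  // Assumed callback shape: the set of changed keys plus the full current config map.
  void onChange(Set<String> changedConfigs, Map<String, String> clusterConfigs) {
    if (!changedConfigs.contains(ENABLE_CREATION_TIME_FALLBACK)) {
      return; // unrelated change, keep the current value
    }
    // Boolean.parseBoolean(null) is false, so deleting the key restores the default.
    creationTimeFallbackEnabled =
        Boolean.parseBoolean(clusterConfigs.get(ENABLE_CREATION_TIME_FALLBACK));
  }

  boolean isCreationTimeFallbackEnabled() {
    return creationTimeFallbackEnabled;
  }
}
```

A test in this style would call `onChange` with the key set to `"true"`, assert the flag is on, then call it again with the key removed and assert the flag is back to `false`.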

return false; // Incomplete segments don't have final end time and should not be purged
}

return isPurgeable(tableNameWithType, segmentZKMetadata.getSegmentName(), segmentZKMetadata.getEndTimeMs());

Contributor:

Let's add the new checks inside this method so that this method can also handle fallback.

Contributor Author:

I intentionally kept that super simple.
It just takes in timestamps while this function handles all the complex logic.
The caller can choose to keep things simple using the other function or use this function if a logical interpretation of the ZK metadata is needed.

Contributor:

I don't think it's a good idea to leave it to the caller to understand the difference. A caller can call either method, and both methods should behave consistently.
Passing an invalid timestamp to the second method behaves differently than it does in the other method.
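
One possible way to address the consistency concern is sketched below, with hypothetical names throughout (`ConsistentOverloadsSketch`, `SegmentMeta`, and the positive-timestamp validity rule are assumptions and not the PR's code). The metadata overload only resolves which timestamp to use and then delegates to the timestamp overload, which alone owns the validity check, so an invalid timestamp yields the same answer through either entry point.

```java
/** Hypothetical sketch: both overloads share one validity check. */
final class ConsistentOverloadsSketch {
  /** Minimal stand-in for segment ZK metadata. */
  static final class SegmentMeta {
    final long endTimeMs;
    final long creationTimeMs;

    SegmentMeta(long endTimeMs, long creationTimeMs) {
      this.endTimeMs = endTimeMs;
      this.creationTimeMs = creationTimeMs;
    }
  }

  private final long retentionMs;
  private final boolean creationTimeFallbackEnabled;

  ConsistentOverloadsSketch(long retentionMs, boolean creationTimeFallbackEnabled) {
    this.retentionMs = retentionMs;
    this.creationTimeFallbackEnabled = creationTimeFallbackEnabled;
  }

  // Assumption for illustration only: a timestamp is valid when it is positive.
  private static boolean isValidTimeMs(long timeMs) {
    return timeMs > 0;
  }

  /** Metadata overload: picks the reference time, then delegates. */
  boolean isPurgeable(SegmentMeta meta, long nowMs) {
    long referenceTimeMs = meta.endTimeMs;
    if (!isValidTimeMs(referenceTimeMs) && creationTimeFallbackEnabled) {
      referenceTimeMs = meta.creationTimeMs;
    }
    return isPurgeable(referenceTimeMs, nowMs);
  }

  /** Timestamp overload: the single place where invalid times are rejected. */
  boolean isPurgeable(long timeMs, long nowMs) {
    if (!isValidTimeMs(timeMs)) {
      return false;
    }
    return nowMs - timeMs > retentionMs;
  }
}
```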


codecov-commenter commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.02%. Comparing base (2e80bff) to head (d456597).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...troller/helix/core/retention/RetentionManager.java | 88.23% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18148      +/-   ##
============================================
- Coverage     63.04%   63.02%   -0.03%     
  Complexity     1617     1617              
============================================
  Files          3202     3202              
  Lines        194718   194752      +34     
  Branches      30047    30055       +8     
============================================
- Hits         122760   122736      -24     
- Misses        62233    62269      +36     
- Partials       9725     9747      +22     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 62.99% <94.59%> (-0.02%) ⬇️ |
| java-21 | 63.00% <94.59%> (-0.02%) ⬇️ |
| temurin | 63.02% <94.59%> (-0.03%) ⬇️ |
| unittests | 63.01% <94.59%> (-0.03%) ⬇️ |
| unittests1 | 55.54% <ø> (-0.03%) ⬇️ |
| unittests2 | 33.43% <94.59%> (+<0.01%) ⬆️ |

