feat(inkless): KC-72 Reconcile stale records after diskless switch #612

Open
viktorsomogyi wants to merge 1 commit into main from svv/ts-unification-reconcile
Conversation

@viktorsomogyi viktorsomogyi commented May 26, 2026

When a partition switches to diskless, a replica can still hold uncommitted records above the high watermark. If the switch is quick and consolidation is enabled, the consolidation fetchers would start appending on top of that stale tail. This commit implements the following:

  • Truncate a newly switched partition's local log down to the just-committed seal (classic-to-diskless start offset). This runs after makeLeader/makeFollower and before any fetcher starts, so catch-up and consolidation always initialize from the truncated LEO.
  • Add ConsolidationReconciler to decide when a consolidating partition joins the consolidation fetcher, covering both the initial switch and resume after failover, restart, or reassignment. ReplicaManager now delegates consolidation fetcher startup to it.
  • Hand off from the classic ReplicaFetcher to consolidation once a follower reaches the seal, via the fetcher's self-eviction path.
  • Gate pruning of switched diskless partitions behind a prune floor and skip partitions whose switch is still pending; born-consolidated partitions keep pruning directly.
  • Transform the per-partition diskless watermark into a single monotonic safeConsolidationPruningFloor (gate + progress tracker), replacing lastAppliedDisklessLogStartOffset.
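
To make the first bullet concrete, here is a minimal sketch of truncating a stale tail down to the seal before any fetcher starts. It is not the PR's code: `ToyLog` and `SealTruncation` are hypothetical names, and the log is modelled as an in-memory vector where a record's index is its offset. It assumes the seal is the first diskless offset, so every local record at or above it is stale.

```scala
// Hypothetical in-memory log; real Kafka logs are segment files, this only models offsets.
final class ToyLog(initial: Vector[String]) {
  private var records: Vector[String] = initial

  /** Log end offset (LEO): the offset the next appended record would receive. */
  def logEndOffset: Long = records.length.toLong

  /** Drops every record at or above `offset`; a no-op if the log is already shorter. */
  def truncateTo(offset: Long): Unit =
    if (offset < logEndOffset) records = records.take(offset.toInt)

  /** Appends a record and returns the offset it was written at. */
  def append(record: String): Long = {
    records = records :+ record
    logEndOffset - 1
  }
}

object SealTruncation {
  /** Truncates to the seal first, then reports the LEO a fetcher would initialize from. */
  def reconcile(log: ToyLog, seal: Long): Long = {
    log.truncateTo(seal)
    log.logEndOffset
  }
}
```

With a seal of 3, a replica holding offsets 0 to 4 is cut back to an LEO of 3, so the next append lands exactly at the seal. A replica holding only offsets 0 and 1 is left untouched and still has to catch up to the seal.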

Copilot AI left a comment

Pull request overview

This PR introduces a “consolidation reconciliation” step to prevent consolidating diskless fetchers from appending onto stale local log data that may exist above the high watermark after a classic→diskless switch.

Changes:

  • Added ConsolidationReconciler to reconcile/truncate local logs (when needed) before starting consolidation fetchers on leader/follower transitions.
  • Introduced a per-partition “safe pruning floor” used to gate diskless-log pruning and avoid unsafe prune requests.
  • Expanded/adjusted unit tests around consolidating diskless fetch behavior and pruning eligibility.
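
The pruning gate summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `MonotonicFloor`, `PruneCandidate` and `PruneGate` are hypothetical names, and the rule that a switched partition may prune only when the candidate offset is strictly above its floor is an assumed reading of "gate + progress tracker".

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a per-partition floor that can only move forward.
final class MonotonicFloor {
  // -1 means "no floor set yet".
  private val value = new AtomicLong(-1L)

  /** Raises the floor to `candidate` if it is higher; never lowers it. Returns the floor in effect. */
  def maybeAdvance(candidate: Long): Long =
    value.updateAndGet(current => math.max(current, candidate))

  /** The floor if one has been set, otherwise None. */
  def get: Option[Long] = Some(value.get()).filter(_ >= 0)
}

// Illustrative view of the state a pruner would look at; field names are assumptions.
final case class PruneCandidate(
  switchPending: Boolean,    // classic-to-diskless switch not yet committed
  bornConsolidated: Boolean, // created diskless, so there is no classic prefix to protect
  floor: Option[Long],       // safe pruning floor, if reconciliation has set one
  candidateOffset: Long)     // offset the pruner would like to prune up to

object PruneGate {
  /** Returns the offset that may be pruned up to, or None if the partition is skipped. */
  def eligibleOffset(c: PruneCandidate): Option[Long] =
    if (c.switchPending) None                            // skip while the switch is pending
    else if (c.bornConsolidated) Some(c.candidateOffset) // prune directly
    else c.floor.filter(_ < c.candidateOffset).map(_ => c.candidateOffset) // assumed gate rule
}
```

In this model a switched partition with no floor, or with a candidate offset at or below the floor, is skipped, which matches the intent of avoiding unsafe prune requests.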

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| core/src/main/scala/kafka/server/ReplicaManager.scala | Wires the new reconciler into leader/follower transitions before starting consolidation fetchers (and includes an unrelated FileMerger disablement). |
| core/src/main/scala/io/aiven/inkless/consolidation/ConsolidationReconciler.scala | New reconciliation logic to determine fetch start offset and optionally truncate local logs based on control-plane log info. |
| core/src/main/scala/io/aiven/inkless/consolidation/ConsolidatedDisklessLogPruner.scala | Tightens prune eligibility by filtering switch-pending partitions and requiring a “safe prune offset” from Partition. |
| core/src/main/scala/kafka/cluster/Partition.scala | Adds safeConsolidationPruningFloor with getters/setters to gate pruning. |
| core/src/main/scala/io/aiven/inkless/consolidation/ReconciliationException.scala | Adds a reconciler-specific exception type. |
| core/src/main/scala/io/aiven/inkless/consolidation/DisklessLeaderEndPoint.scala | Removes an outdated TODO comment. |
| core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala | Adds/adjusts tests to ensure unified-log reads vs diskless path selection for consolidating diskless fetches. |
| core/src/test/scala/io/aiven/inkless/consolidation/ConsolidationReconcilerTest.scala | New unit tests for reconciliation/truncation decisions and failure handling. |
| core/src/test/scala/io/aiven/inkless/consolidation/ConsolidatedDisklessLogPrunerTest.scala | Updates tests for the new prune “safe floor” behavior and switch-pending filtering. |

Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated
Comment thread core/src/main/scala/io/aiven/inkless/consolidation/ConsolidationReconciler.scala Outdated
Comment thread core/src/main/scala/io/aiven/inkless/consolidation/ConsolidationReconciler.scala Outdated
Comment thread core/src/main/scala/io/aiven/inkless/consolidation/ConsolidationReconciler.scala Outdated
Comment thread core/src/main/scala/io/aiven/inkless/consolidation/ReconciliationException.scala Outdated
Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated
@viktorsomogyi viktorsomogyi force-pushed the svv/ts-unification-reconcile branch from 30d1d15 to 82d0fa7 Compare May 26, 2026 14:01
@viktorsomogyi viktorsomogyi force-pushed the svv/ts-unification-delete-tiered-diskless branch from 331715d to 5f47cc2 Compare May 27, 2026 14:57
Base automatically changed from svv/ts-unification-delete-tiered-diskless to main May 27, 2026 19:20
@viktorsomogyi viktorsomogyi force-pushed the svv/ts-unification-reconcile branch 2 times, most recently from d09af83 to 51dadb3 Compare May 29, 2026 14:04
@viktorsomogyi viktorsomogyi changed the title from "feat(inkless): KC-72 Reconcile stale commits in old follower" to "feat(inkless): KC-72 Reconcile stale records after diskless switch" May 29, 2026
@viktorsomogyi viktorsomogyi requested a review from Copilot May 29, 2026 14:05
Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated
Comment on lines +117 to +131
// LEO >= seal. This covers both the initial switch (LEO == seal, nothing consolidated
// yet) and resuming an already-progressed partition after a restart, leadership
// failover, or reassignment (the local log either kept its consolidated frontier or
// was rehydrated from tiered storage). In every case we resume from the current local
// LEO so we never re-consolidate or skip data the local log already holds.
//
// The prune floor is the higher of the seal and the current log start offset:
//  - at first switch logStartOffset is still the classic prefix start, so the floor is
//    the seal, which blocks pruning the diskless region until consolidation has tiered
//    past the boundary;
//  - on resume logStartOffset has advanced past the seal as consolidated segments were
//    tiered and deleted, so it reflects real pruning progress.
val pruneFloor = math.max(seal, log.logStartOffset)
partition.maybeAdvanceConsolidationPruneFloor(pruneFloor)
ConsolidationStartState.Ready(log.logEndOffset)
@viktorsomogyi viktorsomogyi May 29, 2026

This one is valid, but fixing it would likely require us to track the epoch or persist the consolidation frontier. I will defer it to a follow-up PR.
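
For readers following this thread, the decision in the quoted snippet can be worked through with plain numbers. The sketch below is a simplified model, not the reconciler itself: only `Ready` appears in the snippet, while `CatchingUp`, `StartDecision` and the extra `pruneFloor` field are hypothetical, and the behaviour for LEO below the seal is an assumption based on the PR description (the follower keeps using the classic fetcher until it reaches the seal).

```scala
// Simplified stand-ins for the start states; `CatchingUp` is hypothetical.
sealed trait StartState
object StartState {
  /** Consolidation may start fetching at `fetchOffset`; `pruneFloor` is carried for illustration. */
  final case class Ready(fetchOffset: Long, pruneFloor: Long) extends StartState
  /** The local log has not reached the seal yet; `remaining` offsets are still missing. */
  final case class CatchingUp(remaining: Long) extends StartState
}

object StartDecision {
  def decide(seal: Long, logStartOffset: Long, logEndOffset: Long): StartState =
    if (logEndOffset < seal)
      StartState.CatchingUp(seal - logEndOffset) // assumed: classic catch-up continues
    else
      // Mirrors the snippet: resume from the local LEO, floor = max(seal, logStartOffset).
      StartState.Ready(logEndOffset, math.max(seal, logStartOffset))
}
```

For example, with a seal of 100: an initial switch (log start 0, LEO 100) yields `Ready(100, 100)`, so the floor is the seal; a resume after tiering (log start 250, LEO 400) yields `Ready(400, 250)`, so the floor follows the log start offset. The sketch does not distinguish a stale tail above the seal from real consolidated progress, which is the gap deferred to the follow-up PR.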

@viktorsomogyi viktorsomogyi force-pushed the svv/ts-unification-reconcile branch from 51dadb3 to 58dfd23 Compare May 29, 2026 15:01
@viktorsomogyi viktorsomogyi force-pushed the svv/ts-unification-reconcile branch from 58dfd23 to a366d0d Compare May 29, 2026 16:06
@viktorsomogyi viktorsomogyi marked this pull request as ready for review May 29, 2026 16:06