Skip to content

fix: [hotfix] await background load futures during cancellation and fix loadingTimeoutMs default#48881

Closed
sparknack wants to merge 3 commits intomilvus-io:hotfix-2.6.14from
sparknack:cl-fix-hf-2.6.14
Closed

fix: [hotfix] await background load futures during cancellation and fix loadingTimeoutMs default#48881
sparknack wants to merge 3 commits intomilvus-io:hotfix-2.6.14from
sparknack:cl-fix-hf-2.6.14

Conversation

@sparknack
Copy link
Copy Markdown
Contributor

Summary

  • Await background load futures in GroupChunkTranslator and ManifestGroupTranslator cancellation cleanup paths to prevent use-after-free / dangling futures
  • Add CheckCancellation calls in LoadCellBatchAsync loop for prompt cancellation
  • Fix tieredStorage.loadingTimeoutMs default from 0 (immediate failure) to -1 (no timeout)

issue: #48854

Test plan

  • Added unit test CancellationStopsMidBatchPush verifying mid-batch cancellation stops loading and raises FollyCancel
  • CI passes

…nslators

Ensure background load futures are properly awaited during cancellation
cleanup in both GroupChunkTranslator and ManifestGroupTranslator to
prevent dangling async operations.

Also add CheckCancellation calls in LoadCellBatchAsync to support early
termination during cell loading.

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
@sre-ci-robot sre-ci-robot added the size/M Denotes a PR that changes 30-99 lines. label Apr 9, 2026
@sparknack sparknack changed the title fix: [2.6.14-hf] await background load futures during cancellation and fix loadingTimeoutMs default fix: [hotfix] await background load futures during cancellation and fix loadingTimeoutMs default Apr 9, 2026
@mergify mergify Bot added the dco-passed DCO check passed. label Apr 9, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 9, 2026

@sparknack Please associate the related pr of master to the body of your Pull Request. (eg. "pr: #")

@mergify mergify Bot added do-not-merge/missing-related-pr kind/bug Issues or changes related a bug labels Apr 9, 2026
@sre-ci-robot sre-ci-robot added do-not-merge/need-merge-master-first any pr merge to release branch need to merge master first do-not-merge/need-milestone generate by v2-label-manager labels Apr 9, 2026
@sre-ci-robot
Copy link
Copy Markdown
Contributor

[INFO] PR Label Summary by Default
[WARNING] No dependent PR reference found

  • Target branch '2.6' requires a PR merged to master first
  • Please add reference in format 'pr: #number'

[WARNING] Milestone not set

You can set milestone by commenting:
/set-milestone
Example:
/set-milestone 2.5.0

Use /refresh-label to update related check and label manually

@sre-ci-robot
Copy link
Copy Markdown
Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

@sre-ci-robot
Copy link
Copy Markdown
Contributor

❌ CI Loop Results d6c8112

Stage Result Duration Tests

Total: 4min | Pipeline | Artifacts

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 65.11628% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.47%. Comparing base (8222eca) to head (d6c8112).

Files with missing lines Patch % Lines
...gcore/storagev2translator/GroupChunkTranslator.cpp 0.00% 7 Missing ⚠️
...re/storagev2translator/ManifestGroupTranslator.cpp 0.00% 7 Missing ⚠️
internal/core/src/segcore/MemoryPlannerTest.cpp 96.29% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                Coverage Diff                @@
##           hotfix-2.6.14   #48881      +/-   ##
=================================================
- Coverage          82.48%   82.47%   -0.01%     
=================================================
  Files                555      555              
  Lines              88075    88118      +43     
=================================================
+ Hits               72646    72674      +28     
- Misses             15377    15392      +15     
  Partials              52       52              
Components Coverage Δ
Client ∅ <ø> (∅)
Core 83.34% <65.11%> (-0.01%) ⬇️
Go ∅ <ø> (∅)
Files with missing lines Coverage Δ
internal/core/src/segcore/memory_planner.cpp 89.38% <100.00%> (+0.09%) ⬆️
internal/core/src/segcore/MemoryPlannerTest.cpp 99.52% <96.29%> (-0.48%) ⬇️
...gcore/storagev2translator/GroupChunkTranslator.cpp 88.36% <0.00%> (-2.75%) ⬇️
...re/storagev2translator/ManifestGroupTranslator.cpp 0.00% <0.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label Apr 9, 2026
@sre-ci-robot
Copy link
Copy Markdown
Contributor

✅ CI Loop Results d6c8112

Stage Result Duration Tests
✅ Build SUCCESS 4.7min -

Total: 9min | Pipeline | Artifacts

@sre-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sparknack, zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot
Copy link
Copy Markdown
Contributor

[INFO] PR Label Summary by Default
[WARNING] No dependent PR reference found

  • Target branch '2.6' requires a PR merged to master first
  • Please add reference in format 'pr: #number'

[WARNING] Milestone not set

You can set milestone by commenting:
/set-milestone
Example:
/set-milestone 2.5.0

Use /refresh-label to update related check and label manually

@sparknack sparknack changed the base branch from 2.6 to hotfix-2.6.14 April 13, 2026 09:33
@aoiasd
Copy link
Copy Markdown
Contributor

aoiasd commented Apr 13, 2026

/lgtm

@sre-ci-robot
Copy link
Copy Markdown
Contributor

[INFO] PR Label Summary by Default
[WARNING] No dependent PR reference found

  • Target branch '2.6' requires a PR merged to master first
  • Please add reference in format 'pr: #number'

[WARNING] Milestone not set

You can set milestone by commenting:
/set-milestone
Example:
/set-milestone 2.5.0

Use /refresh-label to update related check and label manually

@sparknack
Copy link
Copy Markdown
Contributor Author

/retest

@sparknack
Copy link
Copy Markdown
Contributor Author

Superseded by #48990 (re-submitted against hotfix-2.6.14 to trigger fresh CI).

@sparknack sparknack closed this Apr 13, 2026
sre-ci-robot pushed a commit that referenced this pull request Apr 14, 2026
…n and fix loadingTimeoutMs default (#48990)

## Summary
Re-submission of #48881 to trigger fresh CI against `hotfix-2.6.14`
base. Content unchanged.

- Await background load futures in `GroupChunkTranslator` and
`ManifestGroupTranslator` cancellation cleanup paths to prevent
use-after-free / dangling futures
- Add `CheckCancellation` calls in `LoadCellBatchAsync` loop for prompt
cancellation
- Fix `tieredStorage.loadingTimeoutMs` default from `0` (immediate
failure) to `-1` (no timeout)

issue: #48854
pr: #48880

## Test plan
- [x] Added unit test `CancellationStopsMidBatchPush` verifying
mid-batch cancellation stops loading and raises `FollyCancel`
- [ ] CI passes

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-passed DCO check passed. do-not-merge/missing-related-pr do-not-merge/need-merge-master-first any pr merge to release branch need to merge master first do-not-merge/need-milestone generate by v2-label-manager kind/bug Issues or changes related a bug lgtm low-code-coverage add test-label from zhikun, diff coverage > 80% size/M Denotes a PR that changes 30-99 lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants