fix: [3.0] bind file resource refCnt to collection lifecycle to prevent panic #48894

Merged
sre-ci-robot merged 4 commits into milvus-io:3.0 from aoiasd:file_resource_panic_3.0
Apr 23, 2026

Conversation

@aoiasd
Contributor

@aoiasd aoiasd commented Apr 9, 2026

Summary

  • Increment fileResourceRefCnt during validateSchema instead of in the async ack callback's AddCollection, closing the TOCTOU race where RemoveFileResource could delete a resource between validation and AddCollection
  • On failure before Broadcast, refCnt is decremented immediately; on restart, refCnt for pending broadcast tasks is recovered from etcd before rootcoord becomes Healthy
  • Remove refCnt++ from addCollectionMeta since it's now done at validation time (reload path unchanged)
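The ordering described in the summary can be sketched roughly as follows. This is a minimal illustration, not the Milvus implementation: `resourceTable`, `Acquire`, `Release`, `Remove`, and `createCollection` are hypothetical names, and a single mutex stands in for whatever synchronization rootcoord actually uses. The idea is that the existence check and the refCnt increment happen in one critical section, so a concurrent remove cannot slip in between validation and collection creation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errInUse is a hypothetical error returned when a resource is still referenced.
var errInUse = errors.New("file resource still referenced")

// resourceTable is a hypothetical stand-in for the file resource manager.
type resourceTable struct {
	mu     sync.Mutex
	refCnt map[int64]int // resource ID -> collections (or pending creates) using it
}

func newResourceTable(ids ...int64) *resourceTable {
	t := &resourceTable{refCnt: make(map[int64]int)}
	for _, id := range ids {
		t.refCnt[id] = 0
	}
	return t
}

// Acquire verifies that every resource exists and increments its count
// under one lock, so there is no gap between "check" and "use".
func (t *resourceTable) Acquire(ids []int64) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, id := range ids {
		if _, ok := t.refCnt[id]; !ok {
			return fmt.Errorf("file resource %d not found", id)
		}
	}
	for _, id := range ids {
		t.refCnt[id]++
	}
	return nil
}

// Release undoes a previous Acquire.
func (t *resourceTable) Release(ids []int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, id := range ids {
		if t.refCnt[id] > 0 {
			t.refCnt[id]--
		}
	}
}

// Remove deletes a resource only when nothing references it.
func (t *resourceTable) Remove(id int64) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.refCnt[id] > 0 {
		return errInUse
	}
	delete(t.refCnt, id)
	return nil
}

// createCollection acquires references at validation time and rolls them
// back if the (hypothetical) broadcast step fails before it is issued.
func createCollection(t *resourceTable, ids []int64, broadcast func() error) error {
	if err := t.Acquire(ids); err != nil {
		return err
	}
	if err := broadcast(); err != nil {
		t.Release(ids)
		return err
	}
	return nil
}

func main() {
	table := newResourceTable(1)
	err := createCollection(table, []int64{1}, func() error { return nil })
	fmt.Println("create:", err)
	fmt.Println("remove while referenced:", table.Remove(1))
}
```

With this ordering, a remove issued after validation is rejected instead of deleting a resource that a pending collection still needs.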

Test plan

issue: #48612
pr: #48893

🤖 Generated with Claude Code

@sre-ci-robot sre-ci-robot requested review from chyezh and sunby April 9, 2026 11:52
@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Apr 9, 2026
@mergify mergify Bot added the dco-passed DCO check passed. label Apr 9, 2026
@sre-ci-robot sre-ci-robot added the do-not-merge/need-merge-master-first any pr merge to release branch need to merge master first label Apr 9, 2026
@mergify mergify Bot added the kind/bug Issues or changes related a bug label Apr 9, 2026
@sre-ci-robot sre-ci-robot added the do-not-merge/need-milestone generate by v2-label-manager label Apr 9, 2026
@sre-ci-robot
Contributor

[INFO] PR Label Summary by Default
[FAILED] PR #48893 not merged

[WARNING] Milestone not set

You can set milestone by commenting:
/set-milestone
Example:
/set-milestone 2.5.0

Use /refresh-label to update related check and label manually

@sre-ci-robot
Contributor

[ci-v2-notice]
Notice: ci-v2 system is enabled for this PR (3.0 branch).

To rerun ci-v2 checks, comment with:

  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests)
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e tests)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests)

If you have any questions or requests, please contact @zhikunyao.

@sre-ci-robot
Contributor

❌ CI Loop Results 5487d79

Stage Result Duration Tests
✅ Build SUCCESS 10.5min -
✅ Code-Check SUCCESS 6.5min -
❌ UT-GO FAILURE 18.7min 926 passed
✅ UT-Integration SUCCESS 24.4min 46 passed
✅ UT-CPP-Cov SUCCESS 43.1min 7069 passed

Total: 62min | Pipeline | Artifacts

Failed Test Logs:

@sre-ci-robot
Contributor

✅ CI Loop Results 5487d79

Stage Result Duration Tests
✅ Build SUCCESS 8.2min -

Total: 14min | Pipeline | Artifacts

@sre-ci-robot
Contributor

✅ CI Loop Results 5487d79

Stage Result Duration Tests
✅ Build SUCCESS 8.1min -

Total: 12min | Pipeline | Artifacts

@sre-ci-robot
Contributor

❌ CI Loop Results 5487d79

Stage Result Duration Tests
✅ Build SUCCESS 10.6min -
✅ Code-Check SUCCESS 7.0min -
❌ UT-GO FAILURE 19.6min 926 passed
✅ UT-Integration SUCCESS 24.3min 46 passed
✅ UT-CPP-Cov SUCCESS 43.5min 7069 passed

Total: 66min | Pipeline | Artifacts

Failed Test Logs:

@aoiasd aoiasd force-pushed the file_resource_panic_3.0 branch from 5487d79 to 44956b1 Compare April 13, 2026 07:28
@sre-ci-robot
Contributor

❌ CI Loop Results 44956b1

Stage Result Duration Tests
✅ Build SUCCESS 8.2min -
✅ Code-Check SUCCESS 5.4min -
❌ UT-GO FAILURE 17.3min 926 passed
✅ UT-Integration SUCCESS 23.9min 46 passed
✅ UT-CPP-Cov SUCCESS 36.7min 7069 passed

Total: 57min | Pipeline | Artifacts

Failed Test Logs:

@aoiasd aoiasd changed the title from "fix: bind file resource refCnt to collection lifecycle to prevent panic" to "fix: [3.0] bind file resource refCnt to collection lifecycle to prevent panic" Apr 14, 2026
@sre-ci-robot
Contributor

❌ CI Loop Results 44956b1

Stage Result Duration Tests
✅ Build SUCCESS 8.4min -
✅ Code-Check SUCCESS 5.4min -
❌ UT-GO FAILURE 17.3min 926 passed
✅ UT-Integration SUCCESS 23.8min 46 passed
✅ UT-CPP-Cov SUCCESS 32.1min 7069 passed

Total: 57min | Pipeline | Artifacts

Failed Test Logs:

@sre-ci-robot
Contributor

❌ CI Loop Results 44956b1

Stage Result Duration Tests
✅ Build SUCCESS 10.4min -
✅ Code-Check SUCCESS 6.2min -
❌ UT-GO FAILURE 17.6min 926 passed
✅ UT-Integration SUCCESS 24.4min 46 passed
✅ UT-CPP-Cov SUCCESS 44.2min 7069 passed

Total: 68min | Pipeline | Artifacts

Failed Test Logs:

@aoiasd aoiasd force-pushed the file_resource_panic_3.0 branch from 44956b1 to d92306e Compare April 16, 2026 02:39
@sre-ci-robot
Contributor

✅ CI Loop Results d92306e

Stage Result Duration Tests
✅ Build SUCCESS 11.5min -
✅ Code-Check SUCCESS 8.1min -
✅ UT-GO SUCCESS 22.3min 926 passed
✅ UT-Integration SUCCESS 24.3min 46 passed
✅ UT-CPP-Cov SUCCESS 44.9min 7071 passed

Total: 69min | Pipeline | Artifacts

@mergify mergify Bot added the ci-passed label Apr 16, 2026
@sre-ci-robot
Contributor

❌ CI Loop Results ceb10aa

Stage Result Duration Tests

Total: 9min | Pipeline | Artifacts

@aoiasd aoiasd force-pushed the file_resource_panic_3.0 branch from ceb10aa to 2241ded Compare April 16, 2026 08:20
@sre-ci-robot
Contributor

❌ CI Loop Results 2241ded

Stage Result Duration Tests

Total: 8min | Pipeline | Artifacts

@sre-ci-robot
Contributor

❌ CI Loop Results aee90f3

Stage Result Duration Tests

Total: 8min | Pipeline | Artifacts

@aoiasd aoiasd force-pushed the file_resource_panic_3.0 branch from aee90f3 to 79f4d90 Compare April 22, 2026 08:06
@sre-ci-robot
Contributor

❌ CI Loop Results 79f4d90

Stage Result Duration Tests

Total: 13min | Pipeline | Artifacts

@sre-ci-robot sre-ci-robot removed the do-not-merge/need-merge-master-first any pr merge to release branch need to merge master first label Apr 23, 2026
@sre-ci-robot
Contributor

[INFO] PR Label Summary by Default
[SUCCESS] PR #48893 merged to master

[WARNING] Milestone not set

You can set milestone by commenting:
/set-milestone
Example:
/set-milestone 2.5.0

Use /refresh-label to update related check and label manually

aoiasd and others added 4 commits April 23, 2026 11:24
Increment fileResourceRefCnt during validateSchema (when file resource
IDs are resolved), rather than in the async ack callback's AddCollection.
This closes the TOCTOU race window where RemoveFileResource could delete
a resource between validation and AddCollection, causing streaming node
to panic when creating the tokenizer.

On failure before Broadcast, refCnt is decremented immediately. On
restart, refCnt for pending broadcast tasks is recovered from etcd
before rootcoord becomes Healthy.
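
The restart recovery mentioned above can be pictured with a small sketch. It assumes, hypothetically, that each pending broadcast task persisted in etcd records the file resource IDs it references; `pendingTask` and `recoverRefCnt` are illustrative names, not identifiers from the Milvus codebase. Counts for already committed collections are taken as given, and one reference per resource is added for each pending task before the service would report Healthy.

```go
package main

import "fmt"

// pendingTask is a hypothetical view of a broadcast task that was persisted
// but not yet acknowledged when the process stopped.
type pendingTask struct {
	CollectionName  string
	FileResourceIDs []int64
}

// recoverRefCnt returns the counts for committed collections plus one
// reference per resource for every pending task. The input map is not mutated.
func recoverRefCnt(committed map[int64]int, pending []pendingTask) map[int64]int {
	out := make(map[int64]int, len(committed))
	for id, n := range committed {
		out[id] = n
	}
	for _, task := range pending {
		for _, id := range task.FileResourceIDs {
			out[id]++
		}
	}
	return out
}

func main() {
	committed := map[int64]int{1: 2, 2: 0}
	pending := []pendingTask{
		{CollectionName: "c1", FileResourceIDs: []int64{2}},
		{CollectionName: "c2", FileResourceIDs: []int64{1, 2}},
	}
	fmt.Println(recoverRefCnt(committed, pending))
}
```

Rebuilding the counts before serving requests means a remove request arriving right after restart still sees the references held by in-flight creates.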

issue: milvus-io#48612

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
…nt panic (milvus-io#48893)

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
@aoiasd aoiasd force-pushed the file_resource_panic_3.0 branch from b51d657 to 4360bdf Compare April 23, 2026 03:24
@zhengbuqian zhengbuqian added this to the 3.0 milestone Apr 23, 2026
@sre-ci-robot
Contributor

✅ CI Loop Results b51d657

Stage Result Duration Tests
✅ Build SUCCESS 8.2min -
✅ Code-Check SUCCESS 6.4min -
✅ UT-GO SUCCESS 20.8min 927 passed
✅ UT-Integration SUCCESS 24.5min 46 passed
✅ UT-CPP-Cov SUCCESS 44.3min 7113 passed

Total: 67min | Pipeline | Artifacts

@sre-ci-robot
Contributor

✅ CI Loop Results 4360bdf

Stage Result Duration Tests
✅ Build SUCCESS 10.8min -
✅ Code-Check SUCCESS 6.4min -
✅ UT-GO SUCCESS 20.0min 927 passed
✅ UT-Integration SUCCESS 24.5min 46 passed
✅ UT-CPP-Cov SUCCESS 43.6min 7134 passed

Total: 69min | Pipeline | Artifacts

@mergify mergify Bot added the ci-passed label Apr 23, 2026
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aoiasd, zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhengbuqian zhengbuqian removed the do-not-merge/need-milestone generate by v2-label-manager label Apr 23, 2026
@sre-ci-robot
Contributor

[INFO] PR Label Summary by Default
[SUCCESS] PR #48893 merged to master

Use /refresh-label to update related check and label manually

@zhengbuqian
Collaborator

/lgtm

@sre-ci-robot sre-ci-robot merged commit 305d9e4 into milvus-io:3.0 Apr 23, 2026
10 of 13 checks passed
Labels

approved ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug lgtm size/L Denotes a PR that changes 100-499 lines.

3 participants