feat(index): share IVF partition scans across batch vector queries by sezruby · Pull Request #2 · sezruby/lance

sezruby · 2026-06-15T23:08:08Z

Implements #6822: extend batch vector queries to indexed/ANN search.

Summary

Batch vector search (#6821, PR lance-format#6828) made indexed multi-query search work by looping the full single-query plan once per query vector (re-opening the index and rebuilding the prefilter each time) and unioning the results. This PR makes the indexed/ANN path share index-level state across the batch: it reads each IVF partition's storage once and scores every query that probes it, with the prefilter built once and shared.

Approach

VectorIndex trait (rust/lance-index/src/vector.rs): add defaulted supports_batch_partition_search() and search_partitions_batch(...) (default returns not_supported), so non-IVF indices remain explicitly unsupported.
IVFIndex (rust/lance/src/index/vector/ivf/v2.rs): implement batch search for flat-style sub-indices (IVF_FLAT/PQ/SQ/RQ, i.e. supports_global_topk_heap()). Invert per-query partition lists, load each distinct partition once, accumulate one top-k heap per query, reusing accumulate_prepared_partition_search / global_heap_to_batch.
ANNIvfBatchExec (rust/lance/src/io/exec/knn.rs): ranks every query against the centroids, runs the shared-scan batch search per delta, merges per-query top-k across deltas, emits {query_index, _distance, _rowid} sorted by (query_index, _distance, _rowid).
Routing (rust/lance/src/dataset/scanner.rs): take the fast path only when it is provably equivalent to repeated single-query search (see below); otherwise fall back to the per-query loop. No behavior regression.
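
The shared-scan and cross-delta merge steps above can be sketched in plain Rust. This is a minimal illustration, not the code in this PR: `Hit`, `l2_sq`, `shared_scan`, and `merge_deltas` are hypothetical names, in-memory vectors stand in for partition storage, and brute-force squared L2 stands in for the sub-index scoring. It assumes every delta returns one result list per query.

```rust
use std::cmp::Ordering;
use std::collections::{BTreeMap, BinaryHeap};

/// Hypothetical top-k candidate, ordered by (distance, row_id) so the
/// max-heap root is always the current worst hit.
#[derive(Debug, Clone, Copy)]
struct Hit {
    distance: f32,
    row_id: u64,
}

impl Ord for Hit {
    fn cmp(&self, other: &Self) -> Ordering {
        self.distance
            .total_cmp(&other.distance)
            .then(self.row_id.cmp(&other.row_id))
    }
}
impl PartialOrd for Hit {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Hit {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Hit {}

/// Squared L2 distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// `partitions[p]` holds the (row_id, vector) pairs of partition `p`;
/// `probes[q]` lists the distinct partitions query `q` searches.
/// Returns per-query hits sorted by (distance, row_id) and the number of
/// partition loads performed.
fn shared_scan(
    partitions: &[Vec<(u64, Vec<f32>)>],
    queries: &[Vec<f32>],
    probes: &[Vec<usize>],
    k: usize,
) -> (Vec<Vec<Hit>>, usize) {
    // Invert query -> partitions into partition -> queries.
    let mut by_partition: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for (q, parts) in probes.iter().enumerate() {
        for &p in parts {
            by_partition.entry(p).or_default().push(q);
        }
    }

    let mut heaps: Vec<BinaryHeap<Hit>> = vec![BinaryHeap::new(); queries.len()];
    let mut loads = 0;
    for (p, qs) in &by_partition {
        let rows = &partitions[*p]; // stands in for one storage read
        loads += 1;
        for &q in qs {
            for (row_id, v) in rows {
                let distance = l2_sq(&queries[q], v);
                heaps[q].push(Hit { distance, row_id: *row_id });
                if heaps[q].len() > k {
                    heaps[q].pop(); // drop the current worst hit
                }
            }
        }
    }
    let results = heaps.into_iter().map(|h| h.into_sorted_vec()).collect();
    (results, loads)
}

/// Merge per-delta results into (query_index, distance, row_id) rows,
/// keeping k per query, sorted by (query_index, distance, row_id).
fn merge_deltas(per_delta: Vec<Vec<Vec<Hit>>>, k: usize) -> Vec<(usize, f32, u64)> {
    let num_queries = per_delta.first().map_or(0, |d| d.len());
    let mut out = Vec::new();
    for q in 0..num_queries {
        let mut hits: Vec<Hit> = per_delta
            .iter()
            .flat_map(|d| d[q].iter().copied())
            .collect();
        hits.sort();
        out.extend(hits.into_iter().take(k).map(|h| (q, h.distance, h.row_id)));
    }
    out
}
```

With two queries that each probe the same two partitions, `shared_scan` performs two loads, where a per-query loop would perform four.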

Cosine correctness

Each query vector is normalized independently (normalize_batch_query_for_index). Normalizing the concatenated batch key with a single global norm would scale each vector by a batch-composition-dependent factor and break equivalence with single-query search for cosine.
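
A minimal sketch of the difference, assuming the batch arrives as one flat `f32` buffer of `dim`-sized vectors. `normalize_each` and `normalize_global` are hypothetical helpers written for illustration; they are not the body of `normalize_batch_query_for_index`.

```rust
/// L2-normalize each `dim`-sized vector in a flat batch independently.
/// Zero vectors are passed through unchanged. `dim` must be non-zero.
fn normalize_each(flat: &[f32], dim: usize) -> Vec<f32> {
    flat.chunks(dim)
        .flat_map(|v| {
            let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            v.iter().map(move |x| if norm > 0.0 { x / norm } else { *x })
        })
        .collect()
}

/// The problematic variant: one norm computed over the whole concatenated
/// batch, so each vector is scaled by a factor that depends on its neighbors.
fn normalize_global(flat: &[f32]) -> Vec<f32> {
    let norm = flat.iter().map(|x| x * x).sum::<f32>().sqrt();
    flat.iter()
        .map(|x| if norm > 0.0 { x / norm } else { *x })
        .collect()
}
```

With `normalize_each`, a vector comes out the same whether it is normalized alone or as part of a batch. With `normalize_global`, the same vector is scaled differently depending on what else is in the batch, and it is generally not unit length.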

nprobes equivalence gate (correctness, not just perf)

The shared-scan path searches exactly minimum_nprobes partitions per query. The single-query path is adaptive: it applies a k-dependent early_pruning floor and then expands probes up to maximum_nprobes (late search) when a query has fewer than k results. These differ unless nprobes is fixed, so the fast path is gated to minimum_nprobes == maximum_nprobes (what nprobes=N sets) — provably identical to single-query. Adaptive nprobes falls back to the per-query loop, which reuses the real adaptive search and stays exact.

Before gating, a batch query with unpinned nprobes diverged from repeated single-query search on every query vector. After gating, both the pinned (fast-path) and unpinned (fallback) cases match repeated single-query search exactly.
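
The gate can be pictured as a single predicate. This is a sketch under stated assumptions rather than the routing code in scanner.rs: `BatchQueryShape`, `Route`, and `route` are hypothetical names, and `maximum_nprobes` is modeled as an `Option<usize>` because its default is described as `None`. Besides fixed nprobes, the predicate includes the other fast-path conditions listed in this PR (no refine step, a flat-style IVF index, fully-indexed fragments).

```rust
/// Hypothetical summary of the properties the gate inspects.
#[derive(Debug, Clone, Copy)]
struct BatchQueryShape {
    minimum_nprobes: usize,
    maximum_nprobes: Option<usize>, // None = adaptive, no upper bound
    has_refine: bool,
    flat_style_ivf: bool,
    fully_indexed: bool,
}

#[derive(Debug, PartialEq)]
enum Route {
    SharedScan,
    PerQueryLoop,
}

/// Take the shared-scan path only when nprobes is pinned and no other
/// fallback condition applies; otherwise use the per-query loop.
fn route(q: &BatchQueryShape) -> Route {
    let fixed_nprobes = q.maximum_nprobes == Some(q.minimum_nprobes);
    if fixed_nprobes && !q.has_refine && q.flat_style_ivf && q.fully_indexed {
        Route::SharedScan
    } else {
        Route::PerQueryLoop
    }
}
```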

Alternatives considered for adaptive nprobes (deferred follow-ups)

Batched early/late search — phase-1 early search across the shared partition loads, track each query's found-count, then a phase-2 pass that expands only the queries still short of k, loading any new partitions once. Fully general (keeps scan-sharing for adaptive nprobes) but reimplements the adaptive loop in batched form; largest surface, best saved for a follow-up.
Smarter fallback — run the existing adaptive per-query search but share the opened index + prefilter (no explicit scan-sharing; the IVF partition cache still serves overlapping partitions). Exact recall, simpler than the above; a cheap improvement to the current fallback.
Over-probe to maximum_nprobes — rejected: with the default maximum_nprobes = None this degenerates to scanning all partitions for every query, defeating IVF pruning.
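
For the first alternative, the phase-2 planning step might look roughly like the sketch below. It is not part of this PR. `plan_phase2` is a hypothetical name, and it simplifies the adaptive loop by scheduling all remaining partitions up to `max_nprobes` at once for each short query, whereas the real late search expands incrementally.

```rust
use std::collections::BTreeMap;

/// `ranked[q]` is query q's partition list ordered by centroid distance;
/// `found[q]` is how many rows phase 1 returned for q.
/// Returns partition -> queries that still need it, so each newly needed
/// partition is loaded once and scored only for the queries short of k.
fn plan_phase2(
    ranked: &[Vec<usize>],
    found: &[usize],
    k: usize,
    min_nprobes: usize,
    max_nprobes: usize,
) -> BTreeMap<usize, Vec<usize>> {
    let mut plan: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for (q, parts) in ranked.iter().enumerate() {
        if found[q] >= k {
            continue; // phase 1 already satisfied this query
        }
        // Phase 1 covered the first `min_nprobes` partitions; schedule the rest.
        for &p in parts.iter().take(max_nprobes).skip(min_nprobes) {
            plan.entry(p).or_default().push(q);
        }
    }
    plan
}
```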

Other known limitations (deferred)

HNSW batch sharing, batch refine_factor/reranking, and combining batch results with unindexed fragments are not implemented on the shared-scan path; all three fall back to the per-query loop.

Test plan

cargo test -p lance --lib test_batch_knn — 9 tests: plan shape, exact batch-vs-repeated-single equivalence (nprobes pinned so it is deterministic), cosine regression, shared prefilter, multi-delta cross-delta merge, and fallbacks for refine and adaptive nprobes.
cargo test -p lance --lib dataset::scanner::test::test_knn (29) and index::vector::ivf::v2 (88) — no regressions.
cargo fmt --all && cargo clippy -p lance -p lance-index --tests --benches -- -D warnings.
Python: uv run pytest python/tests/test_vector_index.py -k batch → 8 passed (L2 + cosine × three/single queries). ruff clean repo-wide; pyright clean on changed lines.
Benchmark (benchmarks/test_search.py): batch vs repeated-single ANN. Standalone timing (50k rows, dim 128, IVF_PQ 64 partitions, m=32, k=10, nprobes=10): 2.48× speedup.

Closes lance-format#6822

Extend batch vector search (lance-format#6821) to the indexed/ANN path so a single multi-query request reads each IVF partition's storage once and scores every query that probes it, instead of re-running a full single-query plan per vector and unioning the results (which re-opens the index and rebuilds the prefilter for each query).

- Add `VectorIndex::search_partitions_batch` + `supports_batch_partition_search` (defaulted so non-IVF indices stay explicitly unsupported).
- Implement them for `IVFIndex` with a flat-style sub-index (IVF_FLAT/PQ/SQ/RQ): load each distinct partition once and accumulate one top-k heap per query, sharing the prefilter across the whole batch.
- Add `ANNIvfBatchExec`, which ranks every query against the centroids, runs the shared-scan batch search, merges per-query top-k across deltas, and emits `query_index`-tagged results; route to it from `Scanner::batch_indexed_vector_search` when the gate below holds.
- Normalize each query vector independently for cosine (`normalize_batch_query_for_index`): normalizing the concatenated batch key with one global norm would scale each vector by a batch-composition-dependent factor and break equivalence with single-query search.

The shared-scan fast path is gated to cases that are provably equivalent to repeated single-query search: fixed nprobes (`minimum_nprobes == maximum_nprobes`), no refine step, an IVF flat-style index, and fully-indexed fragments. With adaptive nprobes the single-query path applies an `early_pruning` floor and late-search expansion that the batch path does not, so those queries fall back to the per-query loop, which stays exact. HNSW, refine, and mixed indexed/unindexed scans also fall back.

Tests: plan shape; exact batch-vs-repeated-single equivalence (nprobes pinned); cosine regression; shared prefilter; multi-delta cross-delta merge; and fallbacks for refine and adaptive nprobes. Python parametrized over L2 + cosine; a batch-vs-repeated-single ANN benchmark.

Closes lance-format#6822

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot added the enhancement, A-index, and A-python labels Jun 15, 2026

sezruby force-pushed the knn-batch-6822 branch from 262e00e to dd01c55 on June 22, 2026 05:46

sezruby force-pushed the knn-batch-6822 branch from dd01c55 to 35e21ad on June 22, 2026 15:18

sezruby wants to merge 1 commit into main from knn-batch-6822
