
Add full-text search benchmark support #794

Open: jamesgao-jpg wants to merge 83 commits into zilliztech:main from jamesgao-jpg:fts_impl_only


jamesgao-jpg (Collaborator) commented Jun 5, 2026

Context

VDBBench did not have a dedicated native full-text search benchmark path. This PR adds FTS as a first-class benchmark workload so BM25-based text search can be evaluated through the same task, runner, and result pipeline used by the rest of VDBBench.

Summary

  • Add full-text search benchmark support centered on BM25 text retrieval.
  • Introduce FTS benchmark cases that load text documents, run text queries, and report comparable performance results.
  • Wire FTS through backend execution and Streamlit task generation so cases can be launched from the existing benchmark interfaces.

Backends Covered

  • Milvus: native BM25 full-text indexing/search configuration and execution path.
  • ElasticCloud / Elasticsearch: BM25 text indexing/search path using Elasticsearch defaults.
  • Vespa: text schema/query path for BM25-style FTS benchmarking.
  • Turbopuffer: namespace-based full-text benchmark path.
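All four backends are benchmarked on BM25-style ranking, so a small reference scorer can help when sanity-checking results. The sketch below implements the classic Okapi BM25 formula over pre-tokenized documents. It is illustrative only: the function name `bm25_scores`, the whitespace tokenization, and the sample documents are assumptions, and each backend applies its own analyzer and BM25 variant. The values `k1=1.2` and `b=0.75` are the commonly used defaults (they match Elasticsearch's documented defaults).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF with the +1 inside the log, so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "milvus supports full text search".split(),
    "vespa ranks documents with bm25".split(),
    "full text search benchmark with bm25".split(),
]
scores = bm25_scores("full text search".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy corpus the first and third documents both contain every query term once, so the shorter one ranks higher purely through length normalization.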

Testing Infra Touched

  • Dataset layer: add FTS dataset definitions and ir_datasets-based document/query/ground-truth loading.
  • Case layer: add FTS performance case definitions and task assembly support.
  • Runner layer: support FTS document loading plus serial and concurrent text-query search execution.
  • Frontend layer: expose FTS cases and generate backend-specific FTS task configs from Streamlit.
  • Result layer: format FTS benchmark outputs alongside existing VDBBench results.
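The runner layer's two search modes can be sketched as follows: a serial pass that records per-query latency, and a concurrent pass that reports throughput. This is a minimal illustration, not the PR's runner. `fake_search`, `run_serial`, and `run_concurrent` are hypothetical names, the backend call is simulated with a short sleep, and a thread pool is used here for brevity even though the real runner may use a different concurrency model.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query, k=10):
    """Hypothetical stand-in for a backend client's text-search call."""
    time.sleep(0.001)  # simulate a network round trip
    return [f"doc-{i}" for i in range(k)]


def run_serial(search, queries, k=10):
    """Run queries one at a time and return per-query latencies in seconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search(q, k)
        latencies.append(time.perf_counter() - start)
    return latencies


def run_concurrent(search, queries, k=10, workers=4):
    """Run queries across a thread pool and return queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda q: search(q, k), queries))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


queries = [f"query {i}" for i in range(20)]
latencies = run_serial(fake_search, queries)
qps = run_concurrent(fake_search, queries)
```

Keeping the two passes separate means the latency numbers are not distorted by client-side contention, while the QPS number reflects behavior under parallel load.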

Datasets Supported

  • MS MARCO: small 100K, medium 1M, large 8.8M documents.
  • HotpotQA: small 100K, medium 1M, large 5.2M documents.

Metrics

  • Search metric type: BM25.
  • Accuracy metric: recall@k against dataset relevance labels.
  • Performance metrics: serial latency p95/p99, concurrent QPS, load duration, and optimize duration.
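As a rough illustration of how these metrics can be computed, the sketch below shows recall@k against a set of relevant document IDs and a nearest-rank percentile for latency. The helper names and sample values are hypothetical. The PR does not state which recall normalization it uses, so dividing by the total number of relevant documents is an assumption (some implementations divide by `min(k, len(relevant))` instead).

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def percentile(values, pct):
    """Nearest-rank percentile of `values` for an integer `pct` in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical query result and relevance labels.
retrieved = ["d3", "d7", "d1", "d9"]
relevant = {"d1", "d2"}
recall = recall_at_k(retrieved, relevant, k=3)

# Hypothetical serial latencies in milliseconds.
latencies_ms = list(range(1, 101))
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Here one of the two relevant documents appears in the top 3, so `recall` is 0.5; with latencies of 1 to 100 ms, `p95` is 95 and `p99` is 99.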

Denise2004 and others added 30 commits June 1, 2026 04:00
Co-authored-by: zilliz <zilliz@zillizdeMacBook-Pro.local>
@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jamesgao-jpg
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jamesgao-jpg changed the title from "Add full-text search benchmark support" to "[WIP] Add full-text search benchmark support" Jun 5, 2026
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
jamesgao-jpg changed the title from "[WIP] Add full-text search benchmark support" to "Add full-text search benchmark support" Jun 16, 2026


3 participants