feat: Add vectordb-compare app and fix benchmark measurement discrepancies #260
Closed
luisremis wants to merge 1 commit into
Pull request overview
This PR introduces a new apps/vectordb-compare benchmark workflow that ingests, verifies, and runs KNN latency/throughput comparisons across multiple vector database engines (ApertureDB, Pinecone, Weaviate, Qdrant, LanceDB), with changes intended to make timing measurements more comparable across engines.
Changes:
- Adds a full benchmark app (Dockerfile/compose, config, ingestion, verification, KNN runners, dataset download tooling).
- Implements per-engine ingestion + verification modules with dynamic imports to allow partial dependency installation.
- Updates KNN workers so query payload extraction happens outside the timed block (to avoid measuring Python data generation overhead).
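The dynamic-import idea in the second bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `load_engines` is a hypothetical helper, and the module names passed to it are stand-ins (one standard-library module and one deliberately missing name) rather than the app's real engine modules.

```python
import importlib


def load_engines(module_names):
    """Import each module by name, skipping those whose dependencies are missing.

    Hypothetical helper for illustration; the PR's own loader may differ.
    """
    loaded, skipped = {}, {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            skipped[name] = str(exc)
    return loaded, skipped


# "json" stands in for an installed engine SDK; the second name is assumed absent.
loaded, skipped = load_engines(["json", "hypothetical_missing_engine_sdk"])
print("loaded:", sorted(loaded))
print("skipped:", sorted(skipped))
```

With this shape, a run can proceed with whichever engines imported successfully and report the rest, instead of failing at import time.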
Reviewed changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated 28 comments.
| File | Description |
|---|---|
| apps/vectordb-compare/test.sh | CI-style docker test runner for the workflow |
| apps/vectordb-compare/requirements.txt | Python dependencies for benchmark/engines |
| apps/vectordb-compare/README.md | Usage and methodology documentation for the benchmark app |
| apps/vectordb-compare/Dockerfile | Container build for the workflow image |
| apps/vectordb-compare/compose.yml | Local docker compose runner |
| apps/vectordb-compare/.env.sample | Sample environment configuration |
| apps/vectordb-compare/.dockerignore | Docker ignore rules for local artifacts |
| apps/vectordb-compare/app/app.sh | Entrypoint script orchestrating download/ingest/verify/knn/plot |
| apps/vectordb-compare/app/config.py | Centralized config/env/arg parsing |
| apps/vectordb-compare/app/utils.py | Shared dataset loaders, connectors, and result helpers |
| apps/vectordb-compare/app/download_data.sh | S3 dataset download helper |
| apps/vectordb-compare/app/hdf5.py | HDF5 inspection helper script |
| apps/vectordb-compare/app/ingest.py | Main ingestion runner (dynamic engine loading + timing summary) |
| apps/vectordb-compare/app/ingest_base.py | Base ingestion engine utilities (sizes/datasets/timing) |
| apps/vectordb-compare/app/ingest_aperturedb.py | ApertureDB ingestion implementation |
| apps/vectordb-compare/app/ingest_pinecone.py | Pinecone ingestion implementation |
| apps/vectordb-compare/app/ingest_weaviate.py | Weaviate ingestion implementation |
| apps/vectordb-compare/app/ingest_qdrant.py | Qdrant ingestion implementation |
| apps/vectordb-compare/app/ingest_lancedb.py | LanceDB ingestion implementation |
| apps/vectordb-compare/app/verify.py | Main verification runner + timing summaries |
| apps/vectordb-compare/app/verify_base.py | Base verification engine utilities |
| apps/vectordb-compare/app/verify_aperturedb.py | ApertureDB ingestion verification |
| apps/vectordb-compare/app/verify_pinecone.py | Pinecone ingestion verification |
| apps/vectordb-compare/app/verify_weaviate.py | Weaviate ingestion verification |
| apps/vectordb-compare/app/verify_qdrant.py | Qdrant ingestion verification |
| apps/vectordb-compare/app/verify_lancedb.py | LanceDB ingestion verification |
| apps/vectordb-compare/app/knn.py | Main KNN benchmark runner (dynamic engine loading) |
| apps/vectordb-compare/app/knn_base.py | Base KNN engine + query generator logic |
| apps/vectordb-compare/app/knn_aperturedb.py | ApertureDB KNN implementation |
| apps/vectordb-compare/app/knn_pinecone.py | Pinecone KNN implementation |
| apps/vectordb-compare/app/knn_weaviate.py | Weaviate KNN implementation |
| apps/vectordb-compare/app/knn_qdrant.py | Qdrant KNN implementation |
| apps/vectordb-compare/app/knn_lancedb.py | LanceDB KNN implementation |
| apps/vectordb-compare/app/plot.py | Plot generation wrapper using dbeval |
| apps/vectordb-compare/app/test.sh | Local test helper script |
Comment on lines +58 to +64

    if dataset == "deepimage96":
        dataset_obj = u.DatasetDeepImage96(max=n_queries)
        self.dataset = dataset_obj.test  # Use test vectors for queries
    elif dataset == "yfcc100m":
        dataset_obj = u.DatasetYFCC100M(max=n_queries)
        self.dataset = dataset_obj.test  # Use test vectors for queries

Comment on lines +83 to +92

    # Load query dataset - will now use TEST vectors via updated base class
    if params.dataset == "deepimage96":
        query_data = u.DatasetDeepImage96(
            max=params.total_queries).test
    elif params.dataset == "yfcc100m":
        query_data = u.DatasetYFCC100M(max=params.total_queries).test
    else:
        raise ValueError(f"Unknown dataset: {params.dataset}")

    print(f"Using {len(query_data)} test vectors as queries")

Comment on lines +27 to +34

    def worker_pinecone(index, generator, namespace, knn_samples, start_index, end_index, times, results):
        """Worker function for Pinecone threading."""
        for i in range(start_index, end_index + 1):
            if i >= len(generator):
                break

            query_vector = generator[i]

Comment on lines +76 to +85

    th_queue_size = len(generator) // c

    index = pc.Index(index_name)
    for i in range(c):
        start_index = i * len(generator) // c
        end_index = min(
            start_index + th_queue_size, len(generator))

        t = threading.Thread(target=worker_pinecone, args=(
            index, generator, namespace, params.knn_samples, start_index, end_index, times, results))

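Assuming this hunk and the `worker_pinecone` hunk quoted earlier come from the same file, the worker iterates `range(start_index, end_index + 1)` while each thread's `end_index` equals the next thread's `start_index`, so neighbouring threads appear to share a boundary query (with 100 queries and `c = 4`, indices 25, 50, and 75 would run twice). A minimal sketch of a half-open partition that avoids this, where `partition` is a hypothetical helper and not part of the PR:

```python
def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous half-open [start, end) ranges.

    Hypothetical helper for illustration; every index lands in exactly one range.
    """
    bounds = [i * n_items // n_workers for i in range(n_workers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = partition(10, 3)
print(ranges)  # [(0, 3), (3, 6), (6, 10)]

# A worker would then iterate range(start, end) with an exclusive end.
covered = [i for start, end in partition(100, 4) for i in range(start, end)]
```

Because the ends are exclusive, the worker no longer needs the `end_index + 1` adjustment or the `len(generator)` guard.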
Comment on lines +27 to +34

    def worker_qdrant(client, generator, collection_name, knn_samples, start_index, end_index, times, results):
        """Worker function for Qdrant threading."""
        for i in range(start_index, end_index + 1):
            if i >= len(generator):
                break

            query_vector = generator[i]

Comment on lines +56 to +59

- **`ingest.py`** - Data ingestion pipeline with engine selection
- **`verify_ingestion.py`** - Verify data was loaded correctly
- **`knn.py`** - KNN performance benchmarking
- **`utils.py`** - Shared utilities with dynamic imports

1. **Data Ingestion**:
   ```bash
   python ingest.py -engines "adb,pc,wv,qd,ldb" -source "deepimage96"
   ```

2. **Verify Ingestion**:
   ```bash
   python verify_ingestion.py -engines "adb,pc,wv,qd,ldb" -source "deepimage96"
   ```

| Variable | Description | Default |
|----------|-------------|---------|
| `ENGINES` | Comma-separated engine list | `"adb,pc,wv,qd,ldb"` |
| `SOURCE` | Dataset source | `"deepimage96"` |

    #!/bin/bash
    set -e

    mkdir -p input

Description
This PR ports the `vectordb-compare` app from the internal workflows repository and fixes critical discrepancies in how the benchmarks measured database latency.
Fixes
- Payload extraction (`query_vector = generator[i]`) happened inside the measured `start_time`-`end_time` block for most databases, and for ApertureDB it was evaluated twice per iteration. This PR moves the payload extraction outside the `time.time()` block for all databases (ApertureDB, Pinecone, Qdrant, Weaviate), so that the benchmark measures database query latency without Python data generation overhead.
- … `records` construction inside the `time.time()` block.
- … `wait=False` to `wait=True` on the `upsert` call so that the benchmark measures synchronous persistence, making it fair when compared to Weaviate, Pinecone, and ApertureDB.
These fixes make the performance results comparable across all vector databases.
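The measurement pattern described for the KNN workers can be sketched as follows. This is an illustration under assumptions rather than the PR's code: `run_query` is a hypothetical stand-in for an engine's search call, `timed_queries` is a made-up name, and `time.perf_counter()` is used here instead of the `time.time()` named in the description because it is a monotonic clock intended for measuring short intervals.

```python
import time


def run_query(vector, k):
    """Hypothetical stand-in for an engine's KNN call."""
    return sorted(vector)[:k]


def timed_queries(generator, k):
    """Return per-query latencies, timing only the query call itself."""
    latencies = []
    for i in range(len(generator)):
        query_vector = generator[i]  # payload extraction happens before the timer starts
        start = time.perf_counter()
        run_query(query_vector, k)
        latencies.append(time.perf_counter() - start)
    return latencies


queries = [[0.3, 0.1, 0.2], [0.9, 0.8, 0.7]]
latencies = timed_queries(queries, k=2)
print(f"{len(latencies)} queries timed")
```

Keeping only the engine call between the two clock reads means any cost of producing `query_vector` is excluded from every engine's numbers in the same way.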