Auto select CAGRA build algorithm for hnsw::build #1719

Merged
rapids-bot[bot] merged 27 commits into NVIDIA:main from tfeher:auto_selec_cagra_build
Jun 22, 2026

Conversation

@tfeher

tfeher commented Jan 21, 2026

Contributor

Configuring an HNSW graph build with CAGRA is complicated because CAGRA offers multiple build algorithms. This PR implements automatic algorithm selection. The goal is a simplified API in which the user sets only two parameters, M and ef_construction, which control graph size and quality respectively. This should be familiar to HNSW users and allows easier adoption of cuVS-accelerated HNSW graph building.

  hnsw::index_params params;
  params.M               = 24;
  params.ef_construction = 200;
  params.hierarchy       = cuvs::neighbors::hnsw::HnswHierarchy::GPU;

  auto hnsw_index = hnsw::build(res, params, dataset_host_view);
  cuvs::neighbors::hnsw::serialize(res, "hnsw_index.bin", *hnsw_index);

If there is enough memory (host and GPU) to do both the KNN graph building and the optimization in memory, we choose the in-memory build and let cagra::index_params::from_hnsw_params derive the additional configuration parameters.

If the build would require more memory than is available, we choose the ACE method and let the number of partitions be derived using #1603.
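As a rough illustration of the two branches described above, here is a minimal sketch. `BuildAlgo`, `MemoryEstimate`, and `select_build_algo` are hypothetical names used only for this example and are not part of the cuVS API. The memory estimates are taken as inputs; the estimates this PR actually computes are not reproduced here.

```cpp
#include <cstddef>

// Hypothetical names for illustration only; not the cuVS API.
enum class BuildAlgo { InMemory, Ace };

struct MemoryEstimate {
  std::size_t host_bytes;  // assumed: estimated peak host usage of an in-memory build
  std::size_t gpu_bytes;   // assumed: estimated peak GPU usage of an in-memory build
};

// Pick the in-memory build only if both estimates fit; otherwise fall back to ACE.
inline BuildAlgo select_build_algo(const MemoryEstimate& need,
                                   std::size_t host_available,
                                   std::size_t gpu_available)
{
  const bool fits = need.host_bytes <= host_available && need.gpu_bytes <= gpu_available;
  return fits ? BuildAlgo::InMemory : BuildAlgo::Ace;
}
```

With this shape, exceeding either the host or the GPU budget is enough to switch to ACE.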

For the host, we query the OS for available memory; for the GPU, we assume that the whole device memory is available.
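The description does not say which OS call is used for the host query. One possible way to do it on Linux is sketched below, assuming glibc's `sysconf` with `_SC_AVPHYS_PAGES` (a non-POSIX extension). The function name `available_host_memory_bytes` is hypothetical.

```cpp
#include <cstddef>
#include <unistd.h>

// Hypothetical helper, Linux/glibc only: _SC_AVPHYS_PAGES is a non-POSIX extension.
// Returns the currently available physical memory in bytes, or 0 if the query fails.
inline std::size_t available_host_memory_bytes()
{
  const long pages     = sysconf(_SC_AVPHYS_PAGES);
  const long page_size = sysconf(_SC_PAGESIZE);
  if (pages < 0 || page_size < 0) { return 0; }
  return static_cast<std::size_t>(pages) * static_cast<std::size_t>(page_size);
}
```

On the GPU side, treating the whole device memory as available would correspond to reading the device's total memory rather than its currently free memory.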

@copy-pr-bot

copy-pr-bot Bot commented Jan 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@tfeher tfeher force-pushed the auto_selec_cagra_build branch from bb78635 to 23a0b16 January 21, 2026 17:43
@tfeher tfeher removed request for a team January 21, 2026 17:46
@tfeher tfeher added the breaking (Introduces a breaking change) and improvement (Improves an existing functionality) labels Jan 21, 2026
Comment thread cpp/src/neighbors/detail/cagra/cagra_helpers.cpp Outdated
@tfeher tfeher requested a review from mfoerste4 January 21, 2026 17:53
Comment thread examples/cpp/src/hnsw_openai_example.cu Outdated
Comment thread examples/cpp/src/hnsw_openai_example.cu

@mfoerste4 mfoerste4 left a comment

Contributor

I did not go over all memory estimates in detail, but I suggest aligning the predictions with real data.

Is autotuning of ACE params part of a different PR? Besides the open question on the file location, we might want to at least set the number of partitions dynamically.

raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_));
}

auto dataset_view = raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_);

Contributor

Is the data expected to always reside in host memory?

Contributor

ACE only supports host memory right now. The main reason is that we expect the data to be large and memory-mapped. Further, we do the partitioning and reordering on the host, since there is no benefit in moving the data to the GPU only to write it to disk afterwards.

Anyway, I think we can support device datasets easily, since these should not end up using ACE with this heuristic. @tfeher What do you think?

Comment thread cpp/include/cuvs/neighbors/cagra.hpp Outdated
Comment on lines +100 to +101
// ACE build and search example.
cagra_build_search_ace(res);

Contributor

Maybe we want to rename this to something generic, now that the selection is hidden from the user.

Comment thread examples/cpp/src/hnsw_openai_example.cu
Comment thread cpp/src/neighbors/detail/hnsw.hpp

@julianmi julianmi left a comment

Contributor

I did not get a chance to fully review the memory heuristics yet. I wonder how we can test it though. Should max_host_memory_gb and max_gpu_memory_gb be optional HNSW parameters that we could use to test that the expected algorithm is used based on memory limits set?
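To make the suggestion concrete, here is a minimal sketch of how optional limits could cap the detected memory. This is an assumption about one possible design, not what the PR implements: the struct `MemoryLimits` and the helper `effective_limit` are hypothetical, and only the field names `max_host_memory_gb` and `max_gpu_memory_gb` come from the comment above.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>

// Hypothetical container for the optional limits suggested in the review.
struct MemoryLimits {
  std::optional<double> max_host_memory_gb;
  std::optional<double> max_gpu_memory_gb;
};

// Returns the detected amount unless a cap is set, in which case the smaller value wins.
inline std::size_t effective_limit(std::size_t detected_bytes, const std::optional<double>& cap_gb)
{
  if (!cap_gb) { return detected_bytes; }
  const auto cap_bytes = static_cast<std::size_t>(*cap_gb * 1024.0 * 1024.0 * 1024.0);
  return std::min(detected_bytes, cap_bytes);
}
```

A test could then set a very small cap and check that the build falls back to ACE, or leave the cap unset and check that a small dataset is built in memory.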

Comment thread cpp/src/neighbors/ivf_pq_index.cu Outdated
Comment thread cpp/src/neighbors/detail/hnsw.hpp Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_helpers.cpp Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_helpers.cpp Outdated
Comment thread cpp/include/cuvs/neighbors/ivf_pq.hpp Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_helpers.cpp Outdated
@cjnolet cjnolet moved this from Todo to In Progress in Unstructured Data Processing Mar 24, 2026
@achirkin achirkin requested a review from a team as a code owner March 30, 2026 14:40
@achirkin achirkin requested a review from msarahan March 30, 2026 14:40
@tfeher tfeher changed the title Auto select CAGRA build algorithom for hnsw::build Auto select CAGRA build algorithm for hnsw::build Mar 31, 2026
tfeher and others added 3 commits May 21, 2026 10:11
Comment thread examples/cpp/CMakeLists.txt Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh
Comment thread python/cuvs/cuvs/tests/test_hnsw_ace.py Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated

@julianmi julianmi left a comment

Contributor

Thanks @mfoerste4. These are great improvements!

Comment thread cpp/src/neighbors/detail/cagra/cagra_helpers.cpp Outdated
Comment thread examples/build.sh
Comment thread cpp/bench/ann/src/cuvs/cuvs_cagra_hnswlib.cu Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_helpers.cpp
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
Comment thread cpp/src/neighbors/detail/hnsw.hpp Outdated
Comment thread cpp/src/neighbors/detail/hnsw.hpp Outdated
Comment thread cpp/src/neighbors/detail/hnsw.hpp
Comment thread cpp/src/neighbors/detail/hnsw.hpp
Comment thread examples/cpp/CMakeLists.txt Outdated
@julianmi

julianmi commented Jun 2, 2026

Contributor

I added 2GB static memory consumption for both GPU & Host. This is probably too conservative given that we already reduce the available usage to 80% of the actual value. Especially on small host memory machines this increases the number of partitions drastically. We might want to get rid of one of the limits. On the GPU I would prefer the constant 2GB (e.g. for workspace memory), but on the host side the percentage seems more natural. What do you think?

I agree that the host should stick with the percentage. We have much less control over the host, which has many other processes running. Also, high memory pressure reduces performance significantly. We might want to test increasing the limit to 90%, given the added 2 GB of static memory.
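To see why stacking both limits is conservative, here is a small arithmetic sketch. It assumes the two limits combine as "fraction of available memory minus a fixed reserve", which is one reading of the discussion and not confirmed by the source. The helper name `usable_memory_gb` is hypothetical.

```cpp
#include <algorithm>

// Hypothetical helper: usable budget in GB after applying a fraction cap and a
// fixed reserve. The result is clamped at zero for very small machines.
inline double usable_memory_gb(double available_gb, double fraction, double static_reserve_gb)
{
  return std::max(0.0, available_gb * fraction - static_reserve_gb);
}
```

Under this assumption, a host with 16 GB available keeps about 10.8 GB at 80% with a 2 GB reserve, and about 12.4 GB at 90% with the same reserve. The fixed reserve weighs more heavily the smaller the machine is, which matches the concern about small host memory machines.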

Comment thread cpp/src/neighbors/detail/hnsw.hpp Outdated

@achirkin achirkin left a comment

Contributor

Looks good overall. A few small suggestions below.

Comment thread cpp/include/cuvs/neighbors/cagra.hpp Outdated
Comment thread cpp/include/cuvs/neighbors/cagra.hpp Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh
Comment thread cpp/src/neighbors/detail/cagra/graph_core.cuh Outdated
Comment thread cpp/src/neighbors/detail/hnsw.hpp Outdated

@tfeher tfeher left a comment

Contributor Author

Fixed a few smaller issues.

Comment thread cpp/src/neighbors/detail/hnsw.hpp
Comment thread cpp/src/neighbors/ivf_pq_index.cu
Comment thread cpp/src/neighbors/ivf_pq_index.cu
Comment thread examples/cpp/CMakeLists.txt Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_helpers.cpp Outdated

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Member

Just one small request, otherwise looks good

Comment thread examples/cpp/CMakeLists.txt
Comment thread examples/cpp/CMakeLists.txt

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Member

Approving as I will be out for the first few days of next week and don't want to hold this up. Please address my above comment before merging.

@achirkin achirkin left a comment

Contributor

Thanks for the updates! lgtm

@tfeher

tfeher commented Jun 22, 2026

Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 0b090ba into NVIDIA:main Jun 22, 2026
272 of 283 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing Jun 22, 2026
Labels

breaking (Introduces a breaking change), improvement (Improves an existing functionality)

8 participants