
[2/N] add cuda imatrix support for custom RL model#377

Open
yiakwy-xpu-ml-framework-team wants to merge 1 commit into
antirez:mainfrom
yiakwy-xpu-ml-framework-team:add_cuda_imatrix_support

yiakwy-xpu-ml-framework-team commented Jun 10, 2026

Introduction

Previously, the imatrix was assumed to be tuned on a Metal machine (yes, we have an M3/2 Ultra), but it is handy to tune the file directly on the Hopper platform to avoid network traffic for a 250 GB model.

This is a follow-up to #368.

Usage:

DS4_CUDA_IMATRIX_GPU_COLLECT=1 ./ds4 --backend cuda \
  -m gguf/DeepSeek-V4-Flash-Q4KExperts-F16HC-F16Compressor-F16Indexer-Q8Attn-Q8Shared-Q8Out-chat-v2-imatrix.gguf \
  --imatrix-dataset gguf-tools/imatrix/dataset/rendered_prompts.txt \
  --imatrix-out gguf/DeepSeek-V4-Flash-chat-v2-routed-moe-ds4.dat

Screenshot 2026-06-10 03 35 50

Note: DS4_CUDA_IMATRIX_GPU_COLLECT=0 selects the legacy imatrix logic, which computes the imatrix scores without GPU acceleration.
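
For readers unfamiliar with what is being collected, here is a minimal sketch of the kind of statistic an imatrix pass accumulates. It assumes the scores are running sums of squared input activations per weight column, as in llama.cpp-style imatrix tools; the exact layout and reduction used by ds4 are not described in this PR, and the function names below are hypothetical. The GPU collection path would presumably perform an equivalent reduction on the device instead of on the host.

```python
# Hypothetical sketch, not ds4's actual code.
# Assumption: an imatrix score is the mean squared input activation per
# column of a weight matrix, accumulated over all tokens in the dataset.

def accumulate_imatrix(sums, count, activations):
    """Add a batch of activation rows (one row per token) to the running sums."""
    for row in activations:
        if len(row) != len(sums):
            raise ValueError("activation width does not match the weight's input width")
        for j, x in enumerate(row):
            sums[j] += x * x
        count += 1
    return count


def finalize_imatrix(sums, count):
    """Return the mean squared activation per column (zeros if no tokens were seen)."""
    if count == 0:
        return [0.0] * len(sums)
    return [s / count for s in sums]


# Two tokens, three input columns.
sums = [0.0, 0.0, 0.0]
count = accumulate_imatrix(sums, 0, [[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]])
scores = finalize_imatrix(sums, count)
```

Columns that consistently see large activations get larger scores, which a quantizer can then use to weight quantization error per column.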

@yiakwy-xpu-ml-framework-team

Author

@antirez I think this is a useful feature and would appreciate your feedback!
