question regarding adding vision capabilities #565

arsalanrzp · 2026-06-01T09:37:33Z

Hi,

I am building a Vision-Language Model (VLM) using BitNet b1.58 as the frozen text decoder, with a lightweight adapter module connecting a vision encoder to the decoder.
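For concreteness, the setup can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual training code: `VisionAdapter` is a hypothetical two-layer MLP, the dimensions are made up, the loss is a placeholder, and a toy `nn.Sequential` stands in for the BitNet decoder so the snippet runs without downloading any weights.

```python
import torch
import torch.nn as nn


class VisionAdapter(nn.Module):
    """Hypothetical adapter: projects vision features to the decoder's embedding width."""

    def __init__(self, vision_dim: int, decoder_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, decoder_dim),
            nn.GELU(),
            nn.Linear(decoder_dim, decoder_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(vision_feats)


# Toy stand-in for the frozen text decoder (assumption: the real decoder
# would consume the same kind of embedding sequence).
decoder = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 100))
decoder.requires_grad_(False)  # freeze every decoder parameter
decoder.eval()

adapter = VisionAdapter(vision_dim=32, decoder_dim=64)
# Only the adapter's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

vision_feats = torch.randn(2, 5, 32)  # (batch, image tokens, vision_dim)
text_embeds = torch.randn(2, 7, 64)   # (batch, text tokens, decoder_dim)

# Prepend the projected image tokens to the text embeddings.
inputs_embeds = torch.cat([adapter(vision_feats), text_embeds], dim=1)
logits = decoder(inputs_embeds)

loss = logits.float().pow(2).mean()  # placeholder loss, for illustration only
loss.backward()                      # gradients pass through the frozen decoder
optimizer.step()                     # only the adapter is updated
```

Gradients still flow through the frozen decoder's activations back to the adapter; freezing only prevents the decoder's own weights from accumulating gradients or being updated.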

I have two questions regarding the choice of model variant:

1. For training: I am currently using microsoft/bitnet-b1.58-2B-4T-bf16 as the decoder backbone, keeping its weights frozen and only training the adapter. Is this the correct variant for this use case?
2. For deployment: Once the adapter is trained, would it be safe to swap the bf16 decoder for microsoft/bitnet-b1.58-2B-4T (the packed 1.58-bit variant) without retraining, given that both variants represent the same underlying model mathematically?
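Whatever the answer to the second question, the swap could also be checked empirically by feeding both decoder variants the same adapter outputs and comparing their logits. The sketch below is only an illustration of that comparison: `compare_logits` is a hypothetical helper, and the two small lists are made-up numbers standing in for per-position logits from the two variants.

```python
def compare_logits(ref, cand):
    """Compare two sets of per-position logits.

    ref, cand: lists of equal length, one row of floats per position.
    Returns (largest absolute difference, fraction of positions whose
    top-1 token index agrees).
    """
    max_abs_diff = 0.0
    agree = 0
    for r, c in zip(ref, cand):
        row_diff = max(abs(a - b) for a, b in zip(r, c))
        max_abs_diff = max(max_abs_diff, row_diff)
        top_ref = max(range(len(r)), key=r.__getitem__)
        top_cand = max(range(len(c)), key=c.__getitem__)
        if top_ref == top_cand:
            agree += 1
    return max_abs_diff, agree / len(ref)


# Made-up logits: two positions, three-token vocabulary.
bf16_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]
packed_logits = [[0.1, 1.9, -1.0], [0.2, 1.5, 0.3]]

max_diff, top1_agreement = compare_logits(bf16_logits, packed_logits)
print(f"max |diff| = {max_diff:.3f}, top-1 agreement = {top1_agreement:.2f}")
```

A high top-1 agreement with small differences on held-out image-text inputs would suggest the adapter transfers as-is; a noticeable gap would point toward retraining or fine-tuning against the deployment variant.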

Thank you for your time and for open-sourcing this work.

Replies: 0 comments
