question regarding adding vision capabilities #565
Unanswered · arsalanrzp asked this question in Q&A · Replies: 0 comments
Hi,
I am building a Vision-Language Model (VLM) using BitNet b1.58 as the frozen text decoder, with a lightweight adapter module connecting a vision encoder to the decoder.
I have two questions regarding the choice of model variant:
1. For training: I am currently using `microsoft/bitnet-b1.58-2B-4T-bf16` as the decoder backbone, keeping its weights frozen and only training the adapter. Is this the correct variant for this use case?
2. For deployment: Once the adapter is trained, would it be safe to swap the bf16 decoder for `microsoft/bitnet-b1.58-2B-4T` (the packed 1.58-bit variant) without retraining, given that both variants represent the same underlying model mathematically?

Thank you for your time and for open-sourcing this work.
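
The second question hinges on whether the bf16 weights go through the same ternary quantization during the forward pass as the packed variant uses. As a rough illustration only, the pure-Python sketch below applies the absmean ternary quantization described in the BitNet b1.58 paper to a made-up weight matrix and compares a linear layer's output with and without it. The helper names (`absmean_ternary`, `linear`), the toy values, and the `eps` constant are assumptions for illustration; none of this reflects how either checkpoint is actually loaded or executed.

```python
def absmean_ternary(weights, eps=1e-5):
    """Quantize a weight matrix to {-1, 0, +1} using a single absmean scale."""
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + eps  # mean absolute value of all weights
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1].
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def linear(weights, x, scale=1.0):
    """Compute y = scale * (W @ x) for a list-of-lists matrix W."""
    return [scale * sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Made-up "master" weights and input, purely for illustration.
master = [[0.40, -0.05, 0.90], [-0.70, 0.20, 0.10]]
x = [1.0, 2.0, -1.0]

ternary, scale = absmean_ternary(master)
y_master = linear(master, x)           # raw full-precision weights
y_ternary = linear(ternary, x, scale)  # ternary weights, rescaled by absmean

# Largest per-output gap between the two forward passes.
max_diff = max(abs(a - b) for a, b in zip(y_master, y_ternary))
print(ternary, round(scale, 4), round(max_diff, 4))
```

If the training-time forward pass used the raw bf16 weights without this quantization step, an adapter trained against it could see different decoder behaviour after the swap. Comparing logits from both variants on a few fixed inputs would be a direct way to check.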