🚀 The feature, motivation and pitch
Is sparse attention implemented?
What I mean by sparse attention is that $q, k, v$ are dense, but the attention mask is represented in COO or CSR format, and, most importantly, the attention score matrix is never materialized in dense form (an $n \times n$ matrix that sometimes does not fit into VRAM).
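To make the request concrete, here is a minimal NumPy sketch of the semantics I have in mind (the function name and COO layout are mine, purely for illustration; a real implementation would of course need fused GPU kernels rather than gather/scatter):

```python
import numpy as np

def sparse_attention(q, k, v, rows, cols):
    """Attention restricted to a COO mask: query i attends to key j
    only if the pair (i, j) appears in (rows, cols).

    q, k, v: dense (n, d) arrays.
    rows, cols: int arrays of length e (the mask edges).
    The full (n, n) score matrix is never materialized.
    """
    n, d = q.shape
    # Scores only for the e masked pairs, not for all n*n pairs.
    scores = np.einsum("ed,ed->e", q[rows], k[cols]) / np.sqrt(d)
    # Numerically stable softmax, segmented per query row.
    row_max = np.full(n, -np.inf)
    np.maximum.at(row_max, rows, scores)
    exp = np.exp(scores - row_max[rows])
    denom = np.zeros(n)
    np.add.at(denom, rows, exp)
    attn = exp / denom[rows]
    # Weighted sum of values, scattered back to each query row.
    out = np.zeros_like(q)
    np.add.at(out, rows, attn[:, None] * v[cols])
    return out
```

Memory is O(e + n·d) instead of O(n²), which is the whole point when the mask is sparse.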
I searched the PyG library and this library but did not find one. Correct me if there is an existing implementation.
Alternatives
No response
Additional context
No response