🚀 The feature, motivation and pitch
Is sparse attention implemented?
What I mean by sparse attention is that $q, k, v$ are dense, but the attention mask is represented in COO or CSR format, and, most importantly, the attention score matrix is never materialized in dense form (an $n \times n$ matrix that sometimes does not fit into VRAM).
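To make the request concrete, here is a minimal NumPy sketch of the semantics I have in mind (the function name and COO layout are mine, purely for illustration; a real implementation would of course need fused GPU kernels rather than gather/scatter):

```python
import numpy as np

def sparse_attention(q, k, v, rows, cols):
    """Attention restricted to a COO mask: query i attends to key j
    only if the pair (i, j) appears in (rows, cols).

    q, k, v: dense (n, d) arrays.
    rows, cols: int arrays of length e (the mask edges).
    The full (n, n) score matrix is never materialized.
    """
    n, d = q.shape
    # Scores only for the e masked pairs, not for all n*n pairs.
    scores = np.einsum("ed,ed->e", q[rows], k[cols]) / np.sqrt(d)
    # Numerically stable softmax, segmented per query row.
    row_max = np.full(n, -np.inf)
    np.maximum.at(row_max, rows, scores)
    exp = np.exp(scores - row_max[rows])
    denom = np.zeros(n)
    np.add.at(denom, rows, exp)
    attn = exp / denom[rows]
    # Weighted sum of values, scattered back to each query row.
    out = np.zeros_like(q)
    np.add.at(out, rows, attn[:, None] * v[cols])
    return out
```

Memory is O(e + n·d) instead of O(n²), which is the whole point when the mask is sparse.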
I searched the PyG library and this library but did not find one. Correct me if there is an existing implementation.
Alternatives
No response
Additional context
No response