Commit 5f21cbb
fix(compute): use dequant+cuBLAS for Q4_K when K%256!=0

Q4_K GEMV requires K to be a multiple of 256 (super-block size). For
models where hidden_size is not 256-aligned (e.g., Gemma3-1B with
hidden_size=1152, 1152%256=128), all Q4_K matmuls fell back to CPU.

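The alignment arithmetic above can be checked with a small Python sketch. The function names are hypothetical and do not come from the repository; only the super-block size of 256 and the Gemma3-1B hidden size of 1152 are taken from the commit message.

```python
# Q4_K super-block size, as stated in the commit message.
Q4K_SUPER_BLOCK = 256


def q4k_remainder(k: int) -> int:
    """Elements of K left over after the last full super-block."""
    return k % Q4K_SUPER_BLOCK


def is_gemv_eligible(k: int) -> bool:
    """True when K is super-block aligned, which the GEMV kernel requires."""
    return q4k_remainder(k) == 0


if __name__ == "__main__":
    # 1152 is the Gemma3-1B hidden size; 2048 is an arbitrary aligned value.
    for k in (1152, 2048):
        print(k, q4k_remainder(k), is_gemv_eligible(k))
```

With K=1152 the remainder is 128, so the GEMV kernel's precondition does not hold.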
Remove the hard k%256!=0 → CPU fallback. Instead, only use the GEMV
fast path when k%256==0, and fall through to the dequant+cuBLAS path
(DequantQ4KF32 + SgemmNT) for unaligned K. The dequant kernel handles
ceil(K/256) super-blocks, and cuBLAS handles any dimensions.

1 parent d0d3a82 commit 5f21cbb
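The dispatch change described in the commit message can be sketched roughly as follows, in Python. This is an illustration only: the function names and the returned path labels are hypothetical, and the actual code calls the GEMV kernel or DequantQ4KF32 + SgemmNT rather than returning strings.

```python
Q4K_SUPER_BLOCK = 256  # super-block size, per the commit message


def super_blocks_per_row(k: int) -> int:
    """ceil(K / 256): super-blocks the dequant kernel must cover per row."""
    return (k + Q4K_SUPER_BLOCK - 1) // Q4K_SUPER_BLOCK


def select_path_before(k: int) -> str:
    """Old behaviour: any unaligned K falls back to the CPU."""
    return "gemv" if k % Q4K_SUPER_BLOCK == 0 else "cpu"


def select_path_after(k: int) -> str:
    """New behaviour: unaligned K falls through to dequant + cuBLAS."""
    return "gemv" if k % Q4K_SUPER_BLOCK == 0 else "dequant+cublas"


if __name__ == "__main__":
    k = 1152  # Gemma3-1B hidden size from the commit message
    print(select_path_before(k), select_path_after(k), super_blocks_per_row(k))
```

For K=1152 the dequant step covers five super-blocks (1280 element slots), of which the last is only half used. Aligned K keeps the GEMV fast path in both versions.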
1 file changed
Lines changed: 5 additions & 13 deletions
| Hunk | Original file lines | New file lines | Change |
|---|---|---|---|
| 1 | 1442-1446 | | 5 lines removed |
| 2 | 1470-1471 | 1465-1466 | 2 lines removed, 2 lines added |
| 3 | 1557-1561 | | 5 lines removed |
| 4 | 1592 | 1582-1584 | 1 line removed, 3 lines added |